This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12544 - <video> MEDIA CONTROLLER requires track kind for in-band tracks
Summary: <video> MEDIA CONTROLLER requires track kind for in-band tracks
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: All Windows XP
Importance: P3 enhancement
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: a11y, a11ytf, media
Depends on: 9452
Blocks: 13357
Reported: 2011-04-23 03:13 UTC by John Foliot
Modified: 2012-02-10 00:40 UTC

Description John Foliot 2011-04-23 03:13:48 UTC
At this point, it is also not possible to discover the functionality that an in-band track provides through script. A similar problem was solved for the TextTrack object by introduction of a kind attribute.
This is also necessary for the TrackList object, in particular to introduce a standard naming scheme across different media container formats for exposing the kind of data that their tracks provide.

Therefore, we request addition of a getKind(in unsigned long index) function to the TrackList object, or something of equivalent functionality.

The proposed list of values that kind should understand are:
for video:
* sign language video (in different sign languages as provided through getLanguage())
* captions (as in: burnt-in video that may just be overlays)
* different camera angles
* video mosaic
for audio:
* audio descriptions
* language dub
* commentary (such as director's commentary)
* clear audio (see http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio)
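
The request above can be sketched roughly as follows. This is only an illustration of the proposed shape, not spec text: getKind() and getLanguage() are the names used in this report, while SketchTrackList, InBandTrack, indicesOfKind() and the kind strings "main" and "descriptions" are hypothetical names chosen for the example.

```typescript
// Hypothetical model of one in-band track; field names are assumptions.
interface InBandTrack {
  kind: string;     // "" when the container gives no indication
  language: string; // language tag, "" when unknown
}

// Illustrative stand-in for the TrackList object discussed in this bug.
class SketchTrackList {
  private readonly tracks: readonly InBandTrack[];

  constructor(tracks: readonly InBandTrack[]) {
    this.tracks = tracks;
  }

  get length(): number {
    return this.tracks.length;
  }

  // The accessor requested in this report.
  getKind(index: number): string {
    return this.trackAt(index).kind;
  }

  getLanguage(index: number): string {
    return this.trackAt(index).language;
  }

  private trackAt(index: number): InBandTrack {
    const track = this.tracks[index];
    if (track === undefined) {
      throw new RangeError(`no track at index ${index}`);
    }
    return track;
  }
}

// Script-side discovery: collect the indices of every track of one kind.
function indicesOfKind(list: SketchTrackList, kind: string): number[] {
  const found: number[] = [];
  for (let i = 0; i < list.length; i++) {
    if (list.getKind(i) === kind) {
      found.push(i);
    }
  }
  return found;
}

const audioTracks = new SketchTrackList([
  { kind: "main", language: "en" },
  { kind: "descriptions", language: "en" },
  { kind: "", language: "fr" },
]);
const descriptionTracks = indicesOfKind(audioTracks, "descriptions");
```

With a standard naming scheme, a script like indicesOfKind() would work the same way regardless of which container format supplied the tracks.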
Comment 1 Philip Jägenstedt 2011-04-23 09:25:17 UTC
I have no objection in principle to this, but the main issue is how this actually maps to the container formats. I'm pretty sure there's nothing in WebM to indicate a kind. In the absence of such information, should a UA return the empty string for kind?
Comment 2 Mark Watson 2011-04-25 17:16:14 UTC
It is to be expected that adaptive streaming media formats will more often expose multiple in-band audio and video tracks, since with these formats it is possible to download only the media being presented.

3GPP have explicitly asked W3C [1] to define labels of this kind for in-band tracks and intend to align with W3C on the specific symbols. Specifically, they state:

"It is possible, in our system, to express that the presentation can be built from multiple streams of media, and we have identified the need to be able to explain to the client software what the purpose of the separate streams is, in particular for the case where additional streams offer accessibility provisions; a textual caption stream is one of the more obvious cases.
We are using a general labelling technique for this aspect, and other aspects, of our specification: we identify a specification, registration authority, or other source by using a URI (normally expected to be a URN), and then provide values from the set defined by that source.
Though this enables multiple organizations to define sets of names, we are hoping very much that in the case of labelling of streams that we could recommend use of a set of labels defined by the W3C, as this will help the industry converge, and make it much easier to embed one of these streams in an HTML5 context."

The MPEG DASH group is following the same approach as 3GPP, but their documentation of this discussion is not yet public.

Certainly, this information will not be present in all container formats, so there is a need for getKind() to be able to return "unknown", in which case the situation is no worse than if getKind() was not present at all.

[1] http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_64/Docs/S4-110502.zip
Comment 3 David Singer 2011-04-25 18:56:42 UTC
In response to Philip, I think that adding the API making it possible to (a) iterate over the tracks in a multiplex source and (b) ask if they have a 'kind' (I still prefer 'role', but whatever), does NOT imply that they will always answer the question.  So it's OK to use old AVI files, for example.

The 3G liaison explicitly mentioned 'main', but I would have thought that this is the default -- if a track is not explicitly labelled as performing some other role, from the HTML5 point of view, treat it as part of the main program (you really have no other choice).

It would be trivial for me to register a user-data item for use in ISO BMFF family and QuickTime files (MP4, MOV, 3GP, 3G2, etc.), to carry a W3C 'kind', or perhaps to carry a URN, which could be from the W3C or other namespaces.
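
A minimal sketch of the fallback behaviour discussed in comments 1 to 3, assuming the empty string is what gets reported for an unlabelled track (comment 2 suggests "unknown" instead). Both helper names are hypothetical and not part of any specification.

```typescript
// What a user agent might report: the container's label if it has one,
// otherwise the empty string (the option raised in comment 1).
function reportedKind(containerLabel: string | undefined): string {
  return containerLabel === undefined ? "" : containerLabel;
}

// How a page might interpret the reported value: an unlabelled track is
// treated as part of the main program (the default argued in comment 3).
function effectiveRole(kind: string): string {
  return kind === "" ? "main" : kind;
}

const fromUnlabelledFile = effectiveRole(reportedKind(undefined));  // "main"
const fromLabelledFile = effectiveRole(reportedKind("commentary")); // "commentary"
```

Keeping the two steps separate means the API never claims more than the container says, while scripts can still apply the "main by default" reading.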
Comment 4 Ian 'Hixie' Hickson 2011-04-26 21:22:07 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: see diff given below
Rationale: I've exposed the union of what Ogg and WebM currently expose, minus the stuff in Ogg that seemed rather pie-in-the-sky ("audio/music", "audio/speech", and "audio/sfx").
Comment 5 contributor 2011-04-26 21:22:21 UTC
Checked in as WHATWG revision r6032.
Check-in comment: add getKind() since WebM does have something like this, and MPEG is apparently planning to add it, so that (with Ogg) means three of the oft-supported formats have something relevant here.
http://html5.org/tools/web-apps-tracker?from=6031&to=6032
Comment 6 Laura Carlson 2011-04-29 12:12:13 UTC
Per Martin Kliehm "although the bug-triage sub-team didn't decide on these bugs yet, John mentioned in yesterday's teleconference [1] that there are several important multimedia related bugs where he'd like to add the a11ytf keyword. There was approval".

[1] http://www.w3.org/2011/04/28-html-a11y-minutes.html
Comment 7 Mark Watson 2011-05-13 01:15:07 UTC
Three additional track kinds are required for accessibility purposes as follows:

captions - video with open ("burned in") captions
subtitles - video with open ("burned in") subtitles
clearaudio - an alternative audio track in which sounds which are not dialog or other important non-speech information are attenuated

Video with open captions/subtitles is still in practical use when closed captions/subtitles in separate files are not available. This is often the case with old content.

Clear audio streams are intended to make dialog easier to understand for those with poor hearing. They are available for some content and specifically are available in the UK from the BBC.

There is work happening at MPEG and 3GPP to expose track kinds in their adaptive streaming standards. MPEG and 3GPP would like to utilise the same terms and meaning as we are using here to make the mapping of track types in their adaptive streaming formats to our API obvious.
Comment 8 Philip Jägenstedt 2011-05-13 07:56:33 UTC
(In reply to comment #7)
> captions - video with open ("burned in") captions
> subtitles - video with open ("burned in") subtitles

If the video track has burned in text then you can't turn it on or off, so what could a user agent do with this information? It seems like it would only be theoretically relevant in a case where you have *both* the original video and a version with open captions/subtitles, which must be extremely uncommon.
Comment 9 Mark Watson 2011-05-13 15:18:12 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > captions - video with open ("burned in") captions
> > subtitles - video with open ("burned in") subtitles
> 
> If the video track has burned in text then you can't turn it on or off, so what
> could a user agent do with this information? It seems like it would only be
> theoretically relevant in a case where you have *both* the original video and a
> version with open captions/subtitles, which must be extremely uncommon.

Yes, it's for the case where you have both, which is not uncommon. We have material of this kind where, for whatever reason, the captions are only available in this form and not as a separate file.

Obviously, it's only efficient if you have a transport system which only transports the tracks being viewed over the network, which we have and for which there are now emerging standards (e.g. DASH).
Comment 10 Mark Watson 2011-06-01 21:43:04 UTC
Please note the correct term is "clean audio" rather than "clear audio".
Comment 11 Ian 'Hixie' Hickson 2011-06-10 20:10:22 UTC
Dave, do you have an update for what MPEG is exposing here by any chance?

The current list was built by taking the kinds supported by Ogg, which was the only format I could find that exposed this kind of stuff. I'm happy to add more kinds, but we shouldn't add some that will never ever be exposed; that would just confuse authors.
Comment 12 Silvia Pfeiffer 2011-06-11 01:23:54 UTC
The liaison document posted to the a11y TF, see http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0156.html , states the following:

* main – this stream is part of the main program content;
* supplementary – for a main program that is audio, a supplementary video stream might provide, for example, dynamic graphics;
* alternate – such a stream might provide a different camera viewpoint (and we strongly recommend the provision of further annotations to clarify the nature of the alternative);
* commentary;
* dub – an alternative audio stream that contains a non-original language;
* captions;
* subtitles.

Note that the a11y TF analysed the existing kinds in http://www.w3.org/WAI/PF/HTML/wiki/Track_Kinds and the requested list of additions that Mark posted was the result of that discussion:


* captions - video with open ("burned in") captions
* subtitles - video with open ("burned in") subtitles
* cleanaudio - an alternative audio track in which sounds which are not dialog or other important non-speech information are attenuated

of which we have broad agreement on the first two and are seeking further input from those who have been using and distributing cleanaudio tracks in the UK.
Comment 13 Michael[tm] Smith 2011-08-04 05:17:10 UTC
mass-move component to LC1
Comment 14 Ian 'Hixie' Hickson 2011-09-26 19:50:29 UTC
I am blocked here on seeing spec text for other video formats that support this concept, since I intend to avoid speccing types that don't map to anything (since having such types wastes author time, as we've seen with many such features in the past).
Comment 15 Mark Watson 2011-09-26 21:34:45 UTC
(In reply to comment #14)
> I am blocked here on seeing spec text for other video formats that support this
> concept, since I intend to avoid speccing types that don't map to anything
> (since having such types wastes author time, as we've seen with many such
> features in the past).

Ian,

Please check section 5.8.5.5. of the MPEG DASH specification, available at http://www.3gpp.org/ftp/Inbox/LSs_from_external_bodies/ISO_IEC_JTC1_SG29_WG11/29n12313.zip

As I have argued above, HTML provides a perfect integration point where things that are signalled in different ways in different containers can be mapped to a common, container-independent language, which is what page authors want to see. It is not a waste of anyone's time to define common-language terms for well-defined concepts that are clearly required and that are already in use, if not yet in the web context. This provides an incentive for container formats to add the necessary signalling. [But to be clear, it may well be a waste of time to define common-language terms for concepts which are not well-defined or not yet in use anywhere.]

Put another way, your argument above could equally be given by a container designer who says there is no point in marking a stream with some indication which could never be passed to the HTML presentation layer. Someone has to go first and it makes more sense for this to be the common-language name in HTML, not the container-specific name.

The above comments are largely targeted at clean audio, which I believe is the one remaining category not included in either the LC HTML draft or the DASH specification.

...Mark
Comment 16 Ian 'Hixie' Hickson 2011-09-29 22:17:22 UTC
(In reply to comment #15)
> 
> Please check section 5.8.5.5. of the MPEG DASH specification, available at
> http://www.3gpp.org/ftp/Inbox/LSs_from_external_bodies/ISO_IEC_JTC1_SG29_WG11/29n12313.zip

Is this the right link? It seems to be a zip file containing a zip file containing XML files and a schema, along with an HTML file describing the package and some files in a proprietary format I can't read. Is there an HTML (or failing that, PDF) version somewhere?


> Put another way, your argument above could equally be given by a container
> designer who says there is no point in marking a stream with some indication
> which could never be passed to the HTML presentation layer.

I don't think that's a real concern. Historically people have had no problem adding features to their languages and technologies before HTML APIs could represent them. It seems unlikely that these features, if they are useful at all, would only be useful in an HTML context.
Comment 17 Mark Watson 2011-09-30 01:14:26 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > 
> > Please check section 5.8.5.5. of the MPEG DASH specification, available at
> > http://www.3gpp.org/ftp/Inbox/LSs_from_external_bodies/ISO_IEC_JTC1_SG29_WG11/29n12313.zip
> 
> Is this the right link? It seems to be a zip file containing a zip file
> containing XML files and a schema, along with an HTML file describing the
> package and some files in a proprietary format I can't read. Is there an HTML
> (or failing that, PDF) version somewhere?

Yes, it's the correct file. The one you want is the large Word document. That seems to be the format used by MPEG and 3GPP for exchanging documents.

> 
> 
> > Put another way, your argument above could equally be given by a container
> > designer who says there is no point in marking a stream with some indication
> > which could never be passed to the HTML presentation layer.
> 
> I don't think that's a real concern. Historically people have had no problem
> adding features to their languages and technologies before HTML APIs could
> represent them. It seems unlikely that these features, if they are useful at
> all, would only be useful in an HTML context.

Oh, I completely agree that they would not be useful only in an HTML context. They are useful when provided in HTML *and* container formats. Again, it's a question of who goes first and again it makes more sense for the container-independent language to lead the way for things where the need for the feature by users is obvious and unquestioned, as is the case for clean audio.
Comment 18 Ian 'Hixie' Hickson 2011-10-21 02:57:37 UTC
Do you have it in a non-proprietary format by any chance? A word doc in a zip file really isn't accessible to me.
Comment 19 contributor 2012-02-09 00:20:50 UTC
Checked in as WHATWG revision r6982.
Check-in comment: Add some of DASH's values to AudioTrack.kind and VideoTrack.kind.
http://html5.org/tools/web-apps-tracker?from=6981&to=6982
Comment 20 Ian 'Hixie' Hickson 2012-02-09 00:21:49 UTC
Not sure what to do with DASH's "subtitle" and "caption" roles. They seem orthogonal to the other roles (you could have a "supplementary" video with and without subtitles, for example). In fact, given their definition of "dub" to apply to any component, it's not clear to me why you couldn't have a video track that was "main", "subtitle", and "dub" all at the same time.

I haven't currently exposed "supplementary".
Comment 21 Ian 'Hixie' Hickson 2012-02-09 00:27:38 UTC
Oh, I see, DASH allows a track to have multiple roles. Hmm. Not sure how to expose that.
Comment 22 Mark Watson 2012-02-09 00:32:41 UTC
(In reply to comment #21)
> Oh, I see, DASH allows a track to have multiple roles. Hmm. Not sure how to
> expose that.

How about allowing the role attribute to contain a list of whitespace-separated keywords?
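
For illustration, this is roughly what page script would have to do with a whitespace-separated value. It is a sketch under assumptions: parseRoles() and hasRole() are hypothetical helpers, and the role names are the DASH roles mentioned earlier in this thread.

```typescript
// Split a whitespace-separated keyword list into a set of tokens,
// ignoring leading, trailing, and repeated whitespace.
function parseRoles(value: string): Set<string> {
  return new Set(value.split(/\s+/).filter((token) => token.length > 0));
}

// Test for one role inside the list.
function hasRole(value: string, role: string): boolean {
  return parseRoles(value).has(role);
}

const exposed: string = "main  subtitle dub";

// A plain string comparison, which works for a single-valued kind,
// silently fails once several keywords are present.
const naiveCheck = exposed === "subtitle";        // false
const tokenCheck = hasRole(exposed, "subtitle");  // true
```

The difference between naiveCheck and tokenCheck is the usability cost of a multi-valued attribute: every consumer has to tokenize before comparing.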
Comment 23 contributor 2012-02-09 00:33:58 UTC
Checked in as WHATWG revision r6983.
Check-in comment: More DASH support for AudioTrack.kind and VideoTrack.kind.
http://html5.org/tools/web-apps-tracker?from=6982&to=6983
Comment 24 Ian 'Hixie' Hickson 2012-02-09 00:35:20 UTC
Mark: That would make the API maddeningly hard to use right.

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale:

I've tried to map the most useful DASH role combinations to the defined HTML values. Hopefully that's good enough for most purposes. If there are specific use cases that this doesn't handle, please file separate bugs (I don't mind if they are handled as LC1 bugs).

If there are other specifications that define specific roles that should be mapped here, please file separate bugs with links to the relevant specs (I don't mind if they are handled as LC1 bugs either).
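
A rough sketch of what mapping role combinations onto a single kind value could look like. This is not the checked-in mapping: kindFromRoles() is a hypothetical function, the rules are simplified for illustration, and the output strings "translation" and "alternative" are assumptions that should be checked against the current specification text.

```typescript
// Collapse a set of DASH-style roles into one kind string.
// More specific combinations are tested before more general ones.
function kindFromRoles(roles: readonly string[]): string {
  const has = (role: string): boolean => roles.indexOf(role) !== -1;

  if (has("main") && has("caption")) return "captions";
  if (has("main") && has("subtitle")) return "subtitles";
  if (has("main") && has("dub")) return "translation";
  if (has("main")) return "main";
  if (has("commentary")) return "commentary";
  if (has("alternate")) return "alternative";

  // No rule applies (for example "supplementary" on its own).
  return "";
}

const burnedInSubtitles = kindFromRoles(["main", "subtitle"]); // "subtitles"
const unmapped = kindFromRoles(["supplementary"]);             // ""
```

Because the result is always a single string, page script can keep using plain equality checks, at the price of losing combinations the table does not anticipate.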
Comment 25 Bob Lund 2012-02-09 20:39:40 UTC
(In reply to comment #24)
> Mark: That would make the API maddeningly hard to use right.
> 
> Status: Accepted
> Change Description: see diff given below
> Rationale:
> 
> I've tried to map the most useful DASH role combinations to the defined HTML
> values. Hopefully that's good enough for most purposes. If there are specific
> use cases that this doesn't handle, please file separate bugs (I don't mind if
> they are handled as LC1 bugs).
> 
> If there are other specifications that define specific roles that should be
> mapped here, please file separate bugs with links to the relevant specs (I
> don't mind if they are handled as LC1 bugs either).

I agree that an enumeration of track kinds is best but there is an issue of spec timing that should be considered.

I raised the need for a new role value with the DASH spec authors to address the same issue identified in bug 13357 (new kind for pre-mixed audio descriptions). The response was that future specs should create new schemes that define needed Role values.

If Roles (or Role combinations) are _only_ explicitly enumerated in HTML for a given set of referenced specs, then it seems likely there will be future cases that aren't captured in a particular version of HTML. It would be very useful if, when the UA encounters a Role or set of Role values that is not mapped in HTML, it created a string such as the one suggested in Comment 22.
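
One way to read this suggestion, as a sketch only: look the role set up in a table of known combinations and, when nothing matches, expose the roles themselves as a whitespace-separated string. The table contents, kindWithFallback(), and the role name "x-premixed-description" are hypothetical and exist only for this example.

```typescript
// Known combinations, keyed by the sorted, space-joined role names.
const mappedCombinations = new Map<string, string>([
  ["main", "main"],
  ["caption main", "captions"],
  ["main subtitle", "subtitles"],
]);

// Return the mapped kind when the combination is known; otherwise fall
// back to the normalized role list so that script can still inspect it.
function kindWithFallback(roles: readonly string[]): string {
  const key = roles.slice().sort().join(" ");
  const mapped = mappedCombinations.get(key);
  return mapped !== undefined ? mapped : key;
}

const known = kindWithFallback(["subtitle", "main"]);
const future = kindWithFallback(["main", "x-premixed-description"]);
```

Here known is a mapped kind, while future carries the unmapped roles through unchanged, so a role defined by a later scheme would not be lost between HTML revisions.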
Comment 26 Bob Lund 2012-02-10 00:32:45 UTC
(In reply to comment #24)

https://www.w3.org/Bugs/Public/show_bug.cgi?id=13357#c5 defines the requirement for a kind for premixed audio descriptions. Adding this kind value for that bug will make it available for use in DASH as well, which will need to meet the same requirement.
Comment 27 Ian 'Hixie' Hickson 2012-02-10 00:40:25 UTC
HTML will be continually updated, so as new values are needed they can be added.