Bug 23661 - 1 video stream requirement would restrict sign-language use cases.
Alias: None
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: LC
Assignee: Adrian Bateman [MSFT]
QA Contact: HTML WG Bugzilla archive list
Keywords: a11y, a11ytf
Depends on:
Reported: 2013-10-28 22:38 UTC by Aaron Colwell
Modified: 2013-12-19 16:28 UTC
CC List: 5 users

Description Aaron Colwell 2013-10-28 22:38:45 UTC
Placeholder bug for a concern raised by individual participants in the HTML Accessibility Task Force during the LC period.

Quoted text from http://lists.w3.org/Archives/Public/public-html-media/2013Oct/0022.html

"The first is that it seems the first real paragraph in section 2.2 only
requires 1 video stream to be supported. This would restrict sign-language
interpretation to that actually encoded directly into the video stream,
which is problematic for localisation and in general for the provision of
accessibility enhancement by third parties."
Comment 1 Aaron Colwell 2013-10-28 23:10:59 UTC
The one video stream requirement is purely there to establish a minimal level of support for interoperability. Implementations are allowed to go above and beyond this requirement if they want to enable sign-language video streams or other multi-video stream use cases.

As far as I am aware, the HTML5 and/or HTML.next specs don't address the sign-language video stream use case for normal non-MSE playback. It seems like the behavior for this use case should be defined there first before MSE is required to support it. As the text stands right now, nothing forbids supporting more than one video stream.
Comment 2 Adrian Bateman [MSFT] 2013-11-05 15:54:50 UTC
I am resolving this bug WORKSFORME. As Aaron states, nothing in HTML5 mandates more than one video stream and MSE does directly support multiple tracks. In fact, MSE explicitly extends the AudioTrack and VideoTrack interfaces to allow programmatic control of track kind to enable such scenarios.
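For illustration only, here is a minimal sketch of how a page could use the HTML5 track kind attribute to select a secondary sign-language video track. The helper names are hypothetical; the point is that track selection happens at the HTML5 media element layer, over any track list exposing `kind`, regardless of whether the data arrived via MSE or the network.

```javascript
// Illustrative sketch: find a video track by its `kind` attribute.
// Works over any array-like of track objects exposing a `kind` string,
// e.g. an HTMLMediaElement's videoTracks list in a browser.
function findTrackByKind(trackList, kind) {
  for (let i = 0; i < trackList.length; i++) {
    if (trackList[i].kind === kind) return trackList[i];
  }
  return null; // no track with the requested kind
}

// Hypothetical usage: enable a sign-language rendition alongside the
// main video. "sign" is one of the HTML5-defined video track kinds;
// VideoTrack objects use a `selected` flag for activation.
function enableSignLanguage(videoTracks) {
  const sign = findTrackByKind(videoTracks, 'sign');
  if (sign) sign.selected = true;
  return sign;
}
```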
Comment 3 Philippe Le Hegaret 2013-12-12 16:42:34 UTC
Note that there is a proposal circulating in the HTML a11y TF:

As well as a CfC for the proposal:
Comment 4 Adrian Bateman [MSFT] 2013-12-17 17:11:15 UTC
I believe the text proposed by the A11Y TF [1] is inaccurate. Aaron explained that the changes to allow programmatic change of the kind attribute [2] do not add new functionality - they merely allow an application to set the value that might have been provided in a manifest file.

Specifically, Aaron wrote: "The extensions to AudioTrack and VideoTrack have nothing to do with enabling accessability. They were added to allow kind information present in a DASH manifest to be reflected by these objects in the case where the kind information was not present in the initialization segments. It is not intended to add extra functionality aside from allowing the string returned from the kind attribute to be changed programatically."

It is important to remember that the scope of MSE is to allow JavaScript, instead of the network, to provide media stream data to an HTML5 media element. The requirements in MSE are to ensure playback of this JavaScript-provided data. Any handling of tracks beyond ensuring basic playability is not in scope for MSE and should be handled by HTML5. Since MSE has a normative dependency on HTML5, any changes or clarifications to track handling in HTML5 will be inherited by MSE. In general, most track processing happens above the layer that MSE targets, which is abstracting away the network interactions.
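To make that scope concrete, a minimal sketch of the MSE data path: script, not the network, feeds media segments to a SourceBuffer. The queue-draining logic below is generic and assumes only an object exposing `appendBuffer()` and an `updating` flag, so it is an illustration of the pattern rather than a definitive implementation.

```javascript
// Sketch of a segment appender for an MSE SourceBuffer. appendBuffer()
// throws if called while a previous append is still running, so segments
// are queued and only fed to the buffer when it is idle.
function makeAppender(sourceBuffer) {
  const queue = [];
  function pump() {
    if (!sourceBuffer.updating && queue.length > 0) {
      sourceBuffer.appendBuffer(queue.shift());
    }
  }
  return {
    enqueue(segment) { queue.push(segment); pump(); },
    onUpdateEnd() { pump(); }, // wire to the SourceBuffer's 'updateend' event
    pending() { return queue.length; },
  };
}

// Hypothetical browser usage (fetchSegments is an app-defined helper):
//   const ms = new MediaSource();
//   video.src = URL.createObjectURL(ms);
//   ms.addEventListener('sourceopen', () => {
//     const sb = ms.addSourceBuffer('video/webm; codecs="vp9"');
//     const appender = makeAppender(sb);
//     sb.addEventListener('updateend', () => appender.onUpdateEnd());
//     fetchSegments(seg => appender.enqueue(seg));
//   });
```

Everything above this layer, including how tracks inside the appended data are exposed and selected, is HTML5 media element behaviour rather than MSE behaviour.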

The reason the TF agreed to resolve this issue WORKSFORME was not related to what any implementations might do today but because the TF consensus is that the MSE spec already says all it needs to say about this matter. Anything else should be done in HTML5.

This was discussed again on the TF call on 12/17 and the consensus was to stand by the previous decision. [3]

[1] http://lists.w3.org/Archives/Public/public-html-a11y/2013Dec/0051.html
[2] http://lists.w3.org/Archives/Public/public-html-media/2013Dec/0017.html
[3] http://www.w3.org/2013/12/17-html-media-minutes.html#item03
Comment 5 Philippe Le Hegaret 2013-12-19 15:23:57 UTC
For completeness:
The Resolution is that the Task Force is not yet satisfied with simply  
closing Bug 23661 as Works For Me, and requests a non-normative note in  
the MSE specification clarifying that a minimally conformant MSE  
configuration will not satisfy certain accessibility use cases, such as  
the use of a second synchronised video to provide signed captioning of  
the first.

Note that since the proposed change would have no effect on conformance,  
this does not represent an objection to the document advancing to  
Candidate Recommendation, but is a request to reopen bug 23661 until it  
can be resolved to the mutual satisfaction of the relevant Task Forces.
Comment 6 Adrian Bateman [MSFT] 2013-12-19 16:28:51 UTC
I remain unconvinced that technical evidence has been provided to support the statement that "a minimally conformant MSE configuration will not satisfy certain accessibility use cases, such as the use of a second synchronised video to provide signed captioning of the first." Since synchronising video is out of scope for MSE and would be provided by higher level functionality in HTML5 it is not obvious to me what an implementation of MSE would have to do differently to support this use case.

If the Accessibility Task Force chooses to re-open this bug I would strongly urge them to explain the technical difference between a "minimally conformant MSE" implementation and an implementation of MSE that does satisfy the "certain accessibility use cases" they have in mind. In other words, what about the network layer abstraction provided by MSE has to change?