Bugzilla – Bug 23661
1 video stream requirement would restrict sign-language use cases.
Last modified: 2013-12-19 16:28:51 UTC
Placeholder bug for a concern raised by individual participants in the HTML Accessibility Task Force during the LC period.
Quoted text from http://lists.w3.org/Archives/Public/public-html-media/2013Oct/0022.html
"The first is that it seems the first real paragraph in section 2.2 only
requires 1 video stream to be supported. This would restrict sign-language
interpretation to that actually encoded directly into the video stream,
which is problematic for localisation and in general for the provision of
accessibility enhancement by third parties."
The one video stream requirement is purely there to establish a minimal level of support for interoperability. Implementations are allowed to go above and beyond this requirement if they want to enable sign-language video streams or other multi-video stream use cases.
As far as I am aware, the HTML5 and/or HTML.next specs don't address the sign-language video stream use case for normal non-MSE playback. It seems like the behavior for this use case should be defined there first before MSE is required to support it. As the text stands right now, there is nothing that says more than 1 video stream is not allowed.
I am resolving this bug WORKSFORME. As Aaron states, nothing in HTML5 mandates more than one video stream and MSE does directly support multiple tracks. In fact, MSE explicitly extends the AudioTrack and VideoTrack interfaces to allow programmatic control of track kind to enable such scenarios.
Note that there is a proposal circulating in the HTML a11y TF:
As well as a CfC for the proposal:
I believe the text proposed by the A11Y TF is inaccurate. Aaron explained that the changes allowing programmatic modification of the kind attribute do not add new functionality; they merely allow an application to set the value that might have been provided in a manifest file.
Specifically, Aaron wrote: "The extensions to AudioTrack and VideoTrack have nothing to do with enabling accessibility. They were added to allow kind information present in a DASH manifest to be reflected by these objects in the case where the kind information was not present in the initialization segments. It is not intended to add extra functionality aside from allowing the string returned from the kind attribute to be changed programmatically."
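To illustrate the behaviour Aaron describes, here is a minimal sketch of an application copying kind information from a manifest onto a track. It is not taken from the MSE spec: the WritableKindTrack interface, the function names, and the role-to-kind table are all hypothetical, and the DASH role names (including "sign") are assumed rather than checked against the DASH specification. The sketch only sets kind when the track reports none, which is the case the extension was added for.

```typescript
// Hypothetical illustration; these names are not defined by MSE or HTML5.

// Kind strings that HTML5 defines for video tracks ("" means no kind known).
type VideoTrackKind =
  | "alternative" | "captions" | "main" | "sign"
  | "subtitles" | "commentary" | "";

// Stand-in for a VideoTrack whose kind attribute is writable,
// as in the MSE draft under discussion.
interface WritableKindTrack {
  id: string;
  kind: string;
}

// ASSUMPTION: an illustrative mapping from DASH manifest role values
// to HTML5 kind strings. The role names are not verified here.
const ROLE_TO_KIND = new Map<string, VideoTrackKind>([
  ["main", "main"],
  ["alternate", "alternative"],
  ["caption", "captions"],
  ["subtitle", "subtitles"],
  ["commentary", "commentary"],
  ["sign", "sign"],
]);

// Translate a manifest role into a kind string; unknown roles map to "".
function kindFromDashRole(role: string | undefined): VideoTrackKind {
  if (role === undefined) return "";
  return ROLE_TO_KIND.get(role) ?? "";
}

// Reflect the manifest's kind on the track only when the initialization
// segment supplied none (track.kind is the empty string).
function applyManifestKind(
  track: WritableKindTrack,
  role: string | undefined,
): void {
  if (track.kind === "") {
    track.kind = kindFromDashRole(role);
  }
}
```

Under these assumptions, a track with an empty kind and a manifest role of "sign" ends up with kind "sign", while a track that already reports "main" is left untouched. Nothing here changes what the implementation can decode or play; it only changes the string the kind attribute returns.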
The reason the TF agreed to resolve this issue WORKSFORME was not related to what any implementations might do today but because the TF consensus is that the MSE spec already says all it needs to say about this matter. Anything else should be done in HTML5.
This was discussed again on the TF call on 12/17 and the consensus was to stand by the previous decision. 
The Resolution is that the Task Force is not yet satisfied with simply
closing Bug 23661 as Works For Me, and requests a non-normative note in
the MSE specification clarifying that a minimally conformant MSE
configuration will not satisfy certain accessibility use cases, such as
the use of a second synchronised video to provide signed captioning of the first.
Note that since the proposed change would have no effect on conformance,
this does not represent an objection to the document advancing to
Candidate Recommendation, but is a request to reopen bug 23661 until it
can be resolved to the mutual satisfaction of the relevant Task Forces.
I remain unconvinced that technical evidence has been provided to support the statement that "a minimally conformant MSE configuration will not satisfy certain accessibility use cases, such as the use of a second synchronised video to provide signed captioning of the first." Since synchronising video is out of scope for MSE and would be provided by higher-level functionality in HTML5, it is not obvious to me what an implementation of MSE would have to do differently to support this use case.
If the Accessibility Task Force chooses to re-open this bug, I would strongly urge them to explain the technical difference between a "minimally conformant MSE" implementation and an implementation of MSE that does satisfy the "certain accessibility use cases" they have in mind. In other words, what about the network layer abstraction provided by MSE has to change?