From HTML WG Wiki
Change Proposal: introduction of a new @kind value: "transcript"
Raised from bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=12964
Full transcripts give people with disabilities a way to access audio/video content. Transcripts are often provided as a separate resource, because they are frequently too lengthy to be included on the same page as the audio/video they're associated with.
A mechanism that creates an association between an audio/video element and a full (off page) transcript would have many benefits. These include discoverability for assistive technology users, programmatic identification for search engine indexing, design aesthetic, and content syndication or embedding.
As part of its initial work on media accessibility in HTML5, the Accessibility Task Force sub-team charged with this effort produced a document, the Media Accessibility User Requirements, which outlined all of the requirements that various user groups have with regard to the HTML5 media elements.
For transcripts, the requirements state, in part:
A transcript can either be presented simultaneously with the media material, which can assist slower readers or those who need more time to reference context, but it should also be made available independently of the media.
A full text transcript should include information that would be in both the caption and video description, so that it is a complete representation of the material, as well as containing any interactive options.
Systems supporting transcripts must:
[T-1] Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT.
After evaluating a number of different design patterns researched by Edward O'Connor, it became obvious to members of the Accessibility Task Force that the best candidate to support the transcript requirement was to place the transcript information in the body of the media element, by extending the <track> element with a new @kind value: transcript.
<video src="video.mp4">
  <track kind="transcript" src="transcript.html" srclang="en" label="English language transcript">
</video>
As Edward noted in his research, this markup pattern is easy to author, and existing content is easy to update to use it. It can link to external resources as well as to another node in the current document. Multiple transcripts can be linked, and the existing srclang="" attribute hints to the UA which language each transcript is in. Finally, the programmatic association of the <video> with its transcript would be maintained through a cross-document copy-and-paste operation.
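For instance, a sketch of how multiple transcripts might be offered side by side (the file names here are hypothetical, not taken from the proposal):

```html
<!-- Hypothetical sketch: two transcripts in different languages,
     distinguished by srclang. File names are illustrative only. -->
<video src="video.mp4" controls>
  <track kind="transcript" src="transcript-en.html" srclang="en"
         label="English language transcript">
  <track kind="transcript" src="transcript-fr.html" srclang="fr"
         label="Transcription en français">
</video>
```

A UA could then offer the transcript choice alongside its existing subtitle language selection.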
Further discussion at the May Face-to-Face meetings also suggested that this is a simpler authoring pattern to teach content creators, and that it lends itself more easily to WYSIWYG authoring tools, where one modal dialog assistant could allow authors to select all of the different accommodation assets at one time, in one location. It also imposes no design restrictions on content authors (a.k.a. the "D link" problem), as the common controls associated with the <video> element would serve to expose the transcript to all users, sighted or non-sighted alike.
Examination of some misconceptions and false assumptions
In his research, Edward also made some assumptions and statements that warrant further investigation. He states:
Transcripts are untimed, not timed text, so this is abusing the semantics of <track>.
Response: this presumes that no transcript will ever have any form of timing information associated with it. That is a false presumption; the Media Requirements state: "A transcript can either be presented simultaneously with the media material, which can assist slower readers or those who need more time to reference context" - which would require some form of timing information to achieve.
The requirements also state: "A full text transcript should include information that would be in both the caption and video description, so that it is a complete representation of the material, as well as containing any interactive options." Once again, interactivity, while not specifically noted as requiring timing, could well make use of timing as part of the interaction scripting.
It was also commented at the Face-to-Face that @kind="metadata" currently does not define how that metadata should be processed by the UA, and does not forbid it from being non-timed content.
Finally, it was noted that at least one timed text format currently (or soon to be) supported in at least one browser, TTML, has a role of "transcript" which authors may use today.
This design does not lead to a good experience in UAs that do not support the <video> element, nor in UAs that support <video> but not <track>.
Response: there is nothing in HTML5 that forbids authors from placing a standard link to the transcript inside the <video> element as a form of "fallback" content: UAs that do not support <video> will render that link.
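A sketch of this fallback pattern (file name hypothetical): the content inside <video> is rendered only by UAs that do not support the element, so the plain link remains reachable there.

```html
<video src="video.mp4" controls>
  <track kind="transcript" src="transcript.html" srclang="en"
         label="English language transcript">
  <!-- Fallback content: shown only by UAs without <video> support -->
  <p><a href="transcript.html">Read the transcript of this video</a></p>
</video>
```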
Assertions that UAs would not or could not support @kind="transcript" are unfounded at this time, and there is no reason or evidence why UAs that will support other <track> kinds will be unable to support this one.
This design does not readily expose the transcript link to users. Should the Web author want to expose the transcript link to her page's readers, she would have to duplicate the link in markup... <snipped>
In such a scenario, the duplicate links may get out of sync as the page is maintained and the location of the transcript changes. It is likely that the visible link will be more up to date than the invisible link, thus harming AT users.
Response: given the newness of HTML5 video content overall, and the still emergent support for the <track> element and @kind attribute, it is premature to suggest that it would be difficult for users to 'discover' the transcript. UAs have an opportunity to expose kind="transcript" content in whatever fashion best suits the UA/hardware configuration. Accessing the transcript will be no more or less difficult than accessing captions, subtitles or other @kind values in emergent UA players, and could reuse existing patterns/designs used for other <track> content, including but not limited to media control buttons, or addition to a drop-down menu alongside subtitle options. Further, as yet another <track> kind, advanced authors will be able to easily script other exposure/rendering solutions to meet specific design demands.
(<opinion>Given half a chance, there are many smart and talented designers who will solve this issue with elegance and panache</opinion>)
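As one hypothetical illustration of that scripting claim (not part of the proposal; the id and approach are assumptions), an author could surface visible transcript links directly from the <track> elements, so the src in the markup remains the single source of truth:

```html
<video id="talk" src="video.mp4" controls>
  <track kind="transcript" src="transcript.html" srclang="en"
         label="English language transcript">
</video>
<script>
  // Sketch only: find transcript tracks on the video and expose each
  // as a visible link placed immediately after the player.
  var video = document.getElementById('talk');
  var tracks = video.querySelectorAll('track[kind="transcript"]');
  for (var i = 0; i < tracks.length; i++) {
    var a = document.createElement('a');
    a.href = tracks[i].src;
    a.textContent = tracks[i].label || 'Transcript';
    video.parentNode.insertBefore(a, video.nextSibling);
  }
</script>
```

Because the visible link is generated from the <track> markup, it cannot fall out of sync with the programmatically exposed one.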
Assertions that transcripts linked on a page using the standard anchor element would be more up-to-date than transcripts linked via <track> can neither be proven nor disproven at this time. There is no evidence to suggest that transcripts would somehow become less up-to-date than closed captions or subtitles associated with the same video via <track>; in fact, if current examples of WYSIWYG editors' handling of @longdesc are any indication, emergent modal dialogs for inserting <video> into web pages and Content Management Systems will likely manage this quite effectively.
If the accuracy question is one of significant concern, then it would apply to all assets linked via <track>, and not exclusively to the transcript.
Conformance Classes Changes

References
- Media Accessibility User Requirements - W3C Editor's Draft 14 December 2011
- Possible methods for associating transcripts with media elements - Author: Edward O'Connor (Apple)
- Timed Text Markup Language (TTML) 1.0 - W3C Recommendation 18 November 2010
- WYSIWYG support for @longdesc today - Author: John Foliot