ChangeProposal/Issue194 SP
Introduction of a <transcript> element
Author: Silvia Pfeiffer (Google)
Rationale
This is a proposal to address the need for video and audio transcripts through introduction of a <transcript> element. It is based on an analysis of use cases for video transcripts and recommends means of realizing these use cases in HTML5.
Summary
Issue 194 asks for a mechanism for associating a full transcript with an audio or video element. It does so by stating some requirements and a single use case, namely a link to an off-page transcript resource.
In the arguments of the different Change Proposals that have been made, many different use cases appear, not just the off-page link use case.
In this CP we argue that the different use cases for transcripts need to be listed individually and addressed in a uniform manner.
The requirements
Issue 194 lists these requirements:
(1) people with disabilities need a way to access audio/video
(2) transcripts are often provided as a separate resource - we need to include a link to it
(3) the transcript needs to be discoverable for AT users
(4) the transcript needs to be programmatically identifiable by search engines
(5) the transcript needs to be programmatically identifiable for design aesthetic
(6) the transcript needs to be programmatically identifiable for content syndication
(7) the transcript needs to be embeddable (presumably: with the video)
Further requirements have emerged in the discussion:
(8) transcripts are useful to all users (e.g. skimming the content of a lecture)
(9) transcripts/transcript links need to be exposed to all users
(10) transcripts may be exposed in video controls, because when media elements are taken fullscreen, the transcript link still needs to be visible
(11) visibility of a transcript/transcript link should not remove programmatic identifiability
(12) it should be easy for authors who are already publishing content with transcripts to retrofit their existing pages
(13) transcript link duplication should be avoided
(14) transcripts may be available in different languages - linking to them should be possible
(15) transcripts may be embedded on the page - linking to that should be possible
(16) transcripts need to be available even in browsers that do not support or do not render audio or video elements
Note: We agree with most, but not all, of these arguments, as will become clear later.
The use cases
In the Media Accessibility User Requirements document we have already identified the following use cases:
[T-1] Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT.
[T-2] Support the provisioning of both scrolling and static display of a full text transcript with the media resource, e.g., in an area next to the video or underneath the video, which is also AT accessible.
In more detail, what we encounter in the wild are the following four real-world uses of transcripts:
[UC1] Interactive transcripts: Publishers that have a timed transcript (e.g. captions) provide an interactive transcript next to/underneath their videos. Well-known examples are YouTube and TED (e.g. TED video, NY Times), and several other video player providers offer them as well (3ply etc.).
[UC2] Linked Transcripts: Publishers that don't have timed transcripts often provide links to non-timed transcripts instead, because WCAG2 Success Criterion 1.2.1 explicitly mentions this as a solution (e.g. Centrelink AU Gov site, US State Department). These are often not even in HTML.
[UC3] On-page Transcripts: Sometimes we also see non-timed transcripts published underneath the video on-page (e.g. ESL Videos, Fox News).
[UC4] Transcript-only pages: Sometimes we even see Websites that publish only the transcript of an event, without actual video or audio recordings (e.g. Gates Jobs at D5, White House).
Observations
All four real-world use cases expose the transcript or transcript link explicitly on the Web page.
This makes sense, since transcripts are useful to all users (as requirement 8 states), and need to be exposed to all users (req 9).
None of the real-world use cases expose a transcript link button in the video player itself.
There is a reason this is not done: within the video element, and in particular in fullscreen, everything that can be clicked keeps the user in the video viewing experience, whereas a transcript link would not.
None of the real-world use cases expose a transcript link in the context menu either.
Context menus don't easily translate to touch devices, so advanced functionality is avoided in context menus.
When video players provide a fullscreen experience of a transcript with the video, it is an interactive transcript.
Players that have a transcript included (e.g. Drupal BuildAModule) are not just video players any more: they have an extra area next to the video in which the transcript is provided.
Proposed Solution: Overview
To deal with transcripts as first class citizens on the Web, we propose to introduce a <transcript> element.
Similar to how <article> or <footer> denote areas with particular semantics on Web pages, transcripts, and interactive transcripts in particular, are text areas with particular semantics. They are a div-like or iframe-like area on the page that is linked to a video, if such a video is present on the page.
The <transcript> element should provide a means to render timed text files as an interactive transcript, where sections of text are linked back to the video and clicking on them moves the video's playback position. Similarly, the video's current playback time influences which text in the interactive transcript is currently scrolled into view. This is a paradigm that Web developers often implement and often get wrong, so having the Web browser provide it is exciting new functionality.
The <transcript> element's content will also be available to display non-timed text, such as one or more links to off-page transcript resources, or an actual text-only transcript in one or more paragraphs.
The <transcript> element will be linked to a video or audio element if one is present on the same page. It can be styled with CSS and hidden if necessary, while still being available to AT.
Details of Proposed Solution
This section is still speculative in parts. It proposes actual markup features for the <transcript> element, the details of which still need to be worked out.
Satisfying [UC1] interactive transcripts
- Interactive transcripts are now common, and what they typically look like and how they behave is understood well enough that we should introduce a solution.
- Interactive transcripts are based on timed text files - WebVTT and TTML files can provide such data.
- Interactive transcripts sit next to the video or audio element that they transcribe and take up extra space, so cannot be part of the video's dimensions.
- Interactive transcripts interact with their video or audio element by keeping the currently displayed text in sync with the currentTime of the element.
- Interactive transcripts also allow manipulating the video's currentTime when a user clicks on a text segment.
The basic idea is to introduce a <transcript> element with a @for attribute that provides the link back to the <video> and a @src attribute that links to a text transcript with timing. The default rendering would be a section of text on the page with a link behind each text segment that, when clicked, repositions the video's playback position to the time of the segment. In addition, playback of the video would move the display in the transcript element so that the text segment relating to the current playback time is visible and in focus. In essence, it is a different means of rendering a text track file.
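The two directions of synchronisation described here can be sketched in a few lines. This is a minimal TypeScript sketch and not part of the proposal: the `Cue` shape and the function names `activeCueIndex` and `seekTimeForCue` are assumptions made for illustration, cues are assumed not to overlap, and seeking is assumed to go to the start of the clicked segment.

```typescript
// Hypothetical cue shape: one timed text segment, times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Video -> transcript: index of the cue covering `currentTime`, or -1 if none.
// Each cue is treated as the half-open interval [start, end).
function activeCueIndex(cues: Cue[], currentTime: number): number {
  return cues.findIndex(
    (cue) => currentTime >= cue.start && currentTime < cue.end
  );
}

// Transcript -> video: playback position to seek to when a cue is clicked.
// Assumption: a click seeks to the start of the clicked segment.
function seekTimeForCue(cues: Cue[], index: number): number {
  const cue = cues[index];
  if (cue === undefined) {
    throw new RangeError(`no cue at index ${index}`);
  }
  return cue.start;
}
```

In a page, a script emulating this behaviour would typically call `activeCueIndex` from the media element's `timeupdate` event handler and assign the result of `seekTimeForCue` to the element's `currentTime`.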
A markup example could be:
<video id=v1 src=video.mp4></video>
<transcript for=v1 id=t1>
  <track src=transcript1.vtt srclang=en default>
  <track src=transcript2.vtt srclang=de>
</transcript>
- If the video or audio element fails to load, the <transcript> should nevertheless display, just without any timing links.
- If the video and transcript are to be taken fullscreen together, one should add a <div> around the two and allow that to go fullscreen.
- Similarly, to embed the video together with the transcript, use a <div> around the two elements in an <iframe>.
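The fallback in the first bullet above (display the text, but drop the timing links when the media fails to load) could be modelled as follows. This is a minimal sketch, assuming hypothetical `Cue` and `Segment` shapes and a `mediaAvailable` flag that a user agent would set once it knows whether the associated media element loaded; none of these names come from the proposal.

```typescript
// Hypothetical cue shape: one timed text segment, times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Hypothetical render model: a segment carries a seek target only when
// clicking it could actually reposition a media element.
interface Segment {
  text: string;
  seekTo?: number;
}

// Build the list of segments to display. Without usable media the text is
// still produced, just without timing links.
function toSegments(cues: Cue[], mediaAvailable: boolean): Segment[] {
  return cues.map((cue) =>
    mediaAvailable ? { text: cue.text, seekTo: cue.start } : { text: cue.text }
  );
}
```

Keeping the text and the link target separate in the model means the same rendering path serves both cases; only the presence of `seekTo` differs.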
Satisfying [UC2] linked transcripts
- Linked transcripts are currently provided through a URL underneath the video.
- This is a simple and effective solution and satisfies WCAG2.
- It provides visual exposure to all users and to old UAs and ATs.
- We can further integrate it with the interactive transcript proposal to achieve programmatic linkage.
A markup example could be:
<video id=v1 src=video.mp4></video>
<transcript for=v1 id=t1>
  <a href=transcript.doc>Transcript for the video</a>
</transcript>
- It is trivial for existing publishers to change their content to this approach by just adding a <transcript> element around their existing link.
- Links to multiple languages can then be published either in the linked document or listed inside the <transcript> element.
Satisfying [UC3] on-page transcripts
- On-page transcripts are currently provided through a text block underneath the video (in <p> or <div> or <textarea>).
- This is a simple and effective solution and satisfies WCAG2.
- It is possible to use the content of the <transcript> element to provide text directly on-page. That would, however, have no timing adjustments with the video.
- It provides visual exposure to all users and to old UAs and ATs.
- We can further integrate it with the interactive transcript proposal to achieve programmatic linkage.
A markup example could be:
<video id=v1 src=video.mp4></video>
<transcript for=v1 id=t1>
  <p>This is an on-page transcript.</p>
</transcript>
- It is trivial for existing publishers to change their content to this approach by just adding a <transcript> element around their existing text content.
- Transcripts in different languages would in this example typically be provided by loading the Web page in a different language (i.e. under a different URL).
Satisfying [UC4] transcript-only pages
- Transcripts can easily be created from caption and description files.
- Sometimes publishers want to just publish a transcript (and no media element) on a page.
- The <transcript> element can create such a transcript.
A markup example could be:
<transcript>
  <track src=transcript1.vtt srclang=en default>
  <track src=transcript2.vtt srclang=de>
</transcript>
- This would automatically render the text in the timed text resources on the page, though without links to the video, since there is no video rendered.
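Turning a timed text resource into plain page text could look roughly like this. This is a minimal sketch, assuming a simplified subset of WebVTT: the `WEBVTT` header, `NOTE`, `STYLE` and `REGION` blocks, optional cue identifiers, and timing lines containing `-->`. Inline cue markup (such as voice tags) is left in the text untouched, and the function name `vttToPlainText` is hypothetical.

```typescript
// Extract the cue text of a WebVTT document as plain paragraphs, dropping
// all timing. Handles only a simplified subset of WebVTT (see assumptions).
function vttToPlainText(vtt: string): string[] {
  // Normalise line endings, then split into blank-line-separated blocks.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  const paragraphs: string[] = [];
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    // Only cue blocks contain a timing line; header, NOTE, STYLE and REGION
    // blocks do not, so they are skipped.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue;
    // Lines before the timing line are the optional cue identifier;
    // lines after it are the cue text.
    const payload = lines.slice(timingIndex + 1).join(" ").trim();
    if (payload !== "") paragraphs.push(payload);
  }
  return paragraphs;
}
```

Each returned string corresponds to one cue, so a renderer could emit one text segment per entry, with no link attached when no media element is present.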
Details to work out:
- I don't actually know how to render a selection mechanism between different track elements. A menu could be rendered as part of the text block. Or instead it could be done in the video element and influence the rendering in the <transcript> element.
- It might be better to put the programmatic linkage (currently done through @for) onto the video element (then through @transcript) to avoid full page searches for related elements.
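The trade-off between the two linkage directions can be illustrated with a small model. This is a minimal sketch, assuming a hypothetical DOM-free `PageElement` shape and an id index of the kind a user agent could keep for id lookups; `@transcript` on the video is the alternative floated above, and all function names are illustrative rather than proposed API.

```typescript
// Hypothetical, DOM-free model of an element on a page.
interface PageElement {
  tag: string;
  id?: string;
  attrs: Record<string, string>;
}

// @for on <transcript>: starting from the video, every element on the page
// may have to be inspected to find a transcript that points back at it.
function findTranscriptByFor(
  page: PageElement[],
  videoId: string
): PageElement | undefined {
  return page.find(
    (el) => el.tag === "transcript" && el.attrs["for"] === videoId
  );
}

// Assumed id index, built once per page.
function buildIdIndex(page: PageElement[]): Map<string, PageElement> {
  const index = new Map<string, PageElement>();
  for (const el of page) {
    if (el.id !== undefined) index.set(el.id, el);
  }
  return index;
}

// @transcript on <video>: the video names its transcript, so a single
// id lookup is enough.
function findTranscriptByRef(
  index: Map<string, PageElement>,
  video: PageElement
): PageElement | undefined {
  const ref = video.attrs["transcript"];
  if (ref === undefined) return undefined;
  const target = index.get(ref);
  return target !== undefined && target.tag === "transcript" ? target : undefined;
}
```

Both lookups return the same element for well-formed markup; the difference is only in how much of the page has to be examined when starting from the video.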
Impact
Positive Effects
- A <transcript> element will provide a native implementation of interactive transcripts and also support linked and on-page transcripts.
- AT can announce the availability of a <transcript> when reaching the video element, in particular if we choose the @transcript method.
- If a particular Website doesn't want to show the transcript on the page, they can hide the <transcript> element from visual view. The programmatic association should not suffer in this scenario.
- It is a better solution for long text alternatives than @longdesc or @aria-describedAt, since it also solves the interactive transcript need.
Negative Effects
- Requires introduction of a new element.
Conformance Classes Changes
- Requires support of a new element and potentially new attributes on video/audio.
Risks
- None.