From HTML WG Wiki
Revision as of 08:03, 11 June 2012 by Spfeiffe (Talk | contribs)


Introduction of a <transcript> element

Author: Media Subgroup of HTML Accessibility Task Force

In particular: Silvia Pfeiffer (Google), John Foliot, Janina Sajka, Charles McCathieNevile

Editors: Silvia Pfeiffer (Google), John Foliot


This is a proposal to address the need for video and audio transcripts through introduction of a <transcript> element and a @transcript attribute on HTML5 media elements. It is based on an analysis of use cases for video transcripts and proposes a machine-discoverable, unified approach to realizing them with a dedicated rendering area for the transcript / transcript link.


Issue 194 asks for a mechanism for associating a full transcript with an audio or video element. It does so by stating some requirements and a single use case, namely a link to an off-page transcript resource.

In the arguments of the different suggested Change Proposals, many different use cases appear, not just the off-page link use case.

In this CP we collect the different use cases for transcripts and address them in a uniform manner.

The use cases

UC1: A full text transcript for the media asset is provided with the media resource in a separate but linked resource. (also T1 in the Media Accessibility User Requirements)

Publishers that don't have captions often replace them with links to non-timed transcripts because WCAG2 Success Criterion 1.2.1 explicitly mentions this as a solution. These linked documents are often not even in HTML.


UC2: A full text transcript for the media asset is provided as text on the same page of the media resource. (also T2 in the Media Accessibility User Requirements)

Examples of non-timed transcripts published underneath the video on-page:

UC3: A full text interactive transcript for the media asset is provided as text on the same page with the media resource and scrolls along in sync to the media resource. (also T2 in the Media Accessibility User Requirements)

Publishers that have a timed transcript (e.g. captions) provide an interactive transcript next to/underneath their videos.


Video solution providers:

This is a paradigm that Web developers often implement and often get wrong, so native support in the Web browser would be valuable new functionality.

Interactive transcripts are particularly useful to blind and vision-impaired users: they can scan through the text in the transcript with a screen-reader and click to activate video playback at a point in time that is of interest to watch the video/audio. This is similar to chapter markers, except that the full text transcript is being used to scan through the video rather than some (typically scant) chapter markers.

Interactive transcripts can be visually distracting. Therefore, browsers may provide an interactive means to allow users to hide interactive transcripts, e.g. by rendering a "minimize" control on the rendered transcript.

UC4: A full text transcript for the media asset is provided on a separate page to the media resource.

We sometimes see pages publish the transcript of an event without actual video or audio recordings.


The Requirements

R1: Discoverability - the end user (sighted or otherwise) can discover that there is a transcript available; machines (AT, search engines, syndication) can discover that there is a transcript available.

R2: Choice to consume - the option to consume or not consume the transcript remains in the control of the user.

R3: Rich text transcripts - transcripts should be able to support richer content than flat text, including WebVTT files, HTML, RTF, Daisy or other formats.

R4: Design aesthetics - the transcript display needs to be stylable for design aesthetic, including the possibility to include it in the video controls.

R5: Embeddable - the transcript needs to be embeddable, i.e. given as a separate resource, but rendered full-text on-page.

R6: Fullscreen support - the transcript needs to be able to go fullscreen with the media element.

R7: Retrofitting - it should be easy for authors who are already publishing content with transcripts to retrofit their existing pages.

R8: No link duplication - transcript link duplication should be avoided.

R9: Multiple transcripts - transcripts may be available in different languages - making multiple links available should be possible.

R10: Stand alone transcripts - transcripts need to be available even in browsers that do not support or do not render audio or video elements. In fact, it should be possible to render transcripts without requiring a media element be present on the same page.

Proposed Solution: Overview

To deal with transcripts as first class citizens on the Web, we propose to introduce a <transcript> element.

Similar to how <article> or <footer> denote areas with particular semantics on Web pages, transcripts are text areas with particular semantics, also called landmarks. They are a div-like or iframe-like area on the Web page.

The <transcript> element is linked to a video or audio element, if presented on the same page, through a @transcript attribute, which contains one or more IDREFs to <transcript> elements.

Simple example:

<video transcript="foo" controls src="video.mp4"></video>
<transcript id="foo" lang="en" aria-label="English transcript"><a href="transcript.html">Transcript for video</a></transcript>

This makes the availability of the transcript discoverable from the video element and allows a visual representation of the availability of the transcript both on the video element and on the Web page.

The availability of a transcript MUST be announced by AT.

The transcript can be styled with CSS and hidden if necessary, while still being available to AT through the @transcript attribute on the video element.

Details of the Proposed Solution

The proposal is for introduction of both a <transcript> element as a block-level element and a @transcript attribute on the HTMLMediaElement.

The @transcript attribute will consist of a space separated list of IDREFs.
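
As a rough illustration of how a script (for example a polyfill) might interpret such a value, the sketch below splits the attribute on ASCII whitespace and resolves each ID. The function names `parseTranscriptIdrefs` and `resolveTranscripts` are hypothetical, and dropping duplicate or unresolved IDs is an assumption rather than something this proposal specifies.

```javascript
// Hypothetical helper: split a @transcript attribute value into unique IDs.
function parseTranscriptIdrefs(value) {
  if (typeof value !== "string") return [];
  const seen = new Set();
  // Space-separated tokens in HTML are split on ASCII whitespace.
  for (const token of value.split(/[ \t\n\f\r]+/)) {
    if (token !== "") seen.add(token); // assumption: duplicates are ignored
  }
  return Array.from(seen);
}

// Hypothetical helper: resolve each ID through a lookup function and
// drop IDs that do not resolve (assumption: unresolved IDs are skipped).
function resolveTranscripts(value, lookup) {
  return parseTranscriptIdrefs(value)
    .map((id) => lookup(id))
    .filter((el) => el != null);
}
```

In a browser, the lookup could be `(id) => document.getElementById(id)`, with the attribute value read via `video.getAttribute("transcript")`.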

The <transcript> element will have the following IDL (similar to div, but with track support):

 interface HTMLTranscriptElement : HTMLElement {
   readonly attribute TextTrackList textTracks;
   TextTrack addTextTrack(DOMString kind, optional DOMString label, optional DOMString language);
 };

The <track> element will be allowed to be used as a child of the transcript element before any flow content.

Also add "transcripts" to the list of UI controls that a browser vendor should include in default rendered media controls:

"If the attribute is present, or if scripting is disabled for the media element, then the user agent should expose a user interface to the user. This user interface should include features to begin playback, pause playback, seek to an arbitrary position in the content (if the content supports arbitrary seeking), change the volume, change the display of closed captions or embedded sign-language tracks."


Add "or transcripts" to the end of that sentence.

Satisfying [UC1] linked transcripts

  • Linked transcripts are in current practice typically provided through a visible URL underneath the video.
  • This is a simple and effective solution and satisfies WCAG2.
  • It provides visual exposure to all users and to old UAs and ATs.
  • However, this approach does not integrate with the new HTML media elements (R5), nor are such transcripts machine discoverable (R1).
  • We propose to embrace this approach through enclosing the typically used <a> element in a <transcript> element.

The proposed markup for this use case is:

<video transcript="t1" src=video.mp4 controls></video>
<transcript id=t1 lang="en" aria-label="English transcript">
  <a href="transcript.doc">Link to the English transcript</a>
</transcript>

It renders as a link on the page in the location of the <transcript> element. The browsers are also encouraged to render an interactive control/switch on the video element that takes the viewer to the transcript. This control may be a menu and the value of the @aria-label attribute can provide the item text. However, this may adapt to device- and platform-specific native browser controls.

  • If the <transcript> is made invisible (e.g. by rendering off-screen), this still enables the link to be present in the video player.
  • Links to multiple languages can then be published either in the linked document or listed through several <transcript> elements:
<video transcript="t1 t2" src=video.mp4></video>
<transcript id=t1 lang=en aria-label="English transcript">
  <a href="transcript.html">Transcript for the video</a>
</transcript>
<transcript id=t2 lang=de aria-label="German transcript">
  <a href="transkript.rtf">Transkript fuer das Video</a>
</transcript>

This will render two links on the page and two menu items on the video player's transcript menu.

  • Web browsers can render a menu on top of the video with the links off-page and the @aria-label attribute value as a "label" of the menu item.
  • Since there is an <a> element inside the <transcript> element, this approach renders well in old browsers that do not yet support <transcript>.
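
To make the menu idea concrete, here is a minimal sketch of how a script-based player might derive menu entries from the transcripts it has found. It assumes each <transcript> has already been read into a plain object with `id`, `lang`, `ariaLabel` and `href` fields. That shape, the function name `buildTranscriptMenu`, and the fallback labels are all hypothetical and not part of the proposal.

```javascript
// Hypothetical: build one menu entry per transcript description.
function buildTranscriptMenu(transcripts) {
  return transcripts.map((t) => ({
    // Prefer @aria-label; the fallback labels are an assumption.
    label: t.ariaLabel || (t.lang ? "Transcript (" + t.lang + ")" : "Transcript"),
    lang: t.lang || "",
    // Link off-page if a link exists, otherwise jump to the element on the page.
    target: t.href || "#" + t.id,
  }));
}
```

For the two-language example above, this would yield entries labelled "English transcript" and "German transcript", pointing at transcript.html and transkript.rtf respectively.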

Satisfying [UC2] on-page transcripts

  • On-page transcripts are in current practice typically provided through a text block underneath or beside the video (in <p> or <div> or <textarea>).
  • This is a simple and effective solution and satisfies WCAG2.
  • However, this approach does not integrate with the new HTML media elements (R5), nor are such transcripts machine discoverable (R1).
  • We propose to embrace this approach by encapsulating the given transcript inside a landmark <transcript> element to provide text directly on-page that is machine-discoverable:
<video transcript=t1 src=video.mp4></video>
<transcript id=t1>
 <p>This is an on-page transcript of the video.</p>
</transcript>
  • This is backwards compatible with browsers that do not yet support the <transcript> element.
  • It provides visual exposure to all users.
  • It is trivial for existing publishers to change their content to this approach by just adding a <transcript> element around their existing text content and linking it to an HTML5 video element.
  • It is also possible to render an off-page HTML page into a <transcript> element by using an <iframe>:
<video transcript=t1 src=video.mp4></video>
<transcript id=t1>
  <iframe src="transcript.html" style="width:200px; height:200px;"></iframe>
</transcript>
  • Rich markup is also possible inside the <transcript> element.
  • If the video and transcript are to be taken fullscreen together, the Web developer should add a <div> around the two and a button that makes the two go fullscreen together:
 <div id="fullscreen">
  <video transcript=t1 src=video.mp4></video>
  <transcript id=t1>
   <p>This is an on-page transcript of the video.</p>
  </transcript>
 </div>
 <button onclick="elem.requestFullscreen()">Fullscreen</button>
 <script>
   var elem = document.getElementById("fullscreen");
 </script>

  • Similarly, to embed the video together with its transcript, wrap the two elements in a <div> and copy that to another page (or use an <iframe>, as YouTube does):
 <div id="embed">
  <video transcript=t1 src=PATH/video.mp4></video>
  <transcript id=t1>
   <p>This is an on-page transcript of the video.</p>
  </transcript>
 </div>

Satisfying [UC3] interactive transcripts

  • Interactive transcripts are becoming more common, and there is growing agreement on what they typically look like and how they behave. For example, they are gaining acceptance in educational settings, where they reinforce multi-modal learning practices.
  • They are frequently requested by video publishers that have created interactive transcripts as a caption file, but who are not Web developers.
  • Interactive transcripts are typically authored as timed text files in the style of WebVTT or TTML.
  • Interactive transcripts typically sit next to or below the video or audio element that they transcribe and take up extra space, so cannot be part of the video's dimensions.
  • Interactive transcripts interact with their video or audio element by keeping the currently displayed text in sync with the currentTime of the element.
  • Interactive transcripts also allow manipulating the video's currentTime when a user clicks on a text segment.
  • It is possible to re-use the <track> element inside <transcript> as a means to provide timed text for interactive transcript purposes:
<video transcript="t1 t2" src=video.mp4></video>
<transcript id=t1 style="width:200px; height:200px" aria-label="English transcript">
 <track src=transcript1.vtt srclang=en>
</transcript>
<transcript id=t2 style="width:200px; height:200px" aria-label="German transcript">
 <track src=transcript2.vtt srclang=de>
</transcript>

This will render a full, interactive transcript into the <transcript> element and provide for the synchronization between the video and the transcript element's content position. The @kind value is not relevant in this case (for consistency purposes we could introduce @kind="transcript").

  • The default rendering would be a section of text on the page with a link associated with each text segment that, when clicked, repositions the video's playback position to the time of the segment. (This follows common practices today: 1, 2) In addition, playback of the video would move the display in the transcript element so that the text segment relating to the current playback time is visible and in focus. In essence, it is a different way of rendering a text track file.
  • It is also possible to provide interactive transcripts through JavaScript, for example:
<video id="v1" transcript="t1" src=video.mp4></video>
<transcript id=t1>
 <script src="cues.js" type="text/javascript"></script>
 <span data-for="v1" data-start="0" data-end="5">This is the text for the first cue of the transcript.</span>
</transcript>
  • If the video or audio element fails to load, the <transcript> should nevertheless display, just without any interactive links. This takes us to UC4.
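
The synchronization described in this section can be sketched in script. The following is one possible shape for a file such as cues.js, not the content of any actual file. It assumes the data-for, data-start and data-end attributes from the example above, and the function names and the "current" class name are hypothetical. The DOM wiring only runs in a browser. If the video element is missing, the transcript text still displays, just without interaction.

```javascript
// Return the index of the cue whose [start, end) interval contains time, or -1.
function activeCueIndex(cues, time) {
  for (let i = 0; i < cues.length; i++) {
    if (time >= cues[i].start && time < cues[i].end) return i;
  }
  return -1;
}

// Read cue times (in seconds) from data-start / data-end on each span.
function cuesFromSpans(spans) {
  return Array.from(spans, (s) => ({
    start: parseFloat(s.dataset.start),
    end: parseFloat(s.dataset.end),
    node: s,
  }));
}

// Hypothetical wiring: highlight the active cue and seek on click.
function wireTranscript(videoId) {
  const video = document.getElementById(videoId);
  if (!video) return; // no media element: leave the transcript as plain text
  const selector = 'span[data-for="' + videoId + '"]';
  const cues = cuesFromSpans(document.querySelectorAll(selector));
  video.addEventListener("timeupdate", () => {
    const active = activeCueIndex(cues, video.currentTime);
    // "current" is an assumed class name for styling the active cue.
    cues.forEach((c, i) => c.node.classList.toggle("current", i === active));
  });
  cues.forEach((c) =>
    c.node.addEventListener("click", () => { video.currentTime = c.start; })
  );
}

// Browser-only: wait until the spans following the script have been parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => wireTranscript("v1"));
}
```

A linear scan keeps the sketch short. For long transcripts, a binary search over cues sorted by start time would be a natural refinement.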

Satisfying [UC4] transcript-only pages

  • Transcripts can easily be created from caption and description files.
  • Sometimes publishers want to just publish a transcript (and no media element) on a page, or they may publish just a link to a media element, but not a full video element.
  • The transcript element can create such a transcript with just text content and without a link to a media element.
  • Example without a media element:
<transcript lang=en>
  <a href="transcript.doc">Here is a transcript of the discussed podcast.</a>
</transcript>
  • Example with a download video link next to the transcript:
<a href="">Download the video here</a>
<transcript style="width:200px; height:200px">
 <track src=transcript1.vtt srclang=en default>
</transcript>
<transcript style="width:200px; height:200px">
 <track src=transcript2.vtt srclang=de>
</transcript>
  • Example with a download video link as part of the transcript:
<transcript>
 This is a transcript of a <a href="SOURCE/video.html">video</a>.
 <a href="">Download the video here</a>
</transcript>
  • Example with a transcript that may itself contain a link to a video:
<transcript id=t1>
  <iframe src="transcript.html" style="width:200px; height:200px"></iframe>
</transcript>
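
The first point of this section, that transcripts can be created from caption files, can be illustrated with a short sketch that extracts the cue text from a WebVTT string. This is a simplified reading of the format rather than a conforming WebVTT parser: it assumes blocks are separated by blank lines, it does not decode character references such as &amp;, and the function name `vttToTranscript` is hypothetical.

```javascript
// Extract plain cue text from a WebVTT string, one entry per cue.
function vttToTranscript(vtt) {
  // Normalize line endings, then split into blank-line separated blocks.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  const lines = [];
  for (const block of blocks) {
    const rows = block.split("\n");
    // A cue block contains a timing line with "-->"; anything before it
    // is an optional cue identifier.
    const timing = rows.findIndex((r) => r.includes("-->"));
    if (timing === -1) continue; // header, NOTE, STYLE or REGION block
    const text = rows
      .slice(timing + 1)
      .join(" ")
      .replace(/<[^>]*>/g, "") // strip cue tags such as <v Speaker> or <b>
      .trim();
    if (text) lines.push(text);
  }
  return lines;
}
```

Each returned string could then be wrapped in a <p> inside a <transcript> element to produce a non-timed, on-page transcript.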


Positive Effects

  • A <transcript> element is a landmark element that provides for machine-discoverable transcripts.
  • It supports all use cases, in particular linked and on-page transcripts and can also provide a native implementation of interactive transcripts using <track> elements.
  • AT can announce the availability of a <transcript> when reaching the video element.
  • If a particular Website doesn't want to show the transcript on the page, they can hide the <transcript> element from visual view. The programmatic association should not suffer in this scenario.

Negative Effects

  • Requires introduction of a new element and a new attribute.

Advantages in comparison to a plain IDREFs @transcript attribute only

A @transcript attribute on the video/audio element alone cannot satisfy some of the needs for a generic solution for transcripts:

  • There is no explicit semantic element that allows for the identification of transcripts - it all has to go through the video/audio element. Transcripts being semantically hidden and not explicitly called out on the page makes them second-class citizens on the Web. This is particularly problematic for Use Case 4, where there is no media element on the page, but possibly a download link close to or even part of the transcript element.
  • A <transcript> element can receive an aria landmark so you can jump directly to them with AT - this is not possible for the indirect identification of transcripts through a video/audio element and impossible without a media element.
  • There is a multitude of markup possibilities (IDREF to anchor, div, iframe, p, textarea, article, footer etc elements) but not a uniform solution. This is particularly problematic when providing default styling both by the browser and by the Web app developer. It is also much harder to find when manually editing or when scripting.
  • Marking <a>, <div>, <p>, <iframe>, or any other element as a transcript-containing element is unclean and ugly - a <transcript> element is elegant and extensible for further new transcript-related features.
  • There is no obvious means to provide a label for the transcript/transcript link that could be provided in a menu in the video element. For a <transcript> element we can use @aria-label (or something else more adequate).
  • A @transcript pointing to a div or p element is essentially the same as pointing @aria-describedby at a div or p - these uses may become confused if transcripts are not more explicitly dealt with.
  • Automated rendering of interactive transcripts from a TextTrack file is not possible with such a solution. However, interactive transcripts are a core requirement for blind users to be able to interact in a meaningful way with audio and video - DAISY devices are the living proof. Even if browsers won't implement automated rendering for TextTracks yet, at minimum the browser should expose parsed <track> elements as TextTracks to Web developers so they can render them themselves into a <transcript> element. Ultimately, it would be better to provide such interactive transcripts natively without the need for JavaScript.
  • Even if the user forgets to mark up the video with an @transcript attribute, a <transcript> element will still be announced as transcript content to AT, thus being more robust.
  • Transcript-specific features can be introduced more easily with an explicit <transcript> element, e.g. automatic download together with the video page for offline viewing.
  • Using an explicit <transcript> element actually encourages publishers to keep their transcripts visible on page and include them in their published content to the advantage of all users. It makes it a first class feature.

Conformance Classes Changes

  • Requires support of a new element and potentially new attributes on video/audio.


  • None.

Obsoletes the following Change Proposals:

This CP remains: Reuse the rel and for attributes to programmatically associate transcripts with media elements (Edward O'Connor)

Related: Bug 12964 - <video>: Declarative linking of full-text transcripts to video and audio elements