From HTML WG Wiki

Mint a transcript attribute for the programmatic association of transcripts with media elements

In order to programmatically associate media elements with transcripts, we should use a transcript="" attribute which may take zero or more IDREFs to elements elsewhere in the document.

This is for ISSUE-194 (full-transcript).


When transcripts of media files are available, they are useful to all users. Users of Assistive Technology (AT) obviously benefit from transcripts, but transcripts are also useful to other users.

Transcripts need to be programmatically associated with media elements so that a UA can expose the presence of the transcript in its media controls, in a context menu, or in some other way, and so that AT can expose the transcript to its users.

Consider a video of a college lecture. Students can save time by reading the transcript instead of watching the video. It's also much easier for students to search or to skim for specific content in the transcript than to do so with the media file itself. Given this, it is important for transcript links to be readily exposed to all users.

The simplest way to ensure that the transcript link is readily exposed to all users (including users of older UAs and ATs) is to encourage or even mandate that authors include this link directly in the visible text of the document, or include the transcript itself as (part of) the text of the document. Relying on UAs to expose transcript links in a context menu could be problematic on touch devices, which often lack context menus. Relying on UAs to expose such links in their default video controls means that users suffer when Web site authors use custom video controls and fail to expose the transcript in them.

We can associate the media element with visible transcripts (or links to them) somewhere else in the document. To do this, we would add a transcript="" attribute to the media elements which would take a space-separated set of IDREFs. For each such IDREF, if the ID is that of an <a>, <area>, or <iframe> element, the document pointed to by the href="" (in the <a> and <area> cases) or src="" (in the <iframe> case) attribute is taken to be the transcript of the media. If the element with the given ID is not an <a>, <area>, or <iframe> element, the element itself is taken to be the media's transcript.

Sample markup:

<video src=video.mp4 transcript="foo bar"></video>
<p>Transcripts are available in
<a href=foo.html hreflang=en id=foo>English</a> and
<a href=bar.html hreflang=de id=bar>German</a>.</p>

or, in the same-document case,

<video src=video.mp4 transcript="foo"></video>
<div id=foo>Transcript goes here</div>
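
The resolution rule described above can be sketched in a few lines of Python. This is only an illustration, not proposed spec text: the `Element` and `Transcript` types and the `resolve_transcripts` function are hypothetical names, elements are modelled as a plain tag name plus an attribute dictionary rather than real DOM nodes, and the handling of dangling IDREFs and of `<a>` elements without `href=""` is an assumption, since the proposal does not cover those cases.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    """Hypothetical stand-in for a DOM element: tag name plus attributes."""
    tag: str
    attrs: Dict[str, str]


class Transcript(NamedTuple):
    kind: str    # "external": a linked document; "inline": the element itself
    target: str  # URL for external transcripts, element ID for inline ones


def resolve_transcripts(transcript_attr: str,
                        elements_by_id: Dict[str, Element]) -> List[Transcript]:
    """Resolve a transcript="" value against the elements of one document."""
    resolved = []
    # str.split() with no argument approximates "split on spaces".
    for idref in transcript_attr.split():
        element = elements_by_id.get(idref)
        if element is None:
            continue  # assumption: IDREFs that match no element are skipped
        if element.tag in ("a", "area") and "href" in element.attrs:
            resolved.append(Transcript("external", element.attrs["href"]))
        elif element.tag == "iframe" and "src" in element.attrs:
            resolved.append(Transcript("external", element.attrs["src"]))
        else:
            # Any other element (and, by assumption, a link without a URL)
            # is itself taken to be the transcript.
            resolved.append(Transcript("inline", idref))
    return resolved
```

Given a document containing a link with ID `foo` and a `<div>` with ID `bar`, resolving `transcript="foo bar"` would yield one external transcript (the link's URL) followed by one inline transcript (the `<div>` itself), in attribute order.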

This design fulfills the basic need for programmatic association of transcripts with media elements, and it supports linking to same-document transcripts as well as to external resources.

This technique is fairly straightforward to author; it is no harder than the existing <label for>/<input id> pattern. This technique closely matches existing content which contains transcript links, so it's exceptionally easy to update existing content which publishes transcripts to use this markup pattern.

This design readily exposes the transcript link to users, so it degrades well in UAs that do not support the <video> element, and also in UAs that support <video> but not transcript="".

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in. Because the association is from the media element to the transcript elements, it's especially easy for UAs to find all of the media element's transcripts (without having to process the entire DOM).
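
How a UA would use the hreflang="" hint is not defined by this proposal. As one hypothetical policy, a UA could walk the user's language preferences in order and offer the first transcript whose hreflang="" matches, falling back to the first transcript listed. The sketch below assumes the transcripts have already been collected as (hreflang, URL) pairs. The function names are illustrative, and comparing only the primary language subtag is a deliberate simplification of real language-tag matching.

```python
from typing import List, Optional, Tuple


def primary_subtag(lang: str) -> str:
    """Return the lowercased primary subtag, e.g. "de" for "de-AT"."""
    return lang.strip().lower().split("-")[0]


def pick_transcript(candidates: List[Tuple[str, str]],
                    preferred: List[str]) -> Optional[str]:
    """Pick a transcript URL from (hreflang, url) pairs in document order.

    `preferred` lists the user's languages, most preferred first.
    """
    for want in preferred:
        for hreflang, url in candidates:
            if hreflang and primary_subtag(hreflang) == primary_subtag(want):
                return url
    # Assumed fallback: no language matched, so offer the first transcript.
    return candidates[0][1] if candidates else None
```

Because the media element lists its transcripts directly, the candidate list comes straight from the transcript="" attribute and no document-wide search is needed.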

The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation, though this is primarily a function of the distance in the DOM between the media element and the element representing the transcript, and not the actual form of programmatic association.

We avoid duplicating the link to the transcript, so this technique does not suffer from the bit-rot problems that some other approaches have.

Requirements from the TranscriptElement proposal

In the Introduction of a transcript element Change Proposal, ten requirements for transcripts are presented. Let's examine the merits of each requirement and how the mechanism in this Change Proposal fares in each case.

R1 Discoverability

This is a requirement that transcripts be both human-discoverable and machine-discoverable. Our mechanism fulfills this requirement.

R2 Choice to consume

This requires that users have the ability to control whether or not they consume a transcript. Our mechanism fulfills this requirement.

R3 Rich text transcripts

This is a requirement that transcripts may be expressed in various rich text formats (such as HTML), and not just in plain text. Our mechanism fulfills this requirement.

R4 Design aesthetics

This has two sub-requirements: one, that how transcripts are displayed be styleable by authors, and two, that it must be possible to expose transcripts in custom video controls. Our mechanism fulfills both of these requirements.

R5 Embeddable

This requires that it be possible for a transcript to be expressed as an external document while also being embedded into the document which contains the media element. Our mechanism fulfills this requirement (with <iframe>).

R6 Fullscreen support

This requires that it be possible for transcripts to "go fullscreen with the media element." To the extent that I can make sense of this requirement, our mechanism fulfills it. That is, there is nothing in our mechanism that forbids or prevents this.

R7 Retrofitting

"It should be easy for authors who are already publishing content with transcripts to retrofit their existing pages." Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

R8 No link duplication

Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

R9 Multiple transcripts

Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

R10 Stand alone transcripts

Our mechanism fulfills this requirement. Which is to say, it is possible for UAs to render transcript documents which are not programmatically associated with media elements. See also UC4 below.

Use cases from the TranscriptElement proposal

In the Introduction of a transcript element Change Proposal, four use cases for transcripts are presented.

UC1 Linked transcripts

This use case is addressed by this proposal.

UC2 Same-document transcripts

This use case is also addressed by this proposal.

UC3 Interactive transcripts

Interactive transcripts, while an interesting area of research and experimental implementation, are not ready for standardization at this time. Instead of prematurely constraining innovation in interactive transcripts, we should encourage developers to try a variety of approaches to them. In a few years, when such experimentation has borne fruit, we can re-evaluate interactive transcripts as a potential Web platform feature to standardize.

This proposal does not purport to address this use case.

UC4 Transcripts without associated media elements

If there is no media element with which to programmatically associate a transcript, there is no need to use a mechanism for programmatically associating transcripts with media elements. If transcript documents have semantic structure beyond that available in HTML itself, Microformats, Microdata, or RDFa could be used to express such additional semantics.

This proposal does not purport to address this use case.

Other objections to the TranscriptElement proposal

There are a variety of problems with the TranscriptElement proposal. Some of these problems have already been identified, and some others are identified in this section.

Missing or underspecified details

The TranscriptElement proposal lacks sufficient detail to be interoperably implemented in user agents and accessibility tools. Here's one example: the proposal claims that, "similar to how <article> or <footer> denote areas of particular semantic on Web pages, transcripts are text areas of particular semantic - also called a landmark. They are a div-like or iframe-like area on the Web page." But the <div> and <iframe> elements differ in almost every respect—the behavior or processing requirements of an element that is simultaneously <div>-like and <iframe>-like are entirely unclear.

Verbosity of markup

For UC1 and UC2, the TranscriptElement's proposed markup is significantly more verbose than the markup advocated by this proposal. Compare how this proposal addresses UC1:

<video src=video.mp4 transcript="t1 t2"></video>
Transcripts are available in
<a href=transcript.html hreflang=en id=t1>English</a> and
<a href=transkript.rtf hreflang=de id=t2>German</a>.

with how the TranscriptElement proposal addresses the same use case:

<video transcript="t1 t2" src=video.mp4></video>
<transcript id=t1 lang=en aria-label="English transcript">
  <a href="transcript.html">Transcript for the video</a>
</transcript>
<transcript id=t2 lang=de aria-label="German transcript">
  <a href="transkript.rtf">Transkript fuer das Video</a>
</transcript>

The latter markup requires twice as many elements per transcript as the former: a <transcript> wrapper plus the link, rather than the link alone. It would be simpler for the media element to point directly at the elements contained by <transcript>, with no loss of functionality. As a rule, the more complex the markup pattern, the more likely authors are to make mistakes when implementing it. We should strive to find the simplest markup patterns that satisfy our requirements.

Counter-objections to the TranscriptElement proposal's "Advantages in comparison to a plain IDREFs @transcript attribute only"

Several of the objections to this proposal found in this section of the TranscriptElement proposal relate to it not addressing UC3 or UC4. This is a feature of this proposal, not a bug.

The TranscriptElement proposal claims that "there is no explicit semantic element that allows for the identification of transcripts - it all has to go through the video/audio element." This is true. But it goes on to claim, without justification, that transcripts programmatically associated with media elements via transcript="" are "semantically hidden" and "not explicitly called out on the page", neither of which is the case.

TranscriptElement's <transcript> element is a sectioning root; in this proposal, transcript="" can point at the existing sectioning elements, so page authors have the ability to author content such that transcripts are visible in the page outline.

TranscriptElement claims that "there is no obvious means to provide a label for the transcript/transcript link that could be provided in a menu in the video element." But it goes on to say that, "for a <transcript> element we can use @aria-label (or something else more adequate)." There is no reason why aria-label="" could not be used in this case as well as for <transcript>.

It claims that "a @transcript pointing to a div or p element is essentially the same as pointing @aria-describedby at a div or p - these uses may become confused if transcripts are not more explicitly dealt with." But these are not essentially the same. There is at least one essential difference between transcript="" and aria-describedby="": an element referenced by transcript="" has been explicitly identified by the page author as being a transcript, whereas an element referenced by aria-describedby="" has not been.


N.B. The spec changes described below are intended to fully describe the sorts of changes necessary, but the exact form of the changes to be made is left to the discretion of the editor(s). (This is not a diff that can be blindly applied to the specification. Should the editor(s) find this description difficult to apply unambiguously, the author of this Change Proposal volunteers to work with them and the WG to resolve any such ambiguities identified.)

New section on transcripts

Add a section defining the transcript="" attribute.

Transcripts for media elements may be provided directly in the text of the page, indirectly by linking to an external document with an <a> or <area> element, or by transclusion with an <iframe> element. To programmatically associate such a transcript with a media element, a transcript="" attribute on the media element may be used.

The media element can be associated with zero or more transcripts, known as the media element's transcripts, by using the transcript attribute.

Except where otherwise specified by the following rules, a media element has no transcript.

The transcript attribute may be specified to indicate a transcript with which the media element is to be associated. If the attribute is specified, the attribute's value, when split on spaces, must be a list of IDs of elements in the same Document as the media element. If the attribute is specified and there is an element in the Document whose ID is equal to one of the entries in the transcript attribute, then that element is one of the media element's transcripts.
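
The authoring requirement above (every entry must be the ID of an element in the same Document) lends itself to a simple conformance check. The following is a minimal sketch using Python's standard `html.parser` module. The `TranscriptChecker` class and `dangling_transcript_ids` function are hypothetical names, and the parser is only a rough approximation of a real HTML parser, so treat it as an illustration of the rule rather than as a validator.

```python
from html.parser import HTMLParser
from typing import List


class TranscriptChecker(HTMLParser):
    """Collects element IDs and transcript="" references from one document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = set()
        self.references = []  # one list of IDREFs per media element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        # The attribute is only allowed on <audio> and <video> elements.
        if tag in ("audio", "video") and attributes.get("transcript") is not None:
            # str.split() with no argument approximates "split on spaces".
            self.references.append(attributes["transcript"].split())


def dangling_transcript_ids(html: str) -> List[str]:
    """Return transcript="" entries that match no element ID in `html`."""
    checker = TranscriptChecker()
    checker.feed(html)
    checker.close()
    # IDs are checked only after the whole document has been read, so a
    # media element may reference an element that appears later.
    return [idref
            for idrefs in checker.references
            for idref in idrefs
            if idref not in checker.ids]
```

An empty result means that every referenced ID resolves to an element in the same document. Any IDs returned are ones the author most likely mistyped or forgot to add.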

Modifications to the-video-element and the-audio-element

In #the-video-element, update the note beginning with the sentence "In particular, this content is not intended to address accessibility concerns". Specifically, change the sentence

For users who would rather not use a media element at all, transcripts or other textual alternatives can be provided by simply linking to them in the prose near the video element.

to reference this new mechanism.

Make a similar edit to the same note in #the-audio-element.


Positive Effects

  • By programmatically associating transcripts with media elements, we enable users, both assistive technology users and otherwise, to more easily access transcripts.
  • It's easy to update existing content to use this markup pattern, so it's easy for authors to adopt this technique.
  • We avoid duplicating the link to the transcript, thus preventing the link presented to AT users from falling out of sync with the link presented to others.
  • You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.
  • It's possible to link to same-document transcripts as well as external resources.
  • It degrades well in UAs that don't support the <video> or <audio> elements, as well as in UAs that support <video> and <audio> but have not yet been updated to support programmatically associated transcripts.

Negative Effects

  • It's more difficult to programmatically associate a transcript link than it is to simply include the link in prose near a media element. Therefore it's reasonable to expect content authors to not bother with the programmatic association. (This is true for all methods of programmatically associating a transcript with a media element.)

Conformance Classes Changes

  • The transcript="" attribute is allowed on <audio> and <video> elements.


Risks

  • UAs might not implement this mechanism, in which case it would be dropped from the specification in due course.
  • Authors might not adopt this mechanism.