Mint a transcript attribute for the programmatic association of transcripts with media elements
In order to programmatically associate media elements with transcripts, we should use a transcript="" attribute which may take zero or more IDREFs to elements elsewhere in the document.
This is for ISSUE-194 (full-transcript).
- 1 Mint a transcript attribute for the programmatic association of transcripts with media elements
- 1.1 Rationale
- 1.1.1 Rationale common to both proposals
- 1.1.2 Design of a transcript attribute which takes multiple IDREFs
- 1.1.3 Comparing the two proposals
- 1.1.3.1 R1 Discoverability
- 1.1.3.2 R2 Choice to consume
- 1.1.3.3 R3 Rich text transcripts
- 1.1.3.4 R4 Design aesthetics
- 1.1.3.5 R5 Embeddable
- 1.1.3.6 R6 Fullscreen support
- 1.1.3.7 R7 Retrofitting
- 1.1.3.8 R8 No link duplication
- 1.1.3.9 R9 Multiple transcripts
- 1.1.3.10 R10 Stand alone transcripts
- 1.1.3.11 Surviving cross-document copy-and-paste operations
- 1.2 Details
- 1.3 Impact
Rationale
There are two proposals for adding a media transcript feature to HTML: this proposal (henceforth "our proposal", "this proposal", or "the IDREF proposal") and the Introduction of a @transcript=URL attribute proposal (henceforth "the URL proposal"). As these proposals are quite similar (and are based on the same research) I have separated out rationale which is equally applicable to both proposals, and have separately provided rationale for the differences that remain between the two proposals.
Rationale common to both proposals
When transcripts of media files are available, they are useful to all users. Users of Assistive Technology (AT) obviously benefit from transcripts, but transcripts are also useful to other users.
Consider a video of a college lecture. Students can save time by reading the transcript instead of watching the video. It's also much easier for students to search or to skim for specific content in the transcript than to do so with the media file itself. Given this, it is important for transcript links to be readily exposed to all users.
Transcripts need to be programmatically associated with media elements in order for a UA to expose the presence of the transcript in its media controls, in a context menu, or in some other way, and also so that AT can expose the transcript to its users.
Both proposals aim to address two use cases:
UC2 same-document transcripts: a full text transcript is provided as text on the same page as the media resource.
Examples of transcripts published underneath the video on-page:
Design of a transcript attribute which takes multiple IDREFs
The simplest way to ensure that the transcript link is readily exposed to all users (including users of older UAs and ATs) is to encourage or even mandate that authors include this link directly in the visible text of the document, or directly as (part of) the constituent text of the document. Relying on UAs to expose transcript links in a context menu could be problematic on touch devices (which lack context menus). Relying on UAs to expose such links in their default video controls means that users suffer when Web site authors use custom video controls and fail to expose the transcript in their custom controls.
We can associate the media element with visible transcripts (or links to them) somewhere else in the document. To do this, we would add a transcript="" attribute to the media elements which would take a space-separated set of IDREFs. For each such IDREF, if the ID is that of an <a>, <area>, or <iframe> element, the document pointed to by the href="" (in the <a> and <area> cases) or src="" (in the <iframe> case) attribute is taken to be the transcript of the media. If the element with the given ID is not an <a>, <area>, or <iframe> element, the element itself is taken to be the media's transcript.
<video src=video.mp4 transcript="foo bar"></video> <p>Transcripts are available in <a href=foo.html hreflang=en id=foo>English</a> and <a href=bar.html hreflang=de id=bar>German</a>.
or, in the same-document case,
<video src=video.mp4 transcript="foo"></video> <div id=foo>Transcript goes here</div>
This design fulfills the basic need for programmatic association of transcripts with media elements, and it's possible to link to same-document transcripts as well as external resources.
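The same association also covers the <iframe> and <area> cases described above. The following is only a sketch: the file names, IDs, image map, and coordinates are made up for illustration, and transcript="" is the attribute proposed here, not an existing HTML feature.

```html
<!-- Transcluded transcript: the document loaded through the iframe's
     src="" would be taken to be the transcript. -->
<video src="video.mp4" controls transcript="transcript-frame"></video>
<iframe id="transcript-frame" src="transcript.html" title="Transcript"></iframe>

<!-- Image-map link: the document pointed to by the area's href=""
     would be taken to be the transcript. -->
<video src="video.mp4" controls transcript="transcript-area"></video>
<img src="resources.png" usemap="#resources" alt="Video resources">
<map name="resources">
  <area id="transcript-area" shape="rect" coords="0,0,120,40"
        href="transcript.html" alt="Transcript">
</map>
```

In both cases the referenced element remains part of the rendered page, so the transcript stays reachable even in UAs that ignore the transcript="" attribute.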
This technique is fairly straightforward to author; it is no harder than the existing <label for> / <input id> pattern. This technique closely matches existing content which contains transcript links, so it's exceptionally easy to update existing content which publishes transcripts to use this markup pattern.
This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element, and also in UAs that support <video> but not the transcript="" attribute.
Comparing the two proposals
In the (withdrawn) Introduction of a transcript element Change Proposal, ten requirements for transcripts were defined. Let's examine the merits of each requirement and how the mechanisms in the two remaining Change Proposals fare.
R1 Discoverability
This is a requirement that transcripts be both human-discoverable and machine-discoverable. Our mechanism fulfills this requirement. In the URL proposal, authors may use a direct URL in the transcript="" attribute. If authors do this, sighted users will not be able to discover the transcript in several circumstances:
- in existing User Agents (which do not implement a transcript mechanism),
- in future User Agents which expose transcript="" to AT but do not provide transcript access in their default media controls,
- and in future User Agents which expose transcript="" to AT and expose transcript access in their default media controls, on sites which use custom media controls that do not provide transcript access.
Summary: the IDREF proposal fulfills R1 better than the URL proposal.
R2 Choice to consume
This requires that users have the ability to control whether or not they consume a transcript. Both proposals fulfill this requirement.
R3 Rich text transcripts
This is a requirement that transcripts may be expressed in various rich text formats (such as HTML), and not just in plain text. Both proposals fulfill this requirement.
R4 Design aesthetics
This has two sub-requirements: one, that how transcripts are displayed be styleable by authors, and two, that it must be possible to expose transcripts in custom video controls. Both proposals fulfill each of these requirements.
R5 Embeddable
This requires that it be possible for transcripts to be expressed as an external document, while also embedded into the document which contains the media element. Both proposals fulfill this requirement (with
R6 Fullscreen support
This requires that it be possible for transcripts to "go fullscreen with the media element." To the extent that I can make sense of this requirement, both mechanisms fulfill it. That is, there is nothing in either mechanism that forbids or prevents this.
R7 Retrofitting
"It should be easy for authors who are already publishing content with transcripts to retrofit their existing pages." Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.
The URL proposal less closely matches existing author behavior, so increases the authorial effort required to retrofit existing pages.
Summary: the IDREF proposal fulfills R7 better than the URL proposal.
R8 No link duplication
Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.
In the URL proposal, authors who wish to provide a visible transcript link will most likely duplicate the link. We should avoid duplicating the link to the transcript, because such duplicated data tends to bit-rot, thus harming accessibility. [Çelik, Doctorow]
Summary: the IDREF proposal fulfills R8 better than the URL proposal.
R9 Multiple transcripts
Transcripts may be available in several languages; the mechanism we come up with should straightforwardly allow authors to link to multiple transcripts. Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.
You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in. Because the association is from the media element to the transcript elements, it's especially easy for UAs to find all of the media element's transcripts (without having to process the entire DOM).
The URL proposal allows for the provisioning of multiple transcripts, but does so in a way that fails to provide language, title, or other such metadata for each transcript. This may harm the user's ability to choose the correct transcript.
Summary: the IDREF proposal fulfills R9 better than the URL proposal.
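As a rough illustration of how a UA (or a page's custom controls) might use this metadata, the script below picks the transcript link whose hreflang="" best matches the user's preferred languages. The function name pickTranscriptLink, the id lecture, and the fall-back-to-first-link behavior are assumptions of this sketch, not part of the proposal.

```html
<video id="lecture" src="video.mp4" controls transcript="foo bar"></video>
<p>Transcripts are available in
  <a href="foo.html" hreflang="en" id="foo">English</a> and
  <a href="bar.html" hreflang="de" id="bar">German</a>.</p>

<script>
  // Hypothetical helper: choose the transcript link that best matches
  // a list of preferred language tags (most preferred first).
  function pickTranscriptLink(media, preferredLanguages) {
    // Follow the IDREFs in transcript="" and keep only elements that
    // carry an hreflang="" hint.
    const links = (media.getAttribute("transcript") || "")
      .split(/[ \t\n\f\r]+/)
      .filter(Boolean)
      .map((id) => document.getElementById(id))
      .filter((el) => el !== null && el.hasAttribute("hreflang"));

    for (const wanted of preferredLanguages) {
      // Compare primary language subtags only ("en-GB" matches "en").
      const primary = wanted.toLowerCase().split("-")[0];
      const match = links.find(
        (el) => el.getAttribute("hreflang").toLowerCase().split("-")[0] === primary
      );
      if (match) return match;
    }
    // Assumption of this sketch: fall back to the first listed link.
    return links[0] || null;
  }

  const chosen = pickTranscriptLink(
    document.getElementById("lecture"),
    navigator.languages
  );
  if (chosen) console.log("Suggested transcript:", chosen.href);
</script>
```

Only the elements named in transcript="" are inspected, which is the point made above: the script never has to walk the rest of the DOM.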
R10 Stand alone transcripts
Our mechanism fulfills this requirement. Which is to say, it is possible for UAs to render transcript documents which are not programmatically associated with media elements.
In the URL proposal, authors may use a direct URL in the transcript="" attribute. If authors do this, the transcript link will not be available in browsers that do not support or do not render audio or video elements.
Summary: the IDREF proposal fulfills R10 better than the URL proposal.
Surviving cross-document copy-and-paste operations
The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation, though this is primarily a function of the distance in the DOM between the media element and the element representing the transcript, and not the actual form of programmatic association.
To completely avoid the copy/paste problem, the elements pointed to by transcript="" could be contained within the media element's subtree. Also, sites which provide for the embedding of media often provide a <textarea> for easily copying their embed markup. Such sites can include markup using whatever mechanism we decide on, thus reducing the impact of the copy-paste problem even further.
Summary: the URL proposal handles the copy-paste scenario somewhat better than the IDREF proposal.
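To make the embed-markup mitigation concrete, here is a sketch of the kind of snippet a media site might offer for copying. The URLs, the id, and the choice of <figure> as a wrapper are illustrative assumptions; the point is only that the media element and the element it references travel together.

```html
<!-- Hypothetical embed markup offered by a video site. Copying the
     whole figure keeps the video and its transcript link in one unit. -->
<figure>
  <video src="https://example.com/media/talk.mp4" controls
         transcript="talk-transcript-link"></video>
  <figcaption>
    <a id="talk-transcript-link"
       href="https://example.com/media/talk-transcript.html"
       hreflang="en">Read the transcript</a>
  </figcaption>
</figure>
```

Because an IDREF only resolves within the document it is pasted into, the id has to be unique there; a site-specific prefix such as the one above is one way to make collisions less likely.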
Details
N.B. The spec changes described below are intended to fully describe the sorts of changes necessary, but the exact form of the changes to be made is left to the discretion of the editor(s). (This is not a diff that can be blindly applied to the specification. Should the editor(s) find this description difficult to apply unambiguously, the author of this Change Proposal volunteers to work with them and the WG to resolve any such ambiguities identified.)
New section on transcripts
Add a section defining the transcript="" attribute.
Transcripts for media elements may be provided, either directly in the text of the page, indirectly by linking to an external document with an <a> or <area> element, or by transclusion with an <iframe> element. To programmatically associate such a transcript with a media element, a transcript="" attribute on the media element may be used.
The media element can be associated with zero or more transcripts, known as the media element's transcripts, by using the transcript attribute.
Except where otherwise specified by the following rules, a media element has no transcript.
The transcript attribute may be specified to indicate a transcript with which the media element is to be associated. If the attribute is specified, the attribute's value, when split on spaces, must be a list of IDs of elements in the same Document as the media element. If the attribute is specified and there is an element in the Document whose ID is equal to one of the entries in the transcript attribute, then that element is one of the media element's transcripts.
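The lookup described above, combined with the <a>/<area>/<iframe> rule from the Rationale, could be sketched in script as follows. This is an illustration rather than proposed spec text: the function name resolveTranscripts, the result shape, the element IDs, and the decision to silently skip IDs that match no element are all assumptions.

```html
<video id="talk" src="video.mp4" controls
       transcript="inline-transcript linked-transcript"></video>
<div id="inline-transcript">Transcript goes here</div>
<p><a id="linked-transcript" href="transcript.html" hreflang="en">Transcript</a></p>

<script>
  // Hypothetical helper: compute a media element's transcripts.
  function resolveTranscripts(media) {
    const value = media.getAttribute("transcript");
    // No attribute: the media element has no transcript.
    if (value === null) return [];

    const transcripts = [];
    // Split the attribute value on spaces (ASCII whitespace).
    for (const id of value.split(/[ \t\n\f\r]+/).filter(Boolean)) {
      const el = document.getElementById(id);
      // An ID with no matching element is an authoring error under the
      // text above; this sketch simply skips it.
      if (el === null) continue;

      const name = el.localName;
      if ((name === "a" || name === "area") && el.hasAttribute("href")) {
        // Linked transcript: the document pointed to by href="".
        transcripts.push({ kind: "linked", url: el.href, element: el });
      } else if (name === "iframe" && el.hasAttribute("src")) {
        // Transcluded transcript: the document pointed to by src="".
        transcripts.push({ kind: "linked", url: el.src, element: el });
      } else {
        // Any other element is itself the transcript.
        transcripts.push({ kind: "inline", element: el });
      }
    }
    return transcripts;
  }

  console.log(resolveTranscripts(document.getElementById("talk")));
</script>
```

A UA would do the equivalent internally; page authors could use the same approach to surface transcripts in custom media controls.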
Modifications to the-video-element and the-audio-element
In #the-video-element, update the note beginning with the sentence "In particular, this content is not intended to address accessibility concerns". Specifically, change the sentence
For users who would rather not use a media element at all, transcripts or other textual alternatives can be provided by simply linking to them in the prose near the video element.
to reference this new mechanism.
Make a similar edit to the same note in #the-audio-element.
Impact
Positive Effects
- By programmatically associating transcripts with media elements, we enable users, both assistive technology users and otherwise, to more easily access transcripts.
- It's easy to update existing content to use this markup pattern, so it's easy for authors to adopt this technique.
- We avoid duplicating the link to the transcript, thus preventing the link presented to AT users from falling out of sync with the link presented to others.
- You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.
- It's possible to link to same-document transcripts as well as external resources.
- It degrades well in UAs that don't support the <video> and <audio> elements, as well as in UAs that support <video> and <audio>, but have not yet been updated to support programmatically associated transcripts.
Negative Effects
- It's more difficult to programmatically associate a transcript link than it is to simply include the link in prose near a media element. Therefore it's reasonable to expect content authors to not bother with the programmatic association. (This is true for all methods of programmatically associating a transcript with a media element.)
Conformance Classes Changes
The transcript="" attribute is allowed on media elements (<video> and <audio>).
Risks
- UAs might not implement this mechanism, thus causing us to drop it from the specification in due course.
- Authors might not adopt this mechanism.