ISSUE-194/Research

From HTML WG Wiki

Possible methods for associating transcripts with media elements

Author: Edward O'Connor (Apple)

Summary

This is a survey of the various possible approaches for associating transcripts with media elements.

This is for ISSUE-194 (full-transcript).

Requirements and desirables

When transcripts of media files are available, they are useful to all users. Users of Assistive Technology (AT) obviously benefit from transcripts, but transcripts are also useful to other users.

Transcripts need to be programmatically associated with media elements in order for a UA to expose the presence of the transcript in its media controls, in a context menu, or in some other way, and also so that AT can expose the transcript to its users.

Consider a video of a college lecture. Students can save time by reading the transcript instead of watching the video. It's also much easier for students to search or to skim for specific content in the transcript than to do so with the media file itself. Given this, it is important for transcript links to be readily exposed to all users.

The simplest way to ensure that the transcript link is readily exposed to all users (including users of older UAs and ATs) is to encourage or even mandate that authors include this link directly in the visible text of the document. Relying on UAs to expose transcript links in a context menu is problematic on touch devices. Relying on UAs to expose such links in their default video controls means that users suffer when Web site authors use custom video controls and fail to expose the transcript in their custom controls.

That said, it must be possible for UAs to expose transcripts in their media controls, because when media elements are taken fullscreen, the transcript link in the surrounding document will not be visible.

The programmatic association of a transcript link with a media element should not be affected by the visibility of the link element itself, because there may be design constraints, house styles, or other restrictions that prevent authors from using a visible link.

It should be easy for UAs to find all of the media element's transcripts.

It should be easy for authors who are already publishing content with transcripts to retrofit their existing pages with whatever mechanism we come up with.

We should avoid duplicating the link to the transcript, because such duplicated data tends to bit-rot, thus harming accessibility. [Çelik, Doctorow]

Transcripts may be available in several languages; the mechanism we come up with should straightforwardly allow authors to link to multiple transcripts. A mechanism that only allows linking to one transcript document might not be that bad in practice, because authors could always link to the other language variants from the one transcript they do link to.

It is common for videos to be embedded on pages which also contain the video's transcript directly, so it should be possible to link to a portion of the current document, and not just to an external resource.

The markup we come up with should behave/degrade well in UAs that don't support the <video> or <audio> elements. It should also degrade well in UAs that support <video> and <audio>, but have not yet been updated to support our transcript-associating mechanism.

It would be nice if the solution we come up with for transcripts today could be easily adapted to other (non-transcript) media element description needs in the future. A more general mechanism could be used, for instance, to link from a movie to the IMDB or Wikipedia entries about that movie.

It would also be nice if you could copy a media element from one document, paste it into another document, and have the transcript association "stick" with the media element.

All things being equal, it should be as easy as possible for authors to use this feature.

Possible designs

There are four basic approaches to programmatically associating a transcript with a media element:

  1. Linking to the transcript document from an attribute of the media element
  2. Linking from the media element to something elsewhere in the document body
  3. Linking from something else in the document body to the media element
  4. Providing a link to the transcript document within the body of the media element

Let's consider variations on these approaches in light of our use cases, requirements, and other desirable qualities. In the following, we also consider allowing for some form of implicit association, and consider the null case as well.

1A - mint a direct transcript attribute

[N.B. This is the design advocated for in the "Introduce a new attribute: @transcript" Change Proposal.]

In this design, we mint a new transcript="" attribute, applicable to <video> and <audio> elements, which takes as its value a link to a transcript of the media file represented by the <video> or <audio> element.

Sample markup:

<video src=video.mp4 transcript=transcript.html></video>

It's easy to author and to update existing content to use this markup pattern. We can link to external resources as well as another node in the current document.

This design does not lead to a good experience in UAs that do not support the <video> element, nor in UAs that support <video> but not transcript="".

You can only link to one transcript, so this design fails to explicitly handle multiple transcripts in multiple languages. (N.B. the caveat about single-transcript mechanisms under "Requirements and desirables".)

This design does not readily expose the transcript link to users. Should the Web author want to expose the transcript link to her page's readers, she would have to duplicate the link in markup like so:

<video src=video.mp4 transcript=transcript.html></video>
…
Watch the video or read the <a href=transcript.html>transcript</a>.

In such a scenario, the duplicate links may get out of sync as the page is maintained and the location of the transcript changes. It is likely that the visible link will be more up to date than the invisible link, thus harming AT users.
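The bit-rot risk described above could, in principle, be caught by an authoring-time check. The following is only a minimal sketch, assuming Python's standard html.parser as a stand-in for a real conformance checker; the names LinkAudit and stale_transcript_attrs are hypothetical, as is the rule that a transcript="" URL counts as stale when no visible <a> on the page points to the same URL.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect transcript="" values and the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.transcript_attrs = []   # values of transcript="" on media elements
        self.visible_hrefs = set()   # href values of <a> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio") and attrs.get("transcript"):
            self.transcript_attrs.append(attrs["transcript"])
        if tag == "a" and attrs.get("href"):
            self.visible_hrefs.add(attrs["href"])


def stale_transcript_attrs(markup):
    """Return transcript="" URLs that no <a href> in the markup matches.

    Assumption: URLs are compared as literal strings, with no resolution
    against a base URL.
    """
    audit = LinkAudit()
    audit.feed(markup)
    return [url for url in audit.transcript_attrs
            if url not in audit.visible_hrefs]
```

A non-empty result would suggest that the visible link was updated while the attribute was left behind, which is the failure mode that harms AT users.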

The programmatic association of the <video> with its transcript would be maintained through a cross-document copy-and-paste operation.

This attribute is purpose-built for transcripts and transcripts only, so we can't reuse it in the future for non-transcript description needs.

1B - aria-describedat

This is the same as option 1A, except instead of minting a new attribute, we use the aria-describedat="" attribute currently being worked on by the WAI-ARIA group.

Sample markup:

<video src=video.mp4 aria-describedat=transcript.html></video>

Case 1B is very similar to case 1A. It suffers from the fact that there's no indication that the linked document is a transcript: the markup simply says that the video is described by the linked document. It would be difficult for UAs or ATs to expose the fact that a transcript is available. Also, the work on aria-describedat is ongoing, and aria-describedat is not yet ready for use in host languages.

2A - mint an indirect transcript attribute which takes an IDREF

Instead of pointing directly to an external transcript from an attribute on the media element, we could instead associate the media element with a visible transcript (or a link to it) somewhere else in the document. One way to do this would be to add a transcript="" attribute to the media elements which would take an IDREF. If the ID is that of an <a> or <area> element, the document pointed to by that element's href="" attribute is taken to be the transcript of the media. If the element with the given ID is not an <a> or <area> element, the element itself is taken to be the media's transcript.

Sample markup:

<video src=video.mp4 transcript="foo"></video>
<a href=transcript.html hreflang=en
   id=foo>English language transcript</a>

or, in the same-document case,

<video src=video.mp4 transcript="foo"></video>
<div id=foo>Transcript goes here</div>
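The resolution rule for this design (follow the IDREF, then either take the link target or the element itself) can be sketched from the UA's side. This is a minimal sketch under stated assumptions, not a proposed algorithm: it uses Python's standard html.parser in place of a real DOM, and the names IdIndex and resolve_transcripts, along with the "link" and "inline" labels, are hypothetical.

```python
from html.parser import HTMLParser


class IdIndex(HTMLParser):
    """Index elements by id and remember each media element's attributes."""

    def __init__(self):
        super().__init__()
        self.by_id = {}   # id -> (tag name, attribute dict)
        self.media = []   # attribute dicts of <video>/<audio> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio"):
            self.media.append(attrs)
        # Assumption: the first element with a given id wins.
        if attrs.get("id") and attrs["id"] not in self.by_id:
            self.by_id[attrs["id"]] = (tag, attrs)


def resolve_transcripts(markup):
    """Return one entry per media element, in document order.

    ("link", href)  -- the IDREF names an <a>/<area>; href is the transcript
    ("inline", id)  -- the IDREF names any other element, which is the transcript
    None            -- no transcript="" attribute, or a dangling IDREF
    """
    index = IdIndex()
    index.feed(markup)
    results = []
    for attrs in index.media:
        ref = attrs.get("transcript")
        target = index.by_id.get(ref)
        if target is None:
            results.append(None)
            continue
        tag, target_attrs = target
        if tag in ("a", "area") and target_attrs.get("href"):
            results.append(("link", target_attrs["href"]))
        else:
            results.append(("inline", ref))
    return results
```

Fed the two samples above, this sketch yields ("link", "transcript.html") for the first and ("inline", "foo") for the second.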

It's a bit harder to author than the other options we've examined thus far.

It's easy to update existing content to use this markup pattern.

This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element, and also in UAs that support <video> but not transcript="".

You can only link to one transcript, so this design fails to explicitly handle multiple transcripts in multiple languages. That said, an author could point to a same-document element which contains links to each transcript, like so:

<video src=video.mp4 transcript="foo"></video>
…
<p id=foo>Transcripts available in
<a href=transcript.en.html hreflang=en>English</a> and
<a href=transcript.de.html hreflang=de>German</a>.
</p>

The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation.

This attribute is purpose-built for transcripts and transcripts only, so we can't reuse it in the future for non-transcript description needs.

2B - mint an indirect transcript attribute which takes many IDREFs

This is the same as option 2A, except we allow the transcript="" attribute to take a space-separated set of IDREFs.

Sample markup:

<video src=video.mp4 transcript="foo bar"></video>
<p>Transcripts are available in
<a href=foo.html hreflang=en
   id=foo>English</a>
and
<a href=bar.html hreflang=de
   id=bar>German</a>.

It's easy to update existing content to use this markup pattern.

This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element, and also in UAs that support <video> but not transcript="".

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.
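To illustrate how a UA might use hreflang="" to choose among several referenced transcripts, here is a minimal sketch. It assumes Python's standard html.parser in place of a real DOM; the names TranscriptRefs and pick_transcript are hypothetical, and the fallback rule (use the first listed transcript when no language matches) is an assumption, not something this survey specifies.

```python
from html.parser import HTMLParser


class TranscriptRefs(HTMLParser):
    """Collect the IDREFs on the first media element and all id'd links."""

    def __init__(self):
        super().__init__()
        self.idrefs = None   # IDREF list from the first <video>/<audio>
        self.links = {}      # id -> (hreflang, href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio") and self.idrefs is None:
            # transcript="" holds a space-separated set of IDREFs.
            self.idrefs = (attrs.get("transcript") or "").split()
        if tag in ("a", "area") and attrs.get("id") and attrs.get("href"):
            self.links[attrs["id"]] = (attrs.get("hreflang"), attrs["href"])


def pick_transcript(markup, preferred_lang):
    """Return the href of the transcript matching preferred_lang.

    Falls back to the first referenced transcript; returns None when the
    media element references no resolvable link.
    """
    parser = TranscriptRefs()
    parser.feed(markup)
    candidates = [parser.links[ref] for ref in (parser.idrefs or [])
                  if ref in parser.links]
    for lang, href in candidates:
        if lang == preferred_lang:
            return href
    return candidates[0][1] if candidates else None
```

With the sample markup above, a preference of "de" would select bar.html, and an unlisted language such as "fr" would fall back to foo.html.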

The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation.

This attribute is purpose-built for transcripts and transcripts only, so we can't reuse it in the future for non-transcript description needs.

2C - aria-describedby

This is the same as option 2A, except instead of minting a new attribute, we use the aria-describedby="" attribute from WAI-ARIA.

Sample markup:

<video src=video.mp4 aria-describedby="foo"></video>
<a href=transcript.html hreflang=en
   id=foo>English language transcript</a>

or, in the same-document case,

<video src=video.mp4 aria-describedby="foo"></video>
<div id=foo>Transcript goes here</div>

Case 2C is very similar to case 2A. It suffers from the fact that there's no indication that the referenced element is a transcript: the markup simply says that the video is described by the referenced element. It would be difficult for UAs or ATs to expose the fact that specifically a transcript is available.

The only way to indicate that the description is specifically a transcript would be to add metadata markup using, e.g., microdata or RDFa.

Sample markup:

<div itemscope>
  <video src=video.mp4 aria-describedby="foo"></video>
  <a href=transcript.html hreflang=en itemprop="transcript" id=foo>English language transcript</a>
</div>

or, in the same-document case,

<div itemscope>
  <video src=video.mp4 aria-describedby="foo"></video>
  <div id=foo itemprop="transcript">Transcript goes here</div>
</div>

3A - mint a transcript-of attribute

This scenario is the same as in 2A, except instead of pointing from the media element to the transcript link, we point from the transcript link to the media element.

Sample markup:

<video src=video.mp4 id=the-video></video>
<a href=transcript.html hreflang=en
   transcript-of=the-video>English language transcript</a>

It's easy to update existing content to use this markup pattern.

This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element, and also in UAs that support <video> but not transcript-of="".

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.

It's as easy to author as the previous design, at least in the external document case. For same-document transcripts, however, this design introduces an extra indirection. Consider this example:

<video src=video.mp4 id=the-video></video>
…
<a href=#transcript hreflang=en
   transcript-of=the-video>The transcript is below.</a>
…
<div id=transcript>Here's the transcript.</div>

The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation.

This attribute is purpose-built for transcripts and transcripts only, so we can't reuse it in the future for non-transcript description needs.

3B - reuse the for and rel attributes

This variation is like 3A except, instead of minting a new attribute, we use the existing for="" attribute to point from the transcript link to the media element.

In order to indicate that the link is specifically a transcript for the given media element, we mint a new transcript value for the rel="" attribute.

Sample markup:

<video src=video.mp4 id=the-video></video>
<a rel=transcript href=transcript.html hreflang=en
   for=the-video>English language transcript</a>

This design has all of the advantages of the previous design, though it is slightly harder to author since it requires two attributes to be used in concert.

By changing the rel="" value, we can reuse this design in the future for linking to other, non-transcript descriptions of media elements.
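The lookup a UA would perform under this design (find every link whose for="" names the media element and whose rel="" contains the wanted keyword) can be sketched as follows. This is a minimal sketch, assuming Python's standard html.parser in place of a real DOM; the names ForLinks and related_links are hypothetical, and "license" appears below only as a made-up example of a non-transcript rel value.

```python
from html.parser import HTMLParser


class ForLinks(HTMLParser):
    """Collect every <a>/<area> that carries both for="" and href=""."""

    def __init__(self):
        super().__init__()
        self.links = []   # (for target, rel tokens, hreflang, href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "area") and attrs.get("for") and attrs.get("href"):
            # rel="" is a space-separated, case-insensitive token list.
            rels = (attrs.get("rel") or "").lower().split()
            self.links.append(
                (attrs["for"], rels, attrs.get("hreflang"), attrs["href"]))


def related_links(markup, media_id, rel="transcript"):
    """Return (hreflang, href) pairs for links that point at media_id
    and carry the given rel keyword."""
    parser = ForLinks()
    parser.feed(markup)
    return [(lang, href) for target, rels, lang, href in parser.links
            if target == media_id and rel in rels]
```

Because the keyword is a parameter, the same lookup would serve other kinds of description: related_links(markup, "the-video", rel="license") would return only links flagged with that hypothetical keyword.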

4A - reuse the track element

Instead of linking to another element in the page, we could put the transcript information in the body of the media element. One way to do that would be to reuse the <track> element. However, <track> is meant for timed text and transcripts are untimed, so this is abusing the semantics of <track>.

Sample markup:

<video src=video.mp4>
  <track kind=transcript src=transcript.html srclang=en
  label="English language transcript"></track>
</video>

It's easy to author, and to update existing content to use this markup pattern. We can link to external resources as well as another node in the current document.

You can link to many transcripts, and you can use the existing srclang="" attribute to hint to the UA about the language that each transcript is in.

This design does not lead to a good experience in UAs that do not support the <video> element, nor in UAs that support <video> but not <track kind=transcript>.

This design does not readily expose the transcript link to users. Should the Web author want to expose the transcript link to her page's readers, she would have to duplicate the link in markup like so:

<video src=video.mp4>
  <track kind=transcript src=transcript.html srclang=en
  label="English language transcript"></track>
</video>
…
Watch the video or read the <a href=transcript.html>transcript</a>.

In such a scenario, the duplicate links may get out of sync as the page is maintained and the location of the transcript changes. It is likely that the visible link will be more up to date than the invisible link, thus harming AT users.

The programmatic association of the <video> with its transcript would be maintained through a cross-document copy-and-paste operation.

4B - mint a transcript element

We can improve upon the previous design by minting a new <transcript> element instead of abusing the semantics of <track>.

Sample markup:

<video src=video.mp4>
  <transcript src=transcript.html srclang=en
  label="English language transcript"></transcript>
</video>

This suffers from all of the problems 4A suffers from (except that it doesn't abuse <track>). Unlike 4A, this element is purpose-built for transcripts and transcripts only, so we can't reuse it in the future for non-transcript description needs.

4C - reuse the link element

Instead of abusing <track> or minting a new <transcript> element, we could reuse the existing <link> element (and rel="" like in 3B).

Sample markup:

<video src=video.mp4>
  <link rel=transcript href=transcript.html hreflang=en
  title="English language transcript">
</video>

This design suffers from all of the problems 4B suffers from, but improves upon it in one way: by changing the rel="" value, we can reuse this design in the future for linking to other, non-transcript descriptions of media elements.

4D - reuse the a element

Instead of reusing <link>, we could reuse an <a> or <area> element within the media element's fallback content.

Sample markup:

<video src=video.mp4>
  <a rel=transcript href=transcript.html
  hreflang=en>English language transcript</a>
</video>

This is very much like 4C, but it behaves better in UAs that do not support the <video> element (because the transcript link is part of the fallback content used in that case).
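Finding transcript links under this design means looking only at <a> or <area> elements nested inside the media element. The following is a minimal sketch, assuming Python's standard html.parser in place of a real DOM; the names FallbackLinks and fallback_transcripts are hypothetical.

```python
from html.parser import HTMLParser


class FallbackLinks(HTMLParser):
    """Find rel=transcript links nested inside <video>/<audio> elements."""

    def __init__(self):
        super().__init__()
        self.open_media = []   # stack of src values for open media elements
        self.found = []        # (media src, hreflang, href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio"):
            self.open_media.append(attrs.get("src"))
        elif tag in ("a", "area") and self.open_media:
            rels = (attrs.get("rel") or "").lower().split()
            if "transcript" in rels and attrs.get("href"):
                self.found.append(
                    (self.open_media[-1], attrs.get("hreflang"), attrs["href"]))

    def handle_endtag(self, tag):
        if tag in ("video", "audio") and self.open_media:
            self.open_media.pop()


def fallback_transcripts(markup):
    """Return (media src, hreflang, href) for each transcript link that
    appears within a media element's fallback content."""
    parser = FallbackLinks()
    parser.feed(markup)
    return parser.found
```

A link placed after the closing </video> tag is deliberately ignored by this sketch, since under this design only fallback content establishes the association.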

This design does not lead to a good experience in UAs that support <video> but not <a rel=transcript>.

5 - implicit association by document position

This design is similar to 3B, but instead of repurposing for="" to explicitly associate the transcript link with the media element, we could simply say that any <a> or <area> element in the paragraphs preceding or following the media element gets associated with the media element. We still use rel=transcript to flag the element as being a transcript.

Sample markup:

<video src=video.mp4></video>
<a rel=transcript href=transcript.html hreflang=en
   >English language transcript</a>

or

<a rel=transcript href=transcript.html hreflang=en
   >English language transcript</a>
<video src=video.mp4></video>

It's easy to update existing content to use this markup pattern.

This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element, and also in UAs that support <video> but not rel=transcript.

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.

For same-document transcripts, this design introduces an extra indirection. Consider this example:

<video src=video.mp4></video>
…
<a rel=transcript href=#transcript hreflang=en>The transcript is below.</a>
…
<div id=transcript>Here's the transcript.</div>

The programmatic association of the <video> with its transcript might not be maintained through a cross-document copy-and-paste operation.

By changing the rel="" value, we can reuse this design in the future for linking to other, non-transcript descriptions of media elements.

On a page with many media elements, this design introduces ambiguity about which transcript is associated with which media element. Consider this example:

<video src=video1.mp4></video>
<a rel=transcript href=foo.html hreflang=en
   >English language transcript foo</a>
<video src=video2.mp4></video>
<a rel=transcript href=bar.html hreflang=en
   >English language transcript bar</a>
<video src=video3.mp4></video>

Which video is foo.html a transcript of? Despite the appealing simplicity of this design, this ambiguity is a serious, maybe fatal, flaw.
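The ambiguity can be made concrete by implementing one plausible reading of the rule. The following is a minimal sketch, assuming Python's standard html.parser in place of a real DOM; the names DocumentOrder and implicit_transcripts are hypothetical, and "preceding or following" is simplified here to mean the single media element or rel=transcript link immediately before or after in document order.

```python
from html.parser import HTMLParser


class DocumentOrder(HTMLParser):
    """Record media elements and rel=transcript links in document order."""

    def __init__(self):
        super().__init__()
        self.items = []   # ("media", src) or ("link", href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio"):
            self.items.append(("media", attrs.get("src")))
        elif tag in ("a", "area"):
            rels = (attrs.get("rel") or "").lower().split()
            if "transcript" in rels and attrs.get("href"):
                self.items.append(("link", attrs["href"]))


def implicit_transcripts(markup):
    """Map each media src to the transcript links adjacent to it.

    Assumption: only the immediately preceding and following items count.
    """
    parser = DocumentOrder()
    parser.feed(markup)
    items = parser.items
    result = {}
    for i, (kind, value) in enumerate(items):
        if kind != "media":
            continue
        neighbours = items[max(i - 1, 0):i] + items[i + 1:i + 2]
        result[value] = [v for k, v in neighbours if k == "link"]
    return result
```

Applied to the three-video example above, this reading associates foo.html with both video1.mp4 and video2.mp4, and bar.html with both video2.mp4 and video3.mp4, so no single video can claim either transcript unambiguously.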

6 - no programmatic association

Let's also consider the null case: we could explicitly choose to not add programmatic association of media elements with transcripts.

Sample markup:

<video src=video.mp4></video>
<a href=transcript.html hreflang=en
   >English language transcript</a>

or

<a rel=transcript href=transcript.html hreflang=en
   >English language transcript</a>
<video src=video.mp4></video>

The transcript isn't programmatically associated with the media element, so UAs won't be able to expose the presence of the transcript in their media controls, in a context menu, or in some other way. That said, the document prose near the media element can make it clear to users that the link is a transcript for the media element.

It's easy to update existing content to use this markup pattern. In fact, in cases where authors publish transcripts for media elements, this is what they are already doing.

This design readily exposes the transcript link to users, which helps it work really well in UAs that do not support the <video> element.

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.

This design readily allows for same-document transcripts like so:

<video src=video.mp4></video>
…
<div id=transcript>
<h1>Transcript</h1>
…
</div>

References

Inline.