User:JFoliot/Issue194 Recap

From HTML WG Wiki

ISSUE-194: Provide a mechanism for associating a full transcript with an audio or video element.


Full transcripts give people with disabilities a way to access audio/video content. Transcripts are frequently provided as a separate resource, because they are often too long to include on the same page as the audio/video they are associated with.

A mechanism that creates an association between an audio/video element and a full (off-page) transcript would have many benefits. These include discoverability for assistive technology users, programmatic identification for search engine indexing, design aesthetics, and content syndication or embedding.

The Requirements

R1: Discoverability - the end user (sighted or otherwise) can discover that there is a transcript available; machines (AT, search engines, syndication) can discover that there is a transcript available.

R2: Choice to consume - the option to consume or not consume the transcript remains in the control of the user.

R3: Rich text transcripts - transcripts should be able to support richer content than flat text, including WebVTT files, HTML, RTF, Daisy or other formats.

R4: Design aesthetics - the transcript display needs to be stylable to suit the page design, including the option of placing it in the video controls.

R5: Embeddable - the transcript needs to be embeddable, i.e. given as a separate resource, but rendered full-text on-page.

R6: Fullscreen support - the transcript needs to be able to go fullscreen with the media element.

R7: Retrofitting - it should be easy for authors who are already publishing content with transcripts to retrofit their existing pages.

R8: No link duplication - transcript link duplication should be avoided.

R9: Multiple transcripts - transcripts may be available in different languages, so it should be possible to make multiple links available.

R10: Standalone transcripts - transcripts need to be available even in browsers that do not support or do not render audio or video elements. In fact, it should be possible to render transcripts without requiring a media element to be present on the same page.

  • One significant concern that surfaced during our discussions was the 'orphaning' of the link to the transcript. This is a real possibility when copying and pasting a <video> element that references the transcript only through an IDREF (which might point to a link to the transcript), because the referenced element is not copied along with it.
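Requirement R9 leaves open how a user agent would choose among transcripts in several languages. As a minimal sketch, assuming each transcript link carries the standard hreflang attribute (the requirements do not say so), a player could pick the first link that matches the user's preferred languages, trying an exact tag match before a primary-subtag match. The TranscriptLink type and the pick_transcript function are hypothetical names for illustration.

```python
# Hypothetical sketch: choose a transcript by language, assuming each
# transcript link exposes an hreflang value (not specified by ISSUE-194).
from dataclasses import dataclass


@dataclass
class TranscriptLink:
    href: str
    hreflang: str  # BCP 47 language tag, e.g. "en" or "fr-CA"


def pick_transcript(links, preferred):
    """Return the first link matching the user's preferred languages.

    For each preferred language, try an exact (case-insensitive) tag match
    first, then fall back to the primary subtag ("fr" matches "fr-CA").
    Returns None when nothing matches, so a player can offer all links.
    """
    for lang in preferred:
        wanted = lang.lower()
        for link in links:
            if link.hreflang.lower() == wanted:
                return link
        primary = wanted.split("-")[0]
        for link in links:
            if link.hreflang.lower().split("-")[0] == primary:
                return link
    return None


links = [
    TranscriptLink("transcript-en.html", "en"),
    TranscriptLink("transcript-fr.html", "fr-CA"),
]
print(pick_transcript(links, ["fr-FR", "en"]).href)  # transcript-fr.html
```

Falling back to None rather than guessing keeps the final choice with the user when no language matches.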

Documented Problems (Excerpted from ISSUE-194/NoChange)

Some of the most fundamental unresolved issues about picking a solution are as follows:

Problem 1: single or multiple links

It is unclear as yet whether we need a solution with a single link or with multiple links.

A single link makes it possible to easily expose it to AT (assistive technology such as a screen reader). For example, a screen reader could announce "Transcript available, hit CTRL+ENTER to view" if there is a single link. Such an announcement is almost impossible with multiple links.

A single link can also easily be included with a visual indicator in the video element. For example, a small icon could be overlaid on the video in the top left corner, visible when the video has focus or is moused over. This could be included in the shadow DOM and thus styled by publishers who disagree with this default rendering. Such a visual indicator is not possible with multiple links, or would require the introduction of a menu of links. This would almost certainly require inclusion in the video controls as a menu, similar to subtitle tracks.

Finally, support for multiple links may not be necessary at all, since it is always possible to provide a single link to an HTML page that contains all the alternative transcripts. A good design for such a page would load the alternative transcripts into that one page via JavaScript rather than linking off to another set of resources, thus effectively providing multiple different transcripts behind a single link.

While the authors of this change proposal may not agree on whether multiple links or a single link should be associated with the video, we do agree that the discussion on this topic has not been held yet and we need more time to have this discussion.

Problem 2: difference to longdesc

It is unclear as yet whether there is a fundamental difference between a long description for a video and a transcript.

A long description serves as a supplement to a short description for accessibility users. As such, it is meant to provide a complete description of the resource under consideration for the benefit of a vision-impaired user.

The best possible long description that we can provide for a video is a "transcript" - where the transcript is meant to include a textual transcription of all the words being said (i.e. a dialog transcript) and all the locations and action happening (i.e. a scene description).

What this means is that, for an accessibility user, the one long text description of interest is the transcript. If we had a set of links that programmatically linked different types of transcripts (and other long descriptions) to the video, the only one the accessibility user should look at is the most inclusive one, i.e. the transcript. Thus, if there were a way to link both a transcript and a long description to a video, and a transcript were available, that transcript would be the target of both the long description link and the transcript link. If we had no transcript but a different long description document, the transcript link would be empty and only the long description link would be set. Therefore, the transcript link provides no additional information and is redundant.

There is, however, a semantic difference between different types of transcripts and long descriptions and other text documents that are regarded as text alternatives for video. If we have such a set of different documents, they should be exposed to all users underneath the video in a section that should be linked through @aria-describedby. Here, the question is whether that is sufficient or whether we need any additional means for programmatically linking multiple text alternatives to video. Is there indeed a use case for associating semantic labels like "transcript" or "script" or "longdesc" or ... with individual links to related text documents beyond what @aria-describedby and microdata provide?
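To make the @aria-describedby question concrete, the following simplified Python sketch flattens the elements referenced by a media element's aria-describedby into a single description string, roughly as an accessible description is computed. It deliberately ignores most of the W3C accessible name and description computation rules (hidden content, recursion, embedded controls), and the helper names are hypothetical. The point it illustrates: the resulting string carries no link targets and no semantic labels such as "transcript" or "script".

```python
# Simplified sketch of flattening aria-describedby into a description string.
# Assumption: plain text content stands in for the full W3C accessible
# description computation (no hidden-content or recursion rules).
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DescribedByParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # open elements as (tag, id or None)
        self.text = {}         # id -> text chunks inside that element
        self.describedby = []  # aria-describedby values of audio/video

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("audio", "video") and a.get("aria-describedby"):
            self.describedby.append(a["aria-describedby"])
        if tag not in VOID:
            self.stack.append((tag, a.get("id")))
            if a.get("id"):
                self.text.setdefault(a["id"], [])

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to every open ancestor that has an id.
        for _, elem_id in self.stack:
            if elem_id:
                self.text[elem_id].append(data)


def describe(html):
    """Return one description string per described audio/video element."""
    p = DescribedByParser()
    p.feed(html)
    p.close()
    results = []
    for value in p.describedby:
        parts = []
        for ref in value.split():  # IDREFs are whitespace-separated
            if ref in p.text:
                # Collapse runs of whitespace, as rendered text would.
                parts.append(" ".join("".join(p.text[ref]).split()))
        results.append(" ".join(parts))
    return results


page = """
<video src="talk.webm" controls aria-describedby="alts"></video>
<section id="alts">
  <a href="transcript.html">Transcript</a>
  <a href="script.html">Script</a>
</section>
"""
print(describe(page))  # ['Transcript Script']
```

The hrefs of both links, and which of them is the transcript, do not appear in the description string; an AT user would have to navigate to the referenced section to find them.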

While the authors of this change proposal may not agree on whether long descriptions and transcripts need to be separately programmatically associated with the video or not, we do agree that the discussion on this topic has not been held yet and we need more time to have this discussion. See the post to the Accessibility Task Force mailing list, which has had barely any replies yet.

In addition, any solution that is provided for the long description problem for images may or may not be appropriate for the use cases required for a transcript. Since a replacement for @longdesc is under discussion for, the transcript problem should be resolved in, too.

Problem 3: the visual presentation need

It is unclear as yet how a transcript link would be visually exposed in the browser.

This is particularly true for some of the options that were analysed, while others have a clear visual presentation underneath the video yet still ask for visual exposure in the video player (possibly the controls) so the availability of a transcript is discoverable in fullscreen video, too.

HTML5 does not prescribe the visual presentation of attributes and elements. However, the lack of a generally accepted way to present it visually has been the key cause of the failure of the @longdesc attribute. We do not want to repeat this exercise.

Therefore, unless browsers show how they will visually present the availability of transcripts and commit to doing so, e.g. by showing experimental branches with such a feature, the success of transcript links is questionable.

The authors of this change proposal therefore agree that experiments should be done before any specifications are made in this space.

The use cases (excerpted from ChangeProposal/Issue194_SP)

In the Media Accessibility User Requirements document we have actually already identified existing use cases:

[T-1] Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT.

[T-2] Support the provisioning of both scrolling and static display of a full text transcript with the media resource, e.g., in an area next to the video or underneath the video, which is also AT accessible.

In more detail, these are the four real-world uses of transcripts that we encounter in the wild:

[UC1] Interactive transcripts: Publishers that have a timed transcript (e.g. captions) provide an interactive transcript next to or underneath their videos. Well-known examples are YouTube and TED (e.g. TED video, NY Times), and several other video player providers offer them (3ply, etc.).

[UC2] Linked Transcripts: Publishers that don't have timed transcripts often provide links to non-timed transcripts instead, because WCAG 2 Success Criterion 1.2.1 explicitly mentions this as a solution (e.g. Centerlink AU Gov site, US State Department). These transcripts are often not even in HTML.

[UC3] On-page Transcripts: Sometimes we also see non-timed transcripts published underneath the video on-page (e.g. ESL Videos, Fox News).

[UC4] Transcript-only pages: Sometimes we even have Websites that only publish the transcript of an event without actual video or audio recordings. (e.g. Gates Jobs at D5, White House).
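The interactive transcripts in UC1 reuse timed text: the same cues that drive captions are displayed as running text, with the cue active at the current playback time highlighted. The Python sketch below illustrates this under simplifying assumptions: it parses only a small subset of WebVTT (timing lines and cue text; cue settings are ignored and styling is not interpreted), and parse_cues and render_transcript are hypothetical helpers, not any player's API.

```python
# Sketch of an interactive transcript built from timed cues.
# Assumption: only a small subset of WebVTT is parsed (timing line plus
# cue text); cue settings after the end time are ignored.
import re

# Matches "mm:ss.ttt" or "hh:mm:ss.ttt" start and end timestamps.
TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(stamp):
    """Convert a WebVTT timestamp such as "01:02.500" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(vtt):
    """Return (start, end, text) tuples for each cue block."""
    cues = []
    for block in vtt.strip().split("\n\n"):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line)
            if m:
                # Everything after the timing line is cue text.
                text = " ".join(lines[i + 1:])
                cues.append((to_seconds(m.group(1)), to_seconds(m.group(2)), text))
                break
    return cues


def render_transcript(cues, now):
    """Render all cues as text, marking the one active at time `now`."""
    rows = []
    for start, end, text in cues:
        marker = ">>" if start <= now < end else "  "
        rows.append(f"{marker} {text}")
    return "\n".join(rows)


sample = """WEBVTT

1
00:00:00.000 --> 00:00:02.500
Welcome to the talk.

2
00:00:02.500 --> 00:00:05.000
Today: transcripts.
"""
print(render_transcript(parse_cues(sample), 3.0))
```

A real interactive transcript would also let users click a cue to seek the media to its start time; non-timed transcripts (UC2 to UC4) cannot support this highlighting at all.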

Change Proposals

Introduction of a <transcript> element
(11 June 2012)

Author: Media Subgroup of HTML Accessibility Task Force (In particular: Silvia Pfeiffer (Google), John Foliot, Janina Sajka, Charles McCathieNevile)

This is a proposal to address the need for video and audio transcripts through introduction of a <transcript> element and a @transcript attribute on HTML5 media elements. It is based on an analysis of use cases for video transcripts and proposes a machine-discoverable, unified approach to realizing them with a dedicated rendering area for the transcript / transcript link.

Obsoletes the following Change Proposals:

This CP remains:

Mint a transcript attribute for the programmatic association of transcripts with media elements
(12 July 2012)

In order to programmatically associate media elements with transcripts, we should use a transcript="" attribute which may take zero or more IDREFs to elements elsewhere in the document.
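A minimal Python sketch of how a crawler or assistive technology might resolve such an attribute, assuming that transcript="" holds a whitespace-separated list of IDREFs, as aria-describedby does. The transcript attribute here is the proposed one, not an existing HTML attribute, and the class and function names are hypothetical. IDREFs that match no id in the document are reported as orphans, which is the copy-and-paste failure raised under the requirements.

```python
# Hypothetical sketch: resolve a proposed transcript="" IDREF list.
# Assumption: the value is whitespace-separated IDREFs, like aria-describedby.
from html.parser import HTMLParser


class TranscriptRefs(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = set()   # every id seen in the document
        self.hrefs = {}    # id -> href, for linked (off-page) transcripts
        self.media = []    # IDREF list per audio/video element, in order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            self.ids.add(a["id"])
            if a.get("href"):
                self.hrefs[a["id"]] = a["href"]
        if tag in ("audio", "video"):
            # A bare or missing attribute yields zero IDREFs.
            self.media.append((a.get("transcript") or "").split())


def resolve_transcripts(html):
    """Return (hrefs, orphans) per media element, in document order.

    IDREFs pointing at an element with an href yield that URL; IDREFs to
    on-page elements without an href resolve but yield no URL; IDREFs with
    no matching id (e.g. after copying only the <video>) are orphans.
    """
    p = TranscriptRefs()
    p.feed(html)
    p.close()
    results = []
    for refs in p.media:
        hrefs = [p.hrefs[r] for r in refs if r in p.hrefs]
        orphans = [r for r in refs if r not in p.ids]
        results.append((hrefs, orphans))
    return results


page = """
<video src="talk.webm" transcript="t-en t-fr"></video>
<a id="t-en" href="transcript-en.html">Transcript</a>
"""
print(resolve_transcripts(page))  # [(['transcript-en.html'], ['t-fr'])]
```

Resolving after the whole document is parsed allows the IDREFs to point forward, to links placed below the media element.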


From an accessibility perspective, we cannot go past Last Call with a known defect, and we have stated since the summer of 2010 that a programmatic linkage of the transcript to the media elements is required:

"[T-1] Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT."

In our working draft Checklist (established at the same time as the user-requirements document), we further mapped this requirement to a current WCAG 2 "A" level requirement (along with an RFC 2119 "SHOULD"):

"1.2.1 Prerecorded Video-only: Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content."

Straw Poll

This questionnaire was open from 2012-07-11 to 2012-07-26. Results of Questionnaire ISSUE-194: How to provide a mechanism for associating a full transcript with an audio or video element? - Straw Poll for Objections

Now What?