Full scratchpad of notes available at Issues and proposals for Guideline 1.2 - "media-equiv"
[or Provide alternatives for images and multimedia - depends on how we answer the question, "do we require that scripts and programmatic objects have alternatives?" if not, then we shouldn't say "non-text content" and limit this to images and multimedia.]
[How do we address talking heads? e.g., the US presidential debates. Which would they fall under?]
A magnifying glass icon is used to link to the search page of a Web site. The screen reader identifies the button as a link and speaks the text alternative, "Search."
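The icon-link pattern above can be sketched in HTML (filenames are hypothetical):

```html
<!-- Magnifying-glass icon used as a link to the search page.
     The alt text gives the link's purpose ("Search"), not a
     description of the image ("magnifying glass"). -->
<a href="search.html">
  <img src="magnify.gif" alt="Search">
</a>
```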
[still using terms "short label" and "longer description" in rewrite of 1.1?]
A bar chart compares how many widgets were sold in June, July, and August. The short label says, "Figure one - Sales in June, July and August." The longer description identifies the type of chart or graph, provides a high-level summary of the data comparable to that available from the chart or graph, and provides the data in a table or other accessible format.
An animation illustrates tying a knot. The short label says, "How to tie a square knot." The longer explanation describes the hand movements needed to tie the knot.
[replace with examples 6 and 7 below]
The link to an audio file says, "Chairman's speech to the assembly." A link to a text transcript is provided immediately after the link to the audio clip.
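A minimal markup sketch of this example (filenames are hypothetical):

```html
<!-- Link to the audio clip, followed immediately by a link
     to its text transcript. -->
<a href="speech.mp3">Chairman's speech to the assembly</a>
(<a href="speech-transcript.html">Text transcript</a>)
```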
The link to an audio file says, "Beethoven's 5th Symphony performed by the Vienna Philharmonic Orchestra."
Transcript of audio from the first few minutes of, "Teaching Evolution Case Studies, Bonnie Chen" (copyright WGBH and Clear Blue Sky Productions, Inc.)
Describer: A title, "Teaching Evolution Case Studies. Bonnie Chen." Now, a teacher shows photographs.
Bonnie Chen: These are all shot at either the Everglades...for today you just happen to be a species of wading bird that has a beak like this."
Describer: wooden tongue depressors
[will non-Americans know "wooden tongue depressors?"]
A video clip shows how to tie a knot. The captions read, "(music)
USING ROPE TO TIE KNOTS
WAS AN IMPORTANT SKILL
FOR THE LIKES OF SAILORS, SOLDIERS, AND WOODSMEN."
From Sample Transcript Formatting by Whit Anderson
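In SMIL 1.0, for example, such a caption stream can be played in parallel with the video and switched on with the system-captions test attribute. A sketch, with hypothetical filenames and regions:

```xml
<smil>
  <body>
    <par>
      <video src="tie-a-knot.mpg" region="video"/>
      <!-- Caption text stream, rendered only when the user
           has requested captions in their player settings -->
      <textstream src="tie-a-knot-captions.rt" region="captions"
                  system-captions="on"/>
    </par>
  </body>
</smil>
```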
An animation shows how a car engine works. There is no audio and the animation is part of a tutorial that describes how an engine works. All that is needed is a description of the image. From "How car engines work: Internal combustion" [Use this instead of Example 3?]
video-only: Need to clarify that it is not a Web page (to address issue 792)
audio description - Additional audio narration that explains important details that cannot be understood from the main soundtrack alone. During pauses in dialog, audio description provides information about actions, characters, scene changes and on-screen text to people who are blind or visually impaired.
captions - A synchronized transcript of dialogue and important sound effects. Captions provide access to multimedia for people who are deaf or hard of hearing.
what exceptions/... for rebroadcast of TV signals.
JW Solution: if content is rebroadcast from another medium or resource that complies with broadcast requirements for accessibility (independent of these guidelines), the rebroadcast satisfies the checkpoint if it complies with the other guidelines.
The proposed SC1 #7 addresses this, "If multimedia content is rebroadcast from another medium, the accessibility features[required by policy for that medium?] are intact."
We can debate which language is better. If this proposal is adopted, we can close this issue.
Exception: if content is rebroadcast from another medium or resource that complies with broadcast requirements for accessibility (independent of these guidelines), the rebroadcast satisfies the checkpoint if it complies with the other guidelines.
JH - Not sure if I understand this right - what about news websites where they show clips from previous television broadcasts that were captioned on television but not on the web. For example, I would like to see CNN or ABC or FOX caption their web video clips.
Issue 668 and Issue 792 express similar confusion.
The proposed SC1 #7 breaks this into a separate success criterion instead of an exception of another. If this proposal is adopted, we can close this issue.
Issue 981 and Issue 1084 are duplicate issues.
The proposal does not contain the phrase "primarily non-vocal." If the proposal is accepted we can close this issue.
CKW: This checkpoint (1.2) is about synchronization of equivalents. Does it make sense to have a criteria about text equivalents that are not synchronized? Perhaps this fits best in Checkpoint 1.1?
The proposal combines 1.1 and 1.2 and does not use the term "synchronized equivalents." If the proposal is accepted we can close this issue.
"if an audio-only or video-only presentation requires a user to respond interactively at specific times in the presentation, then a time-synchronized equivalent (audio, visual or text) presentation is provided."
Editorial Note: CKW: we weren't sure what this meant.
Propose that instead of "respond interactively at specific times in the presentation" we should address "applications" by referring to Guideline 4.2. Proposed wording: "Non-text content that is part of an application, such as video games that contain interactive animation, should follow Guideline 4.2." If this is adopted, we can close this issue.
Olivier Thereaux writes:
1.2: seems a bit on the verge of technology independent. Would be nice if the requirements could be simplified and most of the existing wording could be moved to examples/techniques.
Believe that if the proposal is adopted it is less technology-dependent and hopefully simpler, by moving some of the details to definitions and techniques, and assuming policy makers will clarify exceptions rather than documenting all of them here. Therefore, if adopted we can close this issue.
The U.S. Access Board writes:
This is almost identical to the similar section 508 requirement. However, the Board selected the term multimedia presentations instead of the term “media”. A single media, such as a video only event, would require text equivalents. Describing the output from a web cam is a difficult issue. We do agree with the WAI suggestions for handling web cam output (provide a link to equivalent information in text when possible). This is excellent advice. For example a traffic web cam could easily have links to text based traffic condition reports. Some may ask, why do people that can't see want traffic reports? The information can be very helpful when using taxis and when walking. It may be essential for the blind pedestrian to know ahead of time that one or more traffic signals in an area are not functioning.
Also, regarding captioning, clarification is needed on whether this requires music with lyrics to display the lyrics as captions on the screen?
Believe that the proposal addresses these questions:
However, clarification is still needed about music with lyrics. Is it a sensory experience or does it convey information? Clarification is needed before we can close this issue.
Greg Lowney writes:
8. Guideline 1.2 "an audio description is provided"
8.a. [MEDIUM PRIORITY] Is an audio description alone sufficient? It doesn't help a person who is deaf-blind, who would instead benefit from having a textual description of the visuals, which in turn could be reformatted into Braille or presented on a Braille display. If you decide not to require textual descriptions for any checkpoints, then I recommend at least making and explaining the explicit decision to do so.
Transcripts of audio are Level 1; collated transcripts are Level 3. If the proposal is adopted, believe we can close this issue.
James Craig asks: Guideline 1.2, Example 3... Why wouldn't a synchronized "descriptive audio" track be required for a silent animation?
From Andi Snow-Weaver at IBM: Example 3 is a different version of Example 3 under guideline 1.1.
Not sure that James' question is answered. Verification is needed before we can close this issue.
If we delete Example 3 in favor of Example 8, we should address IBM's issue.
There are many comments included in this issue. Will address each separately. These 5 comments are from IBM.
1. (previously # 2) Synchronized captions are provided for all significant dialogue and sounds (no need to specify "in time-dependent material". This guideline only applies to time-dependent material.)
2. (previously # 1) In audio-visual media, synchronized audio descriptions of visual events that are not evident from the audio are provided.
The 1st comment is that captions should come before audio descriptions. In the proposal, captions are Level 1 and audio descriptions are Level 2 - clearly putting captions before audio descriptions. The second comment is a rewording of the success criteria. Due to other issues, the criteria have been significantly reworded thus these suggestions are subsumed.
3. Remove. Points 1 and 2 above should specify "synchronized" because the guideline itself specifies "synchronized". The exception is not needed because if the content is audio only and not time sensitive, it does not meet the definition of time-dependent and this guideline does not apply to it.
"Synchronized captions" is redundant; per definition, captions are synchronized. Similarly with audio descriptions. The proposal deletes the exception - if adopted can close this issue.
4. What is the need for a specific success criteria addressing realtime video with audio? Success criteria above already require captions whether it is realtime or not. Are we trying to say audio descriptions for visual events are NOT required for real time video with audio? If so, then the audio description success criteria can be reworded as "In audio-visual media that is not realtime, synchronized audio descriptions of visual events that are not evident from the audio are provided."
Yes, we are not requiring audio descriptions of realtime multimedia. A Level 2 criterion is proposed that says, "Audio descriptions are provided for multimedia." This proposal does require realtime captions of multimedia at Level 2 (separate criterion); if adopted, verify that we can close the issue.
5. [If the Web content is real-time, non-interactive video (for example, a Webcam view of surrounding conditions such as weather information),] Either remove this success criteria or change the definition of time-dependent presentation in the glossary. Web content that is non-interactive video only does not meet the current definition of time-dependent presentation.
The proposal does not use the phrase "time-dependent." If the proposal is adopted, can close this issue.
These comments are from the Access Board.
L1 SC 3 [Exception: A text transcript or other non-audio equivalent does not need to be synchronized with the multimedia presentation if all four of the following...] It appears that this exception has been made unnecessarily complex. It might be easier to have an exception under a discussion of how to handle audio only webcasts. By definition, an audio only presentation is a single medium, not a multimedia event. So it is simple to say multimedia presentations must be captioned. Audio presentations may be accompanied by a script unless the production is interactive, i.e. expecting the audience to react during the presentation.
Need to verify with the Access Board, but hope that the proposal addresses these comments and can be closed.
L1 SC 4 Exception:
If the content is a music program that is primarily non-vocal, then captions are not required.
Comment: This could be quite a source for confusion. If a performance is vocal, is it required that all the lyrics be captioned? This seems to be what is indicated
There still might be confusion if people feel that a music program is "intended to create a specific sensory experience" or if it "conveys information."
US Access Board comments on L1 SC5:
Comment: By changing media to multimedia in the statement associated with "guideline 1.2", the web cam issue becomes much simpler to handle as it is not a multimedia presentation. This means that you could move this discussion to the section addressing text equivalents for non-text elements.
Agreed. Believe this is addressed in the proposal, although instead of using "media" to refer to webcams, use "live video-only." Verify that the issue is addressed.
US Access Board comments on L1 SC6 [If a presentation that contains only audio or only video requires users to respond interactively...]:
Comment: This statement goes back to the issue of providing non-text equivalents for audio only presentations.
As far as providing audio equivalents for video only creates quite a conundrum as explained below.
First, if the presentation is multimedia, it is reasonable to require audio descriptions of non verbal content.
However, what is a video only presentation? Most web pages are video only presentations. The output from live images is already covered. If you take the requirement, as written, to its logical conclusion, all web pages that don't have embedded speech must have audio output added to the page. This of course is not practical or desirable.
It seems that there is confusion here between developing guidelines for web pages and guidelines for television broadcasts. The major difference being of course that you generally can't attach a screen reader to a television.
"video-only" definition should clarify that it does not imply that all Web pages are video-only presentations. Verify that this would close the issue.
US Access Board comments on final exception:
If content that is rebroadcast from another medium or resource meets accessibility requirements for that medium, then the rebroadcast satisfies this checkpoint if it complies with other applicable sections of WCAG 2.0
Comment: This is unclear as to what conditions are being addressed.
The proposal says, "If multimedia content is rebroadcast from another medium, the accessibility features [required by policy for that medium?] are intact" - does that clarify? If not, we could include more information in the General Techniques. Verify that this would close the issue.
IBM writes: Level 2 success criteria - The Level 1 success criteria already require synchronized captions for real-time audio with video. What is the additional requirement here? The editorial note does not seem to apply to anything at this level. The note is talking about audio descriptions but the success criteria is about captions.
The difference is that the Level 1 requirements are for prerecorded multimedia while the Level 2 requirements are for live events. "Prerecorded" and "live" modify uses of "multimedia." Verify that this would close the issue (if adopted).
Benefits - There is a Note in the informative section that has hidden success criteria in it. ("Where possible (especially for education and training materials), provide content so that it does not require tracking multiple simultaneous events with the same sense, or, give the user the ability to freeze the video so that captions can be read without missing the video.") This should either be removed or made into a Level 3 success criteria.
On the other hand, Issue 982 (Simultaneous reading and watching required) says:
This point should be a Required Success Criteria. Captions that need to be read at the same time as watching action on the screen do not provide an equivalent user experience.
 the presentation does not require the user to read captions and the visual presentation simultaneously in order to understand the content.
Joe Clark writes in issue 1028, "Meanwhile, of course captions require you to read and watch at the same time. This isn’t telepathy."
The Benefits section is not addressed in this proposal. Both of these issues are open and need discussion.
Guideline 1.2 Provide synchronized media equivalents for time-dependent presentations.
Comment: Excellent except it might help to say multimedia as that term will help clarify issues of what should be covered as explained below.
Have used the term "multimedia" instead of "time-dependent presentations." If adopted, can close this issue.
An audio script is recommended here and "ease of access" to this script should be stressed.
Need to better understand what RNID is asking for. Is it that the transcript is easier to use or that it is easier to find or ? Need to follow-up - can't close at this time.
The note for the first required success criterion highlights a difficulty but does not present a solution. We would like to see this document place more emphasis on content providers to think about how they can make live broadcast/time-dependent content more accessible to deaf and hard of hearing people. For example, in modern subtitling, computer programs are used where the stenographer simply has to press one button to print a particularly common phrase, such as a description of a common pattern of play in sports commentary. Such solutions should be encouraged in the Best Practice of this guideline, without necessarily making them a condition of conformance.
 When adding audio description to existing materials, the amount of information conveyed through audio description is constrained by the amount of space available in the existing audio track unless the audio/video program is periodically frozen to insert audio description. However, it is often impossible or inappropriate to freeze the audio/visual program to insert additional audio description.
A level 3 criterion is proposed that says "extended audio descriptions are provided for prerecorded multimedia." The comment addresses extended audio descriptions which are not widely used, yet. Determined it best to separate "audio descriptions" from "extended audio descriptions" since they use different techniques and technologies, plus one is more established than the other. Another approach would be to add something about extended audio descriptions to the definition of audio description.
However, the comment talks about real-time captions and those are handled in separate success criteria.
Verify that if the proposal is adopted this issue could be closed.
The definition of “media equivalents” given here is not sufficiently generic. No mention is made, for example, of sign language avatars (this definition is repeated in the Glossary).
The following points are overlooked by this checkpoint and we feel that it would be appropriate to add a reference to them in this “Best Practice” section:
1. Where subtitles are displayed, the designer should ensure sufficient contrast between foreground text and the background behind it (ideally, the user should be given the option to display a caption box behind the subtitles which has a colour that sufficiently contrasts the colour of the text).
2. A minimum size and recommended font for subtitles should be provided (the Royal National Institute of the Blind recommends a minimum of 16 point Helvetica or Arial font).
3. A minimum audio quality requirement should be specified for all audio description.
4. If a sign language interpreter is to be displayed on-screen, either as streamed video of a human interpreter or in the form of an avatar showing a virtual human, then the layout of the site should allow for this without the avatar window overlapping in such a way that essential functionality or information is being hidden. Based on RNID research, we would recommend that an on-screen interpreter should, at minimum, be displayed in the Common Intermediate Format (CIF) of 352x288 pixels and 25 frames per second.
The phrase "media equivalents" is no longer used. The suggestions for "Best Practice" should be elaborated in the General Techniques or in multimedia-specific techniques (e.g., SMIL techniques). There may also be different recommendations for closed versus open captions. Note that Joe Clark has said to use any font except Arial (excluding Comic Sans - ala Matt May). Also note Joe's concerns about sign language.
Propose closing the issue and moving the Best Practice suggestions to General Techniques issues.
From the June 2003 WD:
If the web content is real-time and audio-only and not time-sensitive and not interactive a transcript or other non-audio equivalent is sufficient. [...]
If the web content is real-time non-interactive video (e.g., a webcam of ambient conditions), either provide an equivalent... (e.g., an ongoing update of weather conditions) or link to an equivalent... (e.g., a link to a weather website).
Joe Clark writes:
This guideline concerns captioning of web multimedia. Its plain reading requires a transcript of all real-time audio broadcasts. That is, every single Internet radio station would require transcription.
Meanwhile, if you have any kind of webcam at all, you need to scrounge up some other site you can link to that is somehow the “equivalent” of the webcam’s image.
The proposal says that live audio-only and live video-only presentations only need a text alternative of some sort (a label or description). Verify that the proposal addresses this issue.
Joe Clark writes:
“Collating” a “script” of audio descriptions and captions has been done exactly once in recorded memory — for a demonstration project that is not documented online. It is impossible in practice due to the lack of interchange formats for captioning, audio description, and related fields. Nor are there that many examples of multimedia that are captioned and described. It’s even rare, when you consider the full range of TV programs and movies, to find both captioning and description in those well-established fields.
Live captioning is extraordinarily difficult for online media (nearly impossible using standards-compliant code), while live description has been attempted a mere five times in the broadcast sphere. Meanwhile, of course captions require you to read and watch at the same time. This isn’t telepathy.
WCAG 2.0 text from June 2003 draft:
1. A text document that merges all audio descriptions and captions into a collated script (that provides dialog, important sounds and important visual information in a single text document) is provided.
2. Captions and audio descriptions are provided for all live broadcasts which provide the same information.
3. The presentation does not require the user to read captions and the visual presentation simultaneously in order to understand the content.
The proposal puts the collated transcript at Level 3. Understanding that it isn't common practice to provide a collated transcript, much less audio descriptions, it is nevertheless the ideal solution for people who are deaf-blind.
Real-time captions for live broadcasts are a common enough practice that this is a Level 2. Real-time audio descriptions are included at Level 3 with a question since it is not clear how realistic this will be in the next 5 years (rule of thumb test...).
The last comment relates to the open issue 794 and the discrepancy between removing it, making it a Level 3 criterion, or making it Level 1.
Joe writes: Real-life experiences and examples are called for here.
In writing the proposal, I reviewed several live examples and wrote new examples (for the 1.2 guideline not 1.1) based on real-life content. If adopted, we can close this issue.
SC Level 1 would disallow video conferencing since few video providers have the capability to include real-time captions.
Video conferencing is an interactive application and thus falls under Guideline 4.2. Verify that we can close this issue.
SC Level 1, item 6: "If a presentation that contains only audio or only video requires users to respond interactively at specific times during the presentation, then a synchronized equivalent presentation (audio, visual or text) is provided. [I]"
Need to define "respond interactively".
Do not use the phrase "respond interactively" in the proposal. Propose replacing with, "Non-text content that is part of an application, such as video games that contain interactive animation, should follow Guideline 4.2". Verify that this closes the issue.
Joe Clark writes:
>Level 1 Success Criteria for Guideline 1.2
>1. Captions are provided for prerecorded multimedia. (Editorial note: Propose that we don't create exceptions, but that policy makers create exceptions. Refer to Telecom Act of 1996 which defines "broadcast hours" for which captions are required as well as a staggered time frame for requiring captions for programs that aired before 1 January 1998)
We'll need at least scoping requirements for cases of:
* very little or a whole lot of multimedia
* multimedia that will be posted only for a short time
* multimedia that is posted as an example or counterexample of accessible content (including learning examples)
* entire phase-in schedules
An example of such a phase-in is the Telecom Act of 1996 that mandates the number of broadcast hours that need to be captioned. It increases to 100% by 1 Jan 2006. (2-6 a.m. is not included, thus 20 of 24 hours is 100%.) 30% of programs aired before 1 Jan 1998 must be captioned by 1 Jan 2003, 75% by 1 Jan 2008. Other examples of policy exist at: @@@ (link to summary)
Propose that WCAG 2.0 not attempt to create a phase-in schedule. Instead, we look at a scoping mechanism that would allow developers to exclude multimedia that hasn't been captioned or described and leave phase-in schedules to policy makers. However, there is a possibility that scoping could be used to ignore accessibility requirements and it doesn't make sense to me for someone to claim their site is accessible when it is not. We should stick to what we know: technology and only focus on creating technology requirements in WCAG 2.0. Leave the policy to policy makers. Until we have a scoping mechanism for conformance and several real-world examples showing how to use it, this issue remains open.
Joe Clark writes:
>2. Audio descriptions are provided for prerecorded multimedia.
>(Editorial note: Again, we shouldn't create policy. Policy makers should create it)
We can't get past the fact that some video programming (but not entire genres) does not require description. We can't issue a blanket guideline of that sort.
Attempted in this proposal to provide more guidance, but do not think we can make objective, testable statements about when to provide descriptions or not. Thus, have tried to strike a balance between too prescriptive and too abstract. Recent Working Group discussion supports this concept, but need to see scoping and policy recommendation proposals as well as worked examples before we can close this issue.
Joe Clark writes:
>3. Transcripts are provided for prerecorded audio-only content that contain dialog
>(Editorial note: this should not apply to music with lyrics. Should this be an exception or is it clear enough?)
And if it's a radio show with an interview segment, then a song, then another interview?
This is not explicitly called out in the proposal. Propose that examples in General techniques provide more guidance. Can't close this issue yet.
Joe Clark writes:
>4. A text alternative is provided for live audio-only content by following Guideline 1.1.
>(Editorial note: an internet radio stream would only need to provide a description of the intent/character of the station, *not* every song they play)
I suppose you mean dialogue-only audio. This essentially requires real-time captioning. Sometimes a post-facto transcript will do, but more importantly, even mighty WGBH can't provide standards-compliant ways of real-time-captioning audio. Where *do* the captions appear?
Don't make a requirement that is technically impossible to achieve right now.
This requires a text alternative which is *not* real-time captioning. It is a description or label.
Reworded in the current proposal to, "For live audio-only or live video-only entertainment, such as internet radio or webcams, text alternatives describe the purpose of the presentation or alternative real-time content is linked to, such as traffic reports for a traffic webcam; or, [clarify that real-time content does not imply real-time captions]". Verify that this clarifies and closes the issue.
Joe Clark writes:
>5. A text alternative is provided for live video-only content by following Guideline 1.1.
>(Editorial note: webcams would only need a text alternative associated with the concept that the cam is pointing at, *not* every image that is captured)
I think this is going to need a much better formulation. Aren't we requiring captioning and, in some cases, description? You can use <embed> in compliant methods, but <embed>, unlike <object>, does not include text equivalents.
And do you mean "equivalent" or "alternative"?
"Text alternative" is not requiring captions or audio descriptions, it is requiring a text description or label. We mean "alternative" not "equivalent" per recent WCAG WG discussions where we determined that "equivalents" aren't actually equivalent but an alternative. Verify that we can close this issue.
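The <embed>/<object> difference Joe raises can be illustrated in HTML: the content of an <object> element acts as its fallback, so a text alternative (and even a link to equivalent real-time content) travels with the element. URLs and filenames here are hypothetical:

```html
<!-- <object> renders its element content when the movie cannot
     be shown, so the text alternative is built in. <embed> has
     no comparable fallback mechanism. -->
<object data="harbor-cam.mov" type="video/quicktime">
  Live webcam view of the harbor. See the
  <a href="harbor-conditions.html">current harbor conditions report</a>.
</object>
```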
Joe Clark writes:
>Level 2 Success Criteria for Guideline 1.2
>1. Real-time captions are provided for live multimedia
We already have that above.
The proposal has a Level 1 criterion specific to prerecorded multimedia and a Level 2 criterion for live multimedia. Verify that we can close this issue.
Joe Clark writes:
>Audio descriptions provide access to multimedia for people who are blind or visually impaired by adding narration that describes actions, characters, scene changes and on-screen text that can not be determined from listening only to the soundtrack. This narration is limited to pauses in dialog and provided in the spoken language of the audio.
Additional audio narration that explains important details that cannot be understood from the main soundtrack alone.
Appreciate the simple definition, but think it is important that we provide more details for the audience - particularly people who are not familiar with accessibility. A revised definition is proposed that is simpler but still contains some of the details that Joe proposed we leave out. If the proposal is adopted, we can close this issue.
Joe Clark writes:
>Captions provide access to multimedia for people who are deaf or hard of hearing by converting a program's dialogue, sound effects, and narration into words that appear on the screen. Captions are rendered in the written language of the audio.
Synchronized transcript of dialogue and important sound effects.
Adopted. Kept sentence about benefit to people who are deaf or hard of hearing. If proposal is adopted, we can close this issue.