This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 10941 - <video> Media elements need control-independent "pause" for presenting lengthy descriptions/captions
Summary: <video> Media elements need control-independent "pause" for presenting lengthy descriptions/captions
Status: RESOLVED REMIND
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: All
OS: All
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard: suggested fix in comment 7
Keywords: a11y, a11ytf, media
Depends on:
Blocks:
 
Reported: 2010-10-01 07:07 UTC by Masatomo Kobayashi
Modified: 2013-06-15 00:40 UTC
CC List: 12 users

See Also:



Description Masatomo Kobayashi 2010-10-01 07:07:18 UTC
Section affected: http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#playing-the-media-resource

When closed captions, audio descriptions, or sign language tracks are presented by associated media or timed text resources, playback will often need to be paused automatically and temporarily to present lengthy descriptions or captions (e.g., extended descriptions). This "paused" state should be a different mode from the normal paused state, so that clicking the play/pause button does not cause unexpected behavior.

That is, for example, a media.pausedInternally attribute and a media.pauseInternally() method are needed in addition to the normal media.paused attribute and media.pause() method. pauseInternally() would pause or resume playback and change the value of the .pausedInternally attribute (only when .paused is false), but would not affect the value of the normal .paused attribute, i.e., the state of the play/pause button.
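
For illustration, here is a rough JavaScript sketch of the proposed API, shimmed in script on top of the existing HTMLMediaElement interface. The names pausedInternally and pauseInternally() are the reporter's proposal, and resumeInternally() is an assumed helper added for symmetry; none of these are standard. Note that a script-level shim like this cannot actually avoid flipping the visible paused state, which is exactly why a native mechanism is being requested:

    // Sketch only: shims the proposed internal-pause API onto a media element.
    function addInternalPause(media) {
      media.pausedInternally = false;

      media.pauseInternally = function () {
        // Only takes effect while the element is nominally playing.
        if (!media.paused && !media.pausedInternally) {
          media.pausedInternally = true;
          media.pause(); // Caveat: this flips media.paused; a native version would not.
        }
      };

      // Assumed helper, not part of the original proposal.
      media.resumeInternally = function () {
        if (media.pausedInternally) {
          media.pausedInternally = false;
          media.play();
        }
      };
    }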
Comment 1 Kornel Lesinski 2010-10-01 09:43:57 UTC
I think this could be done by setting media.playbackRate to 0.
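
For clarity, a minimal sketch of this suggestion: per the playback-rate requirement quoted in comment 7, a rate of 0 stops the current playback position from advancing while media.paused stays false.

    // Freeze playback without entering the paused state:
    video.playbackRate = 0; // position stops advancing; video.paused is still false
    // ...later, when the description has finished:
    video.playbackRate = 1; // resume normal playback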
Comment 2 Masatomo Kobayashi 2010-10-01 13:01:19 UTC
I suppose that the playback rate of associated tracks is synchronized with that of the media resource. If so, setting media.playbackRate to 0 will also pause descriptions/captions.
Comment 3 Silvia Pfeiffer 2010-10-01 14:20:26 UTC
The pause-on-exit flag takes care of this, see http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-cue-pause-on-exit-flag
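
For illustration, a sketch of that flag via the text track API, using the later VTTCue interface (the successor to the WebSRT cues discussed here); it assumes a browser that exposes VTTCue and honours pauseOnExit:

    // Sketch: pause playback automatically when a description cue ends.
    var video = document.querySelector('video');
    var track = video.addTextTrack('descriptions');
    var cue = new VTTCue(10, 12, 'An extended description of the scene.');
    cue.pauseOnExit = true; // playback pauses when the cue's end time is reached
    track.addCue(cue);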
Comment 4 Philip Jägenstedt 2010-10-02 14:13:10 UTC
I suppose what you want is for playback to pause if the voice synthesis hasn't finished, and then continue.
Comment 5 Masatomo Kobayashi 2010-10-04 04:43:08 UTC
Yes, that is one case; extended audio descriptions, where playback is paused for the whole time a description is being presented, are another.
For both cases, pause-on-exit seems workable with some modifications to the process of handling playback position changes.
Comment 6 Philip Jägenstedt 2010-10-04 13:45:47 UTC
(In reply to comment #5)
> Yes, that is one case; extended audio descriptions, where playback is paused
> for the whole time a description is being presented, are another.
> For both cases, pause-on-exit seems workable with some modifications to the
> process of handling playback position changes.

What you want is for playback to pause conditionally, if the voice synthesis hasn't finished, and then continue automatically once it has. Since pause-on-exit always pauses and never unpauses, it's not a good fit. Anyway, the requirements are pretty clear, but it's less clear how to integrate this with voice synthesis, since that's unlikely to be made part of the browser. Are there standard voice synthesis interfaces on the major desktop platforms that can tell the application when they have finished synthesizing a particular piece of text?
Comment 7 Ian 'Hixie' Hickson 2010-10-05 00:29:14 UTC
There are two cases here:

1. A browser that supports audio descriptions natively, and needs to stop playback of the video and primary audio tracks while a secondary audio description track plays back content.

I'm not familiar with any content like this (I've rarely seen audio descriptions on real content at all, but the few times I've seen it, e.g. on Digital TV in the UK, the video was never stopped in this fashion), so it's hard to comment on this, but if there are user agents that want to implement this, I could make the spec handle it by changing this requirement:

"When a media element is potentially playing and its Document is a fully active Document, its current playback position must increase monotonically at playbackRate units of media time per unit time of wall clock time."

...to cover this case, e.g. by adding " and is not in a description-pause state" after "potentially playing", and then define "description-pause state" as being for the aforementioned case.

Are there user agents interested in implementing this? (Are there any media formats that support audio tracks having content with non-zero length to be played at an instant in time on the primary timeline?)


2. A future time where we have multiple media elements all synchronised on one timeline, coordinated by some object that can start and stop individual tracks.

If this is the case being discussed here, then we should make sure to take this use case into account when designing the controller API, but it is probably premature to make changes to the spec at this point for that case, since there's no way to sync tracks so far.


Which case are we talking about here? Or is it a third case I haven't considered?
Comment 8 Masatomo Kobayashi 2010-10-06 11:39:02 UTC
(In reply to comment #6)
> Are there standard voice synth interfaces on the major desktop
> platforms that can tell the application when it is done synthing a particular
> piece of text?

I'm not familiar with other platforms, but the Microsoft Speech API provides a mechanism to signal the application when speech is done.
Comment 9 Masatomo Kobayashi 2010-10-06 11:48:52 UTC
(In reply to comment #7)
> Which case are we talking about here? Or is it a third case I haven't
> considered?

At least for now, I'm talking about the first case. A good example movie with extended descriptions (provided by WGBH) is found at http://web.mac.com/eric.carlson/w3c/NCAM/extended-audio.html . The background of this technology is introduced at http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Extended_video_descriptions .

SMIL is a known means of providing this type of description ( http://www.w3.org/TR/WCAG-TECHS/SM1 and http://www.w3.org/TR/WCAG-TECHS/SM2 ), but SMIL seems to be overkill for this purpose. So we are testing a simpler way to present extended descriptions (using an "extended" flag) in our prototype auditory browser ( http://www.eclipse.org/actf/downloads/tools/aiBrowser/ ).

My original intention in filing this bug was that APIs like media.pauseInternally() would make it possible to provide extended descriptions using JavaScript even when they are not natively supported by browsers. Of course, it would be better still if extended descriptions were natively supported.
Comment 10 Silvia Pfeiffer 2010-10-11 02:00:06 UTC
(In reply to comment #7)
> There are two cases here:
> 
> 1. A browser that supports audio descriptions natively, and needs to stop
> playback of the video and primary audio tracks while a secondary audio
> description track plays back content.
> 
> I'm not familiar with any content like this (I've rarely seen audio
> descriptions on real content at all, but the few times I've seen it, e.g. on
> Digital TV in the UK, the video was never stopped in this fashion), so it's
> hard to comment on this, but if there are user agents that want to implement
> this, I could make the spec handle it by changing this requirement:
> 
> "When a media element is potentially playing and its Document is a fully active
> Document, its current playback position must increase monotonically at
> playbackRate units of media time per unit time of wall clock time."
> 
> ...to cover this case, e.g. by adding " and is not in a description-pause
> state" after "potentially playing", and then define "description-pause state"
> as being for the aforementioned case.
> 
> Are there user agents interested in implementing this? (Are there any media
> formats that support audio tracks having content with non-zero length to be
> played at an instant in time on the primary timeline?)

I think it's this case for now, where we particularly focus on text provided through WebSRT or an in-band text track. Thus it is possible to provide this in existing containers, and the pausing of playback would need to be managed by the media player in collaboration with the screen reader.

I think user agents will solve the caption case first before turning to audio descriptions, and extended audio descriptions only after that. Extended audio descriptions seem to require close interaction between the screen reader and the video player - I'm not sure that is something accessibility APIs have been required to support before.


> 2. A future time where we have multiple media elements all synchronised on one
> timeline, coordinated by some object that can start and stop individual tracks.
> 
> If this is the case being discussed here, then we should make sure to take this
> use case into account when designing the controller API, but it is probably
> premature to make changes to the spec at this point for that case, since
> there's no way to sync tracks so far.

If we focus not only on text but on an audio description provided as audio, it does indeed become more complex again. I think the example at http://web.mac.com/eric.carlson/w3c/NCAM/extended-audio.html shows a media resource where the main audio/video has been authored to pause as long as necessary for the recorded audio description cues to finish. However, the description audio is actually merged with the main audio track, which I believe is the only way to realize this in current container formats.

When synchronizing multiple media elements, we can make this dynamic again, and should probably introduce a controller API for it then; this is related to Bug 9452. But you are right - it's a bit premature for that.
Comment 11 Silvia Pfeiffer 2010-10-11 02:08:31 UTC
One more thing that I just realised: sometimes it is very obvious that a description cue will not fit into a given time frame - in particular if there isn't actually any time available to put it in and it is basically associated with a time point rather than a segment.

In this case, it would be possible to record in the WebSRT file a requested pause duration for the cue and have the player pause for that duration as it reaches the cue. This solution is based on heuristics about how long an average screen reader will take to finish reading a cue and can turn out rather inaccurate, but it solves the situation without having to wait for an event from the screen reader to stop/continue playback.

Actually, this is the way in which I will be implementing this functionality in a JavaScript demo for now, which of course means introducing a non-standard marker into WebSRT. Gah, damned custom extensions... :-(
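
For illustration, such a cue with a hypothetical non-standard pause setting might look like this (WebVTT-style syntax shown; the pause:8000 marker is exactly the kind of custom extension lamented above, not part of any spec):

    WEBVTT

    00:00:10.000 --> 00:00:10.500 pause:8000
    The presenter walks to the whiteboard and sketches a diagram of the apparatus.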
Comment 12 Silvia Pfeiffer 2010-10-11 02:42:50 UTC
(In reply to comment #11)
> One more thing that I just realised: sometimes it is very obvious that a
> description cue will not fit into a given time frame - in particular if there
> isn't actually any time available to put it in and it is basically associated
> with a time point rather than a segment.
> 
> In this case, it would be possible to record in the WebSRT file a requested
> pause duration for the cue and have the player pause for that duration as it
> reaches the cue. This solution is based on heuristics about how long an
> average screen reader will take to finish reading a cue and can turn out
> rather inaccurate, but it solves the situation without having to wait for an
> event from the screen reader to stop/continue playback.
> 
> Actually, this is the way in which I will be implementing this functionality in
> a JavaScript demo for now, which of course means introducing a non-standard
> marker into WebSRT. Gah, damned custom extensions... :-(

This raises an interesting option: we could add a parameter to pause-on-exit that says how long to pause. Such a parameter might then actually make sense to include in WebSRT.
Comment 13 Ian 'Hixie' Hickson 2010-10-14 06:48:24 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: none yet
Rationale:

Without actual implementation experience, it's hard to know if the suggestion in comment 7 would actually solve the problem.

I agree that we will need to address this. My recommendation would be to have implementations of audio descriptions experiment with this to find what the best user experience is. Once we have a clear solution, then we can update the specification accordingly.

We don't want to specify something prematurely, because doing so would risk disenfranchising the very people we are trying to help, by requiring suboptimal behaviour.

I've marked this REMIND so that I can check on it regularly to see if any real-world experience has been obtained. Feel free to reopen this if a browser vendor implements <video> with audio descriptions and has found a solution to this problem (either now or in the future).
Comment 14 Michael Cooper 2010-10-26 15:27:25 UTC
Bug triage sub-team thinks this is an A11Y TF priority. Assigning to Silvia to continue work on this. Needs to be synchronized with the other media work in the media sub-group.
Comment 15 Silvia Pfeiffer 2013-06-06 07:02:18 UTC
Update with some new information.

There are several approaches possible here:

1.
The Web Speech API makes speech synthesis an integral part of Web browsers [1]. That API (once implemented) can synthesize <track>s of @kind=descriptions, so JS developers can implement support for description tracks, including pausing the video and restarting it when the voicing is finished (see the sketch after this list).

Once we have some experience implementing description support in this way, we may include this functionality in browsers and add any additional features that this requires, such as events to raise and states to report.

2.
As for how to do this with audio-only descriptions and video: I'd like to see some implemented examples in JS first, before we even attempt to make the browser auto-pause video when audio files run past certain times, etc.

3.
I can imagine an implementation with a <video> element and a WebVTT track that has speech phrases instead of text in the cues (e.g. in a data URI). That would essentially lead back to the same need as in 1., where pausing and resuming are determined by the duration of a cue and the time it takes to finish the audio.

[1] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#tts-section
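
As a sketch of approach 1, assuming a browser that implements both <track kind="descriptions"> cue events and the Web Speech API's speechSynthesis interface:

    // Sketch of approach 1: voice cues from a kind="descriptions" track,
    // pausing the video until the synthesis of each cue has finished.
    var video = document.querySelector('video');
    var track = null;
    for (var i = 0; i < video.textTracks.length; i++) {
      if (video.textTracks[i].kind === 'descriptions') track = video.textTracks[i];
    }
    // Assumes the page actually has a descriptions track.
    track.mode = 'hidden'; // fire cue events without rendering the text

    track.addEventListener('cuechange', function () {
      for (var i = 0; i < track.activeCues.length; i++) {
        var utterance = new SpeechSynthesisUtterance(track.activeCues[i].text);
        utterance.onend = function () { video.play(); }; // resume once voiced
        video.pause(); // extended description: halt playback while speaking
        speechSynthesis.speak(utterance);
      }
    });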
Comment 16 James Craig 2013-06-15 00:40:41 UTC
Will an admin create a new IndieUI keyword and add it to this bug?

This is related to: IndieUI-ACTION-16
https://www.w3.org/WAI/IndieUI/track/actions/16