28511 – [WebVTT] Captions on the audio element

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	TextTracks CG
Classification:	Unclassified
Component:	WebVTT
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	This bug has no owner yet - up for the taking
QA Contact:	Web Media Text Tracks CG

URL:
Whiteboard:	widereview
Keywords:

Depends on:
Blocks:

Reported:	2015-04-19 02:54 UTC by Silvia Pfeiffer
Modified:	2018-02-03 14:36 UTC
CC List:	4 users

See Also:

Attachments

Description Silvia Pfeiffer 2015-04-19 02:54:15 UTC

Feedback from the HTML Accessibility Task force on WebVTT as per http://lists.w3.org/Archives/Public/public-tt/2015Apr/0049.html

We believe this section misrepresents the situation. It is incorrect to
tell the user "There is nothing to render." When captions are provided
for audio content there is indeed something to render, even if the
available user agent is incapable of rendering it.

Comment 1 Silvia Pfeiffer 2015-04-19 03:13:18 UTC

This refers to the following sentence in the spec:
"If the media element is an audio element, or is another playback mechanism with no rendering area, abort these steps. There is nothing to render."

The way in which the HTML specification deals with audio files that require rendered captions is to add the audio resource to a <video> element and then add the tracks there. This is how the HTML specification is written, and it is not something that the WebVTT specification can change.

So, the solution to the need for rendered captions on an audio *resource* is to use a <video> *element*. The rest of the markup is identical between <audio> and <video> elements.
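
To illustrate the point that the markup is otherwise identical, here is a minimal sketch that builds both variants as strings. The helper name buildMediaMarkup and the file names interview.mp3 and interview.vtt are placeholders invented for this example, not anything taken from the spec or this thread; the <track> attributes used (kind, src, srclang, label, default) are the standard HTML ones.

```typescript
// Hypothetical helper: builds the markup for a media element with one captions track.
type MediaTag = "audio" | "video";

function buildMediaMarkup(tag: MediaTag, mediaUrl: string, captionsUrl: string): string {
  return [
    `<${tag} controls src="${mediaUrl}">`,
    `  <track kind="captions" src="${captionsUrl}" srclang="en" label="English" default>`,
    `</${tag}>`,
  ].join("\n");
}

// Placeholder file names, assumed for illustration only.
const audioMarkup = buildMediaMarkup("audio", "interview.mp3", "interview.vtt");
const videoMarkup = buildMediaMarkup("video", "interview.mp3", "interview.vtt");
```

Only the tag name differs between the two strings. Per the discussion here, it is the <video> variant that gives a user agent an area in which to render the cues.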

Comment 2 David Singer 2015-04-20 14:15:37 UTC

suggest that while there is something to render, there is nowhere to render it (hence, abort).  if captioning of audio is desired, the media element needs a rendering area, and hence needs to be <video>

Comment 3 John Foliot 2015-04-20 14:26:25 UTC

(In reply to David Singer from comment #2)
> suggest that while there is something to render, there is nowhere to render
> it (hence, abort).  if captioning of audio is desired, the media element
> needs a rendering area, and hence needs to be <video>

Hi David,

If one were to write <audio controls> (as opposed to just <audio>), then the user-agent would render something - it would render controls. I must disagree that we must ask authors to use <video> when they have an audio track that also provides captions for the end user, as it is both counter-intuitive for authoring, and factually incorrect as well. We should ensure that the code we ask for matches what the majority of authors would produce natively.

Reopening until we find consensus, which may be that this section (which is/could-be as much about the user-agent and rendering captions in *any* format) be removed from a spec about a time-stamp format.

Comment 4 David Singer 2015-04-20 16:40:34 UTC

(In reply to John Foliot from comment #3)
> (In reply to David Singer from comment #2)
> > suggest that while there is something to render, there is nowhere to render
> > it (hence, abort).  if captioning of audio is desired, the media element
> > needs a rendering area, and hence needs to be <video>
> 
> Hi David,
> 
> If one were to write <audio controls> (as opposed to just <audio>), then
> the user-agent would render something - it would render controls. 

There is still no content rendering area.  All I am saying is that it is an authoring error to have captions available and use the <audio> element, as they cannot be rendered.  The sentence needs adjusting from "There is nothing to render." to "There is no visual area in which to render captions."  And then we probably need a note to say that providing captions and not supplying a content rendering area is probably an authoring mistake.
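
As a rough sketch of the step under discussion and of the note proposed in this comment, the decision could be modelled as below. The types MediaDescription and RenderDecision and the function decideCaptionRendering are hypothetical names made up for this illustration. The code only models the <audio> versus <video> distinction and does not reproduce the spec's actual rendering algorithm.

```typescript
// Hypothetical model of a media element, reduced to what this step looks at.
interface MediaDescription {
  tag: "audio" | "video";
  hasCaptionTrack: boolean;
}

interface RenderDecision {
  render: boolean;
  note: string;
}

function decideCaptionRendering(media: MediaDescription): RenderDecision {
  if (media.tag === "audio") {
    // Abort: there may be something to render, but nowhere to render it.
    const base = "There is no visual area in which to render captions.";
    return {
      render: false,
      note: media.hasCaptionTrack ? base + " Probably an authoring mistake." : base,
    };
  }
  return {
    render: media.hasCaptionTrack,
    note: media.hasCaptionTrack ? "Render cues in the content rendering area." : "No captions supplied.",
  };
}
```

The note distinguishes "nothing to render" from "nowhere to render", which is the wording change being asked for.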

Comment 5 Silvia Pfeiffer 2015-06-07 04:48:53 UTC

How about "There is no display area into which to render and thus nothing to do for this algorithm."

I've prepared https://github.com/w3c/webvtt/pull/191

Comment 6 John Foliot 2015-06-07 18:32:01 UTC

(In reply to Silvia Pfeiffer from comment #5)
> How about "There is no display area into which to render and thus nothing to
> do for this algorithm."

Hi Silvia,

I think the real issue is the earlier part of the statement: "If the <a>media element</a> is an <a><code>audio</code></a> element, or is another playback mechanism with no rendering area..."

...which presumes/suggests that an audio element would never have a rendered playback region. 

It does, or at least it could, especially if/when the content author is also providing scripted controls* to interact with - those controls need to be outputted to a rendering region of sorts as well - so assuming that a display region would never be present is a bit of a stretch to my mind. 

[* use case/examples: http://designshack.net/wp-content/uploads/featured-html5-audio-player-ui.png, https://youtu.be/yEQcHEfJKmQ?list=PLBsCKuJJu1paAkH0V0pHcrFvZxRFIPIaG]

And while "captions" are generally thought of as visual assets, need they be? What of deaf/blind users? Access to textual equivalents is a critical requirement for that user-group, regardless of the originating media type. BONUS: Captions can also provide a powerful search capability, allowing users and search engines to search the caption text to locate a specific video or an exact point in a video. (ref: http://w3c.github.io/pfwg/media-accessibility-reqs/#captioning)

For these reasons, spec text that suggests that <audio> + captions are a non-starter is both factually and practically incorrect, and we should encourage rather than (by omission) discourage their creation and production**. 

I am in agreement with the second half - your proposed clarification of the IF/THEN statement.

Thoughts?

[** use-case: captions/transcript of an interview from an archived radio news show - http://www.npr.org/api/transcript.php] 


> 
> I've prepared https://github.com/w3c/webvtt/pull/191
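
The search use case raised in this comment does not depend on any visual rendering. A minimal sketch, assuming the caption cues have already been parsed into plain objects: the CaptionCue shape, the findCueTimes function and the sample cue text are all invented for this example.

```typescript
// Hypothetical, simplified cue shape: times in seconds plus the cue text.
interface CaptionCue {
  start: number;
  end: number;
  text: string;
}

// Returns the start times of all cues whose text contains the query (case-insensitive).
function findCueTimes(cues: CaptionCue[], query: string): number[] {
  const needle = query.toLowerCase();
  return cues
    .filter((cue) => cue.text.toLowerCase().indexOf(needle) !== -1)
    .map((cue) => cue.start);
}

// Made-up sample cues for an audio-only interview.
const interviewCues: CaptionCue[] = [
  { start: 0, end: 4, text: "Welcome back to the program." },
  { start: 4, end: 9, text: "Today we discuss web accessibility." },
  { start: 9, end: 15, text: "Captions make accessibility searchable." },
];

const hits = findCueTimes(interviewCues, "accessibility");
```

A player could then seek the audio to any of the returned start times, regardless of whether the cues are ever drawn on screen.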

Comment 7 John Foliot 2015-06-07 18:34:55 UTC

(In reply to John Foliot from comment #6)
> (In reply to Silvia Pfeiffer from comment #5)
>
> BONUS: Captions can also provide a powerful search capability, allowing
> users and search engines to search the caption text to locate a specific
> video or an exact point in a video. (ref:
> http://w3c.github.io/pfwg/media-accessibility-reqs/#captioning)

s/video/media asset

:-)

Comment 8 Silvia Pfeiffer 2015-06-08 01:58:44 UTC

Hi John,

(In reply to John Foliot from comment #6)
> I think the real issue is the earlier part of the statement: "If the
> <a>media element</a> is an <a><code>audio</code></a> element, or is another
> playback mechanism with no rendering area..."
> 
> ...which presumes/suggests that an audio element would never have a rendered
> playback region.

That is exactly how HTML specifies an audio element: it is an element without a visual rendering region for audio or audio-related content and it will never have a rendering region because it's about handling the audio samples. It may have controls, but they are not a visual rendering region.

If you want to display captions on an audio *resource*, don't use an audio *element* - use a video element. Or have a Web developer create such rendering by hand.

If you have an issue with that, the WebVTT spec is the wrong place to change it.
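
For the "rendering by hand" option, the core of a script-based approach is deciding which cues are active at the current playback position. The sketch below shows only that selection logic. TimedCue and activeCueTexts are hypothetical names, and the cue list is assumed to be already available to the script. In a page, the result would then be written into an element the author provides next to the <audio> element.

```typescript
// Hypothetical, simplified cue shape: times in seconds plus the cue text.
interface TimedCue {
  startTime: number;
  endTime: number;
  text: string;
}

// A cue is treated as active from its start time (inclusive) to its end time (exclusive).
function activeCueTexts(cues: TimedCue[], currentTime: number): string[] {
  return cues
    .filter((cue) => cue.startTime <= currentTime && currentTime < cue.endTime)
    .map((cue) => cue.text);
}

// Made-up cues; the second and third overlap between 5 and 6 seconds.
const sampleCues: TimedCue[] = [
  { startTime: 0, endTime: 3, text: "Good evening." },
  { startTime: 3, endTime: 6, text: "Our guest tonight..." },
  { startTime: 5, endTime: 8, text: "[applause]" },
];
```

This is a sketch of custom, author-provided rendering. It is separate from the native rendering algorithm this bug is about.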


> And while "captions" are generally thought of as visual assets, need they
> be? What of deaf/blind users? Access to textual equivalents is a critical
> requirement for that user-group, regardless of the originating media type.

Correct, but this section is specifically about visual rendering and only for native browser rendering.

What we could do is rename the "Rendering" title of that section into "Native Browser Rendering" or something similar.


> For these reasons, spec text that suggests that <audio> + captions are a
> non-starter is both factually and practically incorrect, and we should
> encourage rather than (by omission) discourage their creation and
> production**.

This is a file format specification and the section under discussion is an algorithm for visual rendering. It is not an authoring guide and there is nothing in the spec that implies what you are reading into it. It is merely a technically accurate algorithmic description.

Comment 9 Philip Jägenstedt 2015-06-08 14:11:31 UTC

(In reply to Silvia Pfeiffer from comment #8)
> What we could do is rename the "Rendering" title of that section into
> "Native Browser Rendering" or something similar.

I don't think we should do that. Unless we also have a "Rendering for things that aren't native browsers" section, we'd leave rendering undefined for some implementations. The goal should be to get as close to the same rendering across all implementations as possible.

Comment 10 David Singer 2015-06-09 16:30:11 UTC

I really don't think that an audio element has a content rendering region. But maybe we can be clear, and say explicitly that captions on an <audio> element may still be valuable (e.g. if they are used for searching, indexing, or made available to the user through some other modality), and that IF an audio stream needs visual captioning, then it should be placed in a <video> element and given a content rendering area?

Otherwise we'll get continued confusion, I fear.
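
One way captions on an <audio> element could be "made available to the user through some other modality" is as a plain-text transcript, which a screen reader or braille display can present. A minimal sketch under that assumption; TranscriptCue, formatClock and cuesToTranscript are names invented for this example, and the mm:ss prefix is just one possible layout.

```typescript
// Hypothetical, simplified cue shape: start time in seconds plus the cue text.
interface TranscriptCue {
  startTime: number;
  text: string;
}

// Formats whole seconds as mm:ss, for example 75.5 becomes "01:15".
function formatClock(seconds: number): string {
  const whole = Math.floor(seconds);
  const mm = Math.floor(whole / 60);
  const ss = whole % 60;
  return (mm < 10 ? "0" : "") + mm + ":" + (ss < 10 ? "0" : "") + ss;
}

// Produces one transcript line per cue, prefixed with its start time.
function cuesToTranscript(cues: TranscriptCue[]): string {
  return cues.map((cue) => `[${formatClock(cue.startTime)}] ${cue.text}`).join("\n");
}
```

Nothing here needs a content rendering area, which is the distinction this comment draws between using captions and visually rendering them.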

Comment 11 Silvia Pfeiffer 2015-06-09 19:13:32 UTC

(In reply to Philip Jägenstedt from comment #9)
> (In reply to Silvia Pfeiffer from comment #8)
> > What we could do is rename the "Rendering" title of that section into
> > "Native Browser Rendering" or something similar.
> 
> I don't think we should do that. Unless we also have a "Rendering for things
> that aren't native browsers" section, we'd leave rendering undefined for some
> implementations. The goal should be to get as close to the same rendering
> across all implementations as possible.

There are actually several things at play here:

Firstly, non-browser media players can't really follow the "Rendering" section, since they don't do CSS boxes. So rendering has always been undefined for such implementations. We're better off adding something explicit about doing equivalent rendering.

I've updated the pull request with two notes that should explain both this and the problem that John identified. See whether they work for you.


As a further problem, I don't actually think that our rendering section fully deals with all kinds of cues that we need to be dealing with, specifically chapters and descriptions. I've registered bug 28783 to deal with that.

Comment 12 Silvia Pfeiffer 2015-06-16 11:30:36 UTC

Fixed with two new paragraphs, see pull request.

Comment 13 John Foliot 2015-06-16 14:53:38 UTC

(In reply to Silvia Pfeiffer from comment #12)
> Fixed with two new paragraphs, see pull request.

Thanks (I think) Silvia. URL for the pull request?

Comment 14 Silvia Pfeiffer 2015-06-16 21:21:56 UTC

(In reply to John Foliot from comment #13)
> (In reply to Silvia Pfeiffer from comment #12)
> > Fixed with two new paragraphs, see pull request.
> 
> Thanks (I think) Silvia. URL for the pull request?

Unchanged. See above.

Comment 15 John Foliot 2018-02-03 14:36:29 UTC

APA Response: Thank you.