<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>28511</bug_id>
          
          <creation_ts>2015-04-19 02:54:15 +0000</creation_ts>
          <short_desc>[WebVTT] Captions on the audio element</short_desc>
          <delta_ts>2018-02-03 14:36:29 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>TextTracks CG</product>
          <component>WebVTT</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard>widereview</status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Silvia Pfeiffer">silviapfeiffer1</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>john.foliot</cc>
    
    <cc>philipj</cc>
    
    <cc>silviapfeiffer1</cc>
    
    <cc>singer</cc>
          
          <qa_contact name="Web Media Text Tracks CG">public-texttracks</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>119657</commentid>
    <comment_count>0</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-04-19 02:54:15 +0000</bug_when>
    <thetext>Feedback from the HTML Accessibility Task force on WebVTT as per http://lists.w3.org/Archives/Public/public-tt/2015Apr/0049.html

We believe this section misrepresents the situation. It is incorrect to
tell the user &quot;There is nothing to render.&quot; When captions are provided
for audio content there is indeed something to render, even if the
available user agent is incapable of rendering it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119658</commentid>
    <comment_count>1</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-04-19 03:13:18 +0000</bug_when>
    <thetext>This refers to the following sentence in the spec:
&quot;If the media element is an audio element, or is another playback mechanism with no rendering area, abort these steps. There is nothing to render.&quot;

The way in which the HTML specification deals with audio files that require rendered captions is to add the audio resource to a &lt;video&gt; element and then add the tracks there. This has been the way in which the HTML specification has been written and is not something that the WebVTT specification can change.

So, the solution to the need for rendered captions on an audio *resource* is to use a &lt;video&gt; *element*. The rest of the markup is identical between the &lt;audio&gt; and &lt;video&gt; elements.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119694</commentid>
    <comment_count>2</comment_count>
    <who name="David Singer">singer</who>
    <bug_when>2015-04-20 14:15:37 +0000</bug_when>
    <thetext>suggest that while there is something to render, there is nowhere to render it (hence, abort).  if captioning of audio is desired, the media element needs a rendering area, and hence needs to be &lt;video&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119695</commentid>
    <comment_count>3</comment_count>
    <who name="John Foliot">john.foliot</who>
    <bug_when>2015-04-20 14:26:25 +0000</bug_when>
    <thetext>(In reply to David Singer from comment #2)
&gt; suggest that while there is something to render, there is nowhere to render
&gt; it (hence, abort).  if captioning of audio is desired, the media element
&gt; needs a rendering area, and hence needs to be &lt;video&gt;

Hi David,

If one were to write &lt; audio controls&gt; (as opposed to just &lt; audio&gt;), then the user-agent would render something - it would render controls. I must disagree that we must ask authors to use &lt; video&gt; when they have an audio track that also provides captions for the end user, as it is both counter-intuitive for authoring, and factually incorrect as well. We should ensure that the code we ask for matches what the majority of authors would produce natively.

Reopening until we find consensus, which may be that this section (which is/could-be as much about the user-agent and rendering captions in *any* format) be removed from a spec about a time-stamp format.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119696</commentid>
    <comment_count>4</comment_count>
    <who name="David Singer">singer</who>
    <bug_when>2015-04-20 16:40:34 +0000</bug_when>
    <thetext>(In reply to John Foliot from comment #3)
&gt; (In reply to David Singer from comment #2)
&gt; &gt; suggest that while there is something to render, there is nowhere to render
&gt; &gt; it (hence, abort).  if captioning of audio is desired, the media element
&gt; &gt; needs a rendering area, and hence needs to be &lt;video&gt;
&gt; 
&gt; Hi David,
&gt; 
&gt; If one were to write &lt; audio controls&gt; (as opposed to just &lt; audio&gt;), then
&gt; the user-agent would render something - it would render controls. 

There is still no content rendering area.  All I am saying is that it is an authoring error to have captions available and use the &lt;audio&gt; element, as they cannot be rendered.  The sentence needs adjusting from &quot;There is nothing to render.&quot; to &quot;There is no visual area in which to render captions.&quot;  And then we probably need a note to say that providing captions and not supplying a content rendering area is probably an authoring mistake.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120768</commentid>
    <comment_count>5</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-06-07 04:48:53 +0000</bug_when>
    <thetext>How about &quot;There is no display area into which to render and thus nothing to do for this algorithm.&quot;

I&apos;ve prepared https://github.com/w3c/webvtt/pull/191</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120788</commentid>
    <comment_count>6</comment_count>
    <who name="John Foliot">john.foliot</who>
    <bug_when>2015-06-07 18:32:01 +0000</bug_when>
    <thetext>(In reply to Silvia Pfeiffer from comment #5)
&gt; How about &quot;There is no display area into which to render and thus nothing to
&gt; do for this algorithm.&quot;

Hi Silvia,

I think the real issue is the earlier part of the statement: &quot;If the &lt;a&gt;media element&lt;/a&gt; is an &lt;a&gt;&lt;code&gt;audio&lt;/code&gt;&lt;/a&gt; element, or is another playback mechanism with no rendering area...&quot;

...which presumes/suggests that an audio element would never have a rendered playback region. 

It does, or at least it could, especially if/when the content author is also providing scripted controls* to interact with - those controls need to be outputted to a rendering region of sorts as well - so assuming that a display region would never be present is a bit of a stretch to my mind. 

[* use case/examples: http://designshack.net/wp-content/uploads/featured-html5-audio-player-ui.png, https://youtu.be/yEQcHEfJKmQ?list=PLBsCKuJJu1paAkH0V0pHcrFvZxRFIPIaG]

And while &quot;captions&quot; are generally thought of as visual assets, need they be? What of deaf/blind users? Access to textual equivalents is a critical requirement for that user-group, regardless of the originating media type. BONUS: Captions can also provide a powerful search capability, allowing users and search engines to search the caption text to locate a specific video or an exact point in a video. (ref: http://w3c.github.io/pfwg/media-accessibility-reqs/#captioning)

For these reasons, spec text that suggests that &lt;audio&gt; + captions are a non-starter is both factually and practically incorrect, and we should encourage rather than (by omission) discourage their creation and production**. 

I am in agreement with the second half - your proposed clarification of the IF/THEN statement.

Thoughts?

[** use-case: captions/transcript of an interview from an archived radio news show - http://www.npr.org/api/transcript.php] 


&gt; 
&gt; I&apos;ve prepared https://github.com/w3c/webvtt/pull/191</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120789</commentid>
    <comment_count>7</comment_count>
    <who name="John Foliot">john.foliot</who>
    <bug_when>2015-06-07 18:34:55 +0000</bug_when>
    <thetext>(In reply to John Foliot from comment #6)
&gt; (In reply to Silvia Pfeiffer from comment #5)
&gt;
&gt; BONUS: Captions can also provide a powerful search capability, allowing
&gt; users and search engines to search the caption text to locate a specific
&gt; video or an exact point in a video. (ref:
&gt; http://w3c.github.io/pfwg/media-accessibility-reqs/#captioning)

s/video/media asset

:-)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120792</commentid>
    <comment_count>8</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-06-08 01:58:44 +0000</bug_when>
    <thetext>Hi John,

(In reply to John Foliot from comment #6)
&gt; I think the real issue is the earlier part of the statement: &quot;If the
&gt; &lt;a&gt;media element&lt;/a&gt; is an &lt;a&gt;&lt;code&gt;audio&lt;/code&gt;&lt;/a&gt; element, or is another
&gt; playback mechanism with no rendering area...&quot;
&gt; 
&gt; ...which presumes/suggests that an audio element would never have a rendered
&gt; playback region.

That is exactly how HTML specifies an audio element: it is an element without a visual rendering region for audio or audio-related content and it will never have a rendering region because it&apos;s about handling the audio samples. It may have controls, but they are not a visual rendering region.

If you want to display captions on an audio *resource*, don&apos;t use an audio *element* - use a video element. Or have a Web developer create such rendering by hand.

If you have an issue with that, the WebVTT spec is the wrong place to change it.


&gt; And while &quot;captions&quot; are generally thought of as visual assets, need they
&gt; be? What of deaf/blind users? Access to textual equivalents is a critical
&gt; requirement for that user-group, regardless of the originating media type.

Correct, but this section is specifically about visual rendering and only for native browser rendering.

What we could do is rename the &quot;Rendering&quot; title of that section into &quot;Native Browser Rendering&quot; or something similar.


&gt; For these reasons, spec text that suggests that &lt;audio&gt; + captions are a
&gt; non-starter are both factually and practically incorrect, and we should
&gt; encourage rather than (by omission) discourage their creation and
&gt; production**.

This is a file format specification and the section under discussion is an algorithm for visual rendering. It is not an authoring guide and there is nothing in the spec that implies what you are reading into it. It is merely a technically accurate algorithmic description.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120825</commentid>
    <comment_count>9</comment_count>
    <who name="Philip Jägenstedt">philipj</who>
    <bug_when>2015-06-08 14:11:31 +0000</bug_when>
    <thetext>(In reply to Silvia Pfeiffer from comment #8)
&gt; What we could do is rename the &quot;Rendering&quot; title of that section into
&gt; &quot;Native Browser Rendering&quot; or something similar.

I don&apos;t think we should do that. Unless we also have a &quot;Rendering for things that aren&apos;t native browsers&quot; section, we&apos;d leave rendering undefined for some implementations. The goal should be to get as close to the same rendering across all implementations as possible.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120850</commentid>
    <comment_count>10</comment_count>
    <who name="David Singer">singer</who>
    <bug_when>2015-06-09 16:30:11 +0000</bug_when>
    <thetext>I really don&apos;t think that an audio element has a content rendering region. But maybe we can be clear, and say explicitly that captions on an &lt;audio&gt; element may still be valuable (e.g. if they are used for searching, indexing, or made available to the user through some other modality), and that IF an audio stream needs visual captioning, then it should be placed in a &lt;video&gt; element and given a content rendering area?

Otherwise we&apos;ll get continued confusion, I fear.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>120854</commentid>
    <comment_count>11</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-06-09 19:13:32 +0000</bug_when>
    <thetext>(In reply to Philip Jägenstedt from comment #9)
&gt; (In reply to Silvia Pfeiffer from comment #8)
&gt; &gt; What we could do is rename the &quot;Rendering&quot; title of that section into
&gt; &gt; &quot;Native Browser Rendering&quot; or something similar.
&gt; 
&gt; I don&apos;t think we should do that, unless we also have a &quot;Rendering for things
&gt; aren&apos;t native browsers&quot; we&apos;d leave rendering undefined for some
&gt; implementations. The goal should be to get as close to the same rendering
&gt; across all implementations as possible.

There are actually several things at play here:

Firstly: all non-browser media players can&apos;t really follow the &quot;Rendering&quot; section, since they don&apos;t do CSS boxes. So, rendering has always been undefined for such implementations. We&apos;re better off actually adding something explicit about doing equivalent rendering or so.

I&apos;ve updated the pull request with two notes that should explain both this and the problem that John identified. See whether they work for you.


As a further problem, I don&apos;t actually think that our rendering section fully deals with all kinds of cues that we need to be dealing with, specifically chapters and descriptions. I&apos;ve registered bug 28783 to deal with that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>121073</commentid>
    <comment_count>12</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-06-16 11:30:36 +0000</bug_when>
    <thetext>Fixed with two new paragraphs, see pull request.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>121099</commentid>
    <comment_count>13</comment_count>
    <who name="John Foliot">john.foliot</who>
    <bug_when>2015-06-16 14:53:38 +0000</bug_when>
    <thetext>(In reply to Silvia Pfeiffer from comment #12)
&gt; Fixed with two new paragraphs, see pull request.

Thanks (I think) Silvia. URL for the pull request?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>121138</commentid>
    <comment_count>14</comment_count>
    <who name="Silvia Pfeiffer">silviapfeiffer1</who>
    <bug_when>2015-06-16 21:21:56 +0000</bug_when>
    <thetext>(In reply to John Foliot from comment #13)
&gt; (In reply to Silvia Pfeiffer from comment #12)
&gt; &gt; Fixed with two new paragraphs, see pull request.
&gt; 
&gt; Thanks (I think) Silvia. URL for the pull request?

Unchanged. See above.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>129072</commentid>
    <comment_count>15</comment_count>
    <who name="John Foliot">john.foliot</who>
    <bug_when>2018-02-03 14:36:29 +0000</bug_when>
    <thetext>APA Response: Thank you.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>