24416 – Ambiguous support for native in-band captioning.

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	Other other

Importance:	P2 major
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	HTML WG Bugzilla archive list

Reported:	2014-01-27 23:39 UTC by Graham
Modified:	2014-06-01 04:08 UTC
CC List:	5 users

Description Graham 2014-01-27 23:39:58 UTC

Sometimes native captioning is supported by a decoder for regulatory reasons and cannot always physically be made available to the user agent and/or application. In addition, native in-band captions like CEA-708 are often highly optimized, and converting them to a user agent format like WebVTT can cost performance and lose the captioner's intent. Also, important cue information such as the end time might only become available at the start of the next caption, and trick-play operation or continuous live streaming can corrupt cue timing accuracy compared to native captioning.

The application or user agent would therefore benefit from making use of optimized native captioning. The controls feature offers one way to do this at the user agent level; from JS, however, only the TextTrack mode='showing' is provided. Yet setting mode='showing' has other meanings, i.e. the captions that are stored in cues are displayed, some of which may already have been created or modified out-of-band. In addition, the platform's native capability is not made known to the application, so the application cannot change this behavior.

We need a way for the user agent to convey to JS support for native captions and for the application to have a method to turn them on or off.
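A minimal sketch of the only control JS has today, as described above: flipping the mode of captions tracks. The `CaptionTrackLike` type and the `setCaptionsVisible` helper are hypothetical names introduced for illustration; they stand in for the DOM `TextTrack` objects so the sketch runs outside a browser.

```typescript
// Structural stand-in for a DOM TextTrack (assumption: only kind and mode matter here).
type TrackMode = "disabled" | "hidden" | "showing";

interface CaptionTrackLike {
  kind: string;
  mode: TrackMode;
}

// Hypothetical helper: set every captions track to 'showing' or 'hidden'.
// Returns the number of tracks whose mode was set.
function setCaptionsVisible(
  tracks: ArrayLike<CaptionTrackLike>,
  visible: boolean
): number {
  let changed = 0;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (!track || track.kind !== "captions") continue;
    // 'showing' means the cues stored on the track are rendered; nothing here
    // tells the application whether a native decoder is doing the rendering.
    track.mode = visible ? "showing" : "hidden";
    changed++;
  }
  return changed;
}
```

In a browser this would presumably be called with a media element's `textTracks` list. Note that the sketch cannot express the distinction this report asks for, which is whether the captions are rendered from cues or natively.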

Also, if the native cues can be passed to the application level in 'hidden' mode, how can we convey that timing accuracy is not guaranteed but that the data can still be used by the application for other uses that are not timing-critical (like caption content search)? In this case, what type of 'hidden' TextTrackCue would we use: DataCue?

Comment 1 Silvia Pfeiffer 2014-01-27 23:55:22 UTC

Please register one issue per bug.

(In reply to Graham from comment #0)
> Sometimes native captioning is supported by a decoder for regulatory reasons
> and cannot always physically be made available to the user agent and/or
> application. In addition native inband captions like CEA-708 are often
> highly optimized and converting these to a User Agent format like WebVTT can
> cause performance and captioner intent loss. Also, important cue information
> like endtime might only be available with the start of the next caption and
> trick play operation or continuous live streaming can corrupt cue timing
> accuracy compared to native captioning. 

Live captioning features are still under discussion in other bugs.

> The application or user agent would therefore benefit from making use of
> optimized native captioning.

What is "optimized native captioning"?

> The controls feature offers one way to do this
> at the user agent level however from JS only the TextTrack mode='showing' is
> provided. Yet, setting mode='showing' has other meanings, i.e. the captions
> that are stored in cues are being displayed, some of which may already have
> been created or modified out-of-band. In addition no knowledge of the native
> capability of the platform is being registered to the application in order
> to change this behavior. 
> 
> We need a way for the user agent to convey to JS support for native captions
> and for the application to have a method to turn them on or off. 

If none of the browser-provided captioning functionality works for you (WebVTT, the <track> element, TextTracks, inband tracks, etc) you are always free to run your own captioning feature fully through JavaScript.


Assuming that "display native captioning" means that some other application on the platform has caption support, then there is no such thing as "a browser displaying native captions" - unless you use the <embed> tag, which doesn't make use of <track> and TextTrack features.

> Also, if the native cues can be passed to application level in 'hidden' mode
> how can we convey that timing accuracy is not guaranteed but the data can be
> used by the application for other non-critical timing sensitive uses (like
> caption content search)? In this case, what type of 'hidden' TextTrackCue
> would we use..dataCue?

You can use DataCue to expose any inband timed data, yes. But the browser won't magically expose it - you have to get browsers to implement it.
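As a rough sketch of the caption content search use case mentioned above, assuming a user agent did expose the in-band captions as cues on a 'hidden' track with a `text` field (as VTTCue has): the `SearchableCue` type and `findCaptionMatches` function are hypothetical names, and the returned start times should be treated as approximate, since timing accuracy is exactly what is in question here.

```typescript
// Structural stand-in for a text-bearing cue (assumption: the UA exposes text).
interface SearchableCue {
  startTime: number;
  endTime: number;
  text: string;
}

// Hypothetical helper: return the start times of all cues whose text
// contains the query, ignoring case.
function findCaptionMatches(
  cues: ArrayLike<SearchableCue>,
  query: string
): number[] {
  const needle = query.trim().toLowerCase();
  const starts: number[] = [];
  if (needle === "") return starts; // an empty query matches nothing
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (cue && cue.text.toLowerCase().indexOf(needle) !== -1) {
      starts.push(cue.startTime);
    }
  }
  return starts;
}
```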

Comment 2 Graham 2014-01-28 00:41:53 UTC

(In reply to Silvia Pfeiffer from comment #1)

> 
> What is "optimized native captioning"?
> 

As described in the first paragraph, optimized native captioning relieves the application from having to handle the captions and the user agent from having to convert them, both of which can cost performance and lose the captioner's intent.


> If none of the browser-provided captioning functionality works for you
> (WebVTT, the <track> element, TextTracks, inband tracks, etc) you are always
> free to run your own captioning feature fully through JavaScript.
>

Are you suggesting to run a captioning feature fully through JavaScript without creating an API to control or test this feature? How would an application know that a user agent can support closed captions in this native fashion without this?
 
> 
> Assuming that "display native captioning" means that some other application
> on the platform has caption support, then there is no such thing as "a
> browser displaying native captions" - unless you use the <embed> tag, which
> doesn't make use of <track> and TextTrack features.
> 
> > Also, if the native cues can be passed to application level in 'hidden' mode
> > how can we convey that timing accuracy is not guaranteed but the data can be
> > used by the application for other non-critical timing sensitive uses (like
> > caption content search)? In this case, what type of 'hidden' TextTrackCue
> > would we use..dataCue?
> 
> You can use DataCue to expose any inband timed data, yes. But the browser
> won't magically expose it - you have to get browsers to implement it.

This is being raised from a browser implementer, TV manufacturer and TV standards perspective as opposed to a web application developer perspective. We need a standard way of allowing in-band captions to be handled natively that doesn't conflict with the existing specification and that allows the application to determine that these captions are being handled even though they are not cued.

Comment 3 Silvia Pfeiffer 2014-01-28 00:49:25 UTC

(In reply to Graham from comment #2)
> (In reply to Silvia Pfeiffer from comment #1)
> 
> > 
> > What is "optimized native captioning"?
> > 
> 
> As described in the first paragraph, optimized native captions relieves the
> application from having to handle the captions or the user agent from having
> to convert them, both of which can suffer from performance loss and
> captioner's intent.

So you're just passing the captions through to the browser as parts of the video pixels? That means: the browser doesn't know you have captions and doesn't expose them to JavaScript either.


> > If none of the browser-provided captioning functionality works for you
> > (WebVTT, the <track> element, TextTracks, inband tracks, etc) you are always
> > free to run your own captioning feature fully through JavaScript.
> 
> Are you suggesting to run a captioning feature fully through JavaScript

sure.

> without creating an API to control or test this feature?

You would do everything in JavaScript, i.e. you also control your test suite.

> How would an
> application know that a user agent can support closed captions in this
> native fashion without this?

The User Agent (i.e. the browser) knows nothing about your caption support. It will be all part of your Web application. Thus, if your application (written in JavaScript) supports closed caption rendering, then it will also know that it does so.


> > Assuming that "display native captioning" means that some other application
> > on the platform has caption support, then there is no such thing as "a
> > browser displaying native captions" - unless you use the <embed> tag, which
> > doesn't make use of <track> and TextTrack features.
> > 
> > > Also, if the native cues can be passed to application level in 'hidden' mode
> > > how can we convey that timing accuracy is not guaranteed but the data can be
> > > used by the application for other non-critical timing sensitive uses (like
> > > caption content search)? In this case, what type of 'hidden' TextTrackCue
> > > would we use..dataCue?
> > 
> > You can use DataCue to expose any inband timed data, yes. But the browser
> > won't magically expose it - you have to get browsers to implement it.
> 
> This is being raised from a browser implementer, TV manufacturer and TV
> standards perspective as opposed to a web application developer
> perspective. We need a standard way of allowing in-band captions to be
> handled natively that doesn't conflict with the existing specification and
> allows the application to assess that these captions are being handled even
> though they are not cued.

Why do you need to expose anything to JavaScript then?

Comment 4 Graham 2014-01-28 22:57:21 UTC

(In reply to Silvia Pfeiffer from comment #3)
> (In reply to Graham from comment #2)
> > (In reply to Silvia Pfeiffer from comment #1)
> > 
> > > 
> > > What is "optimized native captioning"?
> > > 
> > 
> > As described in the first paragraph, optimized native captions relieves the
> > application from having to handle the captions or the user agent from having
> > to convert them, both of which can suffer from performance loss and
> > captioner's intent.
> 
> So you're just passing the captions through to the browser as parts of the
> video pixels? That means: the browser doesn't know you have captions and
> doesn't expose them to JavaScript either.

Well, the application needs to know that captions are available in the stream so that it can turn them on, or else resort to an out-of-band method to provide captioning. In addition, the application needs to know when captions are available in the stream but are not presentable as cues, so that it doesn't rely upon 'hidden'.

I will spell it out with an example. An MPEG-2 TS video stream has 708 captions in-band, from which the user agent creates a TextTrack with kind 'captions'. The captions, however, cannot be presented as cues to the user agent, so the application needs to rely upon the native capability and not use mode='hidden'. One way would be a new 'kind' called 'caption-native-only'. Here, setting the mode to 'showing' turns them on and setting the mode to 'hidden' or 'disabled' turns them off.
Another device has native caption support and can provide TextTrack cues transcoded as VTTCue. Setting the mode to 'showing' results in the device deciding whether to use the native stream caption decoder capability or to use the generated VTTCue captions. In 'hidden' mode the VTTCues are still generated for the application to use, either for its own caption rendering library or for other application purposes.
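The two device cases above could be told apart by an application roughly as follows. This is only a sketch of the proposal: the 'caption-native-only' kind is the hypothetical value suggested in this comment and is not defined in any specification, and `ProposedTrackLike`, `CaptionStrategy` and `chooseCaptionStrategy` are names invented for illustration.

```typescript
// What the application can rely on for captioning.
type CaptionStrategy = "native-rendering" | "cue-based" | "out-of-band";

// Structural stand-in for a TextTrack; only the kind is inspected.
interface ProposedTrackLike {
  kind: string;
}

// Hypothetical helper: decide how the application should obtain captions.
// - 'captions': cues are exposed, so 'hidden' mode and custom rendering work.
// - 'caption-native-only' (proposed, not standardized): the track can only be
//   switched on or off; no cues are available to the application.
// - neither: fall back to captions delivered out-of-band.
function chooseCaptionStrategy(
  tracks: ArrayLike<ProposedTrackLike>
): CaptionStrategy {
  let sawNativeOnly = false;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (!track) continue;
    if (track.kind === "captions") return "cue-based";
    if (track.kind === "caption-native-only") sawNativeOnly = true;
  }
  return sawNativeOnly ? "native-rendering" : "out-of-band";
}
```

Preferring the cue-based track when both kinds are present is a design choice of this sketch, not something the proposal specifies.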

Comment 5 Silvia Pfeiffer 2014-02-16 01:22:43 UTC

(In reply to Graham from comment #2)
> (In reply to Silvia Pfeiffer from comment #1)
> 
> > 
> > What is "optimized native captioning"?
> > 
> 
> As described in the first paragraph, optimized native captions relieves the
> application from having to handle the captions or the user agent from having
> to convert them, both of which can suffer from performance loss and
> captioner's intent.


I don't know of any user agent that converts a caption format. Also, I am not aware of any user agents that make use of native caption rendering. Can you give me some concrete examples?


> > If none of the browser-provided captioning functionality works for you
> > (WebVTT, the <track> element, TextTracks, inband tracks, etc) you are always
> > free to run your own captioning feature fully through JavaScript.
> >
> 
> Are you suggesting to run a captioning feature fully through JavaScript
> without creating an API to control or test this feature?

Yes. Many existing video players on the Web provide this, e.g. JWPlayer (http://www.longtailvideo.com/support/jw-player/28845/adding-video-captions/) or Sublime (http://docs.sublimevideo.net/subtitles) or video.js (http://blog.videojs.com/post/35883671470/version-3-2-update) and many proprietary video players.


> How would an
> application know that a user agent can support closed captions in this
> native fashion without this?

What "application" are we talking about? The Web page is the application and thus knows what it is using and displaying and makes use of the JS functionality. Also, this is not "native", but JS provided.


> This is being raised from a browser implementer, TV manufacturer and TV
> standards perspective as opposed to a web application developer
> perspective. We need a standard way of allowing in-band captions to be
> handled natively that doesn't conflict with the existing specification and
> allows the application to assess that these captions are being handled even
> though they are not cued.

Right. The spec is written in such a way that in-band captions are handled exactly the same way as captions provided through the <track> element. There is no conflict. These captions are cued the same way as captions coming through the <track> element.

The browser will parse the media file, extract the captions with their start and end time, add them to a TextTrack object and - if the track is activated by the user - display them.
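To make the cue model concrete, here is a small sketch of how the cues to display at a given playback position follow from their start and end times. It assumes a cue is active from its start time up to, but not including, its end time; `TimedCueLike` and `activeCuesAt` are hypothetical names standing in for the DOM objects, and in a browser the `activeCues` attribute of a TextTrack provides this list directly.

```typescript
// Structural stand-in for a cue extracted from the media file.
interface TimedCueLike {
  startTime: number;
  endTime: number;
  text: string;
}

// Hypothetical helper: return the cues that are active at currentTime,
// i.e. startTime <= currentTime < endTime, in their original order.
function activeCuesAt(
  cues: ArrayLike<TimedCueLike>,
  currentTime: number
): TimedCueLike[] {
  const active: TimedCueLike[] = [];
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (cue && cue.startTime <= currentTime && currentTime < cue.endTime) {
      active.push(cue);
    }
  }
  return active;
}
```

The sketch also shows why a missing end time matters for in-band formats: without it, the second half of the condition cannot be evaluated until the next caption arrives.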

Comment 6 Silvia Pfeiffer 2014-05-19 08:50:34 UTC

Re-reading this request, I think I now understand what you mean.

You would like browsers to be able to take a media file that has a caption track and render the media file with the captions without exposing these captions through a TextTrack API. You'd still like to give JS some control over turning such a caption track on or off.

Here's how you (or rather: such a UA) would do it.

In the HTMLMediaElement, expose two video tracks in the @videoTracks attribute: one with @kind="main" and one with @kind="captions". To the JS dev and also to the Web page user such a video looks like a video with burnt-in captions. Therefore it makes sense to provide that choice in the user interface.

Now the UA in the controls interface, as well as the JS dev in their track handling can build track selection that includes this information. Seeing as only one video track is ever @selected, that should work for you.
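A sketch of what that track selection could look like from JS, assuming the UA exposes the two video tracks as described. `VideoTrackLike` and `selectBurntInCaptions` are hypothetical names; the plain objects stand in for the DOM VideoTrack objects, and support for `videoTracks` varies across user agents.

```typescript
// Structural stand-in for a DOM VideoTrack.
interface VideoTrackLike {
  kind: string;
  selected: boolean;
}

// Hypothetical helper: select the 'captions' video track to turn the natively
// rendered captions on, or the 'main' track to turn them off.
// Returns false (and changes nothing) if the wanted track is not present.
function selectBurntInCaptions(
  tracks: ArrayLike<VideoTrackLike>,
  captionsOn: boolean
): boolean {
  const wantedKind = captionsOn ? "captions" : "main";
  let target: VideoTrackLike | undefined;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track && track.kind === wantedKind) {
      target = track;
      break;
    }
  }
  if (!target) return false;
  // Only one video track is ever selected. A real VideoTrackList enforces this
  // itself; the stand-in objects do not, so the others are deselected here.
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track) track.selected = track === target;
  }
  return true;
}
```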

Is that sufficient?

I could explain this use case in the next in-band track spec [1] if you like.


[1] http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html

Comment 7 Silvia Pfeiffer 2014-06-01 04:08:13 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Rationale:
Your use case was added to http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html , which is in the process of becoming a W3C spec. The HTML spec already has all the necessary functionality to support your use case.