23113 – Deal with mode for TextTrackCues that are not VTTCues

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23113 - Deal with mode for TextTrackCues that are not VTTCues

Summary: Deal with mode for TextTrackCues that are not VTTCues

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC All

Importance:	P2 editorial
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:
Keywords:	a11y

Depends on:	21851
Blocks:

Reported:	2013-08-31 06:41 UTC by Silvia Pfeiffer
Modified:	2015-06-22 05:38 UTC
CC List:	9 users

See Also:

Description Silvia Pfeiffer 2013-08-31 06:41:53 UTC

Reported by Graham Clift in bug 21851#c35:

To address the issue where a TextTrackCue object may or may not be rendered by the user agent, I would like to propose the following:

The definition of the mode attribute on the TextTrack interface is modified such that any attempt to change the mode to 'showing' fails unless a cueFormatHint (as per http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#.283 prop (3)) is present that the User Agent recognizes as indicating a TextTrack with renderable TextTrackCues.

In this way the application can detect whether it should attempt, by means of a JS method, to display CC for text tracks unrecognized by the user agent.
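
A minimal sketch of how an application might use the proposed behaviour. It assumes, as the proposal suggests and not as any specification requires, that assigning 'showing' does not take effect when the UA cannot render the track. `TrackLike`, `currentMode`, `enableCaptions` and the `renderWithScript` callback are hypothetical names used only for illustration.

```typescript
// Minimal stand-in for the part of TextTrack this sketch needs.
type TrackMode = "disabled" | "hidden" | "showing";

interface TrackLike {
  mode: TrackMode;
}

// Read the mode back through a helper so the value actually stored by
// the track is observed, not the value that was just assigned.
function currentMode(track: TrackLike): TrackMode {
  return track.mode;
}

// Ask the UA to show the track, then check whether the request stuck.
// ASSUMPTION: under the proposal, the assignment is ignored when the UA
// cannot render the cues. This is not specified behaviour.
function enableCaptions(
  track: TrackLike,
  renderWithScript: (track: TrackLike) => void
): "native" | "script" {
  track.mode = "showing";
  if (currentMode(track) === "showing") {
    return "native";
  }
  // Keep the cues active for script without UA rendering, then let the
  // application's own (hypothetical) renderer take over.
  track.mode = "hidden";
  renderWithScript(track);
  return "script";
}
```

With this shape the application only loads or starts its own caption renderer when the return value is "script".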

Comment 1 Philip Jägenstedt 2013-09-04 14:00:58 UTC

Isn't inBandMetadataTrackDispatchType/cueFormatHint specifically a signal for JavaScript rendering, something that the UA will expose but ignore itself?

In any case, would it not also solve the problem to allow the spec that defines the interface (like VTTCue) also define whether or not setting mode='showing' is possible?

Comment 2 Graham 2013-09-11 18:27:25 UTC

I think the mode='showing' approach is really a band-aid that helps but doesn't solve all the problems. So maybe the starting point here is to define some of the problems that I believe we need to solve:

Problem1: App needs to know whether the UA is already showing these captions by itself. Currently the app can set mode='showing' and not know the result in the UA.

Problem2: App needs to know whether the UA is capable of showing these captions if requested to turn them on. There is no way to interrogate the UA capabilities in order to, say, allow for loading a captioning display library in advance.

Problem3: App needs to know whether the platform supports native captioning in one format, and whether it is capable of converting, and actually does convert, these captions to another format when presenting them to JS. For example: A UA might be able to leverage the platform's native display of CEA-708 captions when the video is presented fullscreen but not in partial screen mode. Support for captioning in the partial screen mode might be done by converting captions to WebVTT or TTML cues in order to make use of the UA's native captioning ability.

Comment 3 Philip Jägenstedt 2013-09-12 08:29:09 UTC

(In reply to Graham from comment #2)
> I think the mode='showing' approach is really a band-aid that helps but
> doesn't solve all the problems. So maybe the starting point here is to
> define some of the problems that I believe we need to solve:
> 
> Problem1: App needs to know whether the UA is already showing these captions
> by itself. Currently the app can set mode='showing' and not know the result
> in the UA.
> 
> Problem2: App needs to know whether the UA is capable of showing these
> captions if requested to turn them on. There is no way to interrogate the UA
> capabilities in order to, say, allow for loading a captioning display library
> in advance. 

I don't think either of these problems will appear in current implementations, since everyone who exposes VTTCue will also render it when mode=='showing'. If we just avoid adding interfaces which some browsers will render and some won't, then just looking at the interface will be enough.

> Problem3: App needs to know whether the platform supports native captioning
> in one format, and whether it is capable of converting, and actually does
> convert, these captions to another format when presenting them to JS. For
> example: A UA might be
> able to leverage the platform's native display of CEA-708 captions when the
> video is presented fullscreen but not in partial screen mode. Support for
> captioning in the partial screen mode might be done by converting captions
> to WebVTT or TTML cues in order to make use of the UA's native captioning
> ability.

The fullscreen API just makes a website occupy the full screen but is still just rendering as usual, so it seems unlikely that the ability to render captions would be affected by it. As for querying which kinds of captions are supported, just looking for the existence of the corresponding cue interface seems sufficient, so e.g. if window.TTMLCue exists then the browser knows how to parse and render TTML.
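
The existence check described here could be sketched as follows. This is only a sketch: `hasCueInterface` and `formatsNeedingScriptRenderer` are hypothetical helper names, `TTMLCue` is the hypothetical interface from the example in this comment, and the scope object is passed in explicitly so the helpers make no assumption about the environment (in a browser it would be `window`).

```typescript
// True when the given scope exposes a constructor under `name`.
function hasCueInterface(
  scope: Record<string, unknown>,
  name: string
): boolean {
  return typeof scope[name] === "function";
}

// List the caption formats for which the page would have to load its
// own renderer, i.e. those whose cue interface is not exposed.
// The mapping from format to interface name is supplied by the caller.
function formatsNeedingScriptRenderer(
  scope: Record<string, unknown>,
  wanted: { format: string; cueInterface: string }[]
): string[] {
  return wanted
    .filter((w) => !hasCueInterface(scope, w.cueInterface))
    .map((w) => w.format);
}
```

A page could call `formatsNeedingScriptRenderer` once at startup and load a captioning library only for the formats it returns.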

Comment 4 Graham 2013-09-12 16:12:46 UTC

(In reply to Philip Jägenstedt from comment #3)
> (In reply to Graham from comment #2)
> > I think the mode='showing' approach is really a band-aid that helps but
> > doesn't solve all the problems. So maybe the starting point here is to
> > define some of the problems that I believe we need to solve:
> > 
> > Problem1: App needs to know whether the UA is already showing these captions
> > by itself. Currently the app can set mode='showing' and not know the result
> > in the UA.
> > 
> > Problem2: App needs to know whether the UA is capable of showing these
> > captions if requested to turn them on. There is no way to interrogate the UA
> > capabilities in order to, say, allow for loading a captioning display library
> > in advance. 
> 
> I don't think either of these problems will appear in current
> implementations, since everyone who exposes VTTCue will also render it when
> mode=='showing'. If we just avoid adding interfaces which some browsers will
> render and some won't, then just looking at the interface will be enough.
>
So are you suggesting then that if the UA cannot render the inband captions then it should not expose them to the application? That would prevent the scenario where a JavaScript library handles the raw caption data when a UA cannot handle the format. 
 
> > Problem3: App needs to know whether the platform supports native captioning
> > in one format, and whether it is capable of converting, and actually does
> > convert, these captions to another format when presenting them to JS. For
> > example: A UA might be
> > able to leverage the platform's native display of CEA-708 captions when the
> > video is presented fullscreen but not in partial screen mode. Support for
> > captioning in the partial screen mode might be done by converting captions
> > to WebVTT or TTML cues in order to make use of the UA's native captioning
> > ability.
> 
> The fullscreen API just makes a website occupy the full screen but is still
> just rendering as usual, so it seems unlikely that the ability to render
> captions would be affected by it. As for querying which kinds of captions
> are supported, just looking for the existence of its cue interface seems
> sufficient, so e.g. if window.TTMLCue exists then the browser knows how to
> parse and render TTML.
(Just to be clear, I gave the fullscreen API only as an example of the problem.) In my experience, some UAs do treat fullscreen video rendering specially, to take advantage of the device's higher-performing native video pipeline. In these cases higher-performing native captioning can be leveraged as well.

Comment 5 Philip Jägenstedt 2013-09-13 07:07:03 UTC

(In reply to Graham from comment #4)
> (In reply to Philip Jägenstedt from comment #3)
> > (In reply to Graham from comment #2)
> > > I think the mode='showing' approach is really a band-aid that helps but
> > > doesn't solve all the problems. So maybe the starting point here is to
> > > define some of the problems that I believe we need to solve:
> > > 
> > > Problem1: App needs to know whether the UA is already showing these captions
> > > by itself. Currently the app can set mode='showing' and not know the result
> > > in the UA.
> > > 
> > > Problem2: App needs to know whether the UA is capable of showing these
> > > captions if requested to turn them on. There is no way to interrogate the UA
> > > capabilities in order to, say, allow for loading a captioning display library
> > > in advance. 
> > 
> > I don't think either of these problems will appear in current
> > implementations, since everyone who exposes VTTCue will also render it when
> > mode=='showing'. If we just avoid adding interfaces which some browsers will
> > render and some won't, then just looking at the interface will be enough.
> >
> So are you suggesting then that if the UA cannot render the inband captions
> then it should not expose them to the application? That would prevent the
> scenario where a JavaScript library handles the raw caption data when a UA
> cannot handle the format. 

Yes, that is indeed my position. We've been discussing this a lot in a big thread on the public-html list recently. In <http://lists.w3.org/Archives/Public/public-html/2013Sep/0004.html> I explain why I think it's a bad idea to expose cues which can in principle be rendered without actually implementing the rendering.

Note that this is not to say that *metadata* cues which are by design application-specific shouldn't be exposed, they clearly should be. It's only things like the SSA in Matroska example that I think are a bad idea.

Comment 6 Silvia Pfeiffer 2013-09-16 03:45:56 UTC

(In reply to Philip Jägenstedt from comment #5)
> In
> <http://lists.w3.org/Archives/Public/public-html/2013Sep/0004.html> I
> explain why I think it's a bad idea to expose cues which can in principle be
> rendered without actually implementing the rendering.
> 
> Note that this is not to say that *metadata* cues which are by design
> application-specific shouldn't be exposed, they clearly should be. It's only
> things like the SSA in Matroska example that I think are a bad idea.

AFAICT your only reason to object is that script may rely on rendering SSA by itself even after the browser might have already implemented an interface to render it. I think that's not a TextTrackCue specific problem, but one that applies to all new features of browsers.

If the browser exposes such cues as UnparsedCues initially, JS will implement the rendering. Later, the browser supports SSACues and will not expose UnparsedCues any longer, so JS will not kick in. So, this situation doesn't actually create a problem.

Comment 7 Silvia Pfeiffer 2013-09-16 04:11:07 UTC

(In reply to Philip Jägenstedt from comment #1)
> Isn't inBandMetadataTrackDispatchType/cueFormatHint specifically a signal
> for JavaScript rendering, something that the UA will expose but ignore
> itself?

@kind=metadata and @inBandMetadataTrackDispatchType both indicate that there is no native browser rendering.

We still have to resolve whether @inBandMetadataTrackDispatchType is allowed to be combined with other @kind values. Eric's suggestion was not to - and I tend to agree.


> In any case, would it not also solve the problem to allow the spec that
> defines the interface (like VTTCue) also define whether or not setting
> mode='showing' is possible?

That doesn't work: "showing" doesn't mean it's displaying something - it actually means that the track is active and potentially showing something [1]. So, making a metadata track "showing" just means to make the TextTrackCueList available.

[1] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#dom-texttrack-mode



However, we should probably explain in [1] that even for tracks with @mode="showing", it's possible that cues are being displayed, but not listed in the live TextTrackCueList, as explained by Eric [2].

[1] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#dom-texttrack-mode
[2] http://lists.w3.org/Archives/Public/public-html/2013Sep/0014.html

Comment 8 Philip Jägenstedt 2013-09-16 09:10:16 UTC

(In reply to Silvia Pfeiffer from comment #6)
> (In reply to Philip Jägenstedt from comment #5)
> > In
> > <http://lists.w3.org/Archives/Public/public-html/2013Sep/0004.html> I
> > explain why I think it's a bad idea to expose cues which can in principle be
> > rendered without actually implementing the rendering.
> > 
> > Note that this is not to say that *metadata* cues which are by design
> > application-specific shouldn't be exposed, they clearly should be. It's only
> > things like the SSA in Matroska example that I think are a bad idea.
> 
> AFAICT your only reason to object is that script may rely on rendering SSA by
> itself even after the browser might have already implemented an interface to
> render it. I think that's not a TextTrackCue specific problem, but one that
> applies to all new features of browsers.
> 
> If the browser exposes such cues as UnparsedCues initially, JS will
> implement the rendering. Later, the browser supports SSACues and will not
> expose UnparsedCues any longer, so JS will not kick in. So, this situation
> doesn't actually create a problem.

You're right, I didn't consider the implications of the interface changing. That makes it possible for a script to be pedantic and check the interface before handling a cue. A less pedantic script would still either end up with double rendering (if both interfaces have a .text property) or rendering "undefined" or similar (if they don't share the property name).

Comment 9 Silvia Pfeiffer 2013-09-16 11:35:45 UTC

(In reply to Philip Jägenstedt from comment #8)
> 
> You're right, I didn't consider the implications of the interface changing.
> That makes it possible for a script to be pedantic and check the interface
> before handling a cue. A less pedantic script would still either end up with
> double rendering (if both interfaces have a .text property) or rendering
> "undefined" or similar (if they don't share the property name).

We should encourage JS devs to check the interface that in-band text track cues return. Maybe we can add an example; that usually has a good influence on best practice.
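
A minimal sketch of such an interface check. The `UnparsedCue` class below is a hypothetical stand-in for the interface discussed in this thread (no browser is known to expose it), and `classifyCue` is likewise an illustrative name, not part of any specification.

```typescript
// HYPOTHETICAL stand-in for the "UnparsedCue" interface discussed in
// this bug; defined locally so the sketch is self-contained.
class UnparsedCue {
  startTime: number;
  endTime: number;
  data: string;

  constructor(startTime: number, endTime: number, data: string) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.data = data;
  }
}

type Handling = "render-in-script" | "leave-to-ua";

// Decide per cue by checking its interface, instead of probing for a
// `.text` property that several cue interfaces might share.
function classifyCue(cue: object): Handling {
  return cue instanceof UnparsedCue ? "render-in-script" : "leave-to-ua";
}
```

Once the browser starts exposing a natively rendered interface in place of the unparsed one, the `instanceof` check stops matching and the script leaves rendering to the UA, which avoids the double rendering described in comment 8.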

Comment 10 Graham 2013-09-16 15:38:02 UTC

Assuming that we use this approach of hiding cues from JS when @mode=showing, can you explain what method can be employed to rely on the UA to render the captions yet have the application use the captions for other purposes, like keyword logging for interactive use cases or parental controls? For example: inappropriate keyword replacement, counting use of a particular word in a speech, interactive ad triggers, and so on.
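
To make the word-counting use case concrete, here is a minimal sketch. It assumes the cues are exposed to script as objects with a `text` property, as VTTCue does; `TextCueLike` and `countKeyword` are hypothetical names, and the word splitting only handles ASCII letters, digits and apostrophes.

```typescript
// The only part of a cue this sketch relies on.
interface TextCueLike {
  text: string;
}

// Count whole-word, case-insensitive occurrences of `keyword` across
// the text of the given cues.
function countKeyword(cues: ArrayLike<TextCueLike>, keyword: string): number {
  const needle = keyword.toLowerCase();
  let count = 0;
  for (let i = 0; i < cues.length; i++) {
    // Split on anything that is not an ASCII letter, digit or apostrophe.
    const words = cues[i].text.toLowerCase().split(/[^a-z0-9']+/);
    for (let j = 0; j < words.length; j++) {
      if (words[j] === needle) {
        count += 1;
      }
    }
  }
  return count;
}
```

In a page, the cue list passed in would typically be a track's cue list; that only works if the UA keeps exposing the cues to script while it renders them, which is exactly the concern raised here.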

Comment 11 Silvia Pfeiffer 2013-09-17 04:04:30 UTC

(In reply to Graham from comment #10)
> Assuming that we use this approach of hiding cues from JS when @mode=showing
> can you explain what method can be employed to rely on UA to render the captions
> yet have the application use the captions for other purposes like keyword
> logging for interactive use cases or parental controls? For example
> inappropriate keyword replacement, counting use of a particular word in a
> speech, interactive ad triggers and so on?

I assume you are referring to the situation that Eric explained where cues are not exposed to JS? That should be an exception and is not the case with typical browsers IIUC. It only happens on versions of the OS where the system frameworks do not have the necessary API to override cue rendering. It's definitely outside the reach of HTML.

Comment 12 steve faulkner 2015-06-12 14:44:58 UTC

Silvia, has this been resolved?

Comment 13 Silvia Pfeiffer 2015-06-12 22:17:41 UTC

The current situation, from what I can tell, is that the only way in which cues are currently exposed in browsers is through VTTCue, no matter whether they come from inband tracks or from a <track> element. Browsers that expose inband captions right now fill a VTTCue object to do so. Thus, this bug is moot, since other types of cues cannot be created or shown.

I would recommend closing this bug.

Comment 14 Michael[tm] Smith 2015-06-22 05:38:14 UTC

Closing per comment 13