21851 – revert removal of text and getCueAsHTML members; address constructor changes

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	CLOSED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	HTML WG Bugzilla archive list

Keywords:	CR

Duplicates (2):	21080 21627
Depends on:	22903
Blocks:	23113

Reported:	2013-04-26 14:30 UTC by Glenn Adams
Modified:	2013-09-26 04:28 UTC
CC List:	13 users

Description Glenn Adams 2013-04-26 14:30:54 UTC

In [1], the text and getCueAsHTML members were removed from TextTrackCue. In addition, the previously defined constructor was removed.

The effect of these changes is that:

(1) it is no longer possible to create or add a text track cue to a text track by programmatic means (JS) when not using VTT;

(2) it is no longer possible to access a text form of the cue by programmatic means (JS) when not using VTT;

(3) it is no longer possible to access an HTML form of the cue by programmatic means (JS) when not using VTT;

The first of these points is related to bug 21080, where it was pointed out that, as previously defined, the TextTrackCue made incorrect assumptions about use of VTT, and needed to be modified to be more generically specified. An alternative was also offered there of adding a new operation as follows (to TextTrack), effectively:

partial interface TextTrack {
  TextTrackCue createCue(float startTime, float endTime, DOMString text);
};

Instead of adopting this proposal, the editor has chosen to remove the TextTrackCue, thus removing entirely the ability to create a TextTrackCue in the non-VTT scenarios.
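
As a rough illustration of the createCue() proposal above, the sketch below models a track that hands out its own cue objects, so a script never has to name a format-specific constructor. MockTextTrack and MockTextTrackCue are hypothetical stand-ins written for this example; they are not browser interfaces, and the real proposal did not specify any of this behaviour.

```javascript
// Hypothetical stand-ins for the browser interfaces; nothing here is a real UA API.
class MockTextTrackCue {
  constructor(startTime, endTime, text) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }
}

class MockTextTrack {
  constructor() {
    this.cues = [];
  }

  // Factory in the spirit of the bug 21080 proposal: the track, not the
  // calling script, decides which concrete cue type to create.
  createCue(startTime, endTime, text) {
    return new MockTextTrackCue(startTime, endTime, text);
  }

  addCue(cue) {
    this.cues.push(cue);
  }
}

const track = new MockTextTrack();
const cue = track.createCue(0.0, 2.5, "Hello");
track.addCue(cue);
```

The point of the factory shape is only that the caller stays format-agnostic; whether that is desirable is exactly what the rest of this bug debates.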

At present, the consumer electronics industry, in concert with content service providers, has already made use of, and expects to continue to make use of, the text member of TextTrackCue for accessing non-VTT track information in the form of text and/or HTML document fragments. For example, see [2]. It is further expected that getCueAsHTML will be used in non-VTT cases as well; since its definition in HTML 5.0 is sufficiently generic to support this use case, it should be retained along with the text member.

Accordingly, the change to remove these members should be reverted, and, for the constructor, either a more generic definition of the constructor should be provided (that is not dependent on VTT) or the proposal in bug 21080 to add a createCue member should be adopted.

It should also be noted that the already effected changes [1] effectively create a backwards incompatible substantive change between HTML 5.0 and HTML 5.1 without any discussion, resolution, or plan for this change.

[1] https://github.com/w3c/html/commit/586ae3996fdce5d9f71cbe57a08759fce7b26d8f
[2] http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf

Comment 1 Glenn Adams 2013-04-26 16:47:34 UTC

(In reply to comment #0)
> Instead of adopting this proposal, the editor has chosen to remove the
> TextTrackCue, ...

s/TextTrackCue/TextTrackCue constructor/

Comment 2 Ian 'Hixie' Hickson 2013-04-26 23:48:18 UTC

These members don't make sense for all formats.

We could have a common interface that inherits from TextTrackCue and is inherited by WebVTT, though, if it's so common that duplicating the two members in multiple interfaces is considered bad. But that seems like overkill.

(In reply to comment #0)
> 
> (1) it is no longer possible to create or add a text track cue to a text
> track by programmatic means (JS) when not using VTT;

That's not true; you can use whatever API the format provides. If your UA supports DVD cues, for example, there could be a .setImage() method that takes a BitmapImage. It's up to the cue's spec, just like WebVTT exposes the attributes mentioned in this bug.


> (2) it is no longer possible to access a text form of the cue by
> programmatic means (JS) when not using VTT;
> (3) it is no longer possible to access an HTML form of the cue by
> programmatic means (JS) when not using VTT;

Again, for formats where doing so makes sense, the format's API can just expose these attributes and methods.

Comment 3 Glenn Adams 2013-04-27 03:41:31 UTC

(In reply to comment #2)
> These members don't make sense for all formats.
> 
> We could have a common interface that inherits from TextTrackCue and is
> inherited by WebVTT, though, if it's so common that duplicating the two
> members in multiple interfaces is considered bad. But that seems like
> overkill.
> 
> (In reply to comment #0)
> > 
> > (1) it is no longer possible to create or add a text track cue to a text
> > track by programmatic means (JS) when not using VTT;
> 
> That's not true; you can use whatever API the format provides. If your UA
> supports DVD cues, for example, there could be a .setImage() method that
> takes a BitmapImage. It's up to the cue's spec, just like WebVTT exposes the
> attributes mentioned in this bug.

As presently defined in HTML 5.0, it is not necessary for an application to know what interface is implemented by a cue object beyond TextTrackCue if its only goal is to obtain a text or an HTML representation of the cue.

While it is true that the application would need to either (1) have a priori knowledge about what type of content is delivered with a given text track or (2) sniff the content based on what (if anything) is returned when evaluating the text IDL attribute, as presently defined, it can at least assume that a string or null is returned, though the current 5.0 prose doesn't really provide sufficient detail about the return value when the user agent doesn't know how to constitute a text value.

The present language, in HTML 5.0 section 4.8.10.12.5 states:

"The text attribute, on getting, must return the raw text track cue text of the text track cue that the TextTrackCue object represents. On setting, the text track cue text must be set to the new value.

The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules. [WEBVTT]"

This prose is sufficiently general to support the continued presence of these two members regardless of the text track format.

I would point out that:

(1) a TextTrack, by mere use of the qualifier "Text", leads one to believe that a text representation for its content is reasonable, or at least possible;

(2) in the case of text tracks that refer to binary data, it is straightforward to define that the text IDL attribute evaluates to null or to a text representation of the binary data, e.g., a BASE64 encoding; this has indeed been done with a CableLabs specification [1] that provides access to MPEG-2 program specific information (PSI) via TextTrack and TextTrackCue;

[1] http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf

(3) in the case of caption formats that use image data, such as DVD, it is straightforward to define that text returns null or returns an alternate text or description text if available; it could also return a URL to a built-in protocol handler that permits dereferencing the image from an image cache;

The point is, there is always a reasonable defined value for the text attribute: either null or some string of useful value.
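
The three cases above can be sketched as a base class whose text defaults to null, with format-specific subclasses overriding it. All class names here are hypothetical and exist only for this illustration; the Base64 step uses Node.js's Buffer, which is an assumption about the runtime rather than anything a UA would expose.

```javascript
// Hypothetical stand-ins; these are not real browser interfaces.
class MockCue {
  // Base default: no text representation is available.
  get text() {
    return null;
  }
}

// Binary payload (e.g. MPEG-2 PSI) exposed as Base64, as in point (2).
class MockBinaryCue extends MockCue {
  constructor(bytes) {
    super();
    this.bytes = Uint8Array.from(bytes);
  }
  get text() {
    // Buffer is Node.js-specific; a browser would need another encoder.
    return Buffer.from(this.bytes).toString("base64");
  }
}

// Image-based caption exposing alternate text when present, as in point (3).
class MockImageCue extends MockCue {
  constructor(altText = null) {
    super();
    this.altText = altText;
  }
  get text() {
    return this.altText;
  }
}

const cues = [
  new MockCue(),
  new MockBinaryCue([77, 97, 110]),
  new MockImageCue("[music]"),
];
const texts = cues.map((c) => c.text);
```

Every entry in texts is either null or a string, which is the invariant being argued for; the sketch says nothing about whether a caller can do anything useful with the string without knowing the format.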

> 
> 
> > (2) it is no longer possible to access a text form of the cue by
> > programmatic means (JS) when not using VTT;
> > (3) it is no longer possible to access an HTML form of the cue by
> > programmatic means (JS) when not using VTT;
> 
> Again, for formats where doing so makes sense, the format's API can just
> expose these attributes and methods.

The problem with the approach entailed by moving these two members out of TextTrackCue is that it doesn't recognize that the primary intent of these interfaces is to expose text and that all referenced text track formats can define a reasonable text expression (even if null in some cases where no text representation whatsoever makes sense).

The same can be said for the getCueAsHTML member. Even DVD sub-picture images could be represented in HTML by merely returning an img element that employs a UA specific internal URL value that the UA knows how to dereference (since it created it).

Overall, I'd say the proposed change to remove these members is a case of allowing the tail to wag the dog.

Comment 4 Ian 'Hixie' Hickson 2013-04-30 05:14:54 UTC

If you don't know what kind of cue it is, and the cue might return either WebVTT cue text (marked up), base64 data, some random other format, or kittens know what else, then honestly the feature is pointless. I really don't see any value in exposing that, especially given that there's no guarantee anything can be exposed at all. It just doesn't make any sense. What's the use case? If it's just debugging, then just expose the whole object and all its attributes, like console.log does for anything else.

The term "TextTrack" is merely meant to distinguish it from AudioTracks or VideoTracks. There's nothing about TextTracks that forces them to be text; they could equally be named "TimedTracks" (indeed at one point they were, we changed it for unrelated reasons).

It makes no sense to force every cue format interface to find a string representation. There's no use case for it, there's no need for it (every format can offer its own API), there's no benefit to it. All it does is constrain future format developers.

It is no more true that all timed text track formats can expose a string for their cues than it is true that all timed text track formats can expose an X coordinate or a Y coordinate for their cues.

I really do not understand why you want this.

Comment 5 Glenn Adams 2013-04-30 06:23:47 UTC

(In reply to comment #4)
> If you don't know what kind of cue it is, and the cue might return either
> WebVTT cue text (marked up), base64 data, some random other format, or
> kittens know what else, then honestly the feature is pointless.

In some cases, the page's author knows the text track content type, because the author chooses what sources to use, and may (but need not) know the text track.

Frankly, I am not satisfied with this state of affairs; however, you and others argued against providing UA hints to the client JS in the case this isn't true when [1] was closed without really solving the problem.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13359

> I really
> don't see any value in exposing that, especially given that there's no
> guarantee anything can be exposed at all. It just doesn't make any sense.
> What's the use case? If it's just debugging, then just expose the whole
> object and all its attributes, like console.log does for anything else.
> 
> The term "TextTrack" is merely meant to distinguish it from AudioTracks or
> VideoTracks. There's nothing about TextTracks that forces them to be text;
> they could equally be named "TimedTracks" (indeed at one point they were, we
> changed it for unrelated reasons).
> 
> It makes no sense to force every cue format interface to find a string
> representation. There's no use case for it, there's no need for it (every
> format can offer its own API), there's no benefit to it. All it does is
> constrain future format developers.
> 
> It is no more true that all timed text track formats can expose a string for
> their cues than it is true that all timed text track formats can expose an X
> coordinate or a Y coordinate for their cues.
> 
> I really do not understand why you want this.

Very simple: (1) it's there now (on TextTrackCue in 5.0), (2) it's implemented (in a variety of UAs), (3) it's being used, and (4) the proposed change to remove it doesn't solve any problem but creates new problems.

As presently defined, an author can use knowledge about possible track sources or content and can use this in combination with what is returned from text to guess the cue type. It would be better if the UA used knowledge from its decoders to provide a more concrete hint of the track type. But even that wouldn't eliminate the widespread utility of having a text attribute in the base cue class.

Comment 6 Glenn Adams 2013-04-30 06:40:33 UTC

(In reply to comment #5)
> (In reply to comment #4)
> > If you don't know what kind of cue it is, and the cue might return either
> > WebVTT cue text (marked up), base64 data, some random other format, or
> > kittens know what else, then honestly the feature is pointless.
> 
> In some cases, the page's author knows the text track content type, because
> the author chooses what sources to use, and may (but need not) know the text
> track.
> 
> Frankly, I am not satisfied with this state of affairs; however, you and
> others argued against providing UA hints to the client JS in the case this
> isn't true when [1] was closed without really solving the problem.
> 
> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13359
> 
> > I really
> > don't see any value in exposing that, especially given that there's no
> > guarantee anything can be exposed at all. It just doesn't make any sense.
> > What's the use case? If it's just debugging, then just expose the whole
> > object and all its attributes, like console.log does for anything else.
> > 
> > The term "TextTrack" is merely meant to distinguish it from AudioTracks or
> > VideoTracks. There's nothing about TextTracks that forces them to be text;
> > they could equally be named "TimedTracks" (indeed at one point they were, we
> > changed it for unrelated reasons).
> > 
> > It makes no sense to force every cue format interface to find a string
> > representation. There's no use case for it, there's no need for it (every
> > format can offer its own API), there's no benefit to it. All it does is
> > constrain future format developers.
> > 
> > It is no more true that all timed text track formats can expose a string for
> > their cues than it is true that all timed text track formats can expose an X
> > coordinate or a Y coordinate for their cues.
> > 
> > I really do not understand why you want this.
> 
> Very simple: (1) it's there now (on TextTrackCue in 5.0), (2) it's
> implemented (in a variety of UAs), (3) it's being used, and (4) the proposed
> change to remove it doesn't solve any problem but creates new problems.
> 
> As presently defined, an author can use knowledge about possible track
> sources or content and can use this in combination with what is returned
> from text to guess the cue type. It would be better if the UA used knowledge
> from its decoders to provide a more concrete hint of the track type. But
> even that wouldn't eliminate the widespread utility of having a text
> attribute in the base cue class.

I guess I should also note that removing these members from 5.1 creates uncertainty about the status of these features in 5.0. Will they be removed from 5.0 before it goes to REC? If so, then when will 5.1 become a REC? All of this produces delay in testing, patent protection, device certification, and ultimately deployment.

In exchange for this uncertainty, you are saying that there are *some* cases where the value of a text attribute doesn't make sense. The trade doesn't seem to be worth it.

Comment 7 Ian 'Hixie' Hickson 2013-04-30 17:50:22 UTC

(In reply to comment #5)
> 
> In some cases, the page's author knows the text track content type, because
> the author chooses what sources to use, and may (but need not) know the text
> track.
> 
> Frankly, I am not satisfied with this state of affairs; however, you and
> others argued against providing UA hints to the client JS in the case this
> isn't true when [1] was closed without really solving the problem.
> 
> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13359

I don't understand the relevance of this to this bug.


> Very simple: (1) it's there now (on TextTrackCue in 5.0)

There was a lot more on TextTrackCue before, not just these two members. Why are these two members special?


> (2) it's implemented (in a variety of UAs)
> (3) it's being used

The model of every format's interface having these members is completely backwards compatible with what is implemented and used, as far as I can tell, without making future interfaces meaningless and confusing.


> and (4) the proposed
> change to remove it doesn't solve any problem but creates new problems.

It solves the problem of the base interface having attributes that don't make sense for every format.

What problems does it create?


> As presently defined, an author can use knowledge about possible track
> sources or content and can use this in combination with what is returned
> from text to guess the cue type.

You don't need to guess the cue type. Just look at what interface it implements.


> It would be better if the UA used knowledge
> from its decoders to provide a more concrete hint of the track type. But
> even that wouldn't eliminate the widespread utility of having a text
> attribute in the base cue class.

I have no idea what this means.

Comment 8 Glenn Adams 2013-04-30 18:54:05 UTC

(In reply to comment #7)
> (In reply to comment #5)
> > 
> > In some cases, the page's author knows the text track content type, because
> > the author chooses what sources to use, and may (but need not) know the text
> > track.
> > 
> > Frankly, I am not satisfied with this state of affairs; however, you and
> > others argued against providing UA hints to the client JS in the case this
> > isn't true when [1] was closed without really solving the problem.
> > 
> > [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13359
> 
> I don't understand the relevance of this to this bug.

You left out the relevant context from comment #4:

> If you don't know what kind of cue it is, and the cue might return either
> WebVTT cue text (marked up), base64 data, some random other format, or
> kittens know what else, then honestly the feature is pointless.

You noted that the client JS may not "know what kind of cue it is". I'm reminding you that it was requested to have the UA provide type information to the client JS, but that was never implemented in the spec.

> 
> 
> > Very simple: (1) it's there now (on TextTrackCue in 5.0)
> 
> There was a lot more on TextTrackCue before, not just these two members. Why
> are these two members special?

The other members that have been moved have not been written into external specs and used by content authors, at least in the domain I'm dealing with.

> 
> 
> > (2) it's implemented (in a variety of UAs)
> > (3) it's being used
> 
> The model of every format's interface having these members is completely
> backwards compatible with what is implemented and used, as far as I can
> tell, without making future interfaces meaningless and confusing.

It is also reasonable to leave them on the base class and have the base class return a nonce result in the absence of an override.

> 
> 
> > and (4) the proposed
> > change to remove it doesn't solve any problem but creates new problems.
> 
> It solves the problem of the base interface having attributes that don't
> make sense for every format.

There is no design principle that requires this. Does Node.nodeValue make sense in every Node type? Why not remove it also? Node.nodeName? CustomEvent.detail?

> 
> What problems does it create?

Please read my previous responses. I would just be repeating myself to respond again.

> 
> 
> > As presently defined, an author can use knowledge about possible track
> > sources or content and can use this in combination with what is returned
> > from text to guess the cue type.
> 
> You don't need to guess the cue type. Just look at what interface it
> implements.

The presence of a text attribute doesn't tell you anything about the syntax or semantics of its values. Only when you know the text track format type can this be reliably deduced. In the absence of such information from the demux/decoder, only a priori knowledge of the source can be used if that info is available. Otherwise one has to guess.

> 
> 
> > It would be better if the UA used knowledge
> > from its decoders to provide a more concrete hint of the track type. But
> > even that wouldn't eliminate the widespread utility of having a text
> > attribute in the base cue class.
> 
> I have no idea what this means.

It means I can write:

if (cue.type == "text/x-my-metadata")
  useTextAccordingToMyMetadataFormat(cue.text);
else if (cue.type == "text/x-your-metadata")
  useTextAccordingToYourMetadataFormat(cue.text);

The demux/decoder can or often will determine the text track format, and can provide this as a hint to client JS.

Comment 9 Silvia Pfeiffer 2013-05-06 07:32:39 UTC

(In reply to comment #8)
> > 
> > > Very simple: (1) it's there now (on TextTrackCue in 5.0)
> > 
> > There was a lot more on TextTrackCue before, not just these two members. Why
> > are these two members special?
> 
> The other members that have been moved have not been written into external
> specs and used by content authors, at least in the domain I'm dealing with.

The relevant external spec is:
http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf

In view of this specification, I suggest adding .text back into TextTrackCue as a generic means of accessing the unchanged content of a cue. It might make sense to rename that to .content to be more generic for binary and other types of timed tracks.

I'd also suggest to the authors of that specification to return a more specific WebVTTCue object or a TTMLCue object or CEA708Cue object which has all the API of the TextTrack object plus a getCueAsHTML() and an algorithm to convert the .content to HTML (and any other custom API that these formats would require). Without the conversion algorithm, getCueAsHTML() is pretty useless anyway.


> > > (2) it's implemented (in a variety of UAs)
> > > (3) it's being used
> > 
> > The model of every format's interface having these members is completely
> > backwards compatible with what is implemented and used, as far as I can
> > tell, without making future interfaces meaningless and confusing.
> 
> It is also reasonable to leave them on the base class and have the base
> class return a nonce result in the absence of an override.

I can see a problem with having getCueAsHTML() on the "base class". It requires applying a conversion algorithm to the original cue content to turn it into HTML. However, the UA only knows which conversion algorithm to apply when it is told what type the cue is, and only if it has a conversion algorithm for that type available. In the general case - in particular for @kind=metadata tracks - such a conversion algorithm is not available.

For example, assume that the UA is given a WebVTT file for a <track> of @kind=metadata whose cues consist of image data URLs. A getCueAsHTML() function is pretty useless on such a track, and as a JS developer, I would know that. So I would not want to interpret the cues of that track as WebVTTCue objects, but instead just use TextTrackCue.text to access the content, and then interpret the content as a data URL that I hand to <img @src> elements.

We could eventually standardise a specific BinaryWebVTTCue API, inheriting from TextTrackCue, that adds a getCueAsDataURL() or getCueAsArrayBuffer() function (similar to the FileReader API, http://www.w3.org/TR/FileAPI/#dfn-filereader). getCueAsHTML() certainly makes no sense in such an object (though .content would).
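
A minimal sketch of what such a binary cue might look like, assuming the cue's .content holds a Base64 payload and that a MIME type is known from elsewhere. MockTextTrackCue, MockBinaryCue and the mimeType field are hypothetical; only the two method names come from the suggestion above, and the decoding relies on Node.js's Buffer.

```javascript
// Hypothetical classes; a binary cue interface was only floated as an idea.
class MockTextTrackCue {
  constructor(startTime, endTime, content) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.content = content; // unchanged cue payload
  }
}

class MockBinaryCue extends MockTextTrackCue {
  constructor(startTime, endTime, content, mimeType) {
    super(startTime, endTime, content);
    this.mimeType = mimeType; // assumed to be known from the track
  }

  // Wrap the Base64 payload in a data URL, e.g. for use as an img src.
  getCueAsDataURL() {
    return `data:${this.mimeType};base64,${this.content}`;
  }

  // Decode the Base64 payload into a standalone ArrayBuffer.
  getCueAsArrayBuffer() {
    const buf = Buffer.from(this.content, "base64"); // Node.js-specific
    return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
  }
}

// "TWFu" is a placeholder payload, not real image data.
const binaryCue = new MockBinaryCue(0, 5, "TWFu", "application/octet-stream");
const dataUrl = binaryCue.getCueAsDataURL();
const bytes = new Uint8Array(binaryCue.getCueAsArrayBuffer());
```

Note that nothing here offers getCueAsHTML(): the subclass exposes only the accessors that make sense for binary content, while the generic payload stays on the base class.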


> > > and (4) the proposed
> > > change to remove it doesn't solve any problem but creates new problems.
> > 
> > It solves the problem of the base interface having attributes that don't
> > make sense for every format.
> 
> There is no design principle that requires this.

It's not just about making sense - it's about creating fewer problems. See my example above with the image data URIs. If a developer were to use getCueAsHTML() on the data URI, the binary data URI would be interpreted according to the WebVTT-to-HTML algorithm, which could have all sorts of consequences. I'd regard that as a potential security hole.


> > > As presently defined, an author can use knowledge about possible track
> > > sources or content and can use this in combination with what is returned
> > > from text to guess the cue type.
> > 
> > You don't need to guess the cue type. Just look at what interface it
> > implements.
> 
> The presence of a text attribute doesn't tell you anything about the syntax
> or semantics of its values. Only when you know the text track format type
> can this be reliably deduced. In the absence of such information from the
> demux/decoder, only a priori knowledge of the source can be used if that
> info is available. Otherwise one has to guess.
> 
> > 
> > 
> > > It would be better if the UA used knowledge
> > > from its decoders to provide a more concrete hint of the track type. But
> > > even that wouldn't eliminate the widespread utility of having a text
> > > attribute in the base cue class.
> > 
> > I have no idea what this means.
> 
> It means I can write:
> 
> if (cue.type == "text/x-my-metadata")
>   useTextAccordingToMyMetadataFormat(cue.text);
> else if (cue.type == "text/x-your-metadata")
>   useTextAccordingToYourMetadataFormat(cue.text);
> 
> The demux/decoder can or often will determine the text track format, and can
> provide this as a hint to client JS.

I thought that's what the in-band metadata track dispatch type (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-metadata-track-dispatch-type) was for (see IDL attribute of TextTrack http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).

Comment 10 Ian 'Hixie' Hickson 2013-05-07 05:27:50 UTC

If you know what kind of cue it is, then you just use its API.

If you don't know what kind of cue it is, then the suggested features here are pointless.

Note that there is now a way to find out what kind of cue it is, namely, check what kind of interface the cue object implements.


> The other members that have been moved have not been written into external
> specs and used by content authors, at least in the domain I'm dealing with.

This makes no sense. Just put the attributes into the format-specific API that you implement, and everything will just work.


> It is also reasonable to leave them on the base class and have the base
> class return a nonce result in the absence of an override.

No, it's not reasonable, it's nonsensical, as I've explained.


> > It solves the problem of the base interface having attributes that don't
> > make sense for every format.
> 
> There is no design principle that requires this.

It's common sense language design.


> Does Node.nodeValue make sense in every Node type? Why not remove it also? 
> Node.nodeName? CustomEvent.detail?

Those are all great examples of _terrible_ API design.


> > What problems does it create?
> 
> Please read my previous responses. I would just be repeating myself to
> respond again.

You haven't listed any actual problems so far.


> > > As presently defined, an author can use knowledge about possible track
> > > sources or content and can use this in combination with what is returned
> > > from text to guess the cue type.
> > 
> > You don't need to guess the cue type. Just look at what interface it
> > implements.
> 
> The presence of a text attribute doesn't tell you anything about the syntax
> or semantics of its values.

I'm not suggesting doing that. I'm suggesting looking at what interface the object implements. WebVTTCue, MyOwnProprietaryCue, TTMLCue, whatever.


> It means I can write:
> 
> if (cue.type == "text/x-my-metadata")
>   useTextAccordingToMyMetadataFormat(cue.text);
> else if (cue.type == "text/x-your-metadata")
>   useTextAccordingToYourMetadataFormat(cue.text);
> 
> The demux/decoder can or often will determine the text track format, and can
> provide this as a hint to client JS.

Why can't you write:

   if (cue instanceof WebVTTCue)
      useTextAccordingToMyMetadataFormat(cue.text);
   else if (cue instanceof YourMetadataCue)
      useTextAccordingToYourMetadataFormat(cue.dataheader, cue.databody);

...or whatever YourMetadataCue exposes?

I don't understand why this would mean we had to expose .text on every format.
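
For concreteness, here is a self-contained version of the instanceof dispatch above. The three classes are hypothetical mocks defined only so the example runs outside a browser, and dataheader/databody are the made-up members from the snippet, not part of any real format.

```javascript
// Hypothetical mocks of a base cue and two format-specific cue interfaces.
class MockTextTrackCue {
  constructor(startTime, endTime) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

class MockWebVTTCue extends MockTextTrackCue {
  constructor(startTime, endTime, text) {
    super(startTime, endTime);
    this.text = text; // only this format exposes .text
  }
}

class MockMetadataCue extends MockTextTrackCue {
  constructor(startTime, endTime, dataheader, databody) {
    super(startTime, endTime);
    this.dataheader = dataheader;
    this.databody = databody;
  }
}

// Pick the format-specific members based on the interface the cue implements.
function describeCue(cue) {
  if (cue instanceof MockWebVTTCue) {
    return `vtt:${cue.text}`;
  }
  if (cue instanceof MockMetadataCue) {
    return `meta:${cue.dataheader}/${cue.databody}`;
  }
  return "unknown"; // a bare base cue offers no payload accessor at all
}

const described = [
  new MockWebVTTCue(0, 1, "Hi"),
  new MockMetadataCue(1, 2, "hdr", "body"),
  new MockTextTrackCue(2, 3),
].map(describeCue);
```

The last case shows the trade-off being argued over: with no payload member on the base class, a script that meets an unrecognised cue type has nothing to fall back on.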


> The relevant external spec is:
> http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf

That spec can trivially be updated by just having it create an MPEG2Cue instead of a TextTrackCue (or whatever the interface should be). All problems solved, as far as I can tell. It's backwards-compatible, and fits the new API, and doesn't result in any meaningless APIs.


> In view of this specification, I suggest adding .text back into TextTrackCue
> as a generic means of accessing the unchanged content of a cue.

That makes no sense at all. The fix here is not breaking HTML, it's fixing the spec above to actually define its own cue format. The entire point of breaking out the cue formats is so that this is possible.


> It might make sense to rename that to .content to be more generic for binary 
> and other types of timed tracks.

It doesn't make sense to have an API that is _by definition_ undefined in this manner.


> I'd also suggest to the authors of that specification to return a more
> specific WebVTTCue object or a TTMLCue object or CEA708Cue object which has
> all the API of the TextTrack object plus a getCueAsHTML() and an algorithm
> to convert the .content to HTML (and any other custom API that these formats
> would require). Without the conversion algorithm, getCueAsHTML() is pretty
> useless anyway.

That's what should happen, except without the members on TextTrackCue. They are of no use if you have a dedicated API.


> I thought that's what the in-band metadata track dispatch type
> (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).

That's about different kinds of cues within a specific format (e.g. a metadata cue vs a chapter cue in WebVTT). This bug is about different formats (e.g. MPEG vs WebVTT).

Comment 11 Silvia Pfeiffer 2013-05-07 22:24:33 UTC

(In reply to comment #10)
> If you know what kind of cue it is, then you just use its API.
> 
> If you don't know what kind of cue it is, then the suggested features here
> are pointless.

There is a difference between what the UA knows about the cue and what the JS dev knows and can do about the cue.

It is possible to have an implementation in a UA that parses timed tracks and exposes cues, but has no implementation for how to further parse the cue content. The CableLabs spec at http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf talks about this: it exposes the cue as a UTF-16 string, or as Base64 if it's binary, but it does not require an algorithm for what the UA does with the cue content thereafter. The UA may, however, be able to provide a hint as to what the cues are, which it would provide in the text-track-in-band-metadata-track-dispatch-type (e.g. text/ttml or text/scc).

It would, of course, be better to have a dedicated API for TTMLCue or SCC708Cue, but if there is no further data than just the cue content, TextTrackCue is entirely sufficient if it has a .content (or .text) API.

The expectation is then that JS knows what to do with such cues, which I guess implies that such cues can only work for @kind=metadata tracks.
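
A sketch of what "JS knows what to do with such cues" could mean in practice: the script looks at the track's dispatch type hint and routes the raw cue content to its own handler. The track and cue below are plain objects standing in for UA objects, the handlers are stubs, and a .text member on a generic cue is the assumption under discussion in this bug rather than settled API.

```javascript
// Hypothetical per-format handlers; real parsing is left as a stub.
const handlersByDispatchType = new Map([
  ["text/ttml", (text) => ({ format: "ttml", raw: text })],
  ["text/scc", (text) => ({ format: "scc", raw: text })],
]);

// Route the raw cue content using the track-level hint from the UA.
function interpretCue(track, cue) {
  const handler = handlersByDispatchType.get(
    track.inBandMetadataTrackDispatchType
  );
  // No hint, or a hint this script has no handler for: nothing to do.
  return handler ? handler(cue.text) : null;
}

// Plain objects standing in for a metadata TextTrack and one of its cues.
const metadataTrack = {
  kind: "metadata",
  inBandMetadataTrackDispatchType: "text/ttml",
};
const interpreted = interpretCue(metadataTrack, { text: "<tt/>" });
const unhandled = interpretCue(
  { kind: "metadata", inBandMetadataTrackDispatchType: "" },
  { text: "..." }
);
```

Here all format knowledge lives in the page's script, which matches the @kind=metadata expectation described above.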



> > I thought that's what the in-band metadata track dispatch type
> > (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> > metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> > http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).
> 
> That's about different kinds of cues within a specific format (e.g. a
> metadata cue vs a chapter cue in WebVTT). This bug is about different
> formats (e.g. MPEG vs WebVTT).

It's really about both, since the getCueAsHTML() function only makes sense for cue formats that the UA can render, even when it is already able to parse the cue file.

Comment 12 Glenn Adams 2013-05-08 00:12:49 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > > 
> > > > Very simple: (1) it's there now (on TextTrackCue in 5.0)
> > > 
> > > There was a lot more on TextTrackCue before, not just these two members. Why
> > > are these two members special?
> > 
> > The other members that have been moved have not been written into external
> > specs and used by content authors, at least in the domain I'm dealing with.
> 
> The relevant external spec is:
> http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
> 
> In view of this specification, I suggest adding .text back into TextTrackCue
> as a generic means of accessing the unchanged content of a cue. It might
> make sense to rename that to .content to be more generic for binary and
> other types of timed tracks.

OK, but I'd prefer keeping the name unchanged, since a DOMString return value is always text, as it were.

> 
> I'd also suggest to the authors of that specification to return a more
> specific WebVTTCue object or a TTMLCue object or CEA708Cue object which has
> all the API of the TextTrack object plus a getCueAsHTML() and an algorithm
> to convert the .content to HTML (and any other custom API that these formats
> would require). Without the conversion algorithm, getCueAsHTML() is pretty
> useless anyway.

OK, I can agree to not requiring getCueAsHTML() on the base class, and to defining it as a partial interface on subclasses instead.

> 
> 
> > > > (2) it's implemented (in a variety of UAs)
> > > > (3) it's being used
> > > 
> > > The model of every format's interface having these members is completely
> > > backwards compatible with what is implemented and used, as far as I can
> > > tell, without making future interfaces meaningless and confusing.
> > 
> > It is also reasonable to leave them on the base class and have the base
> > class return a nonce result in the absence of an override.
> 
> I can see a problem with having getCueAsHTML() on the "base class". It
> requires application of a conversion algorithm to the original cue content,
> converting it to HTML. However, the UA only knows which conversion algorithm
> to apply when being told what type the cue is

I do not agree that the UA needs to be told what kind of cue it is. In fact, the UA has better information than the client JS in this regard. I agree it would be useful to allow the author to provide authorial hints to the UA in the <track/> element that can help the UA better determine the type of the referenced track resource, but in this case, the authorial hint is akin to specifying @type on <img>.

As for determining the concrete track resource type, the UA always sees the bytes before the client JS (except in the case of using MSE to feed those bytes to the UA). As a consequence, the UA is in a better position to determine the concrete type of the delivered track bytes.

> of and if it has a conversion
> algorithm for that type available. In the general case - in particular for
> @kind=metadata tracks - such a conversion algorithm is not available.

Obviously, if the UA does not already know about a given track format, then it isn't going to be able to determine either a specific track media type or a subclass of TextTrackCue that could convert the text track content.

> 
> For example, assuming that the UA is given a WebVTT file for a <track> of
> @kind=metadata whose cues consist of image data URLs. A getCueAsHTML()
> function is pretty useless on such a track, and as a JS developer, I would
> know that. So I would not want to interpret that track as a WebVTTCue, but
> instead just use TextTrackCue.text to access the content, and then interpret
> the content as a dataURL that I hand to <img @src> elements.

I guess you are suggesting that the "convert to HTML" algorithm is not well defined for VTT either. Perhaps WebVTTCue shouldn't provide a getCueAsHTML() method either?

> 
> We could eventually standardise a specific BinaryWebVTTCue API, inheriting
> from TextTrackCue that adds a getCueAsDataURL() or getCueAsArrayBuffer()
> function (similar to the FileReader API
> http://www.w3.org/TR/FileAPI/#dfn-filereader) . getCueAsHTML() certainly
> makes no sense in such an Object (though .content would).
> 
> 
> > > > and (4) the proposed
> > > > change to remove it doesn't solve any problem but creates new problems.
> > > 
> > > It solves the problem of the base interface having attributes that don't
> > > make sense for every format.
> > 
> > There is no design principle that requires this.
> 
> It's not just about making sense - it's about creating less problems. See my
> example above with the image data URIs. If a developer was to use
> getCueAsHTML() on the dataURI, the binary data URI was interpreted according
> to the WebVTT-to-HTML algorithm, which could have all sorts of consequences.
> I'd regard that as a potential security hole.

All you've done here is demonstrate that getCueAsHTML() is not well defined on VTT itself. But that is an entirely different problem. If it's a security hole, then it is a security hole on VTT and its conversion to HTML semantics.

But since I have conceded to not requiring getCueAsHTML() you can address this as a different bug on VTT at your pleasure.

> 
> 
> > > > As presently defined, an author can use knowledge about possible track
> > > > sources or content and can use this in combination with what is returned
> > > > from text to guess the cue type.
> > > 
> > > You don't need to guess the cue type. Just look at what interface it
> > > implements.
> > 
> > The presence of a text attribute doesn't tell you anything about the syntax
> > or semantics of its values. Only when you know the text track format type
> > can this be reliably deduced. In the absence of such information from the
> > demux/decoder, only a priori knowledge of the source can be used if that
> > info is available. Otherwise one has to guess.
> > 
> > > 
> > > 
> > > > It would be better if the UA used knowledge
> > > > from its decoders to provide a more concrete hint of the track type. But
> > > > even that wouldn't eliminate the widespread utility of having a text
> > > > attribute in the base cue class.
> > > 
> > > I have no idea what this means.
> > 
> > It means I can write:
> > 
> > if (cue.type == "text/x-my-metadata")
> >   useTextAccordingToMyMetadataFormat(cue.text);
> > else if (cue.type == "text/x-your-metdata")
> >   useTextAccordingToYourMetadataFormat(cue.text);
> > 
> > The demux/decoder can or often will determine the text track format, and can
> > provide this as a hint to client JS.
> 
> I thought that's what the in-band metadata track dispatch type
> (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).

That won't work for two reasons: (1) it is restricted to @kind metadata, and (2) it is restricted to in-band.

In my estimation, the @kind attribute is completely orthogonal to media type, and is intrinsically tied to the interpretation and semantics of specific media types (like VTT).

Comment 13 Glenn Adams 2013-05-08 00:32:03 UTC

(In reply to comment #10)
> If you know what kind of cue it is, then you just use its API.

If it has one (an API). There should not be a requirement to define a new API in order to use a different text track media type.

One isn't required to use a different API to interact with different image types via HTMLImageElement, one isn't required to use a different API to interact with different A/V media types via HTMLMediaElement.

> 
> If you don't know what kind of cue it is, then the suggested features here
> are pointless.

No they aren't. They are no more pointless than defining an HTMLMediaElement API that can be used with all A/V media types.

> 
> Note that there is now a way to find out what kind of cue it is, namely,
> check what kind of interface the cue object implements.

Again, to repeat myself, this imposes a new, unique requirement on making use of text tracks that doesn't hold for images, A/V media, and other interfaces.

What happened to the well-known software concept of generalizing features?

> 
> 
> > The other members that have been moved have not been written into external
> > specs and used by content authors, at least in the domain I'm dealing with.
> 
> This makes no sense. Just put the attributes into the format-specific API
> that you implement, and everything will just work.

I've mentioned that technical definitions are not the only issue here: publishing schedule and IPR commitments are also at issue.

> 
> 
> > It is also reasonable to leave them on the base class and have the base
> > class return a nonce result in the absence of an override.
> 
> No, it's not reasonable, it's non-sensical, as I've explained.

We disagree on what is reasonable. At least I'm not calling your ideas nonsense. Calling my position nonsense is not a way to have an objective dialogue.

> 
> 
> > > It solves the problem of the base interface having attributes that don't
> > > make sense for every format.
> > 
> > There is no design principle that requires this.
> 
> It's common sense language design.

And so is abstracting base classes.

> 
> 
> > Does Node.nodeValue make sense in every Node type? Why not remove it also? 
> > Node.nodeName? CustomEvent.detail?
> 
> Those are all great examples of _terrible_ API design.

They work.

> 
> 
> > > What problems does it create?
> > 
> > Please read my previous responses. I would just be repeating myself to
> > respond again.
> 
> You haven't listed any actual problems so far.

The main problems are not technical, they are (1) spec uncertainty (introducing possible backwards incompatible changes for 5.0 to 5.1), (2) publishing schedule, and (3) IPR commitments.

These are more important issues than technical perfection with respect to API design.

> 
> 
> > > > As presently defined, an author can use knowledge about possible track
> > > > sources or content and can use this in combination with what is returned
> > > > from text to guess the cue type.
> > > 
> > > You don't need to guess the cue type. Just look at what interface it
> > > implements.
> > 
> > The presence of a text attribute doesn't tell you anything about the syntax
> > or semantics of its values.
> 
> I'm not suggesting doing that. I'm suggesting looking at what interface the
> object implements. WebVTTCue, MyOwnProprietaryCue, TTMLCue, whatever.

I understand. And I also understand this requires defining many new interfaces when generic interfaces can get most of the job done. See HTMLMediaElement.

> 
> 
> > It means I can write:
> > 
> > if (cue.type == "text/x-my-metadata")
> >   useTextAccordingToMyMetadataFormat(cue.text);
> > else if (cue.type == "text/x-your-metdata")
> >   useTextAccordingToYourMetadataFormat(cue.text);
> > 
> > The demux/decoder can or often will determine the text track format, and can
> > provide this as a hint to client JS.
> 
> Why can't you write:
> 
>    if (cue instanceof WebVTTCue)
>       useTextAccordingToMyMetadataFormat(cue.text);
>    else if (cue instanceof YourMetadataCue)
>       useTextAccordingToYourMetadataFormat(cue.dataheader, cue.databody);
> 
> ...or whatever YourMetadataCue exposes?

Sure, that would work technically. But it is more brittle, since it will result in a JS exception when attempting to dereference YourMetadataCue on a UA that doesn't support "text/x-your-metadata".
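
To illustrate the brittleness, here is a rough sketch of a guarded check. `YourMetadataCue`, `dataheader` and `databody` are the hypothetical names from the quoted example, and the `scope` parameter is an assumption added only so the lookup can be exercised without a browser.

```javascript
// `YourMetadataCue` is the hypothetical interface name used in this thread.
// `scope` defaults to the global object; it is a parameter only for testing.
function readCue(cue, scope = globalThis) {
  const Specific = scope.YourMetadataCue;
  // A property lookup plus typeof never throws, unlike a bare reference to
  // an identifier that the UA never defined.
  if (typeof Specific === "function" && cue instanceof Specific) {
    return { via: "YourMetadataCue", data: [cue.dataheader, cue.databody] };
  }
  // Generic path: works whenever the cue exposes a string .text member.
  if (typeof cue.text === "string") {
    return { via: "text", data: cue.text };
  }
  return { via: "none", data: null };
}
```

Without such a guard, an unqualified `cue instanceof YourMetadataCue` throws a ReferenceError on a UA where that interface does not exist.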

> 
> I don't understand why this would mean we had to expose .text on every
> format.

I didn't say we "have to". I'm saying it's convenient and it works. You are proposing a different approach simply because in your opinion it makes for a more pleasing API design. Pleasing API design has never been a factor in W3C APIs. Why should TextTrackCue suffer from an over-zealous desire for some API design ideal that doesn't exist in practice?

> 
> 
> > The relevant external spec is:
> > http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
> 
> That spec can trivially be updated by just having it create an MPEG2Cue
> instead of a TextTrackCue (or whatever the interface should be). All
> problems solved, as far as I can tell. It's backwards-compatible, and fits
> the new API, and doesn't result in any meaningless APIs.

Yes, and it may at some point in the future take such an option, but why should your suggested change force this change in an external spec simply for some vague notion of API ideology?

> 
> 
> > In view of this specification, I suggest adding .text back into TextTrackCue
> > as a generic means of accessing the unchanged content of a cue.
> 
> That makes no sense at all. The fix here is not breaking HTML, it's fixing
> the spec above to actually define its own cue format. The entire point of
> breaking out the cue formats is so that this is possible.

It's also called throwing out the baby with the bath water. If having a generic text attribute works in many if not all cases, then its existence is justified.

> 
> 
> > It might make sense to rename that to .content to be more generic for binary 
> > and other types of timed tracks.
> 
> It doesn't make sense to have an API that is _by definition_ undefined in
> this manner.

It is not "undefined"; it is just not fully specified for all possible uses.

> 
> 
> > I'd also suggest to the authors of that specification to return a more
> > specific WebVTTCue object or a TTMLCue object or CEA708Cue object which has
> > all the API of the TextTrack object plus a getCueAsHTML() and an algorithm
> > to convert the .content to HTML (and any other custom API that these formats
> > would require). Without the conversion algorithm, getCueAsHTML() is pretty
> > useless anyway.
> 
> That's what should happen, except without the members on TextTrackCue. They
> are of no use if you have a dedicated API.

It is not necessary to require a dedicated text track type specific API. That is my fundamental disagreement with your proposed approach.

> 
> 
> > I thought that's what the in-band metadata track dispatch type
> > (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> > metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> > http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).
> 
> That's about different kinds of cues within a specific format (e.g. a
> metadata cue vs a chapter cue in WebVTT). This bug is about different
> formats (e.g. MPEG vs WebVTT).

Correct.

Comment 14 Silvia Pfeiffer 2013-05-12 11:46:42 UTC

(In reply to comment #12)
> > 
> > In view of this specification, I suggest adding .text back into TextTrackCue
> > as a generic means of accessing the unchanged content of a cue. It might
> > make sense to rename that to .content to be more generic for binary and
> > other types of timed tracks.
> 
> ok, but I'd prefer keeping the name unchanged, since DOMString return value
> is always text as it were

OK. I've changed http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange accordingly.


> > I can see a problem with having getCueAsHTML() on the "base class". It
> > requires application of a conversion algorithm to the original cue content,
> > converting it to HTML. However, the UA only knows which conversion algorithm
> > to apply when being told what type the cue is
> 
> I do not agree that the UA needs to be told what kind of cue it is. In fact,
> the UA has better information than the client JS in this regard.

I can see that being the case for in-band cues, but for cues that are created by JS? Hardly.

> I agree it
> would be useful to allow the author to provide authorial hints to the UA in
> the <track/> element that can help the UA better determine the type of the
> referenced track resource, but in this case, the authorial hint is akin to
> specifying @type on <img>.

I agree that providing information about the file format to the browser is useless - if it can interpret it, it will, and if it can't, telling it what it is doesn't help. However, I was referring to the content type of the cue, not of the track. If you create a TextTrackCue in JS, you can create it such that .text contains a WebVTT caption cue, or a TTML caption cue. Only when trying to render it to HTML will you need to interpret it as WebVTTCue or TTMLCue.


> > For example, assuming that the UA is given a WebVTT file for a <track> of
> > @kind=metadata whose cues consist of image data URLs. A getCueAsHTML()
> > function is pretty useless on such a track, and as a JS developer, I would
> > know that. So I would not want to interpret that track as a WebVTTCue, but
> > instead just use TextTrackCue.text to access the content, and then interpret
> > the content as a dataURL that I hand to <img @src> elements.
> 
> I guess you are suggesting that the "convert to HTML" algorithm is not well
> defined for VTT either. Perhaps WebVTTCue shouldn't provide a getCueAsHTML()
> method either?

That is indeed a good question, but one to be discussed in the CG. I personally think that we need to distinguish between the encapsulation file type and the cue file type and that cue file types can have different interpretation algorithms even for the same encapsulation file type.



> > > It means I can write:
> > > 
> > > if (cue.type == "text/x-my-metadata")
> > >   useTextAccordingToMyMetadataFormat(cue.text);
> > > else if (cue.type == "text/x-your-metdata")
> > >   useTextAccordingToYourMetadataFormat(cue.text);
> > > 
> > > The demux/decoder can or often will determine the text track format, and can
> > > provide this as a hint to client JS.
> > 
> > I thought that's what the in-band metadata track dispatch type
> > (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> > metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> > http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).
> 
> That won't work for two reasons: (1) it is restricted to @kind metadata, and
> (2) it is restricted to in-band.

I think we could change that, while there is no implementation of that attribute in browsers yet. We would lift both restrictions and could rename it to @cueType and it would provide a hint as to what algorithm to use to interpret the cue content of a TextTrack. E.g. "D_WEBVTT/captions" would use the caption/subtitle rendering algorithm of WebVTT, while "D_WEBVTT/chapters" would use a different rendering algorithm and "D_WEBVTT/metadata" would not do any rendering interpretation at all. Other formats would set defined types, too.
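
As a rough sketch of how script might consume such a hint: the `cueType` attribute is only a proposal in this comment, and the function name and returned labels below are made up for illustration.

```javascript
// Sketch only: maps a proposed cueType string such as "D_WEBVTT/captions"
// to a label naming the interpretation to apply. Labels are illustrative.
function pickInterpretation(cueType) {
  const [format, kind = ""] = String(cueType).split("/");
  if (format !== "D_WEBVTT") {
    return "unknown"; // other formats would define their own values
  }
  // Lower-casing tolerates either spelling of the kind.
  switch (kind.toLowerCase()) {
    case "captions":
      return "render-as-captions";
    case "chapters":
      return "render-as-chapters";
    case "metadata":
      return "no-rendering";
    default:
      return "unknown";
  }
}
```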



> The main problems are not technical, they are (1) spec uncertainty
> (introducing possible backwards incompatible changes for 5.0 to 5.1), (2)
> publishing schedule, and (3) ipr commitments.

The idea is to introduce the changes here to 5.0, too, if all browsers agree to implement them.

Comment 15 Glenn Adams 2013-05-12 13:33:59 UTC

(In reply to comment #14)
> (In reply to comment #12)
> 
> > > I can see a problem with having getCueAsHTML() on the "base class". It
> > > requires application of a conversion algorithm to the original cue content,
> > > converting it to HTML. However, the UA only knows which conversion algorithm
> > > to apply when being told what type the cue is
> > 
> > I do not agree that the UA needs to be told what kind of cue it is. In fact,
> > the UA has better information than the client JS in this regard.
> 
> I can see that being the case for in-band cues, but for cues that are
> created by JS? Hardly.

I'm referring to tracks/cues constructed by the UA, from either in-band or out-of-band text track content.

> 
> > I agree it
> > would be useful to allow the author to provide authorial hints to the UA in
> > the <track/> element that can help the UA better determine the type of the
> > referenced track resource, but in this case, the authorial hint is akin to
> > specifying @type on <img>.
> 
> I agree that providing information about the file format to the browser is
> useless - if it can interpret it, it will, and if it can't, telling it what
> it is doesn't help.

Actually, it may be important to allow an author to define a fallback order when multiple resources are available for a track, and those resources employ different content types. Really, <track> should support <source> just like video and audio elements. I don't know why it doesn't. If it does, then providing a content type advisory is useful for the UA (before it attempts to fetch/decode a resource).

> However, I was referring to the content type of the cue,
> not of the track. If you create a TextTrackCue in JS, you can create it such
> that .text contains a WebVTT caption cue, or a TTML caption cue. Only when
> trying to render it to HTML will you need to interpret it as WebVTTCue or
> TTMLCue.

In my mental model, I don't (thus far) distinguish between the content type of a cue and the content type of a text track. Rather, I view the content of a cue to be a derivation|compilation of some portion of the content of a text track, where the definition of that derivation is intrinsically tied to the definition of the content type of the track.

More specifically, when I use the term content type I'm referring to something that could or does have an IANA Media Type assigned to it. While it's possible to treat TextTrackCue.text as a text serialization of some content type independently of the content type of the originating track, I'm not sure of the utility of that, unless there are multiple possible derivations that could end up producing different sub-formats for TextTrackCue.text from a given originating track's content type. Even in such a case, I would expect that other attributes of the cue or the actual content of TextTrackCue.text could be used to distinguish among these sub-formats.


> 
> 
> > > For example, assuming that the UA is given a WebVTT file for a <track> of
> > > @kind=metadata whose cues consist of image data URLs. A getCueAsHTML()
> > > function is pretty useless on such a track, and as a JS developer, I would
> > > know that. So I would not want to interpret that track as a WebVTTCue, but
> > > instead just use TextTrackCue.text to access the content, and then interpret
> > > the content as a dataURL that I hand to <img @src> elements.
> > 
> > I guess you are suggesting that the "convert to HTML" algorithm is not well
> > defined for VTT either. Perhaps WebVTTCue shouldn't provide a getCueAsHTML()
> > method either?
> 
> That is indeed a good question, but one to be discussed in the CG. I
> personally think that we need to distinguish between the encapsulation file
> type and the cue file type and that cue file types can have different
> interpretation algorithms even for the same encapsulation file type.

By "encapsulation file type" do you mean a container stream like MPEG-2 TS or MPEG-4 Part 12 (ISO Base Media File Format) or MPEG-4 Part 14 (MP4 File Format)? Or do you mean a text track format that can encapsulate different sub-types? I'd guess you are referring to the ability of text/vtt to carry different "kind"s of text: captions, metadata, etc., yes?

Do you expect to register (or be able to register) IANA Media Types for these sub-formats? If not, then it might be better to use a different term to describe the format of TextTrackCue.text. Perhaps TextTrackCue.textFormat? Where the (what you call) encapsulating content type (i.e., IANA Media Type of the containing text track resource) determines the acceptable values of textFormat?

> 
> 
> 
> > > > It means I can write:
> > > > 
> > > > if (cue.type == "text/x-my-metadata")
> > > >   useTextAccordingToMyMetadataFormat(cue.text);
> > > > else if (cue.type == "text/x-your-metdata")
> > > >   useTextAccordingToYourMetadataFormat(cue.text);
> > > > 
> > > > The demux/decoder can or often will determine the text track format, and can
> > > > provide this as a hint to client JS.
> > > 
> > > I thought that's what the in-band metadata track dispatch type
> > > (http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-
> > > metadata-track-dispatch-type) was for (see IDL attribute of TextTrack
> > > http://www.whatwg.org/specs/web-apps/current-work/#texttrack ).
> > 
> > That won't work for two reasons: (1) it is restricted to @kind metadata, and
> > (2) it is restricted to in-band.
> 
> I think we could change that, while there is no implementation of that
> attribute in browsers yet. We would lift both restrictions and could rename
> it to @cueType and it would provide a hint as to what algorithm to use to
> interpret the cue content of a TextTrack. E.g. "D_WEBVTT/captions" would use
> the caption/subtitle rendering algorithm of WebVTT, while
> "D_WEBVTT/chapters" would use a different rendering algorithm and
> "D_WEBVTT/metadata" would not do any rendering interpretation at all. Other
> formats would set defined types, too.

OK, this sounds like what I meant by textFormat above, a text track media type specific enumeration of possible formats to be used for the purpose of interpreting TextTrackCue.text, yes?

> 
> 
> 
> > The main problems are not technical, they are (1) spec uncertainty
> > (introducing possible backwards incompatible changes for 5.0 to 5.1), (2)
> > publishing schedule, and (3) ipr commitments.
> 
> The idea is to introduce the changes here to 5.0, too, if all browsers agree
> to implement them.

OK, but "all" browsers may be too zealous. In any case, if at least two of the set of {IE, MOZ, WK} are amenable to changes for 5.0 in this regard, then that should be sufficient IMO.

Comment 16 Silvia Pfeiffer 2013-05-13 03:36:49 UTC

(In reply to comment #15)
> 
> In my mental model, I don't (thus far) distinguish between the content type
> of a cue and the content type of a text track.

It is important to do so, because it implements what you called the "fallback order". A browser may be able to parse a text track into its individual cues. But it may not know what to do with the cues - in fact, for tracks of kind=metadata, that is exactly the idea.


> Rather, I view the content of
> a cue to be a derivation|compilation of some portion of the content of a
> text track, where the definition of that derivation is intrinsically tied to
> the definition of the content type of the track.

Thus you merge the content type of the track and the cue into one. That won't work for WebVTT, which allows delivering cues for different kinds that need different parsers. I don't think you even want that for TTML, seeing how I recently saw an example of a TTML file that delivers kind=metadata tracks containing JSON.

 
> More specifically, when I use the term content type I'm referring to a
> something that could or does have a IANA Media Type assigned to it.

Knowledge of a mime type of the text track is of no consequence to the JS developer. It is abstracted away by the browser by providing a TextTrack object with a list of cues. The only format that is of any relevance to the JS developer is the format of the cues.


> While
> it's possible to treat TextTrackCue.text as a text serialization of some
> content type independently of the content type of the originating track, I'm
> not sure of the utility to that, unless there are multiple possible
> derivations that could end up producing different sub-formats for
> TextTrackCue.text from a given originating track's content type.

See my above example of a WebVTT (or a TTML) file that could contain captions to be interpreted as HTML or as JSON.

> Even in
> such a case, I would expect that other attributes of the cue or the actual
> content of TextTrackCue.text could be used to distinguish among these
> sub-formats.

Other than educated guessing, the JS has nothing to go on.


> > That is indeed a good question, but one to be discussed in the CG. I
> > personally think that we need to distinguish between the encapsulation file
> > type and the cue file type and that cue file types can have different
> > interpretation algorithms even for the same encapsulation file type.
> 
> By "encapsulation file type" do you mean a container stream like MPEG-2 TS
> or MPEG-4 Part 12 (ISO Base Media File Format) or MPEG-4 Part 14 (MP4 File
> Format)?

Yes.

> Or do you mean a text track format that can encapsulate different
> sub-types? I'd guess you are referring to the ability of text/vtt to carry
> different "kind"s of text: captions, metadata, etc., yes?

"sub-types" are the cue types.


> Do you expect to register (or be able to register) IANA Media Types for
> these sub-formats?

No. They are cue formats and not mime types (file types). You wouldn't register a mime type on a TTML cue object [1] either. However, your TTML cue object could contain cue text to be interpreted for a caption or subtitle track, or it could be interpreted as JSON for a metadata track.

[1] http://www.cwmwenallt.com/ttml/TTMLmapping.htm#ttml-cue


> If not, then it might be better to use a different term
> to describe the format of TextTrackCue.text. Perhaps
> TextTrackCue.textFormat?

Since the cues of a TextTrack all have to have the same cue format, I suggested adding it to the TextTrack as .cueType. 


> Where the (what you call) encapsulating content
> type (i.e., IANA Media Type of the containing text track resource)
> determines the acceptable values of textFormat?

The encapsulating content can only provide part of the information, just like the inBandMetadataTrackDispatchType consists of more than just the file format.

For example, for WebVTT cues that come from a WebM file it ends up being a text track CodecID of "D_WEBVTT/kind", where kind is one of SUBTITLES, CAPTIONS, DESCRIPTIONS, or METADATA. The "kind" in this case doesn't actually have a semantic meaning, but is more like a mapping to the interpretation algorithm to be used on the cues (i.e. it's a codecID).

[2] http://www.w3.org/html/wg/drafts/html/master/single-page.html#steps-to-expose-a-media-resource-specific-text-track

To explain the point even better, for WebVTT, I'd eventually like to see kind=descriptions being interpreted through the speech synthesis API [3], which can take both plain text and SSML. So, SSML needs a specific interface. I'd prefer this to not be called WebVTTSSMLCue, but rather something generic like SSMLCue.

[3] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html



> > I think we could change that, while there is no implementation of that
> > attribute in browsers yet. We would lift both restrictions and could rename
> > it to @cueType and it would provide a hint as to what algorithm to use to
> > interpret the cue content of a TextTrack. E.g. "D_WEBVTT/captions" would use
> > the caption/subtitle rendering algorithm of WebVTT, while
> > "D_WEBVTT/chapters" would use a different rendering algorithm and
> > "D_WEBVTT/metadata" would not do any rendering interpretation at all. Other
> > formats would set defined types, too.
> 
> OK, this sounds like what I meant by textFormat above, a text track media
> type specific enumeration of possible formats to be used for the purpose of
> interpreting TextTrackCue.text, yes?

Yes.


> > > The main problems are not technical, they are (1) spec uncertainty
> > > (introducing possible backwards incompatible changes for 5.0 to 5.1), (2)
> > > publishing schedule, and (3) ipr commitments.
> > 
> > The idea is to introduce the changes here to 5.0, too, if all browsers agree
> > to implement them.
> 
> OK, but "all" browsers may be too zealous. In any case, if at least two of
> the set of {IE, MOZ, WK} are amenable to changes for 5.0 in this regard,
> then that should be sufficient IMO.

Sure. :-)

Comment 17 Silvia Pfeiffer 2013-05-13 03:39:17 UTC

*** Bug 21080 has been marked as a duplicate of this bug. ***

Comment 18 Silvia Pfeiffer 2013-05-13 03:55:41 UTC

*** Bug 21627 has been marked as a duplicate of this bug. ***

Comment 19 Ian 'Hixie' Hickson 2013-05-28 21:06:58 UTC

(In reply to comment #11)
> 
> It is possible to have an implementation in a UA that parses timed tracks
> and exposes cues, but the UA has no implementation for how to further parse
> the cue content.

This has nothing to do with parsing the cue content. It's about parsing the cues. In an implementation such as you describe, you'd still have a new Cue interface, for whatever text track format you're dealing with.


> It would, of course, be better to have a dedicated API for TTMLCue or
> SCC708Cue, but if there is no further data than just the cue content,
> TextTrackCue is entirely sufficient if it has a .content (or .text) API.

That's really poor API design.

If you have a potential hierarchy of interfaces, and some leaves of this hierarchy would only have one or two APIs, you don't therefore not bother to create the leaves and instead put the APIs on the abstract common ancestor. You only put things on the abstract common ancestor if they apply to _every_ descendant interface.

There's no harm in having a "TTMLCue" interface that has a .text API. Or alternatively, in having an interface that inherits from the abstract TextTrackCue and that is inherited by WebVTTCue, which can then be used for those formats where the UA essentially just has a subset of WebVTT (though that would be inferior to having separate leaf interfaces, since you'd want each one to have its own rendering rules, and it'd be confusing to have a single interface used for multiple different rendering rules).

Comment 20 Silvia Pfeiffer 2013-06-05 03:44:09 UTC

(In reply to comment #19)
> (In reply to comment #11)
> > 
> > It is possible to have an implementation in a UA that parses timed tracks
> > and exposes cues, but the UA has no implementation for how to further parse
> > the cue content.
> 
> This has nothing to do with parsing the cue content. It's about parsing the
> cues.

Actually, this bug is about parsing cue content and not about file formats. The bug is about getCueAsHTML(), which parses a cue's content given in .text into HTML.


> In an implementation such as you describe, you'd still have a new Cue
> interface, for whatever text track format you're dealing with.

We've decoupled the format of the cue from the format of the text track file by having <track> parse any format that a browser cares to support into a list of Cues. That's our starting point. From there on, we only care about parsing cue content.


> > It would, of course, be better to have a dedicated API for TTMLCue or
> > SCC708Cue, but if there is no further data than just the cue content,
> > TextTrackCue is entirely sufficient if it has a .content (or .text) API.
> 
> That's really poor API design.
> 
> If you have a potential hierarchy of interfaces, and some leaves of this
> hierarchy would only have one or two APIs, you don't therefore not bother to
> create the leaves and instead put the APIs on the abstract common ancestor.
> You only put things on the abstract common ancestor if they apply to _every_
> descendant interface.

This is assuming that we will have text track formats that provide cues that don't have text in them. Is this realistic? For example, if I want to provide timed thumbnails to the browser (for example to do something like the thumbnail seek in YouTube), I'd continue to use WebVTT and provide data-urls for the images in the cues. Are you expecting this to be replaced in the future with a binary file format that delivers images? And would the <track> element still be the right delivery format for such a binary file format?


> There's no harm in having a "TTMLCue" interface that has a .text API. Or
> alternatively, in having an interface that inherits from the abstract
> TextTrackCue and that is inherited by WebVTTCue, which can then be used for
> those formats where the UA essentially just has a subset of WebVTT (though
> that would be inferior to having separate leaf interfaces, since you'd want
> each one to have its own rendering rules, and it'd be confusing to have a
> single interface used for multiple different rendering rules).

What we need is a means of creating a cue in JS without having to pre-specify what format the cue is in. In this way, JS developers can create text tracks with cues that contain JSON or plain text, or image URLs or whatever else they need and then interpret that content as the cues become active.

I've had a brief discussion with the text track devs in the browsers about this. They follow your argument that it shouldn't be on the abstract common ancestor, but suggested we introduce:

[Constructor(double startTime, double endTime, DOMString text)]
interface UnparsedCue : TextTrackCue {
};

Possibly with the .text attribute in it rather than on the base class.

Also, it was argued that since we are making non-backwards compatible changes, removing the TextTrackCue constructor makes old uses more obviously broken and it's easier to debug (by just checking the error console).

A further advantage of UnparsedCue is that browsers can parse text track formats that they understand how to parse (e.g. TTML, MP4 file text tracks), but for which they don't understand how to parse the individual cues (e.g. SSML cues in a WebVTT file; or metadata/chapter/description cues in a WebVTT file) and can still make the cues available in JS through UnparsedCue.

Alternatively, we could also rename the abstract ancestor to AbstractTextTrackCue and make the generic, unparsed cue the TextTrackCue, which would make it more backwards compatible with what is currently implemented.
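
To illustrate the intended usage, here is a minimal sketch that models the proposed interfaces as plain JS classes (so it runs outside a browser). It assumes the variant where .text lives on UnparsedCue; activeCues() and interpret() are hypothetical helpers, not part of any proposal.

```javascript
// Stand-in for the abstract ancestor: timing only.
class TextTrackCue {
  constructor(startTime, endTime) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Stand-in for the proposed UnparsedCue: the UA stores the text as-is.
class UnparsedCue extends TextTrackCue {
  constructor(startTime, endTime, text) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Cues whose interval [startTime, endTime) contains time t.
function activeCues(cues, t) {
  return cues.filter((cue) => cue.startTime <= t && t < cue.endTime);
}

// The script, not the UA, decides how to interpret the content.
function interpret(cue) {
  try {
    return { type: "json", value: JSON.parse(cue.text) };
  } catch {
    return { type: "text", value: cue.text };
  }
}

const cues = [
  new UnparsedCue(0, 5, JSON.stringify({ slide: 1 })),
  new UnparsedCue(5, 10, "plain text note"),
];
```

For example, activeCues(cues, 2) yields the first cue, and interpret() on it gives the parsed JSON object, while the second cue is handed back as plain text.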

Comment 21 Silvia Pfeiffer 2013-06-05 04:24:14 UTC

I've thought about this some more and my suggestion for resolution is at http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#.283.29_Removal_of_TextTrackCue_constructor 

On top of renaming TextTrackCue to AbstractTextTrackCue and introducing an unparsed TextTrackCue API, I also suggest renaming inBandMetadataTrackDispatchType on TextTrack to cueFormatHint.

The cueFormatHint would be filled by the browser not just for in-band tracks, but also for files coming from <track>. For example, parsing a WebVTT file with captions would result in a cueFormatHint of "webvtt", but parsing a WebVTT file with metadata or chapters or descriptions would give a cueFormatHint of "plaintext". Or if the browser knew that it was metadata and knew it was JSON, it could set cueFormatHint to "json".

Then, it's clear which XXXCue object should be used to parse the .cues in the TextTrack.
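
As a sketch of what that dispatch could look like from script, assuming cueFormatHint is a plain string on the track and that each cue exposes .text. The hint values are the ones from the examples above; the parser table and parseCues() are hypothetical.

```javascript
// Hypothetical table of script-side parsers, keyed by cueFormatHint.
const cueParsers = new Map([
  ["plaintext", (text) => text],
  ["json", (text) => JSON.parse(text)],
]);

// Returns the parsed cue contents, or null when the script has no parser
// for the hint (e.g. "webvtt", which the browser handles itself).
function parseCues(track) {
  const parse = cueParsers.get(track.cueFormatHint);
  if (!parse) {
    return null;
  }
  return track.cues.map((cue) => parse(cue.text));
}
```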

Comment 22 Graham 2013-07-24 23:25:44 UTC

(In reply to comment #21)
> I've thought about this some more and my suggestion for resolution is at
> http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#.283.29_Removal_of_TextTrackCue_constructor
> 
> On top of renaming TextTrackCue to AbstractTextTrackCue and introducing an
> unparsed TextTrackCue API, I also suggest renaming
> inBandMetadataTrackDispatchType on TextTrack to cueFormatHint.
> 
> The cueFormatHint would be filled by the browser not just for in-band
> tracks, but also for files coming from <track>. For example, parsing a
> WebVTT file with captions would result in a cueFormatHint of "webvtt", but
> parsing a WebVTT file with metadata or chapters or descriptions would give a
> cueFormatHint of "plaintext". Or if the browser knew that it was metadata
> and knew it was JSON, it could set cueFormatHint to "json".
> 
> Then, it's clear which XXXCue object should be used to parse the .cues in
> the TextTrack.

Your proposal for resolving "Removal of .text from TextTrackCue" is to re-introduce the .text attribute.
Your proposal for resolving "Removal of TextTrackCue Constructor" is to introduce the new TextCue that inherits from TextTrackCue. 

However it appears that the TextCue also has a .text attribute. 

Is the intention to change the resolution of the first proposal to say:
re-introduce the .text attribute, but on the new TextCue interface?

Comment 23 Silvia Pfeiffer 2013-07-28 11:17:54 UTC

(In reply to comment #22)
> 
> Your proposal for resolving "Removal of .text from TextTrackCue" is to
> re-introduce the .text attribute.

Yes, I think re-introducing .text makes sense.

> Your proposal for resolving "Removal of TextTrackCue Constructor" is to
> introduce the new TextCue that inherits from TextTrackCue. 
>
> However it appears that the TextCue also has a .text attribute. 
> 
> Is the intention to change the resolution of the first proposal to say:
> re-introduce the .text attribute, but on the new TextCue interface?

It was, but I've since changed that wiki page. Since we're talking about text tracks, it can be expected that cue content will typically have text content, so I thought we can do without a TextCue and just add .text and the constructor back on the TextTrackCue.

Comment 24 Silvia Pfeiffer 2013-08-09 07:06:53 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted

Change Description:
https://github.com/w3c/html/commit/1e84a5b2c6c16322c277ed2784689d779f5a173e


Rationale: see discussion at
http://lists.w3.org/Archives/Public/public-html/2013Jul/0034.html

Rather than implement comment 20, I decided to view the TextTrackCue object as the quintessential representation of a cue of type metadata with text content. That is the 90% use case AFAICT.

Comment 25 Glenn Adams 2013-08-09 15:39:46 UTC

Accepted

Comment 26 Simon Pieters 2013-08-12 12:40:56 UTC

Sorry for late reply, I've been on vacation.

I prefer my proposal to have a new constructor for unparsed cues instead of repurposing TextTrackCue, detailed at the end of comment 20, for the reasons given at the end of comment 20.

TextTrackCue looks like it would give a usable object that gets rendered (and it does in existing implementations), but instead you get an unparsed object that does not get rendered. This is not obvious and is hard to debug.

I read http://lists.w3.org/Archives/Public/public-html/2013Jul/0034.html but I don't see any rationale against the proposal in comment 20, nor any addressing of the concerns.

Comment 27 Silvia Pfeiffer 2013-08-12 13:06:18 UTC

(In reply to comment #26)
> Sorry for late reply, I've been on vacation.

Sure. It only just happened.

> I prefer my proposal to have a new constructor for unparsed cues instead of
> repurposing TextTrackCue, detailed at the end of comment 20, for the reasons
> given at the end of comment 20.
>
> TextTrackCue looks like it would give a usable object that gets rendered
> (and it does in existing implementations), but instead you get an unparsed
> object that does not get rendered. This is not obvious and is hard to debug.

Is the main reason for your objection backwards compatibility?

The name TextTrackCue by itself (without looking at history) doesn't imply whether it gets rendered or not - in fact, not even every VTTCue is/can be rendered.


> I read http://lists.w3.org/Archives/Public/public-html/2013Jul/0034.html but
> I don't see any rationale against the proposal in comment 20, nor any
> addressing of the concerns.

The main reason I didn't introduce an UnparsedCue object is that I don't really see the advantage of creating a basically empty object, just to get a constructor:

[Constructor(double startTime, double endTime, DOMString text)]
interface UnparsedCue : TextTrackCue {
};

Comment 28 Philip Jägenstedt 2013-08-12 13:14:03 UTC

(In reply to comment #27)

> The main reason I didn't introduce an UnparsedCue object is that I don't
> really see the advantage of creating a basically empty object, just to get a
> constructor:
> 
> [Constructor(double startTime, double endTime, DOMString text)]
> interface UnparsedCue : TextTrackCue {
> };

You would also add the text property to UnparsedCue, since it doesn't need to be on TextTrackCue, right?

Comment 29 Silvia Pfeiffer 2013-08-12 13:18:30 UTC

(In reply to comment #28)
> (In reply to comment #27)
> 
> > The main reason I didn't introduce an UnparsedCue object is that I don't
> > really see the advantage of creating a basically empty object, just to get a
> > constructor:
> > 
> > [Constructor(double startTime, double endTime, DOMString text)]
> > interface UnparsedCue : TextTrackCue {
> > };
> 
> You would also add the text property to UnparsedCue, since it doesn't need
> to be on TextTrackCue, right?

Could do... but...

Then we need to make VTTCue a child of UnparsedCue, too, right?

And then I don't see a need for the original TextTrackCue object any more, because all cues that get created for all the use cases that we currently cover will have some textual content being part of it, so would get created as an UnparsedCue.

Comment 30 Silvia Pfeiffer 2013-08-12 13:25:24 UTC

Replying to Philip's comment on the related bug:
> I don't think that keeping the TextTrackCue constructor and text property
> makes a lot of sense after TextTrackCue has been stripped of its WebVTT
> semantics. As far as I can tell, a TextTrackCue created by script can't be
> rendered at all, since it doesn't have any rendering rules. In other words,
> I think the WHATWG spec makes more sense here.

The JS developer could get such a cue rendered after parsing it himself. I don't see the problem.

Comment 31 Philip Jägenstedt 2013-08-12 14:11:41 UTC

As far as I am concerned, there are two questions of interest:

1. Should TextTrackCue have the .text property?

2. Should there be a TextTrackCue constructor?

For the first question, I don't really see why it should. Even for TTML it doesn't seem like a useful property to have, even if the DOM fragment representing the cue can be serialized. It seems far more useful to just expose the DOM fragment directly, no? I'm not sure if anyone has serious plans to support bitmap subtitles in things like MPEG-2, but a text property obviously wouldn't make sense for that either.

If there's no text property there's nothing much useful you can do with a TextTrackCue object, so why bother with a constructor?

(Most of the discussion for these changes happened while I was on leave, so let me know if I've missed something.)

Comment 32 Simon Pieters 2013-08-12 14:55:41 UTC

(In reply to comment #27)
> Is the main reason for your objection backwards compatibility?

My objection is

* existing scripts that use TextTrackCue will stop working without any indication about what's wrong

* new scripts might be written with the assumption that the cue will get rendered

> The name TextTrackCue by itself (without looking at history) doesn't imply
> whether it gets rendered or not - in fact, not even every VTTCue is/can be
> rendered.

Sure, but it's reasonable to assume that it can be rendered, especially since that's the case in current implementations.

> The main reason I didn't introduce an UnparsedCue object is that I don't
> really see the advantage of creating a basically empty object, just to get a
> constructor:
> 
> [Constructor(double startTime, double endTime, DOMString text)]
> interface UnparsedCue : TextTrackCue {
> };

The advantage is that existing scripts that use the TextTrackCue constructor get an exception so that it's super-easy to debug why it stopped working, and that it is clearer for newcomers that it's a dummy cue that they have to parse and render themselves. (I'm open for other names if anyone can think of a better name.)

As for the .text property, I would suggest defining it on both UnparsedCue and VTTCue, and have these two interfaces inherit from TextTrackCue.
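
A minimal sketch of that arrangement, using plain JS classes as stand-ins for the interfaces (assumption: a non-constructible interface is modelled by throwing a TypeError from the base constructor). tryLegacyScript() is a hypothetical stand-in for an existing script.

```javascript
// Abstract ancestor: shared timing only, not directly constructible.
class TextTrackCue {
  constructor(startTime, endTime) {
    if (new.target === TextTrackCue) {
      throw new TypeError("Illegal constructor");
    }
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// .text is defined on each concrete interface, not on the ancestor.
class UnparsedCue extends TextTrackCue {
  constructor(startTime, endTime, text) {
    super(startTime, endTime);
    this.text = text;
  }
}

class VTTCue extends TextTrackCue {
  constructor(startTime, endTime, text) {
    super(startTime, endTime);
    this.text = text;
  }
}

// An old script calling the removed constructor fails loudly.
function tryLegacyScript() {
  try {
    new TextTrackCue(0, 1, "hello");
    return "constructed";
  } catch (err) {
    return err.name;
  }
}
```

Here tryLegacyScript() reports a TypeError instead of silently producing a cue that never renders, which is the debugging advantage argued for above.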

Comment 33 Glenn Adams 2013-08-12 15:23:28 UTC

(In reply to comment #32)
> (In reply to comment #27)
> > Is the main reason for your objection backwards compatibility?
> 
> My objection is
> 
> * existing scripts that use TextTrackCue will stop working without any
> indication about what's wrong

not all of them, just the ones that explicitly invoke the TextTrackCue() constructor; do you have any stats on usage in the wild?

> 
> * new scripts might be written with the assumption that the cue will get
> rendered

authors that use experimental (pre-REC) features must deal with changes; this is not a new problem here; caveat emptor applies

> 
> > The name TextTrackCue by itself (without looking at history) doesn't imply
> > whether it gets rendered or not - in fact, not even every VTTCue is/can be
> > rendered.
> 
> Sure, but it's reasonable to assume that it can be rendered, especially
> since that's the case in current implementations.

I agree with Silvia. Furthermore, there are implementations that do not render generic metadata mapped to generic TextTrackCue.

> 
> > The main reason I didn't introduce an UnparsedCue object is that I don't
> > really see the advantage of creating a basically empty object, just to get a
> > constructor:
> > 
> > [Constructor(double startTime, double endTime, DOMString text)]
> > interface UnparsedCue : TextTrackCue {
> > };
> 
> The advantage is that existing scripts that use the TextTrackCue constructor
> get an exception so that it's super-easy to debug why it stopped working,
> and that it is clearer for newcomers that it's a dummy cue that they have to
> parse and render themselves. (I'm open for other names if anyone can think
> of a better name.)
> 
> As for the .text property, I would suggest defining it on both UnparsedCue
> and VTTCue, and have these two interfaces inherit from TextTrackCue.

I think it better to leave it on TextTrackCue as originally defined, and follow well established O-O design principles: namely, data abstraction. By definition, or at least by name, a TextTrackCue is associated with some "Text". It is perfectly reasonable, and understandable to abstract this by means of a common attribute on the base type. In fact, moving this potentially common abstraction to concrete sub-types makes it more difficult to understand and use.

Comment 34 Simon Pieters 2013-08-12 18:52:52 UTC

(In reply to comment #33)
> not all of them, just the ones that explicitly invoke TextTrackCue()
> constructor;

That's what I meant.

> do you have any stats on usage in the wild?

Not really stats, but see https://github.com/search?l=javascript&q=TextTrackCue&ref=searchresults&type=Code for some examples.

Comment 35 Graham 2013-08-12 19:04:06 UTC

In order to address the issue where a TextTrackCue object may or may not be rendered by the user agent I would like to propose the following:

The definition of the mode attribute on the TextTrack interface is modified such that any attempt to change the mode to 'showing' fails unless there is a cueFormatHint (as per http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#.283 prop (3)) present that is recognized by the User Agent as a TextTrack with renderable TextTrackCues based on this hint.

In this way the application can detect whether to attempt CC display of texttracks unrecognized by the user agent by utilizing a JS method.
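
A sketch of how that detection might look, with a plain JS class standing in for TextTrack. Assumptions: "fails" is modelled as the setter silently keeping the previous mode, and the set of hints the UA can render is passed in explicitly; needsScriptRendering() is a hypothetical helper.

```javascript
// Stand-in for TextTrack with the proposed restriction on mode.
class TextTrack {
  constructor(cueFormatHint, renderableHints) {
    this.cueFormatHint = cueFormatHint;
    this._renderableHints = renderableHints; // hints the UA can render
    this._mode = "disabled";
  }

  get mode() {
    return this._mode;
  }

  set mode(value) {
    const renderable = this._renderableHints.has(this.cueFormatHint);
    if (value === "showing" && !renderable) {
      return; // the attempt fails: mode is left unchanged
    }
    this._mode = value;
  }
}

// The application tries "showing" and reads the mode back.
function needsScriptRendering(track) {
  track.mode = "showing";
  return track.mode !== "showing";
}
```

A script that finds the attempt failed could then set the track to "hidden" and draw the cues itself.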

Comment 36 Silvia Pfeiffer 2013-08-12 22:33:48 UTC

(In reply to comment #31)
> As far as I am concerned, there are two questions of interest:
> 
> 1. Should TextTrackCue have the .text property?
> 
> 2. Should there be a TextTrackCue constructor?

Indeed. Both of these were discussed at length.

> For the first question, I don't really see why it should. Even for TTML it
> doesn't seem like a useful property to have, even if the DOM fragment
> representing the cue can be serialized. It seems far more useful to just
> expose the DOM fragment directly, no?

I don't know the current state of mind of the TTML WG on this, but last time I looked, the way they were proposing to deal with the TextTrack API was to convert TTML files to an intermediate format consisting of a sequence of TTML region elements with the styled content in it, then further convert to styled HTML document fragments for rendering. I suppose the intermediate region elements would be in the .text content and the styled HTML fragments would be used in getCueAsHTML().


> I'm not sure if anyone has serious
> plans to support bitmap subtitles in things like MPEG-2, but a text property
> obviously wouldn't make sense for that either.

If there's one thing I learnt in the discussions, it's that we shouldn't address hypothetical use cases. Is anyone keen to implement a binary format on the text track?

For example, assuming your use case is a concrete one, I would expect that while having a bitmap to render on top of the video, there could still be a .text attribute that would contain the textual equivalent of the bitmap for accessibility and search purposes.

Right now we have a concrete, non-hypothetical use case where there is a spec to expose textual cue content from MPEG files that are not WebVTT cues and whose type is specified in TextTrack.inBandMetadataTrackDispatchType so the JS developer is able to apply a particular cue parser to it without the browser doing the parsing or rendering:
http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf

This is the main reason to get the .text back on something more generic than VTTCue.


> (Most of the discussion for these changes happened while I was on leave, so
> let me know if I've missed something.)

You likely have: I've updated the summary of the discussion at http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#Discussion_issues so you can review it there.

Comment 37 Silvia Pfeiffer 2013-08-12 22:35:28 UTC

(In reply to comment #34)
>
> > do you have any stats on usage in the wild?
> 
> Not really stats, but see
> https://github.com/search?l=javascript&q=TextTrackCue&ref=searchresults&type=Code
> for some examples.

Yeah... looks like we've broken the TextTrackCue API so often - there are several incompatible uses of it there already.

Comment 38 Silvia Pfeiffer 2013-08-12 22:41:14 UTC

(In reply to comment #35)
> In order to address the issue where a TextTrackCue object may or may not be
> rendered by the user agent I would like to propose the following:
> 
> The definition of the mode attribute on the TextTrack interface is modified
> such that any attempt to change the mode to 'showing' fails unless there
> is a cueFormatHint (as per
> http://www.w3.org/html/wg/wiki/User:Spfeiffe/TextTrackChange#.283 prop (3))
> present that is recognized by the User Agent as a TextTrack with renderable
> TextTrackCues based on this hint.
> 
> In this way the application can detect whether to attempt CC display of
> texttracks unrecognized by the user agent by utilizing a JS method.

Hmm... you are correct - we have to deal with the "showing" mode for cues of type TextTrackCue. That's actually a problem of both specs - the WHATWG spec, too.

Comment 39 Silvia Pfeiffer 2013-09-24 12:25:00 UTC

First commit towards resolving this bug has been made:
https://github.com/w3c/html/commit/e254307b50a0411ca5e8f37587817804c3dd3d14

Missing:
* need to point to DataCue in 4.8.10.12.2 Sourcing in-band text tracks for in-band metadata tracks
* need to update 4.8.9 The track element to clarify that any TextTrack that is not rendered by the browser is kind=metadata

Comment 40 Silvia Pfeiffer 2013-09-26 02:47:19 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted with alternative resolution
Change Description:
https://github.com/w3c/html/commit/e254307b50a0411ca5e8f37587817804c3dd3d14
https://github.com/w3c/html/commit/026452845c5e83860c70b82cddba1bc9e94da7ad

Rationale:
The introduction of a DataCue allows for the use cases that this bug is asking for, while TextTrackCue remains the abstract parent interface for all concrete cue formats.
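
For readers who want a picture of that split, a minimal sketch with plain JS classes as stand-ins. This comment does not spell out DataCue's members; the sketch assumes a constructor taking (startTime, endTime, data) with the payload held as an ArrayBuffer in .data, and decodeJson() is a hypothetical helper.

```javascript
// Abstract parent: timing only, not directly constructible.
class TextTrackCue {
  constructor(startTime, endTime) {
    if (new.target === TextTrackCue) {
      throw new TypeError("Illegal constructor");
    }
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Assumed shape of DataCue: an opaque payload the script interprets.
class DataCue extends TextTrackCue {
  constructor(startTime, endTime, data) {
    super(startTime, endTime);
    this.data = data; // ArrayBuffer
  }
}

// Encode some JSON metadata into an ArrayBuffer payload.
const bytes = new TextEncoder().encode(JSON.stringify({ ad: "break" }));
const payload = bytes.buffer.slice(
  bytes.byteOffset,
  bytes.byteOffset + bytes.byteLength
);
const cue = new DataCue(10, 20, payload);

// Script-side interpretation of the payload.
function decodeJson(dataCue) {
  return JSON.parse(new TextDecoder().decode(dataCue.data));
}
```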

Comment 41 Glenn Adams 2013-09-26 04:28:09 UTC

Thanks!