This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25353 - [DataCue] change DOMString? text to any? value
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC All
Importance: P2 enhancement
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
Duplicates: 25356
Blocks: 25352 25354 25356
Reported: 2014-04-15 20:34 UTC by Edward O'Connor
Modified: 2016-04-25 18:59 UTC (History)

Description Edward O'Connor 2014-04-15 20:34:20 UTC
Currently DataCue.data is the raw data and sometimes (when the engine knows the data to be textual) DataCue.text is a more user-friendly form of that data. There are many more cases where engines or their underlying media frameworks know how to parse metadata. Let's replace .text with a .value which can carry data of several types.
Comment 1 Boris Zbarsky 2014-04-15 23:48:19 UTC
"any?" is not a valid type.  You presumably just mean "any".
Comment 2 Aaron Colwell 2014-04-16 16:15:14 UTC
I'm concerned this will lead to unspeced "protocols" between the UA and the web application if we do this. Why is having a generic DataCue that returns an any better than defining specific cue types for different types of information that can be returned? 

This feels a little like the postMessage()/onmessage pairing, but the posting entity is not actually visible to the web application so it has no idea what the universe of things to expect could contain.

It also feels like this is taking us away from the text track concept into something more "timed event"-like, which makes me wonder if this should even be considered a text track.
Comment 3 Bob Lund 2014-04-16 16:46:01 UTC
(In reply to Aaron Colwell from comment #2)
> I'm concerned this will lead to unspeced "protocols" between the UA and the
> web application if we do this. Why is having a generic DataCue that returns
> an any better than defining specific cue types for different types of
> information that can be returned?

There are many types of metadata used in various media containers and, initially, there was no way for script to determine what type of metadata was in a track. I filed a bug [1] to address this limitation. The resolution was the creation of inBandMetadataTrackDispatchType, which applies only to metadata text tracks. That resolution defined what the UA needs to provide so that script has the information required to decode the track, without requiring the UA to know the syntax of all the various types of metadata.

As I understand this bug, there may be UAs that can recognize some types of metadata and, in those cases, format the data for script. This ability doesn't solve the general case I identified in [1]; there are many more types of metadata than any UA will recognize. A case in point: Nielsen ratings added a new private data track to MPEG-2 TS and supplied a Web app that used this data to do ratings on browser-based platforms. All the UA needed to do was populate the inBand...DispatchType attribute and make the raw cue available. The UA did not need to be extended to recognize the new metadata cue syntax.
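
The flow described here, where the UA only labels the track and hands over raw cues while the application does the parsing, could be sketched roughly as follows. This is an illustration only: the `MetadataTrack` and `RawCue` shapes, the `dispatchType` field name, the "example-private-ratings" string and the one-byte payload layout are all assumptions made for the sketch, not anything defined by a spec or by the Nielsen deployment.

```typescript
// Hypothetical stand-ins for a metadata text track and its raw cues.
// These are local types for illustration, not the real DOM interfaces.
interface RawCue {
  startTime: number;
  endTime: number;
  data: Uint8Array; // raw payload bytes, opaque to the UA
}

interface MetadataTrack {
  kind: "metadata";
  dispatchType: string; // stands in for the dispatch type attribute on the track
  cues: RawCue[];
}

type CueParser = (data: Uint8Array) => unknown;

// Application-side registry: dispatch type string -> parser supplied by the app.
const parsers = new Map<string, CueParser>();

// Assumed dispatch type and payload layout: a single byte holding a rating code.
parsers.set("example-private-ratings", (data) => {
  return { ratingCode: data.length > 0 ? data[0] : null };
});

function parseTrack(track: MetadataTrack): unknown[] {
  const parse = parsers.get(track.dispatchType);
  if (parse === undefined) {
    return []; // metadata type the app does not know: ignore the track
  }
  return track.cues.map((cue) => parse(cue.data));
}

const demoTrack: MetadataTrack = {
  kind: "metadata",
  dispatchType: "example-private-ratings",
  cues: [{ startTime: 0, endTime: 5, data: new Uint8Array([7]) }],
};
const parsedRatings = parseTrack(demoTrack);
```

The point of the design is that supporting a new metadata type means adding one entry to the application's registry; nothing in the UA stand-in changes.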
 
> 
> This feels a little like the postMessage()/onmessage pairing, but the
> posting entity is not actually visible to the web application so it has no
> idea what the universe of things to expect could contain.
> 
> It also feels like this is taking us away from the text track concept into
> something more "timed event"-like which makes me wonder if this should even
> be considered a text track.

I don't see this. The bug is asking for a way so that the UA can format the metadata for those types it recognizes. The timed aspect of text tracks is still there.



[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13359
Comment 4 Aaron Colwell 2014-04-16 17:16:58 UTC
(In reply to Bob Lund from comment #3)
> (In reply to Aaron Colwell from comment #2)
> > I'm concerned this will lead to unspeced "protocols" between the UA and the
> > web application if we do this. Why is having a generic DataCue that returns
> > an any better than defining specific cue types for different types of
> > information that can be returned?
> 
> There are many types of metadata used in various media containers and,
> initially, no way for script to determine what type of metadata was in the
> track. I filed a bug [1] to address this limitation. The resolution was
> creation of inBandMetadataTrackDispatchType that applied only to metadata
> text tracks. This resolution defined what the UA needed to provide so that
> the UA could provide script information to decode the track while not
> requiring the UA to know the syntax of all the various types of metadata.

I'm fine with this, but I'm wondering if it makes sense for the UA to emit a DataCue for metadata that it actually knows how to parse. I thought DataCue was only intended for things the UA doesn't know how to parse which is why I feel like the .value property is a little strange.

> 
> As I understand this bug, there may be UAs that can recognize some types of
> metadata and, in this case, format the data for script. This ability doesn't
> solve the general case I identified in [1]; there are many more types of
> metadata than any UA will recognize. A case in point: Nielsen ratings added
> a new private data track to MPEG-2 TS and supplied a Web app that used this
> data to do ratings on browser based platforms. All the UA needed to do was
> to populate the inBand...DispatchType attribute and make the raw cue
> available. The UA did not need to be extended to recognize the new metadata
> cue syntax.

This seems reasonable to me and I have no problem with this assuming that there is some sort of registry that indicates what the dispatch type refers to in the underlying format. In your example here, I'd expect there to be a registry entry that maps a string to specifics about how private track data is mapped to DataCues. For opaque stuff like private tracks, I realize this is the best you can do. For things like PMT and PAT, I think it is worth having explicit cue types for those since it is pretty straightforward to identify and parse that data.

>  
> > 
> > This feels a little like the postMessage()/onmessage pairing, but the
> > posting entity is not actually visible to the web application so it has no
> > idea what the universe of things to expect could contain.
> > 
> > It also feels like this is taking us away from the text track concept into
> > something more "timed event"-like which makes me wonder if this should even
> > be considered a text track.
> 
> I don't see this. The bug is asking for a way so that the UA can format the
> metadata for those types it recognizes. The timed aspect of text tracks is
> still there.

My problem is that in the general case this isn't "text" and doesn't really have anything to do with captioning or accessibility. Calling this type of stuff a text track is not quite accurate. For example, does PMT really have an end time? Isn't it just active until the next PMT? Do we really want to keep all PMTs in the cuesList especially since they occur frequently? I'm wondering if these types of "data/metadata tracks" should really be their own thing and not placed in the text track class hierarchy just because the text track stuff looks close enough to what we need.
Comment 5 Glenn Adams 2014-04-16 17:28:54 UTC
(In reply to Aaron Colwell from comment #4)
> I thought DataCue
> was only intended for things the UA doesn't know how to parse which is why I
> feel like the .value property is a little strange.

Not exactly. It is intended to be used for @kind metadata tracks, which imply the UA doesn't know how to process (e.g., render) the content.

At a minimum, the UA will need to parse the container sufficiently to extract a data cue's content (payload), but it may or may not be able to parse that payload.

One basic type of parsing a UA may provide on that payload, if it can determine the payload is an encoded string (text), is to decode the payload bytes into a string, which was the original intention of the DOMString text attribute. But that doesn't imply the UA knows enough about that string to parse it further into a structure, e.g., whether it is plain text, XML text, JSON text, etc.

My understanding of Ted's ask is that it should be possible for a UA that does know how to parse the cue's payload into something other than a string to expose the result of that parsing to the APP, without forcing every APP to perform the same decoding on the APP side of the fence.
Comment 6 Glenn Adams 2014-04-16 17:30:48 UTC
(In reply to Aaron Colwell from comment #4)
> My problem is that in the general case this isn't "text" and doesn't really
> have anything to do with captioning or accessibility. Calling this type of
> stuff a text track is not quite accurate.

Some of us don't assume that a text track has anything to do with captioning or accessibility. You seem to have assigned it a more restricted role, which seems unwarranted.
Comment 7 Eric Carlson 2014-04-16 17:44:12 UTC
(In reply to Glenn Adams from comment #5)
> (In reply to Aaron Colwell from comment #4)
> > I thought DataCue
> > was only intended for things the UA doesn't know how to parse which is why I
> > feel like the .value property is a little strange.
> 
> Not exactly. It is intended to be used for @kind metadata tracks, which
> imply the UA doesn't know how to process (e.g., render) the content.
> 
I wouldn't say that a UA doesn't know how to render the content; metadata is "data about data", so it isn't necessarily meant to be rendered.

> My understanding of Ted's ask is that it should be possible for a UA that
> does know how to parse the cue's payload into something other than a string
> to expose the result of that parsing to the APP, without forcing every APP
> to perform the same decoding on the APP side of the fence.
>
We are also suggesting this because some formats, e.g. HLS, can have metadata with non-POD values. We *could* serialize these values to XML or JSON and return them from the existing .text attribute, but a script will almost always need to create an Object/Document to work with them, so having a .value attribute is more efficient.
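
To make the trade-off concrete, here is a small sketch contrasting the two options: a structured value serialized into .text versus the same value handed over through .value. The `TextStyleCue` and `ValueStyleCue` shapes and the sample metadata item are assumptions for illustration, not real DOM interfaces or an actual HLS payload.

```typescript
// Hypothetical cue shapes for comparison; neither is a real DOM interface.
interface TextStyleCue {
  text: string | null; // the UA serializes the structured value to JSON
}

interface ValueStyleCue {
  value: unknown; // the UA hands over the structured value directly
}

// Made-up metadata item used only for this comparison.
const sampleItem = { key: "title", locale: "en", data: "Example" };

const viaText: TextStyleCue = { text: JSON.stringify(sampleItem) };
const viaValue: ValueStyleCue = { value: sampleItem };

// With .text, every application repeats the parse step before reading a field.
function titleFromText(cue: TextStyleCue): string | undefined {
  if (cue.text === null) {
    return undefined;
  }
  const parsed = JSON.parse(cue.text) as { data?: unknown };
  return typeof parsed.data === "string" ? parsed.data : undefined;
}

// With .value, no re-parsing is needed, but the app still has to check the shape.
function titleFromValue(cue: ValueStyleCue): string | undefined {
  const v = cue.value;
  if (typeof v === "object" && v !== null && "data" in v) {
    const data = (v as { data: unknown }).data;
    return typeof data === "string" ? data : undefined;
  }
  return undefined;
}
```

Both paths yield the same field; the difference is who pays for the serialize/parse round trip, and that the .value path still needs a shape check because its type is open-ended.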
Comment 8 Aaron Colwell 2014-04-16 17:52:28 UTC
(In reply to Glenn Adams from comment #5)
> (In reply to Aaron Colwell from comment #4)
> > I thought DataCue
> > was only intended for things the UA doesn't know how to parse which is why I
> > feel like the .value property is a little strange.
> 
> Not exactly. It is intended to be used for @kind metadata tracks, which
> imply the UA doesn't know how to process (e.g., render) the content.

I have no problem with this.

> 
> At a minimum, the UA will need to parse the container sufficiently to
> extract a data cue's content (payload), but it may or may not be able to
> parse that payload.

I have no problem with this.

> 
> One basic type of parsing a UA may provide on that payload, if it can
> determine the payload is an encoded string (text), is to decode the payload
> bytes into a string, which was the original intention of the DOMString text
> attribute. But that doesn't imply the UA knows enough about that string to
> further parse it into a structure, e.g., whether it is plain text, XML text,
> JSON text, etc.

If the UA knows that this is text, then I'd expect it to return a type that indicates "I know this is text data, but I don't understand how to parse it". Perhaps something like UnparsedTextCue or some other name that clearly indicates it is text, but the UA doesn't understand anything more than that.

> 
> My understanding of Ted's ask is that it should be possible for a UA that
> does know how to parse the cue's payload into something other than a string
> to expose the result of that parsing to the APP, without forcing every APP
> to perform the same decoding on the APP side of the fence.

My ask is that, if the UA knows how to parse it, then it should emit a different cue type that explicitly indicates the parsed information instead of this generic DataCue.
Comment 9 Aaron Colwell 2014-04-16 17:55:03 UTC
(In reply to Glenn Adams from comment #6)
> (In reply to Aaron Colwell from comment #4)
> > My problem is that in the general case this isn't "text" and doesn't really
> > have anything to do with captioning or accessibility. Calling this type of
> > stuff a text track is not quite accurate.
> 
> Some of us don't assume that a text track has anything to do with captioning
> or accessibility. You seem to have assigned it a more restricted role, which
> seems unwarranted.

Ok. Drop that part of the sentence. We aren't only talking about text data here. It sounds like arbitrary types of data are intended to be conveyed via this mechanism but all the names in the object hierarchy refer to text.
Comment 10 Glenn Adams 2014-04-16 20:07:53 UTC
(In reply to Eric Carlson from comment #7)
> (In reply to Glenn Adams from comment #5)
> > (In reply to Aaron Colwell from comment #4)
> > > I thought DataCue
> > > was only intended for things the UA doesn't know how to parse which is why I
> > > feel like the .value property is a little strange.
> > 
> > Not exactly. It is intended to be used for @kind metadata tracks, which
> > imply the UA doesn't know how to process (e.g., render) the content.
> > 
> I wouldn't say that a UA doesn't know how to render the content; metadata is
> "data about data" so it isn't necessarily meant to be rendered.

We are using different meanings for "content". I was referring to the content of the ostensible metadata payload, not the media content with which this payload is associated.
Comment 11 Glenn Adams 2014-04-16 20:10:40 UTC
(In reply to Aaron Colwell from comment #8)
> My ask is that, if the UA knows how to parse it, then it should emit a
> different cue type that explicitly indicates the parsed information instead
> of this generic DataCue.

If such a type (interface) were to add members beyond what is in a base DataCue type, then that might be warranted. However, there is a fairly significant standardization overhead with defining a new public type. If the base type can work as is, then the extra overhead may not be justified.
Comment 12 Glenn Adams 2014-04-16 20:12:12 UTC
(In reply to Aaron Colwell from comment #9)
> (In reply to Glenn Adams from comment #6)
> > (In reply to Aaron Colwell from comment #4)
> > > My problem is that in the general case this isn't "text" and doesn't really
> > > have anything to do with captioning or accessibility. Calling this type of
> > > stuff a text track is not quite accurate.
> > 
> > Some of us don't assume that a text track has anything to do with captioning
> > or accessibility. You seem to have assigned it a more restricted role, which
> > seems unwarranted.
> 
> Ok. Drop that part of the sentence. We aren't only talking about text data
> here. It sounds like arbitrary types of data are intended to be conveyed via
> this mechanism but all the names in the object hierarchy refer to text.

True. And I would have never picked that name, but that horse has left the gate (perhaps).
Comment 13 Aaron Colwell 2014-04-17 16:40:57 UTC
(In reply to Glenn Adams from comment #11)
> (In reply to Aaron Colwell from comment #8)
> > My ask is that, if the UA knows how to parse it, then it should emit a
> > different cue type that explicitly indicates the parsed information instead
> > of this generic DataCue.
> 
> If such a type (interface) were to add members beyond what is in a base
> DataCue type, then that might be warranted. However, there is a fairly
> significant standardization overhead with defining a new public type. If the
> base type can work as is, then the extra overhead may not be justified.

It is not lost on me that my comments here might be considered part of that extra overhead. :) 

My primary concern here is that this design easily leads to simply returning key/value hashmaps for .value instead of requiring a type to be defined and the semantics to be explicitly defined for each attribute. This leaves implementers in a situation where they have to reverse engineer the hashmap and hope that the semantics don't change somewhere down the road or new keys magically appear or are conditionally present. It is also unclear what web applications are supposed to expect from this interface since sometimes .data is parsed and sometimes it is not, which means applications will have to always have their own parsers "just in case". I'm not sure that is a good interop story.
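
The "just in case" situation might look like the sketch below on the application side: use the UA-provided value only if it has the expected shape, otherwise fall back to the application's own parser over the raw bytes. The `MaybeParsedCue` shape, the `KeyValue` type and the ASCII "key=value" payload format are all hypothetical, chosen only to keep the example small.

```typescript
// Hypothetical cue shape: raw bytes always present, parsed value only sometimes.
interface MaybeParsedCue {
  data: Uint8Array;
  value?: unknown;
}

interface KeyValue {
  key: string;
  value: string;
}

// Helper for building ASCII test payloads.
function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i);
  }
  return out;
}

// The app's own parser for the assumed ASCII "key=value" payload format.
function parseKeyValue(bytes: Uint8Array): KeyValue | null {
  let text = "";
  bytes.forEach((b) => {
    text += String.fromCharCode(b);
  });
  const eq = text.indexOf("=");
  if (eq < 0) {
    return null;
  }
  return { key: text.slice(0, eq), value: text.slice(eq + 1) };
}

// Shape check: the UA's object is trusted only if it matches what the app expects.
function isKeyValue(v: unknown): v is KeyValue {
  if (typeof v !== "object" || v === null) {
    return false;
  }
  const record = v as Record<string, unknown>;
  return typeof record["key"] === "string" && typeof record["value"] === "string";
}

function readCue(cue: MaybeParsedCue): KeyValue | null {
  if (isKeyValue(cue.value)) {
    return cue.value; // this UA parsed the payload for us
  }
  return parseKeyValue(cue.data); // this UA did not, or used another shape
}

const fromUa = readCue({ data: asciiBytes("lang=en"), value: { key: "lang", value: "en" } });
const fromApp = readCue({ data: asciiBytes("lang=en") });
```

Note that the application ships the full parser either way, which is the interop cost being described.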
Comment 14 Glenn Adams 2014-04-17 16:56:08 UTC
(In reply to Aaron Colwell from comment #13)
> (In reply to Glenn Adams from comment #11)
> > (In reply to Aaron Colwell from comment #8)
> > > My ask is that, if the UA knows how to parse it, then it should emit a
> > > different cue type that explicitly indicates the parsed information instead
> > > of this generic DataCue.
> > 
> > If such a type (interface) were to add members beyond what is in a base
> > DataCue type, then that might be warranted. However, there is a fairly
> > significant standardization overhead with defining a new public type. If the
> > base type can work as is, then the extra overhead may not be justified.
> 
> It is not lost on me that my comments here might be considered part of that
> extra overhead. :) 
> 
> My primary concern here is that this design easily leads to simply returning
> key/value hashmaps for .value instead of requiring a type to be defined and
> the semantics to be explicitly defined for each attribute. This leaves
> implementers in a situation where they have to reverse engineer the hashmap
> and hope that the semantics don't change somewhere down the road or new keys
> magically appear or are conditionally present. It is also unclear what web
> applications are supposed to expect from this interface since sometimes
> .data is parsed and sometimes it is not, which means applications will have
> to always have their own parsers "just in case". I'm not sure that is a good
> interop story.

I think it is fine. The world doesn't work in lock step. Evolution is required to determine what is important and what can be discarded. Current information is always imperfect. Extensibility mechanisms are important even though they allow a certain degree of non-interoperability. We don't know what metadata types every browser will support, so we need a way to move from zero to that state. View DataCue as one link in a longer chain still being forged.
Comment 15 Brendan Long 2014-04-17 20:06:20 UTC
I'd like to see us standardize how we come up with .value, since it would create additional work for JS authors if they couldn't rely on it. I suggested having a table to Eric, but he said that there are cases where an ID3 tag doesn't have a well-known name, but you can still figure out the type. I'd like to have a section in the spec (or a reference) saying how we decide what type the contents are, so we can be sure that every UA will provide the same .value.
Comment 16 Silvia Pfeiffer 2014-05-19 08:06:28 UTC
Some time ago, I wrote a blog post on "Your metadata is not my metadata": http://gingertech.net/2010/10/01/your-metadata-is-not-my-metadata/ . I think this whole thread is a prime example of this.

Here are some replies in random order on the various topics of discussion:

* "text track" is a misnomer
I agree to some extent. The "text track" concept is really about time-aligned blobs of data. However, there are some implications of this naming that I like: 
  - text is clearly a discontinuous data stream (i.e. we have to expect gaps in the timeline). 
  - we don't expect a text track to be of as high volume as a video track.
  - the main use case, while not the only one, is text-based, namely captions and subtitles.
  - captions/subtitles have a history and well understood properties - "data track" would be a fair bit less concrete, though in hindsight might have been a better choice

* DataCue objects created for metadata
While DataCue objects are on a kind=@metadata track, that use of the word "metadata" is actually wrong (since "timed blobs of data" clearly aren't metadata). So, we should have probably used kind=@timeddata rather than kind=@metadata (maybe we can still fix this - let's see).


To make clear what we created DataCue for, I've gone ahead and explained its use in the context of Matroska subtitle tracks - in particular SSA/ASS tracks:
http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html#webm (see the green notes)

This is the use case that we created DataCue for: streams of data extracted from a media resource that the UA finds easy to map into TextTrackCue objects to hand over to JS, but has no intention of rendering or even parsing.

We did not create DataCue for parsed content!

Therefore, for the suggested use case of timed *metadata* (the sort that I called "Timed Semantic Metadata" in my blog post), we should think of something more useful. I would suggest we invent a JSONCue for that and chuck all parsed metadata into JSON objects that we let the browser extract.

Here's what I would suggest:

[Constructor(double startTime, double endTime, JSONObject metadata)]
interface JSONCue : TextTrackCue {
           attribute JSONObject metadata;
  readonly attribute DOMString type;
};

** metadata would contain the actual metadata
** type would contain a descriptor for what kind of metadata this is - we might need to start a list (e.g. 'id3')

This is using the JSONObject as defined in ECMA-262 [1]. We could also make it a JSONArray, but then we'd not have structured metadata. That's fine IMO, but maybe we don't have to be this strict.

This will allow the creation of timed metadata tracks for media resources, such as timed ID3 tags, or marking time intervals where there are ads expected, or marking time intervals for copyright and other rights ownership, or for viewership restrictions etc.

Since we're including @type into the cue, it's possible to mash up all sorts of timed metadata in a single track, which is ok, seeing as the browser is not rendering any of it and just handing over the data to the JS dev.

Incidentally, defining the @type values doesn't need us to also define the format of the JSON blob that's in @metadata, seeing as we're leaving the interpretation of that blob to the JS dev. They can do whatever data analysis/filtering they need.
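
As a rough sketch of how a script might consume such a mixed track, assuming a cue with the start/end times, `metadata` and `type` members proposed above: the `JSONCueSketch` interface is a local stand-in rather than a real DOM type, the "ad-break" type string and all metadata values are invented for the example, and only 'id3' comes from the list suggested above.

```typescript
// JSON-compatible values, as a recursive type.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Hypothetical TypeScript rendering of the proposed cue; not a real DOM interface.
interface JSONCueSketch {
  startTime: number; // seconds
  endTime: number; // seconds
  metadata: { [key: string]: JsonValue };
  readonly type: string;
}

// One track carrying several kinds of timed metadata, told apart by `type`.
const mixedCues: JSONCueSketch[] = [
  { startTime: 0, endTime: 30, type: "id3", metadata: { TIT2: "Example title" } },
  { startTime: 30, endTime: 60, type: "ad-break", metadata: { durationSeconds: 30 } },
  { startTime: 60, endTime: 90, type: "id3", metadata: { TPE1: "Example artist" } },
];

// Filter by the type descriptor; interpreting `metadata` is left to the caller.
function cuesOfType(all: JSONCueSketch[], type: string): JSONCueSketch[] {
  return all.filter((cue) => cue.type === type);
}

// Cues whose interval [startTime, endTime) contains the given media time.
function activeAt(all: JSONCueSketch[], time: number): JSONCueSketch[] {
  return all.filter((cue) => cue.startTime <= time && time < cue.endTime);
}
```

For example, `cuesOfType(mixedCues, "id3")` picks out the two ID3-style cues, and `activeAt(mixedCues, 45)` returns only the ad-break cue.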

Thoughts?



[1] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
Comment 17 Silvia Pfeiffer 2014-05-19 08:13:56 UTC
Incidentally, having just read bug 25354, I'd also be OK with defining both a JSONCue and an ID3Cue, if ID3 content is too complex to simply expose in JSON. I guess we could always require base64 for images.
Comment 18 Silvia Pfeiffer 2014-05-19 08:37:37 UTC
*** Bug 25356 has been marked as a duplicate of this bug. ***
Comment 19 Brendan Long 2014-05-19 15:06:58 UTC
This makes sense to me, except for this:

> Incidentally, defining the @type values doesn't need us to also define the
> format of the JSON blob that's in @metadata, seeing as we're leaving the
> interpretation of that blob to the JS dev. They can do whatever data
> analysis/filtering they need.

If the metadata attribute has no defined content, then we force devs back into the dark ages of web development: Try everything we can think of and hope we didn't miss something. Or more likely, use the format that [your favorite browser] uses and ignore all of the others.

I think it would be worthwhile to standardize the content for each type, and expose everything that isn't standardized as an unparsed DataCue, since at least that way everything would be consistent. Having undefined JSON is worse than undefined binary, because the binary should be the original data, while the JSON is "someone's interpretation of what would be useful in that data".
Comment 20 Silvia Pfeiffer 2014-06-01 04:16:20 UTC
Brendan is right.

I'd encourage interested parties to start developing an ID3Cue spec with a definition of the metadata name-value pairs that are supported.
Comment 21 Silvia Pfeiffer 2015-06-20 03:12:50 UTC
I'm curious where people stand with this? DataCue was removed from the HTML5 spec. Is there still interest in a generic cue container object? Was anything implemented? Could we just close this bug and other DataCue related bugs and declare it a failed approach?
Comment 22 Bob Lund 2015-06-22 12:36:34 UTC
(In reply to Silvia Pfeiffer from comment #21)
> I'm curious where people stand with this? DataCue was removed from the HTML5
> spec. Is there still interest in a generic cue container object? Was
> anything implemented? Could we just close this bug and other DataCue related
> bugs and declare it a failed approach?

We're implementing a polyfill to expose DASH Events in the MPD and ISOBMFF media segments where MSE is used for the DASH client. Initially we defined a DASHEventCue object interface. It had the same interface as DataCue. For this reason, we decided to implement DataCue instead of DASHEventCue. I think all metadata cues will be the same - what would be different? It's all about passing data to JavaScript that is opaque to the UA.
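
A minimal sketch of that mapping, to show why a dedicated event cue ends up with the same members as a generic data cue. Everything here is an assumption for illustration and not the polyfill itself: `DashEventSketch` loosely follows the timing and payload fields of a DASH event, `DataCueSketch` carries only start time, end time and opaque data, and the scheme URI is a placeholder.

```typescript
// Hypothetical event shape, loosely modelled on a DASH event's fields.
interface DashEventSketch {
  schemeIdUri: string;
  value: string;
  timescale: number; // ticks per second
  presentationTime: number; // in ticks
  duration: number; // in ticks
  messageData: Uint8Array; // opaque to the UA
}

// Hypothetical cue shape: timing plus an opaque payload, nothing event-specific.
interface DataCueSketch {
  startTime: number; // seconds
  endTime: number; // seconds
  data: Uint8Array;
}

function eventToCue(ev: DashEventSketch): DataCueSketch {
  if (ev.timescale <= 0) {
    throw new RangeError("timescale must be positive");
  }
  const startTime = ev.presentationTime / ev.timescale;
  const endTime = startTime + ev.duration / ev.timescale;
  return { startTime, endTime, data: ev.messageData };
}

const demoCue = eventToCue({
  schemeIdUri: "urn:example:events", // placeholder scheme
  value: "1",
  timescale: 1000,
  presentationTime: 5000,
  duration: 2500,
  messageData: new Uint8Array([1, 2, 3]),
});
```

In this sketch the scheme URI and value are not carried on the cue; how script would learn them (for instance, through a track-level attribute) is left open.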

If DataCue goes away then I think we'll have a proliferation of more specific metadata cues, all with the same interface.

As soon as this work is completed we'll make it available for others to evaluate and use.

Bob
Comment 23 Travis Leithead [MSFT] 2016-04-25 18:59:27 UTC
HTML5.1 Bugzilla Bug Triage: Won't Fix. Looks like DataCue is back in 5.1, and this thread left off with Bob saying he'll return and report. For further issue tracking let's migrate to GitHub to avoid duplicate issue tracking systems.

If this resolution is not satisfactory, please copy the relevant bug details/proposal into a new issue at the W3C HTML5 Issue tracker: https://github.com/w3c/html/issues/new where it will be re-triaged. Thanks!