This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24563 - Make TextTrack.kind/language writable to work with MSE
Summary: Make TextTrack.kind/language writable to work with MSE
Status: RESOLVED WORKSFORME
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-06 16:58 UTC by contributor
Modified: 2014-07-30 00:00 UTC
CC List: 5 users

Description contributor 2014-02-06 16:58:49 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html
Multipage: http://www.whatwg.org/C#text-track-api
Complete: http://www.whatwg.org/c#text-track-api
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/

Comment:
Make TextTrack.kind/language writable to work with MSE

Posted from: 42.112.56.121
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36 OPR/19.0.1326.45 (Edition Next)
Comment 1 Philip Jägenstedt 2014-02-06 17:05:35 UTC
Forked from https://www.w3.org/Bugs/Public/show_bug.cgi?id=24370

MSE does this:

partial interface TextTrack {
                attribute DOMString     kind;
                attribute DOMString     language;
    readonly    attribute SourceBuffer? sourceBuffer;
};

An implementation supporting MSE would have a problem, since kind/language are readonly in HTML.

Can HTML change to make these attributes writable and define what happens when a HTMLTrackElement is the underlying object?

Or is there a better solution to the problem stated in the https://www.w3.org/Bugs/Public/show_bug.cgi?id=24370#c2 ?
Comment 2 Ian 'Hixie' Hickson 2014-02-06 18:58:21 UTC
What's the problem here? I don't understand. Why would they be writable?
Comment 3 Ian 'Hixie' Hickson 2014-02-06 19:36:58 UTC
(Wouldn't a track's kind and language be set on creation? I don't understand the use case for mutating these. Seems like it'd be really confusing, you pick an English track, then later it changes to Vietnamese?)
Comment 4 Philip Jägenstedt 2014-02-07 03:11:49 UTC
Aaron, can you explain the problem? I don't think I'd do a good job :)
Comment 5 Philip Jägenstedt 2014-02-07 03:13:35 UTC
(In reply to Ian 'Hixie' Hickson from comment #3)
> (Wouldn't a track's kind and language be set on creation? I don't understand
> the use case for mutating these. Seems like it'd be really confusing, you
> pick an English track, then later it changes to Vietnamese?)

This is already the case though: "The language of a text track can change dynamically, in the case of a text track corresponding to a track element."
Comment 6 Aaron Colwell 2014-02-07 17:41:41 UTC
In the case of MSE, the media data may not actually contain the language and kind information. For adaptive streaming solutions like HLS & DASH, this information is typically provided in the manifest file, which MSE is completely unaware of. MSE only deals with the media segments and the web application is responsible for handling the manifest and fetching the appropriate media segments. The language and kind need to be mutable so that the web application can update the XXXTrack objects based on what is specified in the manifest.
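
The workflow described in comment 6 can be sketched roughly as follows. This is a minimal TypeScript model rather than the MSE or HTML API: `ManifestTrackInfo`, `MutableTrack`, and `applyManifestMetadata` are hypothetical names, and it assumes the application has already parsed its manifest and can match manifest entries to tracks by id.

```typescript
// Hypothetical shape of what an application extracts from a DASH/HLS manifest.
interface ManifestTrackInfo {
  id: string;
  kind: string;
  language: string;
}

// Hypothetical stand-in for an audio/video/text track whose kind and
// language are writable, as in the MSE draft quoted in comment 1.
interface MutableTrack {
  readonly id: string;
  kind: string;
  language: string;
}

// Copy kind/language from the manifest onto tracks that the media data
// created without that metadata. Returns how many tracks were updated.
function applyManifestMetadata(
  tracks: MutableTrack[],
  manifest: ManifestTrackInfo[]
): number {
  const byId = new Map<string, ManifestTrackInfo>();
  for (const info of manifest) {
    byId.set(info.id, info);
  }
  let updated = 0;
  for (const track of tracks) {
    const info = byId.get(track.id);
    if (info === undefined) continue; // no manifest entry: leave as parsed
    track.kind = info.kind;
    track.language = info.language;
    updated++;
  }
  return updated;
}

// Tracks as they might look after an initialization segment that carries
// no language or kind information.
const tracks: MutableTrack[] = [
  { id: "1", kind: "", language: "" },
  { id: "2", kind: "", language: "" },
];
const updatedCount = applyManifestMetadata(tracks, [
  { id: "1", kind: "main", language: "en" },
]);
```

In this model the track with id "1" picks up the manifest values, while the track with no manifest entry keeps whatever the media data provided.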
Comment 7 Ian 'Hixie' Hickson 2014-02-07 17:54:26 UTC
Wouldn't you have that information before you knew you had the track?

foolip: in the case of <track>, there is no use case for changing these, it's only something we have to handle because the DOM is mutable.

If we do make these mutable, what should we do when the underlying data (in-band tracks) is not mutable? What should we do if you change the metadata of a video track, in the user interface, in media controllers, etc?
Comment 8 Aaron Colwell 2014-02-07 18:06:23 UTC
(In reply to Ian 'Hixie' Hickson from comment #7)
> Wouldn't you have that information before you knew you had the track?

Yes, but how is the application supposed to convey this information? It seemed to me that the cleanest solution would be to have the information updated once the actual XXXTrack objects exist. Otherwise you have to have something along the lines of "if you ever have an audio track, then set its language & kind to x & y respectively". That seems a little awkward and less direct than explicitly setting these values on the track objects themselves.

> 
> foolip: in the case of <track>, there is no use case for changing these,
> it's only something we have to handle because the DOM is mutable.
> 
> If we do make these mutable, what should we do when the underlying data
> (in-band tracks) is not mutable? What should we do if you change the
> metadata of a video track, in the user interface, in media controllers, etc?

I think that the UA should always honor what the attribute values indicate. If the language data in the in-band track says English, but the web application changes the language to German for some reason, then I think the UA should treat the track as German. That is what I would expect to happen if I saw that these attributes were mutable.
Comment 9 Philip Jägenstedt 2014-02-07 18:45:20 UTC
(In reply to Ian 'Hixie' Hickson from comment #7)
> Wouldn't you have that information before you knew you had the track?
> 
> foolip: in the case of <track>, there is no use case for changing these,
> it's only something we have to handle because the DOM is mutable.

I know, I'm just saying that we already have to deal with the changing at any time for TextTracks.

> If we do make these mutable, what should we do when the underlying data
> (in-band tracks) is not mutable? What should we do if you change the
> metadata of a video track, in the user interface, in media controllers, etc?

That's a good point. I assumed that the spec already had some hooks to "honor user preferences for automatic text track" when language or type changes, but I can't find it. I'm not sure how complicated it would be, but avoiding the problem for audio and video tracks seems preferable, if it can be done.
Comment 10 Philip Jägenstedt 2014-02-07 18:50:25 UTC
I suppose the problem here is that with MSE there is no underlying mutable object that could mirror the role of HTMLTrackElement here, there's just SourceBuffer and then the next level is *TrackList... I don't have any good ideas at this point.
Comment 11 Ian 'Hixie' Hickson 2014-02-07 22:18:01 UTC
(In reply to Aaron Colwell from comment #8)
> > Wouldn't you have that information before you knew you had the track?
> 
> Yes, but how is the application supposed to convey this information?

Well presumably, you have to create the track somehow, right? e.g. using a constructor or a factory function? (I'm not familiar enough with MSE to know how it works.) Wherever you create the track would be the place to set its language and kind.


> > If we do make these mutable, what should we do when the underlying data
> > (in-band tracks) is not mutable? What should we do if you change the
> > metadata of a video track, in the user interface, in media controllers, etc?
> 
> I think that the UA should always honor what the attribute values indicate.
> If the language data in the in-band track says English, but the web
> application changes the language to German for some reason, then I think the
> UA should treat the track as German. That is what I would expect to happen
> if I saw that these attributes were mutable.

That seems like a really scary API to me. Why would you ever want a script to be able to override the video file's internal data? Why not allow it to override the frame data as well, at that point? Or the colour profile, or anything else?


> I know, I'm just saying that we already have to deal with the changing at
> any time for TextTracks.

Yes. (Not for audio or video tracks, though. And only because we have little other choice for text tracks.)


> > If we do make these mutable, what should we do when the underlying data
> > (in-band tracks) is not mutable? What should we do if you change the
> > metadata of a video track, in the user interface, in media controllers, etc?
> 
> That's a good point. I assumed that the spec already had some hooks to
> "honor user preferences for automatic text track" when language or type
> changes, but I can't find it.

I don't think it does; such changes aren't expected to happen and I've not spent any time optimising the algorithms for it.


(In reply to Philip Jägenstedt from comment #10)
> I suppose the problem here is that with MSE there is no underlying mutable
> object that could mirror the role of HTMLTrackElement here, there's just
> SourceBuffer and then the next level is *TrackList... I don't have any good
> ideas at this point.

Well one option is to just introduce an underlying mutable object where you stick the video data and metadata, and then inject those objects into the media element and thus create the read-only *TrackList and *Track objects. But that would presumably require changes to MSE. I'm not familiar with that spec.
Comment 12 Aaron Colwell 2014-02-19 02:50:35 UTC
(In reply to Ian 'Hixie' Hickson from comment #11)
> (In reply to Aaron Colwell from comment #8)
> > > Wouldn't you have that information before you knew you had the track?
> > 
> > Yes, but how is the application supposed to convey this information?
> 
> Well presumably, you have to create the track somehow, right? e.g. using a
> constructor or a factory function? (I'm not familiar enough with MSE to know
> how it works.) Wherever you create the track would be the place to set its
> language and kind.

You should take a look at the spec. Tracks are created in response to initialization segments being appended to a SourceBuffer object. These are binary, format-specific pieces of data that describe the tracks. Requiring the application to modify this data essentially means that they need to implement a parser AND a generator. The whole point was to prevent the application from needing to parse this binary data.

> 
> 
> > > If we do make these mutable, what should we do when the underlying data
> > > (in-band tracks) is not mutable? What should we do if you change the
> > > metadata of a video track, in the user interface, in media controllers, etc?
> > 
> > I think that the UA should always honor what the attribute values indicate.
> > If the language data in the in-band track says English, but the web
> > application changes the language to German for some reason, then I think the
> > UA should treat the track as German. That is what I would expect to happen
> > if I saw that these attributes were mutable.
> 
> That seems like a really scary API to me. Why would you ever want a script
> to be able to override the video file's internal data? Why not allow it to
> override the frame data as well, at that point? Or the colour profile, or
> anything else?

The same argument could have been made for text documents. I think it has been proven that being able to edit text content has been beneficial. Treating media as immutable seems inconsistent to me. Having hooks like output to Canvas, WebGL and WebAudio shows that people actually DO want to modify media. I don't think we should hold media to a different standard than text. Plus the level of mutability that I am suggesting is rather small.

> 
> 
> > I know, I'm just saying that we already have to deal with the changing at
> > any time for TextTracks.
> 
> Yes. (Not for audio or video tracks, though. And only because we have little
> other choice for text tracks.)
> 
> 
> > > If we do make these mutable, what should we do when the underlying data
> > > (in-band tracks) is not mutable? What should we do if you change the
> > > metadata of a video track, in the user interface, in media controllers, etc?
> > 
> > That's a good point. I assumed that the spec already had some hooks to
> > "honor user preferences for automatic text track" when language or type
> > changes, but I can't find it.
> 
> I don't think it does; such changes aren't expected to happen and I've not
> spent any time optimising the algorithms for it.
> 
> 
> (In reply to Philip Jägenstedt from comment #10)
> > I suppose the problem here is that with MSE there is no underlying mutable
> > object that could mirror the role of HTMLTrackElement here, there's just
> > SourceBuffer and then the next level is *TrackList... I don't have any good
> > ideas at this point.
> 
> Well one option is to just introduce an underlying mutable object where you
> stick the video data and metadata, and then inject those objects into the
> media element and thus create the read-only *TrackList and *Track objects.
> But that would presumably require changes to MSE. I'm not familiar with that
> spec.

It seems silly to have to introduce another object hierarchy just to make these 2 attributes mutable. Perhaps you should actually look at the MSE spec to better inform your suggestions.
Comment 13 Ian 'Hixie' Hickson 2014-02-20 20:33:42 UTC
Do you have a link to the relevant spec?
Comment 14 Philip Jägenstedt 2014-02-21 06:34:35 UTC
https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#sourcebuffer

SourceBuffer is the object to which raw bytes are added, and where audio/video/text tracks show up. Because one SourceBuffer can result in any number of tracks, it isn't possible to have some attributes on that object to give the kind/language.

I'm not sure about this, but I think that the root problem here is that the media data itself doesn't have the metadata at all, but that it's provided out-of-band, e.g. in a DASH manifest. Presumably the media data *could* contain the correct metadata, but presumably the ability to use existing content with MSE was seen as more important than avoiding a mutable kind/language.
Comment 15 Aaron Colwell 2014-02-21 17:17:12 UTC
(In reply to Ian 'Hixie' Hickson from comment #13)
> Do you have a link to the relevant spec?

Yes. https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer  (Note this is slightly different than philipj's URL because of Mercurial. The "default" branch, not "tip", actually tracks the Editor's draft.)

Sorry for being a little snippy in Comment 12.

Perhaps it might be a little easier to think of this in terms of a non-MSE use case first. How is the HTMLMediaElement supposed to behave if the language for a track changes in the middle of a presentation? I'm thinking of something like a live cable-access channel where you might have the Vietnamese local community show in Vietnamese, followed by some punk show in English, followed by a telenovela, etc. Presumably the language info in the stream could change based on which program is playing. Given this scenario, I have a few questions.
- Is the expectation that tracks would be added/removed as the language changes?
- If tracks are added/removed, how is the selected/enabled track set supposed to be updated? (I ask because I don't see any rules in the spec about this.)

Another case I'd like some clarity on is how alternate-language tracks that come and go should be handled. Here I'm thinking of a SAP track in traditional TV where you can hear Spanish instead of English. Typically there are periods where there is an English and Spanish track and periods where there is only an English track (typically commercials). If I'm a Spanish language consumer, I'd want to hear Spanish as much as possible so I'd like the media element to select the Spanish track when available, and fall back to English if it isn't available. Is this behavior defined somewhere? I couldn't find it.


I bring up these 2 cases because I expect the MSE mutable language feature to simply trigger the algorithms that handle the behaviors for these 2 common broadcast TV use cases. I suppose I could add a SourceBuffer.changeLanguage(VideoTrack videoTrack, DOMString language) to trigger an add/remove sequence, but API-wise that seems... ugly.
Comment 16 Ian 'Hixie' Hickson 2014-02-27 19:28:38 UTC
There's no support right now in <video> for a video stream to change language on the fly. In the case you describe, there would either be multiple video tracks, and script would have to manually flip between them, or the one track would have one language that was not always accurate. (It's not really clear to me what value the language metadata would have on a track if the track kept changing content language anyway. The language metadata is intended to help the user pick a track amongst several, for when they start watching a movie. If there's only one track in the first place, who cares what language it's described as.)

Similarly, there's no built-in support for an audio track that keeps coming and going and flipping between them. The expectation is that ads would use a different <video> element.

To be clear, <video> isn't intended for broadcast TV. It's intended for the Web. It's highly unusual (maybe even unheard of?) for Web users to stream content channels with long-lived tracks that flip between languages like broadcast TV. It doesn't really make any sense to use a single video stream for lots of different content. If you want to show ads, you bring up a separate <video> element with the ads. Since you have to have script for MSE anyway, it's trivial for such a script to also manage the language selection, etc, when ads are shown with different sets of audio tracks, or subtitles, or whatnot.


Looking at the MSE spec, I don't understand what's going on at all. What is the precise data format that is being appended via appendBuffer() et al? If it's just a raw media file, why would you need to be able to mutate any of the data once it's parsed? I mean, I don't see any way to mutate individual frames or change the video dimensions or frame rate or whatever; why is the language any different? How is a script supposed to splice different content together? You can't just cut an H.264 file half-way through a frame and splice in another file and have anything useful result.
Comment 17 Aaron Colwell 2014-02-27 20:44:24 UTC
(In reply to Ian 'Hixie' Hickson from comment #16)
> There's no support right now in <video> for a video stream to change
> language on the fly. In the case you describe, there would either be
> multiple video tracks, and script would have to manually flip between them,
> or the one track would have one language that was not always accurate. (It's
> not really clear to me what value the language metadata would have on a
> track if the track kept changing content language anyway. The language
> metadata is intended to help the user pick a track amongst several, for when
> they start watching a movie. If there's only one track in the first place,
> who cares what language it's described as.)

Ok. I'm not sure I agree with its lack of utility, but at least I understand now where you are coming from. If it is never expected to change, then I think that requirement should be explicitly stated in the spec.

> 
> Similarly, there's no built-in support for an audio track that keeps coming
> and going and flipping between them. The expectation is that ads would use a
> different <video> element.

This is not clear to me in the spec. If the intent was not to support this, it seems like there should be text in the spec that says that AudioTrackList & VideoTrackList are not allowed to change after the transition to HAVE_METADATA. I suspect you will get some pushback from people if you make such a change.

> 
> To be clear, <video> isn't intended for broadcast TV. It's intended for the
> Web. It's highly unusual (maybe even unheard of?) for Web users to stream
> content channels with long-lived tracks that flip between languages like
> broadcast TV. It doesn't really make any sense to use a single video stream
> for lots of different content. If you want to show ads, you bring up a
> separate <video> element with the ads. Since you have to have script for MSE
> anyway, it's trivial for such a script to also manage the language
> selection, etc, when ads are shown with different sets of audio tracks, or
> subtitles, or whatnot.

I am pretty sure broadcasters are saddened by this and it seems unfair to bias the web platform against them. This seems like an unnecessary restriction to me. Also using a separate element practically guarantees that the transition to/from the ad won't be seamless.

> 
> 
> Looking at the MSE spec, I don't understand what's going on at all. What is
> the precise data format that is being appended via appendBuffer() et al?

It currently includes byte stream format specifications to accept WebM, ISOBMFF, or MPEG2-TS segments. Others can be added to the registry when needed.

> If it's just a raw media file, why would you need to be able to mutate any of
> the data once it's parsed? 

MSE allows you to splice segments of media together into a single presentation timeline. You aren't dealing with whole files, you're dealing with portions. It also allows you to splice together content from different files seamlessly into a single presentation. This is one of the draws over a multiple element approach.

> I mean, I don't see any way to mutate individual
> frames or change the video dimensions or frame rate or whatever; why is the
> language any different?

This information is stored in the "initialization segments". You can think of these as the media file headers. Since content can be mixed from different files, each new initialization segment appended could potentially change the frame dimensions, frame rate, language, etc. Even outside the MSE context there are file formats, like MPEG2-TS, that allow all this information to change over time. If <video> is supposed to be format agnostic, it seems like it should be able to accommodate these formats.


> How is a script supposed to splice different content
> together? You can't just cut an H.264 file half-way through a frame and
> splice in another file and have anything useful result.

You append segments of media. Media can be spliced at the frame level as long as you keep track of all the random access point locations and properly preserve the decoding dependencies. MSE does this and describes a buffering model for how the source buffer should be updated when media segments are appended over the top of each other. Splices are created by simply appending media segments adjacent to each other in time, or appending a new media segment over the top of some media data that is already in the buffer. This can be done and you can see it in action every time you watch an HTML5 YouTube video. All of the HTML5 adaptive streaming on YouTube and Netflix uses MSE.
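
The "append over the top" behaviour described in comment 17 can be pictured with a toy model. This sketch is a deliberate simplification and not the MSE buffering algorithm: it treats buffered media as time ranges labelled with a source, ignores random access points and decoding dependencies entirely, and uses hypothetical names (`BufferedRange`, `appendOver`).

```typescript
// A span of buffered media on the presentation timeline.
interface BufferedRange {
  start: number; // seconds, inclusive
  end: number; // seconds, exclusive
  source: string; // which file or stream the media came from
}

// Append `incoming` over whatever is already buffered: any overlapped part
// of an existing range is dropped, and the non-overlapped remainder is kept.
function appendOver(
  buffer: BufferedRange[],
  incoming: BufferedRange
): BufferedRange[] {
  const result: BufferedRange[] = [];
  for (const range of buffer) {
    const disjoint =
      range.end <= incoming.start || range.start >= incoming.end;
    if (disjoint) {
      result.push(range);
      continue;
    }
    // Keep the part before the splice point, if any.
    if (range.start < incoming.start) {
      result.push({ ...range, end: incoming.start });
    }
    // Keep the part after the splice point, if any.
    if (range.end > incoming.end) {
      result.push({ ...range, start: incoming.end });
    }
  }
  result.push(incoming);
  return result.sort((a, b) => a.start - b.start);
}

// Ten seconds of one programme, with a two-second insert spliced in.
const initial: BufferedRange[] = [{ start: 0, end: 10, source: "show" }];
const spliced = appendOver(initial, { start: 4, end: 6, source: "ad" });
```

Here `spliced` holds three ranges on a single timeline (show, ad, show), which is the single-presentation property the comment contrasts with a multiple-element approach.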
Comment 18 Silvia Pfeiffer 2014-02-27 23:00:28 UTC
(In reply to Aaron Colwell from comment #6)
> In the case of MSE, the media data may not actually contain the language and
> kind information. For adaptive streaming solutions like HLS & DASH, this
> information is typically provided in the manifest file, which MSE is
> completely unaware of. MSE only deals with the media segments and the web
> application is responsible for handling the manifest and fetching the
> appropriate media segments. The language and kind need to be mutable so that
> the web application can update the XXXTrack objects based on what is specified
> in the manifest.

Is the problem that the XXXTrack objects are being created by MSE, but the kind and language come from the manifest file (DASH file etc)?

Why would it not be possible for the Web app to parse the manifest file, create the XXXTrack objects (including use of addTextTrack which allows setting kind, language etc [1]) and then hand these objects to MSE to provide the data?


[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-media-addtexttrack
Comment 19 Aaron Colwell 2014-02-27 23:15:16 UTC
(In reply to Silvia Pfeiffer from comment #18)
> (In reply to Aaron Colwell from comment #6)
> > In the case of MSE, the media data may not actually contain the language and
> > kind information. For adaptive streaming solutions like HLS & DASH, this
> > information is typically provided in the manifest file, which MSE is
> > completely unaware of. MSE only deals with the media segments and the web
> > application is responsible for handling the manifest and fetching the
> > appropriate media segments. The language and kind need to be mutable so that
> > the web application can update the XXXTrack objects based on what is specified
> > in the manifest.
> 
> Is the problem that the XXXTrack objects are being created by MSE, but the
> kind and language come from the manifest file (DASH file etc)?
> 
> Why would it not be possible for the Web app to parse the manifest file,
> create the XXXTrack objects (including use of addTextTrack which allows
> setting kind, language etc [1]) and then hand these objects to MSE to
> provide the data?

That now assumes the manifest has all the information for each track. It is also making the assumption that I am using a manifest all the time. Sometimes all the information doesn't come from a single source just like all the pieces of a web page don't come from a single source.

I really don't understand why making these 2 attributes mutable is that big of a deal especially since it doesn't sound like there are any automatic algorithms that react to these values. It sounds like the web application is responsible for using the information to select an appropriate track. If we fired a "change" TrackEvent if one of these attributes was changed, it seems like an application would easily be able to deal with these changes and reassess its track selection if need be.
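
The "change" notification idea in comment 19 could look something like the sketch below. It is a plain TypeScript model, not the DOM event system or any specified API: `ObservableTrack`, `onChange`, and the listener signature are hypothetical, and a simple callback list stands in for a real TrackEvent.

```typescript
type TrackAttribute = "kind" | "language";
type TrackChangeListener = (
  track: ObservableTrack,
  attribute: TrackAttribute
) => void;

// Hypothetical track whose kind and language are writable and which
// notifies listeners whenever one of them actually changes.
class ObservableTrack {
  private listeners: TrackChangeListener[] = [];
  private kindValue: string;
  private languageValue: string;

  constructor(kind: string, language: string) {
    this.kindValue = kind;
    this.languageValue = language;
  }

  get kind(): string {
    return this.kindValue;
  }
  set kind(value: string) {
    if (value === this.kindValue) return; // unchanged: no notification
    this.kindValue = value;
    this.notify("kind");
  }

  get language(): string {
    return this.languageValue;
  }
  set language(value: string) {
    if (value === this.languageValue) return; // unchanged: no notification
    this.languageValue = value;
    this.notify("language");
  }

  onChange(listener: TrackChangeListener): void {
    this.listeners.push(listener);
  }

  private notify(attribute: TrackAttribute): void {
    for (const listener of this.listeners) {
      listener(this, attribute);
    }
  }
}

// An application records each change so it can reassess its selection.
const seen: string[] = [];
const track = new ObservableTrack("subtitles", "en");
track.onChange((changed, attribute) => {
  const value = attribute === "kind" ? changed.kind : changed.language;
  seen.push(`${attribute}=${value}`);
});
track.language = "de";
track.language = "de"; // same value again, so no second notification
```

Notifying only on an actual change is a design choice in this sketch; the thread does not say whether a redundant assignment should fire an event.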
Comment 20 Silvia Pfeiffer 2014-02-27 23:28:38 UTC
(In reply to Aaron Colwell from comment #19)
>
> I really don't understand why making these 2 attributes mutable is that big
> of a deal especially since it doesn't sound like there are any automatic
> algorithms that react to these values.

The user interface would need to react to it. Changing the menu in which the user selects e.g. which captions they want to see (based on the language and the kind) halfway through their viewing experience is poor user experience. The problem is that if you allow JS authors to change them for MSE, we actually do have to react to these values, which right now we don't.

> It sounds like that web application
> is responsible for using the information to select an appropriate track.

No it's not. The browser creates the menu from information it gets from inband tracks and <track> elements. The user then selects which one it wants to see. JS can disable this browser functionality and replicate it itself, but that's just an option. (Incidentally, the menu functionality has not been implemented in Chrome yet, but has in other browsers).

> If
> we fired a "change" TrackEvent if one of these attributes was changed, it
> seems like an application would easily be able to deal with these changes
> and reassess its track selection if need be.

Currently, the way it is supposed to work is that there are browser settings for default languages and for the default activation of e.g. caption tracks. These will kick in for a new website unless the JS dev overrules with a @default attribute which <track> is being activated. After that, the choice is up to the user via the menu to overrule this. If JS changes kind or language mid-playback and the browser doesn't change the track, the user ends up with a kind or language they did not select. Should the browser now guess which track to pick instead?
Comment 21 Aaron Colwell 2014-02-28 00:18:55 UTC
(In reply to Silvia Pfeiffer from comment #20)
> (In reply to Aaron Colwell from comment #19)
> >
> > I really don't understand why making these 2 attributes mutable is that big
> > of a deal especially since it doesn't sound like there are any automatic
> > algorithms that react to these values.
> 
> The user interface would need to react to it. Changing the menu in which the
> user selects e.g. which captions they want to see (based on the language and
> the kind) halfway through their viewing experience is poor user experience.
> The problem is that if you allow JS authors to change them for MSE, we
> actually do have to react to these values, which right now we don't.

This menu would likely have to change if a track was added/removed mid-playback as well. What is done in that situation? It doesn't appear that the spec prevents that.

How is this different than the out-of-band case? Section 4.7.10.12.3 states
"As the kind, label, and srclang attributes are set, changed, or removed, the text track must update accordingly, as per the definitions above." You already have to handle this case. I'm just requesting that we don't hold inband tracks to a different standard.


> 
> > It sounds like that web application
> > is responsible for using the information to select an appropriate track.
> 
> No it's not. The browser creates the menu from information it gets from
> inband tracks and <track> elements. The user then selects which one it wants
> to see. JS can disable this browser functionality and replicate it itself,
> but that's just an option. (Incidentally, the menu functionality has not
> been implemented in Chrome yet, but has in other browsers).

I'm assuming you are talking about section 4.7.10.13 here. I don't really see a problem here. The controls would just reflect the current value of language and kind. I wouldn't expect changing one of these properties to actually change which track the user selected.

> 
> > If
> > we fired a "change" TrackEvent if one of these attributes was changed, it
> > seems like an application would easily be able to deal with these changes
> > and reassess its track selection if need be.
> 
> Currently, the way it is supposed to work is that there are browser settings
> for default languages and for the default activation of e.g. caption tracks.
> These will kick in for a new website unless the JS dev overrules with a
> @default attribute which <track> is being activated. After that, the choice
> is up to the user via the menu to overrule this. If JS changes kind or
> language mid-playback and the browser doesn't change the track, the user
> ends up with a kind or language they did not select. Should the browser now
> guess which track to pick instead?

Again. What happens if a track is added mid-playback for a default language? Is it automatically selected or not?

If a track was explicitly selected by a user through the menu, I'd expect that track to stay selected independent of the attribute change.

If a track was not explicitly selected via @default or user selection and it matches one of the default languages, I'd expect things to act the same way as if a track with similar properties was added mid-playback.
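
The two expectations at the end of comment 21 can be written down as a small decision function. This is a sketch under stated assumptions, not spec behaviour: the thread itself notes that the spec does not define what happens when a track is added mid-playback, so the "pick the first track matching the user's preferred languages" rule is an assumption, and `TrackInfo`, `SelectionState`, and `reselectAfterChange` are hypothetical names.

```typescript
interface TrackInfo {
  id: string;
  language: string;
}

interface SelectionState {
  selectedId: string | null;
  // True if the track was chosen by the user or via a default attribute.
  explicit: boolean;
}

// Decide which track should be selected after some track's language changed.
function reselectAfterChange(
  tracks: TrackInfo[],
  state: SelectionState,
  preferredLanguages: string[]
): SelectionState {
  // An explicit choice survives any attribute change.
  if (state.explicit) return state;
  // Otherwise behave as if the track list had just been (re)populated:
  // assumed rule is first match in order of language preference.
  for (const language of preferredLanguages) {
    const match = tracks.find((t) => t.language === language);
    if (match !== undefined) {
      return { selectedId: match.id, explicit: false };
    }
  }
  return state; // nothing matches: leave the selection alone
}

// Track "b" has just had its language changed to Spanish.
const available: TrackInfo[] = [
  { id: "a", language: "en" },
  { id: "b", language: "es" },
];
const preferred = ["es", "en"];

const automatic = reselectAfterChange(
  available,
  { selectedId: "a", explicit: false },
  preferred
);
const userChosen = reselectAfterChange(
  available,
  { selectedId: "a", explicit: true },
  preferred
);
```

With these inputs the automatic selection moves to track "b", while the explicit selection stays on track "a".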
Comment 22 Ian 'Hixie' Hickson 2014-03-07 22:38:51 UTC
(I haven't yet read comments 17 to 21.)

It's come to my attention that we actually already have an API that represents a specific track:

http://dev.w3.org/2011/webrtc/editor/getusermedia.html#idl-def-MediaStreamTrack

Maybe the solution here is to use MediaStream and MediaStreamTrack rather than MediaSource, or somehow merge them (e.g. provide a way to create a MediaStreamTrack from a blob). This would then let us use MediaStreamTrack to dynamically change the track metadata.
Comment 23 Aaron Colwell 2014-04-02 15:31:02 UTC
Hixie, I've started a thread on the public-html-media list (http://lists.w3.org/Archives/Public/public-html-media/2014Apr/0006.html) to collect more info based on our IM conversation. Please feel free to correct anything if I accidentally misrepresented your position in some way.
Comment 24 Aaron Colwell 2014-04-25 04:43:34 UTC
Based on the discussion during the W3C F2F meeting, I came up with a new way to deal with this issue. I basically created a way for the application to provide default values for .kind & .language that are consulted when the SourceBuffer actually creates the xxxTracks objects. This allows the attributes to stay readonly and allows the application to specify kind and language info if it is not in the appended media data. 

I've posted this new proposal to the public-html-media list if you are interested.
http://lists.w3.org/Archives/Public/public-html-media/2014Apr/0087.html
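
A minimal sketch of the idea described above, not the actual MSE API: the application supplies fallback values, and they are consulted once, at the moment the track object is created from appended media data. Every name here (`trackDefaults`, `createTrackInfo`, the `inBand` argument) is hypothetical and chosen only for illustration.

```javascript
// Hypothetical model only: none of these names come from the MSE or HTML specs.
// Application-supplied fallback values, keyed by track type.
const trackDefaults = {
  text: { kind: "subtitles", language: "en" },
};

// Called once, when a track object would be created from appended media data.
// Values found in the media win; the defaults fill in whatever is missing.
function createTrackInfo(type, inBand, defaults) {
  const fallback = defaults[type] || {};
  const info = {
    type,
    kind: inBand.kind || fallback.kind || "",
    language: inBand.language || fallback.language || "",
  };
  // Freeze the result to mimic readonly attributes on the created track.
  return Object.freeze(info);
}

// The media carries a language but no kind: only kind comes from the defaults.
const fromMedia = createTrackInfo("text", { language: "de" }, trackDefaults);

// The media carries neither: both values come from the defaults.
const noMetadata = createTrackInfo("text", {}, trackDefaults);
```

Because the defaults are only read at creation time, nothing needs to be writable afterwards, which is the property that lets .kind and .language stay readonly.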
Comment 25 Ian 'Hixie' Hickson 2014-04-28 22:44:10 UTC
Yeah, that makes a lot more sense to me, semantically-speaking.
Comment 26 Aaron Colwell 2014-04-29 16:52:41 UTC
(In reply to Ian 'Hixie' Hickson from comment #25)
> Yeah, that makes a lot more sense to me, semantically-speaking.
Ok. I think you can close this bug then. I'll be updating the MSE spec to use the new proposal. Thank you for your time and patience.
Comment 27 Ian 'Hixie' Hickson 2014-04-30 18:29:27 UTC
Okie dokie. Thanks!
Comment 28 Silvia Pfeiffer 2014-05-11 22:04:09 UTC
(In reply to Ian 'Hixie' Hickson from comment #7)
> foolip: in the case of <track>, there is no use case for changing these,
> it's only something we have to handle because the DOM is mutable.

Is this something that we should clarify in the spec? E.g. does a DOM change of @language and/or @kind during video playback have any effect on the activation of text tracks? Does it cause a change event on tracks?
Comment 29 Philip Jägenstedt 2014-05-11 22:19:39 UTC
(In reply to Silvia Pfeiffer from comment #28)
> (In reply to Ian 'Hixie' Hickson from comment #7)
> > foolip: in the case of <track>, there is no use case for changing these,
> > it's only something we have to handle because the DOM is mutable.
> 
> Is this something that we should clarify in the spec? E.g. does a DOM change
> of @language and/or @kind during video playback have any effect on the
> > activation of text tracks? Does it cause a change event on tracks?

I'm OK with the spec as it is. The change event only fires when the mode changes, and the mode can't be set from the track element.
Comment 30 Silvia Pfeiffer 2014-05-12 00:55:24 UTC
(In reply to Philip Jägenstedt from comment #29)
>
> I'm OK with the spec as it is. The change event only fires when the mode
> changes, and the mode can't be set from the track element.

True, but assume the JS dev changes a kind from metadata to captions, and the track is in the user's default language and the user preferences say to always turn on captions, and there wasn't a previously active caption track that is relevant, would browsers want to react to that and turn the captions on (which would in turn cause mode to change)?
Comment 31 Philip Jägenstedt 2014-05-12 08:00:14 UTC
(In reply to Silvia Pfeiffer from comment #30)
> (In reply to Philip Jägenstedt from comment #29)
> >
> > I'm OK with the spec as it is. The change event only fires when the mode
> > changes, and the mode can't be set from the track element.
> 
> True, but assume the JS dev changes a kind from metadata to captions, and
> the track is in the user's default language and the user preferences say to
> always turn on captions, and there wasn't a previously active caption track
> that is relevant, would browsers want to react to that and turn the captions
> on (which would in turn cause mode to change)?

I don't have much of an opinion on this. It seems simpler to simply expect the right kind to be set from the beginning, or is it useful to have it change dynamically?
Comment 32 Silvia Pfeiffer 2014-05-12 08:06:49 UTC
(In reply to Philip Jägenstedt from comment #31)
> (In reply to Silvia Pfeiffer from comment #30)
> > (In reply to Philip Jägenstedt from comment #29)
> > >
> > > I'm OK with the spec as it is. The change event only fires when the mode
> > > changes, and the mode can't be set from the track element.
> > 
> > True, but assume the JS dev changes a kind from metadata to captions, and
> > the track is in the user's default language and the user preferences say to
> > always turn on captions, and there wasn't a previously active caption track
> > that is relevant, would browsers want to react to that and turn the captions
> > on (which would in turn cause mode to change)?
> 
> I don't have much of an opinion on this. It seems simpler to simply expect
> the right kind to be set from the beginning, or is it useful to have it
> change dynamically?

I think the above is the best use case I can come up with for a dynamic change. Having said this, I don't really want it in the spec either. So, I think we should just add a clarifying sentence stating that a change of kind or language after the track was loaded has no effect.
Comment 33 Aaron Colwell 2014-05-12 12:08:01 UTC
(In reply to Silvia Pfeiffer from comment #32)
> (In reply to Philip Jägenstedt from comment #31)
> > (In reply to Silvia Pfeiffer from comment #30)
> > > (In reply to Philip Jägenstedt from comment #29)
> > > >
> > > > I'm OK with the spec as it is. The change event only fires when the mode
> > > > changes, and the mode can't be set from the track element.
> > > 
> > > True, but assume the JS dev changes a kind from metadata to captions, and
> > > the track is in the user's default language and the user preferences say to
> > > always turn on captions, and there wasn't a previously active caption track
> > > that is relevant, would browsers want to react to that and turn the captions
> > > on (which would in turn cause mode to change)?
> > 
> > I don't have much of an opinion on this. It seems simpler to simply expect
> > the right kind to be set from the beginning, or is it useful to have it
> > change dynamically?
> 
> I think the above is the best use case I can come up with for a dynamic
> change. Having said this, I don't really want it in the spec either. So, I
> think we should just add a clarifying sentence stating that a change of kind
> or language after the track was loaded has no effect.

It seems to me that if you don't want people to be able to change the language and kind in the TextTrack object, then the following sentence in Section 4.7.10.12.3 Sourcing out-of-band text tracks (http://www.w3.org/html/wg/drafts/html/master/embedded-content.html#sourcing-out-of-band-text-tracks) should probably be changed.

"As the kind, label, and srclang attributes are set, changed, or removed, the text track must update accordingly, as per the definitions above."

It seems like kind & label should be dropped from this sentence and it should be made clear that kind and label are sampled when the <track> becomes a child of a media element and the TextTrack object is added to the textTracks list.
Comment 34 Silvia Pfeiffer 2014-05-12 12:18:39 UTC
(In reply to Aaron Colwell from comment #33)
> 
> It seems like kind & label should be dropped from this sentence

No, I don't think that's necessary, because they just update the IDL attributes, but they have no effect. That's exactly what I am trying to make understood.

> and it
> should be made clear that kind and label are sampled when the <track>
> becomes a child of a media element and the TextTrack object is added to the
> textTracks list.

Yes, that would be a valuable clarification.
Comment 35 Aaron Colwell 2014-05-12 13:10:07 UTC
(In reply to Silvia Pfeiffer from comment #34)
> (In reply to Aaron Colwell from comment #33)
> > 
> > It seems like kind & label should be dropped from this sentence
> 
> No, I don't think that's necessary, because they just update the IDL
> attributes, but they have no effect. That's exactly what I am trying to make
> understood.

Actually I just realized that these are talking about the attributes on the HTMLTrackElement object. Instead, I think the following sentences should be removed from the TextTrack.kind and TextTrack.language attribute text since they seem to contradict your intent that these values stay constant.

"The kind of track can change dynamically, in the case of a text track corresponding to a track element."

"The language of a text track can change dynamically, in the case of a text track corresponding to a track element."

> 
> > and it
> > should be made clear that kind and label are sampled when the <track>
> > becomes a child of a media element and the TextTrack object is added to the
> > textTracks list.
> 
> Yes, that would be a valuable clarification.
Comment 36 Silvia Pfeiffer 2014-05-14 09:45:06 UTC
(In reply to Aaron Colwell from comment #35)
> (In reply to Silvia Pfeiffer from comment #34)
> > (In reply to Aaron Colwell from comment #33)
> > > 
> > > It seems like kind & label should be dropped from this sentence
> > 
> > No, I don't think that's necessary, because they just update the IDL
> > attributes, but they have no effect. That's exactly what I am trying to make
> > understood.
> 
> Actually I just realized that these are talking about the attributes on the
> HTMLTrackElement object. Instead, I think the following sentences should be
> removed from the TextTrack.kind and TextTrack.language attribute text since
> they seem to contradict your intent that these values stay constant.
> 
> "The kind of track can change dynamically, in the case of a text track
> corresponding to a track element."
> 
> "The language of a text track can change dynamically, in the case of a text
> track corresponding to a track element."

Same thing. The IDL attributes always reflect what changes JS makes to the content attributes, so removing this makes no difference - JS can still change their values.
Comment 37 Aaron Colwell 2014-05-14 11:25:04 UTC
(In reply to Silvia Pfeiffer from comment #36)
> (In reply to Aaron Colwell from comment #35)
> > (In reply to Silvia Pfeiffer from comment #34)
> > > (In reply to Aaron Colwell from comment #33)
> > > > 
> > > > It seems like kind & label should be dropped from this sentence
> > > 
> > > No, I don't think that's necessary, because they just update the IDL
> > > attributes, but they have no effect. That's exactly what I am trying to make
> > > understood.
> > 
> > Actually I just realized that these are talking about the attributes on the
> > HTMLTrackElement object. Instead, I think the following sentences should be
> > removed from the TextTrack.kind and TextTrack.language attribute text since
> > they seem to contradict your intent that these values stay constant.
> > 
> > "The kind of track can change dynamically, in the case of a text track
> > corresponding to a track element."
> > 
> > "The language of a text track can change dynamically, in the case of a text
> > track corresponding to a track element."
> 
> Same thing. The IDL attributes always reflect what changes JS makes to the
> content attributes, so removing this makes no difference - JS can still
> change their values.

I understand that the attributes on the HTMLTrackElement can be changed by JS, but I don't think that means that the TextTrack object needs to reflect those changes. I think removing those sentences would make it clearer that the TextTrack attributes are actually immutable and cannot return different values.
Comment 38 Philip Jägenstedt 2014-05-14 12:33:10 UTC
An immutable TextTrack would be nice and save a bit of code in the implementation, but at which point should HTMLTrackElement's TextTrack be created and frozen? Currently it's just always there, and changes with its element.
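
To make the two behaviours under discussion concrete, here is a small sketch that contrasts a track that always reads through to its element with one that samples the values once. These classes are hypothetical stand-ins, not the DOM's HTMLTrackElement or TextTrack, and sampling in the constructor is just one assumed answer to the "at which point" question.

```javascript
// Hypothetical stand-in for a <track> element; not a DOM class.
class FakeTrackElement {
  constructor(kind, srclang) {
    this.kind = kind;
    this.srclang = srclang;
  }
}

// "Always there, and changes with its element": every read goes to the element.
class LiveTextTrack {
  constructor(element) {
    this._element = element;
  }
  get kind() { return this._element.kind; }
  get language() { return this._element.srclang; }
}

// "Created and frozen": values are copied once and never updated afterwards.
class SampledTextTrack {
  constructor(element) {
    this._kind = element.kind;
    this._language = element.srclang;
  }
  get kind() { return this._kind; }
  get language() { return this._language; }
}

const element = new FakeTrackElement("metadata", "en");
const live = new LiveTextTrack(element);
const sampled = new SampledTextTrack(element);

// Script mutates the element after both track objects exist.
element.kind = "captions";
// live.kind now reads "captions"; sampled.kind still reads "metadata".
```

The sampled variant needs no update path, but it forces a decision about when the copy is taken, which the live variant avoids entirely.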
Comment 39 Ian 'Hixie' Hickson 2014-07-30 00:00:54 UTC
I'm filing a new bug to track the <track> issue, since it seems separate from the original issue here. See bug 26459. Returning this bug to WORKSFORME.