DataCue API – 25 January 2021

Meeting minutes

cpn: Today's call focuses on the integration of DataCue and MSE, and on understanding where changes may be needed. See the shared Google doc.
… The doc summarizes the discussions we've had so far.
… Two main issues: #189 on exposing emsg boxes
… with discussion on how to do that in an interoperable manner

<kaz> media-source Issue 189

cpn: The other issue is the notes from our discussion at TPAC 2020 with Matt, notably on mapping to the HTML media timeline.
… For each of the specs potentially affected by this, I've started to look at the areas that may need to change. This is intended as a guide to help us specify the behavior of DataCue.

<kaz> ISO BMFF Byte Stream Format draft

cpn: Starting with the Byte Stream Format for ISOBMFF
… I'm thinking about adding a description that emsg boxes are allowed to appear after the initialization segment and before the media segment.
… Question is: do we specify the behavior here or do we specify it elsewhere?
… The way we want to expose this in the end is through a TextTrack, defined in HTML.
… Where to specify the emsg box handling?
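To make the placement question concrete, here is a minimal Python sketch (not taken from any spec) of walking the top-level boxes of an ISOBMFF byte stream and collecting the emsg boxes that appear after the initialization segment and before the first moof. The function names are hypothetical; the sketch assumes the buffer holds complete boxes, and it handles the 32-bit size, 64-bit largesize and size-0 (to end of data) cases.

```python
import struct
from typing import Iterator, List, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (box_type, box_bytes) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit largesize follows the 4-character type.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # Size 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError(f"malformed or truncated box at offset {offset}")
        yield raw_type.decode("latin-1"), data[offset:offset + size]
        offset += size


def emsg_boxes_before_media_segment(data: bytes) -> List[bytes]:
    """Collect emsg boxes that precede the first moof (media segment) box."""
    found: List[bytes] = []
    for box_type, box in iter_top_level_boxes(data):
        if box_type == "moof":
            break
        if box_type == "emsg":
            found.append(box)
    return found
```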

zacharycava: The byte stream format covers how to handle timeline processing (relative vs. fixed time addressing), so this would be the right area to address appending emsg boxes.

cpn: Is the timing of the emsg boxes dependent on any of the info from the MPD? Or can it be derived from the segment?

zacharycava: You can derive it from the segment entirely.
… Everything should be in the track itself.

cpn: In this, we would be looking at emsg boxes, v0 and v1 variants.
… There has been a discussion, in MPEG I believe, about carrying separate tracks. At this stage, we're considering only top-level boxes.
… We would consider tracks as that develops.
… Am I right in thinking that we can refer to the info in the DASH-IF processing document, as that will define how the timing is derived?

zacharycava: Yes, that's great. The document details the timing for both variants.
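For reference, a minimal Python sketch of parsing the two emsg variants, assuming the DASHEventMessageBox layout from ISO/IEC 23009-1: v0 carries scheme_id_uri and value first, followed by a 32-bit presentation_time_delta, while v1 carries a 64-bit presentation_time and puts the strings after the numeric fields. The class and function names are hypothetical, and 64-bit largesize headers are not handled.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EmsgBox:
    version: int
    scheme_id_uri: str
    value: str
    timescale: int
    presentation_time_delta: Optional[int]  # v0 only
    presentation_time: Optional[int]        # v1 only
    event_duration: int
    id: int
    message_data: bytes


def _read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated UTF-8 string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_emsg(box: bytes) -> EmsgBox:
    """Parse a complete emsg box (32-bit size header assumed)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"emsg":
        raise ValueError("not an emsg box")
    version = box[8]  # FullBox: 8-bit version ...
    pos = 12          # ... followed by 24-bit flags
    delta = presentation_time = None
    if version == 0:
        scheme, pos = _read_cstring(box, pos)
        value, pos = _read_cstring(box, pos)
        timescale, delta, duration, event_id = struct.unpack_from(">IIII", box, pos)
        pos += 16
    elif version == 1:
        timescale, presentation_time, duration, event_id = struct.unpack_from(">IQII", box, pos)
        pos += 20
        scheme, pos = _read_cstring(box, pos)
        value, pos = _read_cstring(box, pos)
    else:
        raise ValueError(f"unsupported emsg version {version}")
    return EmsgBox(version, scheme, value, timescale, delta, presentation_time,
                   duration, event_id, box[pos:size])
```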

cpn: I'm proposing not to suggest edits to this document yet. It's more in the DataCue spec itself that we could write things down and, assuming we have implementer support, these changes could then be moved into the relevant documents.
… The next part I wanted to look at is some of the MSE algorithms.
… Looking at appendBuffer, it's not clear to me that this actually needs to change. It may be that this is covered by the changes in the byte stream format document.
… What is interesting here is that we have the remove algorithm.
… If you call remove on a SourceBuffer with a time range, do we expect that also to remove the corresponding emsg boxes?

zacharycava: If it removes the totality of the range in which the event was active, then it may. It relates to the other question of the dual dispatch mode.
… That seems trickier if it's on receive.

cpn: If it's on receive, it may have been raised to the application already.
… onstart maps to the "time marches on" algorithm.
… onreceive, it's not clear. We may have to signal it directly to the application.
… We've heard from implementors that it may be hard to support interoperably though.
… I'm more focusing on onstart for now.

zacharycava: I think that makes sense.

Iraj: In terms of timing: when a segment is appended to the MSE buffer, the MSE parser works in media time before appending it. Is that the correct assumption?

cpn: I believe so.

Iraj: If that's correct, onreceive will be triggered when top-level parsing happens, before the segment is appended to the buffer. Removing time ranges from the buffer would not change anything for onreceive.

cpn: OK.
… There's still the question of removing the events. It may be complicated because the events may overlap with the range being removed.
… How do we re-align times? Presumably everything has to shift in order to keep the alignment.

zacharycava: Are you asking the question of whether removal of the time range should correspond to the start time of the time range, or the start time of the emsg box?
… Is there a precedent with in-band tracks?

eric: I think that removal of cues is optional.

Iraj: The start of the WebVTT captions is aligned with the samples. So such cases never arise.
… I don't know the internal logic of MSE.
… Let's say some TimeRanges are buffered in MSE and the player seeks to an earlier time. How does it know that it needs to fetch a segment again?

zacharycava: The usage of the segment is not important. We're tracking the act of play which will influence the events at the play head.
… The desired behavior would be that you don't remove events that overlap with the range being removed.

Iraj: So you have to maintain the position of the events in the segments.

zacharycava: The segments are not important because you don't remove based on segments but on time ranges.
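One way to express the behavior zacharycava describes, as a hypothetical policy rather than anything MSE specifies today: on SourceBuffer.remove(start, end), drop an event only if its whole active interval falls inside the removed range, and keep events that merely overlap it. A minimal Python sketch, with hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventCue:
    start: float  # seconds on the media timeline
    end: float


def remove_events(cues: List[EventCue], remove_start: float, remove_end: float) -> List[EventCue]:
    """Return the cues that survive a removal of [remove_start, remove_end].

    Hypothetical policy: a cue is removed only if its whole active interval
    lies inside the removed range; cues that only overlap the range are kept.
    Segment boundaries play no part, only time ranges.
    """
    return [
        cue for cue in cues
        if not (remove_start <= cue.start and cue.end <= remove_end)
    ]
```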

Iraj: Whenever the client does random access on a previous segment, the client is supposed to see the events again. If you just maintain the event's start time, then if you do a seek, you're not going to see that event again.

cpn: I think you would, because once they are parsed and added to the TextTrack, they become part of the TextTrack timeline. Any seeking that you do will correctly trigger the events again.

Iraj: That's true for onstart. What about onreceive?

cpn: The way I'm thinking about onreceive is that maybe they don't get placed on the timeline.

Iraj: They are still tied to a location in the timeline, but in terms of receiving. When you seek to an earlier time, they should re-trigger.

cpn: So, what we're saying is: all events get placed on the media timeline. The onstart events are placed at the time calculated from the emsg timing and the event start time, and the onreceive events are placed on the timeline at the time the box is parsed.

Iraj: Correct.
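The model summarized above can be sketched in Python as follows. This is a simplified, hypothetical illustration, not the HTML "time marches on" algorithm itself: onstart events sit at their computed start time, onreceive events sit at the media time where the box was parsed, and any event whose position is crossed during playback fires, so seeking back and playing again re-fires it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DispatchMode(Enum):
    ON_START = "onstart"
    ON_RECEIVE = "onreceive"


@dataclass
class PlacedEvent:
    event_id: int
    mode: DispatchMode
    time: float  # position on the media timeline, in seconds


def place_event(event_id: int, mode: DispatchMode,
                event_start: float, parse_time: float) -> PlacedEvent:
    """onstart events go at their computed start time; onreceive events go
    at the media time where the emsg box was parsed."""
    time = event_start if mode is DispatchMode.ON_START else parse_time
    return PlacedEvent(event_id, mode, time)


def events_to_fire(events: List[PlacedEvent],
                   previous_time: float, current_time: float) -> List[int]:
    """Return ids of events crossed while playing from previous_time to current_time.

    A backwards jump is treated as a seek and fires nothing; playback that
    resumes from the seek target crosses (and re-fires) the events again.
    """
    if current_time < previous_time:
        return []
    return [e.event_id for e in events if previous_time < e.time <= current_time]
```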

cpn: I can only restate the feedback that we've heard before: we may not be able to achieve onreceive events in an interoperable way. In WebKit, they don't have control over when parsing happens.

Iraj: So that relates to the discussion on whether the MSE buffer is a pre-decoded buffer or a decoded buffer.

cpn: Another area in MSE is timestampOffset in the SourceBuffer. It applies a timestamp offset to segments being appended.
… Is anybody aware of how the offset is used?

Iraj: The offset applies to all timelines. Everything happens at the offset times.

cpn: Then this would apply also to the event messages because you would apply the same offsets.

Iraj: Yes. The difference is that in v1, you also need to reference the media start time.

cpn: Where does it come from?

Iraj: It's the timestamp of the start of the track.
… When you instantiate the MSE buffer, I suspect that you have to pass that value.
… So the UA can track that time and apply the appropriate offset.

cpn: Is this covered in the DASH-IF events document?

Iraj: Yes.
… But it needs to be translated to MSE implementations.
… It remains purely in the media segment, not an MPD property.
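A hedged Python sketch of the v0/v1 timing discussed above, using only values available from the media segment. It assumes that the v0 presentation_time_delta is relative to the earliest presentation time of the segment carrying the box, that the v1 presentation_time is an absolute time in the box's timescale, and that SourceBuffer.timestampOffset is added to event times just as it is to coded frame timestamps. The track_start_time parameter is a hypothetical stand-in for the media start time Iraj mentions; the correct v1 mapping is still an open action item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmsgTiming:
    version: int
    timescale: int
    presentation_time_delta: Optional[int] = None  # v0
    presentation_time: Optional[int] = None        # v1


def event_start_on_buffer_timeline(
    emsg: EmsgTiming,
    segment_earliest_pt: float,     # earliest presentation time of the segment, seconds
    timestamp_offset: float = 0.0,  # SourceBuffer.timestampOffset, seconds
    track_start_time: float = 0.0,  # HYPOTHETICAL v1 reference, seconds
) -> float:
    if emsg.version == 0:
        # v0: delta relative to the segment that carries the box.
        media_time = segment_earliest_pt + emsg.presentation_time_delta / emsg.timescale
    elif emsg.version == 1:
        # v1: absolute time in the box's timescale, optionally rebased
        # against the hypothetical track start reference.
        media_time = emsg.presentation_time / emsg.timescale - track_start_time
    else:
        raise ValueError(f"unsupported emsg version {emsg.version}")
    # Assumption: the same offset MSE applies to coded frames applies here.
    return media_time + timestamp_offset
```

With both parameters left at 0.0, v0 events land at the segment's earliest presentation time plus the scaled delta, and v1 events at the scaled presentation_time.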

cpn: There's an action here to look at the DASH-IF document in more detail and then figure out the right mapping for the v0 and the v1 timing.

<kaz> Sourcing In-band Media Resource Tracks from Media Containers into HTML draft

cpn: Moving on to sourcing in-band tracks. We have this unofficial document, which is referred to from the HTML spec and MSE, and there's a section here that talks about ISOBMFF.
… I don't really know whether the specification accurately describes what is implemented in browsers, and could therefore become a more normative document, or whether it is more aspirational.
… I welcome feedback on the status of the implementation.
… This talks about the TextTrack id implementation. What we would need for DataCue events is a way of defining which track they appear on and what attributes that track has.
… We discussed before that there would be a single metadata track that would include all types of events, with event scheme information embedded at the cue level.

eric: FWIW, that's how it practically works today. We don't know beforehand whether there will be DataCues or not. When we see the first one, we create the track that will contain it and any subsequent DataCues.
… Back to your question on the sourcing doc, I think it is mostly aspirational. As far as I know, WebKit is the only engine supporting some in-band event tracks. I'll need to check whether the engine actually follows the provisions in that document.

cpn: That would be very useful. I'd like the document to reflect what browsers actually implement. If things are not supported, then it may not make sense to keep it around. Conversely, it could perhaps move to a more normative document.

Gary: We do use the dispatch type to look for ID3 tags in Safari, and we fake it in other cases. That said, we're not married to it and can update our implementation if it gets deprecated.

eric: I had argued against that attribute when it got added in the first place.

cpn: If we're merging different kinds of cues into a single track, perhaps the dispatch type does not make much sense in that context.
… It may be that we can just set it to a fixed value without needing to deprecate it.

cpn: The next step is to take the comments we've made today and draft some words that will go into the DataCue document to capture some of the considerations we've been going through.
… Eric, if you can look at the sourcing doc, that's great.
… Gary, it would also be good if you can look at video.js.


cpn: I could really use some help on the editorial side, so if anyone is interested, that would be gratefully received!

<kaz> [adjourned]

– DRAFT –