17004 – Define a timestamp offset mechanism

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	Media Source Extensions
Version:	unspecified
Hardware:	All (OS: All)

Importance:	P2 normal
Target Milestone:	---
Assignee:	Adrian Bateman [MSFT]
QA Contact:	HTML WG Bugzilla archive list

Reported:	2012-05-08 21:07 UTC by Aaron Colwell (c)
Modified:	2012-07-30 20:42 UTC
CC List:	8 users

The current spec text does not provide a mechanism to apply an offset to timestamps in media segments. This feature is primarily for ad-insertion and mashup use cases. It would allow web applications to easily append content into the middle of presentations without resorting to format parsing and rewriting to adjust the timestamps.

Method signature suggestions:

void sourceTimestampMapping(in double presentationTimestamp, in double segmentTimestamp)

-- or --

void sourceTimestampOffset(in double timestampOffset)

The method would specify the timestamp mapping to use for future media segments that get appended. If the UA is in the middle of parsing a media segment, it must defer applying the new offset until the end of the current segment. The mapping would be applied at append time before the data goes into the source buffer. This allows the source buffer to solely deal with presentation timestamps. 
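
As a rough illustration of the semantics outlined above, here is a minimal TypeScript sketch. It is a toy model rather than spec text: the class `OffsetSourceBuffer`, the `Frame` type, and the `endOfSegment` flag are all hypothetical, and the assumption that the two-argument mapping reduces to `presentationTimestamp - segmentTimestamp` is mine, not something stated in this bug.

```typescript
// Hypothetical model of the proposed behaviour; none of these names come from the spec.
interface Frame {
  timestamp: number; // timestamp carried inside the media segment, in seconds
}

class OffsetSourceBuffer {
  private activeOffset = 0; // offset applied to the segment currently being appended
  private pendingOffset: number | null = null; // offset waiting for the next segment boundary
  private midSegment = false;
  readonly buffered: number[] = []; // holds presentation timestamps only

  // Models sourceTimestampOffset(timestampOffset).
  setTimestampOffset(offset: number): void {
    if (this.midSegment) {
      this.pendingOffset = offset; // defer until the current segment ends
    } else {
      this.activeOffset = offset;
    }
  }

  // Models sourceTimestampMapping(presentationTimestamp, segmentTimestamp),
  // assuming it is equivalent to an offset of (presentation - segment).
  setTimestampMapping(presentationTimestamp: number, segmentTimestamp: number): void {
    this.setTimestampOffset(presentationTimestamp - segmentTimestamp);
  }

  // Appends part of a media segment; endOfSegment marks its final chunk.
  append(frames: Frame[], endOfSegment: boolean): void {
    this.midSegment = true;
    for (const frame of frames) {
      // The offset is applied here, before the data reaches the buffer.
      this.buffered.push(frame.timestamp + this.activeOffset);
    }
    if (endOfSegment) {
      this.midSegment = false;
      if (this.pendingOffset !== null) {
        this.activeOffset = this.pendingOffset;
        this.pendingOffset = null;
      }
    }
  }
}

const demo = new OffsetSourceBuffer();
demo.append([{ timestamp: 0 }, { timestamp: 1 }], false); // segment still open
demo.setTimestampOffset(30); // arrives mid-segment, so it is deferred
demo.append([{ timestamp: 2 }], true); // still uses offset 0
demo.append([{ timestamp: 0 }], true); // new segment, offset 30 now applies
// demo.buffered is now [0, 1, 2, 30]
```

Because the offset is folded in at append time, later offset changes never touch timestamps that are already buffered.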

This is just an initial outline for the feature. There are likely other semantics that need to be nailed down as well.

Comment 1 Kevin Streeter 2012-05-17 20:32:45 UTC

Here is a list of some of the issues/use cases I feel we should verify as part of adding this functionality.  They are listed in no particular order:

* Content Initialization: not only is the timeline discontinuous, but the new media will likely require initialization (like H264 SPS/PPS).

* Seeking: the effective mapping for the stream timeline needs to be completely deterministic (from the app script's perspective), so that a sane view of the total stream timeline can be computed for things like seeking.

* Tracking: this is related to the seeking requirement...not only does the script need to be able to compute the timeline for seeking, but it also needs to be able to infer where in the sequence the current playhead is ("tracking" the playhead).

* Content Protection: this behavior will likely impact the content protection proposal, in that we should be able to change DRM key/license context at the discontinuous content boundaries.  This may mean a new key/license, or a shift from protected to unprotected (and back to protected).

This is everything I can think of now :)

(In reply to comment #1)
> Here is a list of some of the issues/use cases I feel we should verify as part
> of adding this functionality.  They are listed in no particular order:
> 
> * Content Initialization: not only is the timeline discontinuous, but the new
> media will likely require initialization (like H264 SPS/PPS).

I think this can be handled by appending a new initialization segment before appending the data with the different parameters. Is this not sufficient?

> 
> * Seeking: the effective mapping for the stream timeline needs to be completely
> deterministic (from the app script's perspective), so that a sane view of the
> total stream timeline can be computed for things like seeking.

I believe the proposed solution fulfills this requirement. The intent was to apply the mapping at append time so that the source buffer is only aware of presentation timestamps. Future changes to the mapping only affect future appends and don't affect previous data.

> 
> * Tracking: this is related to the seeking requirement...not only does the
> script need to be able to compute the timeline for seeking, but it also needs
> to be able to infer where in the sequence the current playhead is
> ("tracking" the playhead).

Is currentTime on HTMLMediaElement not sufficient? I'm assuming that the script is keeping track of where it inserted a segment and can use currentTime to determine whether the current playback position is within that segment.
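
A minimal sketch of that bookkeeping, assuming the script records each inserted segment as a presentation-time range when it appends it. The `InsertedRange` type and the `rangeAt` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical record of a segment the script inserted, in presentation time (seconds).
interface InsertedRange {
  label: string;
  start: number; // inclusive
  end: number; // exclusive
}

// Returns the inserted range that contains currentTime, or undefined if there is none.
function rangeAt(ranges: InsertedRange[], currentTime: number): InsertedRange | undefined {
  for (const range of ranges) {
    if (currentTime >= range.start && currentTime < range.end) {
      return range;
    }
  }
  return undefined;
}

const inserted: InsertedRange[] = [{ label: "ad-1", start: 30, end: 60 }];
const active = rangeAt(inserted, 45); // the "ad-1" range
```

In a page, the second argument would be `video.currentTime`, read for example from a `timeupdate` handler on the media element.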


> * Content Protection: this behavior will likely impact the content protection
> proposal, in that we should be able to change DRM key/license context at the
> discontinuous content boundaries.  This may mean a new key/license, or a shift
> from protected to unprotected (and back to protected).

I believe this is supported with the current spec language. I'd expect a new initialization segment to be appended at the encrypted->decrypted & decrypted->encrypted boundaries.

> 
> This is everything I can think of now :)

Thanks for your comments. :)

Comment 3 Kevin Streeter 2012-05-21 23:36:59 UTC

Agreed that some of what is described here is already allowed in the proposed API.  I called them out only because they also put requirements on the user agent to interpret certain sequences of API calls in a semantically specific way.  For example, it is important that the user agent play back the media continuously and seamlessly even if a new initialization segment is injected.  I feel like we should call out the appropriate semantics so that user agents implement the behavior consistently, since it would be easy to create a simplified implementation that didn't handle these cases well.

Regarding "tracking", I am not 100% sure that currentTime is sufficient for the task. One issue I see is that it will be challenging for the application to know when to check currentTime.  Polling is an option, but polling always leads to a tradeoff between responsiveness and efficiency.  It would be nice if the app were notified as timeline boundaries were crossed, or if there were a mechanism to introduce app-specified markers that would lead to a notification as they were hit during stream play.

(In reply to comment #3)
> Agreed that some of what is described here is already allowed in the proposed
> API.  I called them out only because they also put requirements on the user
> agent to interpret certain sequences of API calls in a semantically specific
> way.  For example, it is important that the user agent play back the media
> continuously and seamlessly even if a new initialization segment is injected. 
> I feel like we should call out the appropriate semantics so that user agents
> implement the behavior consistently, since it would be easy to create a
> simplified implementation that didn't handle these cases well.

Ok. I was just giving examples to make sure I understood what you were talking about. I'll try to come up with some better text to convey that new initialization segments should be handled as seamlessly as possible.

> 
> Regarding "tracking", I am not 100% sure that currentTime is sufficient for
> the task. One issue I see is that it will be challenging for the application to
> know when to check currentTime.  Polling is an option, but polling always leads
> to a tradeoff between responsiveness and efficiency.  It would be nice if the
> app would be notified as timeline boundaries were crossed, or maybe there was a
> mechanism to introduce app-specified markers that would lead to a notification
> as they were hit during stream play.

I wonder if it would be better to handle this with a metadata TextTrack instead of something explicitly added to this API. You could add TextTrackCues when you append segments of interest. You could hook into the enter & exit events to get notified when the timeline enters & exits the cue time range. Would this be sufficient?

Comment 5 Kevin Streeter 2012-05-24 18:13:22 UTC

(In reply to comment #4)

> I wonder if it would be better to handle this with a metadata TextTrack instead
> of something explicitly added to this API. You could add TextTrackCues when you
> append segments of interest. You could hook into the enter & exit events to get
> notified when the timeline enters & exits the cue time range. Would this be
> sufficient?

Using the text track is an interesting idea.  It would be synchronized properly, and I definitely appreciate the benefit of using an existing construct.  A couple of questions come to mind:

* How easy is it for the application script to generate a text track on the fly, in order to insert these events dynamically?

* What is the format of the text track metadata?

* Can there be multiple text tracks, so that the generated one doesn't interfere with any "real" one that is a part of the content?

Comment 6 Silvia Pfeiffer 2012-05-28 08:42:19 UTC

(In reply to comment #5)
> * How easy is it for the application script to generate a text track on the
> fly, in order to insert these events dynamically?

Very easy. We have an explicit JavaScript API to create text tracks and cues.
See http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-media-addtexttrack .


> * What is the format of the text track metadata?

For @kind=metadata tracks, the format is free-form.

 
> * Can there be multiple text tracks, so that the generated one doesn't
> interfere with any "real" one that is a part of the content?

You can have as many text tracks as you like.

Comment 7 Kevin Streeter 2012-06-17 17:09:54 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > * How easy is it for the application script to generate a text track on the
> > fly, in order to insert these events dynamically?
> 
> Very easy. We have an explicit JavaScript API to create text tracks and cues.
> See
> http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-media-addtexttrack
> .
> 
> 
> > * What is the format of the text track metadata?
> 
> For @kind=metadata tracks, the format is free-form.
> 
> 
> > * Can there be multiple text tracks, so that the generated one doesn't
> > interfere with any "real" one that is a part of the content?
> 
> You can have as many text tracks as you like.

Thanks for this info, Silvia. Based on what you have described, I feel this could adequately solve the problem.  The script would create a text track just for the purpose of signaling interesting time boundaries.  It would then inject text cues into this track with a presentation time that matches the splice point, and the cue events would be dispatched when the splice point was reached.
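
A rough sketch of that approach follows. To keep it runnable outside a browser it is written against small structural types (`CueLike`, `TrackLike`, `CueFactory`) that are assumptions of this example; `markSplice` is likewise a hypothetical helper, not part of any API. In a page, the track would come from `video.addTextTrack("metadata")`, and in current browsers the concrete cue type is `VTTCue`.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings (assumed shapes).
interface CueLike {
  onenter: ((ev: any) => any) | null;
  onexit: ((ev: any) => any) | null;
}
interface TrackLike {
  addCue(cue: CueLike): void;
}
type CueFactory = (start: number, end: number, text: string) => CueLike;

// Adds one cue spanning [start, end) and reports when playback enters or leaves it.
function markSplice(
  track: TrackLike,
  makeCue: CueFactory,
  start: number,
  end: number,
  label: string,
  onEnter: (label: string) => void,
  onExit: (label: string) => void
): CueLike {
  const cue = makeCue(start, end, label);
  cue.onenter = () => onEnter(label);
  cue.onexit = () => onExit(label);
  track.addCue(cue);
  return cue;
}

// Browser usage (not executed here):
//   const track = video.addTextTrack("metadata");
//   markSplice(track, (s, e, t) => new VTTCue(s, e, t), 30, 60, "ad-1",
//     (l) => console.log("entered", l), (l) => console.log("left", l));
```

Keeping these cues on a dedicated metadata track means they stay separate from any caption or subtitle tracks that belong to the content.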

Comment 8 suzie.hyun 2012-06-18 21:55:58 UTC

(In reply to comment #2)

> (In reply to comment #1)
> > Here is a list of some of the issues/use cases I feel we should verify as part
> > of adding this functionality.  They are listed in no particular order:
> > 
> > * Content Initialization: not only is the timeline discontinuous, but the new
> > media will likely require initialization (like H264 SPS/PPS).
> I think this can be handled by appending a new initialization segment before
> appending the data with the different parameters. Is this not sufficient?
> > 
> > * Seeking: the effective mapping for the stream timeline needs to be completely
> > deterministic (from the app script's perspective), so that a sane view of the
> > total stream timeline can be computed for things like seeking.
> I believe the proposed solution fulfills this requirement. The intent was to
> apply the mapping at append time so that the source buffer is only aware of
> presentation timestamps. Future changes to the mapping only affect future
> appends and don't affect previous data.
> > 
> > * Tracking: this is related to the seeking requirement...not only does the
> > script need to be able to compute the timeline for seeking, but it also needs
> > to be able to infer where in the sequence the current playhead is
> > ("tracking" the playhead).
> Is currentTime on HTMLMediaElement not sufficient? I'm assuming that the script
> is keeping track of where it inserted a segment and can use currentTime to
> determine whether the current playback position is within that segment.
> > * Content Protection: this behavior will likely impact the content protection
> > proposal, in that we should be able to change DRM key/license context at the
> > discontinuous content boundaries.  This may mean a new key/license, or a shift
> > from protected to unprotected (and back to protected).
> I believe this is supported with the current spec language. I'd expect a new
> initialization segment to be appended at the encrypted->decrypted &
> decrypted->encrypted boundaries.
> > 

Which spec language are you referring to for content protection?
Other than key delivery, does this spec assume that buffers are intact?

> > This is everything I can think of now :)
> Thanks for your comments. :)

Spec changes committed.

Revision: 
http://dvcs.w3.org/hg/html-media/rev/087ea42f59c8

Mailing list discussions: 
http://lists.w3.org/Archives/Public/public-html-media/2012Jul/0035.html
http://lists.w3.org/Archives/Public/public-html-media/2012Jun/0080.html