25005 – Add MPEG-2 TS metadata text track cue generation guideline

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 25005 - Add MPEG-2 TS metadata text track cue generation guideline

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	HTML WG Bugzilla archive list

Reported:	2014-03-11 16:03 UTC by Bob Lund
Modified:	2014-06-01 04:12 UTC
CC List:	8 users

Description Bob Lund 2014-03-11 16:03:07 UTC

Guidelines for generating MPEG-2 TS metadata text track cues have been proposed in the in-band tracks community group [1]. [2] shows a new table to be added in [3].

[1] https://www.w3.org/community/inbandtracks/wiki/Main_Page
[2] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Guidelines_for_creating_metadata_text_track_cues
[3] http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues

Comment 1 Silvia Pfeiffer 2014-03-17 08:20:16 UTC

Bob, I don't really understand what the table in [2] means.

I don't follow how a cue is created in DASH or MPEG-4.

Even for MPEG-2 the statement is strange. What is the time that the cue is created? Where can I find that in the file? Would it be better to say something like "Every PSI section is encapsulated in a DataCue with the cue's start time provided by the activation time of that PSI section and the cue's end time by the deactivation time of that PSI or by a repeat PSI packet."

Does that make sense?

Comment 2 Bob Lund 2014-03-17 16:58:59 UTC

(In reply to Silvia Pfeiffer from comment #1)
> Bob, I don't really understand what the table in [2] means.
> 
> I don't follow how a cue is created in DASH or MPEG-4.

I don't think there is a general rule - it's dependent on the MIME type. For example, WebVTT (text/vtt) has a cue format, TTML (application/ttml+xml) has another cue format. Don't the specs for these formats define how a UA would create a cue?
> 
> Even for MPEG-2 the statement is strange. What is the time that the cue is
> created? Where can I find that in the file? Would it be better to say
> something like "Every PSI section is encapsulated in a DataCue with the
> cue's start time provided by the activation time of that PSI section and the
> cue's end time by the deactivation time of that PSI or by a repeat PSI
> packet."
> 
> Does that make sense?

There is no explicit presentation/activation and deactivation time - the cue should fire as soon as the UA can create it, which implies the UA has received all TS packets comprising the section data; hence, the proposed language. endTime is not defined, so it could be the same as startTime, a really big number, or an indication of infinity/no endTime.

Comment 3 Silvia Pfeiffer 2014-03-18 10:58:25 UTC

(In reply to Bob Lund from comment #2)
> (In reply to Silvia Pfeiffer from comment #1)
> > Bob, I don't really understand what the table in [2] means.
> > 
> > I don't follow how a cue is created in DASH or MPEG-4.
> 
> I don't think there is a general rule - it's dependent on the MIME. For
> example, WebVTT (text/vtt) has a cue format, TTML (application/ttml+xml) has
> another cue format. Don't the specs for these formats define how a UA would
> create a cue?

Correct. So, nothing needs to be added to the HTML spec for these.


> > Even for MPEG-2 the statement is strange. What is the time that the cue is
> > created? Where can I find that in the file? Would it be better to say
> > something like "Every PSI section is encapsulated in a DataCue with the
> > cue's start time provided by the activation time of that PSI section and the
> > cue's end time by the deactivation time of that PSI or by a repeat PSI
> > packet."
> > 
> > Does that make sense?
> 
> There is no explicit presentation/activation and deactivation time - the cue
> should fire as soon as the UA can create it, which implies the UA has
> received all TS packets comprising the section data; hence, the proposed
> language. endTime is not defined so could be same as start, a really big
> number or an indication of infinity/no endTime.

Cue start and end time are not about where something happens to be encapsulated in the media file, but have a semantic meaning. The cue start time means that at that time the data in the cue becomes active and end time when it becomes inactive. IIUC the PSI has an implied activation time at the time in the media stream where it appears and its value is implied to be correct until another PSI arrives. So, to represent it correctly in a cue, these would need to be start and end times.

Comment 4 Bob Lund 2014-03-18 16:44:44 UTC

(In reply to Silvia Pfeiffer from comment #3)
> (In reply to Bob Lund from comment #2)
> > (In reply to Silvia Pfeiffer from comment #1)
> > > Bob, I don't really understand what the table in [2] means.
> > > 
> > > I don't follow how a cue is created in DASH or MPEG-4.
> > 
> > I don't think there is a general rule - it's dependent on the MIME. For
> > example, WebVTT (text/vtt) has a cue format, TTML (application/ttml+xml) has
> > another cue format. Don't the specs for these formats define how a UA would
> > create a cue?
> 
> Correct. So, nothing needs to be added to the HTML spec for these.
> 
> 
> > > Even for MPEG-2 the statement is strange. What is the time that the cue is
> > > created? Where can I find that in the file? Would it be better to say
> > > something like "Every PSI section is encapsulated in a DataCue with the
> > > cue's start time provided by the activation time of that PSI section and the
> > > cue's end time by the deactivation time of that PSI or by a repeat PSI
> > > packet."
> > > 
> > > Does that make sense?
> > 
> > There is no explicit presentation/activation and deactivation time - the cue
> > should fire as soon as the UA can create it, which implies the UA has
> > received all TS packets comprising the section data; hence, the proposed
> > language. endTime is not defined so could be same as start, a really big
> > number or an indication of infinity/no endTime.
> 
> Cue start and end time are not about where something happens to be
> encapsulated in the media file, but have a semantic meaning. The cue start
> time means that at that time the data in the cue becomes active and end time
> when it becomes inactive.
> IIUC the PSI has an implied activation time at the
> time in the media stream where it appears

These streams are not actually PSI, but the principle you outline is correct, and it is what I tried to convey. So, startTime = 'time in the media stream where it appears'.

> and its value is implied to be
> correct until another PSI arrives. So, to represent it correctly in a cue,
> these would need to be start and end times.

The endTime then wouldn't be known until the next private section arrives. This raises the question of what the endTime should be set to in the intervening time and requires the UA to update the endTime of the prior cue.

A preferred alternative would be to set endTime to startTime.
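
To make the bookkeeping cost concrete, here is a minimal sketch of the "close the previous cue when the next section arrives" approach discussed above. The `SectionCue` record and `onSectionComplete` function are hypothetical names for illustration only; they are not part of the HTML TextTrackCue interface or of any UA implementation, and the open-ended endTime is modelled as Infinity purely as an assumption.

```typescript
// Hypothetical cue record for illustration; not the HTML TextTrackCue interface.
interface SectionCue {
  startTime: number; // seconds on the media timeline where the section appears
  endTime: number;   // Infinity while the next section has not arrived yet
  data: Uint8Array;  // raw private-section bytes
}

// Appends a cue for a newly completed section and closes the previous open cue.
function onSectionComplete(
  cueList: SectionCue[],
  arrivalTime: number,
  data: Uint8Array
): void {
  const previous = cueList[cueList.length - 1];
  if (previous !== undefined && previous.endTime === Infinity) {
    // This is the extra step the UA would need: revising an existing cue.
    previous.endTime = arrivalTime;
  }
  cueList.push({ startTime: arrivalTime, endTime: Infinity, data });
}

const sectionCues: SectionCue[] = [];
onSectionComplete(sectionCues, 1.0, new Uint8Array([0x01]));
onSectionComplete(sectionCues, 4.5, new Uint8Array([0x02]));
// sectionCues[0] now spans 1.0 to 4.5; sectionCues[1] stays open-ended.
```

The sketch shows why this option requires the UA to mutate a cue it has already exposed, which is the concern that motivates the simpler alternative.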

Comment 5 Silvia Pfeiffer 2014-03-24 04:09:57 UTC

(In reply to Bob Lund from comment #4)
> (In reply to Silvia Pfeiffer from comment #3)
> >
> > > > Even for MPEG-2 the statement is strange. What is the time that the cue is
> > > > created? Where can I find that in the file? Would it be better to say
> > > > something like "Every PSI section is encapsulated in a DataCue with the
> > > > cue's start time provided by the activation time of that PSI section and the
> > > > cue's end time by the deactivation time of that PSI or by a repeat PSI
> > > > packet."
> > > > 
> > > > Does that make sense?
> > > 
> > > There is no explicit presentation/activation and deactivation time - the cue
> > > should fire as soon as the UA can create it, which implies the UA has
> > > received all TS packets comprising the section data; hence, the proposed
> > > language. endTime is not defined so could be same as start, a really big
> > > number or an indication of infinity/no endTime.
> > 
> > Cue start and end time are not about where something happens to be
> > encapsulated in the media file, but have a semantic meaning. The cue start
> > time means that at that time the data in the cue becomes active and end time
> > when it becomes inactive.
> > IIUC the PSI has an implied activation time at the
> > time in the media stream where it appears
> 
> These streams are not PSI actually but the principle you outline is
> correct, and what I tried to convey. So, startTime = 'time in the media
> stream where it appears'.
> 
> > and its value is implied to be
> > correct until another PSI arrives. So, to represent it correctly in a cue,
> > these would need to be start and end times.
> 
> The endTime then wouldn't be known until the next private section arrives.
> This raises the question of what the endTime should be set to in the
> intervening time and requires the UA to update the endTime of the prior cue.
>
> A preferred alternative would be to set endTime to startTime.

That would create a cue of duration 0, which would never end up in the list of "current cues", but always only in the list of "missed cues" (see the "time marches on" algorithm).

In any case, do you still want a sentence on how to create cues from PSI information in MPEG-2 in the spec?
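
The zero-duration concern can be illustrated with a simplified model of the two tests used by the "time marches on" algorithm. This is only a sketch under assumptions: `isCurrent` and `isMissed` are hypothetical helper names, and the real algorithm has additional steps (event ordering, seeking, pause-on-exit) that are omitted here.

```typescript
interface CueInterval {
  startTime: number;
  endTime: number;
}

// Current cue: start time <= playback position and end time > playback position.
function isCurrent(cue: CueInterval, position: number): boolean {
  return cue.startTime <= position && cue.endTime > position;
}

// Missed cue (simplified): started at or after the previous run and ended at
// or before the current position, so it was never observed as current.
function isMissed(cue: CueInterval, lastTime: number, position: number): boolean {
  return cue.startTime >= lastTime && cue.endTime <= position;
}

const zeroLengthCue: CueInterval = { startTime: 2.0, endTime: 2.0 };
const shortCue: CueInterval = { startTime: 2.0, endTime: 2.25 };

// No position satisfies start <= position < end when start equals end.
const zeroIsCurrentAtStart = isCurrent(zeroLengthCue, 2.0);
const zeroIsMissed = isMissed(zeroLengthCue, 1.9, 2.1);
const shortIsCurrent = isCurrent(shortCue, 2.1);
```

Under this model the zero-length cue is never current at any position, while a cue with a non-zero duration can be, provided the algorithm runs at a position inside its interval.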

Comment 6 Bob Lund 2014-03-24 16:39:09 UTC

(In reply to Silvia Pfeiffer from comment #5)
> (In reply to Bob Lund from comment #4)
> > (In reply to Silvia Pfeiffer from comment #3)
> > >
> > > > > Even for MPEG-2 the statement is strange. What is the time that the cue is
> > > > > created? Where can I find that in the file? Would it be better to say
> > > > > something like "Every PSI section is encapsulated in a DataCue with the
> > > > > cue's start time provided by the activation time of that PSI section and the
> > > > > cue's end time by the deactivation time of that PSI or by a repeat PSI
> > > > > packet."
> > > > > 
> > > > > Does that make sense?
> > > > 
> > > > There is no explicit presentation/activation and deactivation time - the cue
> > > > should fire as soon as the UA can create it, which implies the UA has
> > > > received all TS packets comprising the section data; hence, the proposed
> > > > language. endTime is not defined so could be same as start, a really big
> > > > number or an indication of infinity/no endTime.
> > > 
> > > Cue start and end time are not about where something happens to be
> > > encapsulated in the media file, but have a semantic meaning. The cue start
> > > time means that at that time the data in the cue becomes active and end time
> > > when it becomes inactive.
> > > IIUC the PSI has an implied activation time at the
> > > time in the media stream where it appears
> > 
> > These streams are not PSI actually but the principle you outline is
> > correct, and what I tried to convey. So, startTime = 'time in the media
> > stream where it appears'.
> > 
> > > and its value is implied to be
> > > correct until another PSI arrives. So, to represent it correctly in a cue,
> > > these would need to be start and end times.
> > 
> > The endTime then wouldn't be known until the next private section arrives.
> > This raises the question of what the endTime should be set to in the
> > intervening time and requires the UA to update the endTime of the prior cue.
> >
> > A preferred alternative would be to set endTime to startTime.
> 
> That would create a cue of duration 0, which would never end up in the list
> of "current cues", but always only in the list of "missed cues" (see the
> "time marches on" algorithm).
> 
> In any case, do you still want a sentence on how to create cues from PSI
> information in MPEG-2 in the spec?

Yes. What about endTime = startTime + 250msec? It really doesn't matter what the endTime is. The application only needs to know the startTime.

Comment 7 Silvia Pfeiffer 2014-03-24 19:55:45 UTC

(In reply to Bob Lund from comment #6)
> 
> Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> the endTime is. The application only needs to know the startTime.

Sure, that works.

Comment 8 Brendan Long 2014-03-24 20:24:34 UTC

(In reply to Silvia Pfeiffer from comment #7)
> (In reply to Bob Lund from comment #6)
> > 
> > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > the endTime is. The application only needs to know the startTime.
> 
> Sure, that works.

What happens if JavaScript ties up execution for 250+ ms though? Wouldn't we have the same problem, where the cue gets missed?

Could we fix this problem at the source by adding a list of changed cues to the cuechange event?

interface CueChangeEvent : Event {
    attribute TextTrackCue[] cues;
};
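
The 250 ms concern can be sketched with a toy model. It assumes, purely for illustration, that cue processing only happens at a discrete list of times and that a long-running script removes the runs that would have fallen inside the stall; `cueObserved` and the sample times are hypothetical and do not describe how any particular UA schedules its cue processing.

```typescript
// Returns true if any cue-processing run lands inside [startTime, endTime).
function cueObserved(
  startTime: number,
  endTime: number,
  runTimes: number[]
): boolean {
  return runTimes.some((t) => startTime <= t && t < endTime);
}

const cueStart = 10.0;
const cueEnd = cueStart + 0.25; // the proposed 250 ms window

// Runs roughly every 200 ms: one of them falls inside the window.
const regularRuns = [9.9, 10.1, 10.3];
// A 400 ms stall removes the run at 10.1, so no run sees the cue as active.
const stalledRuns = [9.9, 10.3];

const seenNormally = cueObserved(cueStart, cueEnd, regularRuns);
const seenWhenStalled = cueObserved(cueStart, cueEnd, stalledRuns);
```

In this model a fixed window only reduces the chance of a miss; it does not remove it, which is the motivation for reporting the changed cues with the event itself.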

Comment 9 Silvia Pfeiffer 2014-03-24 20:41:05 UTC

(In reply to Brendan Long from comment #8)
> (In reply to Silvia Pfeiffer from comment #7)
> > (In reply to Bob Lund from comment #6)
> > > 
> > > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > > the endTime is. The application only needs to know the startTime.
> > 
> > Sure, that works.
> 
> What happens if JavaScript ties up execution for 250+ ms though? Wouldn't we
> have the same problem, where the cue gets missed?

Sure - cues can always get missed.

 
> Could we fix this problem at the source by adding a list of changed cues to
> the cuechange event?
> 
> interface CueChangeEvent : Event {
>     attribute TextTrackCue[] cues;
> };

That was the point of the discussion in bug #24161 - see particularly the discussion in bug #24382

Comment 10 Bob Lund 2014-03-27 21:29:58 UTC

(In reply to Silvia Pfeiffer from comment #7)
> (In reply to Bob Lund from comment #6)
> > 
> > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > the endTime is. The application only needs to know the startTime.
> 
> Sure, that works.

As discussed in http://lists.w3.org/Archives/Public/public-inbandtracks/2014Mar/0079.html, the proposal is now

- startTime = 0
- endTime = video.currentTime equivalent of video frame PTS where cue is in the media resource
- pause_on_exit = false.
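
As a rough illustration of the proposal above, the sketch below converts a PTS into a media-timeline time and fills in the three cue fields. It relies on two known properties of MPEG-2 PTS values (a 90 kHz clock and a 33-bit counter) and on one assumption that is not stated in this bug: that media time 0 corresponds to the first PTS in the resource. `ptsToMediaTime`, `makeProposedCue` and `ProposedCue` are hypothetical names, and `pauseOnExit` is used here as the attribute name for what the list calls pause_on_exit.

```typescript
const PTS_CLOCK_HZ = 90000; // MPEG-2 PTS ticks per second
const PTS_WRAP = 2 ** 33;   // PTS is a 33-bit counter and wraps around

// Assumption: media time 0 corresponds to firstPts.
function ptsToMediaTime(pts: number, firstPts: number): number {
  // Adding PTS_WRAP before the modulo keeps the result non-negative on wrap.
  const elapsedTicks = (pts - firstPts + PTS_WRAP) % PTS_WRAP;
  return elapsedTicks / PTS_CLOCK_HZ;
}

// Hypothetical record holding only the fields named in the proposal.
interface ProposedCue {
  startTime: number;
  endTime: number;
  pauseOnExit: boolean;
}

function makeProposedCue(pts: number, firstPts: number): ProposedCue {
  return {
    startTime: 0,
    endTime: ptsToMediaTime(pts, firstPts),
    pauseOnExit: false,
  };
}

// A section carried alongside a frame 450000 ticks (5 seconds) after the first PTS.
const exampleCue = makeProposedCue(900000, 450000);
```

With startTime fixed at 0, the cue is active from the beginning of the resource until the point where the data appears, so the application learns the relevant time from endTime rather than startTime.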

Comment 11 Brendan Long 2014-04-04 15:22:44 UTC

Er sorry I think I just accidentally added people to the CC list on this. I was looking at the wrong bug :\

Comment 12 Silvia Pfeiffer 2014-04-08 01:33:24 UTC

(In reply to Bob Lund from comment #10)
> (In reply to Silvia Pfeiffer from comment #7)
> > (In reply to Bob Lund from comment #6)
> > > 
> > > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > > the endTime is. The application only needs to know the startTime.
> > 
> > Sure, that works.
> 
> As discussed in
> http://lists.w3.org/Archives/Public/public-inbandtracks/2014Mar/0079.html,
> the proposal is now
> 
> - startTime = 0
> - endTime = video.currentTime equivalent of video frame PTS where cue is in
> the media resource
> - pause_on_exit = false.

I am confused: where do you want me to make that change?

Comment 13 Bob Lund 2014-04-08 22:38:39 UTC

(In reply to Silvia Pfeiffer from comment #12)
> (In reply to Bob Lund from comment #10)
> > (In reply to Silvia Pfeiffer from comment #7)
> > > (In reply to Bob Lund from comment #6)
> > > > 
> > > > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > > > the endTime is. The application only needs to know the startTime.
> > > 
> > > Sure, that works.
> > 
> > As discussed in
> > http://lists.w3.org/Archives/Public/public-inbandtracks/2014Mar/0079.html,
> > the proposal is now
> > 
> > - startTime = 0
> > - endTime = video.currentTime equivalent of video frame PTS where cue is in
> > the media resource
> > - pause_on_exit = false.
> 
> I am confused: where do you want me to make that change?

The bug proposed the table [1] in the existing guidelines for creating cues section [2]. This would change if all of this were moved to another spec.

[1] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Guidelines_for_creating_metadata_text_track_cues
[2] http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues

Comment 14 Silvia Pfeiffer 2014-05-12 07:16:45 UTC

(In reply to Bob Lund from comment #13)
> (In reply to Silvia Pfeiffer from comment #12)
> > (In reply to Bob Lund from comment #10)
> > > (In reply to Silvia Pfeiffer from comment #7)
> > > > (In reply to Bob Lund from comment #6)
> > > > > 
> > > > > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > > > > the endTime is. The application only needs to know the startTime.
> > > > 
> > > > Sure, that works.
> > > 
> > > As discussed in
> > > http://lists.w3.org/Archives/Public/public-inbandtracks/2014Mar/0079.html,
> > > the proposal is now
> > > 
> > > - startTime = 0
> > > - endTime = video.currentTime equivalent of video frame PTS where cue is in
> > > the media resource
> > > - pause_on_exit = false.
> > 
> > I am confused: where do you want me to make that change?
> 
> The bug proposed the table [1] in the existing guidelines for creating cues
> section [2]. This would change if all of this were moved to another spec.
> 
> [1]
> https://www.w3.org/community/inbandtracks/wiki/
> Main_Page#Guidelines_for_creating_metadata_text_track_cues
> [2]
> http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-
> cues-in-various-formats-as-text-track-cues

Is http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html sufficient now to close this bug?

Comment 15 Bob Lund 2014-05-12 15:52:36 UTC

(In reply to Silvia Pfeiffer from comment #14)
> (In reply to Bob Lund from comment #13)
> > (In reply to Silvia Pfeiffer from comment #12)
> > > (In reply to Bob Lund from comment #10)
> > > > (In reply to Silvia Pfeiffer from comment #7)
> > > > > (In reply to Bob Lund from comment #6)
> > > > > > 
> > > > > > Yes. What about endTime = startTime + 250msec? It really doesn't matter what
> > > > > > the endTime is. The application only needs to know the startTime.
> > > > > 
> > > > > Sure, that works.
> > > > 
> > > > As discussed in
> > > > http://lists.w3.org/Archives/Public/public-inbandtracks/2014Mar/0079.html,
> > > > the proposal is now
> > > > 
> > > > - startTime = 0
> > > > - endTime = video.currentTime equivalent of video frame PTS where cue is in
> > > > the media resource
> > > > - pause_on_exit = false.
> > > 
> > > I am confused: where do you want me to make that change?
> > 
> > The bug proposed the table [1] in the existing guidelines for creating cues
> > section [2]. This would change if all of this were moved to another spec.
> > 
> > [1]
> > https://www.w3.org/community/inbandtracks/wiki/
> > Main_Page#Guidelines_for_creating_metadata_text_track_cues
> > [2]
> > http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-
> > cues-in-various-formats-as-text-track-cues
> 
> Is
> http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html
> sufficient now to close this bug?

Currently there are no guidelines for Cue creation in http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html. Once we have them there I think we can close this bug. I will do that and create a pull request.

Comment 16 Bob Lund 2014-05-15 15:23:03 UTC

I have created a pull request in [1] for adding an MPEG-2 TS metadata cue creation guideline.

[1] https://github.com/silviapfeiffer/HTMLSourcingInbandTracks

Comment 17 Bob Lund 2014-05-15 19:43:00 UTC

Bug 25733 [1] has been opened to add an HTML5 to the sourcing spec.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=25733

Comment 18 Silvia Pfeiffer 2014-06-01 04:12:06 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted

Change Description:
Cue creation guidelines have been added to
http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
and will be maintained there.

Rationale:
This use case is being dealt with in a different spec.