Re: Issue-270 and Issue-335 from Nigel Megitt on 2014-09-23 (public-tt@w3.org from September 2014)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Tue, 23 Sep 2014 10:15:14 +0000
To: Glenn Adams <glenn@skynav.com>
CC: Timed Text Working Group <public-tt@w3.org>
Message-ID: <D046EAFB.11A4F%nigel.megitt@bbc.co.uk>

Glenn Adams <glenn@skynav.com>, Monday, 22 September 2014 22:14 wrote:
On Mon, Sep 22, 2014 at 8:38 AM, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
Glenn, Courtney, all,

The edit to TTML2 ascribed to issue-270 and issue-335 (https://dvcs.w3.org/hg/ttml/rev/3cbc109b90bd) is causing me some concern. I have added notes to both those issues, and additionally I have a number of queries to raise for discussion:

Concerns

1. It appears to define an addition/subtraction operation on SMPTE time values even if they're discontinuous. The processing of these seems to be undefined, so they should be disallowed, shouldn't they?

I had intended to add material to deal with the discontinuous smpte mode, but it didn't get into the edit. Will add.

2. It blurs the layers of interpretation of time values from documents up into any external context. For example it opens up the ambiguity that, when a sequence of TTML documents is wrapped e.g. in ISOBMFF, there are media time offsets available both in TTML and in the wrapper, and authors may be unclear whether they are intended as independent (additive) offsets or as duplicate offsets in which one may be considered not for processing, i.e. metadata.

Since TTML doesn't know anything about external wrapper metadata, it isn't the right place to deal with such possible ambiguity (e.g., in different offset values internal and external). The correct place to deal with this is in the external spec.

Since those external specs already exist we should work in sympathy with them rather than redefining what's already there and creating confusion. Can we avoid redefining TTML so that it invalidates external wrappers that should be independent?

3. It is actually the opposite proposal to the one I made in Issue-335: I've added a note there and re-opened it.

4. If clock time is prohibited from using media offset because the discontinuityOffset cannot be derived in the absence of a date, then I would certainly be happy to propose the addition of a date value. A use case for this is when a TTML document is created as an archive artefact by a processor that observes some real world timed events and converts them into TTML.

My reason for excluding clock mode is because it doesn't have a related media object.

Ah, right. There may in fact be a related media object, but the temporal relationship would be indirect, and mediated by the clock rather than some other time embedded in the media.

5. It does nothing to address the scenario where the media time corresponding to the beginning of the related media object is known at authoring time, and is non-zero. This media begin time is distinct from, and possibly earlier than, the beginning of the contents of the TTML document.

I don't understand this statement, since this is precisely what ttp:mediaOffset does: allow the beginning of the root temporal extent to be offset either before or after the beginning of the related media object.

ttp:mediaOffset doesn't do that though: it merely allows for times in the document to be offset prior to processing. It doesn't extend the root temporal extent beyond the document's contents.

I'm puzzled by this: in your ISD generation use case, if the TTML document were untimed but you knew ttp:mediaOffset then how would you derive the begin time of the first ISD? ttp:mediaBegin would define the begin time of the first possible ISD without further calculation, unless you also want to map the times into another time base.

There are two distinct one-dimensional temporal coordinate spaces here that are potentially related:

* document's temporal coordinate space, call this TIME(document)
  * origin is at ORIGIN(document), which is always ZERO (0)
  * has begin time BEGIN(document)
  * has explicit or implied duration of DUR(document)
  * so root temporal extent is always the half-open interval:
    * [ 0, DUR(document) )
* related media object's temporal coordinate space, call this TIME(media)
  * origin is at ORIGIN(media)
  * has begin time BEGIN(media)
  * has explicit or implied duration of DUR(media)
  * so media temporal extent is always the half-open interval:
    * [ BEGIN(media), BEGIN(media) + DUR(media) )
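
The two extents listed above can be modelled as intervals that include their begin time and exclude their end time. The following is only a sketch in Python. The class and function names, and the use of seconds as the unit, are assumptions made for illustration and do not come from TTML2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A temporal extent [begin, begin + dur), in seconds (assumed unit)."""
    begin: float
    dur: float

    @property
    def end(self) -> float:
        return self.begin + self.dur

    def contains(self, t: float) -> bool:
        # The begin time is included, the end time is excluded.
        return self.begin <= t < self.end


def root_extent(dur_document: float) -> Extent:
    # ORIGIN(document) is always 0, so the root extent is [0, DUR(document)).
    return Extent(0.0, dur_document)


def media_extent(begin_media: float, dur_media: float) -> Extent:
    # [BEGIN(media), BEGIN(media) + DUR(media)), expressed in TIME(media).
    return Extent(begin_media, dur_media)
```

For instance, root_extent(30.0) contains 0.0 but not 30.0. The two extents live in different coordinate spaces, so comparing them directly is only meaningful once a mapping between TIME(document) and TIME(media) has been agreed.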

Since TIME(media) may have a different play rate or frame rate to TIME(document) I think we need to introduce the concept of evaluation time of this parameter, since conversion between the document time base and the media time base may only be achievable by a simple addition at one instant.

However the play rate of the media may not be known, so I've assumed that any time base mapping must be external to the document, and that what we need to do to ensure that BEGIN(document) aligns with the right point in the media's temporal coordinate space is to define a known fixed datum in the media, in the document's time base, and require the processor to map the temporal coordinate spaces.

The intent of ttp:mediaOffset is to express the delta between BEGIN(document) and BEGIN(media):

That's not what I expect from a parameter called mediaOffset – I'd certainly been reading it as ORIGIN(document) - ORIGIN(media).

* if ttp:mediaOffset > 0, then BEGIN(document) temporally follows BEGIN(media)
* if ttp:mediaOffset < 0, then BEGIN(document) temporally precedes BEGIN(media)

Note that this definition is arbitrary: we could invert the meaning if we wish. In any case, the current language decodes as follows:

Given ttp:mediaOffset = +10s, then <body begin="5s"/> means that body starts at 15s after BEGIN(media).

That seems to be an offset of ORIGIN(document) - ORIGIN(media) that must be evaluated one time only, at BEGIN(document) or 5s in the document.

This is still problematic, since it's content dependent. Consider that two videos Va and Vb both have continuous timecode where the beginning of the programme is at 10:00:00. Va has dialogue and a corresponding TTML document Ta such that BEGIN(Ta) = 10:01:00 and Vb has Tb where BEGIN(Tb) = 10:05:00. I would state that the more useful parameter would be identical in both documents, i.e. mediaBegin="10:00:00", so that any processor can start the effective clock (e.g. a frame counter) ticking at the same point, rather than having to evaluate at the arbitrary point that is BEGIN(document).

My proposed ttp:mediaBegin would have the value "10:00:00" in these cases, and not mix in the concept of mapping between the document's temporal coordinates and the related media's temporal coordinates. The play rate in the document's time base is well defined as now. It's reasonable to assume that any media playback device knows when the related media begins and what its play rate is.
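
To make the Va/Vb comparison concrete, here is a rough sketch in Python. It assumes whole-second hh:mm:ss timecodes, equal play rates, and that the delta reading of ttp:mediaOffset is BEGIN(document) minus BEGIN(media). The helper names are hypothetical and are not part of any TTML processor.

```python
def tc_to_seconds(tc: str) -> int:
    """Convert an hh:mm:ss timecode (no frames field) to seconds."""
    hours, minutes, seconds = (int(part) for part in tc.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Both programmes begin at the same timecode, as in the example above.
PROGRAMME_BEGIN = "10:00:00"

# BEGIN(Ta) and BEGIN(Tb) from the example above.
document_begins = {"Ta": "10:01:00", "Tb": "10:05:00"}

# Delta reading: the value depends on where each document's content starts.
offset_values = {
    name: tc_to_seconds(begin) - tc_to_seconds(PROGRAMME_BEGIN)
    for name, begin in document_begins.items()
}

# mediaBegin reading: the value depends only on the media.
media_begin_values = {name: PROGRAMME_BEGIN for name in document_begins}

print(offset_values)       # {'Ta': 60, 'Tb': 300}
print(media_begin_values)  # {'Ta': '10:00:00', 'Tb': '10:00:00'}
```

Under these assumptions the delta differs per document (60s and 300s), while the mediaBegin value is the same string in both.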

Or, given ttp:mediaOffset = -5s, then <body begin="5s"/> means that body starts at BEGIN(media).
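
The two decodings given above (+10s and -5s) reduce to a single addition. A minimal sketch in Python, assuming both time lines advance at the same 1X rate and that values are in seconds. The function name is hypothetical.

```python
def seconds_after_media_begin(media_offset_s: float, document_time_s: float) -> float:
    """Map a document time to seconds after BEGIN(media).

    Follows the sign convention of the examples above: a positive
    offset places document times later relative to BEGIN(media).
    Assumes identical play rates, so the mapping is purely additive.
    """
    return media_offset_s + document_time_s


# ttp:mediaOffset = +10s with <body begin="5s"/>
print(seconds_after_media_begin(10.0, 5.0))  # 15.0

# ttp:mediaOffset = -5s with <body begin="5s"/>
print(seconds_after_media_begin(-5.0, 5.0))  # 0.0, i.e. BEGIN(media)
```

If the play rates differ, a single additive offset is only correct at one instant, which is the evaluation-time concern raised earlier in this message.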

Given this formalism, we don't really care about BEGIN(media) - ORIGIN(media).

Agreed. What we care about is BEGIN(media) in the temporal coordinate space of the document, or in your useful terminology, in TIME(document).

Now, if you are suggesting an alternative use case where ORIGIN(document) != 0 in the TIME(document) coordinate space, then that is something I haven't considered, and certainly did not intend to address. Indeed, doing so would be problematic since SMIL timing semantics assumes that unspecified begin defaults to 0s, and further, that 0s corresponds to ORIGIN(document).

I'm not suggesting that ORIGIN(document) != 0 in TIME(document), since that would as you say create a whole bunch of other problems.

My response to such a proposed use case would probably be: we don't support it, you don't need to do it anyway, so don't do it.

Note that the above considerations assume that time base is media, or that time base is smpte continuous mode, or that time base is smpte discontinuous mode and that all smpte time events have been converted to equivalent smpte continuous mode values, e.g., by playing back a media object in 1X normal play mode and recording the PTS time that corresponds with each frame associated with a smpte time label.

Just for completeness (at the expense of being repetitious), did you also assume that the media play rate is identical to the document's play rate, i.e. that the only difference between TIME(media) and TIME(document) is an additive offset?

Proposals

I would propose a resolution to points 1, 2, 3 and 5 that is to remove mediaOffset and add a ttp:mediaBegin parameter, expressed in the same time base as the document's ttp:timeBase parameter. This also fits better with ttp:mediaDuration.

Hmmm. I'm not inclined to make this change, because mentally I see mediaOffset as expressing a difference/delta/offset between two points in two different one-dimensional coordinate spaces both representing linear time (at 1X play rate). Calling it mediaBegin implies in my mind BEGIN(media), i.e., the delta between BEGIN(media) and ORIGIN(media), and not the delta between BEGIN(document) and BEGIN(media).

If this is just about the name we choose for the parameter then we're right to choose carefully, but it shouldn't prevent us from agreeing the semantics. To my mind mediaBegin does suggest the delta between BEGIN(document) and BEGIN(media), both in TIME(document). Whereas to me mediaOffset suggests the delta between ORIGIN(document) in TIME(document) and ORIGIN(media) in TIME(??? - this is not clear), which if I understand correctly isn't what you intend. Or if it is what you intend it doesn't seem to be a complete solution for the problem.

I would additionally propose allowing dates to be specified to use in relation to clock times to resolve point 4, perhaps with a ttp:date parameter, valid only when ttp:timeBase="clock". Note that this does not resolve any time comparison issues caused by documents whose times cross midnight and wrap back round to a smaller number of hours.
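
The midnight caveat can be illustrated with a short sketch in Python. It assumes a single date value is attached to every clock time in a document, which is one possible reading of a ttp:date parameter. The date and times used are invented for the example.

```python
from datetime import date, datetime, time

# Hypothetical single date for the document (assumed reading of ttp:date).
document_date = date(2014, 9, 22)

# Two clock-mode times from a document that crosses midnight.
cue_begin = time(23, 59, 50)
cue_end = time(0, 0, 10)

# Attaching the same date to both times does not repair the ordering.
begin_dt = datetime.combine(document_date, cue_begin)
end_dt = datetime.combine(document_date, cue_end)

wraps = end_dt < begin_dt
apparent_duration_s = (end_dt - begin_dt).total_seconds()

print(wraps)                # True
print(apparent_duration_s)  # -86380.0
```

The end time still sorts before the begin time, so a processor would need some further rule for the day rollover. No such rule is proposed here.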

Again, I'm wondering what is the related media object? To my recollection, ttp:timeBase="clock" was added to TTML to handle timed text cases that don't have a related media object.

It would be a media object that had also been captured with reference to a clock.

Are there other related use cases or requirements not met by these proposals?

Kind regards,

Nigel

Received on Tuesday, 23 September 2014 10:15:47 UTC