TTML/changeProposal004

From W3C Wiki

Clarifications for Time Expression Semantics - CLOSED-IMPLEMENTED

  • The following is a Change Proposal for Issue 199
  • Editor: Glenn Adams.
  • Date: January 28, 2013.

Summary

TTML does not adequately define how to relate a time expression to media time when the effective frame rate is not an integer.

Details

TTML defines the media time base as follows [1]:

If the time base is designated as media, then a time expression denotes a coordinate in some media object's time line, where the media object may be an external media object with which the content of a document instance is to be synchronized, or it may be the content of a document instance itself in a case where the timed text content is intended to establish an independent time line.

The media time base is related to local real time in accordance to the related media play rate and the related media real start time (i.e., the real time when the related media playback started), parameters not modeled by TTML itself. The relationship between media time (M) and local real time (R) is as follows:

R = playRate * M + realStartTime

where

M ∈ ℜ | 0 ≤ M < ∞ | M in seconds
playRate ∈ ℜ | −∞ < playRate < ∞ | playRate is unit-less
realStartTime ∈ ℜ | 0 ≤ realStartTime < ∞ | realStartTime in seconds, with 0 being start of epoch

Without loss of generality, we will assume playRate is 1 (one) and realStartTime is 0 for the remainder of this document, which simplifies this relationship to R = M.
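As a minimal sketch of the relationship above, the following Python function maps media time to local real time. The function and parameter names are illustrative assumptions and are not defined by TTML; the defaults correspond to the simplifying assumption that playRate is 1 and realStartTime is 0.

```python
def media_to_real_time(media_time, play_rate=1.0, real_start_time=0.0):
    """Return local real time R (seconds) for media time M (seconds).

    Implements R = playRate * M + realStartTime. The names used here are
    hypothetical; TTML itself does not model these parameters.
    """
    if media_time < 0:
        raise ValueError("media time M must satisfy 0 <= M")
    if real_start_time < 0:
        raise ValueError("realStartTime must satisfy 0 <= realStartTime")
    return play_rate * media_time + real_start_time
```

With the default arguments the function returns its input unchanged, which is the R = M case used in the rest of this document.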

Problem Example

Several non-integral frame rates are in common use in the U.S., originally deriving from NTSC video formats. An example is an effective frame rate of 30 * 1000/1001 = 29.970029970029… frames per second, which gives an effective frame duration of 1001/30000 seconds, or 33.366666… milliseconds. In TTML, this frame rate would be denoted as follows:

<tt ttp:frameRate='30' ttp:frameRateMultiplier='1000 1001' ttp:timeBase='media' ...>
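The effective frame rate and effective frame duration implied by these parameter values can be checked with exact rational arithmetic. The sketch below uses Python's fractions module; the variable names are illustrative and simply mirror the frame rate of 30 and the multiplier of 1000/1001 used in this example.

```python
from fractions import Fraction

# Values from the example: a nominal frame rate of 30 and a
# frame rate multiplier of 1000/1001.
frame_rate = 30
frame_rate_multiplier = Fraction(1000, 1001)

# Effective frame rate in frames per second (exactly 30000/1001).
effective_frame_rate = frame_rate * frame_rate_multiplier

# Effective frame duration in seconds (exactly 1001/30000).
effective_frame_duration = 1 / effective_frame_rate

print(float(effective_frame_rate))             # frames per second
print(float(effective_frame_duration) * 1000)  # milliseconds per frame
```

Keeping the values as fractions avoids the rounding that appears when a truncated decimal frame rate is carried through later calculations.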

TTML time expressions allow specifying time using either an offset-time or a clock-time format [2]. In the case of offset time expressions, one can use a variety of representations, such as a fractional number of seconds, a fractional number of frames, etc. In the case of clock time expressions, one can use a COLON (:) separated expression that includes hours, minutes, seconds, and, optionally, a fraction of seconds or frames and optional sub-frames.

Valid time expressions include:

30f
1s
1.25125s
00:00:01
00:00:01.25125
00:00:01:01
02:00:00:00

In general, it should be possible to convert between the different time expression formats without loss of information; for example, it should be possible to unambiguously convert all valid expressions to a fractional seconds offset or a fractional frames offset expression. However, because TTML does not clearly define this conversion, readers have interpreted these expressions differently in the context of non-integral frame rates. In the following sections, two of these example expressions are interpreted according to two different methods.

Method #1

According to this method, the time-related components of time expressions refer directly to media time coordinates, while frames refer to time intervals computed from the effective (possibly non-integral) frame rate. Using this method, the following formula applies, where effectiveFrameRate = frameRate * frameRateMultiplier:

M = 60^2 * hours + 60^1 * minutes + 60^0 * seconds + ( frames / effectiveFrameRate )

Therefore, in order to convert the example time expression 00:00:01:01 to media time in fractional seconds, we have:

M = 60^2 * 0 + 60^1 * 0 + 60^0 * 1 + ( 1 / 29.97002997002997 ) = 1.03336666666667s

To further convert this to fractional frames, we have

M = 1.03336666666667s * 29.97002997002997 (frames per second) = 30.97002997003007f

Finally, to convert this to an integral frame number, where the first frame is frame 1, we have

frame number = floor(30.97002997003007) + 1 = frame 31

Let us compute for one more value, 02:00:00:00, which yields:

M = 60^2 * 2 + 60^1 * 0 + 60^0 * 0 + ( 0 / 29.97002997002997 ) = 7200s
  = 7200s * 29.97002997002997 (frames per second) = 215784.215784215784f

frame number = floor(215784.215784215784) + 1 = frame 215785
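The method #1 calculations above can be reproduced with the following sketch. The function names are hypothetical and chosen for illustration only; exact fractions are used so that the results do not depend on how many digits of the effective frame rate are retained.

```python
from fractions import Fraction
from math import floor


def method1_media_time(hours, minutes, seconds, frames, frame_rate, multiplier):
    """Media time M in seconds under method #1.

    hh:mm:ss is taken directly as media time; only the frames component
    is divided by the effective frame rate.
    """
    effective_frame_rate = Fraction(frame_rate) * Fraction(multiplier)
    clock_seconds = 3600 * hours + 60 * minutes + seconds
    return clock_seconds + Fraction(frames) / effective_frame_rate


def frame_number(media_time, frame_rate, multiplier):
    """Integral frame number for a media time, counting the first frame as 1."""
    effective_frame_rate = Fraction(frame_rate) * Fraction(multiplier)
    return floor(media_time * effective_frame_rate) + 1


ntsc = Fraction(1000, 1001)
for h, m, s, f in [(0, 0, 1, 1), (2, 0, 0, 0)]:
    t = method1_media_time(h, m, s, f, 30, ntsc)
    print(float(t), frame_number(t, 30, ntsc))
```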

Method #2

According to this method, the time components of time expressions denote their equivalent in integral frames, i.e., without making use of the frame rate multiplier. As such, time expressions using this method operate like a frame counter, or, by analogy, like an odometer which counts off frames. Only when converting from this frame count to media time is the frame rate multiplier used (to compute the effective frame rate). Using this method, the following formula applies, where effectiveFrameDuration = 1 / (frameRate * frameRateMultiplier):

M = (((60^2 * hours + 60^1 * minutes + 60^0 * seconds) * frameRate) + frames) * effectiveFrameDuration

Therefore, in order to convert the example time expression 00:00:01:01 to media time in fractional seconds, we have:

M = (((60^2 * 0 + 60^1 * 0 + 60^0 * 1) * 30) + 1) * 0.03336666666667 seconds/frame = 1.03436666666677s

To further convert this to fractional frames, we have

M = 1.03436666666677s / 0.03336666666667 seconds/frame = 31f

Finally, to convert this to an integral frame number, where the first frame is frame 1, we have

frame number = floor(31) + 1 = frame 32

Let us compute the results for one more example, 02:00:00:00, which yields:

M = (((60^2 * 2 + 60^1 * 0 + 60^0 * 0) * 30) + 0) * 0.03336666666667 seconds/frame = 7207.20000000072s
  = 7207.20000000072s / 0.03336666666667 seconds/frame = 216000f

frame number = floor(216000) + 1 = frame 216001
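The method #2 calculations above can be reproduced in the same way. As before, the function name is a hypothetical one used only for illustration, and exact fractions are used in place of the truncated frame duration 0.03336666666667.

```python
from fractions import Fraction
from math import floor


def method2_media_time(hours, minutes, seconds, frames, frame_rate, multiplier):
    """Media time M in seconds under method #2.

    hh:mm:ss is first converted to a count of frames at the nominal
    (integral) frame rate; the multiplier is applied only when that
    frame count is converted to media time.
    """
    effective_frame_duration = 1 / (Fraction(frame_rate) * Fraction(multiplier))
    clock_seconds = 3600 * hours + 60 * minutes + seconds
    frame_count = clock_seconds * frame_rate + frames
    return frame_count * effective_frame_duration


ntsc = Fraction(1000, 1001)
for h, m, s, f in [(0, 0, 1, 1), (2, 0, 0, 0)]:
    t = method2_media_time(h, m, s, f, 30, ntsc)
    # Fractional frames = media time * effective frame rate; first frame is 1.
    number = floor(t * 30 * ntsc) + 1
    print(float(t), number)
```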

Comparison

As can be seen by comparing the values above, these two different methods yield different values when converting a time expression to media time and frame number:

Example 1 - 00:00:01:01

Method   M (s)       Frame #
#1       1.033367    31
#2       1.034367    32

Example 2 - 02:00:00:00

Method   M (s)       Frame #
#1       7200.0      215785
#2       7207.2      216001

It is apparent that when method #1 is used, the time expression corresponds more directly with media time. Example #2 shows how a time expression that appears to denote 2 hours corresponds exactly to 7200 seconds (2 hours) when using method #1, but corresponds with 7207.2 seconds (2 hours and 7.2 seconds) when using method #2.
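The gap between the two methods follows from the two formulas: the frames component contributes frames / effectiveFrameRate in both, so the difference depends only on the hours, minutes, and seconds components and equals their total multiplied by (1 / frameRateMultiplier - 1). The sketch below computes that difference; the function name is an illustrative assumption.

```python
from fractions import Fraction


def method_difference(clock_seconds, multiplier):
    """Seconds by which method #2 media time exceeds method #1 media time.

    clock_seconds is 3600 * hours + 60 * minutes + seconds; the frames
    component cancels because both methods scale it identically.
    """
    return clock_seconds * (1 / Fraction(multiplier) - 1)


ntsc = Fraction(1000, 1001)
print(float(method_difference(1, ntsc)))     # example 1: 00:00:01:01
print(float(method_difference(7200, ntsc)))  # example 2: 02:00:00:00
```

For a multiplier of 1000/1001 the difference grows by one millisecond per second of clock time, which accounts for the 7.2 second gap in example 2.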

Proposal

I recommend that appropriate language be added to TTML 1.0 SE that explicitly defines the use of method #1 above to convert between time expressions and media time.

Impact

At present, a number of applications of TTML have implicitly read the TTML specification assuming that method #1 applies. These applications will not be impacted by this proposal. In contrast, those applications and implementations of TTML that assumed that method #2 applies will either need to change or will become non-compliant with TTML 1.0 SE semantics.

If method #2 were to be adopted, then the converse would apply.

References