TTML/changeProposal004
Clarifications for Time Expression Semantics - CLOSED-IMPLEMENTED
- The following is a Change Proposal for Issue 199
- Editor: Glenn Adams.
- Date: January 28, 2013.
Summary
TTML does not adequately define how to relate a time expression to media time when the effective frame rate is not an integer.
Details
TTML defines the media time base as follows [1]:
If the time base is designated as media, then a time expression denotes a coordinate in some media object's time line, where the media object may be an external media object with which the content of a document instance is to be synchronized, or it may be the content of a document instance itself in a case where the timed text content is intended to establish an independent time line.
The media time base is related to local real time in accordance to the related media play rate and the related media real start time (i.e., the real time when the related media playback started), parameters not modeled by TTML itself. The relationship between media time (M) and local real time (R) is as follows:
R = playRate * M + realStartTime
where
M ∈ ℜ | 0 ≤ M < ∞ | M in seconds
playRate ∈ ℜ | −∞ < playRate < ∞ | playRate is unit-less
realStartTime ∈ ℜ | 0 ≤ realStartTime < ∞ | realStartTime in seconds, with 0 being start of epoch
Without loss of generality, we will assume playRate is 1 (one) and realStartTime is 0 for the remainder of this document, which simplifies this relationship to R = M.
Problem Example
A number of non-integral frame rates are in common use in the U.S., originally deriving from NTSC video formats. An example of this is an effective frame rate of 30 * 1000/1001 = 29.970029970029… frames per second, which corresponds to an effective frame duration of 33.366666… milliseconds. In TTML, this frame rate would be denoted as follows:
<tt ttp:frameRate='30' ttp:frameRateMultiplier='1000 1001' ttp:timeBase='media' ...>
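As a minimal sketch of the arithmetic above, the effective frame rate can be computed exactly from the two attribute values. The function name effective_frame_rate is hypothetical and not part of TTML; the sketch assumes the multiplier is given as two whitespace-separated integers, as in the example attribute value.

```python
from fractions import Fraction

def effective_frame_rate(frame_rate: int, multiplier: str = "1 1") -> Fraction:
    """Return frameRate * frameRateMultiplier as an exact fraction.

    `multiplier` is assumed to be a 'numerator denominator' string,
    mirroring the attribute value used in the example above.
    """
    numerator, denominator = (int(part) for part in multiplier.split())
    return Fraction(frame_rate) * Fraction(numerator, denominator)

rate = effective_frame_rate(30, "1000 1001")
print(rate)                   # 30000/1001
print(float(rate))            # effective frames per second
print(float(1000 / rate))     # effective frame duration in milliseconds
```

Using exact fractions avoids accumulating rounding error when the rate is later multiplied by large time values.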
TTML time expressions allow specifying time using either an offset-time or a clock-time format [2]. In the case of offset time expressions, one can use a variety of representations, such as a fractional number of seconds, a fractional number of frames, etc. In the case of clock time expressions, one can use a COLON (:) separated expression that includes hours, minutes, seconds, and, optionally, either a fraction of seconds or frames with optional sub-frames.
Valid time expressions include:
30f
1s
1.25125s
00:00:01
00:00:01.25125
00:00:01:01
02:00:00:00
In general, it should be possible to convert between the different time expression formats without loss of information; for example, it should be possible to unambiguously convert all valid expressions to a fractional seconds offset or a fractional frames offset expression. However, because TTML does not clearly define this conversion, some ambiguity has arisen among readers as to how to interpret these expressions in the context of non-integral frame rates. In the following sections, two of these example expressions are interpreted according to two different methods.
Method #1
According to this method, the time related components of time expressions refer directly to media time coordinates, while frames refer to time intervals computed from the effective (possibly non-integral) frame rate. Using this method, the following formula applies, where effectiveFrameRate = frameRate * frameRateMultiplier:
M = 60^2 * hours + 60^1 * minutes + 60^0 * seconds + ( frames / effectiveFrameRate )
Therefore, in order to convert the example time expression 00:00:01:01 to media time in fractional seconds, we have:
M = 60^2 * 0 + 60^1 * 0 + 60^0 * 1 + ( 1 / 29.97002997002997 ) = 1.03336666666667s
To further convert this to fractional frames, we have:
M = 1.03336666666667s * 29.97002997002997 (frames per second) = 30.97002997003007f
Finally, to convert this to an integral frame number, where the first frame is frame 1, we have:
frame number = floor(30.97002997003007) + 1 = frame 31
Let us compute the results for one more example, 02:00:00:00, which yields:
M = 60^2 * 2 + 60^1 * 0 + 60^0 * 0 + ( 0 / 29.97002997002997 ) = 7200s
M = 7200s * 29.97002997002997 (frames per second) = 215784.215784215784f
frame number = floor(215784.215784215784) + 1 = frame 215785
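The Method #1 computation can be sketched as follows. This is an illustration only, not normative text: the function names are hypothetical, and exact fractions are used in place of the rounded decimal 29.97002997002997 so that the floor operation is not affected by rounding.

```python
from fractions import Fraction
from math import floor

def method1_media_time(hours, minutes, seconds, frames, frame_rate, multiplier):
    """Method #1: h:m:s are media-time seconds; frames use the effective rate."""
    effective_frame_rate = frame_rate * multiplier
    return 3600 * hours + 60 * minutes + seconds + Fraction(frames) / effective_frame_rate

def frame_number(media_time, frame_rate, multiplier):
    """Integral frame number for a media time, with the first frame numbered 1."""
    return floor(media_time * frame_rate * multiplier) + 1

MULTIPLIER = Fraction(1000, 1001)  # ttp:frameRateMultiplier='1000 1001'

# 00:00:01:01
m = method1_media_time(0, 0, 1, 1, 30, MULTIPLIER)
print(float(m), frame_number(m, 30, MULTIPLIER))   # frame number is 31

# 02:00:00:00
m = method1_media_time(2, 0, 0, 0, 30, MULTIPLIER)
print(float(m), frame_number(m, 30, MULTIPLIER))   # frame number is 215785
```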
Method #2
According to this method, the time components of time expressions denote their equivalent in integral frames, i.e., without making use of the frame rate multiplier. As such, time expressions using this method operate like a frame counter, or, by analogy, like an odometer which counts off frames. Only when converting from this frame count to media time is the frame rate multiplier used (to compute the effective frame rate). Using this method, the following formula applies, where effectiveFrameDuration = 1 / (frameRate * frameRateMultiplier):
M = (((60^2 * hours + 60^1 * minutes + 60^0 * seconds) * frameRate) + frames) * effectiveFrameDuration
Therefore, in order to convert the example time expression 00:00:01:01 to media time in fractional seconds, we have:
M = (((60^2 * 0 + 60^1 * 0 + 60^0 * 1) * 30) + 1) * 0.03336666666667 seconds/frame = 1.03436666666677s
To further convert this to fractional frames, we have:
M = 1.03436666666677s / 0.03336666666667 seconds/frame = 31f
Finally, to convert this to an integral frame number, where the first frame is frame 1, we have:
frame number = floor(31) + 1 = frame 32
Let us compute the results for one more example, 02:00:00:00, which yields:
M = (((60^2 * 2 + 60^1 * 0 + 60^0 * 0) * 30) + 0) * 0.03336666666667 seconds/frame = 7207.20000000072s
M = 7207.20000000072s / 0.03336666666667 seconds/frame = 216000f
frame number = floor(216000) + 1 = frame 216001
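The Method #2 computation can be sketched in the same way. Again, the function names are hypothetical, and exact fractions stand in for the rounded frame duration 0.03336666666667, so the results are 1.0343666… and exactly 7207.2 seconds rather than the slightly perturbed decimals shown above.

```python
from fractions import Fraction
from math import floor

def method2_media_time(hours, minutes, seconds, frames, frame_rate, multiplier):
    """Method #2: h:m:s count nominal frames; the multiplier applies only at the end."""
    frame_count = (3600 * hours + 60 * minutes + seconds) * frame_rate + frames
    effective_frame_duration = 1 / (frame_rate * multiplier)
    return frame_count * effective_frame_duration

def frame_number(media_time, frame_rate, multiplier):
    """Integral frame number for a media time, with the first frame numbered 1."""
    return floor(media_time * frame_rate * multiplier) + 1

MULTIPLIER = Fraction(1000, 1001)  # ttp:frameRateMultiplier='1000 1001'

# 00:00:01:01
m = method2_media_time(0, 0, 1, 1, 30, MULTIPLIER)
print(float(m), frame_number(m, 30, MULTIPLIER))   # frame number is 32

# 02:00:00:00
m = method2_media_time(2, 0, 0, 0, 30, MULTIPLIER)
print(float(m), frame_number(m, 30, MULTIPLIER))   # frame number is 216001
```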
Comparison
As can be seen by comparing the values above, these two different methods yield different values when converting a time expression to media time and frame number:
Expression | Method | M(s) | F# |
00:00:01:01 | #1 | 1.033367s | 31 |
00:00:01:01 | #2 | 1.034367s | 32 |
02:00:00:00 | #1 | 7200.0s | 215785 |
02:00:00:00 | #2 | 7207.2s | 216001 |
It is apparent that when method #1 is used, the time expression corresponds more directly with media time. The second example shows how the time expression 02:00:00:00, which appears to denote 2 hours, corresponds exactly to 7200 seconds (2 hours) when using method #1, but corresponds to 7207.2 seconds (2 hours and 7.2 seconds) when using method #2.
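Subtracting the method #1 formula from the method #2 formula cancels the frames term, leaving a gap of T * (1 / frameRateMultiplier - 1), where T is the hours, minutes, and seconds portion of the expression in seconds. The following sketch (with a hypothetical function name) evaluates that gap for the two examples; it is a derived illustration, not part of the proposal text.

```python
from fractions import Fraction

def method_gap_seconds(hours, minutes, seconds, multiplier):
    """Media-time difference (method #2 minus method #1) for a clock-time expression.

    The frames field does not appear because it contributes the same
    amount, frames / (frameRate * frameRateMultiplier), under both methods.
    """
    clock_seconds = 3600 * hours + 60 * minutes + seconds
    return clock_seconds * (1 / multiplier - 1)

MULTIPLIER = Fraction(1000, 1001)

print(float(method_gap_seconds(0, 0, 1, MULTIPLIER)))   # 0.001
print(float(method_gap_seconds(2, 0, 0, MULTIPLIER)))   # 7.2
```

With a multiplier of 1000/1001, the two interpretations therefore diverge by one millisecond per second of clock time, which is 3.6 seconds per hour.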
Proposal
I recommend that appropriate language be added to TTML 1.0 SE that explicitly defines the use of method #1 above to convert between time expressions and media time.
Impact
At present, a number of applications of TTML have implicitly read the TTML specification assuming that method #1 applies. These applications will not be impacted by this proposal. In contrast, those applications and implementations of TTML that assumed that method #2 applies will either need to change or will become non-compliant with TTML 1.0 SE semantics.
If method #2 were to be adopted, then the converse would apply.