TTML in HTML 5 - OPEN

The following is a Change Proposal for Issue 329
Owner: Jerry Smith.
Date: 15 May 2013.

Abstract

This document specifies the mapping to the HTML5 track model and the rules for updating the text track rendering when a TTML file is referenced from a <track> element in HTML5. TTML (Timed Text Markup Language) is a W3C format intended for marking up external timed track resources.

Document format

This specification defines the operation of a file referenced by an HTML5 track element that is capable of being reduced to a TTML Infoset as defined by A Reduced XML Infoset. The operations described herein are defined on the TTML Infoset and do not imply any specific syntax.

Display of TTML in HTML 5

The specification of TTML does not define a specific rendering technology, but allows that:

Processors are not required to present TTML documents in any particular way; but an implementation of this model by a TTML Presentation Processor that provides externally observable results that are consistent with this model is likely to lead to a user experience that closely resembles the experience intended by the documents' authors. [TTML Styling]

This document describes a mechanism for presenting TTML in HTML 5 with CSS in a manner that will lead to such a user experience in a typical user agent. This consists of the following steps:

On retrieving the track src document
- Compute inline style sets for the source TTML
- Construct the list of event times for the source TTML
- For each event time:
  - Calculate the TTML cue object list
  - For each TTML cue object
    - Construct the corresponding styled HTML document fragment.
While media is playing:
- Generate the CSS blocks for the active cue objects at current media playback time.
- Insert CSS blocks into rendered output.

TTML Style resolution

TTML offers three mechanisms for defining the equivalent of HTML inline style: referential, nested and inline styling. The referential and nested forms avoid large numbers of repeated attributes on each element and allow groups of styles to be applied at once; this shorthand is, however, entirely equivalent to CSS inline styles. TTML does not itself define an applicative mode of style application, but it does not preclude the additional use of a mechanism such as CSS for this purpose; in that case all TTML style application has the same specificity as inline styles. TTML style inheritance is that of CSS, applied to the intermediate synchronic document, that is, the document produced after content is selected into the relevant region and the timing rules for inactive content are applied. External stylesheets are therefore also applied to this intermediate document.

The Specified Style Set of properties is computed for each TTML element:

The initial specified style set for each element is set to empty
Style properties referenced by the affected element using the style attribute (referential styling) are processed in the following manner:
- For each style element referenced by a style attribute on the affected element and in the order specified in that style attribute;
  - if the referenced style element is a descendant of a styling element, merge the specified style set of the referenced element into the specified style set of the affected element.
Style properties referenced through child nodes of the affected element (nested styling) are processed in the following manner:
- For each style element child of the affected element, and in the specified order of child elements, merge the specified style set of the child element, into the specified style set of the affected element.
Style properties that are applied as attributes on the affected element (inline styling) are processed in the following manner:
- For each style property expressed as a specified styling attribute on the affected element, merge that property into the specified style set of the affected element.
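
The three merge steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the proposal: it assumes that "merge" means a later value replaces an earlier value of the same property, and the helper names index_styles, inline_styles and specified_style_set, as well as the small test document, are hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_styles(root):
    """Collect style elements that descend from a styling element, keyed by xml:id."""
    index = {}
    for styling in root.iter("{%s}styling" % TT):
        for style in styling.iter("{%s}style" % TT):
            if style.get(XML_ID):
                index[style.get(XML_ID)] = style
    return index


def inline_styles(elem):
    """Return the tts:* attributes of elem as a {local name: value} dict."""
    prefix = "{%s}" % TTS
    return {k[len(prefix):]: v for k, v in elem.attrib.items() if k.startswith(prefix)}


def specified_style_set(elem, styles, seen=frozenset()):
    result = {}
    # 1. Referential styling, in the order given by the style attribute.
    for ref in elem.get("style", "").split():
        target = styles.get(ref)
        if target is not None and ref not in seen:  # 'seen' guards against reference loops
            result.update(specified_style_set(target, styles, seen | {ref}))
    # 2. Nested styling, in child order.
    for child in elem.findall("{%s}style" % TT):
        result.update(specified_style_set(child, styles, seen))
    # 3. Inline styling (assumed here to override the two steps above).
    result.update(inline_styles(elem))
    return result


DOC = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <styling>
      <style xml:id="s1" tts:color="red" tts:fontSize="10px"/>
    </styling>
    <layout>
      <region xml:id="r1">
        <style tts:origin="10px 100px"/>
        <style tts:extent="300px 96px"/>
      </region>
    </layout>
  </head>
  <body><div><p style="s1" tts:color="blue">Text</p></div></body>
</tt>"""

root = ET.fromstring(DOC)
styles = index_styles(root)
p = root.find(".//{%s}p" % TT)
region = root.find(".//{%s}region" % TT)
print(specified_style_set(p, styles))
print(specified_style_set(region, styles))
```

In this sketch the p element ends up with the referenced fontSize and its own inline color, and the region collects origin and extent from its nested style children.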

TTML cue object construction

A set of TTML cue objects is constructed from the referenced TTML file by evaluating the TTML Infoset at the TTML cue event times, that is, the set of time coordinates where some element becomes temporally active or inactive. The TTML Infoset is mapped once for each time coordinate in the TTML cue event times to a list of TTML cue objects as defined below. This list is then used to generate a list of HTML5 TextTrackCues, which are appended to the HTML5 TextTrackCueList under construction for the TTML Infoset instance.

Each region active at the TTML cue event time in the TTML Infoset will map to one TTML cue object in the list. If there is no region specified in the TTML Infoset, then the default region is used, and there will be at most one TTML cue object in the list for each event time.

Evaluating the TTML cue event times

Map the TTML Infoset to a set of event times by recursively walking the Information Elements starting at the root, annotating each Information Element with its computed start and end times based on the begin, end and dur attributes, and repeating the process for each of the node's children. The initial time containment context is par, and the initial reference start and end times are those of the external time context to which the timed track applies. In the context of HTML5 <track> elements, this external time context is the media duration.

Compute time intervals for an Information Element based on the time containment context, a reference start time and a reference end time in the following manner:

    Set startTime and endTime to the zero time

    Compute the beginning of the current element interval:
        Set begin to the value of the "begin" attribute if present, or the zero time otherwise
        Set startTime to the reference start time + begin

    Compute the simple duration of the interval:
        Set defaultDur, dur and end to 0.
        (Note that par children have indefinite default duration, while seq children have
        zero default duration. Indefinite is truncated to the reference end time.)

        If the "dur" attribute is not set and the "end" attribute is not set
            If the time container context is Par
                If startTime is less than the reference end time
                    Set defaultDur to the greater of zero and reference end time - startTime
                    Set endTime to the reference end time
                Else
                    Set endTime to the zero time

        Else If the "dur" attribute is set and the "end" attribute is set
            Set dur to the value of the "dur" attribute
            Set end to the value of the "end" attribute
            Set endTime to the least of startTime + dur, the reference start time + end, and the reference end time

        Else If the "end" attribute is set
            Set end to the value of the "end" attribute
            Set endTime to the least of the reference start time + end, and the reference end time

        Else (only the "dur" attribute is set)
            Set dur to the value of the "dur" attribute
            Set endTime to the least of startTime + dur, and the reference end time

    If endTime is less than startTime, set endTime to startTime

    Add startTime and endTime to the set of cue event times if not already present

    Set seqTime to startTime

    For each child element
        If the "timeContainer" attribute of the current element is not set to "seq"
            Compute the time intervals for child where the reference start and end times are startTime and endTime respectively and the time container context is Par.
        Else
            Compute the time intervals for child where the reference start and end times are seqTime and endTime respectively and the time container context is Seq.
            Set seqTime to the endTime of child.
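
The interval computation above could be sketched in Python along the following lines. This is an illustrative sketch under stated assumptions, not a normative implementation: the names parse_seconds and compute_intervals are hypothetical, only offset times written in seconds (such as "2s") are parsed, and the dur-only branch measures dur from startTime, consistent with the branch where both dur and end are set.

```python
import xml.etree.ElementTree as ET


def parse_seconds(value):
    """Parse offset times such as '2s' or '1.5s' (the only form handled in this sketch)."""
    return float(value[:-1]) if value.endswith("s") else float(value)


def compute_intervals(elem, context, ref_start, ref_end, times, events):
    """Annotate elem and its descendants with (start, end) and collect event times."""
    start = ref_start + parse_seconds(elem.get("begin", "0s"))
    end = 0.0
    dur, end_attr = elem.get("dur"), elem.get("end")
    if dur is None and end_attr is None:
        # par children default to indefinite (truncated to ref_end), seq children to zero.
        if context == "par" and start < ref_end:
            end = ref_end
    elif dur is not None and end_attr is not None:
        end = min(start + parse_seconds(dur), ref_start + parse_seconds(end_attr), ref_end)
    elif end_attr is not None:
        end = min(ref_start + parse_seconds(end_attr), ref_end)
    else:
        end = min(start + parse_seconds(dur), ref_end)
    end = max(end, start)  # an interval never ends before it starts
    times[elem] = (start, end)
    events.update((start, end))
    seq_time = start
    for child in elem:
        if elem.get("timeContainer") != "seq":
            compute_intervals(child, "par", start, end, times, events)
        else:
            compute_intervals(child, "seq", seq_time, end, times, events)
            seq_time = times[child][1]  # the next seq child starts where this one ended
    return times, events


DOC = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div xml:id="d1" begin="0s" dur="2s"><p>Text 1</p></div>
    <div xml:id="d2" begin="1s" dur="2s"><p>Text 3</p></div>
  </body>
</tt>"""

root = ET.fromstring(DOC)
# The media duration (3.0 seconds here) is an assumed value for the external time context.
times, events = compute_intervals(root, "par", 0.0, 3.0, {}, set())
print(sorted(events))
```

The element annotations are kept in a dictionary keyed by element so that the later activity test (start <= t < end) can be applied without re-walking the tree.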

TTML cue event times

TTML cue event time

An element is temporally active at time t, if the computed start time of the element is less than or equal to t, and t is less than the computed end time of the element. The TTML cue event times are those times where some element changes state from temporally inactive to temporally active or vice versa; that is, the set of computed start and end times in the annotated tree placed in order.

Evaluating the TTML document instance at event time

TTML Intermediate Document

Create the TTML Intermediate Document by mapping the TTML source document to a set of active regions at each TTML cue event time as follows:

For each temporally active region element replicate the sub-tree of the source document headed by the body element;
Evaluating this sub-tree in a post-order traversal, prune each element that is not a content element, is temporally inactive, is empty, or is not associated with the current active region;
If the pruned sub-tree is non-empty, then reparent it to the current active region element
Add the current active region to the output list.

A content element is associated with a region according to the following ordered rules, where the first rule satisfied is used and remaining rules are skipped:

If the element specifies a region attribute, then the element is associated with the region referenced by that attribute;
If some ancestor of that element specifies a region attribute, then the element is associated with the region referenced by the most immediate ancestor that specifies this attribute;
If the element contains a descendant element that specifies a region attribute, then the element is associated with the region referenced by that attribute;
If a default region was implied (due to the absence of any region element), then the element is associated with the default region;
The element is not associated with any region.
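
A minimal Python sketch of the ordered association rules above follows. It rests on assumptions that the text does not state: the function name associated_regions and the DEFAULT_REGION marker are hypothetical, and the third rule is read as associating the element with every region referenced by its descendants, which matches the example later in this document where a div appears under both r1 and r2.

```python
import xml.etree.ElementTree as ET

DEFAULT_REGION = ""  # hypothetical marker for the implied default region


def associated_regions(elem, parents, has_region_elements):
    """Return the set of region ids that elem is associated with (first matching rule wins)."""
    # Rule 1: the element itself specifies a region attribute.
    if elem.get("region"):
        return {elem.get("region")}
    # Rule 2: the nearest ancestor that specifies a region attribute.
    ancestor = parents.get(elem)
    while ancestor is not None:
        if ancestor.get("region"):
            return {ancestor.get("region")}
        ancestor = parents.get(ancestor)
    # Rule 3: regions referenced by descendants (assumed to allow more than one).
    below = {d.get("region") for d in elem.iter() if d is not elem and d.get("region")}
    if below:
        return below
    # Rule 4: a default region was implied because no region element exists.
    if not has_region_elements:
        return {DEFAULT_REGION}
    # Rule 5: not associated with any region.
    return set()


DOC = """<body xmlns="http://www.w3.org/ns/ttml">
  <div xml:id="d1">
    <p xml:id="p1" region="r1">Text 1</p>
    <p xml:id="p2" region="r2">Text 2</p>
  </div>
  <div xml:id="d9"><p xml:id="p9">Unplaced text</p></div>
</body>"""

body = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child-to-parent map first.
parents = {child: parent for parent in body.iter() for child in parent}
d1, d9 = list(body)
p1, p9 = d1[0], d9[0]
print(sorted(associated_regions(d1, parents, True)))
```

The d9 and p9 elements are invented for the sketch to exercise the last two rules; they are not part of the example source document.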

TTML Intermediate Document Object

Each top level region element in the TTML Intermediate Document is a TTML Intermediate Document Object and implements the TTMLEventRegionElement interface.

Example

An example of the processing steps described above is elaborated below, starting with an Infoset corresponding to the XML Example Source Document.

Example Source Document

 <tt tts:extent="640px 480px" xml:lang="en"
  xmlns="http://www.w3.org/ns/ttml"
  xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
	<layout>
	  <region xml:id="r1">
		<style tts:origin="10px 100px"/>
		<style tts:extent="300px 96px"/>
	  </region>
	  <region xml:id="r2">
		<style tts:origin="10px 300px"/>
		<style tts:extent="620px 96px"/>
	  </region>
	</layout>
  </head>
  <body xml:id="b1">
	<div xml:id="d1" begin="0s" dur="2s">
	  <p xml:id="p1" region="r1">Text 1</p>
	  <p xml:id="p2" region="r2">Text 2</p>
	</div>
	<div xml:id="d2" begin="1s" dur="2s">
	  <p xml:id="p3" region="r2">Text 3</p>
	  <p xml:id="p4" region="r1">Text 4</p>
	</div>
  </body>
</tt>

The event times for this document are 0s, 1s, 2s and 3s. The result of performing the processing described above for each of these times will be an intermediate document containing a sequence of region elements, each region corresponding to a single cue; for example, at a media time of 0s the following intermediate document containing two cues would be produced:

Example Intermediate Document at 0s

	  <region xml:id="r1" 
		   tts:origin="10px 100px" 
		   tts:extent="300px 96px"> 
		<body xml:id="b1"> 
		    <div xml:id="d1"> 
		        <p xml:id="p1">Text 1</p>
		    </div>
		</body> 
	  </region>
	  <region xml:id="r2"
		  tts:origin="10px 300px" 
		  tts:extent="620px 96px"> 
		  <body xml:id="b1"> 
		     <div xml:id="d1"> 
			<p xml:id="p2">Text 2</p> 
		     </div>
		  </body> 
	  </region>

TTML cue to HTML cue construction rules

To support the timed track model of HTML 5, each TTML Intermediate Document Object in the intermediate document is converted to one or more TextTrackCue objects.

The DocumentFragment type used below is defined in the DOM specification.

The HTML5 TextTrackCue members are initialised with the following assignments:

The timed track cue id is set to the value of xml:id of the region used to construct the cue, or "" if the default region is used.
- Regions without ids are not converted into TextTrackCues.

The timed track cue pause-on-exit flag is set to false unless the role "x-extended-description" is set anywhere in the region's subtree, in which case it is set to true.

The cue start time is set to the current event time.

The cue end time is set to the next event time after the current event time, or the end of the media if the current event time is the last event time.

The getCueAsHTML() method must convert the TTML Intermediate Document Object to a DocumentFragment by applying the TTML cue text DOM construction rules.

The text attribute is a string containing the outer XML representation of the region, that is, the TTML Intermediate Document Object, corresponding to the cue.

The display state is disregarded for the subset of video's list of timed tracks that have these rules as their rules for updating the timed track rendering.

The values of the attributes vertical, snapToLines, line, position, size and align are not defined by this specification.
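
The cue start time and cue end time assignments above pair each event time with the next one. A short Python sketch of that pairing, with the hypothetical helper name cue_intervals and an assumed media end of 10 seconds, might look like this:

```python
def cue_intervals(event_times, media_end):
    """Pair each event time with the next one; the last cue runs to the end of the media."""
    times = sorted(set(event_times))
    cues = []
    for i, start in enumerate(times):
        end = times[i + 1] if i + 1 < len(times) else media_end
        cues.append((start, end))
    return cues


# Event times from the example document; the media end (10.0) is an assumed value.
print(cue_intervals([0.0, 1.0, 2.0, 3.0], 10.0))
```

If the last event time coincides with the end of the media, the final pair has zero length; whether such a cue should be emitted at all is not addressed by this sketch.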

TTML cue text DOM construction rules

The DocumentFragment is constructed by converting the TTML Intermediate Document Object into a DOM tree for the Document owner. User agents must create one DocumentFragment node for each TextTrackCue, and populate it with a tree of DOM nodes that is isomorphic to the TTML Intermediate Document Object tree, using the following mapping of TTML Intermediate Document Objects to DOM nodes:

TTML Intermediate Document Object	DOM node
TTMLEventRegionElement	HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
TTMLEventBodyElement	HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
TTMLEventDivElement	HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
TTMLEventPElement	HTMLElement element node with localName "p" and the namespaceURI set to the HTML namespace.
TTMLEventSpanElement	HTMLElement element node with localName "span" and the namespaceURI set to the HTML namespace.
TTMLEventSetElement	The Specified Style Set of properties of the set element is applied into the Specified Style Set of properties of its parent.
TTMLEventBrElement	HTMLElement element node with localName "br" and the namespaceURI set to the HTML namespace.
TTMLEventMetadataElement	If the TTML source domain is not the same as the referencing HTML domain, then ignore. Otherwise if the metadata contains only text content, append a "data-metadata" attribute to the HTMLElement element associated with the containing TTML node, whose character data is the text of the metadata node, otherwise serialise the child nodes of the metadata element to text and add to the HTMLElement element associated with the containing TTML node in a script element type="text/xml".
TTMLEventAnonymousSpan	Text node whose character data is the text of the anonymous span.
Other Element types	If the TTML source domain is the same as the referencing HTML domain, then copy the nodes in their existing namespace; otherwise ignore.

The ownerDocument attribute of all nodes in the DOM tree must be set to the given document owner.

Style application using CSS

For each HTMLElement in the document fragment constructed above, create a CSSStyleDeclaration, add to it the styles defined by the ordered rules below, and add the CSSStyleDeclaration to the style attribute on the HTMLElement.

Apply the following default styles:
1. If the corresponding TTML element was region:
  1. call setProperty with propertyName="display", value="table", priority="".
  2. call setProperty with propertyName="table-layout", value="fixed", priority="".
2. If the corresponding TTML element was body:
  1. call setProperty with propertyName="display", value="table-cell", priority="".
  2. call setProperty with propertyName="height", value="100%", priority="".
If the specified style set computed for the corresponding TTML element is not empty:
1. If the specified set contains the property backgroundColor call setProperty with propertyName="background-color", value=<color value>, priority="".
2. If the specified set contains the property color, call setProperty with propertyName="color", value=<color value>, priority="".
3. If the specified set contains the property direction, call setProperty with propertyName="direction", value=<direction value>, priority="".
4. If the specified set contains the property display, call setProperty with propertyName="display", value=<display value>, priority="".
5. If the specified set contains the property displayAlign, then for the body element child of the element call setProperty with propertyName="vertical-align", value=<align value>, priority="".
6. If the specified set contains the property extent and the TTML element was region, call setProperty with: propertyName="width", value=<width value>, priority="" and propertyName="height", value=<height value>, priority="". If extent is not set and the TTML element was region (e.g. the region is the default region), set height and width of the div to auto.
7. If the specified set contains the property fontFamily, call setProperty with propertyName="font-family", value=<font-family value>, priority="".
8. If the specified set contains the property fontSize, call setProperty with propertyName="font-size", value=<font-size value>, priority="".
9. If the specified set contains the property fontStyle, call setProperty with propertyName="font-style", value=<font-style value>, priority="".
10. If the specified set contains the property fontWeight, call setProperty with propertyName="font-weight", value=<font-weight value>, priority="".
11. If the specified set contains the property lineHeight, call setProperty with propertyName="line-height", value=<line-height value>, priority="".
12. If the specified set contains the property opacity and the TTML element was region, call setProperty with propertyName="opacity", value=<opacity value>, priority="" (CSS3).
13. If the specified set contains the property origin and the TTML element was region, call setProperty with: propertyName="position", value="absolute", priority="", propertyName="left", value=<left value>, priority="" and propertyName="top", value=<top value>, priority="".
14. If the specified set contains the property overflow and the TTML element was region, call setProperty with propertyName="overflow", value=<overflow value>, priority="".
15. If the specified set contains the property padding and the TTML element was region, call setProperty with propertyName="padding", value=<padding value>, priority="".
16. If the specified set contains the property showBackground and the TTML element was region, then if the div has no children call setProperty with propertyName="display", value="none", priority="".
17. If the specified set contains the property textAlign, call setProperty with propertyName="text-align", value=<text-align value>, priority="".
18. If the specified set contains the property textDecoration, call setProperty with propertyName="text-decoration", value=<text-decoration value>, priority="".
19. If the specified set contains the property textOutline, call setProperty with propertyName="text-outline", value=<text-outline value>, priority="" (CSS3).
20. If the specified set contains the property unicodeBidi, call setProperty with propertyName="unicode-bidi", value=<bidi value>, priority="".
21. If the specified set contains the property visibility, call setProperty with propertyName="visibility", value=<visibility value>, priority="".
22. If the specified set contains the property wrapOption with value noWrap, call setProperty with propertyName="white-space", value="nowrap", priority="".
23. If the specified set contains the property writingMode, call setProperty with propertyName="writing-mode", value=<writing-mode value>, priority="" and call setProperty with propertyName="text-orientation", value="upright", priority="" (CSS3).
24. If the specified set contains the property zIndex and the TTML element was region, call setProperty with propertyName="z-index", value=<z-index value>, priority="".

Map the following elements in the #metadata namespace to attributes on the parent HTMLElement as follows:

ttm:title : copy text content to the title attribute

Map attributes in the #metadata namespace on the TTML DOM element to attributes on the HTMLElement as follows:

ttm:agent : add the value of this attribute to the class attribute.
ttm:role : add the value of this attribute to the class attribute.

If a role attribute on a span or p element contains:

x-term - call setProperty with propertyName="display", value="ruby-base", priority="" on the CSSStyleDeclaration referenced by the style attribute on the HTMLElement.
x-gloss - call setProperty with propertyName="display", value="ruby-text", priority="" on the CSSStyleDeclaration referenced by the style attribute on the HTMLElement.
x-gloss-paren - call setProperty with propertyName="display", value="none", priority="" on the CSSStyleDeclaration referenced by the style attribute on the HTMLElement.
x-nav-section - insert an anchor element as prior sibling with href set to the element id (tentative).
x-hyperlink - wrap the element in <a> (tentative).

In addition, add the value "cue" to the class attribute on the HTMLElement created for the region element.

Copy xml:lang attribute if present on the TTML DOM element to the HTMLElement as the lang attribute.

If the host element importing the TTML has an id attribute, then copy the xml:id attribute value, if present on the TTML DOM element, to the HTMLElement as the id attribute, provided that the addition preserves the unique id requirements of the importing document. Otherwise do nothing.

If the xml:space attribute on an element has the value 'preserve', then call setProperty with propertyName="white-space", value="pre", priority="".

The following properties should be set to the given values:

propertyName="margin", value="0pt",
propertyName="border", value="0pt",
propertyName="padding", value="0pt",
propertyName="overflow", value="hidden".

All characteristics of the DOM nodes that are not described above or dependent on characteristics defined above must be left at their initial values.

Assign additional styles to the HTMLElement using the stylesheets that apply to the referencing document, using the CSS cascade as if the element were a sibling element immediately after the HTML element in the DOM that referenced the TTML document. [Note: I'd prefer this to be a child; however, I think the HTML rules will mean it is not displayed.]

For example in the reference below, the specified stylesheet will set the default text colour of all cues to green.

Example TTML styling with external style

HTML fragment:
	  <video class="v1" controls src='example.mp4'>
            <track kind='subtitles' srclang='en' label='English' src='example.ttml' default>
	  </video>

External style:
      
      video.v1 + div p { color: green }  /* equivalent to the rule below */
      video.v1:cue(p) { color: green }

Continuing the prior example, the two HTML fragments returned by getCueAsHTML() for the resulting TextTrackCue objects will be as follows:

Example HTML Fragments Output

Fragment 1:
	<div id="r1" 
         style="margin:0pt; border:0pt; padding:0pt; overflow:hidden; 
		        position:absolute; left:10px; top:100px;
		        width:300px; height:96px"> 
	   <div id="b1"> 
	     <div id="d1"> 
		<p id="p1">Text 1</p>
	     </div>
          </div> 
	</div>

Fragment 2:
	<div id="r2"
	     style="margin:0pt; border:0pt; padding:0pt; overflow:hidden; 
		        position:absolute; left:10px; top:300px;
		        width:620px; height:96px"> 
	    <div id="b1"> 
	        <div id="d1"> 
		    <p id="p2">Text 2</p> 
		</div>
	    </div> 
	</div>

Style values

The mapping from TTML style values into the equivalent CSS is as follows:

color value - Convert the TTML color to its RGBA equivalent, and set value to equivalent CSS rgba(R,G,B,A). [CSSCOLOR]
direction value - map to like named CSS values.[CSS]
display value - map none to like named CSS value, map auto to CSS block if the HTML element is not span or text node, CSS inline otherwise.[CSS]
align value - map before to top, map center to middle and map after to bottom.
font-family value - copy value.[CSS]
font-size value - map to like named metrics from CSS [CSS], except for cell based values, which are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
font-style value - map to like named values from CSS.[CSS]
font-weight value - map to like named values from CSS.[CSS]
height value - Use the second value in the extent pair; cell based values are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
left value - Use the first value in the origin pair; cell based values are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
line-height value - map to like named values from CSS [CSS], except for cell based values, which are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
opacity value - map to like named values from CSS.[CSS3]
overflow value - map to like named values from CSS.[CSS]
padding value - map to like named values from CSS.[CSS]
text-align value - map left, center and right to like named values from CSS. If direction is ltr, map start and end to left and right respectively; if direction is rtl, map start and end to right and left respectively. [CSS]
text-decoration value - map noUnderline, noLineThrough and noOverline to none; map lineThrough to line-through; otherwise map to like named values from CSS. [CSS]
text-outline value - map to text-shadow.[CSS3]
top value - Use the second value in the origin pair; cell based values are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
bidi value - map bidiOverride to bidi-override, otherwise map to like named values from CSS.[CSS]
visibility value - map to like named values from CSS.[CSS]
width value - Use the first value in the extent pair; cell based values are calculated as <value> * Rh/Cv or <value> * Rw/Ch (see Rendering Rules) and denoted as CSS px metric.
writing-mode value - (preliminary until CSS3 is finalized) map lr,lrtb,rl to horizontal-tb; map tb, tbrl to vertical-rl; and map tblr to vertical-lr from CSS3.[CSS3].
z-index value - The value is calculated from the z value of the media element that references the track in such a way that the media element rendering area (including any controls) will lie immediately behind the CSS boxes created for the cue elements, and the next immediately higher CSS box in the HTML page will lie in front of all CSS boxes created by cues.
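
A few of the value mappings listed above are simple enough to sketch as Python lookups. The function and table names below (map_text_align, map_display, map_unicode_bidi, DISPLAY_ALIGN, WRITING_MODE) are hypothetical, and the sketch covers only the mappings that the list states unambiguously.

```python
# align value: TTML displayAlign to CSS vertical-align.
DISPLAY_ALIGN = {"before": "top", "center": "middle", "after": "bottom"}

# writing-mode value: preliminary mapping as given in the list above.
WRITING_MODE = {
    "lr": "horizontal-tb", "lrtb": "horizontal-tb", "rl": "horizontal-tb",
    "tb": "vertical-rl", "tbrl": "vertical-rl",
    "tblr": "vertical-lr",
}


def map_text_align(value, direction="ltr"):
    """Map a TTML textAlign value to CSS text-align, resolving start/end by direction."""
    if value in ("left", "center", "right"):
        return value
    if value not in ("start", "end"):
        raise ValueError("unsupported textAlign value: %r" % value)
    forward = direction == "ltr"
    if value == "start":
        return "left" if forward else "right"
    return "right" if forward else "left"


def map_display(value, is_span_or_text):
    """Map TTML display: none stays none, auto becomes inline or block."""
    if value == "none":
        return "none"
    return "inline" if is_span_or_text else "block"


def map_unicode_bidi(value):
    """Map TTML unicodeBidi: only bidiOverride needs renaming."""
    return "bidi-override" if value == "bidiOverride" else value


print(map_text_align("start", "rtl"), WRITING_MODE["tbrl"], DISPLAY_ALIGN["after"])
```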

Rendering Rules

Create a set of CSS boxes in relation to the rendering area of the media element as follows:

If the media element is a playback mechanism with no rendering area, abort these steps. There is nothing to render.
Let video be the media element or other playback mechanism
If the TTML document has a parameter attribute that sets the cell resolution, let Ch and Cv be the horizontal and vertical resolutions respectively; otherwise let Ch be 32 and Cv be 15.
If the TTML document sets the extent attribute on the root tt element, let Rw and Rh be the width and height from that extent respectively; otherwise let Rw and Rh be 0.
If Rh is not 0, then let the initial font height Fh be Rh/Cv; otherwise let Fh be 32px.
Let textArea be a CSS containing block corresponding to the following style settings:
1. Let font-family be monospace;
2. Let font-size be Fh px;
3. Let line-height be Fh px;
4. Let position be absolute;
5. Let top be 0px;
6. Let left be 0px;
7. Let height be Cv em;
8. Let width be calc(Ratio * Ch em), where Ratio is the aspect ratio of the font height to character advance of the monospace font;
9. Let writing mode (CSS3) for textArea be horizontal-tb;
10. Let transform-origin be 0% 0%;
11. Let transform be that transform which causes the block to exactly cover the rendering area for video.
12. Let background-color be transparent;
13. Let padding be 0px;
14. Let margin be 0px;
15. Let border be 0px;
Let tracks be the subset of video's list of timed tracks that have these rules as their rules for updating the timed track rendering, and whose timed track mode is showing.
Let cues be an empty TextTrackCueList.
For each track in tracks, append to cues all the cues computed as above for each TTML cue event time.
On each change of the media time
1. Clear the set of CSS boxes in textArea
2. For each timed track cue that is active at the current media time, run the following substeps:
  1. Let nodes be the HTML DocumentFragment corresponding to the cue.
  2. Apply the terms of the CSS specifications to nodes to obtain a set of CSS boxes relative to the CSS box created for the root div element in the HTML DocumentFragment, which is in turn relative to textArea. [CSS]
  3. Add the CSS boxes to textArea.
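
The cell based arithmetic used by these rules (Ch, Cv, Rw, Rh and Fh) can be checked with a small Python sketch. The function names and the axis parameter are hypothetical, and the choice of Rh/Cv for vertical measures and Rw/Ch for horizontal measures is an assumption, since the text above offers both forms without saying which applies to which property.

```python
def initial_font_height(rh, cv=15):
    """Fh in px: Rh/Cv when the root extent is known, otherwise the 32px default."""
    return rh / cv if rh != 0 else 32.0


def cell_to_px(cells, axis, rw, rh, ch=32, cv=15):
    """Convert a cell based length to px along the given axis."""
    if axis == "vertical":
        return cells * rh / cv   # one cell row is Rh/Cv px high
    if axis == "horizontal":
        return cells * rw / ch   # one cell column is Rw/Ch px wide
    raise ValueError("axis must be 'vertical' or 'horizontal'")


# Root extent of the example document is 640px by 480px, with the default 32 by 15 cell grid.
print(initial_font_height(480))
print(cell_to_px(1, "horizontal", 640, 480))
print(cell_to_px(2, "vertical", 640, 480))
```

With the example document's 640px by 480px root extent and the default cell resolution, one cell is 20px wide and 32px high, so the computed Fh happens to equal the 32px fallback.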

Active

A timed track cue is considered active if it has content selected into it because it is temporally active, in addition to the active flag mechanism in HTML5.

CSS extensions

The pseudo class :cue may be mapped to a sibling/descendant operator pair such that the selector:

E:cue(X)

is equivalent to the selector

E + div.cue X.
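
As an illustration of this equivalence, the rewrite could be done with a single regular expression substitution. The sketch below is in Python; the name rewrite_cue_selector is hypothetical, and arguments containing nested parentheses are deliberately not handled.

```python
import re

# Matches :cue(X) where X contains no parentheses.
CUE = re.compile(r":cue\(([^()]*)\)")


def rewrite_cue_selector(selector):
    """Rewrite E:cue(X) as the sibling/descendant form E + div.cue X."""
    return CUE.sub(lambda m: " + div.cue " + m.group(1).strip(), selector)


print(rewrite_cue_selector("video.v1:cue(p)"))
```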

The :future and :past pseudo-classes do not select content converted from the TTML Infoset, as content is only present in a region (and therefore mapped to a DOM element) when it is temporally active (i.e. in the present). The functionality implied by these selectors is handled in TTML with the <set> element.

Metadata roles

The following roles may have additional meaning when used in conjunction with this specification:

x-term Indicates the marked text to be a defined term.
x-gloss Indicates the marked text to be gloss text for a defined term - for example, to be styled as Japanese furigana using CSS3 ruby style (preliminary until CSS3 is finalised)
x-gloss-paren Indicates the marked text to be a marker character around gloss text for a defined term
x-extended-description Indicates that the marked text, if read out as audio (whether pre-recorded or generated from text), may have a duration longer than the element's active duration. User agents may alter the playback of external media to compensate.
x-nav-section Indicates the marked text may be interpreted as a media reference equal to the computed start time of the element.
x-hyperlink Indicates the marked text may be interpreted as a link to an external resource

Appendix: IDL interfaces for TTML DOM objects

None defined at this time.

TTML/changeProposal005