TTML and Derivative Caption Formats

In recent years, with the convergence of traditional broadcast television and the Web, more television video content is coming to the Web, and the Web now runs on televisions. Timed Text Markup Language (TTML) helps bridge the TV and Web worlds.
Typical applications of timed text are real-time subtitling of foreign-language movies on the Web, captioning for people who lack audio devices or have hearing impairments, karaoke, scrolling news items and teleprompter applications.
TTML supports extension and profiling; as a result, multiple caption formats derived from it have been developed by the W3C and by external organizations.
This document explains how TTML relates to its derivative caption formats.

1- Introduction

The first captions appeared in analog television broadcasts, giving people who are hard of hearing access to the audio content of television. Since then, captions have become ubiquitous across digital television and are a growing presence in online video. As television evolved from analog to digital and then to online video, differing languages and national regulatory policies led to the development of multiple caption formats.
Caption formats fall into three broad categories:

1.1- Caption formats for analog television

CEA-608 is the primary format for in-band captions in analog NTSC transmissions in the USA. It continues to be used in both digital and online video through "608 over 708". It is character-based and tailored for delivery/authoring.
Teletext is mainly used in Europe and is the dominant legacy format for in-band captions in analog PAL transmissions. It is character-based (it can also be image-based). It is still used in digital television and online video via DVB Teletext.
EBU-STL is a legacy, out-of-band caption file format, used primarily in Europe for authoring and interchange.

1.2- Caption formats for digital television

CEA-708 is the primary format for in-band captions in digital ATSC (terrestrial) transmissions in the USA.
- The "608 over 708" portion of the format is one of the online caption delivery formats in use today.
- Native 708 is character-based and tailored for delivery. In ATSC and online video, 708 captions are carried within the MPEG-2 transport stream (TS).
DVB Teletext (ETSI 301 775) is mainly used in Europe. It embeds Teletext in a DVB MPEG-2 TS stream.
DVB Subtitles is mainly used in Europe and is tailored for delivery. It allows either image-based or character-based captions to be embedded within a DVB MPEG-2 TS stream.

1.3- Online caption formats

With the rise of online video, new caption formats have emerged:

TTML is an authoring, interchange and distribution format. It is an advanced and complex caption format, used mainly by the broadcast industry and in online applications.
WebVTT is a distribution format. It is a simple format used in web browsers (IE, Firefox, Chrome and Safari) and native players (iOS and Android), and it integrates well with HTML5.
CEA-608 is the U.S. delivery alternative implemented in native players (iOS and Android).

2- W3C initiatives for online video captions

2.1- TTML

Timed Text Markup Language (TTML1) is a textual format for timed text that may be used directly as a distribution format for online captions and subtitles, and as an interchange format among legacy distribution content formats. It is designed as a superset that encompasses preceding captioning approaches and supports the semantics of most closed caption formats. The television industry uses it to author, transcode and exchange timed text information, and to deliver captions and subtitles.

It is an XML-based text format that builds on several other W3C technologies: SMIL for timing semantics, and XSL-FO and CSS for the presentation and layout of text.
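Because TTML is plain XML, a generic XML parser is enough to read its content. The following is a minimal sketch using Python's standard xml.etree.ElementTree: it lists each p element's xml:id, raw begin/end attributes and text, turning br into a line break. It assumes only the TTML1 namespace URI and the reserved XML namespace; the helper names element_text and list_paragraphs are illustrative, and timing inherited from parent elements is not resolved.

```python
import xml.etree.ElementTree as ET

# TTML1 content namespace, and the reserved XML namespace that holds xml:id / xml:lang.
TT_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p xml:id="c1" begin="00:00:00" end="00:00:10">Hello I am your first line.</p>
      <p xml:id="c2" begin="00:00:02" end="00:00:10">I am your second captions<br/>but with two lines.</p>
    </div>
  </body>
</tt>"""


def element_text(elem: ET.Element) -> str:
    """Flatten an element's text, turning each TTML <br/> into a newline."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TT_NS}}}br":
            parts.append("\n")
        else:
            parts.append(element_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def list_paragraphs(document: str) -> list[tuple[str, str, str, str]]:
    """Return (xml:id, begin, end, text) for every <p> in a TTML document."""
    root = ET.fromstring(document)
    rows = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        # Collapse whitespace inside each line but keep the <br/> line breaks.
        lines = [" ".join(line.split()) for line in element_text(p).split("\n")]
        rows.append((
            p.get(f"{{{XML_NS}}}id", ""),
            p.get("begin", ""),
            p.get("end", ""),
            "\n".join(lines),
        ))
    return rows


for row in list_paragraphs(SAMPLE):
    print(row)
```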

TTML1 diagram

TTML exposes multiple types of elements and attributes:

Content elements are HTML-like elements (such as <div>, <p>, <span>) that contain the caption text.
Timing attributes specify the time interval during which content should be visible. Timing attributes may also be applied to layout and animation elements.
Style elements specify the appearance of content via a simple XML-based styling system.
Layout elements specify the layout properties (such as the bounding boxes) of content.
Animation elements can be used to alter the style of text at particular times.
Metadata elements specify additional metadata about the presentation.
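Timing attributes such as begin and end take TTML time expressions. The sketch below is a simplified reading of the TTML1 syntax: it converts clock-time values (hh:mm:ss, hh:mm:ss.fraction, hh:mm:ss:frames) and offset-time values (for example 1.5h, 100ms, 25f) to exact seconds. Tick values (t) and sub-frames are deliberately not handled, and the frame_rate default of 30 is an assumption standing in for the document's ttp:frameRate; parse_time is a hypothetical name.

```python
import re
from fractions import Fraction

# clock-time: hh:mm:ss, hh:mm:ss.fraction or hh:mm:ss:frames
CLOCK_TIME = re.compile(r"^(\d{2,}):(\d{2}):(\d{2})(?:(\.\d+)|:(\d{2,}))?$")
# offset-time: number + metric; ticks ("t") are intentionally not supported here
OFFSET_TIME = re.compile(r"^(\d+(?:\.\d+)?)(h|ms|m|s|f)$")


def parse_time(expr: str, frame_rate: int = 30) -> Fraction:
    """Convert a subset of TTML1 time expressions to seconds as an exact Fraction."""
    match = CLOCK_TIME.match(expr)
    if match:
        hours, minutes, seconds, fraction, frames = match.groups()
        total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
        if fraction:
            total += Fraction("0" + fraction)  # ".5" -> 1/2
        if frames:
            if int(frames) >= frame_rate:
                raise ValueError(f"frame count out of range: {expr!r}")
            total += Fraction(int(frames), frame_rate)
        return total

    match = OFFSET_TIME.match(expr)
    if match:
        count, metric = Fraction(match.group(1)), match.group(2)
        scale = {
            "h": Fraction(3600),
            "m": Fraction(60),
            "s": Fraction(1),
            "ms": Fraction(1, 1000),
            "f": Fraction(1, frame_rate),
        }[metric]
        return count * scale

    raise ValueError(f"unsupported time expression: {expr!r}")


print(parse_time("00:00:10"), parse_time("1.5h"), parse_time("100ms"))  # 10 5400 1/10
```

Exact fractions avoid rounding drift when frame-based times are compared with clock times.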

TTML was formerly known as the Distribution Format Exchange Profile (DFXP).
TTML1 was specified by the W3C Timed Text Working Group and became a W3C Recommendation in November 2010.

A second edition of the TTML1 Recommendation was published in September 2013, addressing errata and comments received since the Recommendation was first published.

TTML is partially or fully supported by components used in several web user agents, and by a number of caption authoring tools.

2.2- TTML profiles and extensions specified by the W3C

TTML provides mechanisms for defining profiles and extensions. The TTML1 specification itself defines three profiles:

the Full Profile includes all features of TTML.
the Presentation Profile is a subset, intended to be used to express minimum compliance for presentation processing.
the Transformation Profile is another subset, intended to be used to express minimum compliance for transformation processing.
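A document can announce which profile it targets with the ttp:profile attribute on its root tt element. The sketch below maps the three TTML1 profile designators to the profile names above; only the attribute form is checked (a ttp:profile element in head is ignored), and declared_profile is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TTP_NS = "http://www.w3.org/ns/ttml#parameter"

# Designators TTML1 defines for its built-in profiles.
TTML1_PROFILES = {
    "http://www.w3.org/ns/ttml/profile/dfxp-full": "Full",
    "http://www.w3.org/ns/ttml/profile/dfxp-presentation": "Presentation",
    "http://www.w3.org/ns/ttml/profile/dfxp-transformation": "Transformation",
}


def declared_profile(document: str) -> Optional[str]:
    """Return the TTML1 profile named by ttp:profile on <tt>, 'other', or None."""
    root = ET.fromstring(document)
    designator = root.get(f"{{{TTP_NS}}}profile")
    if designator is None:
        return None  # no attribute; a <ttp:profile> element in <head> is not checked here
    return TTML1_PROFILES.get(designator, "other")


doc = ('<tt xmlns="http://www.w3.org/ns/ttml" '
       'xmlns:ttp="http://www.w3.org/ns/ttml#parameter" '
       'ttp:profile="http://www.w3.org/ns/ttml/profile/dfxp-presentation"/>')
print(declared_profile(doc))
```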

SDP-US

The TTML Simple Delivery Profile for Closed Captions (SDP-US) focuses on streamlined delivery of closed captions on the Internet.
This interoperability profile supports the core TTML1 features needed to deliver content from legacy formats such as CEA-608 and CEA-708, and is therefore targeted primarily at US markets. SDP-US is a proper subset of TTML1 intended to support the features required by US Government closed captioning requirements for online presentation. It does not provide extensions to TTML1.
This profile was published by the Timed Text Working Group on 05 February 2013.

IMSC1

As other organizations created and deployed multiple TTML-based caption formats, e.g. EBU-TT, SMPTE-TT and CFF-TT (described later in this document), the W3C specified IMSC to simplify interoperability among these TTML profiles.

TTML Text and Image Profiles for Internet Media Subtitles and Captions 1.0 (IMSC1) is a pair of TTML1 profiles developed primarily as a candidate format for the Interoperable Master Format (IMF) effort.
It specifies a text-only profile and an image-only profile.
These profiles are intended to be used across subtitle and caption delivery applications worldwide, thereby simplifying interoperability, consistent rendering and conversion to other subtitling and captioning formats.
The text profile is a superset of both SDP-US and CFF-TT.
IMSC1 defines extensions to TTML1 and also incorporates extensions specified in:

SMPTE-TT Timed Text Format (SMPTE ST 2052-1) from the Society of Motion Picture and Television Engineers (SMPTE)
EBU-TT-D (Tech 3380, EBU-TT-D Subtitling Distribution Format Version 1.0) from the European Broadcasting Union (EBU).

Both profiles are based on the Common File Format & Media Formats Specification (CFF) developed by the Digital Entertainment Content Ecosystem (DECE), and benefit from its technical consensus, conformance testing and implementation experience.

IMSC1 is specified by the Timed Text Working Group and is currently a W3C Candidate Recommendation.

TTML2

TTML2 is under development by the Timed Text Working Group in the W3C.

TTML2 is intended to make it easier to position and style some content, and to add stereoscopic 3D support and rich, smooth animations.
It will also contain a mechanism for providing options that the display processor or the user can manipulate to modify the appearance of the content. For example, it will support the common use case of including both captions for deaf and hard of hearing users and translation subtitles in a single document, letting the user choose whether to show just the translations, or both the translations and the captions.
It will define in more detail the mapping into HTML and CSS fragments for presentation by a user agent.

TTML2 First Public Working Draft was published in February 2015.

3- TTML Profiles from external organizations

The TTML recommendation defines a broad set of features and the XML semantics for how a TTML document will express those features.
External specifications are expected to define profiles, each of which is a set of features, extensions (new features), and requirements to ensure interoperability.

TTML provides an extensibility mechanism for adding new features in specific namespaces. The following TTML-based caption formats restrict and/or extend TTML:

SMPTE-TT Timed Text Format (SMPTE ST 2052-1) from the Society of Motion Picture and Television Engineers.
EBU-TT from European Broadcasting Union (EBU)
CFF-TT from the Digital Entertainment Content Ecosystem (DECE), the consortium behind UltraViolet

3.1- SMPTE-TT

SMPTE-TT (SMPTE ST 2052-1) defines the SMPTE profile of TTML1. It is used to represent captions or subtitles.

It adopts the complete set of TTML features (the Full Profile).
SMPTE-TT also defines additional standard metadata terms and adds some extension features not found in TTML.

Interoperability with pre-existing and regionally specific formats (such as CEA-708, CEA-608, DVB Subtitles and WST (World System Teletext)) is provided by tunneling data or bitmap images and by adding the necessary metadata.

Tunneling: the original analog or digital captions may be embedded as binary blobs (e.g. the exact original CEA-608 data byte pairs) for “backwards compatibility”.
Images: background images aid translation from image-based formats (DVB Subtitles).
Additional Metadata: Information about how a document was translated, origin format (as a URI) and the translation's fidelity level.
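The tunneling idea can be illustrated with a small sketch: raw CEA-608 byte pairs are copied unchanged (each byte carries 7 data bits plus an odd-parity bit) and packed as base64 text that an XML document can carry, then recovered exactly. The actual SMPTE-TT element and attribute names for such payloads are defined in SMPTE ST 2052-1 and are not reproduced here; tunnel_608_pairs and untunnel_608_pairs are hypothetical helpers.

```python
import base64


def has_odd_parity(byte: int) -> bool:
    """CEA-608 bytes carry 7 data bits plus an odd-parity bit (bit 7)."""
    return bin(byte).count("1") % 2 == 1


def tunnel_608_pairs(pairs: list[tuple[int, int]]) -> str:
    """Pack CEA-608 byte pairs, unchanged, into a base64 text payload."""
    raw = bytearray()
    for pair in pairs:
        for byte in pair:
            if not 0 <= byte <= 0xFF or not has_odd_parity(byte):
                raise ValueError(f"not a valid CEA-608 byte: {byte!r}")
            raw.append(byte)  # copied as-is, parity bit included
    return base64.b64encode(bytes(raw)).decode("ascii")


def untunnel_608_pairs(payload: str) -> list[tuple[int, int]]:
    """Recover the exact byte pairs from a payload made by tunnel_608_pairs."""
    raw = base64.b64decode(payload)
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]


# Two byte pairs with valid odd parity.
original = [(0x94, 0x2C), (0x80, 0x80)]
payload = tunnel_608_pairs(original)
print(payload, untunnel_608_pairs(payload) == original)
```

Keeping the bytes bit-exact, parity included, is what makes the round trip lossless and preserves the "backwards compatibility" goal.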

SMPTE also provides recommended practices for converting CEA-608 data to SMPTE-TT (SMPTE RP 2052-10) and CEA-708 data to SMPTE-TT (SMPTE RP 2052-11).

It allows captions to be delivered either as external TTML-formatted files or by embedding the TTML data directly into the video stream.

In January 2012, the US Federal Communications Commission (FCC) adopted SMPTE-TT as a "safe harbor" interchange and delivery format.

It is the caption format used by DASH-264, the DASH Industry Forum interoperability guidelines for Dynamic Adaptive Streaming over HTTP (DASH), an adaptive bitrate streaming technique that enables high-quality streaming of media content delivered over the Internet from HTTP web servers.

3.2- EBU-TT

EBU-TT (EBU Tech 3350 v1.1) is the successor to the formerly widely used EBU STL format (EBU Tech 3264).
It is defined for the interchange and archiving of subtitles.
It was developed by the European Broadcasting Union, which groups Europe’s 75 national broadcasters.

It is a profile that constrains the features provided by TTML1 to make it more suitable for use with broadcast video and web video applications.

It restricts TTML to a subset of its features (e.g. animation and some style, timing and layout options are removed).
It adds more metadata for archival purposes (e.g. program, episode and other information) embedded in the document header.
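A profile that removes features can be checked mechanically by scanning for disallowed elements. The sketch below flags only the TTML animation element set, as one example of the restrictions mentioned above; the DISALLOWED_TAGS set and the disallowed_elements helper are hypothetical and deliberately incomplete, since the normative EBU-TT constraints are those of EBU Tech 3350.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"

# Hypothetical, deliberately incomplete list: only the TTML animation element
# <set> is checked. EBU Tech 3350 defines the actual EBU-TT constraints.
DISALLOWED_TAGS = {f"{{{TT_NS}}}set"}


def disallowed_elements(document: str) -> list[str]:
    """Return local names of elements outside the (assumed) EBU-TT subset."""
    root = ET.fromstring(document)
    return [el.tag.split("}", 1)[1] for el in root.iter() if el.tag in DISALLOWED_TAGS]


doc = ('<tt xmlns="http://www.w3.org/ns/ttml" '
       'xmlns:tts="http://www.w3.org/ns/ttml#styling"><body><div>'
       '<p begin="0s" end="5s">Hi<set begin="1s" end="2s" tts:color="red"/></p>'
       '</div></body></tt>')
print(disallowed_elements(doc))
```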

EBU-TT-D (Tech 3380) is the delivery format; it lacks the tunneling and metadata features.
EBU Tech 3360 is a mapping specifying how EBU-STL documents should be translated to EBU-TT documents.

3.3- CFF-TT

Common File Format Timed Text (CFF-TT) is a pair of TTML profiles from the Digital Entertainment Content Ecosystem (DECE), the consortium behind UltraViolet.
UltraViolet, backed by about 80 member companies, defines standards and provides services for the online distribution of movies and TV shows.

CFF-TT is a constrained profile of SMPTE-TT for captions and subtitles.
Separate image and text profiles: CFF-TT defines two profiles, one for character-based captions and one for image-based captions. The two cannot be used simultaneously.
It extends SMPTE-TT with specific features.
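The one-profile-at-a-time rule can be sketched as a classifier that reports whether a document carries text, images, or both. This assumes images are referenced through the SMPTE-TT backgroundImage attribute in the namespace shown below (an assumption based on SMPTE ST 2052-1, not a CFF-TT conformance check); content_kind is a hypothetical helper, and a result of "mixed" would violate the rule.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumed SMPTE-TT (ST 2052-1) namespace carrying the backgroundImage attribute.
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"


def content_kind(document: str) -> str:
    """Classify a document as 'text', 'image', 'mixed' or 'empty'."""
    root = ET.fromstring(document)
    has_image = any(
        el.get(f"{{{SMPTE_NS}}}backgroundImage") is not None for el in root.iter()
    )
    has_text = any(
        "".join(p.itertext()).strip() for p in root.iter(f"{{{TT_NS}}}p")
    )
    if has_image and has_text:
        return "mixed"  # would mix the two CFF-TT profiles
    if has_image:
        return "image"
    if has_text:
        return "text"
    return "empty"


image_doc = ('<tt xmlns="http://www.w3.org/ns/ttml" '
             'xmlns:smpte="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt">'
             '<body><div smpte:backgroundImage="#img1" begin="0s" end="1s"/></body></tt>')
print(content_kind(image_doc))
```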

UltraViolet-enabled content is intended to be played on a wide range of devices and platforms, including web user agents.

3.4- Comparison of TTML profiles

The following diagram illustrates the relations between TTML flavored profiles.

Inclusion, intersection between different TTML based formats

3.5- IMF

The Interoperable Master Format (IMF) project specifies an interoperable set of master files and associated meta-data to enable standard interchange and distribution across multiple channels, including broadcast and internet.

It defines a single, interchangeable master file format that achieves interoperability through existing, constrained standards.
It minimizes storage by using mezzanine-level compression and by storing only the differences between an original version and derived versions.
It bases its timed text data essence format on SMPTE ST 2052-1.

It is a component-based format for high-quality masters (SMPTE ST 2067-2).

4- WebVTT

The Web Video Text Tracks Format (WebVTT) is a format for the delivery of Internet captions.
WebVTT is a single specification without profiles; all implementations are expected to implement the entire specification. It is a delivery format for download and, in particular, streaming applications.
It is simple to author and integrates well into HTML5 through the <track> element inside an HTML5 <video> element.
It offers good compatibility with existing user agents.
WebVTT is a text format but unlike TTML, it is not XML based.

A WebVTT file contains the following:

Header: A WebVTT file begins with a header that includes document-scoped metadata.
Cues: A WebVTT file consists primarily of a sequence of cues. A cue is one or more lines of text with an associated time interval. Styles may be applied to the cue text using a simple markup syntax. Portions of the cue text may also be designated to appear at particular times in order to achieve paint-on style animations.
Regions: A cue may also be associated with a region, which specifies the bounding box of the text when rendered on screen.
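The cue structure described above is simple enough to parse with a few lines of code. The sketch below splits a file into blank-line-separated blocks, checks the WEBVTT signature, skips the header and NOTE blocks, and reads each cue's optional identifier, timing line, settings and text. It is an illustration under simplifying assumptions (whitespace-only lines count as blank; STYLE and REGION blocks from later drafts are not handled), not the normative WebVTT parsing algorithm; Cue and parse_webvtt are illustrative names.

```python
import re
from dataclasses import dataclass, field

# WebVTT timestamp: (hh:)?mm:ss.ttt, hours optional
TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


@dataclass
class Cue:
    identifier: str
    start: float
    end: float
    settings: dict[str, str] = field(default_factory=dict)
    text: str = ""


def parse_timestamp(value: str) -> float:
    match = TIMESTAMP.match(value)
    if not match:
        raise ValueError(f"bad timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt(source: str) -> list[Cue]:
    """Split a WebVTT file into cues; the header and NOTE blocks are skipped."""
    blocks = re.split(r"\n(?:[ \t]*\n)+", source.replace("\r\n", "\n").strip())
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        if lines[0].startswith("NOTE"):  # comment block
            continue
        identifier = ""
        if "-->" not in lines[0]:  # first line is an optional cue identifier
            identifier, lines = lines[0], lines[1:]
        start, _, rest = lines[0].partition("-->")
        end, *settings = rest.split()
        cues.append(Cue(
            identifier=identifier,
            start=parse_timestamp(start.strip()),
            end=parse_timestamp(end),
            settings=dict(s.split(":", 1) for s in settings),
            text="\n".join(lines[1:]),
        ))
    return cues


SAMPLE_VTT = (
    "WEBVTT\n\nNOTE a comment\n\n"
    "c1\n00:00.000 --> 00:10.000\nHello I am your first line.\n\n"
    "00:08.000 --> 00:10.000 align:center\n<c.red>Line one</c>\nline two\n"
)
for cue in parse_webvtt(SAMPLE_VTT):
    print(cue)
```

Cue text is kept as raw markup (<c.red>, <b>, <lang>) so that a renderer, or CSS via ::cue, can style it later.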

WebVTT's integration with HTML5 is well defined; for example, CSS can be used to further customize the appearance of cues through the ::cue pseudo-element.

WebVTT is a W3C First Public Working Draft published in November 2014.

WebVTT derives from WebSRT (Web Subtitle Resource Tracks), a format specified by the WHATWG. The work was then taken up by the W3C Web Media Text Tracks Community Group, and in March 2014 the W3C Timed Text Working Group was chartered to take it to W3C Recommendation.

5- Simple code comparison of TTML and WebVTT

In this simple example, the following two files should produce the same caption presentation in a user agent.



TTML:

    <tt xml:lang="en"
        xmlns="http://www.w3.org/ns/ttml"
        xmlns:tts="http://www.w3.org/ns/ttml#styling">
      <head>
        <metadata xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
          <ttm:title>TTML Example</ttm:title>
          <ttm:copyright>Thierry Michel 2015</ttm:copyright>
        </metadata>
        <styling>
          <style xml:id="s1" tts:color="red" tts:textAlign="center"/>
        </styling>
      </head>
      <body>
        <div>
          <p xml:id="c1" begin="00:00:00" end="00:00:10">Hello I am your first line.</p>
          <p xml:id="c2" begin="00:00:02" end="00:00:10">I am your second captions<br/>but with two lines.</p>
          <p xml:id="c3" xml:lang="fr" begin="00:00:04" end="00:00:10">Je suis le troisième sous-titre.</p>
          <p xml:id="c4" begin="00:00:06" end="00:00:10">I am another caption with <span tts:fontWeight="bold">Bold</span> and <span tts:fontStyle="italic">Italic</span> styles.</p>
          <p xml:id="c5" begin="00:00:08" end="00:00:10" style="s1">I am the last caption displayed in red and centered.</p>
        </div>
      </body>
    </tt>

WebVTT:

    WEBVTT
    title: Web-VTT Example
    copyright: Thierry Michel 2015

    NOTE To style this document, the CSS colors must be specified
    in the <style> element of the hosting HTML page.

    c1
    00:00.000 --> 00:10.000
    Hello I am your first line.

    c2
    00:02.000 --> 00:10.000
    I am your second captions
    but with two lines.

    c3
    00:04.000 --> 00:10.000
    <lang fr>Je suis le troisième sous-titre.</lang>

    c4
    00:06.000 --> 00:10.000
    I am another caption with <b>Bold</b> and <i>Italic</i> styles.

    c5
    00:08.000 --> 00:10.000 align:center
    <c.red>I am the last caption displayed in red and centered.</c>

In the hosting HTML5 page, the CSS style is:

    <style> ::cue(.red) { color: red; } </style>
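The correspondence between the two files can be partly automated. The sketch below converts the simple TTML subset used in this example into WebVTT cues: p elements become cues (keeping xml:id as the cue identifier), br becomes a line break, spans with tts:fontWeight="bold" or tts:fontStyle="italic" become <b> and <i>, and a p whose xml:lang differs from the document language is wrapped in <lang>. It assumes clock-time values of the form hh:mm:ss(.fff) only, and it does not convert referenced styles such as s1, which in WebVTT need a class plus the ::cue CSS in the hosting page. All function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}local" notation.
TT = "{http://www.w3.org/ns/ttml}"
TTS = "{http://www.w3.org/ns/ttml#styling}"
XML = "{http://www.w3.org/XML/1998/namespace}"


def clock_to_seconds(value: str) -> float:
    """Parse a TTML clock time of the form hh:mm:ss or hh:mm:ss.fff only."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT hh:mm:ss.ttt timestamp."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape(text: str) -> str:
    """Escape characters that WebVTT cue text reserves for markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def cue_text(elem: ET.Element) -> str:
    """Convert a TTML <p> or <span> subtree into WebVTT cue text."""
    out = [escape(elem.text or "")]
    for child in elem:
        if child.tag == TT + "br":
            out.append("\n")
        elif child.tag == TT + "span":
            inner = cue_text(child)
            if child.get(TTS + "fontWeight") == "bold":
                inner = f"<b>{inner}</b>"
            if child.get(TTS + "fontStyle") == "italic":
                inner = f"<i>{inner}</i>"
            out.append(inner)
        # Unsupported children contribute nothing; their tail text is kept.
        out.append(escape(child.tail or ""))
    return "".join(out)


def ttml_to_webvtt(document: str) -> str:
    root = ET.fromstring(document)
    doc_lang = root.get(XML + "lang")
    lines = ["WEBVTT", ""]
    for p in root.iter(TT + "p"):
        # Collapse XML whitespace within each line, keeping <br/> breaks.
        text = "\n".join(" ".join(line.split()) for line in cue_text(p).split("\n"))
        lang = p.get(XML + "lang")
        if lang and lang != doc_lang:
            text = f"<lang {lang}>{text}</lang>"
        if p.get(XML + "id"):
            lines.append(p.get(XML + "id"))
        start = vtt_timestamp(clock_to_seconds(p.get("begin")))
        end = vtt_timestamp(clock_to_seconds(p.get("end")))
        lines += [f"{start} --> {end}", text, ""]
    return "\n".join(lines)
```

Applied to the TTML example, this yields cues c1 to c5 with the same text, line break, language span and bold/italic markup as the WebVTT file, using hh:mm:ss.ttt timestamps (equivalent to the shorter mm:ss.ttt form); c5 comes out as plain text because style s1 is not mapped.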
