TTML and Derivative Caption Formats

In recent years, with the convergence of traditional broadcast television and the Web, more video content from television is coming to the Web, and the Web is now running on televisions. Timed Text Markup Language (TTML) helps to bridge the TV and Web worlds.
Typical applications of timed text are the real-time subtitling of foreign-language movies on the Web, captioning for people who lack audio devices or have hearing impairments, karaoke, scrolling news items, and teleprompter applications.
TTML allows extensibility and profiling; as a result, multiple caption formats have been developed by the W3C and by external organizations.
This document explains how TTML relates to its derivative caption formats.


The first captions appeared in analog television broadcasts, giving the hard of hearing access to the audio content of television. Since then, captions have become ubiquitous across digital television and are a growing presence in online video. Multiple languages and regulatory policies across national boundaries have led to the development of multiple caption formats with the evolution of television from analog to digital and the current era of online video content.
There are three broad categories of caption formats:

1.1- Caption formats for analog television

1.2- Caption formats for digital television

1.3- Online caption formats

With the new era of online video, new captioning formats have arisen:

2- W3C initiatives for online video captions


Timed Text Markup Language (TTML1) is textual information that may be used directly as a distribution format for online captions and subtitles, and as an interchange format among legacy distribution content formats. It is a superset that encompasses preceding captioning approaches, and it supports the semantics of most closed caption files. It is used in the television industry for authoring, transcoding, and exchanging timed text information, and for delivering captions and subtitles.

It is a text format based on multiple W3C technologies, such as XML, SMIL for the timing semantics, and XSL-FO and CSS for the presentation and layout of text.
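
Because TTML is plain XML with SMIL-style timing attributes, a generic XML parser is enough to read the captions and their timing. The following is a minimal sketch, assuming a hypothetical two-caption document (`SAMPLE`) and handling only `hh:mm:ss[.fraction]` clock-times; the other time expressions that TTML permits (offsets, frames, ticks) are not covered.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"  # TTML1 core namespace

# Hypothetical document, used only for illustration.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:00" end="00:00:10">Hello</p>
      <p begin="00:00:02" end="00:00:10">World</p>
    </div>
  </body>
</tt>"""


def clock_to_seconds(value):
    """Convert an hh:mm:ss[.fraction] clock-time to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def list_captions(ttml_text):
    """Return (begin_seconds, end_seconds, text) for every <p> element."""
    root = ET.fromstring(ttml_text)
    captions = []
    for p in root.iter("{%s}p" % TTML_NS):
        text = "".join(p.itertext()).strip()
        begin = clock_to_seconds(p.get("begin"))
        end = clock_to_seconds(p.get("end"))
        captions.append((begin, end, text))
    return captions


print(list_captions(SAMPLE))
```

The sketch ignores styling and layout entirely; it only shows that the timing model is carried by the `begin` and `end` attributes on content elements.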

TTML1 diagram

TTML exposes multiple types of elements and attributes:

TTML1 was formerly known as the Distribution Format Exchange Profile (DFXP).
TTML1 was specified by the W3C Timed Text Working Group and released as a W3C Recommendation in November 2010.

A second edition of the TTML1 Recommendation was published in September 2013, addressing errata and comments received since the Recommendation was first published.

There is partial and full support of TTML in components used by several Web browser agents, and in a number of caption authoring tools.

2.2- TTML profiles and extensions specified by the W3C

TTML allows for the definition of profiles and provides an extension mechanism. The TTML1 specification defines three profiles:


The TTML Simple Delivery Profile for Closed Captions (SDP-US) focuses on the streamlined delivery of closed captions on the Internet.
This interoperability profile supports the core TTML1 features needed to deliver content from legacy formats such as CEA-608 and CEA-708 and, as such, is targeted primarily at US markets. SDP-US is a proper subset of TTML1 intended to support the features required by US Government closed captioning requirements for online presentation. It does not provide extensions to TTML1.
This profile was published by the Timed Text Working Group on 5 February 2013.


Several TTML flavors of caption formats have been specified and are in use by other organizations, e.g. EBU-TT, SMPTE-TT and CFF-TT (described later in this document). To simplify interoperability among these TTML profiles, the W3C specifies IMSC.

TTML Text and Image Profiles for Internet Media Subtitles and Captions 1.0 (IMSC1) is a pair of TTML1 profiles developed primarily as a candidate format for the Interoperable Master Format (IMF) effort.
It specifies a text-only profile and an image-only profile.
These profiles are intended to be used across subtitle and caption delivery applications worldwide, thereby simplifying interoperability, consistent rendering and conversion to other subtitling and captioning formats.
The text profile is a superset of SDP-US and a superset of CFF-TT.
IMSC1 defines extensions to TTML1 and also incorporates extensions specified in:

Both profiles are based on the Common File Format & Media Formats Specification (CFF) developed by the Digital Entertainment Content Ecosystem (DECE), and benefit from its technical consensus, conformance testing and implementation experience.

IMSC1 is specified by the Timed Text Working Group and is currently a W3C Candidate Recommendation.


TTML2 is under development by the Timed Text Working Group in the W3C.

TTML2 will make it easier to position and style some content, with stereoscopic 3D support and rich smooth animations.
It will also contain a mechanism for providing options that can be manipulated by the display processor or the user to modify the appearance of the content. For example, it will support the common use case of including both captions for deaf and hard of hearing users and translation subtitles in a single document, allowing the user to choose whether to show just the translations or both the translations and the captions.
It will define in more detail the mapping into HTML and CSS fragments for presentation by a user agent.

TTML2 First Public Working Draft was published in February 2015.

3- TTML Profiles from external organizations

The TTML recommendation defines a broad set of features and the XML semantics for how a TTML document will express those features.
External specifications are expected to define profiles, each of which is a set of features, extensions (new features), and requirements to ensure interoperability.
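
A TTML document can state which profile it targets through the `ttp:profile` attribute on its root `tt` element. The following is a minimal sketch of reading that declaration, assuming a hypothetical sample document (`SAMPLE`) and an informal lookup table (`KNOWN_PROFILES`) whose labels are illustrative only. It does not cover profiles declared with a `ttp:profile` element in the document head.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced attributes as "{namespace-uri}local-name".
TTP_PROFILE = "{http://www.w3.org/ns/ttml#parameter}profile"

# W3C profile designators; the labels are informal and illustrative.
KNOWN_PROFILES = {
    "http://www.w3.org/ns/ttml/profile/sdp-us": "SDP-US",
    "http://www.w3.org/ns/ttml/profile/imsc1/text": "IMSC1 text profile",
    "http://www.w3.org/ns/ttml/profile/imsc1/image": "IMSC1 image profile",
}

# Hypothetical document, used only for illustration.
SAMPLE = """<tt xml:lang="en"
    xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:profile="http://www.w3.org/ns/ttml/profile/sdp-us">
  <body/>
</tt>"""


def declared_profile(ttml_text):
    """Return a label for the profile named on the root element, or None."""
    root = ET.fromstring(ttml_text)
    designator = root.get(TTP_PROFILE)
    if designator is None:
        return None
    return KNOWN_PROFILES.get(designator, "unrecognized profile: " + designator)


print(declared_profile(SAMPLE))
```

A processor could use such a check to decide up front whether it supports the features a document relies on, rather than discovering unsupported features during rendering.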

TTML provides an extensibility mechanism for adding new features using specific namespaces. The following caption formats restrict and/or extend TTML; all of them are TTML flavors:


3.1- SMPTE-TT

SMPTE-TT (SMPTE ST 2052) defines the SMPTE profile of TTML1. It is used for representing captions or subtitles.

Interoperability with pre-existing and regionally specific formats (such as CEA-708, CEA-608, DVB Subtitles, and World System Teletext (WST)) is provided by tunneling data or bitmap images and adding the necessary metadata.

SMPTE also provides recommended practices for conversion from CEA-608 data to SMPTE-TT (SMPTE RP 2052-10) and for conversion from CEA-708 to SMPTE-TT (SMPTE RP 2052-11).

It allows captions to be delivered either via TTML-formatted external files or by embedding the TTML data directly into the video stream.

SMPTE-TT was adopted as a "safe harbor interchange and delivery format" by the US Federal Communications Commission (FCC) in January 2012.

It is the official caption format of Dynamic Adaptive Streaming over HTTP (DASH-264), an adaptive bitrate streaming technique that enables high quality streaming of media content over the Internet delivered from HTTP web servers.

3.2- EBU-TT

EBU-TT (EBU Tech 3350 v1.1) is the follow-up to the widely used EBU STL format (EBU Tech 3264).
It is defined for the interchange and archiving of subtitles.
It was developed by the European Broadcasting Union (Europe’s 75 national broadcasters).

It is a profile that constrains the features provided by TTML1 to make it more suitable for use with broadcast video and web video applications.

EBU-TT-D (EBU Tech 3380) is the delivery format; it lacks the tunneling and metadata features.
EBU Tech 3360 is a mapping that specifies how EBU STL documents should be translated to EBU-TT documents.

3.3- CFF-TT

Common File Format Timed Text (CFF-TT) is a pair of TTML profiles from UltraViolet, the Digital Entertainment Content Ecosystem (DECE).
UltraViolet, which has 80 member companies, defines standards and provides services for the online distribution of movies and TV shows.

UltraViolet-enabled content is intended to be enjoyed on a wide range of devices and platforms, including user agents (UAs).

3.4- Comparison of TTML profiles

The following diagram illustrates the relations between TTML-flavored profiles.

TTML profiles

Inclusion, intersection between different TTML based formats

3.5- IMF

The Interoperable Master Format (IMF) project specifies an interoperable set of master files and associated metadata to enable standard interchange and distribution across multiple channels, including broadcast and the Internet.

IMF diagram
Component-based format for high-quality masters
(SMPTE ST 2067-2)

4- WebVTT

The Web Video Text Tracks Format (WebVTT) is a format for the delivery of Internet captions.
WebVTT is a single specification (without profiles). The expectation is that all implementations implement the entire specification. It is a delivery format for download and, in particular, streaming applications.
It offers simplicity for authoring and good integration into HTML5 through the <track> element, a child of the HTML5 <video> element.
It offers good compatibility with existing user agents.
WebVTT is a text format but unlike TTML, it is not XML based.
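
Since WebVTT is line-oriented text rather than XML, it can be read with ordinary string handling. The following is a minimal sketch, assuming a hypothetical two-cue file (`SAMPLE`); it recognizes the `WEBVTT` signature, optional cue identifiers, timing lines and trailing cue settings, and it skips any block without a timing line (such as `NOTE` comments). It is not a conforming WebVTT parser.

```python
import re

# Hypothetical file, used only for illustration.
SAMPLE = """WEBVTT

00:00.000 --> 00:10.000
Hello I am your first line.

00:02.000 --> 00:10.000 align:middle
I am your second caption
but with two lines.
"""

# [hh:]mm:ss.ttt --> [hh:]mm:ss.ttt, followed by optional cue settings.
TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})(.*)$"
)


def to_seconds(hours, minutes, seconds, millis):
    """Combine timestamp fields into seconds; hours may be None."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text):
    """Return a list of (start, end, settings, payload_lines) tuples."""
    blocks = re.split(r"\n{2,}", vtt_text.replace("\r\n", "\n").strip())
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # The timing line is either the first line or follows a cue identifier.
        for index, line in enumerate(lines[:2]):
            match = TIMING.match(line)
            if match:
                start = to_seconds(*match.group(1, 2, 3, 4))
                end = to_seconds(*match.group(5, 6, 7, 8))
                cues.append((start, end, match.group(9).strip(), lines[index + 1:]))
                break
    return cues


for cue in parse_cues(SAMPLE):
    print(cue)
```

The cue payload is returned as raw lines, so inline tags such as `<b>` or `<lang fr>` are left for a later step to interpret.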

 A WebVTT file contains the following:

WebVTT integration with HTML5 is well defined, offering for example a method for CSS to further customize the appearance of cues.

WebVTT is a W3C First Public Working Draft published in November 2014.

WebVTT derives from the format WebSRT (Web Subtitle Resource Tracks) specified by the WHATWG. Work was picked up by the W3C Web Media Text Tracks Community Group. In March 2014 the W3C Timed Text Working Group was chartered to take it to W3C Recommendation.

More reading on WebVTT.

5- Simple code comparison of TTML and WebVTT

In this simple example, the following two files should produce the same presentation of captions in a user agent.


<tt xml:lang="en"
    xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <metadata>
      <ttm:title>TTML Example</ttm:title>
      <ttm:copyright>Thierry Michel 2015</ttm:copyright>
    </metadata>
    <styling>
      <style xml:id="s1" tts:color="red" tts:textAlign="center" />
    </styling>
  </head>
  <body>
    <div>
      <p xml:id="c1" begin="00:00:00" end="00:00:10">
        Hello I am your first line.</p>
      <p xml:id="c2" begin="00:00:02" end="00:00:10">
        I am your second captions<br/>but with two lines.</p>
      <p xml:id="c3" xml:lang="fr" begin="00:00:04" end="00:00:10">
        Je suis le troisième sous-titre.</p>
      <p xml:id="c4" begin="00:00:06" end="00:00:10">
        I am another caption with <span tts:fontWeight="bold">Bold</span> and <span tts:fontStyle="italic">Italic</span> styles.</p>
      <p xml:id="c5" begin="00:00:08" end="00:00:10" style="s1">
        I am the last caption displayed in red and centered.</p>
    </div>
  </body>
</tt>


WEBVTT

NOTE
title: Web-VTT Example
Thierry Michel 2015

NOTE
To style this document, the CSS color must be specified in the <style> element of the hosting HTML page.

00:00.000 --> 00:10.000
Hello I am your first line.

00:02.000 --> 00:10.000
I am your second captions
but with two lines.

00:04.000 --> 00:10.000
<lang fr>Je suis le troisième sous-titre.</lang>

00:06.000 --> 00:10.000
I am another caption with <b>Bold</b> and <i>Italic</i> styles.

00:08.000 --> 00:10.000 align:middle
<c.red>I am the last caption displayed in red and centered.</c>

In the hosting HTML5 page, the CSS style is:
<style> ::cue(.red) { color: red; } </style>
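
The correspondence between the two files can be sketched as a small converter from TTML paragraphs to WebVTT cues. This is a minimal illustration under stated assumptions, not a complete transcoder: the sample document (`SAMPLE`) and all function names are hypothetical, only `hh:mm:ss[.fraction]` clock-times are handled, `<br/>` becomes a cue line break, and styling, alignment and `xml:lang` are dropped.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"  # TTML1 core namespace

# Hypothetical document, used only for illustration.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:00" end="00:00:10">Hello I am your first line.</p>
      <p begin="00:00:02" end="00:00:10">I am your second caption<br/>but with two lines.</p>
    </div>
  </body>
</tt>"""


def vtt_timestamp(clock_time):
    """Turn a TTML hh:mm:ss[.fraction] clock-time into hh:mm:ss.ttt."""
    hours, minutes, seconds = clock_time.split(":")
    whole, _, fraction = seconds.partition(".")
    millis = (fraction + "000")[:3]  # pad or truncate to three digits
    return "%02d:%02d:%02d.%s" % (int(hours), int(minutes), int(whole), millis)


def collect_lines(element, lines):
    """Append the text of element to lines, starting a new line at each <br/>."""
    lines[-1] += element.text or ""
    for child in element:
        if child.tag == "{%s}br" % TTML_NS:
            lines.append("")
        else:
            collect_lines(child, lines)  # inline markup such as <span> is flattened
        lines[-1] += child.tail or ""
    return lines


def ttml_to_webvtt(ttml_text):
    """Return WebVTT text with one cue per TTML <p> element."""
    root = ET.fromstring(ttml_text)
    out = ["WEBVTT", ""]
    for p in root.iter("{%s}p" % TTML_NS):
        # Collapse source whitespace so only <br/> produces line breaks.
        lines = [" ".join(line.split()) for line in collect_lines(p, [""])]
        out.append("%s --> %s" % (vtt_timestamp(p.get("begin")), vtt_timestamp(p.get("end"))))
        out.extend(line for line in lines if line)
        out.append("")
    return "\n".join(out)


print(ttml_to_webvtt(SAMPLE))
```

Carrying styling across would require mapping TTML style attributes such as `tts:color` to WebVTT class spans plus CSS in the hosting page, which this sketch deliberately leaves out.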