TTML and Derivative Caption Formats
In recent years, traditional broadcast television and the Web have converged:
more television content is coming to the Web, and the Web now runs on
televisions. Timed Text Markup Language (TTML) helps to bridge the TV and Web
worlds.
Typical applications of timed text are the real-time subtitling of
foreign-language movies on the Web, captioning for people lacking audio
devices or having hearing impairments, karaoke, scrolling news items and
teleprompter applications.
TTML allows extensibility and profiling; as a result, multiple caption formats
have been developed by the W3C and by external organizations.
This document explains how TTML relates to its derivative caption
formats.
1-Introduction
The first captions appeared in analog television broadcasts, giving the
hard of hearing access to the audio content of television. Since then,
captions have become ubiquitous across digital television and are a
growing presence in online video. As television evolved from analog to
digital and then to online video, differing languages and regulatory
policies across national boundaries led to the development of multiple
caption formats.
Caption formats fall into three broad categories:
1.1 Caption formats for Analog television
- CEA-608 is the primary format
for in-band captions in analog NTSC transmissions in the USA. It
continues to be used in both digital and online video through "608 over
708". It is character-based and tailored for delivery/authoring.
- Teletext is mainly used in Europe. It is character-based (but can also
be image-based). Teletext is the dominant legacy format for in-band
captions in analog PAL transmissions. It is still used in digital
television and online video via DVB Teletext.
- EBU-STL is a legacy,
out-of-band caption file format, used primarily in Europe for authoring
and interchange.
1.2 Caption formats for Digital television
- CEA-708 is the primary format
for in-band captions in digital ATSC transmissions (digital terrestrial
transmission), in the USA.
- The "608 over 708" portion of the format is one of the online
caption delivery formats in use today.
- Native 708 is character-based and tailored for delivery. In ATSC and
online video, 708 captions are carried within the MPEG-2 TS stream.
- DVB Teletext (ETSI 301 775) is mainly used in Europe. It embeds Teletext
in a DVB MPEG-2 TS stream.
- DVB Subtitles is mainly used in Europe and is tailored for delivery. It
allows either image-based or character-based captions to be embedded within
a DVB MPEG-2 TS stream.
1.3 Online Caption formats
With the new era of online video, new captioning formats have arisen:
- TTML is an Authoring,
Interchange and Distribution format. It is an advanced and complex
caption format, used mainly by the broadcast industry and online
applications.
- WebVTT is a distribution
format. It is a simple format used in Web Browsers (IE, Firefox, Chrome
and Safari) and Native Players (iOS and Android). It integrates well
with HTML5.
- 608 is the U.S. delivery alternative, implemented in Native Players
(iOS and Android).
2- W3C initiatives for online Video captions
2.1-TTML
Timed Text Markup Language (TTML1) represents textual information that may
be used directly as a distribution format for online captioning and
subtitles, and as an interchange format among legacy distribution content
formats. It is a superset that encompasses preceding captioning approaches
and supports the semantics of most closed caption file formats. It is used
in the television industry for authoring, transcoding and exchanging timed
text information, and for delivering captions and subtitles.
It is a text format based on multiple W3C technologies: XML, SMIL for the
timing semantics, and XSL-FO and CSS for the presentation and layout of
text.

TTML exposes multiple types of elements and attributes:
- Content elements are HTML-like elements (such as <div>,
<p>, <span>) that contain the caption text.
- Timing attributes specify the time interval during which content
should be visible. Timing attributes may also be applied to layout and
animation elements.
- Style elements specify the appearance of content via a simple
XML-based styling system.
- Layout elements specify the layout properties (such as the bounding
boxes) of content.
- Animation elements can be used to alter the style of text at
particular times.
- Metadata elements specify additional metadata about the presentation.
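As an illustration of how these element and attribute types fit together, the following Python sketch reads a small TTML document with the standard library and lists each content element with its timing attributes and referenced style. The sample document and the function name describe_ttml are assumptions made for this illustration. The sketch handles only a single style reference per <p> and is not a conforming TTML processor.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample document, reduced to two captions and one style.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <styling>
      <style xml:id="s1" tts:color="red" tts:textAlign="center"/>
    </styling>
  </head>
  <body>
    <div>
      <p xml:id="c1" begin="00:00:00" end="00:00:10">Hello</p>
      <p xml:id="c2" begin="00:00:08" end="00:00:10" style="s1">Goodbye</p>
    </div>
  </body>
</tt>"""


def describe_ttml(source):
    """Return one dict per <p> with its id, timing, text and resolved style."""
    root = ET.fromstring(source)

    # Style elements: map each xml:id to its tts:* attributes.
    styles = {}
    for style in root.iter(f"{{{TT_NS}}}style"):
        styles[style.get(f"{{{XML_NS}}}id")] = {
            name.split("}", 1)[1]: value
            for name, value in style.attrib.items()
            if name.startswith(f"{{{TTS_NS}}}")
        }

    captions = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        captions.append({
            "id": p.get(f"{{{XML_NS}}}id"),
            "begin": p.get("begin"),   # timing attributes
            "end": p.get("end"),
            "text": "".join(p.itertext()).strip(),
            # Assumption: at most one style id is referenced per <p>.
            "style": styles.get(p.get("style"), {}),
        })
    return captions


for caption in describe_ttml(SAMPLE):
    print(caption)
```

Layout, animation and metadata elements are left out here to keep the sketch short.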
TTML1, formerly known as the Distribution Format Exchange Profile (DFXP),
was specified by the W3C Timed Text Working Group and released as a
W3C Recommendation in November 2010.
A second edition of the TTML1 Recommendation was published in September 2013,
addressing errata and comments received since the Recommendation was first
published.
Components used by several Web browsers, and a number of caption authoring
tools, support TTML either partially or fully.
2.2- TTML profiles and extensions specified by the W3C
TTML allows for the definition of profiles and provides an extension
mechanism. The TTML1 specification defines three profiles:
- the Full Profile includes all features of TTML.
- the Presentation Profile is a subset, intended to be used to express
minimum compliance for presentation processing.
- the Transformation Profile is another subset, intended to be used to
express minimum compliance for transformation processing.
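A document can state which profile it targets. The Python sketch below reads the ttp:profile attribute from the root element and maps it to one of the three profile names. The designator URIs are written as the author of this sketch understands them from TTML1 and should be checked against the Recommendation. The sample document and the function name declared_profile are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

TTP_NS = "http://www.w3.org/ns/ttml#parameter"
PROFILE_BASE = "http://www.w3.org/ns/ttml/profile/"

# Assumed TTML1 profile designators; verify against the specification.
TTML1_PROFILES = {
    PROFILE_BASE + "dfxp-full": "Full",
    PROFILE_BASE + "dfxp-presentation": "Presentation",
    PROFILE_BASE + "dfxp-transformation": "Transformation",
}

# Hypothetical sample document declaring the Presentation Profile.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:profile="http://www.w3.org/ns/ttml/profile/dfxp-presentation">
  <body/>
</tt>"""


def declared_profile(source):
    """Return the profile name, the raw designator if unknown, or None."""
    root = ET.fromstring(source)
    designator = root.get(f"{{{TTP_NS}}}profile")
    if designator is None:
        return None
    return TTML1_PROFILES.get(designator, designator)


print(declared_profile(SAMPLE))
```

A designator that is not in the table is returned unchanged, which lets a caller recognize profiles defined outside TTML1.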
SDP-US
The TTML Simple Delivery Profile for Closed Captions (SDP-US) focuses on
the streamlined delivery of closed captions on the Internet.
This interoperability profile supports the core TTML1 features needed to
deliver content from legacy formats such as CEA-608 and CEA-708, and as
such is targeted primarily at US markets.
SDP-US is a proper subset of TTML1, intended to support the features
required by US Government closed captioning requirements for online
presentation. It does not provide extensions to TTML1.
This profile was published by the Timed Text Working Group on 05 February
2013.
IMSC1
Other organizations have specified and deployed several TTML-based caption
formats, e.g. EBU-TT, SMPTE-TT and CFF-TT (described later in this
document). To simplify interoperability among these TTML profiles, the W3C
specifies IMSC.
TTML Text and Image Profiles for Internet Media Subtitles and Captions 1.0
(IMSC1) is a pair of TTML1 profiles developed primarily as a candidate
format for the Interoperable Master Format (IMF) effort.
It specifies a text-only profile and an image-only profile.
These profiles are intended to be used across subtitle and caption delivery
applications worldwide, thereby simplifying interoperability, consistent
rendering and conversion to other subtitling and captioning formats.
The text profile is a superset of SDP-US and a superset of CFF-TT.
IMSC1 defines extensions to TTML1 and also incorporates extensions
specified by other organizations.
Both profiles are based on the Common File Format & Media Formats
Specification (CFF) developed by the Digital Entertainment Content
Ecosystem (DECE), and benefit from its technical consensus, conformance
testing and implementation experience.
IMSC1 is specified by the Timed Text Working Group and is currently a W3C
Candidate Recommendation.
TTML2
TTML2 is under development by the Timed Text Working Group in the W3C.
TTML2 will make it easier to position and style some content, and adds
stereoscopic 3D support and rich, smooth animations.
It will also contain a mechanism for providing options that the display
processor or the user can manipulate to modify the appearance of the
content. For example, it will support the common use case of including both
captions for deaf and hard-of-hearing users and translation subtitles in a
single document, allowing the user to choose whether to show just the
translations, or both the translations and the captions.
It will define in more detail the mapping into HTML and CSS fragments for
presentation by a user agent.
The TTML2 First Public Working Draft was published in February 2015.
3- TTML Profiles from external organizations
The TTML recommendation defines a broad set of features and the XML
semantics for how a TTML document will express those features.
External specifications are expected to define profiles, each of which is
a set of features, extensions (new features), and requirements to ensure
interoperability.
TTML provides an extensibility mechanism to add new features using
specific namespaces. The formats described in the following sections
restrict and/or extend TTML; all of them are TTML flavors.
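Because extensions live in their own namespaces, a tool can spot them by looking for element or attribute names outside the TTML namespaces. The Python sketch below does this with the standard library. The extension namespace http://example.org/ns/caption-ext, the ex:origin element and the ex:priority attribute are invented for this illustration and do not belong to any real profile.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by TTML itself, plus the XML namespace.
TTML_NAMESPACES = {
    "http://www.w3.org/ns/ttml",
    "http://www.w3.org/ns/ttml#styling",
    "http://www.w3.org/ns/ttml#parameter",
    "http://www.w3.org/ns/ttml#metadata",
    "http://www.w3.org/XML/1998/namespace",
}

# Hypothetical document using an invented extension namespace.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xmlns:ex="http://example.org/ns/caption-ext">
  <head>
    <metadata>
      <ex:origin>legacy-608</ex:origin>
    </metadata>
  </head>
  <body>
    <div>
      <p begin="00:00:00" end="00:00:02" tts:color="red"
         ex:priority="high">Hello</p>
    </div>
  </body>
</tt>"""


def namespace_of(qualified_name):
    """Extract the namespace from ElementTree's '{namespace}local' form."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}", 1)[0]
    return ""


def extension_namespaces(source):
    """Return the sorted non-TTML namespaces used by elements or attributes."""
    root = ET.fromstring(source)
    found = set()
    for element in root.iter():
        for name in [element.tag, *element.attrib]:
            namespace = namespace_of(name)
            if namespace and namespace not in TTML_NAMESPACES:
                found.add(namespace)
    return sorted(found)


print(extension_namespaces(SAMPLE))
```

A processor that does not know a listed namespace could use such a check to decide whether it can safely ignore the extension.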
3.1- SMPTE-TT
SMPTE-TT (SMPTE ST 2052) defines the SMPTE profile of TTML1. It is used
for representing captions or subtitles.
- It adopts the complete set of features from TTML (the Full
Profile, which includes all features of TTML).
- SMPTE-TT also defines some extended standard metadata terms, and adds
some extension features not found in TTML.
Interoperability with pre-existing and regionally-specific
formats (such as CEA-708, CEA-608, DVB Subtitles, and WST (World System
Teletext)) is provided by means of tunneling data or bitmap images and
adding the necessary metadata.
- Tunneling: The original analog/digital captions may be embedded as
binary blobs (e.g. exact original CEA 608 data byte pairs), for
“backwards compatibility”.
- Images: Background images may be included in order to aid translation
from image-based formats (DVB Subtitles).
- Additional Metadata: Information about how a document was translated,
its origin format (as a URI) and the translation's fidelity level.
SMPTE also provides recommended practices for conversion from CEA-608 data
to SMPTE-TT (SMPTE RP 2052-10) and for conversion from CEA-708 to SMPTE-TT
(SMPTE RP 2052-11).
It allows captions to be delivered either via TTML-formatted external
files or by embedding the TTML data directly into the video stream.
SMPTE-TT was adopted as a "safe harbor interchange and delivery
format" by the US Federal Communications Commission (FCC) in January 2012.
It is the official caption format of Dynamic Adaptive
Streaming over HTTP (DASH-264),
an adaptive bitrate streaming technique that enables high quality
streaming of media content over the Internet delivered from HTTP web
servers.
3.2- EBU-TT
EBU-TT (EBU Tech 3350 v1.1) is the successor to the widely used EBU STL
format (EBU Tech 3264).
It is defined for the interchange and archiving of subtitles, and was
developed by the European Broadcasting Union (Europe’s 75 national
broadcasters).
It is a profile which constrains the features provided by TTML1,
to make it more suitable for use with broadcast video and web video
applications.
- It restricts the feature set to a subset of TTML (e.g. animation
and some style/timing/layout options are removed).
- It adds more metadata for archival purposes (e.g. program, episode and
other information), embedded in the document header.
EBU-TT-D (Tech 3380) is the delivery format (it lacks the tunneling and
metadata features).
EBU Tech 3360 is a mapping specifying how EBU-STL documents should be
translated to EBU-TT documents.
3.3- CFF-TT
Common File Format Timed Text (CFF-TT) is a pair of TTML profiles from
UltraViolet (Digital Entertainment Content Ecosystem, DECE).
UltraViolet defines standards and provides services for the online
distribution of movies and TV shows (80 member companies).
- CFF-TT is a constrained profile of SMPTE-TT for captions and
subtitles.
- Separate Image and Text Profiles: CFF-TT introduces two profiles that
represent either character-based captions or image-based captions. The
two cannot be used simultaneously.
- It extends SMPTE-TT with specific features.
UltraViolet-enabled content is intended to be enjoyed on a wide range of
devices and platforms, including user agents (UAs).
3.4- Comparison of TTML profiles
The following diagram illustrates the relations between TTML-flavored
profiles.
[Figure: inclusion and intersection between the different TTML-based formats]
3.5- IMF
The Interoperable Master Format (IMF) project specifies an interoperable
set of master files and associated meta-data to enable standard
interchange and distribution across multiple channels, including broadcast
and internet.
- It has a single, interchangeable master file format, and is
interoperable through existing constrained standards.
- It minimizes storage by using mezzanine-level compression, and by
storing only the differences between an original version and a version
file.
- It bases its data essence format on SMPTE ST 2052-1.
- It is a component-based format for high-quality masters
(SMPTE ST 2067-2).
4- WebVTT
The Web Video Text Tracks Format (WebVTT) is a format for delivery of
internet captions.
WebVTT is a single specification (without profiles), with the expectation
that all implementations implement the entire specification. It is a
delivery format for download and particularly for streaming applications.
It offers simplicity for authoring and good integration into HTML5 using the
<track> element within HTML5's <video> element.
It offers good compatibility with existing user agents.
WebVTT is a text format but, unlike TTML, it is not XML-based.
A WebVTT file contains the following:
- Header: A WebVTT file begins with a header that includes
document-scoped metadata.
- Cues: A WebVTT file consists primarily of a sequence of cues. A cue is
one or more lines of text with an associated time interval. Styles may
be applied to the cue text using a simple markup syntax. Portions of the
cue text may also be designated to appear at particular times in order
to achieve paint-on style animations.
- Regions: A cue may also be associated with a region, which specifies
the bounding box of the text when rendered on screen.
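To make this file structure concrete, the Python sketch below splits a WebVTT file into blank-line-separated blocks, skips the header and any block without a timing line (such as a NOTE comment), and returns each cue with its optional identifier, time interval, settings and text. It is a simplified illustration, not the parsing algorithm defined by the WebVTT specification. The sample file and the function names to_seconds and parse_cues are assumptions.

```python
import re

# Hypothetical sample file with a header, a comment block and two cues.
SAMPLE = """WEBVTT

NOTE This comment block is skipped by the sketch.

c1
00:00.000 --> 00:10.000
Hello I am your first line.

c5
00:08.000 --> 00:10.000 align:start
<c.red>I am the last caption.</c>
"""

TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)(.*)$")


def to_seconds(timestamp):
    """Convert 'mm:ss.ttt' or 'hh:mm:ss.ttt' to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_cues(text):
    """Return a list of cue dicts; blocks without a timing line are ignored."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks[1:]:  # blocks[0] is the header
        lines = block.split("\n")
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # e.g. a NOTE block
        match = TIMING.match(lines[timing_index].strip())
        if match is None:
            continue
        start, end, rest = match.groups()
        cues.append({
            # The identifier, when present, is the line before the timings.
            "id": lines[0] if timing_index == 1 else None,
            "start": to_seconds(start),
            "end": to_seconds(end),
            "settings": dict(
                item.split(":", 1) for item in rest.split() if ":" in item),
            "text": "\n".join(lines[timing_index + 1:]),
        })
    return cues


for cue in parse_cues(SAMPLE):
    print(cue)
```

The cue text is returned with its markup untouched. Interpreting tags such as <c.red> or <b> is left to the renderer.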
WebVTT integration with HTML5 is well defined, offering for example a method
for CSS to further customize the appearance of cues.
WebVTT is a W3C First Public Working Draft published in November 2014.
WebVTT derives from the WebSRT (Web Subtitle Resource Tracks) format
specified by the WHATWG. The work was picked up by the W3C Web Media Text
Tracks Community Group. In March 2014 the W3C Timed Text Working Group was
chartered to take it to W3C Recommendation.
5- Simple code comparison of TTML and WebVTT
In this simple example, the two following files should produce the same
presentation of captions in a user agent.
TTML:

<tt xml:lang="en"
    xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <metadata xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
      <ttm:title>TTML Example</ttm:title>
      <ttm:copyright>Thierry Michel 2015</ttm:copyright>
    </metadata>
    <styling xmlns:tts="http://www.w3.org/ns/ttml#styling">
      <style xml:id="s1" tts:color="red" tts:textAlign="center" />
    </styling>
  </head>
  <body>
    <div>
      <p xml:id="c1" begin="00:00:00" end="00:00:10">
        Hello I am your first line.</p>
      <p xml:id="c2" begin="00:00:02" end="00:00:10">
        I am your second caption<br/>but with two lines.</p>
      <p xml:id="c3" xml:lang="fr" begin="00:00:04" end="00:00:10">
        Je suis le troisième sous-titre.</p>
      <p xml:id="c4" begin="00:00:06" end="00:00:10">
        I am another caption with <span tts:fontWeight="bold">Bold</span>
        and <span tts:fontStyle="italic">Italic</span> styles.</p>
      <p xml:id="c5" begin="00:00:08" end="00:00:10" style="s1">
        I am the last caption displayed in red and centered.</p>
    </div>
  </body>
</tt>

WebVTT:

WEBVTT
title: Web-VTT Example
copyright: Thierry Michel 2015

NOTE To style this document, the CSS color must be specified
in a <style> element in the hosting HTML page.

c1
00:00.000 --> 00:10.000
Hello I am your first line.

c2
00:02.000 --> 00:10.000
I am your second caption
but with two lines.

c3
00:04.000 --> 00:10.000
<lang fr>Je suis le troisième sous-titre.</lang>

c4
00:06.000 --> 00:10.000
I am another caption with <b>Bold</b> and <i>Italic</i> styles.

c5
00:08.000 --> 00:10.000 align:middle
<c.red>I am the last caption displayed in red and centered.</c>

In the hosting HTML5 page, the CSS style:

<style> ::cue(.red) {color:red;} </style>
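The correspondence between the two files above can also be sketched as a small conversion routine. The Python code below turns the paragraphs of a simple TTML document into WebVTT cues, mapping clock times, <br/> line breaks and bold or italic spans. It assumes that every <p> carries absolute begin and end values in hh:mm:ss form (no frames or offset times), and it ignores style references, regions and xml:lang. The function names and the sample document are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
TTS = "{http://www.w3.org/ns/ttml#styling}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample with two of the captions from the comparison.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <body>
    <div>
      <p xml:id="c2" begin="00:00:02" end="00:00:10">
        I am your second caption<br/>but with two lines.</p>
      <p xml:id="c4" begin="00:00:06" end="00:00:10">With
        <span tts:fontWeight="bold">Bold</span> text.</p>
    </div>
  </body>
</tt>"""


def vtt_time(clock_time):
    """Convert 'hh:mm:ss' or 'hh:mm:ss.fraction' to 'hh:mm:ss.ttt'."""
    hours, minutes, seconds = clock_time.split(":")
    whole, _, fraction = seconds.partition(".")
    millis = (fraction + "000")[:3]
    return f"{int(hours):02d}:{int(minutes):02d}:{int(whole):02d}.{millis}"


def clean(text):
    """Collapse source whitespace and escape characters special to WebVTT."""
    collapsed = re.sub(r"\s+", " ", text or "")
    return (collapsed.replace("&", "&amp;")
                     .replace("<", "&lt;")
                     .replace(">", "&gt;"))


def cue_text(element):
    """Flatten an element into cue text; only <br/> produces a line break."""
    parts = [clean(element.text)]
    for child in element:
        if child.tag == TT + "br":
            parts.append("\n")
        elif child.tag == TT + "span":
            inner = cue_text(child)
            if child.get(TTS + "fontWeight") == "bold":
                inner = f"<b>{inner}</b>"
            if child.get(TTS + "fontStyle") == "italic":
                inner = f"<i>{inner}</i>"
            parts.append(inner)
        parts.append(clean(child.tail))
    return "".join(parts)


def ttml_to_webvtt(source):
    """Return WebVTT text with one cue per TTML <p> element."""
    root = ET.fromstring(source)
    out = ["WEBVTT", ""]
    for p in root.iter(TT + "p"):
        lines = [line.strip() for line in cue_text(p).split("\n")]
        identifier = p.get(XML_ID)
        if identifier:
            out.append(identifier)
        out.append(f"{vtt_time(p.get('begin'))} --> {vtt_time(p.get('end'))}")
        out.append("\n".join(line for line in lines if line))
        out.append("")  # a blank line terminates each cue
    return "\n".join(out)


print(ttml_to_webvtt(SAMPLE))
```

Color and alignment are not converted, because in WebVTT they depend on cue settings and on CSS in the hosting page rather than on the caption file alone.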
References: