Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies usage scenarios and requirements for a timed text authoring format. A timed text authoring format is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information.
This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. Publication as a Working Draft does not imply endorsement by the W3C Membership. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C Recommendations and other technical documents, including Working Drafts and Notes, can be found at http://www.w3.org/TR/.
This is the first working draft of the Timed Text Authoring Format 1.0 Requirements. It is expected that this document will progress through a number of working drafts before being published in final form as a W3C Note.
This document was developed by the Timed Text (TT) working group as part of the W3C Synchronized Multimedia Activity. The authors of this document are the TT Working Group members.
Feedback on this document should be sent to the email list public-tt@w3.org, which is the public mailing list of the Timed Text Working Group (list archives). To subscribe, send an email to public-tt-request@w3.org with the word subscribe in the subject line.
The latest information regarding patent disclosures related to this document is available on the Web.
This section represents the status of this document at the time this version was published. It will become outdated if and when a new version is published. The latest status is maintained at the W3C.
1 Introduction
1.1 Motivation
1.2 System Model
2 Definitions
2.1 Acronyms
2.2 Terminology
2.3 Notations
3 Use Case Scenarios
4 Requirements
4.1 General
4.2 Content
4.3 Styling
4.4 Timing
4.5 Animation
4.6 Metadata
A References
B Other References (Non-Normative)
C Acknowledgments (Non-Normative)
This document specifies usage scenarios and requirements for a timed text authoring format. A timed text authoring format is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information.
A principal motivation for the development of a common authoring format for timed text is the lack of a standard content format that supports the representation and interchange of textual information which is synchronized with other media elements or which serves as a synchronization master itself.
Popular proprietary multimedia systems and their corresponding player components have each defined a distinct timed text format for their own use. As a consequence, there is no common authoring format that serves as a portable interchange format between such systems. A goal of the present work is to define such a portable interchange format, both to ease the burden on authoring tool developers and users and to enhance the portability of timed text content.
A side effect of the development and deployment of a common timed text authoring format is that it simplifies the creation and distribution of synchronized text for use with a multitude of devices, such as multimedia players, caption, subtitle, and teletext encoders and decoders, character generators, LED displays, and other text display devices.
Editorial note: GA, 2003-04-29
Need to discuss system model as pertains to presentation. In particular, need to indicate that TT content may be composed with more or less intended precision regarding presentation semantics. Need to indicate that presentation may occur via visual or aural means and that text may be rendered by text to speech or text to braille devices.
TT | Timed Text |
TT AS | Timed Text Authoring System |
TT AF | Timed Text Authoring Format |
TT WG | Timed Text Working Group |
Timed Text: Textual information that is intrinsically or extrinsically associated with timing information.
Timed Text Authoring Format: A content type that represents timed text media for the purpose of interchange among authoring systems.
Timed Text Authoring System: A content authoring system capable of importing and exporting timed text authoring format content.
A caption service provider needs a common content authoring format by means of which a textual expression of audio information may be associated with such audio information in a time synchronized manner.
Note:
In the context of captioning an aggregate audio/video service, both audio and caption information are typically synchronized to the video track as the timebase master.
A subtitle service provider needs a common content authoring format by means of which a textual expression of the original or a translation of the original natural language (speech) audio information may be associated with such audio information in a time synchronized manner.
Note:
In the context of subtitling an aggregate audio/video service, both audio and subtitle information are typically synchronized to the video track as the timebase master.
Note:
The distinction between captioning and subtitling is best expressed as follows: captioning is expressly intended to serve the needs of deaf and hard of hearing users, and typically contains transcriptions of speech and non-speech audio information; in contrast, subtitling is generally intended to serve the needs of hearing users who don't have access to an audio track (e.g., in muting situations) or don't understand the natural language of the speech contained in the audio track. Subtitling is often viewed as a paraphrase or a translation of speech information, as opposed to a transcription of all audio information.
In the absence of captioning information, subtitling information may also be used by hearing impaired users, provided that it is available in the original natural language.
A video description service provider needs a common content authoring format in which a textual description of video information or a textual expression of an audio description of video information may be associated with such video information in a time synchronized manner.
Note:
In the present context, the term video description is intended to capture the notion that visual information present in a video track is being described. An alternative term, audio description, is sometimes used for the same purpose, wherein the form of the description of visual information is itself aural in nature. The focus of this scenario is upon a timed text description of visual information, regardless of whether or not there is an accompanying aural form of that description.
Note:
In the use of aural forms of visual description, it may be the case that the duration of an aural form of a description exceeds the duration of the visual information being described. In a presentation device, this necessitates manual or automatic pausing of the video track in order to fully render the aural form of description. It is likely that similar modes of presentation will be required for timed text representations of video descriptions.
A generic timed text service provider needs a common content authoring format in which textual information can be presented in a time synchronized manner.
Note:
In the context of using a generic timed text service, timed text information serves as the timebase master, with which other possible timed media may be associated.
Note:
Examples of the use of generic timed text include (but are not limited to): marquee signs, timed text oriented presentations, scrolling text presentation, etc.
The TT AF specification(s) shall be authored using XML and XSL Stylesheets based on [XML Spec] and shall adhere to best current practices in the W3C for specification style and quality assurance.
The TT AF specification(s) shall be defined in a modular manner that logically separates significant areas of functionality to as great an extent as is practical.
The TT AF specification(s) shall be organized in such a manner as to separate the following aspects:
TT Framework
TT Core Vocabulary
TT Core Document Types
TT Extension Vocabulary(ies)
TT Extension Document Type(s)
The TT AF specification(s) shall be defined in such a manner that core functionality is logically separated from peripheral functionality.
The TT AF specification(s) shall be defined in such a manner that core functionality can evolve over time, e.g., by the specification of multiple levels (or versions) of core functionality.
The TT AF specification(s) shall be defined in such a manner that core functionality be specified solely by the TT WG or, in the event that the TT WG is terminated, its successors within the W3C.
Note:
It is assumed that one or more appropriate namespace mechanisms will be used to segregate core functionality defined or adopted in the TT AF from peripheral functionality defined or adopted by clients of the TT AF.
The TT AF specification(s) shall be defined in such a manner that for every item in the TT AF core vocabulary, there shall be at least one TT AF core document type that makes use of that item, i.e., there exists a surjection from the set of TT AF core document types to the set of TT AF vocabulary items referenced by those document types.
Note:
The TT AF specification(s) may define standardized peripheral vocabulary that is not referenced by any TT AF core document type.
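The coverage requirement above (every core vocabulary item is used by at least one core document type) lends itself to a mechanical check. The following is a minimal sketch only; the vocabulary items are borrowed from the logical flowed text list in this document, while the document type names `doctype-a` and `doctype-b` and the function name are hypothetical.

```python
# Hypothetical core vocabulary and core document types, for illustration only.
CORE_VOCABULARY = {"body", "division", "paragraph", "phrase"}

CORE_DOCUMENT_TYPES = {
    "doctype-a": {"body", "division", "paragraph"},
    "doctype-b": {"body", "paragraph", "phrase"},
}


def unused_core_items(vocabulary, document_types):
    """Return the core vocabulary items that no core document type uses."""
    used = set()
    for items in document_types.values():
        used |= items
    return vocabulary - used


# An empty result means every core item is used by at least one document type.
missing = unused_core_items(CORE_VOCABULARY, CORE_DOCUMENT_TYPES)
```

Peripheral vocabulary is deliberately left out of the check, since it need not be referenced by any core document type.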
The TT AF specification(s) shall be defined in such a manner that peripheral functionality can evolve over time, e.g., by the future specification of one or more peripheral functionality modules.
The TT AF specification(s) shall be defined in such a manner that peripheral functionality need not be specified by the TT WG or the W3C, but may be specified by other W3C WGs as well as non-W3C clients of the TT AF.
The TT AF shall be capable of being transformed, without undue complexity, into one or more legacy timed text content formats, e.g., [3GPP], [QText], [RealText], [SAMI], etc.
Note:
The above list of potential target timed text content formats is strictly informative, and is not intended to be exhaustive.
The TT AF shall be capable of being transformed into an idealized streamable representation format.
Note:
It is not required that an idealized streamable representation format be defined by the TT AF specification(s); however, the definition of such a format may be the subject of future activities by the TT WG.
Note:
It is intended that existing closed captioning and subtitle streaming formats used by analog and digital television services as well as timed text used in the context of streaming audio and motion video formats be potentially targeted by one or more transformations of the TT AF. These formats include [EIA-608B], [EIA-708B], [EN 300 706], [EN 300 743], etc.
The TT AF shall include the following accessibility related features:
Support for a mechanism to explicitly associate Equivalent Alternatives to the textual information in the TT presentation in accordance with [WAI XML AG] Guideline 1.
Support for Content Rendering Adaption in accordance with [WAI SMIL AG], Section 5. See also Conditional Content.
Use of a default text vocabulary that satisfies guideline 2 of [WAI XML AG] regarding structural and semantical stringency.
Ability to extend or replace the default text vocabulary with other XML dialects to represent the textual information of the TT presentation. See also Intrinsic and Extrinsic Text Content.
Support for explicit definition of a Navigational Structure associated with the TT presentation in accordance with [WAI SMIL AG], Section 4.3.
The TT AF specification(s) shall be defined in such a manner as to require a TT AS to adhere to all applicable aspects of [ATAG 1.0].
The TT AF shall be capable of being created and modified using a plain text editor, e.g., emacs, vi, etc.
The TT AF shall be capable of representing content of different natural languages, where the content of distinct languages may be segregated into separate document instances or may be integrated into a single document instance.
The TT AF shall be capable of representing content of at least those specific natural languages that may be represented with [Unicode 3.2].
The TT AF shall be capable of associating natural language binding information with plain text information at the granularity of a single coded character.
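As an illustration of language binding at the granularity of a single coded character, the sketch below wraps one Greek letter in its own element inside English text. It assumes that the standard `xml:lang` attribute is the binding mechanism and that `p` and `span` are the element names; neither is fixed by this document.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so ElementTree serializes this as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical element names; only the xml:lang attribute is standard XML.
paragraph = ET.Element("p", {XML_LANG: "en"})
paragraph.text = "The letter "
span = ET.SubElement(paragraph, "span", {XML_LANG: "el"})
span.text = "\u03b1"          # a single coded character, bound to Greek
span.tail = " is Greek."

markup = ET.tostring(paragraph, encoding="unicode")
```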
The TT AF shall be capable of representing every coded character available in [Unicode 3.2] by using only those characters in [ASCII (ANSI X3.4)].
Note:
This requirement facilitates the entry and editing of characters in a TT AF document instance that would otherwise not be permitted due to lack of an appropriate character input method or lack of support for a non-ASCII character encoding system.
It is assumed that every TT AS will provide a means to enter and edit TT AF document instances represented in the ASCII character set.
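One way an XML based format can meet the ASCII-only requirement is through numeric character references. The sketch below assumes that mechanism (this document does not mandate it) and uses a hypothetical helper name; it does not escape markup characters such as `&` or `<`.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml_text(text):
    """Encode text as ASCII, replacing each non-ASCII character with an
    XML numeric character reference such as &#233;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "caf\u00e9 \u65e5\u672c\u8a9e"
ascii_only = to_ascii_xml_text(original)

# An XML parser resolves the references back to the original characters.
round_trip = ET.fromstring("<p>" + ascii_only + "</p>").text
```

The intermediate string can be typed and edited in any ASCII-capable plain text editor, which is the situation the note above describes.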
The TT AF shall be capable of expressing text content intrinsically within a TT AF document instance, extrinsically by referencing from a TT AF document instance to text content in one or more external resources, or in any combination of these two modes.
The TT AF shall be capable of associating structural markup with intrinsic and extrinsic text content, where such markup may denote either or both semantic (functional) and presentational (formal) properties of the content.
Note:
In this context, presentational properties designate both stylistic and timing related presentation information.
The TT AF shall be capable of expressing conditional content, where each alternative content choice is governed by one or more test expressions such that at most one content choice is selected when the choices are evaluated in a predefined order.
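The selection rule for conditional content can be sketched as a first-match evaluation. This is an illustrative model only: the function name, the representation of test expressions as callables, and the `language` context key are all assumptions, not part of the TT AF.

```python
def select_choice(choices, context):
    """Return the content of the first choice whose tests all pass, or
    None when no choice qualifies (that is, zero choices are selected)."""
    for tests, content in choices:
        if all(test(context) for test in tests):
            return content
    return None


# Choices are evaluated in list order; each has zero or more test expressions.
choices = [
    ([lambda ctx: ctx["language"] == "fr"], "Bonjour"),
    ([lambda ctx: ctx["language"] == "en"], "Hello"),
    ([], "(no text available)"),  # no tests: acts as a fallback
]

selected = select_choice(choices, {"language": "en"})
```

Because evaluation stops at the first match, at most one choice is ever selected, even when several choices would pass their tests.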
The TT AF shall be capable of expressing authorial intention to flow (layout) text content in an idealized, but unspecified user agent.
Note:
In this context, the concept of flow refers to an implied process by means of which textual information expressed in the character domain is mapped to a positioned glyph codomain.
Note:
It is not required that an idealized user agent or behavior of such a user agent be defined by the TT AF specification(s); however, the definition of such a user agent or user agent behavior may be the subject of future activities by the TT WG.
The TT AF shall be capable of expressing the following vocabulary as pertains to logical flowed text content:
body
division
paragraph
phrase
Note:
One possible mapping for this vocabulary is to xhtml:body, xhtml:div, xhtml:p, and xhtml:span, respectively, as defined by [XHTML 1.0].
The TT AF shall be capable of expressing the following vocabulary as pertains to presentational flowed text content:
block
block container
character
flow
inline
inline container
region
viewport
Note:
The items enumerated above are drawn in part from similarly named items defined by [XSL 1.0], Section 6, Formatting Objects.
Note:
The viewport and region items are intended to be analogous to the [XSL 1.0] vocabulary fo:simple-page-master and fo:region-*, respectively.
The TT AF shall be defined in such a manner that a default relationship between logical and presentational flowed text vocabulary may be assumed as follows:
body ⇔ flow
division (display: block) ⇔ block container
paragraph ⇔ block
division (display: inline) ⇔ inline container
phrase ⇔ inline
Note:
Parsed character data (#PCDATA) that appears in logical flowed text content should be assumed to map by default to parsed character data or character in presentational flowed text content.
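The default relationship between logical and presentational vocabulary can be captured as a lookup table. The sketch below is one possible encoding; the table and function names are hypothetical, and only the pairs listed in this document are included.

```python
# Keys are (logical item, display value); display matters only for division.
DEFAULT_MAPPING = {
    ("body", None): "flow",
    ("division", "block"): "block container",
    ("paragraph", None): "block",
    ("division", "inline"): "inline container",
    ("phrase", None): "inline",
}


def presentational_item(logical_item, display=None):
    """Look up the default presentational item for a logical item.

    Raises KeyError for pairs that the default relationship does not
    define, such as a division without a display value.
    """
    key = (logical_item, display if logical_item == "division" else None)
    return DEFAULT_MAPPING[key]
```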
The TT AF shall be defined in such a manner that use of logical flowed text vocabulary is separated from use of presentational flowed text vocabulary.
Note:
It is not required that the TT AF specification(s) define a document type that supports the simultaneous use of both logical and presentational flowed text vocabulary.
The TT AF shall be capable of expressing authorial intention to render non-flowed text content in an idealized, but unspecified user agent.
Note:
In this context, the concept of non-flowed text refers to textual information that is explicitly associated with positioned glyph information at authoring time; i.e., all bidirectional processing and character to glyph substitution processing and glyph position assignment has already occurred.
Note:
It is not required that an idealized user agent or behavior of such a user agent be defined by the TT AF specification(s); however, the definition of such a user agent or user agent behavior may be the subject of future activities by the TT WG.
The TT AF shall be capable of expressing the following vocabulary as pertains to non-flowed text content:
area
glyph
glyph sequence
Note:
The glyph and glyph sequence vocabulary items are intended to make direct reference to specific glyphs in specific fonts, having already been transformed from the character domain to the glyph domain. References to glyphs would typically take the form of a glyph identifier or a glyph code. For further information on the character to glyph mapping process, see [CharMod], Section 3.1.3, Units of Visual Rendering.
The TT AF shall be capable of expressing authorial intention to create a hybrid of flowed and non-flowed text content; however, such an expression may require that these two types of content be segregated at a specific level of granularity.
The TT AF shall be capable of expressing simple hyperlinks, where the ending (destination) resource is either the starting (source) resource or an external resource.
The TT AF shall be capable of expressing inline, embedded graphics of both bitmap and vector or outline formats.
Note:
It is not required that the TT AF support the expression of block level graphics.
The TT AF shall be capable of expressing inline, non-embedded graphics of both bitmap and vector or outline formats, where a graphic is represented by an external resource.
The TT AF shall be capable of expressing embedded fonts of both bitmap and outline formats.
The TT AF shall be capable of expressing non-embedded fonts of both bitmap and outline formats, where a font is represented by an external resource.
The TT AF shall be capable of expressing the following vocabulary as pertains to content description:
act
actor
cast item
cast list
kinesic
loudness
pause
pitch
role
role description
rhythm
scene
setting
sound
speaker
speech
stage direction
tempo
tension
utterance
voice
Note:
The items enumerated above are drawn from similarly named items defined by [TEI], Chapter 10, Base Tag Set for Drama, and Chapter 11, Transcription of Speech.
The TT AF shall support the use of both [XML 1.0] and [XML 1.1] as serialized forms of a TT AF XML information set.
Note:
See [XML InfoSet] for further information on an XML information set.
The TT AF shall require or recommend adherence to the practices recommended by [Unicode in XML].
The TT AF shall support the use of [XLink] for the purpose of referencing external resources.
The TT AF specification(s) shall be defined in such a manner that the normative validity of markup content be specified in terms of [XML Schema Part 1] and [XML Schema Part 2].
The TT AF shall be capable of inline styling, where inline styling means the inclusion of stylistic presentation information in a TT AF document instance.
The TT AF shall be capable of specifying inline styling by means of (1) distinct attributes, (2) a generic attribute, e.g., style, and (3) one or more inline stylesheets.
The TT AF shall be capable of out-of-line styling, where out-of-line styling means the association of stylistic presentation information with TT AF content via some mechanism external to a TT AF document instance.
The TT AF shall be capable of specifying out-of-line styling by means of one or more external stylesheets.
The TT AF shall be capable of associating priorities with stylistic presentation information in order to permit the resolution of multiple style specifications that apply to the same content.
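How priorities might resolve competing style specifications can be sketched as follows. This document does not define the resolution rule, so the sketch assumes that a higher numeric priority wins and that, among equal priorities, the later specification wins; the function name and the sample values are hypothetical.

```python
def resolve_styles(specifications):
    """Merge style specifications that apply to the same content.

    Each specification is a (priority, declarations) pair. Assumption:
    a higher priority wins, and among equal priorities the later
    specification wins.
    """
    resolved = {}
    # sorted() is stable, so equal priorities keep their original order.
    for _, declarations in sorted(specifications, key=lambda spec: spec[0]):
        resolved.update(declarations)
    return resolved


specs = [
    (2, {"color": "yellow"}),
    (1, {"color": "white", "font size": "18px"}),
    (2, {"font weight": "bold"}),
]
result = resolve_styles(specs)
```

Here the priority 2 value for color overrides the priority 1 value, while the font size declaration survives because nothing of higher priority replaces it.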
The TT AF shall be capable of associating the following aural style parameters with timed text content:
azimuth
cue before, during, after
elevation
pause before, after
pitch
pitch range
richness
speaking mode
speech rate
stress
voice family
volume
Note:
For further information on these style parameters, see [XSL 1.0], Section 7.6, Common Aural Properties and [CSS Level 2].
Note:
For further information on the speaking mode aural style parameter, see discussion of speak, speak-numeral, and speak-punctuation properties described in [XSL 1.0], Section 7.6.
The TT AF shall be capable of associating the following visual style parameters with timed text content:
absolute position
background color
baseline alignment point
baseline alignment
baseline dominance
baseline shift
bidirectional treatment
block progression dimension
block scroll amount
border before, after, start, end
color
color profile name
display none, block, inline
display alignment
font family
font size
font style
font weight
height
indent start, end
inline progression dimension
inline scroll amount
line feed treatment
line height
line stacking strategy
line wrapping option
opacity
origin
overflow
padding before, after, start, end
reference orientation
relative position
space before, after, start, end
text alignment
text altitude (ascent)
text decoration
text depth (descent)
text indent (first line)
text shadow
visibility
white space collapse
white space treatment
width
writing mode
z-index
Note:
For further information on these style parameters, see [XSL 1.0], Section 7, Formatting Properties and [CSS Level 2].
Note:
A style parameter is intended to convey the notion of any type of style specification or declaration whether it is expressed as an attribute or a property.
Editorial note: GA, 2003-04-28
Need to describe origin style parameter in terms that effectively equate with [x,y] coordinate of top, left position of an allocation rectangle, content rectangle, reference area, viewport area, etc.
Editorial note: GA, 2003-04-28
Need to describe block scroll amount and inline scroll amount style parameters in a manner that permits their use for scroll animation.
The TT AF shall be defined in such a manner that if a stylistic presentation parameter may be specified as a style property, then that parameter shall also be specifiable as a style attribute, and vice-versa.
Note:
In this context, a style attribute refers to an attribute expressed in a markup language (e.g., an XML attribute), while a style property refers to a property expressed in a style language (e.g., a CSS property).
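The equivalence between style attributes and style properties implies that a generic style attribute can be expanded into individual declarations and collapsed again. The sketch below assumes a CSS-like `name: value;` syntax for the generic attribute, which this document does not specify; both function names are hypothetical.

```python
def parse_style_attribute(value):
    """Split a CSS-like 'name: value; name: value' string into a dict
    with one entry per style parameter."""
    declarations = {}
    for part in value.split(";"):
        if not part.strip():
            continue  # skip empty fragments, e.g. after a trailing ';'
        name, _, val = part.partition(":")
        declarations[name.strip()] = val.strip()
    return declarations


def to_style_attribute(declarations):
    """Collapse individual declarations back into one attribute value."""
    return "; ".join(f"{name}: {val}" for name, val in declarations.items())


parsed = parse_style_attribute("color: white; font-size: 18px;")
```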
The TT AF shall be defined in such a manner that if there is a conflict when adopting the name or value semantics of a style parameter specification, then the following order shall hold for resolving such a conflict:
XSL FO
SVG
SMIL
CSS Level 2
CSS Level 3
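Read as a precedence list, the order above can be applied with a simple ranking. The sketch assumes that an earlier entry takes precedence over a later one, which is an interpretation of the requirement rather than something it states outright; the function name is hypothetical.

```python
# Order as listed in the requirement; earlier entries are assumed to win.
PRECEDENCE = ["XSL FO", "SVG", "SMIL", "CSS Level 2", "CSS Level 3"]


def preferred_source(candidates):
    """Return the candidate specification that ranks first in PRECEDENCE."""
    return min(candidates, key=PRECEDENCE.index)


choice = preferred_source(["CSS Level 2", "SVG"])
```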
The TT AF shall be defined in such a manner that to the extent that stylistic oriented markup element types are defined or adopted, then such element types shall be defined as shorthand equivalents of non-stylistic oriented element types in combination with specific style parameters.
The TT AF shall be capable of inline timing, where inline timing means the inclusion of temporal presentation markup in a TT AF document instance.
The TT AF shall be capable of out-of-line timing, where out-of-line timing means the association of temporal presentation information with TT AF content via some mechanism external to a TT AF document instance.
The TT AF shall be capable of expressing synchronization parameters in terms of any legal combination of begin, duration, and end parameters that express a single simple interval.
Note:
For further information on these synchronization parameters, see [SMIL 2.0], Section 10, The SMIL 2.0 Timing and Synchronization Module.
Note:
It is not required that the TT AF support the specification of multiple simple intervals, i.e., multiple begin, duration, or end values.
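The relationship among begin, duration, and end for a single simple interval can be sketched as below. The sketch makes its own simplifying assumptions (times in seconds, a missing begin defaults to 0 unless duration and end are both given, and all three values must agree when present); it does not attempt to reproduce the full [SMIL 2.0] timing rules.

```python
def resolve_interval(begin=None, dur=None, end=None):
    """Return (begin, end) in seconds for a single simple interval.

    An end of None means the interval is open-ended.
    """
    if begin is not None and dur is not None and end is not None:
        if begin + dur != end:
            raise ValueError("begin + dur does not equal end")
        return (begin, end)
    if dur is not None and end is not None:
        return (end - dur, end)      # begin omitted: derive it
    if begin is None:
        begin = 0.0                  # assumed default
    if dur is not None:
        return (begin, begin + dur)
    return (begin, end)              # end may be None (open-ended)
```

For example, `resolve_interval(begin=2.0, dur=3.0)` and `resolve_interval(begin=2.0, end=5.0)` describe the same interval.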
The TT AF shall be capable of expressing the following synchronization parameter value space semantics:
Offset Values – a clock offset from an implied or explicit synchronization timebase;
Event Values – a clock offset from a named event associated with an implied or explicit element node, including, at a minimum, named events that indicate the beginning or end of a timed element's active interval;
Access Key Values – a clock offset from a specific key press event;
Media Marker Values – a clock offset from a media marker, including, at a minimum, a media marker that denotes a SMPTE time code;
Wallclock Values – a clock offset from an absolute wallclock time in an implied or explicit time zone.
Note:
It is not required that the TT AF support the specification of negative offset values.
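As an example of a media marker value, the sketch below converts a SMPTE time code plus a clock offset into seconds. It assumes a non-drop-frame time code in `HH:MM:SS:FF` form and an integer frame rate supplied by the caller; drop-frame time codes and the function names are outside what this document specifies.

```python
def smpte_to_seconds(timecode, frames_per_second):
    """Convert a non-drop-frame 'HH:MM:SS:FF' time code to seconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= frames_per_second:
        raise ValueError("frame number exceeds the frame rate")
    return hours * 3600 + minutes * 60 + seconds + frames / frames_per_second


def marker_with_offset(timecode, offset_seconds, frames_per_second):
    """Resolve a media marker value: the marker time plus a clock offset."""
    return smpte_to_seconds(timecode, frames_per_second) + offset_seconds


marker = smpte_to_seconds("01:00:10:12", 24)
resolved = marker_with_offset("01:00:10:12", 1.5, 24)
```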
The TT AF shall be capable of expressing sequential, parallel, and exclusive time containment semantics of constituent timed text content.
Note:
For further information on these time containment semantics, see [SMIL 2.0], Section 10, The SMIL 2.0 Timing and Synchronization Module.
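The difference between sequential and parallel containment can be sketched by computing when each child is active, given only child durations. The container labels `seq` and `par` follow [SMIL 2.0] naming; the function itself is a hypothetical simplification that ignores explicit begin values. Exclusive containment is omitted because the active child there depends on runtime events rather than on a fixed schedule.

```python
def schedule(container, durations):
    """Return (begin, end) pairs, in seconds, for the children of a
    'seq' or 'par' time container, relative to the container's begin."""
    if container == "par":
        # Parallel: every child begins when the container begins.
        return [(0.0, duration) for duration in durations]
    if container == "seq":
        # Sequential: each child begins when the previous one ends.
        intervals, clock = [], 0.0
        for duration in durations:
            intervals.append((clock, clock + duration))
            clock += duration
        return intervals
    raise ValueError("unsupported container: " + container)
```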
The TT AF shall be capable of expressing animation according to the following modes:
continuous – linear
continuous – non-linear
discrete
Note:
By animation is meant the ability to alter some parameter or value over time.
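The three animation modes can be contrasted with a small interpolation function. This is a sketch under stated assumptions: progress is normalized to the range 0 to 1, the non-linear mode uses a smoothstep curve as one arbitrary example of easing, and the discrete mode switches value at the halfway point. None of these choices is prescribed by this document.

```python
def animate(start, end, progress, mode="linear"):
    """Return the value of an animated parameter at progress in [0, 1]."""
    if mode == "discrete":
        # Jump from start to end with no intermediate values.
        return start if progress < 0.5 else end
    if mode == "non-linear":
        # Smoothstep easing: slow at both ends, faster in the middle.
        progress = progress * progress * (3.0 - 2.0 * progress)
    elif mode != "linear":
        raise ValueError("unknown mode: " + mode)
    return start + (end - start) * progress
```

Applied to the opacity style parameter with `start=0.0` and `end=1.0`, the linear mode yields a simple fade-in.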
The TT AF shall be capable of expressing animated scrolling of content, both in block and inline progression directions, with independent expression of scroll in, scroll out, and scroll repetition.
Editorial note: GA, 2003-04-28
Explain that scroll animation may be achieved by animation of block scroll amount and inline scroll amount style parameters as described below in Animated Style Parameters – Visual.
The TT AF shall be capable of expressing animated highlighting of content, with granularity at the level of individual characters or glyphs.
Editorial note: GA, 2003-04-28
Explain that highlight animation may be achieved by animation of background color style parameter as described below in Animated Style Parameters – Visual.
The TT AF shall be capable of expressing animated fade transitions of content, with granularity at the level of individual regions or areas.
Note:
See Presentational Flowed Text Vocabulary and Non-Flowed Text Vocabulary for information on region and area vocabulary items, respectively.
Editorial note: GA, 2003-04-28
Explain that fade transition animation may be achieved by animation of opacity style parameter as described below in Animated Style Parameters – Visual.
The TT AF shall be capable of animating the following aural style parameters:
azimuth
elevation
speaking mode
speech rate
volume
The TT AF shall be capable of animating the following visual style parameters:
background color
block scroll amount
border color
color
inline scroll amount
opacity
origin
visibility
Note:
It is possible to express fade-in and fade-out transitions by means of animating the opacity style parameter.
The TT AF shall be capable of associating arbitrary metadata, expressed as metadata items, with (1) a TT AF document instance and (2) any element contained within a TT AF document instance.
Note:
It is not required that metadata be able to be associated with an element's attributes or with any other child of an element other than a child that is characterized as an element itself.
The TT AF shall be capable of expressing the following constituents of individual metadata items:
name
value type
value
The TT AF shall be capable of denoting the type of the value of a metadata item by means of at least the following primitive, simple datatypes:
binary
boolean
date
day
duration
month
notation
number
qualified name
resource identifier
string
time
year
In addition, the TT AF shall be capable of denoting the type of the value of a metadata item to be a specific derived or complex value type.
Note:
For further information on simple, derived, and complex datatypes, see [XML Schema Part 2].
The TT AF shall give preference to the representation of metadata item values as element content as opposed to attribute content.
Note:
By element content is meant those children of an element information item that are characterized as elements or as character data. By attribute content is meant the normalized values of the attributes of an element information item.
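A metadata item with a name, a value type, and a value, serialized with the value as element content, might look like the sketch below. The element name `meta` and the attribute names `name` and `type` are hypothetical; this document does not define concrete markup for metadata.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MetadataItem:
    name: str
    value_type: str
    value: str


def to_element(item):
    """Serialize an item with its value as element content rather than
    as attribute content."""
    element = ET.Element("meta", {"name": item.name, "type": item.value_type})
    element.text = item.value
    return element


item = MetadataItem("title", "string", "Evening News")
xml_text = ET.tostring(to_element(item), encoding="unicode")
parsed = ET.fromstring(xml_text)
```

Keeping the value in element content leaves room for structured or marked-up values, which attribute content cannot hold.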
The TT AF shall be capable of expressing metadata items whose names, value types, and semantics are defined externally to the TT AF specification(s).
The TT AF specification(s) shall be defined in such a manner as to permit and potentially require the ability to validate metadata.
The TT AF shall be capable of expressing the following core metadata items as pertains to a TT AF document instance:
change history
contributor
coverage
creator
date
description
format
identifier
keyword
language
producer
production date
production tool
publisher
relation
source
subject
title
type
Note:
The items enumerated above are drawn in part from similarly named items defined by [DCMES 1.1].
The rights metadata item defined by [DCMES 1.1] has not been included here, pending further consideration of whether and what intellectual property rights management (IPRM) related metadata to explicitly support in the TT AF.
Resolution:
None recorded.
The TT AF should be capable of expressing the following additional core metadata items as pertains to a TT AF document instance:
abstract
alternative
audience
available
bibliographicCitation
conformsTo
created
dateAccepted
dateCopyrighted
dateSubmitted
educationLevel
extent
hasFormat
hasPart
hasVersion
isFormatOf
isPartOf
isReferencedBy
isReplacedBy
isRequiredBy
issued
isVersionOf
mediator
medium
modified
references
replaces
requires
revision
spatial
tableOfContents
temporal
valid
Note:
The items enumerated above are drawn in part from similarly named items defined by [DCMI Terms], Section 3, Other Terms and Element Refinements.
The accessRights metadata item defined by [DCMI Terms] has not been included here, pending further consideration of whether and what intellectual property rights management (IPRM) related metadata to explicitly support in the TT AF.
Resolution:
None recorded.
The TT AF shall be capable of expressing the following media related metadata items as pertains to a TT AF document instance:
episode
related media
related media parameters
Editorial note: GA, 2003-04-29
Need to either give some definition to or reference a document that defines the above media related metadata items. Perhaps MPEG-7 or SMPTE Metadata Dictionaries?
The TT AF specification(s) shall be defined in such a manner as to give preference to those metadata items defined by [DCMES 1.1] in the event that a conflict exists with another candidate metadata representation.
The editor acknowledges the members of the Timed Text Working Group, the members of other W3C Working Groups, and industry experts in other forums who have contributed directly or indirectly to the process or content of creating this document.
The current members of the Timed Text Working Group are:
Glenn Adams, Extensible Formatting Systems, Inc. (chair); Brad Botkin, Invited Expert; Michael Dolan, Invited Expert; Gerry Fields, Invited Expert; Geoff Freed, Invited Expert; Markus Gylling, DAISY Consortium; Markku Hakkinen, Japanese Society for Rehabilitation of Persons with Disabilities; Sean Hayes, Microsoft; Erik Hodge, RealNetworks; Masahiko Kaneko, Microsoft; George Kerscher, DAISY Consortium; Thierry Michel, W3C (team contact); Patrick Schmitz, Invited Expert; David Singer, Apple Computer.
The Timed Text Working Group has benefited in its work from the participation and contributions of a number of people not currently members of the Working Group, including in particular those named below. Affiliations given are those current at the time of their work with the WG.
Bert Bos, W3C (chair, CSS WG); Martin Dürst, W3C (leader, I18N Activity); Al Gilman (chair, WAI Protocol and Formats WG); Philipp Hoschka, W3C (leader, Interaction Domain); Chris Lilley, W3C (chair, SVG WG).