Dubbing and Audio description Profiles of TTML2

W3C Working Draft

This version:
https://www.w3.org/TR/2024/WD-dapt-20240412/
Latest published version:
https://www.w3.org/TR/dapt/
Latest editor's draft:
https://w3c.github.io/dapt/
History:
https://www.w3.org/standards/history/dapt/
Commit history
Editors:
(Netflix)
(British Broadcasting Corporation)
Feedback:
GitHub w3c/dapt (pull requests, new issue, open issues)
public-tt@w3.org with subject line [dapt] … message topic … (archives)

Abstract

This specification defines DAPT, a TTML-based file format for the exchange of timed text content in dubbing and audio description workflows.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document incorporates a registry section and defines registry tables, as defined in the [w3c-process] requirements for W3C Registries. Updates to the document that only change registry tables can be made without meeting other requirements for Recommendation track updates, as set out in Updating Registry Tables; requirements for updating those registry tables are normatively specified within G. Registry Section.

The Working Group has identified the following at risk features:

Issue 218: At-risk: support for `src` attribute in `<audio>` for external resource PR-must-have

Possible resolution to #113.

Issue 219: At-risk: support for `<source>` element child of `<audio>` for external resource PR-must-have

Possible resolution to #113.

Issue 220: At-risk: support for `src` attribute of `<audio>` element pointing to embedded resource PR-must-have

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 221: At-risk: support for `<source>` child of `<audio>` element pointing to embedded resource PR-must-have

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 222: At-risk: support for inline audio resources PR-must-have

Possible resolution to #115.

Issue 223: At-risk: each of the potential values of `encoding` in `<data>` PR-must-have

Possible resolution to #117.

Issue 224: At-risk: support for the `length` attribute on `<data>` PR-must-have

Possible resolution to #117.

At risk features may be removed before advancement to Proposed Recommendation.

This document was published by the Timed Text Working Group as a Working Draft using the Recommendation track.

Publication as a Working Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Future updates to this specification may incorporate new features.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Scope

This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [TTML2] intended to support dubbing and audio description workflows worldwide, to meet the requirements defined in [DAPT-REQS], and to permit usage of visual presentation features within [TTML2] and its profiles, for example those in [TTML-IMSC1.2].

2. Introduction

This section is non-normative.

2.1 Transcripts and Scripts

In general usage, one meaning of the word script is the written text of a film, television programme, play etc. A script can be either a record of the completed production, also known as a transcript, or a plan for a production yet to be created. In this document, we use domain-specific terms, and define more specifically that:

The term DAPT script is used generically to refer to both transcripts and scripts, and is a point of conformance to the formal requirements of this specification. DAPT Scripts consist of timed text and associated metadata, such as the character speaking.

In dubbing workflows, a transcript is generated and translated to create a script. In audio description workflows, a transcript describes the video image, and is then used directly as a script for recording an audio equivalent.

DAPT is a TTML-based format for the exchange of transcripts and scripts (i.e. DAPT Scripts) among authoring, prompting and playback tools in the localization and audio description pipelines. A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions, and to be extensible to allow user-defined annotations or additional future features.

This specification defines the data model for DAPT scripts and its representation as a [TTML2] document (see 4. DAPT Data Model and corresponding TTML syntax) with some constraints and restrictions (see 5. Constraints).

A DAPT script is expected to be used to make audio visual media accessible or localized for users who cannot understand it in its original form, and to be used as part of the solution for meeting user needs involving transcripts, including accessibility needs described in [media-accessibility-reqs], as well as supporting users who need dialogue translated into a different language via dubbing.

The authoring workflow for both dubbing and audio description involves similar stages, that share common requirements as described in [DAPT-REQS]. In both cases, the author reviews the content and writes down what is happening, either in the dialogue or in the video image, alongside the time when it happens. Further transformation processes can change the text to a different language and adjust the wording to fit precise timing constraints. Then there is a stage in which an audio rendering of the script is generated, for eventual mixing into the programme audio. That mixing can occur prior to distribution, or in the player directly.

2.1.1 Dubbing scripts

The dubbing process, in which a dubbing script is created, is a complex, multi-step process involving:

  • Transcribing and timing the dialogue in its own language from a completed programme to create a transcript;
  • Notating dialogue with character information and other annotations;
  • Generating localization notes to guide further adaptation;
  • Translating the dialogue to a target language script;
  • Adapting the translation for dubbing, for example matching the actor’s lip movements in the case of dubs.

A dubbing script is a transcript or script (depending on workflow stage) used for recording translated dialogue to be mixed with the non-dialogue programme audio, to generate a localized version of the programme in a different language, known as a dubbed version, or dub for short.

Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages. This specification is designed to facilitate the addition of, and conversion to, subtitle and caption documents in other profiles of TTML, such as [TTML-IMSC1.2], for example by permitting subtitle styling syntax to be carried in DAPT documents. Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.

2.1.2 Audio Description scripts

Creating audio description content is also a multi-stage process. An audio description, also known as video description or, in [media-accessibility-reqs], as described video, is an audio service to assist viewers who cannot fully see a visual presentation to understand the content. It is the result of mixing the main programme audio with the audio rendition of each description, authored to be timed when it does not clash with dialogue, to deliver an audio description mixed audio track. Main programme audio refers to the audio associated with the programme prior to any further mixing. A description is a set of words that describes an aspect of the programme presentation, suitable for rendering into audio by means of vocalisation and recording, or used as a text alternative source for text to speech translation, as defined in [WCAG22]. More information about what audio description is and how it works can be found at [BBC-WHP051].

Writing the audio description script typically involves:

  • watching the video content of the programme, or series of programmes,
  • identifying the key moments during which there is an opportunity to speak descriptions,
  • writing the description text to explain the important visible parts of the programme at that time,
  • creating an audio version of the descriptions, either by recording a human actor or using text to speech,
  • defining mixing instructions (applied using [TTML2] audio styling) for combining the audio with the programme audio.

The audio mixing can occur prior to distribution of the media, or in the client. If the audio description script is delivered to the player, the text can be used to provide an alternative rendering, for example on a Braille display, or using the user's configured screen reader.

2.1.3 Other uses

DAPT Scripts can be useful in other workflows and scenarios. For example, Original language transcripts could be used as:

  • the output format of a speech to text system, even if not intended for translation, or for the production of subtitles or captions;
  • a document known in the broadcasting industry as a "post production script", used primarily for preview, editorial review and sales purposes;

Both Original language transcripts and Translated transcripts could be used as:

  • an accessible transcript presented alongside audio or video in a web page or application; in this usage, the timings could be retained and used for synchronisation with, or navigation within, the media or discarded to present a plain text version of the entire timeline.

2.2 Example documents

2.2.1 Basic document structure

The top level structure of a document is as follows:

  • The <tt> root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
  • The daptm:represents attribute indicates what the contents of the document are an alternative for, within the original programme.
  • The daptm:scriptType attribute indicates the type of transcript or script but in this empty example, it is not relevant, since only the structure of the document is shown.
  • The daptm:langSrc attribute indicates the default text language source, for example the original language of the content, while the xml:lang attribute indicates the default language in this script, which in this case is the same. Both of these attributes are inherited and can be overridden within the content of the document.

The structure is applicable to all types of DAPT scripts, dubbing or audio description.

<tt xmlns="http://www.w3.org/ns/ttml" 
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="en"
    daptm:represents="dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <!-- Additional metadata may be placed here -->
      <!-- Any characters must be defined here as a set of ttm:agent elements -->
    </metadata>
    <styling>
      <!-- Styling is optional and consists of a set of style elements -->
    </styling>
    <layout>
      <!-- Layout is optional and consists of a set of region elements -->
    </layout>
  </head>
  <body>
    <!-- Content goes here -->
  </body>
</tt>

The following examples correspond to the timed text transcripts and scripts produced at each stage of the workflow described in [DAPT-REQS].

The first example shows an early stage transcript in which timed opportunities for descriptions or transcriptions have been identified but no text has been written:

...
  <body>
    <div xml:id="d1" begin="10s" end="13s">
    </div>
    <div xml:id="d2" begin="18s" end="20s">
    </div>
  </body>
...

The following examples will demonstrate different uses in dubbing and audio description workflows.

2.2.2 Audio Description Examples

When descriptions are added this becomes a Pre-Recording Script. Note that in this case, to reflect that most of the audio description content transcribes the video image where there is no inherent language, the Text Language Source, represented by the daptm:langSrc attribute, is set to the empty string at the top level of the document. It would be semantically equivalent to omit the attribute altogether, since the default value is the empty string:

<tt xmlns="http://www.w3.org/ns/ttml"
  xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
  xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
  xml:lang="en"
  daptm:langSrc=""
  daptm:represents="visualNonText, visualText"
  daptm:scriptType="preRecording">
  <body>
    <div begin="10s" end="13s">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>

After creating audio recordings, if not using text to speech, instructions for playback mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in the audio played from inside the <span> element, smoothly animating the value on the way in and returning it on the way out:

<tt ...
  daptm:represents="visualNonText, visualText"
  daptm:scriptType="asRecorded"
  xml:lang="en"
  daptm:langSrc="">
  ...
    <div begin="25s" end="28s">
      <p>
        <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
        <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
        <span begin="0.3s" end="2.7s">
          <audio src="clip3.wav"/>
          The sails billow in the wind.</span>
      </p>
    </div>
...

In the above example, the <div> element's begin attribute defines the time that is the "syncbase" for its child, so the times on the <animate> and <span> elements are relative to 25s here. The first <animate> element drops the gain from 1 to 0.39 over 0.3s, freezing that value after it ends, and the second one raises it back in the final 0.3s of this description. Then the <span> element is timed to begin only after the first audio dip has finished.

If the audio recording is long and just a snippet needs to be played, that can be done using clipBegin and clipEnd. If we just want to play the part of the audio file from 5s to 8s, it would look like:

...
  <span>
    <audio src="long_audio.wav" clipBegin="5s" clipEnd="8s"/>
    A woman climbs into a small sailing boat.</span>
...

Or audio attributes can be added to trigger the text to be spoken:

...
    <div begin="18s" end="20s">
      <p>
        <span tta:speak="normal">
          The woman pulls the tiller and the boat turns.</span>
      </p>
    </div>
...

It is also possible to embed the audio directly, so that a single document contains the script and recorded audio together:

...
    <div begin="25s" end="28s">
      <p>
        <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
        <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
        <span begin="0.3s" end="2.7s">
          <audio><source><data type="audio/wave">
            [base64-encoded audio data]
          </data></source></audio>
          The sails billow in the wind.</span>
      </p>
    </div>
...

2.2.3 Dubbing Examples

From the basic structure of Example 1, transcribing the audio produces an original language dubbing transcript, which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue. Characters are identified within the <metadata> element. Note that the language and the text language source are defined using xml:lang and daptm:langSrc attributes respectively, which have the same value because the transcript is not translated.

<tt xmlns="http://www.w3.org/ns/ttml" 
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="fr"
    daptm:langSrc="fr"
    daptm:represents="dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s">
      <p ttm:agent="character_1">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
    </div>
  </body>
</tt>

After translating the text, the document is modified. It includes translation text, and in this case the original text is preserved. The main document's default language is changed to indicate that the focus is on the translated language. The combination of the xml:lang and daptm:langSrc attributes is used to mark the text as being original or translated. In this case, they are present on both the <tt> and <p> elements to make the example easier to read, but it would also be possible to omit them in some cases, making use of the inheritance model:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:represents="dialogue"
    daptm:scriptType="translatedTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" ttm:agent="character_1">
      <p xml:lang="fr" daptm:langSrc="fr"> <!-- original -->
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr"> <!-- translated -->
        <span>And thanks to that, we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording. The daptm:scriptType attribute is also modified, as in the following example:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:represents="dialogue"
    daptm:scriptType="preRecording">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" ttm:agent="character_1" daptm:onScreen="ON_OFF">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">And thanks to that,</span><span begin="1.5s"> we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

3. Documentation Conventions

This document uses the following conventions:

4. DAPT Data Model and corresponding TTML syntax

This section specifies the data model for DAPT and its corresponding TTML syntax. In the model, there are objects which can have properties and be associated with other objects. In the TTML syntax, these objects and properties are expressed as elements and attributes, though it is not always the case that objects are expressed as elements and properties as attributes.

Figure 1 illustrates the DAPT data model, hyperlinking every object and property to its corresponding section in this document. Shared properties are shown in italics. All other conventions in the diagram are as per [uml].

[Class diagram: a DAPT Script (Represents, Script Type, Default Language, Text Language Source) contains zero or more Character objects (Character Identifier, Name, Talent Name) and zero or more Script Event objects (Script Event Identifier, Begin, End, Duration, Script Event Type, On Screen). Script Events contain zero or more Script Event Descriptions (Description, Description Type), Text objects (Text content, Text Language Source, Language) and Audio objects; an Audio is either a Synthesized Audio (Rate, Pitch) or an Audio Recording (Source, Type, Begin, End, Duration, In Time, Out Time). Text and Audio Recording objects contain zero or more Mixing Instructions (Gain, Pan, Begin, End, Duration, Fill).]
Figure 1 Class diagram showing main entities in the DAPT data model.
Issue 116: Add non-inlined embedded audio resources to the Data Model? questionPR-must-have

See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

4.1 DAPT Script

A DAPT Script is a transcript or script that corresponds to a document processed within an authoring workflow or processed by a client, and conforms to the constraints of this specification. It has properties and objects defined in the following sections: Represents, Script Type, Default Language, Text Language Source, Script Events and, for Dubbing Scripts, Characters.

A DAPT Document is a TTML Document Instance representing a DAPT Script. A DAPT Document has the structure and constraints defined in the following sections.

4.1.1 Represents

The Represents property is a mandatory property of a DAPT Script which indicates which components of the related media object the contents of the document represent. The contents of the document could be used as part of a mechanism to provide an accessible alternative for those components.

To represent this property, the daptm:represents attribute MUST be present on the <tt> element:

daptm:represents
: <content-descriptor> ( <lwsp>? "," <lwsp>? <content-descriptor>)*

<content-descriptor>  # see registry table below

<lwsp>                # as TTML2

The permitted values for <content-descriptor> are listed in the following registry table:

Registry table for the <content-descriptor> component whose Registry Definition is at G.2.3 <content-descriptor> registry table definition
<content-descriptor> | Status | Description | Used in | Notes
dialogue | Provisional | Verbal communication, in audio | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts | For example, a spoken conversation.
nonDialogueSounds | Provisional | Sounds that are not verbal communication, in audio | Translation and hard of hearing subtitles and captions, pre- and post-production scripts | For example, significant sounds, such as a door being slammed in anger.
visualNonText | Provisional | Parts of the visual image that are not textual | Audio Description | For example, a significant object in the scene.
visualText | Provisional | Textual parts of the visual image | Audio Description | For example, a signpost, a clock, a newspaper headline, an instant message etc.
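For example, a transcript covering both the spoken dialogue and significant non-verbal sounds could declare both descriptors in a single attribute value (a hypothetical fragment, using the list syntax defined above):

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    daptm:represents="dialogue, nonDialogueSounds"
    ...>
```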

4.1.2 Default Language

The Default Language is a mandatory property of a DAPT Script which represents the default language for the Text content of Script Events. This language may be one of the original languages or a Translation language. When it represents a Translation language, it may be the final language for which a dubbing or audio description script is being prepared, called the Target Recording Language or it may be an intermediate, or pivot, language used in the workflow.

The Default Language is represented in a DAPT Document by the following structure and constraints:

  • the xml:lang attribute MUST be present on the <tt> element and its value MUST NOT be empty.

Note

All text content in a DAPT Script has a specified language. When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events, to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.
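For instance (a hypothetical fragment), a script whose Script Events are mostly in English could declare English as the Default Language on the <tt> element, while individual content elements override it where needed:

```xml
<tt xml:lang="en" ...>
  ...
  <div begin="10s" end="13s">
    <p>Text in the Default Language, English.</p>
  </div>
  <div begin="18s" end="20s">
    <p xml:lang="fr">Texte en français, overriding the default.</p>
  </div>
  ...
</tt>
```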

4.1.3 Script Type

The Script Type property is a mandatory property of a DAPT Script which describes the type of documents used in Dubbing and Audio Description workflows, among the following: Original Language Transcript, Translated Transcript, Pre-recording Script, As-recorded Script.

To represent this property, the daptm:scriptType attribute MUST be present on the <tt> element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType attribute values are:

Editor's note

The following example is orphaned - move to the top of the section, before the enumerated script types?

<tt daptm:scriptType="originalTranscript">
...
</tt>

4.1.4 Script Events

A DAPT Script MAY contain zero or more Script Event objects, each corresponding to dialogue, on screen text, or descriptions for a given time interval.

4.1.5 Characters

A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event.

4.1.6 Shared properties

Some of the properties in the DAPT data model are common within more than one object type, and carry the same semantic everywhere they occur. These shared properties are listed in this section.

Editor's note

Would it be better to make a "Timed Object" class and subclass Script Event, Mixing Instruction and Audio Recording from it?

4.1.6.1 Timing Properties

The following timing properties define when the entities that contain them are active:

  • The Begin property defines when an object becomes active, and is relative to the active begin time of the parent object. DAPT Scripts begin at time zero on the media timeline.
  • The End property defines when an object stops being active, and is relative to the active begin time of the parent object.
  • The Duration property defines the maximum duration of an object.
    Note

    If both an End and a Duration property are present, the end time is the earlier of End and Begin + Duration, as defined by [TTML2].

Note
If any of the timing properties is omitted, the following rules apply, paraphrasing the timing semantics defined in [TTML2]:
  • The default value for Begin is zero, i.e. the same as the begin time of the parent object.
  • The default value for End is indefinite, i.e. it resolves to the same as the end time of the parent timed object, if there is one.
  • The default value for Duration is indefinite, i.e. the end time resolves to the same as the end time of the parent object.
Note

The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
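These semantics can be sketched with a hypothetical fragment: the parent <div> establishes the time origin for its children, omitted times fall back to the defaults above, and when both End and Duration are given the earlier resolved end applies:

```xml
<div begin="25s" end="31s">
  <!-- begin="1s" is relative to the parent's begin, so this paragraph
       becomes active at 26s on the media timeline; its end is the
       earlier of begin + dur (29s) and the stated end (30s), i.e. 29s -->
  <p begin="1s" end="5s" dur="3s">...</p>
  <!-- no timing attributes: the defaults apply, so this paragraph is
       active for the parent's whole interval, 25s to 31s -->
  <p>...</p>
</div>
```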

4.2 Character

This section is mainly relevant to Dubbing workflows.

A character in the programme can be described using a Character object which has the following properties:

A Character is represented in a DAPT Document by the following structure and constraints:

Note
Issue 44: Define DAPT-specific conformant implementation types CR must-have

We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:

  • DAPT Authoring Tool: tool that produces or consumes compliant DAPT documents. I don't think these map to TTML2 processors.
  • DAPT Audio Recorder/Renderer: tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
  • DAPT Validator: tool that verifies that a DAPT document is compliant with the specification. I'm not sure what it maps to in TTML2 terminology.

4.3 Script Event

A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:

A Script Event is represented in a DAPT Document by the following structure and constraints:

4.4 Text

The Text object contains text content typically in a single language. This language may be the Original language or a Translation language.

Text is defined as Original if it is any of:

  1. the same language as the dialogue that it represents in the original programme audio;
  2. a transcription of text visible in the programme video, in the same language as that text;
  3. an untranslated representation of non-dialogue sounds;
  4. an untranslated description of the scene in the programme video.

Note

Text is defined as Translation if it is a representation of an Original Text object in a different language.

Text can be identified as being Original or Translation by inspecting its language and its Text Language Source together, according to the semantics defined in Text Language Source.

The source language of Translation Text objects and, where applicable, Original Text objects is indicated using the Text Language Source property.

A Text object may be styled.

Zero or more Mixing Instruction objects used to modify the programme audio during the Text MAY be present.

A Text object is represented in a DAPT Document by a <p> element with the following constraints:
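For illustration, a Text object holding an English translation of French dialogue might be serialized as follows (a hypothetical fragment, echoing the dubbing examples earlier in this document):

```xml
<p xml:lang="en" daptm:langSrc="fr">
  <span>And thanks to that, we're gonna get rich.</span>
</p>
```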

4.5 Text Language Source

The Text Language Source property is an annotation indicating the source language of a Text object, if applicable, or that the source content had no inherent language:

Text Language Source is an inheritable property.

The Text Language Source property is represented in a DAPT Document by a daptm:langSrc attribute with the following syntax, constraints and semantics:

daptm:langSrc
: <empty-string> | <language-identifier>

<empty-string>
: ""                    # default

<language-identifier>   # valid BCP-47 language tag
Note

An example of the usage of Text Language Source in a document is present in the Text section.

4.6 On Screen

The On Screen property is an annotation indicating the position in the scene relating to the subject of a Script Event, for example of the character speaking:

If omitted, the default value is "ON".

Note

The On Screen property is represented in a DAPT Document by a daptm:onScreen attribute on the <div> element, with the following constraints:
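For example, a Script Event whose character is heard but not visible might be annotated as follows (a hypothetical fragment; the ON_OFF value appears in the adaptation example above, and OFF is assumed here by analogy with it and with the default value ON):

```xml
<div begin="10s" end="13s" ttm:agent="character_1" daptm:onScreen="OFF">
  ...
</div>
```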

4.7 Script Event Description

The Script Event Description object is an annotation providing a human-readable description of a Script Event. Script Event Descriptions can themselves be classified with a Description Type.

A Script Event Description object is represented in a DAPT Document by a <ttm:desc> element at the <div> element level.

Zero or more <ttm:desc> elements MAY be present.

Note

The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event. For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together, e.g. in a batch recording.

The <ttm:desc> element MAY specify its language using the xml:lang attribute.

Note
...
  <body>
    <div begin="10s" end="13s">
      <ttm:desc>Scene 1</ttm:desc>
      <p xml:lang="en">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>Une femme monte à bord d'un petit bateau à voile.</span>
      </p>
    </div>
    <div begin="18s" end="20s">
      <ttm:desc>Scene 1</ttm:desc>
      <p xml:lang="en">
        <span>The woman pulls the tiller and the boat turns.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>La femme tire sur la barre et le bateau tourne.</span>
      </p>
    </div>
  </body>
...

Each Script Event Description can be annotated with one or more Description Types to categorise further the purpose of the Script Event Description.

Each Description Type is represented in a DAPT Document by a daptm:descType attribute on the <ttm:desc> element.

The <ttm:desc> element MAY have zero or one daptm:descType attributes. The daptm:descType attribute is defined below.

daptm:descType : string

Its permitted values are listed in the following registry table:

Registry table for the daptm:descType attribute whose Registry Definition is at G.2.1 daptm:descType registry table definition
daptm:descType Status Description Notes
pronunciationNote Provisional Notes for how to pronounce the content.
scene Provisional Contains a scene identifier
plotSignificance Provisional Defines a measure of how significant the content is to the plot. Contents are undefined and may be low, medium or high, or a numerical scale.
...
  <body>
    <div begin="10s" end="13s">
      <ttm:desc daptm:descType="pronunciationNote">[oːnʲ]</ttm:desc>
      <p>Eóin looks around at the other assembly members.</p>
    </div>
  </body>
...

Amongst a sibling group of <ttm:desc> elements there are no constraints on the uniqueness of the daptm:descType attribute; however, it may be useful as a distinguisher, as shown in the following example.

...
  <body>
    <div begin="10s" end="13s">
      <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
      <ttm:desc daptm:descType="plotSignificance">High</ttm:desc>
      <p xml:lang="en">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>Une femme monte à bord d'un petit bateau à voile.</span>
      </p>
    </div>
    <div begin="18s" end="20s">
      <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
      <ttm:desc daptm:descType="plotSignificance">Low</ttm:desc>
      <p xml:lang="en">
        <span>The woman pulls the tiller and the boat turns.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>La femme tire sur la barre et le bateau tourne.</span>
      </p>
    </div>
  </body>
...

4.8 Script Event Type

The Script Event Type property provides one or more space-separated keywords representing the type of the Script Event, for example spoken text or on-screen text, and in the latter case, the type of on-screen text (title, credit, location, ...).

The Script Event Type is represented in a DAPT Document by the following attribute:

daptm:eventType : string

Its permitted values are as listed in the following registry table:

Registry table for the daptm:eventType attribute whose Registry Definition is at G.2.2 daptm:eventType registry table definition
daptm:eventType Status Description Notes
dialogue Provisional
spokenText Provisional
onScreenText Provisional
title Provisional
credit Provisional
location Provisional
...
<div xml:id="event_1"
     begin="9663f" end="9682f" 
     ttm:agent="character_4"
     daptm:eventType="dialogue">
...
</div>
...

4.9 Audio

An Audio object is used to specify an audio rendering of a Text. The audio rendering can either be a recorded audio resource, as an Audio Recording object, or a directive to synthesize a rendering of the text via a text to speech engine, which is a Synthesized Audio object. Both are types of Audio object.

It is an error for an Audio not to be in the same language as its Text.

A presentation processor that supports audio plays or inserts the Audio at the specified time on the related media object's timeline.

Note

The Audio object is "abstract": it can only exist as one of its sub-types, Audio Recording or Synthesized Audio.

4.9.1 Audio Recording

An Audio Recording is an Audio object that references an audio resource. It has the following properties:

  • One or more alternative Sources, each of which is either 1) a link to an external audio resource or 2) an embedded audio recording;
  • For each Source, one mandatory Type that specifies the type ([MIME-TYPES]) of the audio resource, for example audio/basic;
  • An optional Begin property, an optional End property and an optional Duration property that together define the Audio Recording's time interval in the programme timeline, in relation to the parent element's time interval;
  • An optional In Time and an optional Out Time property that together define a temporal subsection of the audio resource;

    The default In Time is the beginning of the audio resource.

    The default Out Time is the end of the audio resource.

    If the temporal subsection of the audio resource is longer than the duration of the Audio Recording's time interval, then playback MUST be truncated to end when the Audio Recording's time interval ends.

    Note

    If the temporal subsection of the audio resource is shorter than the duration of the Audio Recording's time interval, then the audio resource plays once.

  • Zero or more Mixing Instructions that modify the playback characteristics of the Audio Recording.

When a list of Sources is provided, a presentation processor MUST play no more than one of the Sources for each Audio Recording.

This feature may contribute to browser fingerprintability. Implementations can use the Type, and if present, any relevant additional formatting information, to decide which Source to play. For example, given two Sources, one being a WAV file, and the other an MP3, an implementation that can play only one of those formats, or is configured to have a preference for one or the other, would select the playable or preferred version.

An Audio Recording is represented in a DAPT Document by an <audio> element child of a <p> or <span> element corresponding to the Text to which it applies. The following constraints apply to the <audio> element:

  • The begin, end and dur attributes represent respectively the Begin, End and Duration properties;
  • The clipBegin and clipEnd attributes represent respectively the In Time and Out Time properties, as illustrated by Example 5;
  • For each Source, if it is a link to an external audio resource, the Source and Type properties are represented by exactly one of:
    1. A src attribute that is not a fragment identifier, and a type attribute respectively;

      This mechanism cannot be used if there is more than one Source.

      <audio src="https://example.com/audio.wav" type="audio/wave"/>
    2. A <source> child element with a src attribute that is not a fragment identifier and a type attribute respectively;
      <audio>
        <source src="https://example.com/audio.wav" type="audio/wave"/>
        <source src="https://example.com/audio.aac" type="audio/aac"/>
      </audio>

    A src attribute that is not a fragment identifier is a URL that references an external audio resource, i.e. one that is not embedded within the DAPT Script. No validation that the resource can be located is specified in DAPT.

    Editor's note

    Do we need both mechanisms here? It's not clear what semantic advantage the child <source> element carries in this case. Consider marking use of that child <source> element as "at risk"?

    Issue 113: Support both `@src` and `<source>` child of `<audio>` (external resources)? questionPR-must-have
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    The following two options exist in TTML2 for referencing external audio resources:

    1. src attribute in <audio> element.
    <audio src="https://example.com/audio_recording.wav" type="audio/wave"/>
    2. <source> element child of <audio> element.
    <audio>
        <source src="https://example.com/audio_recording.wav" type="audio/wave"/>
    </audio>

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, and we specify that in this case the implementation must choose no more than one.

    [Edited 2023-03-29 to account for the "play no more than one" constraint added after the issue was opened]

    Issue 218: At-risk: support for `src` attribute in `<audio>` for external resource PR-must-have

    Possible resolution to #113.

    Issue 219: At-risk: support for `<source>` element child of `<audio>` for external resource PR-must-have

    Possible resolution to #113.

  • If the Source is an embedded audio resource, the Source and Type properties are represented together by exactly one of:
    1. A src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;

      This mechanism cannot be used if there is more than one Source.

      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1">
              [base64-encoded WAV audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio src="#audio1"/>
          ..
        </body>
      </tt>
    2. A <source> child element with a src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;
      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1wav">
              [base64-encoded WAV audio resource]
            </data>
            <data type="audio/mpeg" xml:id="audio1mp3">
              [base64-encoded MP3 audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio>
            <source src="#audio1wav"/>
            <source src="#audio1mp3"/>
          </audio>
          ..
        </body>
      </tt>
    3. A <source> child element with a <data> element child that specifies a type attribute and contains the audio recording data.
      <audio>
        <source>
          <data type="audio/wave">
              [base64-encoded WAV audio resource]
          </data>
        </source>
      </audio>

    In each of the cases above the type attribute represents the Type property.

    A src attribute that is a fragment identifier is a pointer to an audio resource that is embedded within the DAPT Script.

    If <data> elements are defined, each one MUST contain either #PCDATA or <chunk> child elements and MUST NOT contain any <source> child elements.

    <data> and <source> elements MAY contain a format attribute whose value implementations MAY use in addition to the type attribute value when selecting an appropriate audio resource.

    Editor's note

    Do we need all 3 mechanisms here? Do we need any? There may be a use case for embedding audio data, since it makes the single document a portable (though large) entity that can be exchanged and transferred with no concern for missing resources, and no need for e.g. manifest files. If we do not need to support referenced embedded audio then only the last option is needed, and is probably the simplest to implement. One case for referenced embedded audio is that it more easily allows reuse of the same audio in different document locations, though that seems like an unlikely requirement in this use case. Another is that it means that all embedded audio is in an easily located part of the document in tt/head/resources, which potentially could carry an implementation benefit? Consider marking the embedded data features as "at risk"?

    Issue 114: Support both `@src` and `<source>` child of `<audio>` (embedded resources)? questionPR-must-have
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    Given some embedded audio resources:

    <head>
      <resources>
        <audio xml:id="audioRecording1" type="audio/wave">
          <source>
            <data>[base64 encoded audio data]</data>
          </source>
        </audio>
        <data xml:id="audioRecording2" type="audio/wave">
          [base64 encoded audio data]
        </data>
      </resources>
    </head>

    The following two options exist in TTML2 for referencing embedded audio resources:

    1. src attribute in <audio> element referencing embedded <audio> or <data>:
    <audio src="#audioRecording1"/>
    ...
    <audio src="#audioRecording2"/>
    2. <source> element child of <audio> element.
    <audio>
        <source src="#audioRecording1"/>
    </audio>

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, though it is unclear what the semantic is intended to be if multiple resources are specified - presumably, the implementation gets to choose one somehow.

    Issue 115: Support both referenced and inline embedded audio recordings? questionPR-must-have
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    If we are going to support embedded audio resources, they can either be defined in /tt/head/resources and then referenced, or the data can be included inline.

    Do we need both options?

    Example of embedded:

    <head>
      <resources>
        <audio xml:id="audioRecording1" type="audio/wave">
          <source>
            <data>[base64 encoded audio data]</data>
          </source>
        </audio>
        <data xml:id="audioRecording2" type="audio/wave">
          [base64 encoded audio data]
        </data>
      </resources>
    </head>

    This would then be referenced in the body content using something like (see also #114):

    <audio src="#audioRecording2"/>

    Example of inline:

    <audio type="audio/wave">
      <source type="audio/wave">
        <data>[base64 encoded audio data]</data>
      </source>
    </audio>
    Issue 220: At-risk: support for `src` attribute of `<audio>` element pointing to embedded resource PR-must-have

    Possible resolution to #114 and #115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 221: At-risk: support for `<source>` child of `<audio>` element pointing to embedded resource PR-must-have

    Possible resolution to #114 and #115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 222: At-risk: support for inline audio resources PR-must-have

    Possible resolution to #115.

    Issue 116: Add non-inlined embedded audio resources to the Data Model? questionPR-must-have

    See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

    Issue 117: Embedded data: Do we need to support all the permitted encodings? What about length? questionPR-must-have

    In TTML2's <data> element, an encoding can be specified, being one of:

    • base16
    • base32
    • base32hex
    • base64
    • base64url

    Do we need to require processor support for all of them, or will the default base64 be adequate?

    Also, it is possible to specify a length attribute that provides some feasibility of error checking, since the decoded data must be the specified length in bytes. Is requiring support for this a net benefit? Would it be used?

    Issue 223: At-risk: each of the potential values of `encoding` in `<data>` PR-must-have

    Possible resolution to #117.

    Issue 224: At-risk: support for the `length` attribute on `<data>` PR-must-have

    Possible resolution to #117.

  • Mixing Instructions MAY be applied as specified in their TTML representation;
  • The computed value of the xml:lang attribute MUST be identical to the computed value of the xml:lang attribute of the parent element and any child <source> elements and any referenced embedded <data> elements.

4.9.2 Synthesized Audio

A Synthesized Audio is an Audio object that represents a machine generated audio rendering of the parent Text content. It has the following properties:

  • A mandatory Rate that specifies the rate of speech, being normal, fast or slow;
  • An optional Pitch that allows adjustment of the pitch of the speech.

A Synthesized Audio is represented in a DAPT Document by the application of a tta:speak style attribute on the element representing the Text object to be spoken, where the computed value of the attribute is normal, fast or slow. This attribute also represents the Rate Property.

The tta:pitch style attribute represents the Pitch property.

The TTML representation of a Synthesized Audio is illustrated by Example 6.
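A minimal sketch of that representation is shown below; the timing and the tta:pitch value syntax are illustrative assumptions:

```xml
<p begin="10s" end="13s" tta:speak="fast" tta:pitch="+10%">
  <span>A woman climbs into a small sailing boat.</span>
</p>
```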

Note

A tta:pitch attribute on an element whose computed value of the tta:speak attribute is none has no effect. Such an element is not considered to have an associated Synthesized Audio.

Note

The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [SSML] as indicated in [TTML2]. This version of the specification does not specify how other features of [SSML] can be either generated from DAPT or embedded into DAPT documents. The option to extend [SSML] support in future versions of this specification is deliberately left open.

4.10 Mixing Instruction

A Mixing Instruction object is a static or animated adjustment of the audio relating to the containing object. It has the following properties:

A Mixing Instruction is represented by applying audio style attributes to the element that corresponds to the relevant object, either inline, by reference to a <style> element, or in a child (inline) <animate> element:

If the Mixing Instruction is animated, that is, if the adjustment properties change during the containing object's active time interval, then it is represented by one or more child <animate> elements. This representation is required if more than one Gain or Pan property is needed, or if any timing properties are needed.

The <animate> element(s) MUST be children of the element corresponding to the containing object, and have the following constraints:

The TTML representation of animated Mixing Instructions is illustrated by Example 4.

See also D. Audio Mixing.
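For instance, a fade of the audio gain could be sketched with a child <animate> element as follows; the gain values and timing are illustrative assumptions:

```xml
<p begin="10s" end="13s">
  <span>A woman climbs into a small sailing boat.</span>
  <!-- Animate tta:gain from full to half over the first second, then hold -->
  <animate begin="0s" dur="1s" tta:gain="1;0.5" fill="freeze"/>
</p>
```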

5. Constraints

5.1 Document Encoding

A DAPT Document MUST be serialised as a well-formed XML 1.0 [xml] document encoded using the UTF-8 character encoding as specified in [UNICODE].

The resulting [xml] document MUST NOT contain any of the following physical structures:

Note

The resulting [xml] document can contain character references, and entity references to predefined entities.

The predefined entities are (including the leading ampersand and trailing semicolon):

  • &amp; for an ampersand & (unicode code point U+0026)
  • &apos; for an apostrophe ' (unicode code point U+0027)
  • &gt; for a greater than sign > (unicode code point U+003E)
  • &lt; for a less than sign < (unicode code point U+003C)
  • &quot; for a quote symbol " (unicode code point U+0022)
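For example, character content containing an ampersand and angle brackets uses the predefined entities as follows:

```xml
<p>
  <span>Fish &amp; chips cost &lt;10 euros&gt;.</span>
</p>
```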
Note

A DAPT Document can also be used as an in-memory model for processing, in which case the serialisation requirements do not apply.

5.2 Foreign Elements and Attributes

A DAPT Document MAY contain elements and attributes that are neither specifically permitted nor forbidden by a profile.

Note

DAPT Documents remain subject to the content conformance requirements specified at Section 3.1 of [TTML2]. In particular, a DAPT Document can contain elements and attributes not in any TT namespace, i.e. in foreign namespaces, since such elements and attributes are pruned by the algorithm at Section 4 of [TTML2] prior to evaluating content conformance.

Note

For validation purposes it is good practice to define and use a content specification for all foreign namespace elements and attributes used within a DAPT Document.

A transformation processor SHOULD preserve such elements or attributes whenever possible.

Editor's note

Do we need to say that a presentation processor may ignore foreign vocab?

5.2.1 Proprietary Metadata

Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata. Metadata vocabulary defined in this specification or in [TTML2] MAY be included. Additional vocabulary in other namespaces MAY also be included.

Note

It is possible to add information such as the title of the programme using [TTML2] constructs.

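For example, a programme title could be carried using the [TTML2] <ttm:title> metadata element in the document head; the title text shown is illustrative:

```xml
...
  <head>
    <metadata>
      <ttm:title>Example Programme Title</ttm:title>
    </metadata>
  </head>
...
```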
Note

It is possible to add workflow-specific information using a foreign namespace. In the following example, a fictitious namespace vendorm from an "example vendor" is used to provide document-level information not defined by DAPT.

...
  <metadata xmlns:vendorm="http://www.example-vendor.com/ns/ttml#metadata">
    <vendorm:programType>Episode</vendorm:programType>
    <vendorm:episodeSeason>5</vendorm:episodeSeason>
    <vendorm:episodeNumber>8</vendorm:episodeNumber>
    <vendorm:internalId>15734</vendorm:internalId>
    <vendorm:information>Some proprietary information</vendorm:information>
  </metadata>
...

5.3 Namespaces

The following namespaces (see [xml-names]) are used in this specification:

Name Prefix Value Defining Specification
XML xml http://www.w3.org/XML/1998/namespace [xml-names]
TT tt http://www.w3.org/ns/ttml [TTML2]
TT Parameter ttp http://www.w3.org/ns/ttml#parameter [TTML2]
TT Feature none http://www.w3.org/ns/ttml/feature/ [TTML2]
TT Audio Style tta http://www.w3.org/ns/ttml#audio [TTML2]
DAPT Metadata daptm http://www.w3.org/ns/ttml/profile/dapt#metadata This specification
DAPT Extension none http://www.w3.org/ns/ttml/profile/dapt/extension/ This specification

The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [xml-names].

The namespaces defined by this specification are mutable [namespaceState]; all undefined names in these namespaces are reserved for future standardization by the W3C.

5.5 Synchronization

If the DAPT Document is intended to be used as the basis for producing a [TTML-IMSC1.2] document, the synchronization provisions of [TTML-IMSC1.2] apply in relation to the video.

Timed content within the DAPT Document is intended to be rendered starting and ending on specific audio samples.

Note

In the context of this specification rendering could be visual presentation of text, for example to show an actor what words to speak, or could be audible playback of an audio resource, or could be physical or haptic, such as a Braille display.

In constrained applications, such as real-time audio mixing and playback, if accurate synchronization to the audio sample cannot be achieved in the rendered output, the combined effects of authoring and playback inaccuracies in timed changes in presentation SHOULD meet the synchronization requirements of [EBU-R37], i.e. audio changes are not to precede image changes by more than 40ms, and are not to follow them by more than 60ms.

Likewise, authoring applications SHOULD allow authors to meet the requirements of [EBU-R37] by defining times with an accuracy such that changes to audio are less than 15ms after any associated change in the video image, and less than 5ms before any associated change in the video image.

Taken together, the above two constraints on overall presentation and on DAPT documents intended for real-time playback mean that content processors SHOULD complete audio presentation changes no more than 35ms before the time specified in the DAPT document and no more than 45ms after the time specified.

5.6 Profile Signaling

This section defines how a TTML Document Instance signals that it is a DAPT Document and how it signals any processing requirements that apply. See also 6.1 Conformance of DAPT Documents, which defines how to establish that a DAPT Document conforms to this specification.

5.6.1 Profile Designator

This profile is associated with the following profile designators:

Profile Name Profile Designator
DAPT 1.0 Content Profile http://www.w3.org/ns/ttml/profile/dapt1.0/content
DAPT 1.0 Processor Profile http://www.w3.org/ns/ttml/profile/dapt1.0/processor

5.6.2 ttp:contentProfiles

The ttp:contentProfiles attribute is used to declare the [TTML2] profiles to which the document conforms.

TTML documents representing DAPT Scripts MUST specify a ttp:contentProfiles attribute on the <tt> element including one value equal to the DAPT 1.0 Content Profile designator. Other values MAY be present to declare conformance to other profiles of [TTML2], and MAY include profile designators in proprietary namespaces.
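For example, the root element of a conforming document could declare the designator as follows (other attributes and namespace declarations abbreviated):

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en">
  ...
</tt>
```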

5.6.3 ttp:profile

The ttp:profile attribute is a mechanism within [TTML1] for declaring the processing requirements for a Document Instance. It has effectively been superseded in [TTML2] by ttp:processorProfiles.

TTML documents representing DAPT Scripts MUST NOT specify a ttp:profile attribute on the <tt> element.

5.6.4 ttp:processorProfiles

The ttp:processorProfiles attribute is used to declare the processing requirements for a Document Instance.

TTML documents representing DAPT Scripts MAY specify a ttp:processorProfiles attribute on the <tt> element. If present, the ttp:processorProfiles attribute MUST include one value equal to the designator of the DAPT 1.0 Processor Profile. Other values MAY be present to declare additional processing constraints, and MAY include profile designators in proprietary namespaces.

Note

The ttp:processorProfiles attribute can be used to signal that features and extensions in additional profiles need to be supported to process the Document Instance successfully. For example, a local workflow might introduce particular metadata requirements, and signal that the processor needs to support those by using an additional processor profile designator.

Note

If the content author does not need to signal that processor requirements beyond those defined by DAPT apply to the DAPT document, then the ttp:processorProfiles attribute is not expected to be present.

5.6.5 Other TTML2 Profile Vocabulary

[TTML2] specifies a vocabulary and semantics that can be used to define the set of features that a document instance can make use of, or that a processor needs to support, known as a Profile.

Except where specified, it is not a requirement of DAPT that this profile vocabulary is supported by processors; nevertheless such support is permitted.

The majority of this profile vocabulary is used to indicate how a processor can compute the set of features that it needs to support in order to process the Document Instance successfully. The vocabulary is itself defined in terms of TTML2 features. Those profile-related features are listed within E. Profiles as being optional. They MAY be implemented in processors and their associated vocabulary MAY be present in Document Instances.

Note

Unless processor support for these features and vocabulary has been arranged (using an out-of-band protocol), the vocabulary is not expected to be present.

The additional profile-related vocabulary for which processor support is not required (but is permitted) in DAPT is:

5.7 Timing constraints

Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:

5.7.1 ttp:timeBase

The only permitted ttp:timeBase attribute value is media, since E. Profiles prohibits all timeBase features other than #timeBase-media.

This means that the beginning of the document timeline, i.e. time "zero", is the beginning of the Related Media Object.

5.7.2 timeContainer

The only permitted value of the timeContainer attribute is the default value, par.

Documents SHOULD omit the timeContainer attribute on all elements.

Documents MUST NOT set the timeContainer attribute to any value other than par on any element.

Note

This means that the begin attribute value for every timed element is relative to the computed begin time of its parent element, or for the <body> element, to time zero.

5.7.3 ttp:frameRate

If the document contains any time expression that uses the f metric, or any time expression that contains a frames component, the ttp:frameRate attribute MUST be present on the <tt> element.

5.7.4 ttp:tickRate

If the document contains any time expression that uses the t metric, the ttp:tickRate attribute MUST be present on the <tt> element.
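As a sketch, a document whose time expressions use the f and t metrics declares both attributes on the <tt> element; the rates and times shown are illustrative:

```xml
<tt xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:frameRate="30" ttp:frameRateMultiplier="1000 1001"
    ttp:tickRate="10000">
  ...
  <div begin="9663f" end="9682f">...</div>
  <div begin="96630t" end="96820t">...</div>
  ...
</tt>
```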

5.7.5 Time expressions

All time expressions within a document SHOULD use the same syntax, either clock-time or offset-time as defined in [TTML2], with DAPT constraints applied.

Note

A DAPT clock-time has one of the forms:

  • hh:mm:ss.sss
  • hh:mm:ss

where hh is hours, mm is minutes, ss is seconds, and ss.sss is seconds with a decimal fraction of seconds (any precision).

Note

Clock time expressions that use frame components, which look similar to "time code", are prohibited due to the semantic confusion that has been observed elsewhere when they are used, particularly with non-integer frame rates, "drop modes" and sub-frame rates.

Note

An offset-time has one of the forms:

  • nn metric
  • nn.nn metric

where nn is an integer, nn.nn is a number with a decimal fraction (any precision), and metric is one of:

  • h for hours,
  • m for minutes,
  • s for seconds,
  • ms for milliseconds,
  • f for frames, and
  • t for ticks.
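For example, the following two elements specify the same time interval, first with clock-time and then with offset-time expressions:

```xml
<div begin="00:00:05.1" end="00:00:08">...</div>
<div begin="5.1s" end="8s">...</div>
```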

When mapping a media time expression M to a frame F of the video, e.g. for the purpose of accurately timing lip synchronization, the content processor SHOULD map M to the frame F with the presentation time that is the closest to, but not less than, M.

For example, a media time expression of 00:00:05.1 corresponds to frame ceiling( 5.1 × ( 1000 / 1001 × 30 ) ) = 153 of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.
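This mapping amounts to a ceiling operation. The sketch below uses exact rational arithmetic to avoid floating-point surprises with non-integer frame rates; `media_time_to_frame` is a hypothetical helper, not part of DAPT:

```python
from fractions import Fraction
from math import ceil

def media_time_to_frame(seconds: Fraction, frame_rate: Fraction) -> int:
    """Map a media time to the frame whose presentation time is the
    closest one not less than that time (frame N presents at N/frame_rate)."""
    return ceil(seconds * frame_rate)
```

With `seconds = Fraction(51, 10)` and `frame_rate = Fraction(30000, 1001)` this reproduces the frame number in the worked example above.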

5.8 Layout and styles

This specification does not put additional constraints on the layout and rendering features defined in [TTML-IMSC1.2].

Note
Layout of the paragraphs may rely on the default TTML region (i.e. when no <layout> element is present in the <head> element) or may be made explicit by using the region attribute to refer to a <region> element present at /tt/head/layout/region.

Style references or inline styles MAY be used, in any combination of style attributes, <style> elements and inline style attributes, as defined in [TTML2] or [TTML-IMSC1.2].

5.9 Bidirectional text

The following metadata elements are permitted in DAPT and are specified in [TTML2] as containing #PCDATA, i.e. text data only with no element content. Where bidirectional text is required within the character content of such an element, Unicode control characters can be used to define the base direction within arbitrary ranges of text.

Note

More guidance about usage of this mechanism is available at Inline markup and bidirectional text in HTML.

The <p> and <span> content elements permit the direction of text to be specified using the tts:direction and tts:unicodeBidi attributes. Document authors should use this more robust mechanism rather than using Unicode control characters.

Note

The following example taken from [TTML2] demonstrates the syntax for bidirectional text markup within the <p> and <span> elements.

<p>
The title of the book is
"<span tts:unicodeBidi="embed" tts:direction="rtl">نشاط التدويل، W3C</span>"
</p>

An example rendering of the above fragment is shown below.

[Image: example rendition of the direction example, showing, from left to right, "The title of the book is" followed by the applicable Arabic text laid out from right to left.]

6. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

[TTML2] specifies a formal language for expressing document and processor requirements, within the Profiling sub-system. The normative requirements of this specification are defined using the conformance terminology described above, and are also defined using this TTML2 profile mechanism. Where TTML2 vocabulary is referenced, the syntactic and semantic requirements relating to that vocabulary as defined in [TTML2] apply.

Whilst there is no requirement for a DAPT processor to implement the TTML2 profile processing semantics in general, implementers can use the TTML2 profiles defined in E. Profiles as a means of verifying that their implementations meet the normative requirements of DAPT, for example as a checklist.

Conversely, a general purpose [TTML2] processor that does support the TTML2 profile processing semantics can use the TTML2 profiles defined in E. Profiles directly to determine if it is capable of processing a DAPT document.

6.1 Conformance of DAPT Documents

Conformant DAPT Documents are [TTML2] Document Instances that conform to the normative provisions of this specification. Those provisions are expressed using the profile vocabulary of [TTML2] in the content profile defined in E. Profiles.

6.2 Conformance of DAPT Processors

Conformant DAPT Processors are [TTML2] content processors that conform to the normative provisions of this specification. Those provisions are expressed using the profile vocabulary of [TTML2] in the processor profile defined in E. Profiles.

A. Index

A.1 Terms defined by this specification

A.2 Terms defined by reference

B. Privacy Considerations

This section is non-normative.

With the exception of the following, the privacy considerations of [TTML2] apply:

B.1 Personal Information

DAPT documents typically contain the names of characters or people, either fictional or real, who feature within the associated media. In general this information would be present within the media itself or be public via other routes. If there is sensitivity associated with those identities being known to people with access to the DAPT documents that contain them, then such access should be managed with appropriate confidentiality. For example, those documents could be kept within a closed authoring environment and edited to remove the sensitive information prior to distribution to a wider audience. If this scenario arises, information security good practices should be applied within the closed environment, such as encryption of the document at rest and when being moved, access control via authentication platforms, and so on.

B.2 Audio format preference

This feature may contribute to browser fingerprinting. DAPT documents can reference a set of alternative external audio resources for the same fragment of audio, where the processor is expected to select one of the alternatives based on features such as format support. If this pattern is used, the processor's choice of audio resource, being exposed to the origin, may reveal information about that processor, such as its preferred audio format.

C. Security Considerations

This section is non-normative.

The security considerations of [TTML2] apply.

D. Audio Mixing

This section is non-normative.

Applying the Mixing Instructions can be implemented using [webaudio]. Figure 2 shows the flow of programme audio when audio-generating elements are active: the pan and gain (if set) on the Script Event are applied first; the output is passed to the Text, which mixes in the audio from any active Audio Recording, itself subject to its own Mixing Instructions; the Text's Mixing Instructions are then applied to the result; and finally the output is mixed onto the master bus.

[Diagram: programme audio routed through the pan and gain stages of the active Script Event, active Audio Recording and active Text, with separate paths for when audio mixing is and is not active.]
Figure 2 Example simple audio routing between objects

This example is shown as [webaudio] nodes in Figure 3.

[Diagram: GainNode and PanNode pairs for the ScriptEvent, AudioRecording and Text; Programme Audio and the AudioRecording Source Audio feed in, and an implicit mixer combines the results onto the Master bus to produce the Output Audio.]
Figure 3 Web audio nodes representing the audio processing needed.
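As an illustration only, the gain and pan stages in these graphs could be modelled on raw samples as follows. The equal-power pan law used here matches the default behaviour of the Web Audio `StereoPannerNode`, but DAPT itself defers the exact semantics to [TTML2] and [webaudio]; the function name and signature are invented for this sketch:

```python
import math

def apply_mixing(samples, gain=1.0, pan=0.0):
    """Apply a linear gain and a stereo pan in [-1, 1] to mono samples,
    returning (left, right) sample pairs, using an equal-power pan law."""
    angle = (pan + 1) * math.pi / 4  # -1 -> 0 (full left), +1 -> pi/2 (full right)
    left_g = math.cos(angle) * gain
    right_g = math.sin(angle) * gain
    return [(s * left_g, s * right_g) for s in samples]
```

Chaining such stages in the order Audio Recording, then Text, then Script Event mirrors the routing shown in Figure 2.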

The above examples are simplified in at least two ways:

E. Profiles

This section defines a [TTML2] content profile and a processor profile by expressing dispositions against a set of features and extensions. The DAPT extensions are defined in F. Extensions.

The Profile Semantics specified in [TTML2] apply.

A TTML Profile specification is a document that lists all the features of TTML that are required / optional / prohibited within “document instances” (files) and “processors” (things that process the files), and any extensions or constraints.

A Document Instance that conforms to the content profile defined herein:

Note

A Document Instance, by definition, satisfies the requirements of Section 3.1 of [TTML2], and hence a Document Instance that conforms to a profile defined herein is also a conforming TTML2 Document Instance.

A Presentation processor that conforms to the processor profile defined in this specification:

A Transformation processor that conforms to the processor profile defined in this specification:

The dispositions required, permitted, optional and prohibited as used in this specification map to the [TTML2] <ttp:feature> and <ttp:extension> elements' value attribute values as follows:

DAPT disposition    <ttp:feature> or <ttp:extension> element value attribute value in
                    content profile     processor profile
required            required            required
permitted           optional            required
optional            optional            optional
prohibited          prohibited          optional
Note

The use of the terms presentation processor and transformation processor within this document does not imply conformance per se to any of the Standard Profiles defined in [TTML2]. In other words, it is not considered an error for a presentation processor or transformation processor to conform to the profile defined in this document without also conforming to the TTML2 Presentation Profile or the TTML2 Transformation Profile.

Note

The use of the [TTML2] profiling sub-system to describe DAPT conformance within this specification is not intended to imply that DAPT processors are required to support any features of that system other than those for which support is explicitly required by DAPT.

Note

This document does not specify presentation processor or transformation processor behavior when processing or transforming a non-conformant Document Instance.

Note

The permitted and prohibited dispositions do not refer to the specification of a <ttp:feature> or <ttp:extension> element as being permitted or prohibited within a <ttp:profile> element.
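The disposition mapping in the table above can be captured as a simple lookup, for example when generating the two profiles from a single feature list; the dictionary name is illustrative, not part of DAPT:

```python
# DAPT disposition -> (content profile value, processor profile value),
# transcribed directly from the disposition table above.
DISPOSITION_TO_PROFILE_VALUE = {
    "required":   ("required",   "required"),
    "permitted":  ("optional",   "required"),
    "optional":   ("optional",   "optional"),
    "prohibited": ("prohibited", "optional"),
}
```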

E.1 Disposition of Features and Extensions

The features and extensions listed in this section express the minimal requirements for DAPT Documents, Presentation Processors, and Transformation Processors. DAPT Documents MAY additionally conform to other profiles, and include syntax not prohibited by the DAPT content profile. Presentation Processors and Transformation Processors MAY support additional syntax and semantics relating to other profiles.

Note

For example, a DAPT Script can include syntax permitted by the IMSC ([TTML-IMSC1.2]) profiles of [TTML2] to enhance the presentation of scripts to actors recording audio, or to add styling important for later usage in subtitle or caption creation.

Editor's note

Editorial task: go through this list of features and check the disposition of each. There should be no prohibited features that are permitted in IMSC.

Feature or Extension Disposition Additional provision
Relative to the TT Feature namespace
#animate-minimal permitted
#animate-fill permitted
#animation-out-of-line prohibited See 4.10 Mixing Instruction.
#audio permitted
#audio-description permitted
#audio-speech permitted
#bidi permitted
#bidi-version-2 permitted
#chunk permitted
#clockMode prohibited
#clockMode-gps prohibited
#clockMode-local prohibited
#clockMode-utc prohibited
#content permitted
#contentProfiles optional See 5.6.2 ttp:contentProfiles and F.3 #contentProfiles-root.
#contentProfiles-combined optional See 5.6.5 Other TTML2 Profile Vocabulary.
#core permitted
#data permitted
#direction permitted
#dropMode prohibited
#dropMode-dropNTSC prohibited
#dropMode-dropPAL prohibited
#dropMode-nonDrop prohibited
#embedded-audio permitted
#embedded-data permitted
#frameRate permitted See 5.7.3 ttp:frameRate.
#frameRateMultiplier permitted
#gain permitted
#markerMode prohibited
#markerMode-continuous prohibited
#markerMode-discontinuous prohibited
#metadata permitted
#metadata-item permitted
#metadata-version-2 permitted
#pan permitted
#permitFeatureNarrowing optional See 5.6.5 Other TTML2 Profile Vocabulary.
#permitFeatureWidening optional See 5.6.5 Other TTML2 Profile Vocabulary.
#pitch permitted
#presentation-audio permitted
#processorProfiles optional See 5.6.4 ttp:processorProfiles.
#processorProfiles-combined optional See 5.6.5 Other TTML2 Profile Vocabulary.
#profile partially permitted See 5.6.3 ttp:profile.
#profile-full-version-2 partially permitted See 5.6.5 Other TTML2 Profile Vocabulary.
#profile-version-2 partially permitted See 5.6.5 Other TTML2 Profile Vocabulary.
#resources permitted
#set permitted
#set-fill permitted
#set-multiple-styles permitted
#source permitted
#speak permitted
#speech permitted
#structure required
#styling permitted
#styling-chained permitted
#styling-inheritance-content permitted
#styling-inline permitted
#styling-referential permitted
#subFrameRate prohibited
#tickRate permitted See 5.7.4 ttp:tickRate.
#time-clock permitted
#time-clock-with-frames prohibited
#time-offset-with-frames permitted See 5.7.3 ttp:frameRate.
#time-offset-with-ticks permitted See 5.7.4 ttp:tickRate.
#time-offset permitted
#time-wall-clock prohibited
#timeBase-clock prohibited
#timeBase-media required See 5.7.1 ttp:timeBase.
NOTE: [TTML1] specifies that the default timebase is "media" if the ttp:timeBase attribute is not specified on the <tt> element.

#timeBase-smpte prohibited
#timeContainer prohibited See 5.7.2 timeContainer.
#timing permitted See 5.7.5 Time expressions.
#transformation permitted See constraints at #profile.
Relative to the DAPT Extension namespace
#agent permitted This is the profile expression of 4.2 Character.
#contentProfiles-root required This is the profile expression of 5.6.2 ttp:contentProfiles.
#onScreen permitted This is the profile expression of 4.6 On Screen.
#profile-root prohibited This is the profile expression of 5.6.3 ttp:profile.
#represents-root required This is the profile expression of 4.1.1 Represents.
#scriptType-root required This is the profile expression of 4.1.3 Script Type.
#serialization required This is the profile expression of 5.1 Document Encoding.
#source-data prohibited This is the profile expression of the prohibition of <source> child elements of <data> elements as specified in 4.9.1 Audio Recording.
#textLanguageSource permitted This is the profile expression of 4.5 Text Language Source as required at 4.4 Text.
#xmlId-div required This is the profile expression of 4.3 Script Event.
#xmlLang-audio-nonMatching prohibited This is the profile expression of the prohibition of the xml:lang attribute on the <audio> element having a different computed value to the parent element and descendant or referenced <source> and <data> elements, as specified in 4.9.1 Audio Recording.
#xmlLang-root required This is the profile expression of 4.1.2 Default Language.

E.2 DAPT Content Profile

The DAPT Content Profile expresses the conformance requirements of DAPT Scripts using the profile mechanism of [TTML2]. It can be used by a validating processor that supports the DAPT Processor Profile to validate a DAPT Document.

There is no requirement to include the DAPT Content Profile within a Document Instance.

<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-content" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
  designator="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
  combine="mostRestrictive"
  type="content">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#structure</feature>
    <feature value="required">#timeBase-media</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animate-fill</feature>
    <feature value="optional">#animate-minimal</feature>
    <feature value="optional">#audio</feature>
    <feature value="optional">#audio-description</feature>
    <feature value="optional">#audio-speech</feature>
    <feature value="optional">#bidi</feature>
    <feature value="optional" extends="#bidi">#bidi-version-2</feature>
    <feature value="optional">#chunk</feature>
    <feature value="optional">#content</feature>
    <feature value="optional">#contentProfiles</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#core</feature>
    <feature value="optional">#data</feature>
    <feature value="optional">#direction</feature>
    <feature value="optional">#embedded-audio</feature>
    <feature value="optional">#embedded-data</feature>
    <feature value="optional">#frameRate</feature>
    <feature value="optional">#frameRateMultiplier</feature>
    <feature value="optional">#gain</feature>
    <feature value="optional">#metadata</feature>
    <feature value="optional">#metadata-item</feature>
    <feature value="optional" extends="#metadata">#metadata-version-2</feature>
    <feature value="optional">#pan</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#pitch</feature>
    <feature value="optional">#presentation-audio</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#resources</feature>
    <feature value="optional" extends="#animation">#set</feature>
    <feature value="optional">#set-fill</feature>
    <feature value="optional">#set-multiple-styles</feature>
    <feature value="optional">#source</feature>
    <feature value="optional">#speak</feature>
    <feature value="optional">#speech</feature>
    <feature value="optional">#styling</feature>
    <feature value="optional">#styling-chained</feature>
    <feature value="optional">#styling-inheritance-content</feature>
    <feature value="optional">#styling-inline</feature>
    <feature value="optional">#styling-referential</feature>
    <feature value="optional">#tickRate</feature>
    <feature value="optional">#time-clock</feature>
    <feature value="optional">#time-offset</feature>
    <feature value="optional">#time-offset-with-frames</feature>
    <feature value="optional">#time-offset-with-ticks</feature>
    <feature value="optional">#timing</feature>
    <feature value="optional">#unicodeBidi</feature>
    <feature value="optional">#unicodeBidi-isolate</feature>
    <feature value="optional" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="optional">#xlink</feature>
    <!-- prohibited feature support -->
    <feature value="prohibited">#animation-out-of-line</feature>
    <feature value="prohibited">#clockMode</feature>
    <feature value="prohibited">#clockMode-gps</feature>
    <feature value="prohibited">#clockMode-local</feature>
    <feature value="prohibited">#clockMode-utc</feature>
    <feature value="prohibited">#dropMode</feature>
    <feature value="prohibited">#dropMode-dropNTSC</feature>
    <feature value="prohibited">#dropMode-dropPAL</feature>
    <feature value="prohibited">#dropMode-nonDrop</feature>
    <feature value="prohibited">#markerMode</feature>
    <feature value="prohibited">#markerMode-continuous</feature>
    <feature value="prohibited">#markerMode-discontinuous</feature>
    <feature value="prohibited">#subFrameRate</feature>
    <feature value="prohibited">#time-clock-with-frames</feature>
    <feature value="prohibited">#time-wall-clock</feature>
    <feature value="prohibited">#timeBase-clock</feature>
    <feature value="prohibited">#timeBase-smpte</feature>
    <feature value="prohibited">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#represents-root</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#agent</extension>
    <extension value="optional">#onScreen</extension>
    <extension value="optional">#textLanguageSource</extension>
    <!-- prohibited extension support -->
    <extension value="prohibited">#profile-root</extension>
    <extension value="prohibited">#source-data</extension>
    <extension value="prohibited">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>

E.3 DAPT Processor Profile

The DAPT Processor Profile expresses the processing requirements of DAPT Scripts using the profile mechanism of [TTML2]. A processor that supports the required features and extensions of the DAPT Processor Profile can, minimally, process all permitted features within a DAPT Document.

There is no requirement to include the DAPT Processor Profile within a Document Instance.

<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-processor" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
  designator="http://www.w3.org/ns/ttml/profile/dapt1.0/processor"
  combine="mostRestrictive"
  type="processor">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#animate-fill</feature>
    <feature value="required">#animate-minimal</feature>
    <feature value="required">#audio</feature>
    <feature value="required">#audio-description</feature>
    <feature value="required">#audio-speech</feature>
    <feature value="required">#bidi</feature>
    <feature value="required" extends="#bidi">#bidi-version-2</feature>
    <feature value="required">#chunk</feature>
    <feature value="required">#content</feature>
    <feature value="required">#contentProfiles</feature>
    <feature value="required">#core</feature>
    <feature value="required">#data</feature>
    <feature value="required">#direction</feature>
    <feature value="required">#embedded-audio</feature>
    <feature value="required">#embedded-data</feature>
    <feature value="required">#frameRate</feature>
    <feature value="required">#frameRateMultiplier</feature>
    <feature value="required">#gain</feature>
    <feature value="required">#metadata</feature>
    <feature value="required">#metadata-item</feature>
    <feature value="required" extends="#metadata">#metadata-version-2</feature>
    <feature value="required">#pan</feature>
    <feature value="required">#pitch</feature>
    <feature value="required">#presentation-audio</feature>
    <feature value="required">#resources</feature>
    <feature value="required" extends="#animation">#set</feature>
    <feature value="required">#set-fill</feature>
    <feature value="required">#set-multiple-styles</feature>
    <feature value="required">#source</feature>
    <feature value="required">#speak</feature>
    <feature value="required">#speech</feature>
    <feature value="required">#structure</feature>
    <feature value="required">#styling</feature>
    <feature value="required">#styling-chained</feature>
    <feature value="required">#styling-inheritance-content</feature>
    <feature value="required">#styling-inline</feature>
    <feature value="required">#styling-referential</feature>
    <feature value="required">#tickRate</feature>
    <feature value="required">#time-clock</feature>
    <feature value="required">#time-offset</feature>
    <feature value="required">#time-offset-with-frames</feature>
    <feature value="required">#time-offset-with-ticks</feature>
    <feature value="required">#timeBase-media</feature>
    <feature value="required">#timing</feature>
    <feature value="required">#transformation</feature>
    <feature value="required">#unicodeBidi</feature>
    <feature value="required">#unicodeBidi-isolate</feature>
    <feature value="required" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="required">#xlink</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animation-out-of-line</feature>
    <feature value="optional">#clockMode</feature>
    <feature value="optional">#clockMode-gps</feature>
    <feature value="optional">#clockMode-local</feature>
    <feature value="optional">#clockMode-utc</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#dropMode</feature>
    <feature value="optional">#dropMode-dropNTSC</feature>
    <feature value="optional">#dropMode-dropPAL</feature>
    <feature value="optional">#dropMode-nonDrop</feature>
    <feature value="optional">#markerMode</feature>
    <feature value="optional">#markerMode-continuous</feature>
    <feature value="optional">#markerMode-discontinuous</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#subFrameRate</feature>
    <feature value="optional">#time-clock-with-frames</feature>
    <feature value="optional">#time-wall-clock</feature>
    <feature value="optional">#timeBase-clock</feature>
    <feature value="optional">#timeBase-smpte</feature>
    <feature value="optional">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#agent</extension>
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#onScreen</extension>
    <extension value="required">#represents-root</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#textLanguageSource</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#profile-root</extension>
    <extension value="optional">#source-data</extension>
    <extension value="optional">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>

F. Extensions

F.1 General

The following sections define extension designations, expressed as relative URIs (fragment identifiers) relative to the DAPT Extension Namespace base URI. These extension designations are used in E. Profiles to describe the normative provisions of DAPT that are not expressed by [TTML2] profile features.

F.2 #agent

A transformation processor supports the #agent extension if it recognizes and is capable of transforming values of the following elements and attributes on the <ttm:agent> element:

and if it recognizes and is capable of transforming each of the following value combinations:

A presentation processor supports the #agent extension if it implements presentation semantic support of the above listed elements, attributes and value combinations.

F.3 #contentProfiles-root

A transformation processor supports the #contentProfiles-root extension if it recognizes and is capable of transforming values of the ttp:contentProfiles attribute on the <tt> element.

A presentation processor supports the #contentProfiles-root extension if it implements presentation semantic support of the ttp:contentProfiles attribute on the <tt> element.

F.4 #onScreen

A transformation processor supports the #onScreen extension if it recognizes and is capable of transforming values of the daptm:onScreen attribute on the <div> element.

A presentation processor supports the #onScreen extension if it implements presentation semantic support of the daptm:onScreen attribute on the <div> element.

F.5 #profile-root

A transformation processor supports the #profile-root extension if it recognizes and is capable of transforming values of the ttp:profile attribute on the <tt> element.

A presentation processor supports the #profile-root extension if it implements presentation semantic support of the ttp:profile attribute on the <tt> element.

F.6 #represents-root

A transformation processor supports the #represents-root extension if it recognizes and is capable of transforming values of the daptm:represents attribute on the <tt> element.

A presentation processor supports the #represents-root extension if it implements presentation semantic support of the daptm:represents attribute on the <tt> element.

An example of a transformation processor that supports this extension is a validating processor that reports an error if the extension is required by a content profile but the Document Instance claiming conformance to that profile either does not have a daptm:represents attribute on the <tt> element or has one whose value is not conformant with the requirements defined herein.

F.7 #scriptType-root

A transformation processor supports the #scriptType-root extension if it recognizes and is capable of transforming values of the daptm:scriptType attribute on the <tt> element.

A presentation processor supports the #scriptType-root extension if it implements presentation semantic support of the daptm:scriptType attribute on the <tt> element.

An example of a transformation processor that supports this extension is a validating processor that provides appropriate feedback, for example warnings, when the SHOULD requirements defined in 4.1.3 Script Type for a DAPT Document's daptm:scriptType are not met, and that reports an error if the extension is required by a content profile but the Document Instance claiming conformance to that profile either does not have a daptm:scriptType attribute on the <tt> element or has one whose value is not defined herein.

F.8 #serialization

A serialized document that is valid with respect to the #serialization extension is an XML 1.0 [xml] document, encoded using the UTF-8 character encoding as specified in [UNICODE], that contains no entity declarations and no entity references other than references to predefined entities.

A transformation processor or a presentation processor supports the #serialization extension if it can read a serialized document as defined above.

A transformation processor that writes documents supports the #serialization extension if it can write a serialized document as defined above.
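A reading implementation might approximate the #serialization checks as follows. This is a sketch under the stated constraints (UTF-8, no entity declarations, only predefined entity references), not a complete XML 1.0 validator, and the function name is invented:

```python
import re
import xml.etree.ElementTree as ET

PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

def check_serialization(data: bytes) -> bool:
    """Rough check that data is a UTF-8 XML document with no entity
    declarations and only predefined entity references."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if "<!ENTITY" in text:  # entity declarations are not allowed
        return False
    # every named entity reference must be one of the five predefined ones
    for name in re.findall(r"&([A-Za-z_][\w.-]*);", text):
        if name not in PREDEFINED:
            return False
    try:
        ET.fromstring(text)  # must parse as XML
    except ET.ParseError:
        return False
    return True
```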

F.9 #source-data

A transformation processor supports the #source-data extension if it recognizes and is capable of transforming values of the <source> element child of a <data> element.

A presentation processor supports the #source-data extension if it implements presentation semantic support of the <source> element child of a <data> element.

F.10 #textLanguageSource

A transformation processor supports the #textLanguageSource extension if it recognizes and is capable of transforming values of the daptm:langSrc attribute.

A presentation processor supports the #textLanguageSource extension if it implements presentation semantic support of the daptm:langSrc attribute.

F.11 #xmlId-div

A transformation processor supports the #xmlId-div extension if it recognizes and is capable of transforming values of the xml:id attribute on the <div> element.

A presentation processor supports the #xmlId-div extension if it implements presentation semantic support of the xml:id attribute on the <div> element.

F.12 #xmlLang-audio-nonMatching

A transformation processor supports the #xmlLang-audio-nonMatching extension if it recognizes and is capable of transforming values of the xml:lang attribute on the <audio> element that differ from the computed value of the same attribute of its parent element or any of its descendant or referenced <source> or <data> elements, known as non-matching values.

A presentation processor supports the #xmlLang-audio-nonMatching extension if it implements presentation semantic support of such non-matching xml:lang attribute values.
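The in-document part of this prohibition can be checked by computing inherited xml:lang values. The following sketch (a hypothetical helper that does not follow referenced resources) flags <audio> elements whose computed xml:lang differs from their parent's:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TTML_NS = "{http://www.w3.org/ns/ttml}"

def non_matching_audio_langs(root):
    """Return <audio> elements whose computed xml:lang differs from the
    computed value of their parent element. Referenced <source>/<data>
    resources are not followed in this sketch."""
    issues = []
    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)  # computed value via inheritance
        if elem.tag == TTML_NS + "audio" and lang != inherited:
            issues.append(elem)
        for child in elem:
            walk(child, lang)
    walk(root, root.get(XML_LANG, ""))
    return issues
```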

F.13 #xmlLang-root

A transformation processor supports the #xmlLang-root extension if it recognizes and is capable of transforming values of the xml:lang attribute on the <tt> element and the additional semantics specified in 4.1.2 Default Language.

A presentation processor supports the #xmlLang-root extension if it implements presentation semantic support of the xml:lang attribute on the <tt> element and the additional semantics specified in 4.1.2 Default Language.

G. Registry Section

G.1 Registry Definition

This section specifies the registry definition, consisting of the custodianship, change process and the core requirements of the registry tables defined in this document.

G.1.1 Custodianship

The custodian of this W3C Registry is the Timed Text Working Group (TTWG). If the TTWG is unable to fulfil the role of custodian, for example if it has been closed, the custodian in lieu is the W3C Team.

G.1.2 Change Process

G.1.2.1 Requesting a change

Changes to this W3C Registry MUST be requested (the change request) using any one of the following options:

The change request MUST include enough information for the custodian to be able to identify all of:

The proposer of the change MAY open a pull request (or equivalent) on the version control system, with that pull request containing the proposed changes. If a pull request is opened then a corresponding issue MUST also be opened and the pull request MUST be linked to that issue.

G.1.2.2 Change request assessment process

The process for assessing a change request depends on the custodian.

G.1.2.2.1 Custodian is TTWG

If the custodian is the TTWG:

  • If the change proposer did not open a pull request on the version control system, then assessment is paused until a TTWG member has opened such a pull request, which MUST represent the requested changes and MUST be linked to a related issue.
  • The TTWG follows its Decision Policy to review the proposal in the pull request.
  • At the end of the Decision Review Period if a TTWG Chair declares that there is consensus to approve, the change request is approved.
  • In the absence of consensus to approve, the expectation is that a discussion will happen, involving the change requester. The result of this discussion can be any one of:
    1. the change request is abandoned;
    2. the change request is modified for another review;
    3. if the discussion resolves the objections, and a TTWG Chair declares consensus to approve, the change request can be approved.

An approved change request is enacted by merging its related pull request into the version control system and publishing the updated version of this document.

G.1.2.2.2 Custodian is the W3C Team

If the custodian is the W3C Team, the Team MUST seek wide review of the change request and offer a review period of at least 4 weeks, before assessing, from the responses received, whether there is consensus amongst the respondents.

The Team MAY require a pull request on the version control system to be opened as the basis of the review.

If there is such consensus, the Team MUST make the proposed changes.

G.1.3 Registry Table Constraints

This section defines constraints on the registry tables defined in this document. Each registry table consists of a set of registry entries. Each registry table has an associated registry table definition in G.2 Registry Table Definitions, which lists the fields present in each registry entry.

G.1.3.1 Registry Entries

Each registry entry has a status, a unique key, and if appropriate, other fields, for example any notes, a description, or a reference to some other defining entity.

The registry table definition MUST define the fields and the key to be used in each registry entry.

G.1.3.1.1 Status

The registry entry status field reflects the maturity of that entry. Permitted values are:

Provisional
Final
Deprecated

No other values are permitted.

G.1.3.1.1.1 Provisional

Registry entries with a status of Provisional MAY be changed or deleted. Their status MAY be changed to Final or Deprecated.

Registry entry keys in Provisional entries that were later deleted MAY be reused.

Newly created registry entries SHOULD have status Provisional.

G.1.3.1.1.2 Final

Registry entries with a status of Final MUST NOT be deleted or changed. Their status MAY be changed to Deprecated.

Registry entry keys in Final entries MUST NOT be reused.

Newly created registry entries MAY have status Final.

G.1.3.1.1.3 Deprecated

Registry entries with a status of Deprecated MUST NOT be deleted or changed. Their status MAY be changed to Final unless that would result in a duplicate key within the set of entries whose status is either Provisional or Final.

Registry entry keys in Deprecated entries that were previously Provisional and never Final MAY be reused.

Registry entry keys in Deprecated entries that were previously Final MUST NOT be reused.

Newly created registry entries MUST NOT have status Deprecated.

G.2 Registry Table Definitions

This section defines registry tables and locates their registry entries.

G.2.1 daptm:descType registry table definition

The registry table for daptm:descType defines a set of values that can be used in the daptm:descType attribute.

The key is the "daptm:descType" field. The "description" field describes the intended purpose of each value.

The registry entries for this registry table are located in 4.7 Script Event Description.

G.2.2 daptm:eventType registry table definition

The registry table for daptm:eventType defines a set of values that can be used in the daptm:eventType attribute.

The key is the "daptm:eventType" field. The "description" field describes the intended purpose of each value.

The registry entries for this registry table are located in 4.8 Script Event Type.

G.2.3 <content-descriptor> registry table definition

The registry table for <content-descriptor> defines a set of values that can be used in the daptm:represents attribute.

The key is the "<content-descriptor>" field. The "Description" field describes the type of media content represented by each value. The "Used in" field describes the type of script in which the described content types are commonly found.

The registry entries for this registry table are located in 4.1.1 Represents.

H. Acknowledgments

The editors would like to thank XXX for their contributions to this specification.

I. References

I.1 Normative references

[BCP47]
Tags for Identifying Languages. A. Phillips, Ed.; M. Davis, Ed.. IETF. September 2009. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc5646
[EBU-R37]
EBU Recommendation R37-2007. The relative timing of the sound and vision components of a television signal. EBU/UER. February 2007. URL: https://tech.ebu.ch/publications/r037
[MIME-TYPES]
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. N. Freed; N. Borenstein. IETF. November 1996. Draft Standard. URL: https://www.rfc-editor.org/rfc/rfc2046
[namespaceState]
The Disposition of Names in an XML Namespace. Norman Walsh. W3C. 29 March 2006. W3C Working Draft. URL: https://www.w3.org/TR/namespaceState/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[TTML-IMSC1.2]
TTML Profiles for Internet Media Subtitles and Captions 1.2. Pierre-Anthony Lemieux. W3C. 4 August 2020. W3C Recommendation. URL: https://www.w3.org/TR/ttml-imsc1.2/
[TTML1]
Timed Text Markup Language 1 (TTML1) (Third Edition). Glenn Adams; Pierre-Anthony Lemieux. W3C. 8 November 2018. W3C Recommendation. URL: https://www.w3.org/TR/ttml1/
[TTML2]
Timed Text Markup Language 2 (TTML2) (2nd Edition). Glenn Adams; Cyril Concolato. W3C. 9 March 2021. W3C Candidate Recommendation. URL: https://www.w3.org/TR/ttml2/
[UNICODE]
The Unicode Standard. Unicode Consortium. URL: https://www.unicode.org/versions/latest/
[w3c-process]
W3C Process Document. Elika J. Etemad (fantasai); Florian Rivoal. W3C. 2 November 2021. URL: https://www.w3.org/Consortium/Process/
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau et al. W3C. 26 November 2008. W3C Recommendation. URL: https://www.w3.org/TR/xml/
[xml-names]
Namespaces in XML 1.0 (Third Edition). Tim Bray; Dave Hollander; Andrew Layman; Richard Tobin; Henry Thompson et al. W3C. 8 December 2009. W3C Recommendation. URL: https://www.w3.org/TR/xml-names/
[XPath]
XML Path Language (XPath) Version 1.0. James Clark; Steven DeRose. W3C. 16 November 1999. W3C Recommendation. URL: https://www.w3.org/TR/xpath-10/

I.2 Informative references

[BBC-WHP051]
BBC R&D White Paper WHP 051. Audio Description: what it is and how it works. N.E. Tanton, T. Ware and M. Armstrong. October 2002 (revised July 2004). URL: http://www.bbc.co.uk/rd/publications/whitepaper051
[DAPT-REQS]
DAPT Requirements. Cyril Concolato; Nigel Megitt. W3C. 12 October 2022. W3C Working Group Note. URL: https://www.w3.org/TR/dapt-reqs/
[EBU-TT-3390]
EBU-TT Part M, Metadata Definitions. EBU/UER. May 2017. URL: https://tech.ebu.ch/publications/tech3390
[I18N-INLINE-BIDI]
Inline markup and bidirectional text in HTML. W3C. 2021-06-25. URL: https://www.w3.org/International/articles/inline-bidi-markup
[media-accessibility-reqs]
Media Accessibility User Requirements. Shane McCarron; Michael Cooper; Mark Sadecki. W3C. 3 December 2015. W3C Working Group Note. URL: https://www.w3.org/TR/media-accessibility-reqs/
[SSML]
Speech Synthesis Markup Language (SSML) Version 1.1. Daniel Burnett; Zhi Wei Shuang. W3C. 7 September 2010. W3C Recommendation. URL: https://www.w3.org/TR/speech-synthesis11/
[uml]
OMG Unified Modeling Language. Object Management Group. OMG. 1 March 2015. Normative. URL: http://www.omg.org/spec/UML/
[WCAG22]
Web Content Accessibility Guidelines (WCAG) 2.2. Michael Cooper; Andrew Kirkpatrick; Alastair Campbell; Rachael Bradley Montgomery; Charles Adams. W3C. 5 October 2023. W3C Recommendation. URL: https://www.w3.org/TR/WCAG22/
[webaudio]
Web Audio API. Paul Adenot; Hongchan Choi. W3C. 17 June 2021. W3C Recommendation. URL: https://www.w3.org/TR/webaudio/