Standardized Timed-text Format

March 21, 2002

Why a Standardized Timed-text Format?

On the Web, there is no standard method for displaying text that is synchronized with other elements, such as video and audio. The three most popular multimedia players (Apple's QuickTime Player, Microsoft's Windows Media Player and RealNetworks' RealPlayer) support only their own proprietary text formats (QText, SAMI and RealText, respectively). As a result, multimedia authors must write synchronized text files in multiple formats if they wish to support more than one player. A standardized timed-text format would eliminate this duplication of work. It would also simplify the creation and distribution of synchronized text for use with a multitude of devices, both software and hardware, such as multimedia players, caption encoders and decoders (EIA-608, EIA-708 and Teletext, for example), character generators, LED displays and other text-display devices.

Common uses for a standardized timed-text format include the following:

  1. closed captions and subtitles (on the Web, on television and in movie theaters)
  2. karaoke
  3. credit rolls
  4. ticker-tape displays (or crawls)
  5. text overlay
  6. hyperlinks and other interactivity

Standardized Timed-text Format Requirements


A timed-text format must or should...

  1. Be simple to author and easy to learn.
  2. Have a valid XML representation.
  3. Be streamable.
  4. Be cross-platform.
  5. Allow extensibility.
  6. Support streaming real-time captions. Users should be able to tune in to the text presentation at any time after it has begun.
  7. Allow for parallel languages in different documents or within the same document (e.g., via the <switch> element).
  8. Allow the language of the text to be identified using xml:lang.
  9. Support mixed-language text.
  10. Be usable in all character sets.
  11. Have a default Unicode font.
  12. Allow clean integration with sign-language captions.
  13. Allow hyperlinks via the HTML "a" element, XHTML or another flexible mechanism.
  14. Be searchable.
  15. Identify the version of the timed-text format in each timed-text file and live stream.
  16. Use markup to clearly distinguish one speaker from another. This could be accomplished by a) using simple placement commands (<center>, <left>, <right>, etc.); or b) creating a persona for the text spoken by each speaker, using a speaker="IDREF" attribute.
  17. Allow the creation of collated transcripts which contain, and differentiate via markup, captions and audio descriptions.
  18. Allow motion through the use of the SMIL animate element or another method.
  19. Use SVG, MathML, XHTML or another language for complex font displays (such as math equations).
  20. Allow the user to navigate through discrete timed media via SMIL interaction constructs.
  21. Allow for long-form presentation (e.g., it should support captions or subtitles for full-length movies or other long presentations).
  22. Adopt SMIL 2.0 as a base language. Also consider using other W3C Recommendations as a base, including (but not limited to) XML 1.0, CSS2 and SVG 1.1.
  23. Be no less functional than EIA-708 and other appropriate international standards.
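
As an illustration of items 2, 8 and 16 above, the following Python sketch parses a small XML fragment with the standard library. The element and attribute names (timedtext, persona, caption, speaker, begin, end) are hypothetical and chosen only for this example; they are not taken from any specification. Only xml:lang is standard XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: timedtext, persona, caption, speaker, begin and end
# are illustrative assumptions, not names defined by any specification.
SAMPLE = """\
<timedtext xml:lang="en">
  <persona id="anna" name="Anna"/>
  <persona id="luc" name="Luc"/>
  <caption begin="0.0" end="2.5" speaker="anna">Hello, Luc.</caption>
  <caption begin="2.5" end="5.0" speaker="luc" xml:lang="fr">Bonjour, Anna.</caption>
</timedtext>
"""

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_captions(xml_text):
    """Return one dict per caption with speaker name, language, times and text."""
    root = ET.fromstring(xml_text)
    default_lang = root.get(XML_LANG)
    # Resolve speaker="IDREF" values against the persona declarations.
    names = {p.get("id"): p.get("name") for p in root.iter("persona")}
    captions = []
    for cap in root.iter("caption"):
        captions.append({
            "speaker": names.get(cap.get("speaker")),
            # A caption without its own xml:lang inherits the root language.
            "lang": cap.get(XML_LANG, default_lang),
            "begin": float(cap.get("begin")),
            "end": float(cap.get("end")),
            "text": cap.text or "",
        })
    return captions


captions = read_captions(SAMPLE)
```

Resolving each speaker reference to a persona keeps speaker identity separate from on-screen placement, which is the second of the two approaches named in item 16.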


A timed-text format must or should...

  1. Provide a means of giving richness or style to text.
  2. Support the display of bi-directional characters.
  3. Allow ruby markup.
  4. Allow text in different languages to be appropriately styled.
  5. Permit transparent overlay.
  6. Permit text highlighting.
  7. Allow for different display options (pop-on, roll-up, paint-on, crawl, etc.).
  8. Support unique symbols, such as the musical note or the generic closed-caption symbol ("CC" in a box).
  9. Permit user override of display.
  10. Permit unlimited positioning of text.
  11. Be able to display multiple captions simultaneously (for example, when more than one person is speaking at once).
  12. Allow other ways to display text; for example, via text balloons.


A timed-text format must or should...

  1. Allow text to appear and disappear over time.
  2. Permit the display of no text; that is, allow text to be erased when it is not needed.
  3. Keep text and timing information together.
  4. Define text and timing markup in two separate modules in the specification.
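
Items 1 to 3 above can be pictured with a minimal Python sketch. It assumes a hypothetical in-memory model in which each piece of text carries its own begin and end time in seconds; the Cue type and the active_text function are names invented for this example, not part of any proposed format.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    """Hypothetical cue: text kept together with its timing."""
    begin: float  # seconds from the start of the presentation
    end: float    # seconds; the text is erased at this time
    text: str


def active_text(cues: List[Cue], t: float) -> List[str]:
    """Return the text of every cue on screen at time t (begin <= t < end)."""
    return [c.text for c in cues if c.begin <= t < c.end]


CUES = [
    Cue(0.0, 2.0, "Hello."),
    Cue(1.0, 3.0, "Hi there."),  # overlaps the first cue
    Cue(5.0, 7.0, "Goodbye."),
]

overlap = active_text(CUES, 1.5)   # two cues are visible at once
gap = active_text(CUES, 4.0)       # nothing is visible: the display is erased
late = active_text(CUES, 6.0)      # a viewer joining late still gets the current cue
```

Because the lookup depends only on the current time, a viewer who tunes in partway through a stream sees the correct text without having received earlier cues, and overlapping cues cover the case of two people speaking at once.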


A timed-text format should not include...

  1. A "font" element. (Use SVG or other technologies instead.)

Please send corrections and additions to the timed-text public list at public-tt@w3.org.