This group will work on text tracks for video on the Web, applied to captioning, subtitling and other purposes. This group plans to work initially on:
1) Documenting a semantic model underlying the caption formats in use, notably TTML, CEA 608/708, EBU STL, and WebVTT.
2) Creating a community specification for WebVTT.
3) Defining the mappings between WebVTT and some selected formats, including at least TTML (W3C/SMPTE), and CEA 608/708.
4) Creating web developer reference and tutorial material, including worked examples.
5) Creating a test suite and/or tools.
A possible transition to REC-track for some of these documents is envisaged, and that possibility will guide the work and procedures.
The group may produce recommendations for work in other groups, such as CSS, HTML5, and TTWG.
The Timed Text WG charter was updated in March 2014. It now includes delivery of a WebVTT 1.0 specification as a W3C Recommendation.
This means that a snapshot of the WebVTT specification has been taken, Final Specification Agreement (FSA) commitments have been made, and the specification has been submitted to the Timed Text WG as input to the standardization process. All features in this snapshot will now be tested across the different implementations, in particular across browsers. Those features that have been implemented interoperably will be taken to Recommendation over the next year.
This Community Group will continue to work on the WebVTT specification to address new features and collaborate with the Timed Text WG on any bugs found with the submitted specification.
As a result of the 105th MPEG meeting (see Press Release), MPEG has concluded its study of the carriage of Timed Text in the ISO Base Media File Format (MP4). The study resulted in draft standards for the carriage of WebVTT and TTML content that have reached Final Draft stage (FDAM 2 for 14496-12/15444-12 and FDIS for 14496-30). They are considered complete and have been submitted to National Bodies for final vote. This post gives an overview of these draft documents.
MP4 basics and timed-text related specifications
An MP4 file is logically made of tracks. An MP4 track is a logical structure organized into samples and sample descriptions. Samples carry information that is valid from a given time and for a given duration. Samples carry data that is continuous (no gap in time between samples) and non-overlapping (the end of a sample is the start of the next sample). This layout has useful properties; in particular, it allows random access into the track. A sample description carries information that is valid for the duration of several samples, typically for the whole track. This is an example of a musica track.
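To make the sample model concrete, here is a minimal Python sketch. It is not the packaging algorithm from the draft standards: the `Sample` class, the `cues_to_samples` and `sample_at` helpers, and the idea of filling gaps with an empty payload are all assumptions made for illustration. It only shows how a continuous, non-overlapping timeline makes random access a simple binary search.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Sample:
    start: float     # seconds from the start of the track
    duration: float  # seconds
    payload: str     # "" stands for an empty sample that fills a gap


def cues_to_samples(cues, track_duration):
    """Turn sorted, non-overlapping (start, end, text) cues into a
    continuous run of samples, inserting empty samples for gaps."""
    samples = []
    cursor = 0.0
    for start, end, text in cues:
        if start > cursor:
            samples.append(Sample(cursor, start - cursor, ""))
        samples.append(Sample(start, end - start, text))
        cursor = end
    if cursor < track_duration:
        samples.append(Sample(cursor, track_duration - cursor, ""))
    return samples


def sample_at(samples, t):
    """Random access: find the single sample that covers time t."""
    if not samples:
        raise ValueError("empty track")
    track_end = samples[-1].start + samples[-1].duration
    if t < samples[0].start or t >= track_end:
        raise ValueError("time outside the track")
    starts = [s.start for s in samples]
    return samples[bisect_right(starts, t) - 1]


samples = cues_to_samples([(1.0, 3.0, "Hello"), (3.0, 4.5, "World")], 6.0)
print(sample_at(samples, 3.2).payload)
```

Because every instant of the track belongs to exactly one sample, a seek never has to scan backwards for cues that started earlier; that is the property the paragraph above refers to.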
The amendment to Part 12 covers the basic syntax and semantics for a set of new text track types for a broad range of timed text formats. In particular, two track types have been defined: the ‘text’ type for track content that results in text rendering only; and the ‘subt’ type for track content that may result in text and graphics rendering.
Part 30 provides specific guidance for two popular timed text format technologies defined by W3C, Timed Text Markup Language (TTML) and Web Video Text Tracks (WebVTT), enabling use of those formats in contexts such as MPEG-DASH or HTML5 Media Source Extensions.
WebVTT is quite a powerful format. It has been developed to transport timed data chunks, including captions, subtitles, video descriptions, chapters, and in fact any chunk of data that cues up with a time segment of a media element.
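As a rough illustration of what "timed data chunks" means in practice, the following Python sketch reads a small WebVTT file into (start, end, payload) tuples. It is a deliberately simplified reader, not the parsing algorithm defined in the WebVTT specification: the helper names `to_seconds` and `parse_webvtt` are made up for this example, cue settings are ignored, and malformed input is not handled gracefully.

```python
import re

# hh:mm:ss.ttt, where the hours part is optional
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(ts):
    m = TIMESTAMP.fullmatch(ts)
    if m is None:
        raise ValueError(f"bad timestamp: {ts!r}")
    hours, minutes, seconds, millis = m.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


def parse_webvtt(text):
    """Return a list of (start_seconds, end_seconds, payload) tuples."""
    blocks = re.split(r"\n{2,}", text.replace("\r\n", "\n").strip())
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # An optional cue identifier line may precede the timing line.
        if "-->" not in lines[0]:
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # not a cue (for example a NOTE block): skip it
        start, _, rest = lines[0].partition("-->")
        end = rest.split()[0]  # anything after the end time is settings
        cues.append((to_seconds(start.strip()), to_seconds(end),
                     "\n".join(lines[1:])))
    return cues


EXAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hello, world.

00:05.500 --> 00:07.250 align:start
Second cue
with two lines
"""

for cue in parse_webvtt(EXAMPLE):
    print(cue)
```

The payload is kept as an opaque string here, which matches the point above: a cue can carry caption text, a chapter title, or any other data tied to a time segment.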
As can be expected, support for WebVTT in media transport formats is taking a bit longer to specify and implement. However, we are already seeing specifications for these formats, even though at the time of writing, as far as we know, there are no implementations of any of these specifications yet.
WebVTT in WebM was the first specification for encapsulating WebVTT in a media container.
WebVTT in Apple’s HTTP Live Streaming was published just recently and details how to use WebVTT for captions with m3u8. While it currently covers plain cues only and no CSS styling, it is the first format to show how HTTP adaptive streaming formats can make use of WebVTT.
Other formats where we have heard that work is happening, but for which no public specifications are available yet, include WebVTT in MP4 containers, WebVTT in Ogg, and WebVTT in MPEG DASH.
Apple released <track> element support in Safari 6 in July 2012. Safari also only supports WebVTT. Like Google Chrome, it builds on the implementation in WebKit, which is trying to be fairly feature-complete for WebVTT.
Firefox has some work in progress supporting WebVTT, but there aren’t any official builds with support available yet.
In any case: there is still lots to do. Several of the browser implementations don’t support the full feature set yet. Where they do, they may not be interoperable since the spec may have been interpreted slightly differently, or may have changed since the implementation.
We’re starting to create a test suite with example WebVTT files that test different features. The simple “show text at certain time interval” support is certainly interoperable, but some of the more advanced layout features may not be.
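The basic behaviour that such tests check can be stated in a few lines. The Python sketch below is only an illustration, with a hypothetical `active_cues` helper and hand-written cue tuples; it assumes a cue is shown from its start time up to, but not including, its end time, and that cues are allowed to overlap.

```python
def active_cues(cues, t):
    """Return the payloads of all cues that should be showing at time t.

    cues is a list of (start_seconds, end_seconds, payload) tuples.
    """
    return [text for start, end, text in cues if start <= t < end]


cues = [(0.0, 2.0, "A"), (1.5, 3.0, "B")]
print(active_cues(cues, 1.75))
print(active_cues(cues, 2.0))
```

A test file for the simple case only needs to pin down answers like these at a handful of times; the layout features are harder to test because the expected result is a rendering rather than a list of strings.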
Just joined this group. Great to see some faces from the “old days”.
I’m especially interested in using text tracks to support lectures where technical text (especially programming code and markup) is being discussed and illustrated, but I’m also interested in a broader interpretation of ‘text’, for example music scores, mathematical equations, scientific calculations and other non-alphanumeric glyph-based media, which might usefully be synchronised with video.