WebVTT
The Web Video Text Tracks Format

W3C First Public Working Draft
13 November 2014

http://www.w3.org/TR/2014/WD-webvtt1-20141113/


Introduction

WebVTT is a file format for marking up external text tracks.
The primary purpose of WebVTT files is to add subtitles to a <video> element.
It is used to display timed text tracks through the HTML5 <track> element.

Other W3C Text Track Formats

There are other text track formats defined by the W3C, which have been finalized as Recommendations (RECs).

Both formats are XML-based.

Origin and Specification

There was a long discussion about whether to have one or two Timed Text Working Groups.

Timed text tracks

Using the HTML5 <track> element, we can associate timed information such as subtitles, captions, descriptions, chapters, or metadata with a video, choosing the type with the kind attribute of <track>.

WebVTT Format


WebVTT file structure

WebVTT Body

The file starts with the string "WEBVTT", optionally followed by a space or tab and further text on the same line:
WEBVTT - This file has cues.
(a blank line follows)
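The signature rule can be checked in a few lines. This is a minimal sketch; `has_valid_header` is a hypothetical helper, and skipping a leading byte order mark and normalizing CR/LF line endings are assumptions taken from the general WebVTT syntax rather than from the text above.

```python
def has_valid_header(text: str) -> bool:
    """Return True if the first line is a valid WebVTT signature line.

    Hypothetical helper: accepts "WEBVTT" alone, or "WEBVTT" followed by a
    space or tab and free text. A leading byte order mark (U+FEFF) is skipped.
    """
    if text.startswith("\ufeff"):
        text = text[1:]
    # Normalize CRLF and lone CR line endings to LF, then take line 1.
    first_line = text.replace("\r\n", "\n").replace("\r", "\n").split("\n", 1)[0]
    if first_line == "WEBVTT":
        return True
    # Anything after "WEBVTT" must be separated by a space or tab.
    return first_line.startswith("WEBVTT") and first_line[6:7] in (" ", "\t")
```

For example, `has_valid_header("WEBVTT - This file has cues.\n\n")` accepts the signature line shown above, while a line such as `WEBVTTX` is rejected.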

WebVTT Comment

A comment is a block that starts with the string NOTE, followed by a space, tab, or newline:
NOTE This is a comment
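A small sketch of recognizing comment blocks once a file has been split on blank lines. `is_comment_block` is a hypothetical helper; accepting a bare `NOTE` line and rejecting any comment containing `-->` are assumptions based on the WebVTT syntax.

```python
def is_comment_block(block: str) -> bool:
    """Return True if a block (the text between two blank lines) is a comment.

    Hypothetical helper: a comment starts with "NOTE" followed by a space,
    tab, newline, or nothing at all, and must not contain "-->".
    """
    if not block.startswith("NOTE"):
        return False
    rest = block[4:]
    return (rest == "" or rest[0] in " \t\n") and "-->" not in block
```

Note that a block such as `NOTES are not comments` is not a comment, because `NOTE` must be followed by whitespace or end the block.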

WebVTT Cues

A cue is a single subtitle block that has a single start time, end time, and textual payload.
It may be preceded by an optional cue identifier followed by a newline.

A cue consists of five components:

  1. An optional cue identifier.
  2. Cue timings indicating when the cue is shown.
    (It has a start and end time which are represented by timestamps.)
  3. Optional cue settings to place the cue at an explicit position in the video viewport.
  4. The cue payload, which contains the text or subtitles to be displayed.
  5. Optional cue components within the payload to add a CSS class, a voice label <v>, italic, bold, etc.
[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms [cue settings]
Text string
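The timing line in this template can be parsed with a regular expression. This is a minimal sketch: `parse_timing_line`, `Timing`, and the regex are hypothetical, and it assumes hours (when present) have two or more digits, minutes and seconds are 00-59, and the fraction has exactly three digits.

```python
import re
from typing import NamedTuple

# One timestamp: optional hours, then mm:ss.ttt (ttt = milliseconds).
TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# start --> end, then optional free-form cue settings.
TIMING_RE = re.compile(rf"{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(?:[ \t]+(.*))?")


class Timing(NamedTuple):
    start: float   # seconds
    end: float     # seconds
    settings: str  # raw cue settings text ("" if none)


def _seconds(h, m, s, ms) -> float:
    # h is None when the hours field was omitted.
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_timing_line(line: str) -> Timing:
    match = TIMING_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a cue timing line: {line!r}")
    g = match.groups()
    return Timing(_seconds(*g[0:4]), _seconds(*g[4:8]), (g[8] or "").strip())
```

Keeping the settings as a raw string leaves their interpretation to a separate step, since the settings have their own syntax.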

Example: a file consisting of the header, a blank line, and then three cues separated by blank lines (with a NOTE comment between the second and third cues).

WEBVTT

1
00:00:01.000 --> 00:00:10.000
This is the first caption, displaying from 1-10 seconds

2
00:00:12.739 --> 00:00:24.074
This is the second caption.

NOTE This line may not translate well.

3 - Title Cue with settings
00:00:34.159 --> 00:00:35.743 line:0 position:20% size:60% align:start
Third caption
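A file laid out like this example can be read by splitting it on blank lines. This is a minimal sketch under a simplifying assumption: any block with `-->` on its first or second line is a cue, the first block is the header, and everything else (such as NOTE comments) is skipped. `parse_cues` and `Cue` are hypothetical names; the full WebVTT parsing algorithm handles more edge cases.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    identifier: str  # "" if the cue has no identifier
    timing: str      # raw timing line, e.g. "00:00:01.000 --> 00:00:10.000"
    payload: str


def parse_cues(text: str) -> list[Cue]:
    """Split a WebVTT file into cues on blank lines (simplified sketch)."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    cues = []
    for block in blocks[1:]:  # blocks[0] is the WEBVTT header
        lines = block.split("\n")
        if "-->" in lines[0]:
            cues.append(Cue("", lines[0], "\n".join(lines[1:])))
        elif len(lines) > 1 and "-->" in lines[1]:
            cues.append(Cue(lines[0], lines[1], "\n".join(lines[2:])))
        # Other blocks (e.g. NOTE comments) are ignored.
    return cues
```

Applied to the example above, this yields three cues; the third has the identifier `3 - Title Cue with settings`.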

Cue Identifier

The identifier is a name that identifies the cue.
It can be used to reference the cue from a script. It must not contain a newline and cannot contain the string "-->". It must end with a single newline. Identifiers do not have to be unique, although it is common to number them (e.g. 1, 2, 3, ...).
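The identifier rules translate directly into a check. `is_valid_identifier` is a hypothetical helper; uniqueness is deliberately not checked, since identifiers do not have to be unique.

```python
def is_valid_identifier(identifier: str) -> bool:
    """Check the identifier rules: non-empty, no line break, no "-->"."""
    return (
        identifier != ""
        and "\n" not in identifier
        and "\r" not in identifier
        and "-->" not in identifier
    )
```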

Cue Timings

A cue timing indicates when the cue is shown.
It has a start and end time which are represented by timestamps.
The end time must be greater than the start time, and the start time must be greater than or equal to all previous start times. Cues may have overlapping timings.

If the WebVTT file is used for chapters (the <track> kind is "chapters"), the file cannot have overlapping timings.

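Each cue timing contains five components:

  1. A timestamp for the start time.
  2. At least one space or tab.
  3. The string "-->".
  4. At least one space or tab.
  5. A timestamp for the end time.

The timestamps must be in one of two formats:

  1. mm:ss.ttt
  2. hh:mm:ss.ttt

where hh is hours (two or more digits), mm is minutes (00-59), ss is seconds (00-59), and ttt is milliseconds (000-999).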

Example: overlapping cue timings
00:00:00.000 --> 00:00:10.000
00:00:05.000 --> 00:01:00.000
00:00:30.000 --> 00:00:50.000
Example: non-overlapping cue timings
00:00:00.000 --> 00:00:10.000
00:00:10.000 --> 00:01:00.581
00:01:00.581 --> 00:02:00.100
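The ordering and overlap rules can be checked over a sequence of timing lines. This is a sketch: `check_timings` and `to_seconds` are hypothetical helpers, and timestamps are assumed to be well formed already.

```python
def to_seconds(ts: str) -> float:
    """Convert a well-formed "[hh:]mm:ss.ttt" timestamp to seconds."""
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def check_timings(lines: list[str], chapters: bool = False) -> list[str]:
    """Return rule violations for a list of cue timing lines.

    Checks that each end time is after its start time, that start times
    never decrease, and (only when chapters=True) that no cue starts before
    an earlier cue has ended.
    """
    problems: list[str] = []
    prev_start = None
    latest_end = None
    for i, line in enumerate(lines):
        start_text, rest = line.split("-->", 1)
        start = to_seconds(start_text.strip())
        end = to_seconds(rest.split()[0])  # ignore any trailing cue settings
        if end <= start:
            problems.append(f"cue {i}: end time is not after start time")
        if prev_start is not None and start < prev_start:
            problems.append(f"cue {i}: starts before the previous cue")
        if chapters and latest_end is not None and start < latest_end:
            problems.append(f"cue {i}: overlaps an earlier chapter")
        prev_start = start
        latest_end = end if latest_end is None else max(latest_end, end)
    return problems
```

With the first example, `check_timings(...)` reports no problems for an ordinary subtitle track, but `chapters=True` flags the second and third cues; the second example passes in both modes because each cue starts exactly when the previous one ends.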

Cue Settings

Cue settings specify the position and alignment of the cue text. The following settings are available:

Setting    Value(s)                 Function
vertical   rl || lr                 Displays the text vertically instead of horizontally; with rl, lines are laid out from right to left, with lr from left to right (e.g. for Japanese subtitles)
line       [-]number                References a particular line number that the cue is to be displayed on. Line numbers are based on the size of the first line of the cue. A negative number counts from the bottom of the frame, a positive number from the top
line       [0-100]%                 Percentage value indicating the position relative to the top of the frame
position   [0-100]%                 Percentage value indicating the position relative to the edge of the frame where the text begins (e.g. the left edge in English)
size       [0-100]%                 Percentage value indicating the size of the cue box, given as a percentage of the width of the frame
align      start || middle || end   Specifies the alignment of the text within the cue. The keywords are relative to the text direction

Note: if no cue settings are given, the cue is displayed centered horizontally at the bottom of the frame.
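A sketch of reading the settings portion of a timing line against the table above. `parse_cue_settings` and the validators are hypothetical; following the lenient behavior of WebVTT parsers, unknown setting names and invalid values are assumed to be dropped rather than raising an error, and only the values listed in the table are accepted.

```python
import re

PERCENT_RE = re.compile(r"\d+(?:\.\d+)?%")


def _is_percentage(value: str) -> bool:
    # A number followed by "%", in the range 0-100.
    return PERCENT_RE.fullmatch(value) is not None and float(value[:-1]) <= 100


# One validator per setting name, following the table of settings.
VALIDATORS = {
    "vertical": lambda v: v in ("rl", "lr"),
    "line": lambda v: re.fullmatch(r"-?\d+", v) is not None or _is_percentage(v),
    "position": _is_percentage,
    "size": _is_percentage,
    "align": lambda v: v in ("start", "middle", "end"),
}


def parse_cue_settings(settings: str) -> dict[str, str]:
    """Parse "name:value" tokens, keeping only recognized, valid settings."""
    result: dict[str, str] = {}
    for token in settings.split():
        name, sep, value = token.partition(":")
        check = VALIDATORS.get(name)
        if sep and check is not None and check(value):
            result[name] = value
    return result
```

Dropping bad tokens one at a time means a single typo does not discard the other settings on the same line.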

Cue Payload

The payload is where the main content is located: the subtitles to be displayed.
The payload text may contain newlines, but it cannot contain a blank line (two consecutive newlines), because a blank line signifies the end of a cue.
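A minimal check for a payload before writing it into a file. `is_valid_payload` is a hypothetical helper; besides the blank-line rule, it also rejects the string `-->`, which the WebVTT syntax does not allow inside a payload.

```python
def is_valid_payload(payload: str) -> bool:
    """Multi-line payloads are fine; blank lines and "-->" are not."""
    normalized = payload.replace("\r\n", "\n").replace("\r", "\n")
    return "\n\n" not in normalized and "-->" not in normalized
```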

WebVTT Cue Components

In addition, you can use “WebVTT cue components” to add further information to the cue text itself.
These components are similar to HTML elements, and can be used to add semantics and styling to the actual text strings.

A list of the different components available is given below:

Component  Meaning
c          Specifies a CSS class, which follows the c, e.g. <c.className>Cue text</c>
i          Specifies italic text
b          Specifies bold text
u          Specifies underlined text
ruby       Specifies something similar to HTML5’s <ruby> element. Within this component, one or more <rt> elements are allowed.
v          Specifies a voice label (if provided) that the cue text is being “spoken in”, e.g. <v Ian>This is useful for adding subtitles</v>. Note that the voice label won’t be displayed. It’s just there as a styling hook.
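Because the components look like HTML tags, the displayable text can be recovered by removing them. This is a sketch under stated assumptions: `cue_plain_text` is hypothetical, `<rt>` annotations are dropped rather than shown, and `html.unescape` decodes the full HTML character-reference set, which is broader than what WebVTT itself defines.

```python
import html
import re

RT_RE = re.compile(r"<rt>.*?</rt>", re.DOTALL)  # ruby annotations
TAG_RE = re.compile(r"<[^>]*>")                  # any other tag, incl. <00:00:53.500>


def cue_plain_text(payload: str) -> str:
    """Remove cue component markup, leaving only the text to be displayed."""
    without_rt = RT_RE.sub("", payload)
    return html.unescape(TAG_RE.sub("", without_rt))
```

Since the voice name lives inside the `<v ...>` tag, it disappears along with the tag, matching the note that the voice label is not displayed.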

Voice Label

Using a voice label has several benefits:

  1. The caption may display the voice (Emo) in addition to the caption text.
  2. The name of the voice can be read by a screen reader, possibly even using a different voice for male or female names.
  3. It offers a hook for styling so that, for example, all captions for Emo could be in blue.

The following example uses the voice label Emo for the cue text.
In addition, it specifies a CSS class of question, which can then be used for styling purposes.
Such a class can be styled in the usual way, with CSS linked from or defined in the HTML page that embeds the video.

Cue-CSS and Voice
00:00:52.000 --> 00:00:54.000 align:start size:15%
<v Emo>I don’t <i>think</i> so. <c.question>You?</c></v>
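To see which voices and classes a payload like this uses (for example, to check which `::cue(...)` selectors a stylesheet needs), the names can be pulled out with regular expressions. `voices`, `classes`, and both patterns are hypothetical and simplified; they do not cover every character the WebVTT syntax allows in tag annotations.

```python
import re

# <v Name> or <v.class Name>: the voice name runs up to the closing ">".
VOICE_RE = re.compile(r"<v(?:\.[^\s.>]+)*[ \t]+([^>]+)>")
# A start tag name followed by one or more ".class" suffixes, e.g. <c.question>.
CLASS_RE = re.compile(r"<[a-z]+((?:\.[^\s.>]+)+)")


def voices(payload: str) -> list[str]:
    return [name.strip() for name in VOICE_RE.findall(payload)]


def classes(payload: str) -> set[str]:
    found: set[str] = set()
    for suffix in CLASS_RE.findall(payload):
        found.update(suffix.strip(".").split("."))
    return found
```

For the payload above, `voices` returns `["Emo"]` and `classes` returns `{"question"}`.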

To style cue text with CSS, you need to use the special ::cue pseudo-element selector, for example:

::cue(v[voice="Emo"]) { color: blue }
::cue(i) { font-style: italic }
::cue(.question) { font-size: 2em }

The following properties apply to the '::cue' pseudo-element with no argument; other properties set on the pseudo-element must be ignored:

Timestamps in cue text

It is also possible to add timestamps to cue text, indicating that different parts occur at different times.
An example of this is shown below:

Cue-paint-on captions
00:00:52.000 --> 00:00:54.000
<c>I don’t think so.</c> <00:00:53.500><c>You?</c>

All of the text is displayed at the same time, but in supporting browsers you can use the :past and :future pseudo-classes to style text differently depending on whether it lies in the past or the future relative to the current playback position.
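The past/future distinction can be modeled by splitting the payload at its inline timestamps. This is a simplified sketch, not the browser algorithm: `split_by_time` is hypothetical, and it assumes the text before the first inline timestamp starts at the cue's start time and that a segment is "past" once its start time is earlier than the playback time.

```python
import re

# Inline timestamp such as <00:00:53.500> or <00:20.000>.
INLINE_TS_RE = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")


def to_seconds(ts: str) -> float:
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def split_by_time(payload: str, cue_start: float, now: float) -> list[tuple[str, str]]:
    """Label each timed segment of the payload "past" or "future" at time `now`."""
    pieces = INLINE_TS_RE.split(payload)
    # With a capturing group, re.split alternates: text, timestamp, text, ...
    segments = [(cue_start, pieces[0])]
    for i in range(1, len(pieces), 2):
        segments.append((to_seconds(pieces[i]), pieces[i + 1]))
    return [("past" if t < now else "future", text) for t, text in segments]
```

At a playback time of 53 seconds, the example cue splits into a "past" segment (`I don’t think so.`) and a "future" segment (`You?`); after 53.5 seconds both are "past".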
For example:

::cue(c:past) {color:yellow} 
::cue(c:future) {text-shadow: black 0 0 1px;}

Ruby in cue text

00:00:15.042 --> 00:00:18.042 vertical:rl align:start
<ruby>左<rt>ひだり</rt></ruby>に<ruby>見<rt>み</rt></ruby>えるのは…

00:00:20.417 --> 00:00:21.917 vertical:rl align:start
…首刈り機

Example of WebVTT file

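Example using a CSS class styled with the ::cue pseudo-element. The CSS rule goes in a stylesheet (for example, one used by the page that embeds the video), not above the WEBVTT line, because a WebVTT file must begin with the WEBVTT signature:

::cue(c.dream) {color:#fff}

WEBVTT
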
00:00.000 --> 00:14.999
Elephant's <c.dream>Dream</c>

NOTE CSS class, styled with ::cue pseudo-element

00:15.000 --> 00:18.000 align:end line:10%
At the <i>left</i> we can <b>see</b>...

NOTE Relative and percentage based positioning

00:18.167 --> 00:22.000
At the right <00:20.000>we can see the...

NOTE Karaoke style split line
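Files like this one can also be generated. This is a minimal writer sketch: `Cue`, `format_timestamp`, and `to_webvtt` are hypothetical names, it always writes the hh:mm:ss.ttt form, and it raises an error for payloads that would break the cue structure.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float          # seconds
    end: float            # seconds
    text: str             # payload; must not contain a blank line or "-->"
    settings: str = ""    # e.g. "align:end line:10%"
    identifier: str = ""


def format_timestamp(seconds: float) -> str:
    """Format seconds as hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    blocks = ["WEBVTT"]
    for cue in cues:
        if "\n\n" in cue.text or "-->" in cue.text:
            raise ValueError(f"invalid payload: {cue.text!r}")
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        if cue.settings:
            timing += " " + cue.settings
        lines = ([cue.identifier] if cue.identifier else []) + [timing, cue.text]
        blocks.append("\n".join(lines))
    # Header and cues are separated by blank lines.
    return "\n\n".join(blocks) + "\n"
```

Rounding to whole milliseconds before splitting into fields avoids floating-point artifacts such as 14.999 becoming 14.998.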

Example using regions within the viewport


WEBVTT
Region: id=fred width=50% lines=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up
Region: id=bill width=50% lines=3 regionanchor=100%,100% viewportanchor=90%,90% scroll=up

00:00:00.000 --> 00:00:20.000 region:fred align:left
Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
Hi, I'm Bill
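The Region: lines in this header are whitespace-separated name=value pairs, which a short sketch can read. `parse_region_line` and `parse_anchor` are hypothetical helpers, and treating every value as a plain string (apart from anchors) is an assumption for illustration.

```python
def parse_region_line(line: str) -> dict[str, str]:
    """Parse a "Region:" header line into its name=value settings."""
    prefix = "Region:"
    if not line.startswith(prefix):
        raise ValueError(f"not a region definition: {line!r}")
    settings: dict[str, str] = {}
    for token in line[len(prefix):].split():
        name, sep, value = token.partition("=")
        if sep:
            settings[name] = value
    return settings


def parse_anchor(value: str) -> tuple[float, float]:
    """Split an anchor such as "10%,90%" into two percentages."""
    x, y = value.split(",")
    return float(x.rstrip("%")), float(y.rstrip("%"))
```

A cue's `region:fred` setting can then be looked up by the region's `id` to find its width, line count, and anchors.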

Implementation / Support

At the time of writing, support for the format is limited but growing.

http://www.w3.org/community/texttracks/

Example of WebVTT references in HTML5

<video controls>
  <source src="elephants-dream.mp4" type="video/mp4">
  <source src="elephants-dream.webm" type="video/webm">
  <track label="English subtitles" kind="subtitles" srclang="en"
         src="elephants-dream-subtitles-en.vtt" default>
  <track label="Deutsche Untertitel" kind="subtitles" srclang="de"
         src="elephants-dream-subtitles-de.vtt">
  <track label="English chapters" kind="chapters" srclang="en"
         src="elephants-dream-chapters-en.vtt">
  <track label="English captions" kind="captions" srclang="en"
         src="elephants-dream-captions-en.vtt">
  <track label="English descriptions" kind="descriptions" srclang="en"
         src="elephants-dream-descriptions-en.vtt">
</video>

Demo

A demo of styling with classes and timestamps in cue text (karaoke style):

http://www.leanbackplayer.com/test/webvtt.html

Feedback

Resources