WebVTT is a file format for marking up external text tracks.
The primary purpose of WebVTT files is to add subtitles to a <video>.
It is used for displaying timed text tracks with the HTML5 <track> element.
There are other text track formats defined by the W3C, which have been finalized as Recommendations (RECs).
Both formats are XML-based.
<track> element. Long discussion about having one or two Timed Text WGs:
Using the <track> element from HTML5, we can associate information such as subtitles, captions, descriptions, and chapters with a video, using the metadata (<track> kind) attribute.
The MIME type of WebVTT files is text/vtt.
A WebVTT file starts with the "WEBVTT" string, optionally followed by text on the same line, and then a blank line:
WEBVTT - This file has cues.
(a blank line follows)
A comment starts with the NOTE string:
NOTE
This is a comment
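A cue is a single subtitle block that has a single start time, end time, and textual payload.
A cue consists of five components: an optional cue identifier followed by a newline, the cue timings, optional cue settings (separated from the timings by at least one space), a single newline, and the cue payload text:
[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms [cue settings]
Text string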
Example: a file that consists of the header, a blank line, and then 3 cues separated by blank lines.
WEBVTT

1
00:00:01.000 --> 00:00:10.000
This is the first caption, displaying from 1-10 seconds

2
00:00:12.739 --> 00:00:24.074
This is the second
caption.

NOTE This line may not translate well.

3 - Title Cue with settings
00:00:34.159 --> 00:00:35.743 line:0 position:20% size:60% align:start
Third
caption
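The structure described above (header, blank-line-separated blocks, optional identifier, timing line, payload) can be illustrated with a small parser. This is a minimal sketch in Python, not a conforming WebVTT parser: the function name `parse_webvtt` and the dictionary keys are hypothetical, it assumes a single space after the end timestamp, and it ignores edge cases such as whitespace-only separator lines.

```python
import re

def parse_webvtt(text):
    """Split a WebVTT document into a list of cue dicts (simplified sketch)."""
    # Normalize line endings, then split on blank lines into blocks.
    blocks = re.split(r"\n{2,}", text.replace("\r\n", "\n").strip())
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("file must start with the WEBVTT string")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        if lines[0].startswith("NOTE"):
            continue  # comment block, not a cue
        # The identifier is optional: it is present when the first
        # line does not contain the "-->" separator.
        identifier = None
        if "-->" not in lines[0]:
            identifier = lines.pop(0)
        if not lines or "-->" not in lines[0]:
            continue  # assumption: silently skip blocks without a timing line
        start, rest = lines[0].split("-->", 1)
        end, _, settings = rest.strip().partition(" ")
        cues.append({
            "id": identifier,
            "start": start.strip(),
            "end": end,
            "settings": settings.strip(),
            "text": "\n".join(lines[1:]),
        })
    return cues

SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:10.000
This is the first caption, displaying from 1-10 seconds

2
00:00:12.739 --> 00:00:24.074
This is the second
caption.

NOTE This line may not translate well.

3 - Title Cue with settings
00:00:34.159 --> 00:00:35.743 line:0 position:20% size:60% align:start
Third
caption
"""

cues = parse_webvtt(SAMPLE)
```

Here `cues` holds three entries; the NOTE block is skipped because it is a comment rather than a cue.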
The identifier is a name that identifies the cue.
It can be used to reference the cue from a script. It must not contain a newline and cannot contain the string "-->". It must end with a single newline. Identifiers do not have to be unique, although it is common to number them (e.g. 1, 2, 3, ...).
A cue timing indicates when the cue is shown.
It has a start and end time which are represented by timestamps.
The end time must be greater than the start time, and the start time must
be greater than or equal to all previous start times. Cues may have
overlapping timings.
If the WebVTT file is being used for chapters (<track> kind is chapters) then the file cannot have overlapping timings.
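Each cue timing contains five components: the start timestamp, at least one space, the string "-->", at least one space, and the end timestamp.
Because "-->" marks a cue timing, it cannot appear in cue text (use the escape sequences "&amp;" for ampersand and "&gt;" for greater-than).
The timestamps must be in one of two formats:
mm:ss.ttt
hh:mm:ss.ttt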
Overlapping cue timings:
00:00:00.000 --> 00:00:10.000
00:00:05.000 --> 00:01:00.000
00:00:30.000 --> 00:00:50.000
Non-overlapping cue timings:
00:00:00.000 --> 00:00:10.000
00:00:10.000 --> 00:01:00.581
00:01:00.581 --> 00:02:00.100
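The timing rules above (two timestamp formats, end after start, optional overlap) can be checked mechanically. The following is a rough Python sketch; the helper names `to_millis`, `parse_timing`, and `has_overlap` are made up for illustration, and the overlap check assumes the timing lines are already ordered by start time.

```python
import re

# mm:ss.ttt or hh:mm:ss.ttt (hours are optional, at least two digits)
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")

def to_millis(timestamp):
    """Convert a WebVTT timestamp to integer milliseconds."""
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)

def parse_timing(line):
    """Return (start_ms, end_ms) for a cue timing line; settings are ignored."""
    start_text, rest = line.split("-->", 1)
    end_text = rest.split()[0]
    start, end = to_millis(start_text.strip()), to_millis(end_text)
    if end <= start:
        raise ValueError("end time must be greater than start time")
    return start, end

def has_overlap(timing_lines):
    """True if any cue starts before an earlier cue has ended."""
    latest_end = 0
    for start, end in map(parse_timing, timing_lines):
        if start < latest_end:
            return True
        latest_end = max(latest_end, end)
    return False

overlapping = [
    "00:00:00.000 --> 00:00:10.000",
    "00:00:05.000 --> 00:01:00.000",
    "00:00:30.000 --> 00:00:50.000",
]
non_overlapping = [
    "00:00:00.000 --> 00:00:10.000",
    "00:00:10.000 --> 00:01:00.581",
    "00:01:00.581 --> 00:02:00.100",
]
```

A check like `has_overlap` could be run on a chapters file before publishing it, since chapter cues are not allowed to overlap.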
Table 1 - vertical values
Value | Meaning |
---|---|
vertical:rl | writing direction is right to left |
vertical:lr | writing direction is left to right |
Table 2 - line examples
Setting | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
line:0 | top | right | left |
line:-1 | bottom | left | right |
line:0% | top | right | left |
line:100% | bottom | left | right |
Table 3 - position examples
Setting | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
position:0% | left | top | top |
position:100% | right | bottom | bottom |
Table 4 - size examples
Setting | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
size:100% | full width | full height | full height |
size:50% | half width | half height | half height |
Table 5 - align values
Setting | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
align:start | left | top | top |
align:middle | centred horizontally | centred vertically | centred vertically |
Setting | Value(s) | Function |
---|---|---|
vertical | rl or lr | Aligns text vertically to the left (lr) or right (rl) (e.g. for Japanese subtitles) |
line | [-][0 or more] | References a particular line number that the cue is to be displayed on. Line numbers are based on the size of the first line of the cue. A negative number counts from the bottom of the frame, positive numbers from the top |
line | [0-100]% | Percentage value indicating the position relative to the top of the frame |
position | [0-100]% | Percentage value indicating the position relative to the edge of the frame where the text begins (e.g. the left edge in English) |
size | [0-100]% | Percentage value indicating the size of the cue box. The value is given as a percentage of the width of the frame |
align | start, middle, or end | Specifies the alignment of the text within the cue. The keywords are relative to the text direction |
Note: if no cue settings are set, the position defaults to the middle, at the bottom of the frame.
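To make the name:value shape of cue settings concrete, here is a small Python sketch that turns a settings string into a dictionary and applies a few of the constraints from the table above. The function names are hypothetical, and the validation is deliberately partial: it only checks the setting names, the vertical keywords, and the 0-100% range for position and size.

```python
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align"}

def parse_percentage(value):
    """Return the numeric part of a value like '60%', checking the 0-100 range."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    number = float(value[:-1])
    if not 0 <= number <= 100:
        raise ValueError(f"percentage out of range: {value!r}")
    return number

def parse_cue_settings(settings):
    """Parse a string such as 'line:0 position:20%' into a dict of strings."""
    parsed = {}
    for token in settings.split():
        name, separator, value = token.partition(":")
        if not separator or not value:
            raise ValueError(f"malformed cue setting: {token!r}")
        if name not in KNOWN_SETTINGS:
            raise ValueError(f"unknown cue setting: {name!r}")
        if name == "vertical" and value not in ("rl", "lr"):
            raise ValueError(f"vertical must be rl or lr, got {value!r}")
        if name in ("position", "size"):
            parse_percentage(value)  # validates; the raw string is stored
        parsed[name] = value
    return parsed

settings = parse_cue_settings("line:0 position:20% size:60% align:start")
```

An empty settings string yields an empty dictionary, which corresponds to the default placement described in the note above.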
The payload is where the main information or content is located.
Typically, the payload contains the subtitles to be displayed.
The payload text may contain newlines but it cannot contain a blank line,
which is equivalent to two consecutive newlines. A blank line signifies
the end of a cue.
In addition to all this, you can use “WebVTT cue components” to add
further information to the actual cue text itself.
These components are similar to HTML elements, and can be used to add
semantics and styling to the actual text strings.
A list of the different components available is given below:
Value | Meaning |
---|---|
c | Specifies a CSS class, which follows the c, e.g. <c.className>Cue text</c> |
i | Specifies italic text |
b | Specifies bold text |
u | Specifies underlined text |
ruby | Specifies something similar to HTML5’s <ruby> element. Within this component, one or more occurrences of an <rt> element are allowed. (See "The HTML5 <ruby> element in words of one syllable or less") |
v | Specifies a voice label (if provided) that the cue text is being “spoken in”, e.g. <v Ian>This is useful for adding subtitles</v>. Note that the voice label won’t be displayed. It’s just there as a styling hook. |
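Because the components listed above are tag-like markup inside the payload, a script that only needs the spoken text or the voice labels can pull them apart with regular expressions. The Python sketch below is an illustration under simplifying assumptions: the names `plain_text` and `voices` are invented here, escape sequences are not decoded, and malformed markup is not handled.

```python
import re

# Any opening or closing cue component, e.g. <i>, </c>, <v Emo>, <00:00:53.500>
_TAG = re.compile(r"</?[^>]+>")
# A voice component with an optional class, e.g. <v Emo> or <v.loud Emo>
_VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")

def plain_text(cue_text):
    """Remove all cue components and keep only the text."""
    return _TAG.sub("", cue_text)

def voices(cue_text):
    """Return the voice labels used in the cue text, in order of appearance."""
    return [name.strip() for name in _VOICE.findall(cue_text)]

example = "<v Emo>I don't <i>think</i> so. <c.question>You?</c></v>"
text_only = plain_text(example)
speakers = voices(example)
```

For the example string, `text_only` is the sentence without markup and `speakers` contains the single label `Emo`.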
Using Voice label
This example uses a voice label, Emo, for the cue text. In addition, a CSS class of question is specified, which can then be used for styling purposes.
A class such as this can be styled in the usual way via CSS attached to or defined in the calling HTML page.
Cue-CSS and Voice
00:00:52.000 --> 00:00:54.000 align:start size:15%
<v Emo>I don’t <i>think</i> so. <c.question>You?</c></v>
Note that to style cue text with CSS, you need to use a special pseudo-element selector, for example:
::cue(v[voice="Emo"]) {color:blue}
::cue(i) { font-style: italic }
::cue(.question) { font-size: 2em }
The following properties apply to the '::cue' pseudo-element with no argument; other properties set on the pseudo-element must be ignored:
Cue-paint-on captions
00:00:52.000 --> 00:00:54.000
<c>I don’t think so.</c> <00:00:53.500><c>You?</c>
This will cause all the text to be displayed at the same time, but note that in supporting browsers you will be able to use the :past and :future pseudo-classes to style text differently depending on whether it is in the past or the future.
For example:
::cue(c:past) {color:yellow}
::cue(c:future) {text-shadow: black 0 0 1px;}
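The idea behind :past and :future can be sketched outside the browser: inline timestamps divide the cue text into segments, and each segment is in the past or the future relative to the current playback time. The Python below is only an approximation of that idea, with hypothetical helper names; it treats a segment whose timestamp equals the current time as past, which is a simplifying assumption rather than a statement of browser behavior.

```python
import re

# Inline cue timestamp such as <00:00:53.500> or <00:20.000>
_INLINE = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")

def to_millis(timestamp):
    """Convert mm:ss.ttt or hh:mm:ss.ttt to integer milliseconds."""
    parts = [int(p) for p in timestamp.replace(".", ":").split(":")]
    if len(parts) == 3:
        parts.insert(0, 0)  # no hours component given
    hours, minutes, seconds, millis = parts
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

def split_past_future(cue_text, current_ms):
    """Return (past_text, future_text) for the given playback time."""
    # re.split with one capturing group yields:
    # [text, timestamp, text, timestamp, text, ...]
    pieces = _INLINE.split(cue_text)
    past, future = [pieces[0]], []  # text before any timestamp is shown at once
    for timestamp, segment in zip(pieces[1::2], pieces[2::2]):
        if to_millis(timestamp) <= current_ms:
            past.append(segment)
        else:
            future.append(segment)
    return "".join(past), "".join(future)

cue_text = "<c>I don't think so.</c> <00:00:53.500><c>You?</c>"
before = split_past_future(cue_text, 53000)
after = split_past_future(cue_text, 53500)
```

With the cue above, the second segment moves from the future part to the past part once playback reaches 53.5 seconds.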
00:00:15.042 --> 00:00:18.042 vertical:rl align:start
<ruby>左<rt>ひだり</rt></ruby>に<ruby>見<rt>み</rt></ruby>えるのは…

00:00:20.417 --> 00:00:21.917 vertical:rl align:start
..…首刈り機
Example using CSS style defined in header using ::cue pseudo-element.
::cue(c.dream) {color:#ffff}
WEBVTT

00:00.000 --> 00:14.999
Elephant's <c.dream>Dream</c>

NOTE CSS class, styled with ::cue pseudo-element

00:15.000 --> 00:18.000 align:end line:10%
At the <i>left</i> we can <b>see</b>...

NOTE Relative and percentage based positioning

00:18.167 --> 00:22.000
At the right <00:20.000>we can see the...

NOTE Karaoke style split line
Example using region within the viewport
WEBVTT
Region: id=fred width=50% lines=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up
Region: id=bill width=50% lines=3 regionanchor=100%,100% viewportanchor=90%,90% scroll=up

00:00:00.000 --> 00:00:20.000 region:fred align:left
Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
Hi, I'm Bill
<track> tags with .vtt files for HTML5 videos already. http://www.w3.org/community/texttracks/
<video controls>
  <source src="elephants-dream.mp4" type="video/mp4">
  <source src="elephants-dream.webm" type="video/webm">
  <track label="English subtitles" kind="subtitles" srclang="en"
         src="elephants-dream-subtitles-en.vtt" default>
  <track label="Deutsche Untertitel" kind="subtitles" srclang="de"
         src="elephants-dream-subtitles-de.vtt">
  <track label="English chapters" kind="chapters" srclang="en"
         src="elephants-dream-chapters-en.vtt">
  <track label="English captions" kind="captions" srclang="en"
         src="elephants-dream-captions-en.vtt">
  <track label="English descriptions" kind="descriptions" srclang="en"
         src="elephants-dream-descriptions-en.vtt">
</video>
A nice demo using styling with classes and timestamps in cue text (karaoke style):
http://www.leanbackplayer.com/test/webvtt.html
Resources