This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

Media WebVTT Changes

From HTML accessibility task force Wiki

WebVTT as specified supports a large number of features, in particular for subtitles and captions. However, we keep coming across proposals for changes and improvements. This page collects them.

Related discussion: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-January/029859.html

Related document: http://www.w3.org/WAI/PF/HTML/wiki/Media_608_WebVTT_Conversion


Allow file-wide Metadata

Allow name-value pairs as file-wide metadata below the file magic string ("WEBVTT"), and specify the format for these name-value pairs; an empty line marks the end of the header section.

NOTE: WebVTT parsers are specified to ignore everything between the "WEBVTT" identifier line and the first blank line. Metadata can therefore be added like this:

  WEBVTT
  Language=zh
  Kind=Caption

  1
  00:00:15.000 --> 00:00:17.950
  first cue
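Browsers would simply skip these header lines, but a separate tool could read them. The sketch below assumes the `Name=Value` syntax from the example, one pair per line, ending at the first blank line; `parse_header` is a hypothetical helper, not part of any WebVTT API.

```python
def parse_header(text: str) -> tuple[dict[str, str], str]:
    """Split a WebVTT file into hypothetical header metadata and the cue body.

    Assumes the proposed 'Name=Value' syntax, one pair per line, between the
    'WEBVTT' line and the first blank line.
    """
    lines = text.splitlines()
    # Simplified check; the real spec also allows a BOM and trailing text.
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT file magic")
    metadata: dict[str, str] = {}
    i = 1
    # Header lines run until the first empty line.
    while i < len(lines) and lines[i].strip() != "":
        name, sep, value = lines[i].partition("=")
        if sep:  # ignore header lines that are not name-value pairs
            metadata[name.strip()] = value.strip()
        i += 1
    body = "\n".join(lines[i + 1:])  # everything after the blank line
    return metadata, body
```

For the example above, this returns `{"Language": "zh", "Kind": "Caption"}` plus the remaining cue text.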

Rejected: Browsers will not use metadata like this. A different specification could define it, but WebVTT will not. The 608-to-WebVTT conversion document at http://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html does so.

Default Cue Settings

Introduce default cue settings in the header part of the file, possibly as a name-value pair, or some alternative dedicated form.

NOTE: Defaults could be added like this:

  WEBVTT

  DEFAULTS --> D:vertical A:end

  00:00.000 --> 00:02.000
  This is vertical and end-aligned.

  00:02.500 --> 00:05.000
  As is this.

  DEFAULTS --> A:start

  00:05.500 --> 00:07.000
  This is horizontal and start-aligned.

Allow inline CSS

A subset of CSS features is required to bring HTML5 video captions on par with TV captions. Since non-browser applications will need to support these CSS features as well, it makes sense to allow the styles to be delivered inside the WebVTT file itself.

NOTE: This could be achieved with deliberately invalid cues, which existing WebVTT parsers skip because their timing line does not parse, like this:

  WEBVTT

  STYLE -->
  ::cue(v[voice=Bob]) { color: green; }
  ::cue(c.narration) { font-style: italic; }
  ::cue(c.narration i) { font-style: normal; }

  00:00.000 --> 00:02.000
  <v Bob>Welcome.

  00:02.500 --> 00:05.000
  <c .narration>To <i>WebVTT</i>.
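The sketch below illustrates why this trick stays backward compatible: a `STYLE -->` line contains "-->", so a WebVTT parser tries to read "STYLE" as a timestamp, fails, and discards the block. A hypothetical extended parser could instead keep the block body as CSS. The timestamp regex is a simplified form of the WebVTT timestamp syntax, and `extract_styles` is an illustrative name only.

```python
import re

# Simplified WebVTT timestamp: optional hours (2+ digits), then mm:ss.ttt
TIMESTAMP = re.compile(r"^(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}$")


def extract_styles(text: str) -> tuple[str, list[str]]:
    """Return (stylesheet, cue_texts) from a file using 'STYLE -->' blocks."""
    css: list[str] = []
    cues: list[str] = []
    for block in text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue  # e.g. the "WEBVTT" header block
        start = lines[idx].partition("-->")[0].strip()
        # A spec-following parser fails to read "STYLE" as a timestamp and
        # drops the whole block; this hypothetical extension keeps it as CSS.
        if start == "STYLE":
            css.extend(lines[idx + 1:])
        elif TIMESTAMP.match(start):
            cues.append("\n".join(lines[idx + 1:]))
        # Any other invalid timing line is skipped, as a WebVTT parser would.
    return "\n".join(css), cues
```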

Time Specifiers

Allow the use of shorter time specifiers, in particular:

  • "[[h*:]mm:]ss[.d[c[m]]] | s*[.d[c[m]]]" as the start and end time
  • "-" as the separator between start and end time instead of "-->"
  • "+s*[.d[c[m]]]" as a possible end time specifier, or as a relative mid-cue timestamp; relative mid-cue timestamps are applied cumulatively
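To make the proposal concrete, here is one possible parser for the shorter (and since rejected) syntax, converting times to milliseconds. It assumes that d, c and m stand for tenths, hundredths and thousandths of a second, that a "+" end time is relative to the cue start, and that "cumulative" means each relative mid-cue timestamp is added to the previous one. All function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical grammar from the proposal (NOT valid WebVTT):
#   [[h*:]mm:]ss[.d[c[m]]]   clock-style absolute time
#   s*[.d[c[m]]]             absolute time in plain seconds
#   +s*[.d[c[m]]]            time relative to an earlier time
FRACTION = r"(?:\.(\d{1,3}))?"
CLOCK = re.compile(r"^(?:(\d+):)?([0-5]\d):([0-5]\d)" + FRACTION + r"$")
SECONDS = re.compile(r"^(\d+)" + FRACTION + r"$")


def _to_ms(whole_seconds: int, frac: Optional[str]) -> int:
    # ".5" means 500 ms, ".25" means 250 ms, ".125" means 125 ms.
    return whole_seconds * 1000 + int((frac or "0").ljust(3, "0"))


def parse_seconds(token: str) -> int:
    m = SECONDS.match(token)
    if not m:
        raise ValueError(f"not a seconds value: {token!r}")
    return _to_ms(int(m.group(1)), m.group(2))


def parse_time(token: str) -> int:
    """Parse an absolute start or end time into milliseconds."""
    m = CLOCK.match(token)
    if m:
        hours, minutes, seconds, frac = m.groups()
        return _to_ms(int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds), frac)
    return parse_seconds(token)


def parse_timing(line: str) -> tuple[int, int]:
    """Parse 'start - end', where end may be '+duration' relative to start."""
    start_tok, sep, end_tok = line.partition("-")
    if not sep:
        raise ValueError("missing '-' separator")
    start = parse_time(start_tok.strip())
    end_tok = end_tok.strip()
    if end_tok.startswith("+"):
        return start, start + parse_seconds(end_tok[1:])
    return start, parse_time(end_tok)


def resolve_mid_cue(start_ms: int, offsets: list[str]) -> list[int]:
    """Assumed cumulative reading: each +offset is added to the previous
    mid-cue time rather than to the cue start."""
    times = []
    current = start_ms
    for token in offsets:
        if not token.startswith("+"):
            raise ValueError(f"expected relative time: {token!r}")
        current += parse_seconds(token[1:])
        times.append(current)
    return times
```

Even this small sketch needs three code paths for one timestamp, which is part of the argument against the proposal below.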

Rejected: A single time format simplifies both authoring and parsing, and most timestamps should not be authored manually anyway. The "-->" separator also seems sufficiently mnemonic.

For details see http://www.w3.org/Bugs/Public/show_bug.cgi?id=12043 and http://lists.w3.org/Archives/Public/www-archive/2011Jun/0000.html.

Excerpts:

  • Currently the syntax is [h*:]mm:ss.sss; what's the advantage of making this more complicated? It's not like most subtitled clips will be shorter than a minute. Also, why would we want to support multiple redundant ways of expressing the same time? (e.g. 01:00.000 and 60.000) Readability of VTT files seems like it would be helped by consistency, which suggests using the same format everywhere, as much as possible.
  • "-->" seems pretty mnemonic to me. I don't see why we'd want to drop it.
  • I think if anything is absolute, it doesn't really make anything much simpler for anything else to be relative. If the author were to change the first time stamp because the video gained a 30 second advertisement at the start, then he would still need to change the hundreds of subsequent timestamps for all the additional cues. It's not like he's going to be doing it by hand, and once a tool is involved, the tool can change everything just as easily.
  • Hand-authoring a "paint-on" style caption seems like a world of pain regardless of the timestamp format we end up using, so I'm not sure it's a good argument for complicating the syntax with a second timestamp format.
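The excerpts argue that retiming is a job for tools rather than for a second timestamp syntax. As a minimal sketch of such a tool, assuming well-formed cue timing lines in the single `[hh:]mm:ss.ttt` format, the function below shifts every start and end time by a fixed offset, for example after a 30-second advertisement is inserted at the start. It does not touch mid-cue timestamps inside cue text, which a real tool would also need to handle; `shift` is a hypothetical name.

```python
import re

# WebVTT timestamp: optional hours (2+ digits), then mm:ss.ttt
TIMESTAMP = re.compile(r"\b(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def to_ms(m: re.Match) -> int:
    hours, minutes, seconds, millis = m.groups()
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def format_ms(ms: int) -> str:
    if ms < 0:
        raise ValueError("shift would produce a negative timestamp")
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift(text: str, offset_ms: int) -> str:
    """Add offset_ms to every timestamp on cue timing lines ('-->' lines)."""
    def bump(m: re.Match) -> str:
        return format_ms(to_ms(m) + offset_ms)

    out = []
    for line in text.split("\n"):
        if "-->" in line:
            line = TIMESTAMP.sub(bump, line)
        out.append(line)
    return "\n".join(out)
```

For example, `shift(text, 30_000)` turns `00:00:15.000 --> 00:00:17.950` into `00:00:45.000 --> 00:00:47.950`.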