This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 15851 - WebVTT: File-wide Metadata
Summary: WebVTT: File-wide Metadata
Status: NEW
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: This bug has no owner yet - up for the taking
URL:
Whiteboard: v2
Keywords:
Depends on: 18657
Blocks:
Reported: 2012-02-02 04:36 UTC by Silvia Pfeiffer
Modified: 2015-10-05 00:56 UTC (History)

Description Silvia Pfeiffer 2012-02-02 04:36:06 UTC
For HTML-page-independent use of WebVTT files, the WebVTT parser [1] tolerates [2] metadata at the top of the WebVTT file, between the "WEBVTT" file identifier and the first cue, separated from the first cue by a blank line.

Web browsers won't care what is in there and can ignore this header metadata. But other applications will care, including the WebVTT encapsulation specification for WebM as currently under development [3] and other players that need the information that the <track> element has.

We therefore need to define the format of header metadata and which fields should be used by default.

I propose we use the name-value approach to metadata with "=" as the separator and one name-value pair per line, e.g.
Version=V1_ABC
License=CC-BY-SA

Also, by analogy with the <track> element in HTML, we should encourage the following three standard metadata headers:
Language=en
Kind=Captions
Label=English captions

Any further metadata header schemes can be created by other organisations based on this (e.g. Dublin Core).
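
A minimal sketch of how a non-browser application might read such header lines, assuming the header block runs from the line after the "WEBVTT" identifier to the first blank line. The function name parse_header_metadata and the sample file are hypothetical, and the "=" syntax is only the proposal made here, not something the WebVTT specification defines.

```python
def parse_header_metadata(text):
    """Parse proposed 'Name=Value' header lines from a WebVTT file.

    Assumption: the header is every line between the "WEBVTT" identifier
    line and the first blank line. This follows the proposal in this bug,
    not behaviour defined by the WebVTT specification.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT file identifier")

    metadata = {}
    for line in lines[1:]:
        if line.strip() == "":
            break  # a blank line ends the header block
        name, sep, value = line.partition("=")
        if not sep:
            continue  # tolerate lines that are not name-value pairs
        metadata[name.strip()] = value.strip()
    return metadata


# Hypothetical sample file using the three suggested core headers.
SAMPLE = """WEBVTT
Language=en
Kind=Captions
Label=English captions

00:00:00.000 --> 00:00:02.000
Hello
"""

if __name__ == "__main__":
    print(parse_header_metadata(SAMPLE))
```

Only the first "=" on a line is treated as the separator, so a value may itself contain "=".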

In summary:
I suggest adding to the spec a normative note on the format of header metadata fields and an informative note on the recommended core metadata headers Language, Kind and Label.

[1] http://dev.w3.org/html5/webvtt/#webvtt-parser
[2] by virtue of parsing step 15
[3] https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit?hl=en_US
Comment 1 David Singer 2012-02-22 18:49:12 UTC
We see a need for these items for:

a) timestamp alignment for transport streams
b) CSS style sheets by reference
c) CSS inline style sheet
d) any data that should go into the HTML when the content is embedded there (either to help authoring the HTML, or for resources played outside an HTML context)

I am concerned that an inline stylesheet doesn't fit well into the keyword=value metaphor, unless we also allow some bracketing syntax like
stylesheet=[[ and a stylesheet
goes here
without blank lines
and so on
]]
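
A rough sketch of how the bracketing idea above could be parsed, assuming a value that starts with "[[" continues over following lines until one ends with "]]". The function name parse_bracketed_metadata is hypothetical, and the handling of whitespace (each line is trimmed) is an assumption, since the comment does not specify it.

```python
def parse_bracketed_metadata(lines):
    """Parse 'name=value' lines where a value may be wrapped in [[ ... ]].

    Assumptions (not specified anywhere): a bracketed value ends at the
    first line that ends with "]]", and every collected line is trimmed.
    """
    metadata = {}
    it = iter(lines)
    for line in it:
        if line.strip() == "":
            break  # a blank line ends the header block
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name-value line
        value = value.strip()
        if value.startswith("[["):
            parts = [value[2:].strip()]
            while not parts[-1].endswith("]]"):
                nxt = next(it, None)
                if nxt is None or nxt.strip() == "":
                    raise ValueError("unterminated [[ ... ]] value")
                parts.append(nxt.strip())
            parts[-1] = parts[-1][:-2].strip()  # drop the closing ]]
            value = "\n".join(p for p in parts if p)
        metadata[name.strip()] = value
    return metadata


# Hypothetical header lines with an inline stylesheet and a plain value.
EXAMPLE = [
    "stylesheet=[[ ::cue {",
    "  color: gold;",
    "}",
    "]]",
    "License=CC-BY",
]

if __name__ == "__main__":
    print(parse_bracketed_metadata(EXAMPLE))
```

As the comment notes, the bracketed value still cannot contain blank lines, because a blank line ends the header block.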
Comment 2 Silvia Pfeiffer 2012-02-22 21:16:32 UTC
I would think that a solution to the metadata need should be different from a solution for the in-line styling need. In fact, there is a separate bug for the styles at https://www.w3.org/Bugs/Public/show_bug.cgi?id=15023
Comment 3 Ralph Giles 2012-02-22 22:38:32 UTC
I think we should implement the name-value pair-per-line idea Silvia proposes in the description as a starting point.

However, I prefer RFC 822/HTTP-style message headers:

Language: en
Kind: Captions
Label: English captions

I think it's easier to read in a text-based format. We picked the '=' separator for Vorbis comments because it's less common in text, and as a semi-binary format it was easier to not have to parse out the whitespace.

FWIW RFC 822 offers another method of multi-line values: indented lines are continuations of the previous. That still requires reformatting e.g. CSS to remove empty lines and ensure indents, of course.

Language: es
Kind: subtitles
Style:
  ::cue {
    font-family: Papyrus;
    color: gold;
  }
License: CC-BY
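
A small sketch of the RFC 822-style variant, assuming ":" as the separator and treating any line that begins with a space or tab as a continuation of the previous header. The function name parse_rfc822_style is hypothetical, and joining continuation lines with newlines after trimming them is an assumption made for readability, not classic RFC 822 unfolding.

```python
def parse_rfc822_style(lines):
    """Parse 'Name: value' headers with indented continuation lines.

    Assumptions: the header block ends at the first blank line, and
    continuation lines are trimmed and joined with newlines.
    """
    metadata = {}
    current = None  # name of the header a continuation would extend
    for line in lines:
        if line.strip() == "":
            break  # a blank line ends the header block
        if line[0] in " \t" and current is not None:
            addition = line.strip()
            previous = metadata[current]
            metadata[current] = f"{previous}\n{addition}" if previous else addition
            continue
        name, sep, value = line.partition(":")
        if not sep:
            current = None  # not a header line; ignore it
            continue
        current = name.strip()
        metadata[current] = value.strip()
    return metadata


# The example from the comment above, as a list of lines.
EXAMPLE = [
    "Language: es",
    "Kind: subtitles",
    "Style:",
    "  ::cue {",
    "    font-family: Papyrus;",
    "    color: gold;",
    "  }",
    "License: CC-BY",
]

if __name__ == "__main__":
    for name, value in parse_rfc822_style(EXAMPLE).items():
        print(f"{name!r}: {value!r}")
```

Checking for indentation before looking for ":" is what lets CSS lines such as "font-family: Papyrus;" stay part of the Style value instead of being read as new headers.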
Comment 4 Silvia Pfeiffer 2012-02-22 22:51:05 UTC
I am not fussed about what separator we use.

I also like the RFC 822 idea of using indentation for continuations - that will help with multi-line metadata values, or even hierarchical structuring of metadata values, if somebody requires it.

Whether we treat in-line styling and in fact default settings (see https://www.w3.org/Bugs/Public/show_bug.cgi?id=15024) in this way is another question to solve.

I guess I would prefer if a Web browser could just ignore all metadata, but look at default settings and in-line styling as relevant. Thus, I would prefer to have them separated out and not lost amongst other metadata.
Comment 5 Ian 'Hixie' Hickson 2012-04-25 21:53:26 UTC
Rather than discuss solutions, please file bugs on use cases.

I recommend filing a separate bug for each use case, and then either closing this bug or marking this bug as depending on the bugs describing the use cases.
Comment 6 Silvia Pfeiffer 2012-04-30 01:04:46 UTC
Ian: we can register a bug for each metadata name-value pair, but we actually first need a general means of how to specify them. That's what this bug is about.

See thread starting at http://lists.w3.org/Archives/Public/public-texttracks/2012Feb/0031.html and coming to a conclusion at http://lists.w3.org/Archives/Public/public-texttracks/2012Apr/0093.html .
Comment 7 Ian 'Hixie' Hickson 2012-07-24 05:53:42 UTC
(In reply to comment #6)
> Ian: we can register a bug for each metadata name-value pair, but we actually
> first need a general means of how to specify them. That's what this bug is
> about.

You shouldn't file bugs on metadata name-value pairs. Those aren't use cases. They're solutions.

To repeat comment 5: I recommend filing a separate bug for each use case, and then either closing this bug or marking this bug as depending on the bugs describing the use cases. This bug has a scattering of poorly described use cases and mostly lots of discussion of possible solutions that (as far as I can tell) aren't really good solutions to those use cases.
Comment 8 David Singer 2012-08-29 22:24:18 UTC
see use cases and summary at <http://lists.w3.org/Archives/Public/public-texttracks/2012Aug/0063.html>
Comment 9 David Singer 2012-08-29 23:52:31 UTC
see also <http://lists.w3.org/Archives/Public/public-texttracks/2012Apr/0093.html> for more details of the proposed syntax
Comment 10 Ian 'Hixie' Hickson 2012-11-01 23:27:18 UTC
Please file each use case as a separate bug.