This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 14552 - <track>/WebVTT Add comment block support
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P1 blocker
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard: v2
Keywords:
Duplicates: 18129
Depends on:
Blocks:
 
Reported: 2011-10-24 11:53 UTC by contributor
Modified: 2012-11-21 19:25 UTC

Description contributor 2011-10-24 11:53:28 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html
Multipage: http://www.whatwg.org/C#parsing-0
Complete: http://www.whatwg.org/c#parsing-0

Comment:
<track> support // and /* */ comments everywhere except in the signature line

Posted from: 85.227.157.105
User agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.5.8; U; en) Presto/2.9.168 Version/11.52
Comment 1 Simon Pieters 2011-10-24 11:58:25 UTC
It should be possible to add comments in WebVTT files. We should use an intuitive and useful comment syntax. We have implemented // for line comment and /* */ for multiline comment. Only 35 files out of 65k contain "/*" in our set of SRT files.
Comment 2 Simon Pieters 2011-10-24 12:00:27 UTC
If you want to use "/*" in cue data, we could introduce an entity for slash or asterisk.
Comment 3 Philip Jägenstedt 2011-10-25 11:52:51 UTC
I want to mention another idea we had for the escaping problem. Rather than escaping < as &lt; and then by necessity having &amp;, < could be escaped as <<. <> is already treated as a bogus empty tag, so we could make that official and escape // as /<>/ and /* as /<>*. Admittedly weird, but perhaps worth considering.
Comment 4 Ralph Giles 2011-10-25 15:53:30 UTC
I've never liked having to use entities for < and >, that seems like a major opportunity for error in converting srt or just creating the files in general, so have some sympathy for <<, but using <> as an escape digraph doesn't seem any better.

I mean, it would be a great idea if everyone creating captions was an expert, but we have the opposite issue.

I would like to have a way to specify comments, and C-style is straightforward to parse, but my first thought was that I would have found shell style '#' or html-style '<!-- ... -->' comments more natural. Are there any of those in your SRT corpus?
Comment 5 Philip Jägenstedt 2011-10-25 16:13:41 UTC
(In reply to comment #4)
> I've never liked having to use entities for < and >, that seems like a major
> opportunity for error in converting srt or just creating the files in general,
> so have some sympathy for <<, but using <> as an escape digraph doesn't seem
> any better.

I agree, this is not exactly perfect. &escapes; are also not perfect, as we'd need &slash; which does not exist in HTML and almost none of the HTML escapes work in WebVTT, so it's not clear that having similar syntax is only a good thing.

> I mean, it would be a great idea if everyone creating captions was an expert,
> but we have the opposite issue.
> 
> I would like to have a way to specify comments, and C-style is straightforward
> to parse, but my first thought was that I would have found shell style '#' or
> html-style '<!-- ... -->' comments more natural. Are there any of those in your
> SRT corpus?

<!-- --> doesn't work for commenting out entire cues:

<!--
00:01.000 --> 00:02.000
Bla
-->

# occurs in 4211/65643 files. Example:

393
00:24:26,887 --> 00:24:29,276
# Hey, I know what I'm gonna do

// occurs in 2229/65643 files. Example:

510
00:40:43,500 --> 00:46:41,300
Subtitulado por aRGENTeaM
http://foro.argenteam.net
Comment 6 Ralph Giles 2011-10-25 17:15:45 UTC
Those are excellent points. I withdraw my suggestions for '#' and '<!-- -->'.

> // occurs in 2229/65643 files.

This makes me think // is a really bad idea too. I like just using plain text either within a cue before the timerange, or on its own as a heading, for one-line comments. Makes it harder to add extensions later, though.

So, what about just '/* */'? What's the use case for being able to comment out blocks? I can see it's great for testing, but does that justify all forgot-to-terminate and un-escaped '/*' blanking the rest of the subtitle file?
Comment 7 Ian 'Hixie' Hickson 2011-10-25 23:59:52 UTC
Do we really need intra-cue comments? At least one caption writer reported at some point that actually they prefer not having comments at all as they're only really useful during translations and it's better to be able to see those comments while watching the video.

Inter-cue comments we can support pretty much any way we want, since so long as you don't have a "-->" anywhere in the comment and don't have any blank lines, it'll be ignored.

So for example we could define a block that starts with "COMMENT" as being a comment (and require comments not to include "-->").
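As a rough illustration of why such a block is already ignored, here is a minimal Python sketch. It assumes blocks are separated by completely empty lines and that a block only becomes a cue when it contains "-->". The helper names split_blocks and is_ignored_block are hypothetical and are not taken from the spec's parser.

```python
import re


def split_blocks(text):
    """Split a file into blocks separated by one or more empty lines."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [b for b in re.split(r"\n{2,}", normalized.strip("\n")) if b]


def is_ignored_block(block):
    """Assumption: a block without "-->" has no timing line, so it yields no cue."""
    return "-->" not in block


sample = """WEBVTT

COMMENT
this block has no arrow, so it is skipped

00:01.000 --> 00:02.000
Bla
"""

blocks = split_blocks(sample)
# Skip the signature block, then keep only blocks that could be cues.
kept = [b for b in blocks[1:] if not is_ignored_block(b)]
```

Under these assumptions only the timed block survives in kept, which is why a comment block needs no new parser behaviour as long as it avoids "-->" and blank lines.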
Comment 8 Silvia Pfeiffer 2011-10-26 08:59:22 UTC
Please let's not use // for comments - that would make all URLs really unreadable.

Also, I've seen "#" used in live captioning similar to ">>" as speech markers, though I can't find an example now.

I could live with /* */ for intra-cue comments. IIUC, right now anything inside < and > that is not a defined tag is ignored, so that would also work, though it's not as future-proof as a separate markup.

More important than intra-cue comments, though, is a way to comment out complete cues. Since that creates a section, it would be best if that could be done the same way as the sections that we need for headers for metadata, inline cue settings, and inline CSS. I can live with the suggestion to use the "-->" marker as a landmark to separate such sections.
Comment 9 Philip Jägenstedt 2011-10-26 09:30:07 UTC
(In reply to comment #8)
> Please let's not use // for comments - that would make all URLs really
> unreadable.

As mentioned, // occurs in 2229/65643 files. There are 3056 individual lines that match, and 2939 (96%) of those look like URLs (grep -i http://).
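The counting described above can be approximated with a short Python sketch. The function slash_stats and the sample lines are hypothetical; the real numbers came from grep over the SRT corpus, which is not reproduced here.

```python
def slash_stats(lines):
    """Return (lines containing "//", how many of those contain "http://")."""
    matches = [ln for ln in lines if "//" in ln]
    # Case-insensitive check, mirroring grep -i http://
    url_like = [ln for ln in matches if "http://" in ln.lower()]
    return len(matches), len(url_like)


sample_lines = [
    "Subtitulado por aRGENTeaM",
    "http://foro.argenteam.net",
    "// not a URL",
]
total, urls = slash_stats(sample_lines)

# The percentage quoted above: 2939 URL-like lines out of 3056 matches.
share = round(100 * 2939 / 3056)
```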

I don't think this is a big deal for caption or subtitle tracks, but for metadata tracks I have to agree that this has the potential to make a mess of things.

> Also, I've seen "#" used in live captioning similar to ">>" as speech markers,
> though I can't find an example now.
> 
> I could live with /* */ for intra-cue comments. IIUC, right now anything inside
> < and > that is not a defined tag is ignored, so that would also work, though
> it's not as future-proof as a separate markup.

/* */ (and //) isn't really intended to be intra-cue or inter-cue, it's pre-processor-like and works as if it wasn't there at all. For example, this would work:

00:01.000 --> /*00:02.000
Foo

00:02.000 --> */00:03.000
Bar

It creates a cue from 1 to 3 seconds with the cue text Bar. (The ability to do this is in itself not important, I'm just stressing that comments work exactly the same wherever you use them.)
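A minimal Python sketch of this pre-processor-like behaviour, assuming comments are simply deleted from the text before any cue parsing happens. It covers only /* */ (not //), and an unterminated /* is left untouched rather than blanking the rest of the file. The name strip_block_comments is hypothetical.

```python
import re

# Non-greedy match so each /* pairs with the nearest following */.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(text):
    """Delete every terminated /* ... */ span, even across line and cue boundaries."""
    return BLOCK_COMMENT.sub("", text)


source = (
    "00:01.000 --> /*00:02.000\n"
    "Foo\n"
    "\n"
    "00:02.000 --> */00:03.000\n"
    "Bar\n"
)
result = strip_block_comments(source)
```

With this input, result is a single cue block, "00:01.000 --> 00:03.000" followed by "Bar", matching the description above.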

> More important than intra-cue comments, though, is a way to comment out
> complete cues. Since that creates a section, it would be best if that could be
> done the same way as the sections that we need for headers for metadata, inline
> cue settings, and inline CSS. I can live with the suggestion to use the "-->"
> marker as a landmark to separate such sections.

Or is this some other suggestion than the COMMENT --> blocks? If not, do you really want to comment out each cue individually?
Comment 10 Silvia Pfeiffer 2011-10-26 09:44:23 UTC
> /* */ (and //) isn't really intended to be intra-cue or inter-cue, it's
> pre-processor-like and works as if it wasn't there at all. For example, this
> would work:
> 
> 00:01.000 --> /*00:02.000
> Foo
> 
> 00:02.000 --> */00:03.000
> Bar

This can already be done as:

00:01.000 --> <00:02.000
Foo
 
00:02.000 --> 00:03.000
Bar


Now, however, the --> is an end marker, so it won't work for multiple cues.

I agree that /* */ is a more powerful and future-proof way of commenting out sections, including several cues.
Comment 11 Silvia Pfeiffer 2011-10-26 09:59:06 UTC
OK, I got that wrong - the < > commenting only works within the cue content.
Comment 12 Ian 'Hixie' Hickson 2011-10-26 20:49:36 UTC
I don't really understand what the use case is here. What kind of comments are we talking about? Why would people use them?
Comment 13 Philip Jägenstedt 2011-10-28 13:01:57 UTC
(In reply to comment #12)
> I don't really understand what the use case is here. What kind of comments are
> we talking about? Why would people use them?

WEBVTT

/* Free to modify, sell and whatever.
 * <v bm> = Batman
 * <v cw> = Catwoman
 */

00:01.000 --> 00:02.000
<v bm /* or was it cw? */>I want a sandwich

/* these don't seem to belong, from another cut?

00:30.000 --> 00:34.000
bla

00:40.000 --> 00:44.000
bla

00:50.000 --> 00:54.000
bla

*/

45:00.000 --> 45:01.000
<v cw>Now die!</v>
<v bm>Mkay</v>
/* FIXME: did Robin say something there? */

/* Good for you...
50:00.000 --> 50:02.000
Subbed by da 
Comment 14 Philip Jägenstedt 2011-10-28 13:04:11 UTC
In writing the above, I discovered something problematic. If I had written this:

45:00.000 --> 45:01.000
<v cw>Now die!</v>
/* FIXME: did Robin say something there? */
<v bm>Mkay</v>

Then that would be equivalent to this:

45:00.000 --> 45:01.000
<v cw>Now die!</v>

<v bm>Mkay</v>

Which would have cut the cue short. That's not a great feature...
Comment 15 Silvia Pfeiffer 2011-10-28 13:56:11 UTC
(In reply to comment #14)
> In writing the above, I discovered something problematic. If I had written
> this:
> 
> 45:00.000 --> 45:01.000
> <v cw>Now die!</v>
> /* FIXME: did Robin say something there? */
> <v bm>Mkay</v>
> 
> Then that would be equivalent to this:
> 
> 45:00.000 --> 45:01.000
> <v cw>Now die!</v>
> 
> <v bm>Mkay</v>
> 
> Which would have cut the cue short. That's not a great feature...

Well, as the spec is written right now, it's simply cue content. But it seems indeed that once we introduce comments, we will need to define that if parsing of comments results in an empty line, that line needs to be removed.
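The rule suggested here could look roughly like the following Python sketch. It assumes single-line /* */ comments only, and it drops a line only when that line held nothing but a comment, so blank lines that were already in the file still end the cue. The name strip_comments_keep_cue is hypothetical.

```python
import re

INLINE_COMMENT = re.compile(r"/\*.*?\*/")


def strip_comments_keep_cue(text):
    """Remove /* */ comments; drop lines that become empty only because of that."""
    out = []
    for line in text.split("\n"):
        stripped = INLINE_COMMENT.sub("", line)
        if line.strip() and not stripped.strip():
            continue  # the line consisted solely of a comment
        out.append(stripped)
    return "\n".join(out)


cue = (
    "45:00.000 --> 45:01.000\n"
    "<v cw>Now die!</v>\n"
    "/* FIXME: did Robin say something there? */\n"
    "<v bm>Mkay</v>"
)
result = strip_comments_keep_cue(cue)
```

Here the FIXME line disappears without leaving a blank line behind, so the cue keeps both voice lines.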
Comment 16 Ian 'Hixie' Hickson 2011-11-01 04:18:29 UTC
I really don't think /*...*/ comments fit with the style of WebVTT, based on comment 13.

Comment 13 has several use cases:
 * Notes at the top of the file for things like copyright, typographic conventions, etc
 * Notes in a cue regarding uncertainty of the transcription's accuracy
 * Removing blocks of cues that don't belong, but leaving them there anyway (for some reason?)
 * Commenting out cues that were in an earlier version that didn't correspond to the actual audio, e.g. because a cue has ugly-looking credits.

The first one we can easily handle using some sort of non-cue syntax like:

   WEBVTT

   NOTES
   bla bla
   bla (this block cannot contain the string "-" "-" ">"
   since recent changes have made the parser eager to find such strings

   00:01.000 --> 00:02.000
   Bla

The second we could handle in a similar way by requiring authors to put such notes between cues, rather than before or after cues:


   WEBVTT

   00:01.000 --> 00:02.000
   Bla

   NOTE i'm not sure the next cue's timing is right

   00:02.000 --> 00:03.000
   Bla

   00:03.000 --> 00:04.000
   Bla

(such a note couldn't include --> either.)


The last two 
Comment 17 Ian 'Hixie' Hickson 2011-12-07 23:26:21 UTC
I'm leaning towards only supporting comment blocks, not supporting inline comments and not supporting commenting out cues.
Comment 18 Silvia Pfeiffer 2011-12-11 08:42:09 UTC
Philip: in your SRT collection, what types of comments are you seeing? Do you see attempts at comments inside cues and also completely commented out cues with some of the special characters that we discussed?

My thought is that even if there are just one or two examples for these, they indicate a need, which people tried to satisfy in unconventional ways.
Comment 19 Philip Jägenstedt 2011-12-11 10:55:48 UTC
(In reply to comment #18)
> Philip: in your SRT collection, what types of comments are you seeing? Do you
> see attempts at comments inside cues and also completely commented out cues
> with some of the special characters that we discussed?
> 
> My thought is that even if there are just one or two examples for these, they
> indicate a need, which people tried to satisfy in unconventional ways.

I'm not sure how to search for that given that SRT doesn't actually have a comment syntax. One trick I know exists is to create a cue with no duration, e.g.

00:00:00.000 --> 00:00:00.000
this is a comment

I don't think there's any trick for intra-cue comments that would work in SRT.
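For anyone who wants to look for that trick in a corpus, a minimal Python sketch follows. It assumes hh:mm:ss timestamps with either "," or "." before the milliseconds, and treats a cue as a pseudo-comment when start and end are identical. The name is_zero_duration is hypothetical.

```python
import re

TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[.,]\d{3}) --> (\d{2}:\d{2}:\d{2}[.,]\d{3})"
)


def is_zero_duration(timing_line):
    """True when the line is a timing line whose start equals its end."""
    m = TIMING.match(timing_line)
    if m is None:
        return False
    # Normalize the decimal separator so "," and "." compare equal.
    return m.group(1).replace(",", ".") == m.group(2).replace(",", ".")
```

For example, is_zero_duration("00:00:00.000 --> 00:00:00.000") is true, while a normal timing line such as "00:24:26,887 --> 00:24:29,276" is not.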
Comment 20 Silvia Pfeiffer 2012-02-02 04:02:30 UTC
Getting the information from different discussion threads in here:

An earlier suggestion by Ian was:
>
> For example:
>
>  00:00.000 --> 00:01.000
>  one <! inline comment > one
>
>  COMMENT-->
>  00:02.000 --> 00:03.000
>  two; this is entirely
>  commented out
>
>  <! this is the ID line
>  00:04.000 --> 00:05.000
>  three; last line is a ">"
>  which is part of the cue
>  and is not a comment.
>  >
>
> The above would work today in a conforming UA. The question really is what
> parts of this do we want to support and what do we not care enough about.

I'm bringing this up again because the WebM community is currently discussing how to encapsulate WebVTT into WebM, see https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit?hl=en_US .

It has some thoughts on how to handle the features that we are still discussing, comments amongst them.

It is time we resolved those outstanding features.
Comment 21 Silvia Pfeiffer 2012-02-07 03:31:22 UTC
Something that Glenn Maynard brought up on the mailing list:
we could support inline comments simply as a special type of class, e.g.

WEBVTT

00:00.000 --> 00:01.000
one <c hidden>inline comment</c> two


We then just need a default style for class=hidden that maps to CSS display:none and for standalone players is meaningful as a hidden class.

That would solve the inline comments case.

Then we just need to solve commenting out blocks or cues.
Comment 22 Ian 'Hixie' Hickson 2012-04-25 18:46:41 UTC
I think we shouldn't solve the inline comment case. (We can "solve" it by letting people hide bits of cues as in comment 21, but that shouldn't be an official solution, since it mixes presentation and content in a way that would break in non-CSS UAs.)

I don't really understand the use case for commenting out a cue but leaving it in.

For the remaining use cases I'm planning on going with comment 16's proposal.
Comment 23 contributor 2012-07-18 17:22:53 UTC
This bug was cloned to create bug 18129 as part of operation convergence.
Comment 24 Ian 'Hixie' Hickson 2012-08-29 23:49:35 UTC
*** Bug 18129 has been marked as a duplicate of this bug. ***
Comment 25 contributor 2012-11-01 23:27:20 UTC
Checked in as WHATWG revision r7500.
Check-in comment: Define a syntax for comments in WebVTT (doesn't affect parsers)
http://html5.org/tools/web-apps-tracker?from=7499&to=7500
Comment 26 Philip Jägenstedt 2012-11-02 08:23:17 UTC
Is there any particular reason that both NOTE and NOTES are allowed to start a comment? It seems to invite the illusion that NOTE is for one-line comments and NOTES is for multi-line comments or similar.
Comment 27 Ian 'Hixie' Hickson 2012-11-02 21:18:34 UTC
It seemed weird to have either always singular or always plural, but I don't feel strongly about it one way or the other. If you think the world would be better with just one, please reopen the bug.

The example in the spec is intended to discourage thinking of them as being for one-line vs multiple-line, FWIW.
Comment 28 Simon Pieters 2012-11-05 10:40:58 UTC
I prefer one way to do it. Having two legal syntaxes tends to waste people's time arguing over which to use, takes more time to learn, and so forth. One legal syntax is more straightforward. I suggest we go with "NOTE".
Comment 29 Silvia Pfeiffer 2012-11-05 11:32:58 UTC
In comment 16 you wrote as one use case:
* Notes at the top of the file for things like copyright, typographic conventions, etc

Does this mean you expect us to put what we have thus far discussed as "metadata" in a NOTES section at the start of the file? E.g.

WEBVTT

NOTES
Kind: captions
Language: en-US
Copyright: CC-BY-SA 3.0
blah: blah

00:01.000 --> 00:02.000
Bla


As for NOTE vs NOTES - I have a slight preference for NOTES, since most typically it will be more than one line. But I'm not really fussed - I could live with both.
Comment 30 Philip Jägenstedt 2012-11-05 12:40:00 UTC
I agree with Simon that just NOTE seems slightly better. What one usually calls a note in singular can be more than one sentence, so I think that's fine. (This is bikeshedding at its finest.)
Comment 31 Ian 'Hixie' Hickson 2012-11-05 20:08:41 UTC
(In reply to comment #29)
> In comment 16 you wrote as one use case:
> * Notes at the top of the file for things like copyright, typographic
> conventions, etc
> 
> Does this mean you expect us to put what we this far discussed as "metadata"
> in a NOTES section at the start of the file?

If you have use cases, file bugs. If you don't, I don't expect you to put anything in the file at all.

(If the use cases are just things intended for human readers and not software processing, then yes, putting that info in a comment is the way to go. For example, look at CSS, where some things need to be machine-readable, like the character encoding, and thus get special syntax, like @charset, while others do not need to be read by anyone but humans, like copyrights, and those just get put in comments.)


(In reply to comment #28)
> I prefer one way to do it. Having two legal syntaxes have a tendency to
> waste people's time in arguing which to use, more time to learn, and so
> forth. One legal syntax is more straightforward. I suggest we go with "NOTE".

Roger.
Comment 32 contributor 2012-11-21 19:25:22 UTC
Checked in as WHATWG revision r7521.
Check-in comment: Change the comment syntax to only allow NOTE, not NOTES.
http://html5.org/tools/web-apps-tracker?from=7520&to=7521
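
As an illustration of the syntax the thread settled on, here is a minimal Python sketch that recognises a comment block. It assumes a block counts as a comment when its first line is exactly "NOTE", or "NOTE" followed by a space or tab, and when the block contains no "-->". The function is_note_block is hypothetical; the checked-in spec text remains the authority.

```python
def is_note_block(block):
    """Return True if the block looks like a NOTE comment under the assumptions above."""
    first_line = block.split("\n")[0]
    starts_ok = (
        first_line == "NOTE"
        or first_line.startswith("NOTE ")
        or first_line.startswith("NOTE\t")
    )
    # A comment must not contain the arrow, or it could be taken for a timing line.
    return starts_ok and "-->" not in block
```

With this check, "NOTE i'm not sure the next cue's timing is right" is a comment, while a block starting with "NOTES" is not, reflecting the change in r7521.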