This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18129 - <track>/WebVTT Add comment block support
Summary: <track>/WebVTT Add comment block support
Status: RESOLVED DUPLICATE of bug 14552
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: This bug has no owner yet - up for the taking
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-18 17:22 UTC by contributor
Modified: 2012-08-30 02:57 UTC
CC List: 9 users

See Also:



Description contributor 2012-07-18 17:22:48 UTC
This bug was cloned from bug 14552 as part of operation convergence.
Originally filed: 2011-10-24 11:53:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2011-10-24 11:53:28 +0000 
--------------------------------------------------------------------------------
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html
Multipage: http://www.whatwg.org/C#parsing-0
Complete: http://www.whatwg.org/c#parsing-0

Comment:
<track>: support // and /* */ comments everywhere except in the signature line.

Posted from: 85.227.157.105
User agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.5.8; U; en) Presto/2.9.168 Version/11.52
================================================================================
 #1   Simon Pieters                                   2011-10-24 11:58:25 +0000 
--------------------------------------------------------------------------------
It should be possible to add comments in WebVTT files. We should use an intuitive and useful comment syntax. We have implemented // for line comments and /* */ for multiline comments. Only 35 files out of 65k contain "/*" in our set of SRT files.
================================================================================
 #2   Simon Pieters                                   2011-10-24 12:00:27 +0000 
--------------------------------------------------------------------------------
If you want to use "/*" in cue data, we could introduce an entity for slash or asterisk.
================================================================================
 #3   Philip J                                        2011-10-25 11:52:51 +0000 
--------------------------------------------------------------------------------
I want to mention another idea we had for the escaping problem. Rather than escaping < as &lt; and then by necessity having &amp;, < could be escaped as <<. <> is already treated as a bogus empty tag, so we could make that official and escape // as /<>/ and /* as /<>*. Admittedly weird, but perhaps worth considering.
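A minimal Python sketch of the "/<>/" escaping idea follows. It assumes, as the comment says, that a cue-text parser discards the empty "<>" tag. The function names are hypothetical, and displayed_text only approximates rendering by deleting every "<>". The sketch uses a lookahead so that runs such as "///" are fully broken up. A plain replace of "//" would leave "/<>//", which still contains "//".

```python
import re

# A "/" that would begin "//" or "/*". The lookahead does not consume the
# next character, so "///" yields two matches instead of one.
_COMMENT_START = re.compile(r"/(?=[/*])")

def escape_comment_starts(cue_text: str) -> str:
    """Hypothetical escaper: insert the empty tag "<>" after such a "/"."""
    return _COMMENT_START.sub("/<>", cue_text)

def displayed_text(escaped: str) -> str:
    """Rough model of rendering: the bogus empty "<>" tags disappear."""
    return escaped.replace("<>", "")

print(escape_comment_starts("see http://example.com /* not a comment */"))
# see http:/<>/example.com /<>* not a comment */
```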
================================================================================
 #4   Ralph Giles                                     2011-10-25 15:53:30 +0000 
--------------------------------------------------------------------------------
I've never liked having to use entities for < and >, that seems like a major opportunity for error in converting srt or just creating the files in general, so I have some sympathy for <<, but using <> as an escape digraph doesn't seem any better.

I mean, it would be a great idea if everyone creating captions was an expert, but we have the opposite issue.

I would like to have a way to specify comments, and C-style is straightforward to parse, but my first thought was that I would have found shell style '#' or html-style '<!-- ... -->' comments more natural. Are there any of those in your SRT corpus?
================================================================================
 #5   Philip J                                        2011-10-25 16:13:41 +0000 
--------------------------------------------------------------------------------
(In reply to comment #4)
> I've never liked having to use entities for < and >, that seems like a major
> opportunity for error in converting srt or just creating the files in general,
> so have some sympathy for <<, but using <> as an escape digraph doesn't seem
> any better.

I agree, this is not exactly perfect. &escapes; are not perfect either: we'd need &slash;, which does not exist in HTML, and almost none of the HTML escapes work in WebVTT, so it's not clear that having a similar syntax is purely a good thing.

> I mean, it would be a great idea if everyone creating captions was an expert,
> but we have the opposite issue.
> 
> I would like to have a way to specify comments, and C-style is straightforward
> to parse, but my first thought was that I would have found shell style '#' or
> html-style '<!-- ... -->' comments more natural. Are there any of those in your
> SRT corpus?

<!-- --> doesn't work for commenting out entire cues:

<!--
00:01.000 --> 00:02.000
Bla
-->
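A short sketch reproduces the failure. It assumes an HTML-like rule where a comment ends at the first "-->". Under that rule, the timing line's own arrow closes the comment, so the end timestamp, the cue text and the intended closing marker all survive.

```python
import re

# Naive HTML-style comment stripping: "<!--" up to the FIRST "-->".
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

commented_out_cue = "<!--\n00:01.000 --> 00:02.000\nBla\n-->\n"
print(repr(HTML_COMMENT.sub("", commented_out_cue)))
# ' 00:02.000\nBla\n-->\n'
```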

# occurs in 4211/65643 files. Example:

393
00:24:26,887 --> 00:24:29,276
# Hey, I know what I'm gonna do

// occurs in 2229/65643 files. Example:

510
00:40:43,500 --> 00:46:41,300
Subtitulado por aRGENTeaM
http://foro.argenteam.net
================================================================================
 #6   Ralph Giles                                     2011-10-25 17:15:45 +0000 
--------------------------------------------------------------------------------
Those are excellent points. I withdraw my suggestions for '#' and '<!-- -->'.

> // occurs in 2229/65643 files.

This makes me think // is a really bad idea too. I like just using plain text either within a cue before the timerange, or on its own as a heading, for one-line comments. Makes it harder to add extensions later, though.

So, what about just '/* */'? What's the use case for being able to comment out blocks? I can see it's great for testing, but does that justify the risk of a forgotten terminator or an unescaped '/*' blanking the rest of the subtitle file?
================================================================================
 #7   Ian 'Hixie' Hickson                             2011-10-25 23:59:52 +0000 
--------------------------------------------------------------------------------
Do we really need intra-cue comments? At least one caption writer reported at some point that actually they prefer not having comments at all as they're only really useful during translations and it's better to be able to see those comments while watching the video.

Inter-cue comments we can support pretty much any way we want, since as long as you don't have a "-->" anywhere in the comment and don't have any blank lines, it'll be ignored.

So for example we could define a block that starts with "COMMENT" as being a comment (and require comments not to include "-->").
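A tool that writes such blocks would have to enforce both constraints. Below is a minimal writer-side sketch. The helper name is hypothetical, and "COMMENT" is only the example keyword given above, not a settled syntax. To be conservative, it also rejects whitespace-only lines.

```python
def make_comment_block(comment: str, keyword: str = "COMMENT") -> str:
    """Format free text as a hypothetical inter-cue comment block.

    The text must not contain "-->" (a parser would try to read that line
    as cue timing) and must not contain blank lines (which end the block).
    """
    if "-->" in comment:
        raise ValueError('comment text must not contain "-->"')
    lines = comment.splitlines()
    if any(not line.strip() for line in lines):
        raise ValueError("comment text must not contain blank lines")
    # Keyword line, then the comment lines, then an empty line to end the block.
    return "\n".join([keyword] + lines) + "\n\n"
```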
================================================================================
 #8   Silvia Pfeiffer                                 2011-10-26 08:59:22 +0000 
--------------------------------------------------------------------------------
Please let's not use // for comments - that would make all URLs really unreadable.

Also, I've seen "#" used in live captioning similar to ">>" as speech markers, though I can't find an example now.

I could live with /* */ for intra-cue comments. IIUC, right now anything inside < and > that is not a defined tag is ignored, so that would also work, though it's not as future-proof as a separate markup.

More important than intra-cue comments, though, is a way to comment out complete cues. Since that creates a section, it would be best if that could be done the same way as the sections that we need for headers for metadata, inline cue settings, and inline CSS. I can live with the suggestion to use the "-->" marker as a landmark to separate such sections.
================================================================================
 #9   Philip J                                        2011-10-26 09:30:07 +0000 
--------------------------------------------------------------------------------
(In reply to comment #8)
> Please let's not use // for comments - that would make all URLs really
> unreadable.

As mentioned, // occurs in 2229/65643 files. There are 3056 individual lines that match, and 2939 (96%) of those look like URLs (grep -i http://).
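The corpus itself is not reproduced here. The sketch below only shows how this kind of count could be computed over in-memory file contents: files and lines containing "//", and how many of those lines match a case-insensitive "http://", like `grep -i http://`. The function name and input shape are assumptions.

```python
def slash_slash_stats(srt_texts):
    """Return (files with "//", lines with "//", such lines with "http://")."""
    files_with = lines_with = url_like = 0
    for text in srt_texts:
        hits = [line for line in text.splitlines() if "//" in line]
        if hits:
            files_with += 1
        lines_with += len(hits)
        # Case-insensitive, mirroring `grep -i http://`.
        url_like += sum("http://" in line.lower() for line in hits)
    return files_with, lines_with, url_like
```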

I don't think this is a big deal for caption or subtitle tracks, but for metadata tracks I have to agree that this has the potential to make a mess of things.

> Also, I've seen "#" used in live captioning similar to ">>" as speech markers,
> though I can't find an example now.
> 
> I could live with /* */ for intra-cue comments. IIUC, right now anything inside
> < and > that is not a defined tag is ignored, so that would also work, though
> it's not as future-proof as a separate markup.

/* */ (and //) isn't really intended to be intra-cue or inter-cue, it's pre-processor-like and works as if it wasn't there at all. For example, this would work:

00:01.000 --> /*00:02.000
Foo

00:02.000 --> */00:03.000
Bar

It creates a cue from 1 to 3 seconds with the cue text Bar. (The ability to do this is not important in itself; I'm just stressing that comments work exactly the same wherever you use them.)
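Here is a minimal sketch of such a pre-processor. It is for illustration only. It is not the implementation mentioned earlier in the thread, whose details are not given here. The sketch leaves the signature line alone. It deletes "//" up to the end of the line, and "/* ... */" including any line breaks inside. An unterminated comment runs to the end of the file. The function name is hypothetical.

```python
def strip_c_comments(vtt: str) -> str:
    """Hypothetical pre-processor: drop // and /* */ comments after line 1."""
    signature, newline, body = vtt.partition("\n")
    out = []
    i = 0
    while i < len(body):
        if body.startswith("/*", i):
            end = body.find("*/", i + 2)
            if end == -1:
                break  # unterminated: the rest of the file is commented out
            i = end + 2  # skip the comment, including any line breaks in it
        elif body.startswith("//", i):
            end = body.find("\n", i)
            if end == -1:
                break  # line comment runs to end of file
            i = end  # keep the line break that ends the line comment
        else:
            out.append(body[i])
            i += 1
    return signature + newline + "".join(out)
```

Applied to the example above, it yields one cue from 00:01.000 to 00:03.000 with the text "Bar". It also turns "http://foro.argenteam.net" into "http:", which is the URL problem raised in comment 8.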

> More important than intra-cue comments, though, is a way to comment out
> complete cues. Since that creates a section, it would be best if that could be
> done the same way as the sections that we need for headers for metadata, inline
> cue settings, and inline CSS. I can live with the suggestion to use the "-->"
> marker as a landmark to separate such sections.

Or is this a suggestion other than the COMMENT --> blocks? If not, do you really want to comment out each cue individually?
================================================================================
 #10  Silvia Pfeiffer                                 2011-10-26 09:44:23 +0000 
--------------------------------------------------------------------------------
> /* */ (and //) isn't really intended to be intra-cue or inter-cue, it's
> pre-processor-like and works as if it wasn't there at all. For example, this
> would work:
> 
> 00:01.000 --> /*00:02.000
> Foo
> 
> 00:02.000 --> */00:03.000
> Bar

This can already be done as:

00:01.000 --> <00:02.000
Foo
 
00:02.000 --> 00:03.000
Bar


Now, however, the --> is an end marker, so it won't work for multiple cues.

I agree that /* */ is a more powerful and future-proof way of commenting out sections, including several cues.
================================================================================
 #11  Silvia Pfeiffer                                 2011-10-26 09:59:06 +0000 
--------------------------------------------------------------------------------
OK, I got that wrong - the < > commenting only works within the cue content.
================================================================================
 #12  Ian 'Hixie' Hickson                             2011-10-26 20:49:36 +0000 
--------------------------------------------------------------------------------
I don't really understand what the use case is here. What kind of comments are we talking about? Why would people use them?
================================================================================
 #13  Philip J                                        2011-10-28 13:01:57 +0000 
--------------------------------------------------------------------------------
(In reply to comment #12)
> I don't really understand what the use case is here. What kind of comments are
> we talking about? Why would people use them?

WEBVTT

/* Free to modify, sell and whatever.
 * <v bm> = Batman
 * <v cw> = Catwoman
 */

00:01.000 --> 00:02.000
<v bm /* or was it cw? */>I want a sandwich

/* these don't seem to belong, from another cut?

00:30.000 --> 00:34.000
bla

00:40.000 --> 00:44.000
bla

00:50.000 --> 00:54.000
bla

*/

45:00.000 --> 45:01.000
<v cw>Now die!</v>
<v bm>Mkay</v>
/* FIXME: did Robin say something there? */

/* Good for you...
50:00.000 --> 50:02.000
Subbed by da 
================================================================================
 #14  Philip J                                        2011-10-28 13:04:11 +0000 
--------------------------------------------------------------------------------
In writing the above, I discovered something problematic. If I had written this:

45:00.000 --> 45:01.000
<v cw>Now die!</v>
/* FIXME: did Robin say something there? */
<v bm>Mkay</v>

Then that would be equivalent to this:

45:00.000 --> 45:01.000
<v cw>Now die!</v>

<v bm>Mkay</v>

Which would have cut the cue short. That's not a great feature...
================================================================================
 #15  Silvia Pfeiffer                                 2011-10-28 13:56:11 +0000 
--------------------------------------------------------------------------------
(In reply to comment #14)
> In writing the above, I discovered something problematic. If I had written
> this:
> 
> 45:00.000 --> 45:01.000
> <v cw>Now die!</v>
> /* FIXME: did Robin say something there? */
> <v bm>Mkay</v>
> 
> Then that would be equivalent to this:
> 
> 45:00.000 --> 45:01.000
> <v cw>Now die!</v>
> 
> <v bm>Mkay</v>
> 
> Which would have cut the cue short. That's not a great feature...

Well, as the spec is written right now, it's simply cue content. But it seems indeed that once we introduce comments, we will need to define that if parsing of comments results in an empty line, that line needs to be removed.
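The sketch below applies the rule proposed here. It handles only comments that open and close on the same line; multi-line comments are out of scope. strip_naive reproduces the problem from comment 14. strip_keep_cues drops any line that was non-blank before comment removal and blank after it. Both names are hypothetical.

```python
import re

# Same-line /* ... */ comments only.
SAME_LINE_COMMENT = re.compile(r"/\*.*?\*/")

def strip_naive(text: str) -> str:
    """Remove comments but keep every line, so a comment-only line turns blank."""
    return SAME_LINE_COMMENT.sub("", text)

def strip_keep_cues(text: str) -> str:
    """Remove comments and drop lines that held nothing but a comment."""
    kept = []
    for line in text.split("\n"):
        stripped = SAME_LINE_COMMENT.sub("", line)
        if line.strip() and not stripped.strip():
            continue  # comment-only line: dropping it keeps the cue in one piece
        kept.append(stripped)
    return "\n".join(kept)
```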
================================================================================
 #16  Ian 'Hixie' Hickson                             2011-11-01 04:18:29 +0000 
--------------------------------------------------------------------------------
I really don't think /*...*/ comments fit with the style of WebVTT, based on comment 13.

Comment 13 has several use cases:
 * Notes at the top of the file for things like copyright, typographic conventions, etc
 * Notes in a cue regarding uncertainty of the transcription's accuracy
 * Removing blocks of cues that don't belong, but leaving them there anyway (for some reason?)
 * Commenting out cues that were in an earlier version that didn't correspond to the actual audio, e.g. because a cue has ugly-looking credits.

The first one we can easily handle using some sort of non-cue syntax like:

   WEBVTT

   NOTES
   bla bla
   bla (this block cannot contain the string "-" "-" ">"
   since recent changes have made the parser eager to find such strings)

   00:01.000 --> 00:02.000
   Bla

The second we could handle in a similar way by requiring authors to put such notes between cues, rather than before or after cues:


   WEBVTT

   00:01.000 --> 00:02.000
   Bla

   NOTE i'm not sure the next cue's timing is right

   00:02.000 --> 00:03.000
   Bla

   00:03.000 --> 00:04.000
   Bla

(such a note couldn't include --> either.)
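Below is a minimal parser-side sketch of this proposal. It splits the body at empty lines and skips blocks that start with the keyword and contain no "-->". Blocks containing "-->" are kept as cue candidates. The sketch assumes the keyword is "NOTE" followed by a space, a tab, or the end of the line; note that the first example above spells it "NOTES". Timings are not parsed, and all names are hypothetical.

```python
def split_blocks(text: str) -> list[list[str]]:
    """Split everything after the signature line into blocks at empty lines."""
    lines = text.replace("\r\n", "\n").split("\n")
    blocks, current = [], []
    for line in lines[1:]:  # lines[0] is the "WEBVTT" signature
        if line == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks

def is_note(block: list[str]) -> bool:
    first = block[0]
    starts_with_note = first == "NOTE" or first.startswith(("NOTE ", "NOTE\t"))
    # A note containing "-->" is not a note; that is why notes must avoid it.
    return starts_with_note and not any("-->" in line for line in block)

def cue_blocks(text: str) -> list[list[str]]:
    """Blocks that look like cues: not notes, and containing a "-->" line."""
    return [b for b in split_blocks(text)
            if not is_note(b) and any("-->" in line for line in b)]
```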


The last two 
================================================================================
 #17  Ian 'Hixie' Hickson                             2011-12-07 23:26:21 +0000 
--------------------------------------------------------------------------------
I'm leaning towards only supporting comment blocks, not supporting inline comments and not supporting commenting out cues.
================================================================================
 #18  Silvia Pfeiffer                                 2011-12-11 08:42:09 +0000 
--------------------------------------------------------------------------------
Philip: in your SRT collection, what types of comments are you seeing? Do you see attempts at comments inside cues and also completely commented out cues with some of the special characters that we discussed?

My thought is that even if there are just one or two examples for these, they indicate a need, which people tried to satisfy in unconventional ways.
================================================================================
 #19  Philip J                                        2011-12-11 10:55:48 +0000 
--------------------------------------------------------------------------------
(In reply to comment #18)
> Philip: in your SRT collection, what types of comments are you seeing? Do you
> see attempts at comments inside cues and also completely commented out cues
> with some of the special characters that we discussed?
> 
> My thought is that even if there are just one or two examples for these, they
> indicate a need, which people tried to satisfy in unconventional ways.

I'm not sure how to search for that given that SRT doesn't actually have a comment syntax. One trick I know exists is to create a cue with no duration, e.g.

00:00:00.000 --> 00:00:00.000
this is a comment

I don't think there's any trick for intra-cue comments that would work in SRT.
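A converter from SRT could detect this trick and treat such cues as comments rather than displayable cues. Below is a sketch of the detection step. It assumes timestamps of the form HH:MM:SS with "," or "." before the milliseconds. What to do with a detected cue is left open, and the helper names are hypothetical.

```python
import re

# "HH:MM:SS,mmm" as in SRT, or with "." before the milliseconds.
_TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_ms(ts: str) -> int:
    m = _TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not a timestamp: {ts!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms

def is_zero_duration(timing_line: str) -> bool:
    start, _, rest = timing_line.partition("-->")
    end = rest.split()[0]  # ignore anything after the end timestamp
    return to_ms(start) == to_ms(end)
```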
================================================================================
 #20  Silvia Pfeiffer                                 2012-02-02 04:02:30 +0000 
--------------------------------------------------------------------------------
Gathering the information from different discussion threads here:

An earlier suggestion by Ian was:
>
> For example:
>
>  00:00.000 --> 00:01.000
>  one <! inline comment > one
>
>  COMMENT-->
>  00:02.000 --> 00:03.000
>  two; this is entirely
>  commented out
>
>  <! this is the ID line
>  00:04.000 --> 00:05.000
>  three; last line is a ">"
>  which is part of the cue
>  and is not a comment.
>  >
>
> The above would work today in a conforming UA. The question really is what
> parts of this do we want to support and what do we not care enough about.

I'm bringing this up again because the WebM community is currently discussing how to encapsulate WebVTT into WebM, see https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit?hl=en_US .

It has some thoughts on how to handle the features that we are still discussing, comments amongst them.

It is time we resolved those outstanding features.
================================================================================
 #21  Silvia Pfeiffer                                 2012-02-07 03:31:22 +0000 
--------------------------------------------------------------------------------
Something that Glenn Maynard brought up on the mailing list:
we could support inline comments simply as a special type of class, e.g.

WEBVTT

00:00.000 --> 00:01.000
one <c hidden>inline comment</c> two


We then just need a default style for class=hidden that maps to CSS display:none, and that standalone players treat as meaning hidden.

That would solve the inline comments case.

Then we just need to solve commenting out blocks or cues.
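A player that does not apply CSS would have to special-case the hidden class in code. The sketch below shows that case. It assumes the class is spelled either as in the example above or with WebVTT's dot-class syntax, "<c.hidden>". Nested <c> tags inside a hidden span are not handled, and the function name is hypothetical.

```python
import re

# "<c hidden>...</c>" as written above, or "<c.hidden>...</c>".
HIDDEN_SPAN = re.compile(r"<c(?: |\.)hidden>.*?</c>", re.DOTALL)

def text_for_non_css_player(cue_text: str) -> str:
    """Drop hidden spans entirely instead of relying on display:none."""
    return HIDDEN_SPAN.sub("", cue_text)
```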
================================================================================
 #22  Ian 'Hixie' Hickson                             2012-04-25 18:46:41 +0000 
--------------------------------------------------------------------------------
I think we shouldn't solve the inline comment case. (We can "solve" it by letting people hide bits of cues as in comment 21, but that shouldn't be an official solution, since it mixes presentation and content in a way that would break in non-CSS UAs.)

I don't really understand the use case for commenting out a cue but leaving it in.

For the remaining use cases I'm planning on going with comment 16's proposal.
================================================================================
Comment 1 Silvia Pfeiffer 2012-08-17 12:41:39 UTC
Moving to Text Track CG where WebVTT is being specified.
Comment 2 Ian 'Hixie' Hickson 2012-08-29 23:49:35 UTC

*** This bug has been marked as a duplicate of bug 14552 ***
Comment 3 Silvia Pfeiffer 2012-08-30 02:57:38 UTC
Hmm... this is a bug of the TextTracks CG - are you now moving all bugs that relate to WebVTT to the WHATWG?