This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 22104 - [WebVTT] Having a cue-setting keyword immediately following cuetimes (without whitespace) seems to be legal
Summary: [WebVTT] Having a cue-setting keyword immediately following cuetimes (without...
Status: RESOLVED WONTFIX
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT (show other bugs)
Version: unspecified
Hardware: All All
: P2 normal
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: This bug has no owner yet - up for the taking
URL:
Whiteboard: v1
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-21 05:52 UTC by Caitlin Potter (:caitp)
Modified: 2014-01-30 21:19 UTC (History)
7 users (show)

See Also:


Attachments

Description Caitlin Potter (:caitp) 2013-05-21 05:52:36 UTC
According to http://dev.w3.org/html5/webvtt/#dfn-collect-webvtt-cue-timings-and-settings, the string immediately following the second webvtt timestamp is treated as cuetimes, and that substring is split on whitespace. However this doesn't seem to take the initial whitespace into account, therefore the following string: `00:00.100 --> 00:05.000position:20% line:-3` is apparently totally legal.

It's not clear if this is intentional or not. Intuitively, it doesn't really make sense to not include some kind of separator between the timestamp and setting.
Comment 1 Simon Pieters 2013-05-21 07:53:02 UTC
It's not legal. The syntax section requires whitespace there:

[[
Optionally, one or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters followed by WebVTT cue settings.
]]
http://dev.w3.org/html5/webvtt/#syntax

Do you think the parser should do something differently when it encounters a cue without whitespace before the settings?
Comment 2 Caitlin Potter (:caitp) 2013-05-21 08:08:22 UTC
According to what you've said in the previous post, this section of the spec is entirely meaningless because it does not match what the parsing algorithm is said to do.

The parsing algorithm does not say anything about requiring one or more spaces, and if a parser implementation is not supposed to care about the "authoring" spec, then how exactly is the parser implementor supposed to know that the grammar requires this space?

I will point you to the WebKit/Blink implementation, which currently do not look for a space and treat the string I showed as a demonstration as valid. And this is entirely in line with what the parsing spec actually says to do.
Comment 3 Simon Pieters 2013-05-21 10:45:01 UTC
The parser implementor isn't supposed to care about what the syntax requires of authors.

The WebKit/Blink implementation conforms to what the spec requires.

You're confusing requirements for authors and requirements for implementors. They are orthogonal.
Comment 4 Caitlin Potter (:caitp) 2013-05-21 13:00:23 UTC
This is what I'm talking about though, it is silly and not helpful to have a different definition for the language that the parser interprets, and the language that authors are supposed to create. There is no real sensible reason for this. The "authoring syntax" isn't simply a "subset of the acceptable grammar", it takes different turns on the same phrases. This creates two problems. One is the confusion for developers IE WebKit/Blink as well as Mozilla partially implementing the language defined in the "syntax" section, and partially the language defined by the parsing algorithm. They are not really the same language, one is not a subset of the other, unless you squint really hard.

I understand that there are good, optimistic intentions for these two languages, but they don't seem to be realistic, as problems have already been occurring. Even if you make it extremely clear to implementors, it's a lot harder to clear the confusion for authors.

So, when WebKit refuses to accept a cue where the cue-times do not surround the "-->" separator with whitespace characters, it is complying with the syntax section, and not the parsing algorithm. However, when it accepts a cue-setting keyword immediately adjacent to a cue-times timestamp, without a separating whitespace character, it is complying with the parsing algorithm and NOT the syntax. So you can see that there is already a mishmash of these two languages in a single implementation. WebKit/Blink aren't alone in doing this, our implementation is quite guilty of it too.

It's not so much a matter of the specifications being poorly written or confusing, it's just the fact that there are two separate definitions for the same language, and those definitions are in some cases contradictory.

The point of these bugs is to clarify the vagueness of the parsing algorithm, but it seems like I can't get into this conversation without saying how problematic it is to have two conflicting definitions for one language. Again, I know there are good intentions behind it, but is there any evidence that those good intentions actually have good results? are there not better ways to define backwards compatible extensions to the language?
Comment 5 Caitlin Potter (:caitp) 2013-05-21 13:34:43 UTC
>The WebKit/Blink implementation conforms to what the spec requires.

This is not really the case, as I've stated above, the WebKit/Blink implementation is a mishmash of the requirements for authors, and the parsing algorithm. https://github.com/WebKit/webkit/blame/master/Source/WebCore/html/track/WebVTTParser.cpp#L284 << This shows the parser discarding cues where the "-->" separator is not surrounded by whitespace (contrary to the parsing algorithm, but in compliance with the syntax section), and https://github.com/WebKit/webkit/blame/master/Source/WebCore/html/track/WebVTTParser.cpp#L306 shows the parser setting everything from after the end timestamp, to the end of the line, as cue-settings (and therefore not caring if there is whitespace separating the first cue-setting from the end timestamp). This is compliant with the parsing algorithm (although it probably shouldn't be), but is not compliant with the syntax spec.

This is a mishmash, the requirements of these two sections are contradictory, and having two different definitions for the same language is more of a hinderance than a help. I'm trying to explain this very clearly, but it seems like I'm not getting through :(
Comment 6 Simon Pieters 2013-05-21 18:03:25 UTC
(In reply to comment #5)
> >The WebKit/Blink implementation conforms to what the spec requires.
> 
> This is not really the case, as I've stated above, the WebKit/Blink
> implementation is a mishmash of the requirements for authors, and the
> parsing algorithm.
> https://github.com/WebKit/webkit/blame/master/Source/WebCore/html/track/
> WebVTTParser.cpp#L284 << This shows the parser discarding cues where the
> "-->" separator is not surrounded by whitespace (contrary to the parsing
> algorithm, but in compliance with the syntax section),

In comment 2, which was what I replied to, you claimed the opposite. If WebKit/Blink doesn't conform to the parsing section, then it's not compliant with the spec.

> and
> https://github.com/WebKit/webkit/blame/master/Source/WebCore/html/track/
> WebVTTParser.cpp#L306 shows the parser setting everything from after the end
> timestamp, to the end of the line, as cue-settings (and therefore not caring
> if there is whitespace separating the first cue-setting from the end
> timestamp). This is compliant with the parsing algorithm (although it
> probably shouldn't be), but is not compliant with the syntax spec.

Why shouldn't it be compliant with the parsing algorithm? The parsing algorithm is the requirement that applies to your conformance class. The syntax section does not apply to implementors.

> This is a mishmash, the requirements of these two sections are
> contradictory, and having two different definitions for the same language is
> more of a hinderance than a help. I'm trying to explain this very clearly,
> but it seems like I'm not getting through :(

I understand the issue, but I don't think it should be fixed by making the authoring requirements and parser requirements "match". I don't object to making a few tweaks if people get confused, like omitting whitespace around the arrow.

Note that the entire HTML spec does the same thing (not just the HTML parser). So if you think this is a problem, it exists for most things in HTML, too.
Comment 7 Caitlin Potter (:caitp) 2013-05-21 22:12:36 UTC
>Why shouldn't it be compliant with the parsing algorithm? The parsing algorithm >is the requirement that applies to your conformance class. The syntax section >does not apply to implementors.
I'm saying that <timestamp><cue-setting> without whitespace in between should not be legal. This is technically only legal for the very first cue-setting (which is inconsistent with the rest of it, which is supposed to be split on whitespace).

This particular clause in the parsing algorithm makes the grammar inconsistent and weird.

I should also add, in WebKit/Blink, <cue-setting><cue-setting> with no separating whitespace are also perfectly valid. This isn't valid in the parsing algorithm or the syntax spec, but it just shows the kinds of confusion can happen when these things aren't stated more explicitly, I guess.

>Note that the entire HTML spec does the same thing (not just the HTML parser).
>So if you think this is a problem, it exists for most things in HTML, too.
I'm certainly not saying that HTML/SGML have ever been perfect in this sense before either, considering how many quirks mode problems have come up over the years. Having a uniformity between the language that authors are expected to speak, and parsers are expected to understand, and having that uniformity expressed in a very clear, explicit way, is important -- Otherwise, you're being vague on what the syntax of the grammar actually is, which makes it essentially impossible to get people to consistently write conformant documents and parser implementations.
Comment 8 Silvia Pfeiffer 2013-05-24 11:40:08 UTC
(In reply to comment #7)
> 
> I should also add, in WebKit/Blink, <cue-setting><cue-setting> with no
> separating whitespace are also perfectly valid. This isn't valid in the
> parsing algorithm or the syntax spec, but it just shows the kinds of
> confusion can happen when these things aren't stated more explicitly, I
> guess.

That would be a Webkit/blink bug and worth registering.
Comment 9 Philip Jägenstedt 2014-01-28 04:39:00 UTC
(In reply to Silvia Pfeiffer from comment #8)
> (In reply to comment #7)
> > 
> > I should also add, in WebKit/Blink, <cue-setting><cue-setting> with no
> > separating whitespace are also perfectly valid. This isn't valid in the
> > parsing algorithm or the syntax spec, but it just shows the kinds of
> > confusion can happen when these things aren't stated more explicitly, I
> > guess.
> 
> That would be a Webkit/blink bug and worth registering.

I've tested this in stable versions of Opera (Blink) and Safari (WebKit) and both get the right line and size for

00:00.000 --> 00:01.000line:40% size:50%
text

but get the default line and size if the space between the settings is removed:

00:00.000 --> 00:01.000line:40%size:50%
text

In other words, both do what the spec says in this regard. (I don't know what it was like when this bug was filed.)
Comment 10 Silvia Pfeiffer 2014-01-28 04:46:44 UTC
(In reply to Philip Jägenstedt from comment #9)
> 
> I've tested this in stable versions of Opera (Blink) and Safari (WebKit) and
> both get the right line and size for
> 
> 00:00.000 --> 00:01.000line:40% size:50%
> text
> 
> but get the default line and size if the space between the settings is
> removed:
> 
> 00:00.000 --> 00:01.000line:40%size:50%
> text
> 
> In other words, both do what the spec says in this regard. (I don't know
> what it was like when this bug was filed.)

Hmm, that seems inconsistent. I'd prefer if both broke and reset the settings to default values, since it's the blanks that separate cue settings.

What do you think about requiring a space character after the time stamp?
Comment 11 Philip Jägenstedt 2014-01-28 05:21:10 UTC
Putting aside the question of syntax vs parser (they need not agree), I do find it interesting that the parser requires whitespace between settings, but not between timings and settings.

The way to fix this would be to add a step before "Let remainder be the trailing substring of input starting at position." + "Parse the WebVTT cue settings given by remainder for cue." to say:

"If the character at position is not one of the space characters [HTML5] then abort these steps and return failure."

In other words, "fixing" the parser would only add to it, it's not a simplification as one might have guessed.
Comment 12 Philip Jägenstedt 2014-01-28 05:26:30 UTC
I did a bit of testing, and WebKit, Blink and Presto all do what the spec says here and do not require whitespace between timings and settings.

Gecko, however, does, so I filed a bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=964625

I couldn't get IE11 to parse any of my WebVTT at all, so I gave up.

Since the implementations that have actually shipped and which I could test (WebKit, Blink and Presto) all follow the spec on this point, I'm inclined to just not leave the parser alone. Will resolve this bug as WONTFIX unless Rick Eyre objects that Gecko has deliberately ignored the spec on this point. (I will email him.)
Comment 13 Silvia Pfeiffer 2014-01-28 06:39:38 UTC
(In reply to Philip Jägenstedt from comment #11)
> Putting aside the question of syntax vs parser (they need not agree), I do
> find it interesting that the parser requires whitespace between settings,
> but not between timings and settings.
> 
> The way to fix this would be to add a step before "Let remainder be the
> trailing substring of input starting at position." + "Parse the WebVTT cue
> settings given by remainder for cue." to say:
> 
> "If the character at position is not one of the space characters [HTML5]
> then abort these steps and return failure."

Why not ignore the rest of the line and return success in this case?

> In other words, "fixing" the parser would only add to it, it's not a
> simplification as one might have guessed.

I didn't expect it to simplify things, but make them more logical for authors.

> Since the implementations that have actually shipped and which I could test
> (WebKit, Blink and Presto) all follow the spec on this point, I'm inclined to
> just not leave the parser alone. Will resolve this bug as WONTFIX unless Rick
> Eyre objects that Gecko has deliberately ignored the spec on this point. (I
> will email him.)

I regard (WebKit, Blink, Presto) as one implementation, so I don't think this argument weighs that much.

I don't have IE10/11 at hand, but here's an example that should work and that you could base changes on: http://www.html5labs.com/HTML5CaptionDemo/. FAIK IE10 doesn't support cue settings but IE11 does.
Comment 14 Philip Jägenstedt 2014-01-28 09:20:37 UTC
(In reply to Silvia Pfeiffer from comment #13)
> (In reply to Philip Jägenstedt from comment #11)
> > Putting aside the question of syntax vs parser (they need not agree), I do
> > find it interesting that the parser requires whitespace between settings,
> > but not between timings and settings.
> > 
> > The way to fix this would be to add a step before "Let remainder be the
> > trailing substring of input starting at position." + "Parse the WebVTT cue
> > settings given by remainder for cue." to say:
> > 
> > "If the character at position is not one of the space characters [HTML5]
> > then abort these steps and return failure."
> 
> Why not ignore the rest of the line and return success in this case?

We could do that, but it's still more work than not doing anything.

> > In other words, "fixing" the parser would only add to it, it's not a
> > simplification as one might have guessed.
> 
> I didn't expect it to simplify things, but make them more logical for
> authors.

Are there any other cases where the parser has extra steps only in order to match the syntax?

> > Since the implementations that have actually shipped and which I could test
> > (WebKit, Blink and Presto) all follow the spec on this point, I'm inclined to
> > just not leave the parser alone. Will resolve this bug as WONTFIX unless Rick
> > Eyre objects that Gecko has deliberately ignored the spec on this point. (I
> > will email him.)
> 
> I regard (WebKit, Blink, Presto) as one implementation, so I don't think
> this argument weighs that much.

Presto is Opera's old engine, its implementation of WebVTT has no common code with Blink/WebKit.

> I don't have IE10/11 at hand, but here's an example that should work and
> that you could base changes on: http://www.html5labs.com/HTML5CaptionDemo/.
> FAIK IE10 doesn't support cue settings but IE11 does.

I tried some more, but even if I send the right Content-Type and use the same file as in their example, I just can't get IE11 to parse any cues in my test. In any event their example WebVTT files use the old single-character syntax for settings, so if that's what IE10 implemented then it's not relevant to this bug. (If someone else can determine what IE actually supports and not, that'd be nice.)
Comment 15 Silvia Pfeiffer 2014-01-28 11:39:50 UTC
I guess what bugs me most is that

> 00:00.000 --> 00:01.000line:40% size:50%
> text

gets parsed but

> 00:00.000 --> 00:01.00 line:40% size:50%
> text

and also

> 00:00.000 --> 00:01.00line:40% size:50%
> text

don't. Of the three, I'd find the middle one the most readable and the hardest to see the error.

I guess I'm finding it a bit fragile to have a change between ASCII digits and ASCII letters a field boundary, in particular for a file format that uses newlines and space characters as boundaries elsewhere.

But I can live with it if you think it introduces unnecessary extra effort.

Let's give Rick a chance to reply. :-)
Comment 16 Rick Eyre 2014-01-28 14:39:25 UTC
Just responded to Philip's email, but I'll posterify my comment as well.

I think it's good to have the parser be lenient with syntax so that more often than not the parser understands the authors intent rather than it being nitpick-y with syntax.

There isn't a specific reason we're parsing like this either. It's just how the parser's regex was written initially.

It's a one line fix for Firefox too, so it's not a big deal.
Comment 17 Philip Jägenstedt 2014-01-28 14:44:09 UTC
Thanks for the feedback, Rick! With that information, I'm going to take the path of least resistance and WONTFIX this bug. (Obviously, omitting the space is still going to be invalid in the syntax, so a validator would complain.)
Comment 18 Simon Pieters 2014-01-30 09:56:32 UTC
(In reply to Silvia Pfeiffer from comment #15)
> I guess what bugs me most is that
> 
> > 00:00.000 --> 00:01.000line:40% size:50%
> > text
> 
> gets parsed but
> 
> > 00:00.000 --> 00:01.00 line:40% size:50%
> > text
> 
> and also
> 
> > 00:00.000 --> 00:01.00line:40% size:50%
> > text
> 
> don't. Of the three, I'd find the middle one the most readable and the
> hardest to see the error.

How would you recover the middle one, though? In that example it's "obvious" that it's a 0 that is missing, but what about 00:01.50? is that 500 or 050? What about 00:5:10.000? You can't know what was intended (could be any of [0-5]5 or 5[0-9]), so trying to recover can result in much worse user experience than just dropping the cue (e.g. a cue could be visible in half a movie if you guessed wrong).
Comment 19 Silvia Pfeiffer 2014-01-30 21:19:18 UTC
(In reply to Simon Pieters from comment #18)
> (In reply to Silvia Pfeiffer from comment #15)
> > I guess what bugs me most is that
> > 
> > > 00:00.000 --> 00:01.000line:40% size:50%
> > > text
> > 
> > gets parsed but
> > 
> > > 00:00.000 --> 00:01.00 line:40% size:50%
> > > text
> > 
> > and also
> > 
> > > 00:00.000 --> 00:01.00line:40% size:50%
> > > text
> > 
> > don't. Of the three, I'd find the middle one the most readable and the
> > hardest to see the error.
> 
> How would you recover the middle one, though? In that example it's "obvious"
> that it's a 0 that is missing, but what about 00:01.50? is that 500 or 050?
> What about 00:5:10.000? You can't know what was intended (could be any of
> [0-5]5 or 5[0-9]), so trying to recover can result in much worse user
> experience than just dropping the cue (e.g. a cue could be visible in half a
> movie if you guessed wrong).

Sure. On a logical level, it all makes sense. Just saying that from an author's POV, they all seems like similarly bad authoring mistakes. Anyway, life's not fair. ;-)