This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6125 - HTML validator fails for "--" within comment blocks
Summary: HTML validator fails for "--" within comment blocks
Status: RESOLVED INVALID
Alias: None
Product: HTML Checker
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: P5 critical
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
Reported: 2008-09-29 16:04 UTC by Frank
Modified: 2015-08-23 07:07 UTC

Description Frank 2008-09-29 16:04:57 UTC
The following HTML fails validation with the W3C validator:

<!-- this is a -- comment -->

The "--" inside the comment block causes validation to fail.
Comment 1 Dean Edridge 2008-09-29 16:49:59 UTC
The (X)HTML5 spec says that a sequence of two dashes ("--") within a comment is non-conforming.

quoting: http://www.whatwg.org/specs/web-apps/current-work/#comments

" ... nor contain two consecutive U+002D HYPHEN-MINUS (-) characters, ... "

Thanks
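
For illustration, the part of the rule quoted above can be checked mechanically. This is a minimal sketch in Python, not the validator's actual code: the function names are hypothetical, and only the quoted "two consecutive hyphens" clause is covered, since the rest of the spec sentence is elided in the quote.

```python
def comment_body(comment: str) -> str:
    """Return the text between the "<!--" and "-->" delimiters."""
    if len(comment) < 7 or not (comment.startswith("<!--") and comment.endswith("-->")):
        raise ValueError("expected a complete <!-- ... --> comment")
    return comment[4:-3]


def violates_double_hyphen_rule(comment: str) -> bool:
    """True if the comment text contains two consecutive U+002D characters."""
    return "--" in comment_body(comment)


print(violates_double_hyphen_rule("<!-- this is a -- comment -->"))  # True
print(violates_double_hyphen_rule("<!-- this is a comment -->"))     # False
```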
Comment 2 Frank 2008-09-29 21:02:19 UTC
It is fine that comments start with "<!--" and end with "-->". I wonder why there is a restriction against having "-" or "--" inside the comment block.

Is there any technical reason? Once the parser sees "<!--", the next thing to look for is presumably "-->", so why is a "--" in between checked at all?
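
The lenient scan described in this question can be sketched as follows. This is only an illustration in Python of the "look for the next -->" approach; the function name is hypothetical, and it is not a description of how any particular browser or the validator is implemented.

```python
def lenient_comment_bodies(markup: str) -> list[str]:
    """Collect comment bodies, treating only "-->" as the terminator."""
    bodies = []
    pos = 0
    while True:
        start = markup.find("<!--", pos)
        if start == -1:
            break
        end = markup.find("-->", start + 4)
        if end == -1:
            break  # unterminated comment: stop scanning
        bodies.append(markup[start + 4:end])
        pos = end + 3
    return bodies


print(lenient_comment_bodies("<p>a</p><!-- this is a -- comment --><p>b</p>"))
# [' this is a -- comment ']
```

Under this reading the inner "--" is just part of the comment text, which is why the markup appears to work even though the validator reports it.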

The major browsers have no problem parsing such HTML files, so it is almost impossible for a webmaster to catch this error. Unless browsers report it so that webmasters notice the problem right away, this should be fixed...

And I don't think it is difficult to fix?

Any comments?

Thanks.
Comment 3 Damien B 2008-09-29 21:10:24 UTC
"Is there any technical reason? If the parser sees "<!--", the next thing to
look for is probably "-->", I wonder why "--" in between is checked?"

Because, as stated in the specification (http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4), the comment close delimiter is "--", not "-->".
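
To make the consequence concrete, here is a simplified sketch in Python of an SGML-style reading of a comment declaration: "<!" opens the declaration, each comment runs from "--" to the next "--", white space may follow a closing "--", and ">" closes the declaration. The function name is hypothetical and the grammar is deliberately reduced; it is not the validator's parser.

```python
def parse_comment_declaration(decl: str) -> list[str]:
    """Parse '<!' ('--' text '--' whitespace*)* '>' and return the comment texts."""
    if not decl.startswith("<!"):
        raise ValueError("not a markup declaration")
    pos = 2
    comments = []
    while True:
        if decl.startswith("--", pos):
            close = decl.find("--", pos + 2)
            if close == -1:
                raise ValueError("comment opened with '--' but never closed")
            comments.append(decl[pos + 2:close])
            pos = close + 2
            # White space is permitted after the comment close delimiter.
            while pos < len(decl) and decl[pos].isspace():
                pos += 1
        elif decl.startswith(">", pos) and pos == len(decl) - 1:
            return comments
        else:
            raise ValueError(f"expected '--' or '>' at offset {pos}")


print(parse_comment_declaration("<!-- a comment -->"))  # [' a comment ']
try:
    parse_comment_declaration("<!-- this is a -- comment -->")
except ValueError as err:
    print("rejected:", err)
```

In the reported example the second "--" closes the comment, so the word "comment" that follows is neither white space, a new "--", nor ">", and the declaration is rejected.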

Comment 4 Frank 2008-09-29 22:06:58 UTC
The specification is fine.

It says that a comment ends with "--" followed by ">", and that there can be white space in between.

This means that if a "--" is not followed by ">", it should not be treated as the end of the comment, and the parser should keep looking for the end of the comment...

In other words, according to the specification, if there are non-white-space characters between "--" and ">", it is not the end of the comment and should not be considered illegal code.

At least the majority of browsers allow this code...

If there were no requirement that "--" be followed by ">", then I could understand the problem. But since there is already a definite requirement that "--" must be followed by ">" (with white space allowed in between), all the arguments above impose more restrictions than the specification does...


Thanks
Comment 5 Dean Edridge 2008-09-30 04:35:31 UTC
(In reply to comment #4)
> The specification is fine.
> 
> It says the comments end with "--" followed by ">", and there could be white
> spaces in between.
> 
> Which means if there is "--" but not followed by ">", it should not be
> considered as the end of the comment and the parser should continue looking for
> end of comments...
> 
> In another word, from the specification, if there are non-white space
> characters in between "--" and ">",

> it is not the end of comments and should
> not be considered illegal codes.

The spec covers both HTML and XHTML, meaning that the vocabulary can be parsed by both HTML and XML parsers. The HTML5 spec has been developed to have as few differences as possible between HTML and XHTML. The sequence "--" inside a comment is a fatal error in XML, which is perhaps why the editors of the spec decided to make it non-conforming in HTML as well.
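
The XML side of this can be demonstrated with any conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree module as one example; the helper name and the wrapping <root> element are only there for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(document: str) -> bool:
    """Return True if the document parses as XML, False on a parse error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml("<root><!-- this is a comment --></root>"))     # True
print(is_well_formed_xml("<root><!-- this is a -- comment --></root>"))  # False
```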

> 
> At least the majority browsers allow the codes...

That's not really a good reason; there may be other situations where it doesn't work correctly.

> 
> If there is no restriction on "--" followed by ">", then I could understand the
> problem, but since there is already a definitive restriction that "--" must be
> followed by ">" (with white spaces allowed in between), then all the arguments
> above impose more restrictions not specified in the specifications...
> 
> 
> Thanks
> 

If people disagree with what the spec says or don't understand the rationale for decisions such as these, they are welcome to post comments on the public-html-comments@w3.org mailing list. If anyone wants to subscribe to this list, just follow this link: http://lists.w3.org/#public-html-comments and you'll see the options.

Also, when replying on Bugzilla you're supposed to click the "reply" link alongside the last poster's name, and not use the "Additional Comments:" section. That was my mistake earlier, sorry.

Thanks
Comment 6 Frank 2008-10-01 18:01:42 UTC
(In reply to comment #5)
> The spec is both HTML and XHTML, meaning that the vocabulary will/can be parsed
> by both HTML and XML parsers. The HTML5 spec has been developed to have as
> little as possible differences between HTML and XHTML. The sequence "--" is a
> fatal error in XML which is perhaps why the editors of the spec decided to make
> it non-conforming in HTML as well. 
> 

I am not sure this is a valid argument. If we want to make HTML follow XML's rules, then we probably should not have HTML...

I am not sure what language the parser is written in; if it is C++, it should not really take much time to override a virtual method with the help of a regular expression...


> Also, when replying on bugzilla you're supposed to click the "reply" link along
> side the last posters name, and not use the "Additional Comments:" section.
> That was my mistake earlier, sorry.  
> 
> Thanks
> 
My bad, sorry...