This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6020 - Validator incorrectly uses strings as elements
Status: RESOLVED INVALID
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: HEAD
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
Reported: 2008-09-02 18:37 UTC by Kevin Hunter
Modified: 2008-09-04 17:49 UTC

Attachments
HTML 4.01 Strict compliant file (305 bytes, text/html)
2008-09-02 18:37 UTC, Kevin Hunter
patch against previous attachment to highlight parser bug (458 bytes, text/plain)
2008-09-02 18:39 UTC, Kevin Hunter
Example HTML (446 bytes, text/html)
2008-09-04 16:51 UTC, Jason
XHTML with no ETAGO in script (443 bytes, text/html)
2008-09-04 17:30 UTC, Jason

Description Kevin Hunter 2008-09-02 18:37:19 UTC
Created attachment 566
HTML 4.01 Strict compliant file

Basically, the parser (0.8.3, I think) is interpreting text inside a JavaScript string as tags.

This is best highlighted by the two attachments I'll add.  The first attachment is HTML 4.01 Strict compliant.

The second attachment is a unified diff that will break the parser.  The parser will assume that the strings in lines 9 and 10 begin elements, but this is incorrect.
Comment 1 Kevin Hunter 2008-09-02 18:39:24 UTC
Created attachment 567
patch against previous attachment to highlight parser bug

$ patch -o html_borked.html < html_diff.diff

Basically, the parser will think that lines 9 and 10 begin a '<script>' tag, and also end a '</scr>' tag.
Comment 2 Olivier Thereaux 2008-09-04 15:21:07 UTC
(In reply to comment #0)
> Basically, the parser (0.8.3, I think) is interpreting text inside a
> JavaScript string as tags.

Which it should, per the specification.

See, e.g.: http://htmlhelp.com/tools/validator/problems.html#script
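
A minimal sketch of the usual workaround, assuming the script is inline in an HTML 4 document and the markup appears only inside JavaScript string literals. The HTML 4 notes recommend writing "</" as "<\/" in such strings, so that the validator does not see an end-tag open delimiter (ETAGO) inside the script element. The function names `findFirstEtago` and `escapeEtago` and the sample string are hypothetical and are not taken from the attachments.

```javascript
// Return the index of the first "</" followed by a letter (an ETAGO),
// or -1 if the script content contains none.
function findFirstEtago(scriptContent) {
  const match = /<\/[A-Za-z]/.exec(scriptContent);
  return match ? match.index : -1;
}

// Rewrite every "</" as "<\/". Inside a JavaScript string literal the
// two spellings denote the same characters, so the script's behavior
// is unchanged while the markup no longer contains "</".
// Assumption: "</" occurs only inside string literals.
function escapeEtago(scriptContent) {
  return scriptContent.replace(/<\//g, "<\\/");
}

// Hypothetical inline script that a validator would trip over.
const risky = 'document.write("<em>hi</em>");';
const safe = escapeEtago(risky);

console.log(findFirstEtago(risky) !== -1); // the original contains an ETAGO
console.log(findFirstEtago(safe) === -1);  // the escaped version does not
console.log(safe);
```

Moving the script into an external file avoids the problem as well, since the validator then never parses the script text.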

Comment 3 Kevin Hunter 2008-09-04 15:46:19 UTC
Doh!  And there's even a link to an FAQ about scripts sections in the output.  Thanks.
Comment 4 Jason 2008-09-04 16:51:53 UTC
Created attachment 575
Example HTML
Comment 5 Jason 2008-09-04 16:57:59 UTC
Yes, however, it should not be semantically parsing HTML content in script
tags. In other words, as long as the content of the script tag is valid XML,
should it not be valid? For instance, in my attachment I receive two errors.
The first states I have an invalid value for my 'id' attribute. And the second
that the element 'li' does not belong there.
Comment 6 Olivier Thereaux 2008-09-04 17:22:20 UTC
(In reply to comment #5)
> Yes, however, it should not be semantically parsing HTML content in script
> tags. 

Like it or not, that is what the specifications for (X)HTML say, e.g.
http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1

Comment 7 Jason 2008-09-04 17:30:13 UTC
Created attachment 576
XHTML with no ETAGO in script
Comment 8 Jason 2008-09-04 17:33:51 UTC
I've added an attachment that has html content nested inside the script tag. Per the spec at http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1 there is no ETAGO ("</...") which terminates the script tag.  Still I am getting two validation errors which are incorrect per the spec.
Comment 9 Olivier Thereaux 2008-09-04 17:49:56 UTC
(In reply to comment #8)
> I've added an attachment that has html content nested inside the script tag.
> Per the spec at http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1 there
> is no ETAGO ("</...") which terminates the script tag.  Still I am getting two
> validation errors which are incorrect per the spec.

I'm afraid not. Your example has markup (inside a <script>, but the point is, it does NOT matter to an HTML parser) including an id starting with a $ sign. That's not valid.
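
A minimal sketch of the id rule behind that last error, assuming the HTML 4 definition of ID tokens: a letter first, then any number of letters, digits, hyphens, underscores, colons, and periods. The function name `isValidHtml4Id` and the sample values are hypothetical; the actual id in the attachment is not reproduced here. XHTML takes its rule from XML names, which differs slightly (it also allows a leading underscore or colon) but likewise excludes "$".

```javascript
// HTML 4 ID token: starts with a letter, followed by letters, digits,
// hyphens, underscores, colons, or periods.
const HTML4_ID_PATTERN = /^[A-Za-z][A-Za-z0-9\-_:.]*$/;

// Check a candidate id value against the HTML 4 rule only.
function isValidHtml4Id(value) {
  return HTML4_ID_PATTERN.test(value);
}

// Hypothetical values, for illustration only.
console.log(isValidHtml4Id("$item"));  // false: starts with "$"
console.log(isValidHtml4Id("item-1")); // true
console.log(isValidHtml4Id("1item"));  // false: starts with a digit
```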