This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 139 - Validator should not attempt to parse incorrect doctypes with SGML parser
Summary: Validator should not attempt to parse incorrect doctypes with SGML parser
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.7.0
Hardware: All All
Importance: P2 enhancement
Target Milestone: 0.7.0
Assignee: Terje Bless
QA Contact:
Depends on:
Blocks: 856
Reported: 2003-02-06 16:30 UTC by David P James
Modified: 2006-09-23 19:50 UTC
CC List: 0 users

See Also:


Description David P James 2003-02-06 16:30:34 UTC
Many of my web pages were originally created with Netscape 4.x's Composer and,
as a result, have incorrect doctypes (several words in the public identifier
have the wrong case):
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">

I have been trying to fix the errors in those pages to bring them up to standard
and was using W3C's validator to help me do so. Unfortunately, when presented
with a page that has such a doctype, the validator balks and then sends the page
through the SGML parser anyway, which essentially "invents" errors where there
aren't any.

What it should do instead is simply report that an incorrect doctype was found
and that no further validation was done, and then offer a drop-down box of
doctypes to choose from to continue the validation (as is done when no doctype
is present).
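The fallback described above, where the user picks a doctype from a drop-down and validation continues, amounts to swapping the document's DOCTYPE declaration for the chosen one before parsing. A minimal Python sketch of that idea; `DOCTYPE_CHOICES` and `apply_doctype_override()` are hypothetical names, not the validator's actual code, and only the three HTML 4.01 declarations are listed:

```python
import re

# Hypothetical drop-down entries: label -> full DOCTYPE declaration.
# The declarations are the standard HTML 4.01 ones; the labels are illustrative.
DOCTYPE_CHOICES = {
    "HTML 4.01 Strict": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                        '"http://www.w3.org/TR/html4/strict.dtd">',
    "HTML 4.01 Transitional": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
                              '"http://www.w3.org/TR/html4/loose.dtd">',
    "HTML 4.01 Frameset": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" '
                          '"http://www.w3.org/TR/html4/frameset.dtd">',
}

# Any DOCTYPE declaration, regardless of keyword case.
# Assumes no internal subset (no ">" inside the declaration).
DOCTYPE_DECL_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def apply_doctype_override(document: str, choice: str) -> str:
    """Return a copy of `document` that uses the DOCTYPE picked by the user.

    An existing (possibly incorrect) declaration is replaced; if there is
    none, the chosen declaration is prepended.
    """
    replacement = DOCTYPE_CHOICES[choice]
    if DOCTYPE_DECL_RE.search(document):
        # A function replacement stops re.sub from interpreting backslashes
        # or group references inside the replacement text.
        return DOCTYPE_DECL_RE.sub(lambda _m: replacement, document, count=1)
    return replacement + "\n" + document


page = '<!doctype html public "-//w3c//dtd html 4.01 transitional//en">\n<title>Test</title>'
print(apply_doctype_override(page, "HTML 4.01 Transitional").splitlines()[0])
```

Replacing only the first declaration (`count=1`) keeps the rest of the page byte-for-byte identical, so any remaining errors come from the markup itself rather than from the doctype.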

I created a valid test page and then made a copy with a single change: I replaced
the doctype with the old NS4.x one (with 4.0 changed to 4.01):
<!doctype html public "-//w3c//dtd html 4.01 transitional//en">

These two pages are identical except for the doctype. The first passes; the
second fails with 80 errors, flagging things that are actually valid HTML.

The validation results for the latter are less than helpful, and arguably
misleading and inaccurate: they report 80 errors when there are in fact fewer
than half a dozen. There is also no information about where one might find out
about valid doctypes (an "explain..." link would be helpful).
Comment 1 Chris Neale 2003-02-06 16:47:45 UTC
showing interest
Comment 2 Terje Bless 2003-02-07 02:08:38 UTC
David: Please attach the problematic page to this bug report so we have it on
hand, unless you are absolutely certain that the URL will remain static.

The problem is that we do not detect the invalid DOCTYPE until we feed the
document to the SGML parser; but, yes, we should detect this particular error
and emit a fatal error rather than a gazillion meaningless "undefined element"
errors.
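One way to move the check ahead of the parser, as proposed above, is to compare the DOCTYPE's public identifier against the known ones before parsing and to stop with a single fatal error when it matches only if case is ignored. A minimal Python sketch; `KNOWN_FPIS`, `classify_doctype()` and `precheck()` are hypothetical names, not the validator's code, and the identifier list is deliberately short:

```python
import re
from typing import Optional, Tuple

# Hypothetical list of public identifiers the checker knows about.
# Formal public identifiers are compared exactly, so case matters.
KNOWN_FPIS = (
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
)
# Case-folded FPI -> correctly cased FPI, for spotting "right words, wrong case".
FOLDED_FPIS = {fpi.casefold(): fpi for fpi in KNOWN_FPIS}

# This sketch accepts any case for the keywords and captures the quoted FPI.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def classify_doctype(document: str) -> Tuple[str, Optional[str]]:
    """Return (status, fpi); status is "ok", "bad-case", "unknown" or "missing"."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return "missing", None
    fpi = match.group(1)
    if fpi in KNOWN_FPIS:
        return "ok", fpi
    if fpi.casefold() in FOLDED_FPIS:
        # Same words as a known FPI but wrong case: report the intended one.
        return "bad-case", FOLDED_FPIS[fpi.casefold()]
    return "unknown", fpi


def precheck(document: str) -> Optional[str]:
    """Return one fatal-error message, or None if the SGML parser may run."""
    status, fpi = classify_doctype(document)
    if status == "bad-case":
        return ("Fatal Error: the DOCTYPE public identifier has incorrect case; "
                f'did you mean "{fpi}"? Validation was not performed.')
    # Only the wrong-case situation is intercepted here; "ok", "unknown" and
    # "missing" are left to whatever handling already exists.
    return None


composer_page = '<!doctype html public "-//w3c//dtd html 4.01 transitional//en">'
print(precheck(composer_page))
```

The case-insensitive comparison is used only to suggest the correct identifier; validation still refuses the mis-cased one, so the exact-match rule is preserved while the flood of spurious parser errors is avoided.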

This does have some side effects that need some thought. The first five
messages reported are separate messages from the SGML parser for a reason: there
actually are several distinct errors here, each of which may appear independently
of the others and does not necessarily indicate a fatal error. That is, it's
possible to have a combination of several of those messages and still get
meaningful results for the rest of the document.

Hopefully, though, this is rare enough in practice that we can allow ourselves
to overstate the importance of these messages (by emitting a fatal error).

Setting target to 0.7.0 (aka "When I Get a Round Tuit"), since this should be
classified as a feature enhancement and 0.6.x is nominally "frozen".
Comment 3 Terje Bless 2004-09-01 13:12:18 UTC
Nominating for 0.7.0. Our behaviour in this area changed during the 0.6.x
series, so this may now be just a question of presentation.
Comment 4 Olivier Thereaux 2004-09-09 05:29:45 UTC
[…] shows this seems to be fixed.
Comment 5 Terje Bless 2004-09-09 18:17:43 UTC
Indeed it does. And it doesn't depend on Bug #739 either. Closing as FIXED.