<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>139</bug_id>
          
          <creation_ts>2003-02-06 16:30:34 +0000</creation_ts>
          <short_desc>Validator should not attempt to parse incorrect doctypes with SGML parser</short_desc>
          <delta_ts>2006-09-23 19:50:22 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>check</component>
          <version>0.7.0</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://members.rogers.com/dpjames/invalid.html</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>enhancement</bug_severity>
          <target_milestone>0.7.0</target_milestone>
          
          <blocked>856</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="David P James">davidpjames</reporter>
          <assigned_to name="Terje Bless">link</assigned_to>
          
          
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>338</commentid>
    <comment_count>0</comment_count>
    <who name="David P James">davidpjames</who>
    <bug_when>2003-02-06 16:30:34 +0000</bug_when>
    <thetext>Many of my webpages were originally created with Netscape 4.x&apos;s composer, and,
as a result, have incorrect doctypes (bad case in certain words):
&lt;!doctype html public &quot;-//w3c//dtd html 4.0 transitional//en&quot;&gt;

I have been trying to fix the errors in those pages to get them up to standard
and was attempting to use W3C&apos;s validator to help me do so. Unfortunately, when
presented with pages with such a doctype the validator balks and then sends it
through the SGML parser, causing it to essentially &quot;invent&quot; errors where there
aren&apos;t any.

What it should have done is simply report that an incorrect doctype was found
and that no further validation was done, and then offer a drop down box of
doctypes to choose from to continue the validation (as is done when no doctype
is present).

I created a valid test page and then made a copy with one change - I changed the
doctype to the old NS4.x doctype (with the version bumped from 4.0 to 4.01):
&lt;!doctype html public &quot;-//w3c//dtd html 4.01 transitional//en&quot;&gt;

These two pages are identical except for the above difference. The first passes,
the second fails with 80 errors, flagging as it does things that are actually
valid HTML.
http://members.rogers.com/dpjames/valid.html
http://members.rogers.com/dpjames/invalid.html
==&gt;http://validator.w3.org/check?uri=http%3A%2F%2Fmembers.rogers.com%2Fdpjames%2Finvalid.html

The validation results from the latter are less than helpful and arguably
misleading and inaccurate, as it indicates that there are 80 errors when there
are in fact fewer than half a dozen. There is also no information as to where
one might find out about valid doctypes (an &quot;explain...&quot; link would be helpful
here).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>339</commentid>
    <comment_count>1</comment_count>
    <who name="Chris Neale">w3cBugs</who>
    <bug_when>2003-02-06 16:47:45 +0000</bug_when>
    <thetext>showing interest</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>340</commentid>
    <comment_count>2</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2003-02-07 02:08:38 +0000</bug_when>
    <thetext>David: Please attach the problematic page to this bug report so we have it on
hand unless you are absolutely certain that that URL will remain static.


The problem is that we do not detect the invalid DOCTYPE until we feed the
document to the SGML Parser; but, yes, we should detect this particular error
and emit a fatal error rather than a gazillion meaningless &quot;undefined element&quot;
messages.
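
As a rough sketch of the kind of early check meant here (illustrative Python,
not code from the validator; the function name and the short list of public
identifiers are assumptions made for the example), the public identifier could
be compared against a list of known ones before the document ever reaches the
SGML Parser:

```python
# Illustrative only: these names are hypothetical, not taken from the validator source.
KNOWN_PUBLIC_IDS = (
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
)


def classify_public_id(public_id):
    """Return (status, suggestion) for a DOCTYPE public identifier.

    status is "known", "case-mismatch" or "unknown"; suggestion is the
    correctly cased identifier when one can be offered, otherwise None.
    """
    # Public identifiers are compared exactly, so case matters.
    if public_id in KNOWN_PUBLIC_IDS:
        return ("known", public_id)
    # Fall back to a case-insensitive lookup to catch the Netscape-style doctypes.
    by_lowercase = {known.lower(): known for known in KNOWN_PUBLIC_IDS}
    suggestion = by_lowercase.get(public_id.lower())
    if suggestion is not None:
        return ("case-mismatch", suggestion)
    return ("unknown", None)


status, suggestion = classify_public_id("-//w3c//dtd html 4.01 transitional//en")
if status != "known":
    # One fatal message instead of handing the document to the parser.
    print("Fatal: unrecognised DOCTYPE public identifier; validation skipped.")
    if suggestion:
        print("Did you mean:", suggestion)
```

A case-only mismatch like the one in this report would then be reported once,
with a suggested replacement, instead of as dozens of undefined-element errors.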

This does have some side effects that want thinking about. The first five
messages reported are separate messages from the SGML Parser for a reason. There
actually are multiple separate errors here, each of which may appear independently
of the others and does not necessarily indicate a fatal error; i.e., it&apos;s possible
to have a combination of several of those messages and still get meaningful
results for the rest of the document.

Hopefully, though, this is sufficiently rare in practice that we can allow
ourselves to overstate the importance of these messages (by emitting a fatal error).

Setting target to 0.7.0 (aka. &quot;When I Get a Round Tuit&quot;) since this wants
classification as a feature enhancement and 0.6.x is nominally &quot;frozen&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2204</commentid>
    <comment_count>3</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2004-09-01 13:12:18 +0000</bug_when>
    <thetext>Nominating for 0.7.0. Our behaviour has changed in this area during the 0.6.x
series, so this may now be a question of presentation.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2293</commentid>
    <comment_count>4</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2004-09-09 05:29:45 +0000</bug_when>
    <thetext>http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fmembers.rogers.com%2Fdpjames%2Finvalid.html
shows that this seems to be fixed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2305</commentid>
    <comment_count>5</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2004-09-09 18:17:43 +0000</bug_when>
    <thetext>Indeed it does. And it doesn&apos;t depend on Bug #739 either. Closing as FIXED.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>