<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>22</bug_id>
          
          <creation_ts>2002-10-25 02:55:04 +0000</creation_ts>
          <short_desc>Custom DTDs always treated as SGML (vs. XML).</short_desc>
          <delta_ts>2008-12-02 22:42:33 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>check</component>
          <version>0.6.0b1</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>1.0</target_milestone>
          <dependson>739</dependson>
    
    <dependson>1500</dependson>
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Terje Bless">link</reporter>
          <assigned_to name="Olivier Thereaux">ot</assigned_to>
          <cc>zzz</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>24</commentid>
    <comment_count>0</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-10-25 02:55:04 +0000</bug_when>
    <thetext>Reported by Klaus Johannes Rusch:

The beta validator at http://validator.w3.org:8001/ indicates the
following document is not valid:

&lt;!DOCTYPE html SYSTEM
          &quot;http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd&quot;&gt;
[...]
Any ideas why this is failing? The validator seems not to allow XHTML
notation with custom DTDs; validating the same document against the W3C
XHTML DTD works.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>94</commentid>
    <comment_count>1</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-10-27 11:30:06 +0000</bug_when>
    <thetext>When a document with a custom DTD (not a &quot;Well Known&quot; FPI) is served as
text/html it will always be treated as SGML and not XML. This is hard to avoid,
at least in the current code. To enable custom XML DTDs you need to use an XML
Content-Type.

Setting target to 0.7.0 to investigate this again at that time.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125</commentid>
    <comment_count>2</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-10-30 23:35:13 +0000</bug_when>
    <thetext>As per the further comments from Klaus Johannes Rusch, perhaps we should add a
&quot;Force XML Mode&quot; setting to allow the &quot;We must have text/html for the older
browsers&quot; crowd to use custom XML DTDs.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2214</commentid>
    <comment_count>3</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2004-09-01 16:46:08 +0000</bug_when>
    <thetext>Retarget for 1.0. Won&apos;t make the cut for 0.7.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>3894</commentid>
    <comment_count>4</comment_count>
    <who name="Sam Minnee">sam</who>
    <bug_when>2005-03-11 04:15:43 +0000</bug_when>
    <thetext>The following page sets Content-type to application/xml, and uses a custom DTD,
but still doesn&apos;t validate.

http://validator.w3.org/check?uri=http%3A%2F%2Fdev.silverstripe.com%2Fplay%2Fcustom-dtd.php

An alternative validator succeeds in validating it:

http://www.htmlhelp.com/cgi-bin/validate.cgi?url=http%3A%2F%2Fdev.silverstripe.com%2Fplay%2Fcustom-dtd.html&amp;xml=yes</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>3895</commentid>
    <comment_count>5</comment_count>
    <who name="Sam Minnee">sam</who>
    <bug_when>2005-03-11 04:19:56 +0000</bug_when>
    <thetext>It doesn&apos;t work with text/xml either :P</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>11314</commentid>
    <comment_count>6</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2006-08-30 02:50:41 +0000</bug_when>
    <thetext>This looks very similar to the problem described in Bug #1500, which argues that the SGML/XML mode switch should be based on the content type, before pre-parsing, rather than on pre-parsing and detection of a known (or, in this case, unknown) doctype. Setting dependency accordingly.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>14457</commentid>
    <comment_count>7</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2007-03-16 06:57:20 +0000</bug_when>
    <thetext>The parse mode detection logic had some problems: it did not properly take the media type into account when choosing a parse mode for documents with unknown document types (including custom DTDs).

Fixed now in CVS:
http://lists.w3.org/Archives/Public/www-validator-cvs/2007Mar/0092.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>14464</commentid>
    <comment_count>8</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2007-03-16 12:55:15 +0000</bug_when>
    <thetext>Rewriting the parse mode selection routine entirely fixes this.

http://lists.w3.org/Archives/Public/www-validator-cvs/2007Mar/0095.html
has the CVS diff for the refactoring.

The logic goes:
* if neither content type nor doctype is helpful,
  =&gt; throw warning, use SGML as fallback
* in case of an unknown doctype but useful mime type 
  (generally, XML mime type)
  =&gt; follow the mime type
* in case of an ambiguous mime type (text/html) but well-known doctype
  (any HTML served as text/html...)
  =&gt; follow the doctype 
* if neither is ambiguous, but they conflict
  =&gt; throw warning, follow the mime type
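
A minimal sketch of the rules above in Python, for illustration only. It is not the validator's actual code (see the CVS diff linked above for that). The names Mode, MIME_MODES, DOCTYPE_MODES and select_parse_mode, and the contents of the two lookup tables, are assumptions made for this example.

```python
from enum import Enum


class Mode(Enum):
    SGML = "sgml"
    XML = "xml"


# Hypothetical lookup tables. A missing key means "not helpful":
# an ambiguous media type such as text/html, or an unknown (custom) doctype.
MIME_MODES = {
    "application/xml": Mode.XML,
    "text/xml": Mode.XML,
    "application/xhtml+xml": Mode.XML,
}
DOCTYPE_MODES = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": Mode.XML,
    "-//W3C//DTD XHTML 1.1//EN": Mode.XML,
    "-//W3C//DTD HTML 4.01//EN": Mode.SGML,
}


def select_parse_mode(content_type, doctype_fpi):
    """Return a (mode, warning) pair; warning is None when nothing is wrong."""
    mime_mode = MIME_MODES.get(content_type)
    doctype_mode = DOCTYPE_MODES.get(doctype_fpi)

    # Neither source is helpful: warn and fall back to SGML.
    if mime_mode is None and doctype_mode is None:
        return Mode.SGML, "no usable content type or doctype, falling back to SGML"
    # Unknown doctype but useful media type: follow the media type.
    if doctype_mode is None:
        return mime_mode, None
    # Ambiguous media type but well-known doctype: follow the doctype.
    if mime_mode is None:
        return doctype_mode, None
    # Both are unambiguous but disagree: warn and follow the media type.
    if mime_mode is not doctype_mode:
        return mime_mode, "content type and doctype disagree, following content type"
    return mime_mode, None
```

With these tables, a custom DTD served as application/xml selects XML mode with no warning, while the same document served as text/html falls back to SGML with a warning.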

This was tested successfully with documents in the catalogue, documents outside the catalogue (custom DTDs), and documents served with the wrong mime type (XHTML 1.1 as text/html, HTML 4.01 as application/xhtml+xml, etc.); see:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/

Considering this fixed. If reopening, please provide clear test cases. Thank you.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>