This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 22 - Custom DTDs always treated as SGML (vs. XML).
Summary: Custom DTDs always treated as SGML (vs. XML).
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.6.0b1
Hardware: All, OS: All
Importance: P2 normal
Target Milestone: 1.0
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
Depends on: 739 1500
Reported: 2002-10-25 02:55 UTC by Terje Bless
Modified: 2008-12-02 22:42 UTC
CC List: 1 user

See Also:


Description Terje Bless 2002-10-25 02:55:04 UTC
Reported by Klaus Johannes Rusch:

The beta validator at indicates the
following document is not valid:

Any ideas why this is failing? The validator seems not to allow XHTML
notation with custom DTDs; validating the same document against the W3C
XHTML DTD works.
Comment 1 Terje Bless 2002-10-27 11:30:06 UTC
When a document with a custom DTD (not a "Well Known" FPI) is served as
text/html, it will always be treated as SGML and not XML. This is hard to avoid,
at least in the current code. To enable custom XML DTDs you need to use an XML

Setting target to 0.7.0 to investigate this again at that time.
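Whether a DTD counts as "Well Known" comes down to looking up the doctype's public identifier (FPI) in a catalogue. The Python sketch below illustrates that lookup; it assumes a small hypothetical catalogue (`WELL_KNOWN_FPIS`) and a simple regex pre-parse of the DOCTYPE declaration, and neither is the validator's actual code.

```python
import re
from typing import Optional

# Hypothetical excerpt of a catalogue mapping well-known FPIs to parse modes;
# the validator's real catalogue is larger and is not reproduced here.
WELL_KNOWN_FPIS = {
    "-//W3C//DTD HTML 4.01//EN": "SGML",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
    "-//W3C//DTD XHTML 1.1//EN": "XML",
}

# Captures the public identifier in e.g. <!DOCTYPE html PUBLIC "..." "...">.
DOCTYPE_PUBLIC_RE = re.compile(
    r"""<!DOCTYPE\s+\S+\s+PUBLIC\s+(["'])(.*?)\1""", re.IGNORECASE
)


def doctype_mode(document: str) -> Optional[str]:
    """Return "SGML" or "XML" for a well-known FPI, or None for a custom/unknown one."""
    match = DOCTYPE_PUBLIC_RE.search(document)
    if match is None:
        return None  # no public identifier to look up
    return WELL_KNOWN_FPIS.get(match.group(2).strip())
```

A custom DTD yields `None` here. Under the behaviour described in this comment, a `None` result combined with text/html ends up in SGML mode.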
Comment 2 Terje Bless 2002-10-30 23:35:13 UTC
As per the further comments from Klaus Johannes Rusch, perhaps we should add a
"Force XML Mode" setting to allow the "We must have text/html for the older
browsers" crowd to use custom XML DTDs.
Comment 3 Terje Bless 2004-09-01 16:46:08 UTC
Retarget for 1.0. Won't make the cut for 0.7.
Comment 4 Sam Minnee 2005-03-11 04:15:43 UTC
The following page sets Content-Type to application/xml and uses a custom DTD,
but it still doesn't validate.

An alternative validator succeeds in validating it:;xml=yes
Comment 5 Sam Minnee 2005-03-11 04:19:56 UTC
It doesn't work with text/xml either :P
Comment 6 Olivier Thereaux 2006-08-30 02:50:41 UTC
This looks very similar to the problem described in Bug #1500, where it is argued that the SGML/XML mode switch should be based on the content type, before pre-parsing, rather than on pre-parsing and detection of a known (or, in this case, unknown) doctype. Setting dependency accordingly.
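Deciding from the content type before pre-parsing only requires inspecting the Content-Type header. The minimal Python sketch below assumes the header value is already available as a string. The function name `is_xml_media_type` is hypothetical, and the `+xml` suffix rule follows the general convention for XML-based media types rather than anything stated in this bug.

```python
def is_xml_media_type(content_type: str) -> bool:
    """Return True if a Content-Type header value names an XML media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # text/xml, application/xml and any "+xml" type (e.g. application/xhtml+xml)
    # count as XML; text/html is ambiguous on its own and returns False.
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")
```

With this check, the application/xml and text/xml pages from comments 4 and 5 would be classified as XML regardless of whether their DTD is known.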
Comment 7 Olivier Thereaux 2007-03-16 06:57:20 UTC
There were some problems in the logic of the parse mode detection, which indeed did not properly take the media type into account when choosing a parse mode for documents with unknown document types (including custom DTDs).

Fixed now in CVS:
Comment 8 Olivier Thereaux 2007-03-16 12:55:15 UTC
Rewriting the parse mode selection routine entirely fixes this.
has the CVS diff for the refactoring.

The logic goes:
* if neither content type nor doctype is helpful, 
  => throw warning, use SGML as fallback
* in case of an unknown doctype but useful mime type 
  (generally, XML mime type)
  => follow the mime type
* in case of an ambiguous mime type (text/html) but well-known doctype
  (any HTML served as text/html...)
  => follow the doctype 
* if neither is ambiguous, but they collide
  => throw warning, follow the mime type
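The four rules above can be sketched as a small decision function. This Python sketch is illustrative, not the validator's code. It assumes each input has already been reduced to a hint: `media_mode` is `None` for an ambiguous or unrecognised media type such as text/html, and `doctype_mode` is `None` for an unknown or custom doctype. The names `Mode` and `choose_parse_mode` are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    SGML = "SGML"
    XML = "XML"


def choose_parse_mode(
    media_mode: Optional[Mode], doctype_mode: Optional[Mode]
) -> Tuple[Mode, List[str]]:
    """Pick a parse mode from a media-type hint and a doctype hint (either may be None)."""
    warnings: List[str] = []
    if media_mode is None and doctype_mode is None:
        # Neither hint helps: warn and fall back to SGML.
        warnings.append("Could not determine parse mode; falling back to SGML.")
        return Mode.SGML, warnings
    if doctype_mode is None:
        # Unknown doctype but useful (usually XML) media type: follow the media type.
        return media_mode, warnings
    if media_mode is None:
        # Ambiguous media type (text/html) but well-known doctype: follow the doctype.
        return doctype_mode, warnings
    if media_mode is not doctype_mode:
        # Both unambiguous but colliding: warn and follow the media type.
        warnings.append(
            f"Media type implies {media_mode.value} but doctype implies "
            f"{doctype_mode.value}; using {media_mode.value}."
        )
    return media_mode, warnings
```

For example, a custom DTD served as application/xml gives `choose_parse_mode(Mode.XML, None)`, which selects XML without a warning. HTML 4.01 served as application/xhtml+xml gives `choose_parse_mode(Mode.XML, Mode.SGML)`, which selects XML and records one warning.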

This was successfully tested with documents in the catalogue, documents outside the catalogue (custom DTDs), and documents served with the wrong mime type (XHTML 1.1 as text/html, HTML 4.01 as application/xhtml+xml, etc.); see:

Considering this fixed. If reopening, please provide clear test cases. Thank you.