This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5031 - Doctype detection fails if root element includes non "word" character
Summary: Doctype detection fails if root element includes non "word" character
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.8.1
Hardware: All All
Importance: P3 normal
Target Milestone: 0.8.2
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
URL: http://qa-dev.w3.org/HEAD/dev/tests/5...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-11 06:35 UTC by Olivier Thereaux
Modified: 2007-09-11 06:50 UTC
0 users

See Also:


Attachments

Description Olivier Thereaux 2007-09-11 06:35:55 UTC
The doctype detection routine in preparse_doctype() has the following regexp to detect FPI and SI:

m(<!DOCTYPE\s+(\w+)\s+(?:PUBLIC|SYSTEM)\s+...
The first (\w+) captures the name of the document type, which must match the root element
(ref: http://www.w3.org/TR/xml/#vc-roottype ).
However, \w+ is too restrictive: a root element name may also contain, among other characters, a hyphen or a period.
(ref: http://www.w3.org/TR/xml/#IDANQDS )

This partially breaks doctype detection for languages whose root element name includes characters outside Perl's "word" class (alphanumerics plus _).
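A minimal sketch of the failure and one possible fix, written in Python rather than the validator's Perl (the two engines' \w behaves the same for these ASCII inputs). Assumptions, none taken from the actual patch: the elided tail of the regexp is replaced by a capture of the first double-quoted literal, the corrected name pattern is a hypothetical ASCII-only approximation of the XML Name production, and the MusicXML doctype (root element score-partwise) is an illustrative sample.

```python
import re

# Pattern as quoted in the report; the elided tail is assumed here to be
# "capture the first double-quoted literal" (FPI for PUBLIC, SI for SYSTEM).
OLD = re.compile(r'<!DOCTYPE\s+(\w+)\s+(?:PUBLIC|SYSTEM)\s+"([^"]*)"')

# Hypothetical fix: ASCII-only approximation of the XML Name production,
# allowing '-', '.' and ':' after the first character.
# (Non-ASCII name characters and single-quoted literals are ignored here.)
NEW = re.compile(
    r'<!DOCTYPE\s+([A-Za-z_:][-A-Za-z0-9._:]*)\s+(?:PUBLIC|SYSTEM)\s+"([^"]*)"'
)

def detect(pattern, doc):
    """Return (root element name, first quoted identifier), or None."""
    m = pattern.search(doc)
    return (m.group(1), m.group(2)) if m else None

# Illustrative MusicXML doctype: the root name contains a hyphen.
MUSICXML = ('<!DOCTYPE score-partwise PUBLIC '
            '"-//Recordare//DTD MusicXML 1.1 Partwise//EN" '
            '"http://www.musicxml.org/dtds/partwise.dtd">')

if __name__ == "__main__":
    print(detect(OLD, MUSICXML))  # \w+ stops at '-', so nothing matches
    print(detect(NEW, MUSICXML))
```

With the old pattern, \w+ consumes "score" and then requires whitespace where the hyphen is, so detection fails and only generic XML is reported. The widened name class captures "score-partwise" along with the FPI, while roots such as html continue to match as before.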
Comment 1 Olivier Thereaux 2007-09-11 06:50:39 UTC
Patched in http://lists.w3.org/Archives/Public/www-validator-cvs/2007Sep/0071.html

Adding test case. 

The validator should now report on this test case as:
This Page Is Valid -//Recordare//DTD MusicXML 1.1 Partwise//EN!
rather than
This Page Is Valid XML!