This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5031 - Doctype detection fails if root element includes non "word" character
Summary: Doctype detection fails if root element includes non "word" character
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.8.1
Hardware: All All
Importance: P3 normal
Target Milestone: 0.8.2
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
Depends on:
Reported: 2007-09-11 06:35 UTC by Olivier Thereaux
Modified: 2007-09-11 06:50 UTC
CC List: 0 users

See Also:


Description Olivier Thereaux 2007-09-11 06:35:55 UTC
The doctype detection routine in preparse_doctype() uses the following regexp to detect the formal public identifier (FPI) and system identifier (SI):

The first (\w+) captures the name of the document type, which has to match the root element
(ref: )
but \w+ is too restrictive, as an XML element name can also contain, among other characters, a hyphen or a dot.
(ref: )

This partially breaks doctype detection for languages whose root element name includes characters outside Perl's \w class (alphanumerics plus underscore): the document is still parsed, but its FPI is not recognized.
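
To illustrate the failure mode, here is a minimal sketch in Python (the validator itself is written in Perl). Both patterns are simplified, hypothetical stand-ins and not the regexp used in preparse_doctype(); the helper name detect_fpi and the system identifier placeholders are likewise assumptions made for the example. It only shows why a \w+ name group misses a root element such as score-partwise, and how a character class that also admits dot, colon and hyphen would catch it.

```python
import re

# Hypothetical, simplified doctype patterns (NOT the validator's actual regexp).
# STRICT mirrors the reported problem: the name group only accepts \w characters.
STRICT = re.compile(r'<!DOCTYPE\s+(\w+)\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)
# RELAXED also accepts dot, colon and hyphen in the name (still an approximation
# of the XML Name production, not a full implementation of it).
RELAXED = re.compile(r'<!DOCTYPE\s+([\w.:-]+)\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def detect_fpi(pattern, text):
    """Return (root_name, fpi) if the pattern finds a doctype declaration, else None."""
    match = pattern.search(text)
    return match.groups() if match else None


# Sample declarations; the system identifiers are placeholders for illustration.
musicxml = ('<!DOCTYPE score-partwise PUBLIC '
            '"-//Recordare//DTD MusicXML 1.1 Partwise//EN" "partwise.dtd">')
xhtml = ('<!DOCTYPE html PUBLIC '
         '"-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1-strict.dtd">')

print(detect_fpi(STRICT, xhtml))      # plain word name: detected by both patterns
print(detect_fpi(STRICT, musicxml))   # hyphenated name: the \w+ group cannot match
print(detect_fpi(RELAXED, musicxml))  # relaxed name class: name and FPI are captured
```

With the strict pattern, the MusicXML declaration yields no match at all, which is consistent with the document being reported only as generic XML.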
Comment 1 Olivier Thereaux 2007-09-11 06:50:39 UTC
Patched in

Adding test case. 

The validator should now report on this test case as:
This Page Is Valid -//Recordare//DTD MusicXML 1.1 Partwise//EN!
rather than
This Page Is Valid XML!