This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24271 - Document.createElement name validation inconsistent with HTML parse rules
Status: RESOLVED LATER
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Anne
QA Contact: public-webapps-bugzilla
Reported: 2014-01-11 02:17 UTC by Pete Blois
Modified: 2014-02-12 19:41 UTC

Description Pete Blois 2014-01-11 02:17:19 UTC
The Document.createElement name validation rules (https://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-document-createelement) differ from the validation the HTML parser applies (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-name-state). This means that the parser can create elements that createElement cannot.

The parser rules are much more permissive. For what it's worth, I do not believe that any two browsers support the same exact set of unicode characters for document.createElement.
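
To illustrate how permissive the parser side is, here is a rough sketch of which tag names the HTML tokenizer could emit. It assumes the usual reading of the tag open and tag name states: a tag name starts with an ASCII letter, runs until whitespace, "/" or ">", has ASCII uppercase letters lowercased, and has U+0000 replaced. The function name parserCouldEmitTagName is made up for this example and is not part of any spec or browser API.

```javascript
// Sketch only: approximates the tag names the HTML tokenizer can produce.
// Assumptions (not taken from this bug): the tag open state requires an
// ASCII letter, the tag name state stops at tab, LF, FF, space, "/" or ">",
// CR never reaches the tokenizer, NUL is replaced, and A-Z is lowercased.
function parserCouldEmitTagName(name) {
  // First character: a lowercase ASCII letter (uppercase would be lowercased).
  // Remaining characters: anything except terminators, CR, NUL and A-Z.
  return /^[a-z][^\t\n\f\r \/>\0A-Z]*$/u.test(name);
}

for (const name of ["div", "x,", "x\u00b5", "\u00b5", "x y", "DIV"]) {
  console.log(JSON.stringify(name), parserCouldEmitTagName(name));
}
```

Under these assumptions a name such as "x," passes, since the comma is not a terminator, while a name that starts with a non-ASCII character does not, because the tag open state never starts a tag for it.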
Comment 1 Domenic Denicola 2014-01-11 02:32:20 UTC
I too have found this troubling, but I was told there may be some compat risks with relaxing this restriction when I last asked about it.
Comment 2 Ms2ger 2014-01-11 07:52:11 UTC
(In reply to Pete Blois from comment #0)
> I do not believe that any two browsers support the same exact
> set of unicode characters for document.createElement.

Can you give examples of inputs that are handled differently?
Comment 3 Pete Blois 2014-01-12 00:11:33 UTC
Examples:
document.createElement('\u0083'); (FF 26: no, Chrome 31: no, IE 11: yes)
document.createElement('\u00b5'); (FF 26: yes, Chrome 31: no, IE 11: yes)
document.createElement('\u01f6'); (FF 26: no, Chrome 32: yes, IE 11: yes)
document.createElement('\u01f7'); (FF 26: no, Chrome 32: yes, IE 11: no)

The XML spec changed its name validation between XML 1.0 Fourth Edition and XML 1.0 Fifth Edition. The Fourth Edition was based on Unicode character classes (http://www.w3.org/TR/2006/REC-xml-20060816/#NT-Letter).
Comment 4 Anne 2014-01-13 11:41:33 UTC
I'm not going to make 4th vs 5th edition my problem.

However, only the last two of your examples are testing the difference between those as far as I can tell. The first two should throw for either edition.

So only Chrome makes some sense (though following the 5th edition is questionable).
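
For anyone who wants to check the four inputs from comment 3 against the Fifth Edition themselves, the following is a minimal sketch. It assumes the code point ranges below are a faithful transcription of the NameStartChar and NameChar productions in XML 1.0 Fifth Edition; the helper name isXml5Name is invented for this example. It does not cover the Fourth Edition, whose character classes are far longer.

```javascript
// Sketch only: XML 1.0 Fifth Edition "Name" production as a regular expression.
// The ranges are transcribed by hand from the NameStartChar / NameChar
// productions and should be double-checked against the spec before reuse.
const NAME_START_CHAR =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// NameChar adds "-", ".", digits, U+00B7 and two combining/punctuation ranges.
const NAME_CHAR =
  NAME_START_CHAR + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

// The "u" flag makes \u{...} escapes and astral ranges work per code point.
const XML5_NAME = new RegExp(`^[${NAME_START_CHAR}][${NAME_CHAR}]*$`, "u");

// Hypothetical helper: true if the string matches the Fifth Edition Name production.
function isXml5Name(name) {
  return XML5_NAME.test(name);
}

for (const name of ["\u0083", "\u00b5", "\u01f6", "\u01f7"]) {
  const hex = name.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex}`, isXml5Name(name));
}
```

With these ranges U+0083 and U+00B5 are rejected and U+01F6 and U+01F7 are accepted, which lines up with the Chrome results reported in comment 3.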
Comment 5 Henri Sivonen 2014-01-13 11:53:27 UTC
I'm opposed to making any XML 1.0 5th edition-motivated code changes in Gecko. I think we should stick to 4th ed. until XML5/XML-ER arrives some day.

If it doesn't break the Web, I'd be OK with permitting whatever the HTML parser can create.
Comment 6 Anne 2014-01-13 14:10:57 UTC
As far as I can tell Chrome is compliant per 5th. Firefox wants to follow 4th but actually does not do that (it allows U+00B5 in XML). Not sure about IE11. However, none of that makes it clear whether we can allow the kind of names HTML allows, such as "x,".

I recommend that if people want to take this further they get browsers to experiment with what is possible. I'd be happy to throw fewer exceptions here. I'm also happy to break the tie with XML if it's demonstrated that that is possible.

If you file bugs on browsers to do those experiments please cross-link with this bug.