25290 – [Custom]: Ban uppercase and leading "xml" in custom element names?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: [Custom]: Ban uppercase and leading "xml" in custom element names?

Status:	RESOLVED FIXED

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	HISTORICAL - Component Model
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Dimitri Glazkov
QA Contact:	public-webapps-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	14968

Reported:	2014-04-08 07:19 UTC by Simon Pieters
Modified:	2014-05-13 13:52 UTC
CC List:	8 users

See Also:

Attachments

Description Simon Pieters 2014-04-08 07:19:14 UTC

http://w3c.github.io/webcomponents/spec/custom/#dfn-custom-element-type

HTML has two sets of open-ended attribute names which have more restrictions: can't contain ASCII uppercase and can't begin with "xml" (ASCII-case-insensitively).

http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes 
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#the-embed-element

The ASCII uppercase is because the HTML parser and some DOM APIs case-fold to lowercase so using uppercase in the actual name means trouble.

I guess leading "xml" is reserved by XML Core WG.

We could also require for element names that the first character is [a-z] since the HTML parser will not parse something into an element otherwise.
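The restrictions proposed above can be sketched as a single validity check. This is a minimal illustration, not the spec's algorithm: it assumes the existing custom element requirement that the name contain a hyphen, and the helper names `ascii_lowercase` and `is_proposed_valid_custom_element_name` are hypothetical.

```python
import string

# ASCII-only case folding: A-Z map to a-z, every other code point is unchanged.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(s: str) -> str:
    return s.translate(_ASCII_FOLD)


def is_proposed_valid_custom_element_name(name: str) -> bool:
    """Hypothetical check combining the restrictions proposed in this bug."""
    # Assumed existing rule: a custom element type must contain a hyphen.
    if "-" not in name:
        return False
    # Proposed: the first character must be in [a-z], since the HTML parser
    # does not start a tag name on anything other than an ASCII letter.
    if not ("a" <= name[0] <= "z"):
        return False
    # Proposed: no ASCII uppercase, because the HTML parser and some DOM
    # APIs ASCII-lowercase names.
    if any(c in string.ascii_uppercase for c in name):
        return False
    # Proposed: must not begin with "xml" in any ASCII case combination.
    if ascii_lowercase(name).startswith("xml"):
        return False
    return True


for candidate in ["x-foo", "foo", "foo-BAR", "xml-thing", "1-foo", "my-xml-thing"]:
    print(candidate, is_proposed_valid_custom_element_name(candidate))
```

Note that "my-xml-thing" passes: only a leading "xml" would have been reserved.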

Comment 1 Anne 2014-04-08 17:02:20 UTC

I disagree with this. We should not enforce constraints that are not violations of namespace well-formedness (if we even should enforce those).

Comment 2 Ryosuke Niwa 2014-04-08 23:27:09 UTC

(In reply to Simon Pieters from comment #0)
> http://w3c.github.io/webcomponents/spec/custom/#dfn-custom-element-type
> 
> HTML has two sets of open-ended attribute names which have more
> restrictions: can't contain ASCII uppercase and can't begin with "xml"
> (ASCII-case-insensitively).
>
> I guess leading "xml" is reserved by XML Core WG.

That sounds reasonable.

> The ASCII uppercase is because the HTML parser and some DOM APIs case-fold
> to lowercase so using uppercase in the actual name means trouble.
> 
> We could also require for element names that the first character is [a-z]
> since the HTML parser will not parse something into an element otherwise.

So do these requirements.

(In reply to Anne from comment #1)
> I disagree with this. We should not enforce constraints that are not
> violations of namespace well-formedness (if we even should enforce those).

Do you think there are use cases that are harmed by adding restrictions like this? Do you think people would like to use uppercase letters in element names, etc., in practice?

Comment 3 Robin Berjon 2014-04-14 10:02:28 UTC

(In reply to Simon Pieters from comment #0)
> I guess leading "xml" is reserved by XML Core WG.

This is reserved so that XML Core can potentially add elements using their own protected naming. I don't think that we should enforce this because:

  1) It seems extremely unlikely to ever happen.
  2) XML parsers don't enforce this (that I've ever seen).
  3) Because of (2), it is not entirely rare for people to actually use "xml" in their element names.
  4) In the unlikely event that XML Core were to create such an element, I'm guessing it would be far more likely for it to ever be supported as a custom element than directly by the browser.
  5) It's unclear to me why XML Core would need this given that they can use namespaces if they want to, too (notably xml:).

Comment 4 Simon Pieters 2014-04-14 21:40:35 UTC

(In reply to Robin Berjon from comment #3)
> This is reserved so that XML Core can potentially add elements using their
> own protected naming. I don't think that we should enforce this because:
> 
>   1) It seems extremely unlikely to ever happen.

Maybe you can convince the XML people to drop the reservation?

>   2) XML parsers don't enforce this (that I've ever seen).

That's by design. It'd be hard to introduce a new element if parsers rejected it. :-)

>   3) Because of (2), it is not entirely rare for people to actually use
> "xml" in their element names.

[citation needed]

>   4) In the unlikely event that XML Core were to create such an element, I'm
> guessing it would be far more likely for it to ever be supported as a custom
> element than directly by the browser.

The reason here would be to allow the XML Core WG to mint a new element that they have reserved for, and for browsers to implement it, without custom elements having poisoned the name already. This is the same reason we require the dash.

>   5) It's unclear to me why XML Core would need this given that they can use
> namespaces if they want to, too (notably xml:).

I thought namespaces were uncool? :-P

Anyway, if we choose to ignore XML's reserved prefix we should drop that requirement from HTML's attributes also.

Comment 5 Robin Berjon 2014-04-15 09:09:12 UTC

(In reply to Simon Pieters from comment #4)
> (In reply to Robin Berjon from comment #3)
> > This is reserved so that XML Core can potentially add elements using their
> > own protected naming. I don't think that we should enforce this because:
> > 
> >   1) It seems extremely unlikely to ever happen.
> 
> Maybe you can convince the XML people to drop the reservation?

I'm happy to contact them if you think it helps.

> >   2) XML parsers don't enforce this (that I've ever seen).
> 
> That's by design. It'd be hard to introduce a new element if parsers
> rejected it. :-)

You'd think that, but this was actually a heated debate when xml:id came out.

> >   3) Because of (2), it is not entirely rare for people to actually use
> > "xml" in their element names.
> 
> [citation needed]

Well, I won't dispute that it's tough to get hard data on this, especially since search engines drop the "<" and web corpora don't have much content. Here's my experience for this: I spent a solid decade on XML support lists and four years on a job that exposed me to schemata written by a wide variety of other people which I had to make work with a tool I was developing. I saw such elements regularly. The classic is an extension point called a variation on <xmlContainer>. Books also had examples like that.

> >   4) In the unlikely event that XML Core were to create such an element, I'm
> > guessing it would be far more likely for it to ever be supported as a custom
> > element than directly by the browser.
> 
> The reason here would be to allow the XML Core WG to mint a new element that
> they have reserved for, and for browsers to implement it, without custom
> elements having poisoned the name already. This is the same reason we
> require the dash.

I understand the reasoning, but it actually cuts both ways: if XML Core were to produce an element that contained a dash I think the feedback should be "don't do that".


> >   5) It's unclear to me why XML Core would need this given that they can use
> > namespaces if they want to, too (notably xml:).
> 
> I thought namespaces were uncool? :-P

Not in XML Core :)

> Anyway, if we choose to ignore XML's reserved prefix we should drop that
> requirement from HTML's attributes also.

Sure.

Overall this is quite a corner case that we'd be creating if we enforced this rule, and I prefer avoiding creating new corner cases. I reckon it's the sort of thing that can be linted if desired but which there is little value in enforcing.

Comment 6 Simon Pieters 2014-04-17 10:09:14 UTC

(In reply to Robin Berjon from comment #5)
> I'm happy to contact them if you think it helps.

Yes please.

> Well, I won't dispute that it's tough to get hard data on this, especially
> since search engines drop the "<" and web corpora don't have much content.
> Here's my experience for this: I spent a solid decade on XML support lists
> and four years on a job that exposed me to schemata written by a wide
> variety of other people which I had to make work with a tool I was
> developing. I saw such elements regularly. The classic is an extension point
> called a variation on <xmlContainer>. Books also had examples like that.

OK. If so, more reason for XML Core to drop the reservation.

> I understand the reasoning, but it actually cuts both ways: if XML Core were
> to produce an element that contained a dash I think the feedback should be
> "don't do that".

Indeed.

Comment 7 Liam R E Quin 2014-04-17 21:43:11 UTC

[personal response not from XML Core WG]

The restriction on names starting with xml (in any combination of upper and lower case) came because XML itself predates XML namespaces.

However, they are not forbidden or illegal.

[[
Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
]]

means that if you use an element like "xmlisreallyawesomeactually" in your documents you run the risk (in theory) that some future version of XML might define a meaning for that element. If it did, you'd quite likely want to use it with that new meaning in future XHTML documents, so they should not be forbidden.

(the term "reserved" is never formally defined in the XML spec, however)

In practice any new names defined by the XML Core Working Group have been in an XML namespace such as "xml" and "xmlns" (the only two hard-wired namespace prefixes) and of course XInclude.

I can take it to the XML Core WG, although I'm not sure they/we will think it worth a revision of the XML spec.

Comment 8 Simon Pieters 2014-04-22 09:33:33 UTC

Thanks Liam. If the reservation serves no purpose then we should lift the restriction for custom elements and <embed> attributes. Whether to update the XML spec is up to you.

Comment 9 Jirka Kosek 2014-04-24 09:40:58 UTC

Actually XML Spec no longer reserves names starting with "xml" (with very few exceptions like processing instructions starting with xml-). See the latest errata at:

http://www.w3.org/XML/xml-V10-5e-errata

Jirka

Comment 10 Simon Pieters 2014-04-25 12:51:20 UTC

OK then, let's drop the restriction. Thanks Jirka.

Comment 11 Simon Pieters 2014-04-29 06:37:11 UTC

http://html5.org/r/8583 dropped "xml" restriction for <embed xmlfoobar="">

Comment 12 Dimitri Glazkov 2014-05-05 17:57:19 UTC

So, nothing for me to do here, right? :)

Comment 13 Simon Pieters 2014-05-06 21:01:56 UTC

ASCII uppercase is still an open issue.

Comment 14 Dimitri Glazkov 2014-05-09 19:50:43 UTC

(In reply to Simon Pieters from comment #13)
> ASCII uppercase is still open issue.

FWIW, types/names get converted to lowercase anyway in http://w3c.github.io/webcomponents/spec/custom/#dfn-definition-construction-algorithm.

Comment 15 Simon Pieters 2014-05-12 12:45:19 UTC

More reasons to not allow <foo-BAR> (in XML or createElementNS).
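A toy model makes the problem concrete. It is hypothetical, not the spec's data structures, and assumes that registration ASCII-lowercases the type (as the definition construction algorithm does), that the XML parser and createElementNS keep the local name exactly as written, and that elements are matched to definitions by exact local name.

```python
import string
from typing import Dict, Optional

_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(s: str) -> str:
    # Only A-Z are folded; all other code points are left unchanged.
    return s.translate(_ASCII_FOLD)


class ToyRegistry:
    """Hypothetical model of a custom element registry, not the spec's."""

    def __init__(self) -> None:
        self._definitions: Dict[str, str] = {}

    def register(self, type_name: str, definition: str) -> None:
        # Assumed: registration ASCII-lowercases the type name.
        self._definitions[ascii_lowercase(type_name)] = definition

    def lookup(self, local_name: str) -> Optional[str]:
        # Assumed: elements are matched by their exact local name.
        return self._definitions.get(local_name)


registry = ToyRegistry()
registry.register("foo-BAR", "FooBar definition")

# HTML parser: the tag name is ASCII-lowercased, so <foo-BAR> becomes "foo-bar".
print(registry.lookup(ascii_lowercase("foo-BAR")))  # FooBar definition
# XML parser or createElementNS: the local name stays "foo-BAR" and never matches.
print(registry.lookup("foo-BAR"))  # None
```

Under these assumptions a mixed-case type can never be used from XML, which is the argument for rejecting it outright.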

Comment 16 Jirka Kosek 2014-05-12 13:06:21 UTC

(In reply to Simon Pieters from comment #15)
> More reasons to not allow <foo-BAR> (in XML or createElementNS).

I'm not sure I understand your comment properly, but XML and createElementNS must support creating elements with names containing a dash, as such elements are widely used (for example: xsl:for-each).

Jirka

Comment 17 Simon Pieters 2014-05-12 13:09:08 UTC

See comment 13 Jirka.

Comment 18 Liam R E Quin 2014-05-12 19:16:30 UTC

Regarding element names getting lowercased, once you go beyond ASCII life gets complicated: two people in Turkey once died because "sıkısınca" was rendered with i instead of ı. More to the point, <I>....</ı> is not well-formed XML (İ and i go together in Turkey, so it's not a particularly contrived example).

Similar problems exist with Unicode normalization - <hé>...</hé> is also not well-formed, because one uses the precomposed e acute and the other uses an e followed by a combining acute accent.

So it might be just fine in practice to live with it, or to disallow upper case, or it might be fine to apply ASCII-only lowercasing as is done in, e.g., the CSS OM. That would also work for HTML elements in general, of course. If this has been discussed to death elsewhere I'm sorry and OK not to respond :-)
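Both concerns can be demonstrated in a few lines. This sketch uses Python's `str.lower()` as a stand-in for full Unicode lowercasing (it is locale-independent, so it never produces Turkish dotless ı) and a hypothetical `ascii_lowercase` helper for ASCII-only folding; names are compared code point by code point, with no normalization applied.

```python
import string
import unicodedata

_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(s: str) -> str:
    """Fold only A-Z to a-z; leave every other code point unchanged."""
    return s.translate(_ASCII_FOLD)


# Full Unicode lowercasing maps "I" to "i", not Turkish dotless "ı",
# and turns "İ" (U+0130) into two code points: "i" + COMBINING DOT ABOVE.
print("I".lower() == "ı")   # False
print(len("İ".lower()))     # 2

# ASCII-only lowercasing leaves non-ASCII letters such as "İ" untouched.
print(ascii_lowercase("İ") == "İ")  # True

# Precomposed é (U+00E9) versus e followed by COMBINING ACUTE ACCENT (U+0301).
precomposed = "h\u00e9"
decomposed = "he\u0301"
print(precomposed == decomposed)  # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Without a normalization step, the two spellings of "hé" remain distinct names, which is why the corresponding XML start and end tags would not match.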

Comment 19 Liam R E Quin 2014-05-12 19:19:51 UTC

[I just discovered that "Convert to lowercase" does indeed mean ASCII lowercase; since lowercase wasn't a link I didn't realise. However, unicode normalization or not is still an issue as far as I can tell]

Comment 20 Dimitri Glazkov 2014-05-12 22:18:30 UTC

(In reply to Simon Pieters from comment #15)
> More reasons to not allow <foo-BAR> (in XML or createElementNS).

Can you guys supply a wording I can crib from? Better yet a github pull request? :)
This stuff hurts my brain.

Comment 21 Simon Pieters 2014-05-13 06:56:43 UTC

(In reply to Liam R E Quin from comment #19)
> However, unicode
> normalization or not is still an issue as far as I can tell]

If it doesn't say to apply normalization, then don't. I don't think that's an issue.

Comment 22 Simon Pieters 2014-05-13 07:12:41 UTC

https://github.com/w3c/webcomponents/pull/15

Comment 23 Dimitri Glazkov 2014-05-13 13:52:12 UTC

(In reply to Simon Pieters from comment #22)
> https://github.com/w3c/webcomponents/pull/15

Awesome! Thanks! :)