17873 – valid character range for identifiers too broad

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: valid character range for identifiers too broad

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://dev.w3.org/html5/spec/single-p...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-07-18 07:09 UTC by contributor
Modified:	2012-09-06 18:25 UTC
CC List:	6 users

See Also:

Attachments

Description contributor 2012-07-18 07:09:56 UTC

This was cloned from bug 17298 as part of operation convergence.
Originally filed: 2012-06-03 07:40:00 +0000
Original reporter: Julian Reschke <julian.reschke@gmx.de>

================================================================================
 #0   Julian Reschke                                  2012-06-03 07:40:31 +0000 
--------------------------------------------------------------------------------
"3.2.3.1 The id attribute

The id attribute specifies its element's unique identifier (ID). [DOMCORE]

The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters."

This makes it essentially impossible to extend the HTML fragment identifier syntax with new addressing schemes such as <http://simonstl.com/articles/cssFragID.html> or with XPointer.

I understand that for now HTML5 recipients will process the identifiers as specified, but that doesn't mean that they all should pass validation.

Proposal: exclude those characters from US-ASCII which aren't also allowed in XML IDs (<http://www.w3.org/TR/REC-xml/#NT-Name>)
================================================================================
 #1   Henri Sivonen                                   2012-06-04 08:39:18 +0000 
--------------------------------------------------------------------------------
Excluding ASCII digits is an annoying and arbitrary limitation. People often want numbered fragment identifiers. They shouldn't be required to know that they have to put a one-character prefix before the number.

I object to encumbering HTML by limiting IDs to XML Names.
================================================================================
 #2   Julian Reschke                                  2012-06-04 09:00:29 +0000 
--------------------------------------------------------------------------------
ASCII digits are allowed, just not as the start character. And yes, that's annoying.

There should be a middle ground here; not having an extension point at all is a problem as well.
================================================================================
 #3   Aryeh Gregor                                    2012-06-05 08:58:05 +0000 
--------------------------------------------------------------------------------
(In reply to comment #1)
> Excluding ASCII digits is an annoying and arbitrary limitation. People often
> want numbered fragment identifiers. They shouldn't be required to know that
> they have to put a one-character prefix before the number.
> 
> I object to encumbering HTML by limiting IDs to XML Names.

How about limiting it to (NameChar)*?  This still prohibits most ASCII punctuation, and I'm not at all sure that's a good idea.  If we really want an extension point, one character should be enough.  What's wrong with assigning special meaning to something involving a space, say?  Or just use something that's unlikely to come up in real-world identifiers, and if we want to make it standard, make conflicting id's invalid after the fact.
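
To make the three candidate rules discussed so far easier to compare, here is a minimal Python sketch. It assumes the HTML5 "space characters" are space, tab, LF, FF and CR, and it only models the US-ASCII part of the XML 1.0 Name and NameChar productions. Every non-ASCII character is accepted, which is a simplification of the real XML grammar. The function names are hypothetical and do not come from any spec or library.

```python
import re

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML_SPACE_CHARS = " \t\n\f\r"

# ASCII subset of the XML 1.0 NameStartChar and NameChar productions.
ASCII_NAME_START = re.compile(r"[:A-Z_a-z]")
ASCII_NAME_CHAR = re.compile(r"[:A-Z_a-z.0-9-]")


def _allowed(ch, ascii_pattern):
    # Assumption: non-ASCII characters are always accepted. Real XML Names
    # exclude some non-ASCII ranges; that detail is left out of this sketch.
    return ord(ch) > 0x7F or ascii_pattern.fullmatch(ch) is not None


def valid_html5_id(value):
    """Rule quoted in comment #0: non-empty and no space characters."""
    return len(value) > 0 and not any(ch in HTML_SPACE_CHARS for ch in value)


def valid_xml_name_ascii(value):
    """Comment #0 proposal, read as: the value must look like an XML Name."""
    return (
        len(value) > 0
        and _allowed(value[0], ASCII_NAME_START)
        and all(_allowed(ch, ASCII_NAME_CHAR) for ch in value[1:])
    )


def valid_namechar_star(value):
    """Comment #3 alternative: (NameChar)*, with no start-character rule."""
    return all(_allowed(ch, ASCII_NAME_CHAR) for ch in value)


if __name__ == "__main__":
    for candidate in ["s1", "1st", "css(p)", "section 2"]:
        print(
            repr(candidate),
            valid_html5_id(candidate),
            valid_xml_name_ascii(candidate),
            valid_namechar_star(candidate),
        )
```

Under these assumptions "1st" passes the HTML5 rule and (NameChar)* but fails the XML Name rule, while "css(p)" passes only the HTML5 rule, which is the gap the parenthesis-based addressing schemes would need.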
================================================================================
 #4   Julian Reschke                                  2012-06-05 09:18:23 +0000 
--------------------------------------------------------------------------------
I think whitespace is indeed likely to show up in identifiers (for instance, when turning a section title into an identifier). And yes, it's good that at least that is invalid.

The addressing schemes that have been proposed in the past (XPointer) and right now (CSS selectors) seem to use "(" and ")".
================================================================================
 #5   Kang-Hao (Kenny) Lu                             2012-06-05 09:42:31 +0000 
--------------------------------------------------------------------------------
(In reply to comment #0)
> This makes it essentially impossible to extend the HTML fragment identifier
> syntax with new addressing schemes such as
> <http://simonstl.com/articles/cssFragID.html> or with XPointer.

I think this depends on whether @id ends up overriding XPointer or the reverse. For example, "top" is still a valid ID even if #top has a default scrolling behavior written in the spec.

(In reply to comment #3)
> How about limiting it to (NameChar)*?  This still prohibits most ASCII
> punctuation, and I'm not at all sure that's a good idea.  If we really want an
> extension point, one character should be enough.  What's wrong with assigning
> special meaning to something involving a space, say?

Both cssFragID and XPointer use parentheses. So do you mean '(' and ')'?


Can someone remind me why we reserve IDs with a space in them in the first place? Is that for the purpose of letting a conformance checker issue an error in case an author thinks a space delimits multiple IDs in @id?

If helping authors is a goal here, I would instead suggest we require conformance checkers to issue a warning when @id or @class doesn't match IDENT in CSS, as it's a common error for authors to think that selecting an @id like "1st" or "2nd" would work in CSS.
================================================================================
 #6   Aryeh Gregor                                    2012-06-05 09:53:53 +0000 
--------------------------------------------------------------------------------
(In reply to comment #5)
> Can someone remind me why we reserve IDs with a space in them in the first place?

Because some things expect a space-separated list of id's, and id's with spaces in them would break.  For instance: <td headers="">, <output for="">, itemref="".
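
A small Python sketch of why that breaks, assuming the attribute value is split on the HTML5 space characters and each token is looked up as an ID. The helper names and the sample IDs are hypothetical, chosen only for illustration.

```python
import re


def split_on_spaces(value):
    """Split an attribute value on runs of HTML5 space characters."""
    return [token for token in re.split(r"[ \t\n\f\r]+", value) if token]


def resolve_id_list(attr_value, ids_in_document):
    """Pair each token with whether an element with that ID exists."""
    return [(token, token in ids_in_document) for token in split_on_spaces(attr_value)]


if __name__ == "__main__":
    # Hypothetical document with one ID that contains a space.
    ids = {"first name", "age"}
    # An author writes headers="first name age", intending two references.
    print(resolve_id_list("first name age", ids))
```

The value is read as three tokens, "first", "name" and "age", so the element whose ID is "first name" can never be referenced from such a list.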

> If helping authors is a goal here, I would instead suggest we require
> conformance checkers to issue a warning when @id or @class doesn't match IDENT
> in CSS, as it's a common error for authors to think that selecting an @id like
> "1st" or "2nd" would work in CSS.

I think it would be better to fix CSS to accept such id's, since as far as I know they aren't actually ambiguous.  Particularly since CSS escape syntax doesn't allow an easy way to escape numbers -- \1 is interpreted as U+0001, IIRC.  Of course, authors would still have to escape punctuation like # or . that has special meaning in CSS, but that's more expected, and you can just prefix with a backslash.
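
For illustration, here is a rough Python sketch of the escaping an author needs today. It follows the general approach of CSSOM's identifier serialization as I understand it, simplified, so treat it as an assumption-laden sketch and not a reference implementation. The function name css_escape_ident is hypothetical.

```python
def css_escape_ident(ident):
    """Escape a string so it can be used as a CSS identifier, e.g. after '#'."""
    out = []
    for i, ch in enumerate(ident):
        cp = ord(ch)
        # A digit may not start an identifier, nor directly follow a leading '-'.
        leading_digit = "0" <= ch <= "9" and (
            i == 0 or (i == 1 and ident[0] == "-")
        )
        if cp == 0:
            out.append("\ufffd")
        elif cp <= 0x1F or cp == 0x7F or leading_digit:
            # Hex escape followed by a space that terminates the escape.
            out.append("\\{:x} ".format(cp))
        elif ch == "-" and len(ident) == 1:
            out.append("\\-")
        elif cp >= 0x80 or ch in "-_" or ch.isalnum():
            out.append(ch)
        else:
            # Punctuation such as '#' or '.' only needs a backslash prefix.
            out.append("\\" + ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in ["1st", "2nd", "a.b", "top"]:
        print("#" + css_escape_ident(raw))
```

With this sketch the ID "1st" becomes the selector #\31 st: the digit has to be written as its hexadecimal code point (U+0031), since a backslash followed by 1 would be read as U+0001.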
================================================================================
 #7   Kang-Hao (Kenny) Lu                             2012-06-05 10:10:25 +0000 
--------------------------------------------------------------------------------
(In reply to comment #6)
> (In reply to comment #5)
> > Can someone remind me why we reserve IDs with a space in them in the first place?
> 
> Because some things expect a space-separated list of id's, and id's with spaces
> in them would break.  For instance: <td headers="">, <output for="">,
> itemref="".

That sounds like an odd restriction then, as these attributes are not likely to be frequently used.
 
> > If helping authors is a goal here, I would instead suggest we require
> > conformance checkers to issue a warning when @id or @class doesn't match
> > IDENT in CSS, as it's a common error for authors to think that selecting an
> > @id like "1st" or "2nd" would work in CSS.
> 
> I think it would be better to fix CSS to accept such id's, since as far as I
> know they aren't actually ambiguous.  

I thought about this too, but given that CSS has been like this for more than a decade (at least spec-wise, I don't know about implementations), this seems like a dangerous change to make in terms of backwards compatibility. (Sites relying on these rule-sets to not apply.)

> Particularly since CSS escape syntax doesn't allow an easy way to escape 
> numbers -- \1 is interpreted as U+0001, IIRC.

Yes.
================================================================================

Comment 1 Ian 'Hixie' Hickson 2012-07-19 23:21:18 UTC

Please love numeric IDs. Making them non-conforming is not a good idea.

Comment 2 Ian 'Hixie' Hickson 2012-09-06 18:25:13 UTC

Uh, s/Please/People/, though "Please" works too I guess. :-)