17298 – valid character range for identifiers too broad

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: valid character range for identifiers too broad

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P3 normal
Target Milestone:	---
Assignee:	Robin Berjon
QA Contact:	HTML WG Bugzilla archive list

URL:	http://dev.w3.org/html5/spec/single-p...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-03 07:40 UTC by Julian Reschke
Modified:	2015-06-17 03:37 UTC
CC List:	7 users

See Also:

Attachments

Description Julian Reschke 2012-06-03 07:40:31 UTC

"3.2.3.1 The id attribute

The id attribute specifies its element's unique identifier (ID). [DOMCORE]

The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters."

This makes it essentially impossible to extend the HTML fragment identifier syntax with new addressing schemes such as <http://simonstl.com/articles/cssFragID.html> or with XPointer.

I understand that for now HTML5 recipients will process the identifiers as specified, but that doesn't mean that they all should pass validation.

Proposal: exclude those characters from US-ASCII which aren't also allowed in XML IDs (<http://www.w3.org/TR/REC-xml/#NT-Name>)
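
A minimal Python sketch contrasting the current rule (non-empty, no space characters) with one reading of the proposal: ASCII characters must follow the XML 1.0 Name production (NameStartChar first, NameChar afterwards), while non-ASCII characters are left unrestricted. The function names are hypothetical, and the treatment of non-ASCII characters is an assumption, since the proposal only talks about US-ASCII.

```python
import string

# HTML5 "space characters": U+0020, U+0009, U+000A, U+000C, U+000D
HTML_SPACE_CHARS = {" ", "\t", "\n", "\f", "\r"}

# ASCII subset of the XML 1.0 NameStartChar and NameChar productions
ASCII_NAME_START = set(string.ascii_letters) | {":", "_"}
ASCII_NAME_CHAR = ASCII_NAME_START | set(string.digits) | {"-", "."}


def is_valid_html5_id(value: str) -> bool:
    """Current rule: at least one character and no space characters."""
    return len(value) > 0 and not any(c in HTML_SPACE_CHARS for c in value)


def is_valid_proposed_id(value: str) -> bool:
    """Hypothetical stricter rule: ASCII characters must fit the XML Name production."""
    if not is_valid_html5_id(value):
        return False
    for i, c in enumerate(value):
        if not c.isascii():
            continue  # assumption: non-ASCII characters are not restricted further
        allowed = ASCII_NAME_START if i == 0 else ASCII_NAME_CHAR
        if c not in allowed:
            return False
    return True
```

Under this reading, `css(.foo)` and `1st` are conforming today but would fail the proposed check, while `s1` passes both.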

Comment 1 Henri Sivonen 2012-06-04 08:39:18 UTC

(In reply to comment #0)
> Proposal: exclude those characters from US-ASCII which aren't also allowed in
> XML IDs (<http://www.w3.org/TR/REC-xml/#NT-Name>)

Excluding ASCII digits is an annoying and arbitrary limitation. People often want numbered fragment identifiers. They shouldn't be required to know that they have to put a one-character prefix before the number.

I object to encumbering HTML by limiting IDs to XML Names.

Comment 2 Julian Reschke 2012-06-04 09:00:29 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > Proposal: exclude those characters from US-ASCII which aren't also allowed in
> > XML IDs (<http://www.w3.org/TR/REC-xml/#NT-Name>)
> 
> Excluding ASCII digits is an annoying and arbitrary limitation. People often
> want numbered fragment identifiers. They shouldn't be required to know that
> they have to put a one-character prefix before the number.
> 
> I object to encumbering HTML by limiting IDs to XML Names.

ASCII digits are allowed, just not as start character. And yes, that's annoying.

There should be a middle ground here; not having an extension point at all is a problem as well.

Comment 3 Aryeh Gregor 2012-06-05 08:58:05 UTC

(In reply to comment #1)
> Excluding ASCII digits is an annoying and arbitrary limitation. People often
> want numbered fragment identifiers. They shouldn't be required to know that
> they have to put a one-character prefix before the number.
> 
> I object to encumbering HTML by limiting IDs to XML Names.

How about limiting it to (NameChar)*?  This still prohibits most ASCII punctuation, and I'm not at all sure that's a good idea.  If we really want an extension point, one character should be enough.  What's wrong with assigning special meaning to something involving a space, say?  Or just use something that's unlikely to come up in real-world identifiers, and if we want to make it standard, make conflicting id's invalid after the fact.
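
The (NameChar)* variant can be sketched the same way. This is a hypothetical helper, restricted to ASCII as above and keeping HTML's at-least-one-character requirement (so strictly (NameChar)+). It accepts a leading digit, and listing which ASCII punctuation survives shows how much it still rules out.

```python
import string

# ASCII subset of XML 1.0 NameChar: letters, digits, ":", "_", "-", "."
ASCII_NAME_CHAR = set(string.ascii_letters + string.digits) | {":", "_", "-", "."}


def is_namechar_id(value: str) -> bool:
    """Every ASCII character must be a NameChar; non-ASCII is not restricted (assumption)."""
    return len(value) > 0 and all(not c.isascii() or c in ASCII_NAME_CHAR for c in value)


# The only ASCII punctuation characters such IDs could contain
ALLOWED_PUNCTUATION = [c for c in string.punctuation if c in ASCII_NAME_CHAR]
```

Here `1st` is accepted, `css(.foo)` is rejected, and `ALLOWED_PUNCTUATION` is `['-', '.', ':', '_']`, i.e. 4 of the 32 characters in `string.punctuation`.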

Comment 4 Julian Reschke 2012-06-05 09:18:23 UTC

(In reply to comment #3)
> How about limiting it to (NameChar)*?  This still prohibits most ASCII
> punctuation, and I'm not at all sure that's a good idea.  If we really want an
> extension point, one character should be enough.  What's wrong with assigning
> special meaning to something involving a space, say?  Or just use something
> that's unlikely to come up in real-world identifiers, and if we want to make it
> standard, make conflicting id's invalid after the fact.

I think whitespace is indeed likely to show up in identifiers (for instance, when turning a section title into an identifier). And yes, it's good that at least that is invalid.

The addressing schemes that have been proposed in the past (XPointer) and right now (CSS selectors) seem to use "(" and ")".

Comment 5 Kang-Hao (Kenny) Lu 2012-06-05 09:42:31 UTC

(In reply to comment #0)
> This makes it essentially impossible to extend the HTML fragment identifier
> syntax with new addressing schemes such as
> <http://simonstl.com/articles/cssFragID.html> or with XPointer.

I think this depends on whether @id ends up overriding XPointer or the reverse. For example, "top" is still a valid ID even if #top has a default scrolling behavior written in the spec.

(In reply to comment #3)
> How about limiting it to (NameChar)*?  This still prohibits most ASCII
> punctuation, and I'm not at all sure that's a good idea.  If we really want an
> extension point, one character should be enough.  What's wrong with assigning
> special meaning to something involving a space, say?

Both cssFragID and XPointer use parentheses. So do you mean '(' and ')'?


Can someone remind me why we disallow IDs with spaces in them in the first place? Is that so that conformance checkers can issue an error when an author thinks a space delimits multiple IDs in @id?

If helping authors is a goal here, I would instead suggest we require conformance checkers to issue a warning when @id or @class doesn't match IDENT in CSS, as it's a common error for authors to think that selecting an @id like "1st" or "2nd" would work in CSS.
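
A rough Python sketch of the suggested warning. The regular expression is a simplification of the CSS 2.1 `ident` token (it treats every non-ASCII character as a name character and ignores escapes), and `ident_warnings` with its dictionary input is a hypothetical helper, not part of any real conformance checker.

```python
import re
from typing import List, Mapping

# Simplified CSS 2.1 "ident": -?{nmstart}{nmchar}*
#   nmstart: [_a-zA-Z] or non-ASCII; nmchar: [_a-zA-Z0-9-] or non-ASCII.
# CSS escapes are ignored because we are checking raw attribute values.
CSS_IDENT = re.compile(r"-?[_a-zA-Z\u0080-\U0010FFFF][_a-zA-Z0-9\-\u0080-\U0010FFFF]*")


def is_css_ident(value: str) -> bool:
    return CSS_IDENT.fullmatch(value) is not None


def ident_warnings(attrs: Mapping[str, str]) -> List[str]:
    """Warn about id/class values that cannot be used unescaped in a CSS selector."""
    warnings = []
    id_value = attrs.get("id")
    if id_value is not None and not is_css_ident(id_value):
        warnings.append(f'id "{id_value}" is not a CSS identifier')
    for token in attrs.get("class", "").split():  # approximates HTML class splitting
        if not is_css_ident(token):
            warnings.append(f'class "{token}" is not a CSS identifier')
    return warnings
```

For example, `{"id": "2nd", "class": "ok 1st"}` yields one warning for the ID and one for the `1st` class token.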

Comment 6 Aryeh Gregor 2012-06-05 09:53:53 UTC

(In reply to comment #5)
> Can someone remind me why we disallow IDs with spaces in them in the first place?

Because some things expect a space-separated list of id's, and id's with spaces in them would break.  For instance: <td headers="">, <output for="">, itemref="".
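
To illustrate the breakage: a consumer of such an attribute splits its value on whitespace before looking up each token, so an ID containing a space can never be referenced. A minimal Python sketch; the helper names are hypothetical, and splitting on ASCII whitespace follows HTML's usual handling of space-separated tokens.

```python
import re
from typing import List, Mapping

# ASCII whitespace as used by HTML: space, tab, LF, FF, CR
_ASCII_WS = re.compile("[ \t\n\f\r]+")


def split_id_refs(value: str) -> List[str]:
    """Split a space-separated ID reference attribute (e.g. headers="") into tokens."""
    return [tok for tok in _ASCII_WS.split(value) if tok]


def resolve_id_refs(value: str, ids: Mapping[str, str]) -> List[str]:
    """Look up each token by ID; tokens with no matching element are skipped."""
    return [ids[tok] for tok in split_id_refs(value) if tok in ids]
```

With an element whose ID is `first col`, `headers="first col"` is read as the two tokens `first` and `col`, so `resolve_id_refs` finds nothing.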

> If helping authors is a goal here, I would instead suggest we require
> conformance checkers to issue a warning when @id or @class doesn't match IDENT
> in CSS, as it's a common error for authors to think that selecting an @id like
> "1st" or "2nd" would work in CSS.

I think it would be better to fix CSS to accept such id's, since as far as I know they aren't actually ambiguous.  Particularly since CSS escape syntax doesn't allow an easy way to escape numbers -- \1 is interpreted as U+0001, IIRC.  Of course, authors would still have to escape punctuation like # or . that has special meaning in CSS, but that's more expected, and you can just prefix with a backslash.
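
For reference, a hex escape followed by a single space does work for a leading digit: `#\31 st` selects an element with `id="1st"` (the space terminates the escape). A minimal Python sketch that builds such selectors; `css_id_selector` is a hypothetical helper and a simplification, not an implementation of CSSOM's `CSS.escape()`.

```python
def css_id_selector(id_value: str) -> str:
    """Build a CSS ID selector for an arbitrary id value (simplified sketch)."""
    out = []
    for i, ch in enumerate(id_value):
        # A digit cannot start an identifier, either first or right after a "-"
        leading_digit = ch in "0123456789" and (
            i == 0 or (i == 1 and id_value[0] == "-")
        )
        if leading_digit or ord(ch) < 0x20 or ch == "\x7f":
            # Hex escape; the trailing space terminates it, so "1" becomes
            # "\31 " rather than "\1", which would mean U+0001.
            out.append("\\%x " % ord(ch))
        elif ch.isascii() and not (ch.isalnum() or ch in "-_"):
            # ASCII punctuation such as "#", "." or "(" just gets a backslash
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "#" + "".join(out)
```

So `1st` becomes `#\31 st` and `css(.foo)` becomes `#css\(\.foo\)`, while an ordinary ID like `top` is left unchanged.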

Comment 7 Kang-Hao (Kenny) Lu 2012-06-05 10:10:25 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > Can someone remind me why we disallow IDs with spaces in them in the first place?
> 
> Because some things expect a space-separated list of id's, and id's with spaces
> in them would break.  For instance: <td headers="">, <output for="">,
> itemref="".

That sounds like an odd restriction then, as these attributes are not likely to be frequently used.
 
> > If helping authors is a goal here, I would instead suggest we require
> > conformance checkers to issue a warning when @id or @class doesn't match IDENT
> > in CSS, as it's a common error for authors to think that selecting an @id like
> > "1st" or "2nd" would work in CSS.
> 
> I think it would be better to fix CSS to accept such id's, since as far as I
> know they aren't actually ambiguous.  

I thought about this too, but given that CSS has been like this for more than a decade (at least spec-wise, I don't know about implementations), this seems like a dangerous change to make in terms of backwards compatibility. (Sites may rely on these rule sets not applying.)

> Particularly since CSS escape syntax doesn't allow an easy way to escape 
> numbers -- \1 is interpreted as U+0001, IIRC.

Yes.

Comment 8 contributor 2012-07-18 07:10:00 UTC

This bug was cloned to create bug 17873 as part of operation convergence.

Comment 9 Robin Berjon 2012-09-05 14:38:04 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Rationale:

(In reply to comment #0)
> This makes it essentially impossible to extend the HTML fragment identifier
> syntax with new addressing schemes such as
> <http://simonstl.com/articles/cssFragID.html> or with XPointer.

The first problem with changing this is that it will break existing content. If you try the following:

<!DOCTYPE html>
<html lang='en'>
  <head>
    <meta charset='utf-8'>
    <title>ID test</title>
  </head>
  <body>
    <a href='#css(.foo)'>link to css()</a>
    <div style='height: 1000px'>haha</div>
    <div id='css(.foo)'>link to me!</div>
    <div style='height: 1000px'>haha</div>
  </body>
</html>

you can see that it works. And yes, there's content with crazy IDs out there — this has been loose for a long while by now.

Additionally, I don't believe that it poses a major threat to the extensibility of fragment processing (and I say this as an enthusiastic supporter of Simon's CSS Frag IDs: I helped with the spec, arranged the CG, spoke about it at conferences, etc.). While there are indeed IDs out there that contain (, ), and a bunch of other unfriendly characters, the odds that some of them match "css(VALID_SELECTOR)" are low. In other words, the risk of collisions is small enough that we can proceed without fear. It merely requires that any specification extending fragment processing state that its processing happens *before* ID processing, and that it pass anything it does not itself understand on to ID processing (as a last resort).
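
The processing order described above can be sketched in Python as follows. `css(...)` stands in for a cssFragID-style scheme, and `select` is a hypothetical selector engine passed in as a callable, with strings standing in for DOM nodes; neither is a real API.

```python
import re
from typing import Callable, Mapping, Optional

# Hypothetical cssFragID-style scheme: "css(<selector>)"
CSS_FRAGMENT = re.compile(r"css\((.+)\)", re.DOTALL)


def resolve_fragment(
    fragment: str,
    ids: Mapping[str, str],
    select: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the element a fragment points to, or None.

    ids    -- maps id attribute values to elements
    select -- returns None if the selector is invalid or matches nothing
    """
    m = CSS_FRAGMENT.fullmatch(fragment)
    if m is not None:
        target = select(m.group(1))
        if target is not None:
            return target  # the extension scheme understood the fragment
    # Anything the extension does not understand falls through to ID lookup.
    return ids.get(fragment)
```

With the test document above, `select(".foo")` finds nothing, so `#css(.foo)` still falls through to the element whose ID is `css(.foo)`. If the selector did match something, the extension would win; that is the collision case argued here to be rare.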

Comment 10 Julian Reschke 2012-09-05 14:40:50 UTC

The proposal is to change (back) what's conforming (valid) to what HTML 4 said (or a superset of that).

It is *not* about changing the processing requirements.

Comment 11 Robin Berjon 2012-09-05 15:40:05 UTC

(In reply to comment #10)
> The proposal is to change (back) what's conforming (valid) to what HTML 4 said
> (or a superset of that).
> 
> It is *not* about changing the processing requirements.

And I found your argument to do that to be groundless in practice, as I clearly explained. If you wish to keep this bug open, please provide new arguments or a clear explanation of why I may be wrong. Without that, there is really no point in reopening.

Comment 12 Julian Reschke 2012-09-05 17:43:35 UTC

(In reply to comment #11)
> (In reply to comment #10)
> > The proposal is to change (back) what's conforming (valid) to what HTML 4 said
> > (or a superset of that).
> > 
> > It is *not* about changing the processing requirements.
> 
> And I found your argument to do that to be groundless in practice, as I clearly
> explained. If you wish to keep this bug open, please provide new arguments or a
> clear explanation of why I may be wrong. Without that, there is really no point
> in reopening.

The reason is that people who *do* care about conformance will get a warning when they use identifiers that may become problematic in the future; which is exactly the reason why we distinguish between things that are conforming and those which happen to work anyway, no?

Comment 13 Robin Berjon 2012-09-06 07:44:29 UTC

(In reply to comment #12)
> The reason is that people who *do* care about conformance will get a warning
> when they use identifiers that may become problematic in the future; which is
> exactly the reason why we distinguish between things that are conforming and
> those which happen to work anyway, no?

Yes, I know why people care about validation. But you have not demonstrated in any way, manner, or form that the current situation leads to genuine problems.

So, *again*: please state technical reasons to keep this open, or close the bug.

Comment 14 Julian Reschke 2012-09-06 07:48:27 UTC

(In reply to comment #13)
> (In reply to comment #12)
> > The reason is that people who *do* care about conformance will get a warning
> > when they use identifiers that may become problematic in the future; which is
> > exactly the reason why we distinguish between things that are conforming and
> > those which happen to work anyway, no?
> 
> Yes, I know why people care about validation. But you have not demonstrated in
> any way, manner, or form that the current situation leads to genuine problems.
> 
> So, *again*: please state technical reasons to keep this open, or close the
> bug.

I think I did, but let me try again.

If we make all characters conforming (and that *is* a change from HTML4), that essentially means that we can't introduce new fragment identifier notations without changing the semantics of currently conforming documents. 

So if we're serious about things like "css(...)", we should warn people who use this right now. That's exactly what conformance requirements are for.
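
A sketch of the kind of warning being asked for: flag IDs that already look like a `scheme(...)` fragment syntax. The pattern and the helper are hypothetical; they only assume that new addressing schemes follow the name-plus-parentheses shape used by cssFragID and XPointer (e.g. `element(...)`, `xpointer(...)`).

```python
import re
from typing import Optional

# Hypothetical shape of a fragment addressing scheme: name followed by (...)
SCHEME_LIKE = re.compile(r"[A-Za-z][A-Za-z0-9._-]*\(.*\)", re.DOTALL)


def future_collision_warning(id_value: str) -> Optional[str]:
    """Return a warning if the ID could collide with a scheme(...) fragment."""
    if SCHEME_LIKE.fullmatch(id_value):
        return f"warning: id {id_value!r} resembles a fragment addressing scheme"
    return None
```

IDs such as `css(.foo)` would be flagged, while `section-1` or `intro` would not.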

Comment 15 Michael[tm] Smith 2015-06-17 03:37:18 UTC

See comment 1 and comment 13. Nothing has happened on this bug in 2+ years and it's not enough of a priority to continue tracking for another N years unless somebody's going to take further action on pursuing it.