This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26946 - Allowing any content of id attributes prevents definition of alternate hash fragment usage
Status: RESOLVED WORKSFORME
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: https://html.spec.whatwg.org/#the-id-...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-01 12:21 UTC by contributor
Modified: 2014-10-20 21:48 UTC

Description contributor 2014-10-01 12:21:04 UTC
Specification: https://html.spec.whatwg.org/multipage/dom.html
Multipage: https://html.spec.whatwg.org/multipage/#the-id-attribute
Complete: https://html.spec.whatwg.org/#the-id-attribute
Referrer: https://rawgit.com/whatwg/html-differences/master/Overview.html

Comment:
Allowing "any" content in id attributes is greedy. Sure, relax the HTML4 rule
that they had to be ASCII, but anything? How would you standardize state
information passing in fragment identifiers, e.g. #key-name=keyvalue, if this
is a perfectly legal id value? Suggest at least reserving some start characters
(alphabetic or non-ASCII, for example).

Posted from: 14.203.4.38
User agent: Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0
Comment 1 Ian 'Hixie' Hickson 2014-10-03 21:42:32 UTC
You can't standardise information passing in fragment identifiers, because there might be an ID with the value "key-name=keyvalue" already today, whether it's valid or not.
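The collision described in this comment can be sketched in a few lines of Python. This is only an illustration: the function name `interpretations` and the set of ids passed to it are hypothetical, and the `key=value` reading is the reporter's proposed convention, not something defined by any specification.

```python
def interpretations(fragment, document_ids):
    """Return every way a fragment (without the leading '#') could be read.

    document_ids is a hypothetical set of id values present in the page.
    """
    readings = []
    # Reading 1: the fragment names an element, because ids may contain '='.
    if fragment in document_ids:
        readings.append(("element", fragment))
    # Reading 2: the fragment carries state in the proposed key=value form.
    key, sep, value = fragment.partition("=")
    if sep and key:
        readings.append(("state", (key, value)))
    return readings


# A page that already uses the id "key-name=keyvalue" makes the fragment ambiguous.
ambiguous = interpretations("key-name=keyvalue", {"key-name=keyvalue", "intro"})
print(len(ambiguous))  # 2 readings: one element, one state
```

Two readings for the same fragment is exactly the situation a script cannot resolve on its own.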
Comment 2 Simon Pieters 2014-10-06 09:43:51 UTC
Though spaces are not allowed in IDs, so if you don't care about invalid IDs you could use #key-name%20keyvalue or similar.
Comment 3 Ian 'Hixie' Hickson 2014-10-06 17:59:06 UTC
That will actually match id="key-name%20keyvalue" in some browsers (and in the spec, modulo bug 26988).
Comment 4 Dom Leonard 2014-10-10 03:41:33 UTC
HTML5 defers the conversion of id string values to URL fragments to a "URL Parser" and "URL Serializer" defined in "http://www.w3.org/TR/url/". Another resource is at https://url.spec.whatwg.org/

I was unable to locate, in HTML5 or in either of these standards, a process whereby US-ASCII characters that are not URL code points (curly braces and the like), or that have reserved usage under RFC 3986, are percent-encoded. Both standards appear to assume this occurs before URL component values are passed to them.

Hence the encoding of HTML5 id value strings into URL hash fragment strings is missing a step.

So I would amend my original comment to suggest that id value strings must be percent-encoded for a range or ranges of US-ASCII characters enumerated in the HTML standard. Please include "#" and "?" in the list. Somehow I won't, at this point, go into "!" :D
Comment 5 Ian 'Hixie' Hickson 2014-10-15 00:30:16 UTC
When would they need to be encoded? I'm not sure I follow.
Comment 6 Dom Leonard 2014-10-20 10:41:16 UTC
Oh I see it now - the reserved character concept of the RFCs is not being carried forward.

The question underlying this comment was how to separate document references from script data in a URI fragment when both might be present. My conclusion is to suffix any document reference with a line feed ("%0A" in the href string) before appending script data. Short of a redefinition of valid HTML id string values, this would appear to delimit the two safely and unambiguously within script.
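The line-feed convention described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a standardized scheme: the helper names `build_fragment` and `split_fragment` are hypothetical, and the sketch relies on the fact that a valid id value contains no whitespace, so an encoded line feed should not appear inside a valid document reference. A script using this layout would presumably have to act on the document reference itself, since the fragment as a whole no longer equals the id.

```python
import re
from urllib.parse import quote, unquote

SEPARATOR = "%0A"  # percent-encoded line feed, as suggested in the comment


def build_fragment(element_id, script_data):
    """Join an element id and script data with an encoded line feed."""
    # safe="" percent-encodes every reserved character, including '%' itself,
    # so a literal "%0A" inside either part cannot be mistaken for the separator.
    return "#" + quote(element_id, safe="") + SEPARATOR + quote(script_data, safe="")


def split_fragment(fragment):
    """Split a still-encoded fragment into (document reference, script data)."""
    raw = fragment[1:] if fragment.startswith("#") else fragment
    # Percent-encoding hex digits are case-insensitive, so accept %0A and %0a.
    parts = re.split(r"%0a", raw, maxsplit=1, flags=re.IGNORECASE)
    doc_ref = unquote(parts[0])
    data = unquote(parts[1]) if len(parts) == 2 else ""
    return doc_ref, data


fragment = build_fragment("key-name=keyvalue", "page=2")
print(fragment)                  # #key-name%3Dkeyvalue%0Apage%3D2
print(split_fragment(fragment))  # ('key-name=keyvalue', 'page=2')
```

Splitting before decoding matters here: the separator is searched for in the encoded string, so the decoded parts can contain any characters without being confused with it.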

Thank you all for your comments, and question, which have helped me place this issue in context. AFAIAC this case may now be closed.
Comment 7 Ian 'Hixie' Hickson 2014-10-20 21:48:34 UTC
Marking closed per comment 6.