21902 – #url-code-points: Add notes that points out the need/lack of need to escape certain code points

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

URL:	http://url.spec.whatwg.org/#url-code-...

Reported:	2013-05-02 12:47 UTC by Leif Halvard Silli
Modified:	2015-08-19 09:09 UTC
CC List:	8 users

Description Leif Halvard Silli 2013-05-02 12:47:50 UTC

The #url-code-points paragraph should be annotated with more notes that point out important implications and details of which characters are included in, or excluded from, the list of URL code points.

I propose that you clarify the motivation for the current note (1), add two more notes (2), (3), and consider a fourth note (4):

(1) The #url-code-points paragraph is already accompanied by a note about the URL parser’s behavior, whose relevance to URL writing is unexplained. That these code points do not need escaping is obvious from their inclusion in the list of code point ranges, and that they are escaped by the URL parser should have little bearing on authoring, no? Please consider adding a hint about why you wanted to point this out.

(2) Add a note pointing out that code points *not* listed among the URL code points need to be escaped. Feel free to list all of these code points, but at any rate explicitly mention common code points in need of escaping, such as U+0009, U+000A, and U+000D, and please include their Unicode names as well, to help readers! (The '#' may belong in this category as well, whenever its 'fragment semantics' should be escaped.)

(3) Add a note about which of the ‘special characters’ that *are* listed need to be percent-encoded whenever their URL-specific functions need to be escaped. This includes characters such as ? and /. (And if escaping is not necessary for some of these special characters, then that is unexpected, and thus ought to be pointed out.)

(4) Consider adding a note about format-specific considerations, e.g. the need to escape < and & in XML.
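
To make points (2) to (4) concrete, here is a rough Python sketch. It is not part of the spec or of any proposed note text. The punctuation set and the non-ASCII range are assumed to mirror the URL code points list in the URL Standard, and the helper names `is_url_code_point` and `percent_encode` are hypothetical. The Unicode names for the three control characters are hardcoded because Python's `unicodedata.name` does not provide names for control characters.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Assumption: ASCII punctuation that counts as URL code points,
# alongside ASCII alphanumerics.
ASCII_URL_PUNCTUATION = set("!$&'()*+,-./:;=?@_~")

# Unicode names of the control characters mentioned in point (2).
CONTROL_NAMES = {
    "\u0009": "CHARACTER TABULATION",
    "\u000A": "LINE FEED (LF)",
    "\u000D": "CARRIAGE RETURN (CR)",
}


def is_url_code_point(ch: str) -> bool:
    """Return True if ch is a URL code point under the assumptions above."""
    cp = ord(ch)
    if cp < 0x80:
        return ch.isalnum() or ch in ASCII_URL_PUNCTUATION
    if cp < 0xA0 or cp > 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    # Noncharacters: U+FDD0..U+FDEF and code points ending in FFFE or FFFF.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return False
    return True


def percent_encode(ch: str) -> str:
    """Percent-encode the UTF-8 bytes of ch, treating nothing extra as safe."""
    return quote(ch, safe="")


# Point (2): not URL code points, so they must be escaped to appear in a URL.
for ch, name in CONTROL_NAMES.items():
    print(f"U+{ord(ch):04X} {name}: {is_url_code_point(ch)} -> {percent_encode(ch)}")
print("#", is_url_code_point("#"), percent_encode("#"))

# Point (3): URL code points whose URL-specific function may need escaping.
for ch in "?/":
    print(ch, is_url_code_point(ch), percent_encode(ch))

# Point (4): format-specific escaping when a URL is embedded in XML text.
xml_safe = escape("http://example.com/?a=1&b=2")
print(xml_safe)
```

The last step is independent of percent-encoding: `&` is a URL code point, yet it still has to be written as `&amp;` in XML content.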

Comment 1 Anne 2014-05-22 10:33:56 UTC

1) I pointed this out because otherwise you might be surprised by what the API returns.

2-4) I think you make some valid points. I'm tempted to wait with fixing this until I use Bikeshed and can include railroad diagrams and such. If you have suggestions for clearer notes, that'd be most welcome.
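
As a rough illustration of point 1, the sketch below imitates the relevant parser behavior in Python: a code point above U+007F is valid to write in a URL, but comes back as percent-encoded UTF-8 bytes. Python's `urllib.parse` is not an implementation of the URL Standard's parser, and `serialize_path` is a hypothetical helper that only mimics this one conversion.

```python
from urllib.parse import quote, unquote


def serialize_path(path: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII code points as UTF-8
    bytes, leaving "/" untouched, roughly as a URL parser would for a path."""
    return quote(path, safe="/", encoding="utf-8")


authored = "/caf\u00e9"           # what an author may write
serialized = serialize_path(authored)
print(serialized)                  # /caf%C3%A9
print(unquote(serialized) == authored)  # True: decoding recovers the input
```

An author who wrote the first form and then reads the path back from an API would see the second form, which is the surprise the existing note tries to pre-empt.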

Comment 2 Sam Ruby 2014-11-05 21:29:28 UTC

Does http://intertwingly.net/projects/pegurl/url.html help?

Comment 3 Anne 2015-08-19 09:09:48 UTC

Railroad diagrams are now tracked at https://github.com/whatwg/url/issues/67 but seem potentially problematic.

I added a note to clarify what percent-encoded bytes are good for:

https://github.com/whatwg/url/commit/fef9bcec9615d92695503107732a9cd5f9d05ab8

I didn't go into as much detail as requested, as I don't think that's warranted here. We just want to describe the data model, syntax, and the parser, and various operations around that. And on top of that folks can build whatever they want.