This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21897 - Definition of 'valid URLs' points to The URL standard, which lacks requirement to escape spaces
Status: CLOSED WORKSFORME
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC Windows 3.1
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/html/wg/drafts/html...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-02 01:18 UTC by Leif Halvard Silli
Modified: 2013-05-02 11:26 UTC
CC List: 6 users

Description Leif Halvard Silli 2013-05-02 01:18:44 UTC
ISSUE: 

1) The Nu validator reports an error if a string that is
   supposed to be a URL contains an unescaped space character.
   (That such characters must be percent-encoded in valid URLs is common knowledge.)

2) HTML5 says "A URL is a valid URL if it conforms to the
   authoring conformance requirements in the URL standard."
   http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#valid-url

3) HOWEVER, the URL standard has no requirement that
   spaces be written percent-encoded.

NOTE:

   This probably applies to other characters that need escaping, too.

PROPOSAL: 

* If the URL standard's editor plans to add this requirement, then clarify that it is currently not defined by the URL standard.

* If the URL standard's editor has no such plans, then define the requirement in HTML 5.1. (I believe HTML5 CR does not have this issue.)
Comment 1 Anne 2013-05-02 09:24:23 UTC
U+0020 is not a URL unit (unless you write it %20).
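
For illustration only (not part of the original comment): a minimal Python sketch of writing U+0020 as %20. It uses the standard library's `urllib.parse.quote`; the helper name `percent_encode_path` and the sample path are assumptions made up for this example, and `quote` follows RFC 3986 rather than the URL standard, so its exact set of escaped characters may differ.

```python
from urllib.parse import quote

def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving "/" separators untouched.

    Hypothetical helper for illustration; it is not defined by HTML or
    by the URL standard.
    """
    return quote(path, safe="/")

# A space (U+0020) is written as %20; the slash is kept as-is.
print(percent_encode_path("docs/my file.html"))
```

The helper is meant for a path only. Passing a full URL would also escape the ":" after the scheme, which is usually not what is wanted.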
Comment 2 Leif Halvard Silli 2013-05-02 11:26:08 UTC
(In reply to comment #1)
> U+0020 is not a URL unit (unless you write it %20).

My bad ... for not seeing that the 'URL code points' paragraph does not list the space character.

I think the section ought to explicitly mention, perhaps in a note, that U+0009, U+000A, and U+000D are to be escaped. Currently, the only way to work this out is to notice that space is not listed in the code points list. But that would be a bug against the URL standard.
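
For illustration only (not part of the original comment): a rough Python sketch of the reasoning above, namely that a character needs escaping because it is absent from the code points list. The ASCII list below is an assumption reproduced from memory of the URL standard's "URL code points" definition and should be checked against the standard. The name `find_unescaped` is hypothetical, non-ASCII characters are skipped entirely, and the check is meant for a single path, query, or fragment component rather than a whole URL.

```python
import re

# Assumed ASCII subset of the URL standard's "URL code points".
ASCII_URL_CODE_POINTS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!$&'()*+,-./:;=?@_~"
)

# "%" followed by exactly two hexadecimal digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def find_unescaped(text):
    """Return (index, "U+XXXX") pairs for ASCII characters that are
    neither in the assumed list nor part of a percent-escape."""
    problems = []
    i = 0
    while i < len(text):
        if PERCENT_ESCAPE.match(text, i):
            i += 3  # skip the whole escape, e.g. "%20"
            continue
        ch = text[i]
        if ord(ch) < 0x80 and ch not in ASCII_URL_CODE_POINTS:
            problems.append((i, f"U+{ord(ch):04X}"))
        i += 1
    return problems

print(find_unescaped("a b"))       # the space at index 1 is reported
print(find_unescaped("a%20b"))     # nothing is reported
print(find_unescaped("a\tb\r\n"))  # tab, CR, and LF are reported
```

Under this assumed list, U+0009, U+000A, U+000D, and U+0020 are all flagged for the same reason: none of them appears among the listed code points.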