Re: "Web addresses in HTML 5" for review (ISSUE-56 urls-webarch)

Hi Dan,

I'm a bit confused by:
> The parsing process described here should be more closely aligned with 
> the rules given in RFC 3987.
> 
>    1. Strip leading and trailing space characters <#space-character> from w.
> 
>    2. Percent-encode all non-URI characters in w.
> 
>       This probably needs to be laid out in more detail.
> 
>       Note: this step will replace all of the following characters with
>       a percent-encoded equivalent:
> 
>           * all characters with codepoints less than or equal to U+0020
>             (i.e. the C0 control characters)
>           * all characters with codepoints greater than or equal to
>             U+007F (i.e. U+007F and all non-ASCII characters in w)
>           * U+0022 double quotation mark
>           * U+0025 percent sign
>           * U+003C less-than sign
>           * U+003E greater-than sign
>           * U+005C reverse solidus (backslash)
>           * U+005E circumflex accent
>           * U+0060 grave accent
>           * U+007B left curly bracket
>           * U+007C vertical line
>           * U+007D right curly bracket
> 
>       As a result of percent-encoding the percent sign, any occurrences
>       of percent-encoding in the Web address will be double-encoded at
>       this step.

Why would you want that?

It seems to mean that if w includes "%20" (a properly escaped space 
character), it will be encoded into "%2520".
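
To make this concrete, here is a rough sketch of step 2 as I read it from
the quoted list. The function name percent_encode_step is my own invention,
and the use of UTF-8 for non-ASCII characters is an assumption; the draft
text quoted above does not say which encoding applies.

```python
# Hypothetical sketch of the quoted step 2; not taken from the draft itself.
# Characters the quoted note lists as always being percent-encoded.
ALWAYS_ENCODED = set('"%<>\\^`{|}')


def percent_encode_step(w: str) -> str:
    """Percent-encode every character the quoted note says gets replaced."""
    out = []
    for ch in w:
        cp = ord(ch)
        if cp <= 0x20 or cp >= 0x7F or ch in ALWAYS_ENCODED:
            # Assumption: characters are encoded as UTF-8 octets.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


# An already-escaped space has its percent sign escaped again.
print(percent_encode_step("http://example.org/a%20b"))  # http://example.org/a%2520b
# A raw space is escaped once, as expected.
print(percent_encode_step("http://example.org/a b"))    # http://example.org/a%20b
```

So under this reading, an address that was already correctly escaped no
longer identifies the same resource after the step has run.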

BR, Julian

Received on Monday, 23 March 2009 15:17:26 UTC