This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5802 - No mention of %uXXXX escapes
Summary: No mention of %uXXXX escapes
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
Keywords: NoReply
Depends on:
Reported: 2008-06-25 12:57 UTC by Henri Sivonen
Modified: 2010-10-04 13:55 UTC
CC List: 6 users

See Also:


Description Henri Sivonen 2008-06-25 12:57:11 UTC
There appears to be a UTF-16-based URI query string escaping mechanism in use on some Chinese sites. The mechanism escapes a UTF-16 code unit as %uXXXX.

The spec should probably mention these in some way. I don't know what it should say.
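A minimal Python sketch of the escaping described above. It assumes that ASCII characters are left as-is and that every other character is written as one %uXXXX escape per UTF-16 code unit, so a character outside the BMP becomes two escapes (a surrogate pair). The function names are hypothetical, and the bug does not say exactly which characters these sites escape.

```python
import re

def u_escape(text: str) -> str:
    """Hypothetical encoder: one %uXXXX escape per UTF-16 code unit of non-ASCII text."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # assumption: ASCII is passed through unescaped
            continue
        # Encode as UTF-16BE so astral characters become surrogate pairs.
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            out.append(f"%u{unit:04X}")
    return "".join(out)

def u_unescape(text: str) -> str:
    """Hypothetical decoder: turn %uXXXX back into code units, then rejoin surrogate pairs."""
    units = re.sub(r"%u([0-9A-Fa-f]{4})",
                   lambda m: chr(int(m.group(1), 16)), text)
    # Lone surrogates survive "surrogatepass"; the UTF-16 decoder pairs them up again.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")

print(u_escape("中文"))  # %u4E2D%u6587
```

Because the escape works on code units rather than code points, decoding has to go back through UTF-16 to reassemble characters such as U+1F600, which is written as %uD83D%uDE00.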
Comment 1 Julian Reschke 2008-06-25 13:06:20 UTC
...used by sites?

How is this relevant, unless UAs support it?

Otherwise it's just a convention how certain sites encode characters into URIs, and certainly HTML5 does not need to talk about that.
Comment 2 Philip Taylor 2008-06-25 13:11:33 UTC
I'm not sure what HTML5 defines already, but (if I understand correctly) it would need to define that "%u1234" is transmitted to the server as "%u1234" and not as "%25u1234".

E.g. <> used to have a link to <>, which doesn't work if the %s are encoded as %25.
Comment 3 Julian Reschke 2008-06-25 13:24:11 UTC
(In reply to comment #2)
> I'm not sure what HTML5 defines already, but (if I understand correctly) it
> would need to define that "%u1234" is transmitted to the server as "%u1234" and
> not as "%25u1234".
> ...

But the former would be a broken URL. On the other hand, the latter, when percent-unescaped, would yield the input string.

So I guess I need more information...
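The disagreement in comments 2 and 3 can be reproduced with Python's urllib.parse. In this sketch, quote() stands in for a strict percent-encoder and hypothetical_server_decode() stands in for a server that looks for %uXXXX in the raw query string; that server function is an assumption for illustration, not any particular site's code.

```python
import re
from urllib.parse import quote, unquote

def hypothetical_server_decode(raw_query: str) -> str:
    """Hypothetical server: expands %uXXXX escapes found in the raw, undecoded query."""
    return re.sub(r"%u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), raw_query)

original = "%u1234"
strict = quote(original, safe="")  # the lone "%" is itself escaped

print(strict)                                            # %25u1234
print(unquote(strict) == original)                       # True: unescaping restores the input
print(hypothetical_server_decode(original) == "\u1234")  # True: the escape is recognised
print(hypothetical_server_decode(strict))                # %25u1234: the escape is no longer seen
```

Both comments are right in their own terms: generic percent-unescaping of "%25u1234" gives back the input, but a server that parses %uXXXX from the raw query only works if the "%" reaches it unescaped.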
Comment 4 Ian 'Hixie' Hickson 2008-06-25 19:50:20 UTC
I'll need to do some research and see how many pages contain %uXXXX where X = [0-9A-Fa-f].

Is there any documentation on this feature? Are there any tests? It would be interesting to see how browsers react to this. If it's only done by one browser, I'd rather not go there.
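The pattern from comment 4 can be written as a regular expression. This sketch only counts page sources containing at least one match; the sample pages are made up, not survey data, and matching only a lowercase "u" follows the pattern as written in the comment.

```python
import re
from typing import Iterable

# "%u" followed by exactly four hex digits, as in the pattern from comment 4.
U_ESCAPE = re.compile(r"%u[0-9A-Fa-f]{4}")

def count_pages_with_u_escapes(pages: Iterable[str]) -> int:
    """Return how many page sources contain at least one %uXXXX sequence."""
    return sum(1 for source in pages if U_ESCAPE.search(source))

# Hypothetical sample pages.
sample_pages = [
    '<a href="?q=%u4E2D%u6587">',  # UTF-16 style escapes: counted
    '<a href="?q=%E4%B8%AD">',     # ordinary UTF-8 percent-encoding: not counted
    '<p>100%useful</p>',           # "%u" here is not followed by four hex digits
]
print(count_pages_with_u_escapes(sample_pages))  # 1
```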
Comment 5 Philip Taylor 2008-06-26 14:07:31 UTC
As far as I can see, no browser handles %uXXXX in a special way - the real issue is that

  <a href="?x%25y%z w">

on resolves as "" (in IE6, FF3, O9.5, S3), i.e. it does not percent-encode the lone "%".

If I understand HTML5 correctly, it currently says that should become "" instead (since the lone "%" does not match the <query> production).

That means that HTML5 says <a href="?%u1234"> resolves to "...?%25u1234", which breaks servers that expect the UA to resolve it to "...?%u1234" instead.
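A minimal sketch of the two behaviours compared in comment 5, considering only the "%" character; spaces and other characters that a real URL resolver would also have to handle are ignored here. escape_lone_percent() is a hypothetical name for the strict reading, in which any "%" not followed by two hex digits is re-escaped as "%25". The browsers tested leave the query unchanged instead.

```python
import re

# A "%" that does not begin a valid %XX escape.
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def escape_lone_percent(query: str) -> str:
    """Strict reading (hypothetical helper): re-escape every lone '%' as '%25'."""
    return LONE_PERCENT.sub("%25", query)

# Existing "%XX" escapes are kept; only lone "%" signs are rewritten.
print(escape_lone_percent("?%u1234"))   # ?%25u1234
print(escape_lone_percent("?x%25y%z"))  # ?x%25y%25z
```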
Comment 6 Ian 'Hixie' Hickson 2008-06-30 23:52:33 UTC
Yeah. I've made the % not be escaped, since no browsers escape it today and it just seems like an unnecessarily risky change from the status quo. It's still non-conforming.
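The resolution in comment 6 separates two questions: what a UA sends, and whether the markup is conforming. A sketch of that split, using a hypothetical check_href() that returns the href unchanged for resolution alongside a list of conformance errors, one per lone "%". The function name and error wording are made up for illustration.

```python
import re
from typing import List, Tuple

# A "%" that does not begin a valid %XX escape.
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def check_href(href: str) -> Tuple[str, List[str]]:
    """Hypothetical: leave the href as-is for resolution, but report lone '%' signs."""
    errors = [
        f"offset {m.start()}: '%' is not followed by two hex digits"
        for m in LONE_PERCENT.finditer(href)
    ]
    return href, errors  # the URL itself is not rewritten

resolved, errors = check_href("?%u1234")
print(resolved)  # ?%u1234
print(errors)    # ["offset 1: '%' is not followed by two hex digits"]
```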

Comment 7 Maciej Stachowiak 2010-03-14 13:14:29 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.