This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
There appears to be a UTF-16-based URI query string escaping mechanism in use on some Chinese sites. The mechanism escapes a UTF-16 code unit as %uXXXX. The spec should probably mention this in some way. I don't know what it should say.
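To illustrate the scheme described above, here is a minimal Python sketch of an encoder that writes each non-ASCII UTF-16 code unit as %uXXXX. The function name `u_escape` is hypothetical. Passing ASCII through unchanged is a simplifying assumption: JavaScript's legacy `escape()` function, which produces this format, also escapes many ASCII characters.

```python
def u_escape(text: str) -> str:
    """Escape each non-ASCII UTF-16 code unit of `text` as %uXXXX.

    Hypothetical helper for illustration only; ASCII characters are
    passed through unchanged (a simplification).
    """
    data = text.encode("utf-16-be")  # two bytes per UTF-16 code unit
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        if unit < 0x80:
            out.append(chr(unit))
        else:
            # Characters outside the BMP become two escapes (a surrogate pair).
            out.append(f"%u{unit:04x}")
    return "".join(out)


print(u_escape("しおや"))  # %u3057%u304a%u3084
```

The output matches the first three escapes in the maps.live.com query string quoted later in this bug.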
...used by sites? How is this relevant, unless UAs support it? Otherwise it's just a convention how certain sites encode characters into URIs, and certainly HTML5 does not need to talk about that.
I'm not sure what HTML5 defines already, but (if I understand correctly) it would need to define that "%u1234" is transmitted to the server as "%u1234" and not as "%25u1234". E.g. <http://10.pro.tok2.com/%7ehonetsugi408/> used to have a link to <http://maps.live.com/default.aspx?v=2&ss=yp.%u3057%u304a%u3084%u6574%u9aa8%u9662~yp.%u3057%u304a%u3084%u63a5%u9aa8%u30fb%u6574%u9aa8%u9662&cp=35.790005~139.439234&style=r&lvl=18&tilt=-90&dir=0&alt=-1000>, which doesn't work if the %s are encoded as %25.
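To see why the %25 form breaks such a server, here is a rough sketch of the decoding it presumably performs. It assumes JavaScript `unescape()`-style semantics (each %uXXXX or %XX names a single code unit). The name `u_unescape` is hypothetical and not taken from any real server.

```python
import re

# Matches either a %uXXXX escape (group 1) or a plain %XX escape (group 2).
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def u_unescape(query: str) -> str:
    """Decode %uXXXX and %XX escapes in one left-to-right pass.

    Hypothetical server-side helper: each escape becomes the code unit it
    names, then surrogate pairs (%uD8xx%uDCxx) are joined into one character.
    """
    def to_code_unit(m):
        return chr(int(m.group(1) or m.group(2), 16))

    units = _ESCAPE.sub(to_code_unit, query)
    # Round-trip through UTF-16 to combine surrogate pairs;
    # an unpaired surrogate becomes U+FFFD.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")


print(u_unescape("ss=yp.%u3057"))    # ss=yp.し
print(u_unescape("ss=yp.%25u3057"))  # ss=yp.%u3057 (literal text, not the character)
```

With the %25 form, the decoder only recovers the literal six characters "%u3057", so the search term the page author intended never reaches the application.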
(In reply to comment #2)
> I'm not sure what HTML5 defines already, but (if I understand correctly) it
> would need to define that "%u1234" is transmitted to the server as "%u1234" and
> not as "%25u1234".
> ...

But the former would be a broken URL. On the other hand, the latter, when percent-unescaped, would yield the input string. So I guess I need more information...
I'll need to do some research and see how many pages contain %uXXXX where X = [0-9A-Fa-f]. Is there any documentation on this feature? Are there any tests? It would be interesting to see how browsers react to this. If it's only done by one browser, I'd rather not go there.
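A crude way to run that survey over a corpus of fetched pages might look like the sketch below. The function name `count_pages_with_u_escapes` is hypothetical, and the pattern assumes exactly four hex digits after "%u". It matches anywhere in the page source (including script text), so it overcounts actual URLs.

```python
import re
from typing import Iterable

# "%u" followed by exactly four hex digits, as in the %uXXXX scheme.
U_ESCAPE = re.compile(r"%u[0-9A-Fa-f]{4}")


def count_pages_with_u_escapes(pages: Iterable[str]) -> int:
    """Return how many page sources contain at least one %uXXXX sequence."""
    return sum(1 for html in pages if U_ESCAPE.search(html))


sample = [
    '<a href="?q=%u3057">',   # matches
    '<a href="?q=%25u3057">', # already escaped as %25; no "%u" substring
    "<p>plain text</p>",
]
print(count_pages_with_u_escapes(sample))  # 1
```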
As far as I can see, no browser handles %uXXXX in a special way - the real issue is that <a href="?x%25y%z w"> on http://example.com/ resolves as "http://example.com/?x%25y%z%20w" (in IE6, FF3, O9.5, S3), i.e. it does not percent-encode the lone "%". If I understand HTML5 correctly, it currently says that should become "http://example.com/?x%25y%25z%20w" instead (since the lone "%" does not match the <query> production). That means that HTML5 says <a href="?%u1234"> resolves to "...?%25u1234", which breaks servers that expect the UA to resolve it to "...?%u1234" instead.
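The difference between the two behaviours can be sketched as a query encoder with a switch for lone "%" characters. This is a simplified model, not any browser's actual algorithm. The allowed-character set is an assumption loosely based on the RFC 3986 <query> production, other characters are encoded as UTF-8 (real browsers may use the document encoding), and `encode_query` and `escape_lone_percent` are hypothetical names.

```python
import re

# Characters copied verbatim (roughly RFC 3986 unreserved, sub-delims, ":@/?").
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~!$&'()*+,;=:@/?"
)
# A valid pct-encoded triplet: "%" HEXDIG HEXDIG.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_query(query: str, escape_lone_percent: bool) -> str:
    out = []
    for i, ch in enumerate(query):
        if ch == "%":
            # A "%" not followed by two hex digits does not match <query>.
            lone = _PCT_TRIPLET.match(query, i) is None
            out.append("%25" if lone and escape_lone_percent else "%")
        elif ch in _ALLOWED:
            out.append(ch)
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


print(encode_query("x%25y%z w", escape_lone_percent=False))  # x%25y%z%20w
print(encode_query("x%25y%z w", escape_lone_percent=True))   # x%25y%25z%20w
print(encode_query("%u1234", escape_lone_percent=True))      # %25u1234
```

With `escape_lone_percent=False` the output matches what IE6, FF3, O9.5 and S3 are reported to produce above, and "%u1234" reaches the server unchanged.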
Yeah. I've made the % not be escaped, since no browsers escape it today and it just seems like an unnecessarily risky change from the status quo. It's still non-conforming. r1835
This bug predates the HTML Working Group Decision Policy. If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.