This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This was cloned from bug 16972 as part of operation convergence.
Originally filed: 2012-05-07 17:45:00 +0000
Original reporter: Addison Phillips <addison@lab126.com>

================================================================================
 #0   Addison Phillips   2012-05-07 17:45:48 +0000
--------------------------------------------------------------------------------
2.6.3 Resolving URLs
http://www.w3.org/TR/html5/urls.html#resolving-urls

Step 8.1 replaces characters that cannot be encoded into the target encoding with the question mark character (0x3F). Should this be, instead, the replacement character for the target encoding? For example, UTF-8 would use U+FFFD. Some encodings use _.

================================================================================
 #1   Ian 'Hixie' Hickson   2012-05-10 17:58:18 +0000
--------------------------------------------------------------------------------
Please provide test cases demonstrating the proposed behaviour is compatible with legacy implementations.
================================================================================
I verified that FF, Chrome, IE8, and Opera 11 all show U+FFFD, or a tofu box that evaluates to U+FFFD and not 0x3F, for the UTF-8 test, and all but FF show the same behavior for the SJIS test I created (FF shows random junk instead). Please see:
http://www.inter-locale.com/test/html5test/17861-t2.html (UTF-8)
http://www.inter-locale.com/test/html5test/17861-t1.html (SJIS)
Anne, is this an issue for your URL spec?
Yes, it would be. However, those test cases are not testing URLs. They are testing bytes -> unicode for the HTML parser, whereas the URL requirement is about unicode -> bytes for the query component (and only the query component). And I'm pretty sure browsers all use 0x3F there for non-utf-8 encodings.
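To make the two directions concrete, here is a minimal sketch in Python. It is only an illustration of the distinction drawn above, not a test of any browser: the function names decode_for_parser and encode_for_query are hypothetical, and Python's built-in "replace" error handler is used as a stand-in for the substitution behaviour being discussed. When decoding (bytes -> unicode) it substitutes U+FFFD; when encoding (unicode -> bytes) it substitutes 0x3F.

```python
def decode_for_parser(raw: bytes, encoding: str) -> str:
    """bytes -> unicode, roughly what the linked test pages exercise.

    Undecodable bytes become U+FFFD (Python's "replace" handler).
    """
    return raw.decode(encoding, errors="replace")


def encode_for_query(text: str, encoding: str) -> bytes:
    """unicode -> bytes, the direction the URL query requirement is about.

    Characters the target encoding cannot represent become b"?" (0x3F)
    under Python's "replace" handler. This is an assumption-level stand-in,
    not a claim about what any particular browser emits.
    """
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    # 0xFF is never valid in UTF-8, so decoding substitutes U+FFFD.
    decoded = decode_for_parser(b"a\xffb", "utf-8")
    print([hex(ord(ch)) for ch in decoded])

    # U+20AC (euro sign) is not representable in Shift_JIS,
    # so encoding substitutes 0x3F.
    print(encode_for_query("a\u20acb", "shift_jis"))

    # UTF-8 can encode every Unicode scalar value, so nothing is replaced.
    print(encode_for_query("a\u20acb", "utf-8"))
```

U+FFFD can only show up on the decoding side; an encoder targeting a legacy encoding has no U+FFFD to emit, which is why the two test pages above do not settle the question for the query component.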
Ok. Marking INVALID; please reopen if Anne and I misunderstand the issue here.