URI (%C3%A5) versus IRI (å)

Document history: Version 1 - initial version. Version 2 - fixed an error. Version 3 - fixed an incorrect recording of the results for Safari in Table 5. Version 4 - added a 4th set of tests, fixed some typos, and improved on the data correction from version 3. Version 5 - added a 5th and 6th set of (ASCII) tests.
Document purpose: Exposing how browsers differ in which idref an IRI, versus the corresponding URI, matches.
Key question: How do browsers treat an idref that, character by character (minus the hash character), matches the characters of a "percent-encoded representation" of that idref?
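To make the relationship between the two spellings concrete, here is a minimal sketch using Python's standard `urllib.parse` module. It only illustrates the encoding itself, not what any browser does; the variable names are chosen for illustration.

```python
from urllib.parse import quote, unquote

# Percent-encode the UTF-8 bytes of "å" (U+00E5): this is the URI form.
encoded = quote("å")

# Decoding the URI form gives the IRI form back.
decoded = unquote(encoded)

# Encoding the percent signs themselves gives the form used in these tests
# to target an id whose literal characters are "%C3%A5".
explicit = quote(encoded)

# Decoding that once yields the literal string "%C3%A5", not "å".
literal = unquote(explicit)

print(encoded, decoded, explicit, literal)
```

The point of the tests is that an element may carry either `decoded` or `encoded` as its literal id, and browsers disagree on which one a given fragment selects.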

Table of contents

Set #1, set #2, set #3, set #4, set #5, set #6.

Results summary:

The Blink behavior indicates that the browser, for percent-encoded URLs, tries a literal match before it tries the 'semantic' UTF-8 based match.
The WebKit behavior indicates that the browser tries a literal match even for directly typed URLs (when the code point is higher than U+009F and the URL is therefore percent-encoded, as required by the URL spec).
The IE11 and Firefox behavior indicates that they do not attempt any literal match before they try the 'semantic' UTF-8 based match. (Legacy IE, e.g. IE8, showed more similarity to WebKit and Blink.)
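The matching orders described above can be sketched as a small model. This is an illustration inferred from the results recorded in this document, not the browsers' actual code; the names `find_target` and `encode_non_ascii` are hypothetical, and the IE11 literal fallback reflects the behavior reported in set #4 (ignoring the double-click quirk).

```python
from typing import Iterable, Optional
from urllib.parse import quote, unquote


def encode_non_ascii(fragment: str) -> str:
    """Percent-encode code points above U+009F; leave everything else as-is."""
    return "".join(quote(ch) if ord(ch) > 0x9F else ch for ch in fragment)


def find_target(fragment: str, ids: Iterable[str], engine: str) -> Optional[str]:
    """Return the id a modeled engine would target, or None.

    `fragment` is the URL fragment without the leading '#'.
    """
    ids = set(ids)
    decoded = unquote(fragment)
    if engine == "firefox":
        # Assumption: only the UTF-8 decoded ("semantic") match is tried.
        candidates = [decoded]
    elif engine == "ie11":
        # Assumption: decoded match first, literal match as a fallback.
        candidates = [decoded, fragment]
    elif engine == "blink":
        # Assumption: literal match first, decoded match second.
        candidates = [fragment, decoded]
    elif engine == "webkit":
        # Assumption: directly typed non-ASCII is percent-encoded first,
        # then matched literally, then decoded.
        encoded = encode_non_ascii(fragment)
        candidates = [encoded, unquote(encoded)]
    else:
        raise ValueError(f"unknown engine: {engine}")
    for candidate in candidates:
        if candidate in ids:
            return candidate
    return None


if __name__ == "__main__":
    for engine in ("firefox", "ie11", "blink", "webkit"):
        print(engine, find_target("%C3%A5", {"å", "%C3%A5"}, engine))
```

With both ids present, the modeled Firefox and IE11 select `å` for the fragment `%C3%A5`, while the modeled Blink and WebKit select the literal `%C3%A5`. For a directly typed `å`, only the modeled WebKit selects `%C3%A5`.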

#1: Testing id="å" and id="%C3%A5".

Table 1: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: id=å | Actually targeted: id=%C3%A5 |
| --- | --- | --- | --- |
| IRI as URI: #%C3%A5 | id=å | Ffox, IE11 | Safari, Chrome |
| Explicitly targeting %C3%A5: #%25C3%25A5 | id=%C3%A5 | | Ffox, IE11, Safari, Chrome |
| IRI as named character entity: #&aring | id=å | Ffox, IE11, Chrome | Safari |
| IRI as decimal character reference: #&#229 | id=å | Ffox, IE11, Chrome | Safari |
| IRI as hexadecimal character reference: #&#xe5 | id=å | Ffox, IE11, Chrome | Safari |
| IRI as directly typed character: #å | id=å | Ffox, IE11, Chrome | Safari |
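The entity and character-reference rows all name the same character, which the HTML parser resolves before the link is followed. A minimal sketch with Python's standard `html.unescape` shows the equivalence; the references are written here with a trailing semicolon, and the list name is chosen for illustration.

```python
from html import unescape

# Named, decimal, and hexadecimal references to U+00E5 (å).
references = ["#&aring;", "#&#229;", "#&#xe5;"]

# After parsing, every variant is the same fragment as the directly typed one.
resolved = [unescape(ref) for ref in references]

print(resolved)
```

This is why the four non-percent-encoded rows share one result per browser: by the time the fragment is matched, they are indistinguishable.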

Table 2: Idref targets for test urls in table 1.
Target 1: id=å
Target 2: id=%C3%A5

#2: Testing only id="æ" (no test of id="%C3%A6")

This test omits the percent-encoded idref (id="%C3%A6"), in order to demonstrate that the problem does not manifest itself when a matching percent-encoded idref is not present.

Table 3: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: id=æ |
| --- | --- | --- |
| IRI as URI: #%C3%A6 | id=æ | Ffox, IE11, Chrome, Safari |
| IRI as directly typed character: #æ | id=æ | Ffox, IE11, Chrome, Safari |

Table 4: Idref targets for test urls in table 3.
Target: id=æ

#3: Testing id="a" and id="%61".

This test compares a pure ASCII idref with its "corresponding" percent-encoded variant. Interestingly, for a directly typed id='a' (as well as for the named and numeric entities/references), Safari behaves like Chrome in this test. However, since the URL only becomes percent-encoded for code points higher than U+009F, Safari's "improvement" in this test (compared to table 1) is simply as expected.

Table 5: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: id=a | Actually targeted: id=%61 |
| --- | --- | --- | --- |
| IRI as URI: #%61 | id=a | Ffox, IE11 | Safari, Chrome |
| Explicitly targeting %61: #%2561 | id=%61 | | Ffox, IE11, Safari, Chrome |
| IRI as decimal character reference: #&#97 | id=a | Ffox, IE11, Chrome, Safari | |
| IRI as hexadecimal character reference: #&#x61 | id=a | Ffox, IE11, Chrome, Safari | |
| IRI as directly typed character: #a | id=a | Ffox, IE11, Chrome, Safari | |

Table 6: Idref targets for test urls in table 5.
Target 1: id=a
Target 2: id=%61

#4: Testing only id="%C3%B8" (no test of id="ø")

This test omits a directly typed idref (id="ø"), in order to test whether the issue manifests itself when a matching directly typed idref is not present. Unfortunately, in addition to the usual suspects (WebKit and Blink), the problem then also manifests itself in IE11 (but not in Firefox!). However, to experience the problem in IE11, one must first click once, then pause for a second, before clicking the link a second time. This test probably shows that the main improvement IE11 has over Blink and WebKit is that it tries to match the UTF-8 decoded percent-encoding before it tries a literal match of the percent-encoded string.
Table 7: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: none | Actually targeted: id="%C3%B8" |
| --- | --- | --- | --- |
| IRI as URI: #%C3%B8 | none | Ffox | IE11, Chrome, Safari |
| IRI as directly typed character: #ø | none | Ffox, IE11, Chrome | Safari |

Table 8: Idref targets for test urls in table 7.
Target: id=%C3%B8

#5: Testing id="h" (but not for id="%68").

This test checks whether the results differ from table 5 when no percent-encoded idref (id="%68") is present.

Table 9: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: id=h |
| --- | --- | --- |
| IRI as URI: #%68 | id=h | Ffox, IE11, Safari, Chrome |
| IRI as directly typed character: #h | id=h | Ffox, IE11, Chrome, Safari |

Table 10: Idref targets for test urls in table 9.
Target: id=h

#6: Testing id="%78" (but not for id="x").

This test checks whether the results differ from table 5 when only the percent-encoded idref (id="%78") is present. IE11 shows the same behavior as in test set 4.

Table 11: Test URLs
| URL test | Expected idref to be targeted | Actually targeted: none | Actually targeted: id=%78 |
| --- | --- | --- | --- |
| IRI as URI: #%78 | none | Ffox | IE11, Safari, Chrome |
| IRI as directly typed character: #x | none | Ffox, IE11, Chrome, Safari | |

Table 12: Idref targets for test urls in table 11.
Target: id=%78