URI (%C3%A5) versus IRI (å)
Document history: Version 1 - initial version. Version 2 - fixed an error. Version 3 - fixed an incorrect recording of the results for Safari in Table 5. Version 4 - added a 4th set of tests, fixed some typos, and improved on the data correction made in version 3. Version 5 - added a 5th and 6th set of (ASCII) tests.
Document purpose: To expose how browsers vary in which idref a fragment matches when it is written as an IRI versus as a URI.
Key question: How do browsers treat an idref that, character by character (minus the hash character), matches the characters in a "percent-encoded representation" of that idref?
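To make the three spellings used throughout the tests concrete, here is a minimal sketch of how they relate. It uses Python's `urllib.parse` purely as an illustration of UTF-8 percent-encoding; the variable names are hypothetical and no browser is claimed to work this way internally.

```python
from urllib.parse import quote, unquote

iri_fragment = "å"                    # the idref as an IRI carries it
uri_fragment = quote(iri_fragment)    # UTF-8 bytes C3 A5, percent-encoded
literal_target = quote(uri_fragment)  # escapes the "%" itself as %25

print(uri_fragment)           # %C3%A5
print(unquote(uri_fragment))  # å
print(literal_target)         # %25C3%25A5
```

The last value is the fragment that the tables use to target an element whose id is literally the string %C3%A5.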
Table of contents
Set #1, set #2, set #3, set #4, set #5, set #6.
Results summary:
The Blink behavior indicates that the browser, for percent-encoded URLs, tries a literal match before it tries the 'semantic' UTF-8 based match.
The WebKit behavior indicates that the browser tries a literal match even for directly typed URLs (when the code point is higher than U+009F, as required by the URL spec).
The IE11 and Firefox behavior indicates that they do not attempt any literal match before they try the 'semantic' UTF-8 based match. (Legacy IE, e.g. IE8, showed more similarity to WebKit and Blink.)
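The lookup orders described in this summary can be modelled with a small sketch. This is a hypothetical model built only from the observed results, not the browsers' actual code; the function `resolve` and the strategy names are assumptions introduced for illustration.

```python
from urllib.parse import unquote

def resolve(fragment, ids, strategy):
    """Return the id a fragment would target under a hypothetical strategy."""
    decoded = unquote(fragment)  # the 'semantic' UTF-8 reading of the fragment
    if strategy == "literal_first":    # modelled on the Blink/WebKit results
        candidates = [fragment, decoded]
    elif strategy == "decoded_first":  # modelled on the IE11 results
        candidates = [decoded, fragment]
    elif strategy == "decoded_only":   # modelled on the Firefox results
        candidates = [decoded]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for candidate in candidates:
        if candidate in ids:
            return candidate
    return None

set1_ids = {"å", "%C3%A5"}  # the two targets of test set #1
set4_ids = {"%C3%B8"}       # the single target of test set #4

print(resolve("%C3%A5", set1_ids, "literal_first"))  # %C3%A5
print(resolve("%C3%A5", set1_ids, "decoded_first"))  # å
print(resolve("%C3%B8", set4_ids, "decoded_first"))  # %C3%B8
print(resolve("%C3%B8", set4_ids, "decoded_only"))   # None
```

Under this model the two orders that try the decoded form first only diverge when no decoded match exists, which is the situation test set #4 constructs.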
#1: Testing id="å" and id="%C3%A5".
Table 1: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: id=å | Actually targeted: id=%C3%A5 |
| IRI as URI: | #%C3%A5 | id=å | Ffox, IE11 | Safari, Chrome |
| Explicitly targeting %C3%A5 | #%25C3%25A5 | id=%C3%A5 | | Ffox, IE11, Safari, Chrome |
| IRI as named character entity: | #å | id=å | Ffox, IE11, Chrome | Safari |
| IRI as decimal character reference: | #å | id=å | Ffox, IE11, Chrome | Safari |
| IRI as hexadecimal character reference: | #å | id=å | Ffox, IE11, Chrome | Safari |
| IRI as directly typed character: | #å | id=å | Ffox, IE11, Chrome | Safari |
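The named, decimal and hexadecimal character reference rows give the same results as the directly typed row. A plausible reading is that the HTML parser resolves all three forms to the same character before the URL is handled at all. The sketch below illustrates this with Python's `html.unescape` as a stand-in for the parser; the reference spellings are the standard ones for å and are assumed here, since the page's own markup is not shown.

```python
from html import unescape

# Assumed spellings: named entity, decimal reference, hexadecimal reference.
references = ["&aring;", "&#229;", "&#xE5;"]
decoded = [unescape(reference) for reference in references]

print(decoded)  # ['å', 'å', 'å']
```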
Table 2: Idref targets for test URLs in table 1.
| Target 1: | Target 2: |
| id=å | id=%C3%A5 |
Back to table 1
#2: Testing only id="æ" (no test of id="%C3%A6")
This test omits the percent-encoded idref (id="%C3%A6"), in order to demonstrate that the problem does not manifest itself when a matching percent-encoded idref is not present.
Table 3: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: id=æ |
| IRI as URI: | #%C3%A6 | id=æ | Ffox, IE11, Chrome, Safari |
| IRI as directly typed character: | #æ | id=æ | Ffox, IE11, Chrome, Safari |
Table 4: Idref targets for test URLs in table 3.
| Target |
| id=æ |
Back to table 3
#3: Testing id="a" and id="%61".
This test compares a pure ASCII idref with its "corresponding" percent-encoded variant. Interestingly, for the directly typed id="a" (as well as for the named and numeric entities/references), Safari behaves like Chrome in this test. However, given that a URL only becomes percent-encoded for code points higher than U+009F, Safari's "improvement" in this test (compared to table 1) is simply as expected.
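Why a directly typed "a" and a directly typed "å" can end up on different targets in a literal-matching browser can be sketched as follows. The helpers `serialize_fragment` and `literal_lookup` are hypothetical names for this illustration. Python's `quote` escapes every non-ASCII character (and some ASCII ones), so its threshold is not the U+009F boundary mentioned above; the sketch only shows the contrast between the two characters.

```python
from urllib.parse import quote

def serialize_fragment(typed):
    # Percent-encode non-ASCII characters; leave existing "%" escapes alone.
    return quote(typed, safe="%")

def literal_lookup(typed, ids):
    # Compare the serialized fragment with the ids character by character.
    serialized = serialize_fragment(typed)
    return serialized if serialized in ids else None

print(serialize_fragment("a"))                 # a
print(serialize_fragment("å"))                 # %C3%A5
print(literal_lookup("a", {"a", "%61"}))       # a
print(literal_lookup("å", {"å", "%C3%A5"}))    # %C3%A5
```

An ASCII fragment is unchanged by serialization, so a literal comparison still lands on id=a, whereas the serialized form of å coincides with the literal id %C3%A5.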
Table 5: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: id=a | Actually targeted: id=%61 |
| IRI as URI: | #%61 | id=a | Ffox, IE11 | Safari, Chrome |
| Explicitly targeting %61 | #%2561 | id=%61 | | Ffox, IE11, Safari, Chrome |
| IRI as decimal character reference: | #a | id=a | Ffox, IE11, Chrome, Safari | |
| IRI as hexadecimal character reference: | #a | id=a | Ffox, IE11, Chrome, Safari | |
| IRI as directly typed character: | #a | id=a | Ffox, IE11, Chrome, Safari | |
Table 6: Idref targets for test URLs in table 5.
| Target 1: | Target 2: |
| id=a | id=%61 |
Back to table 5
#4: Testing only id="%C3%B8" (no test of id="ø")
This test omits the directly typed idref (id="ø"), in order to test whether the issue manifests itself when a matching directly typed idref is not present. Unfortunately, in addition to the usual suspects (WebKit and Blink), the problem then also manifests itself in IE11 (but not in Firefox!). However, to experience the problem in IE11, one must first click once, then pause for a second, before clicking the link a second time. This test probably shows that the main improvement IE11 has over Blink and WebKit is that it tries to match the UTF-8 decoded percent-encoding before it tries a literal match of the percent-encoded string.
Table 7: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: none | Actually targeted: id=%C3%B8 |
| IRI as URI: | #%C3%B8 | none | Ffox | IE11, Chrome, Safari |
| IRI as directly typed character: | #ø | none | Ffox, IE11, Chrome | Safari |
Table 8: Idref targets for test URLs in table 7.
| Target |
| id=%C3%B8 |
Back to table 7
#5: Testing id="h" (but not id="%68").
This test checks whether results are different from table 5.
Table 9: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: id=h |
| IRI as URI: | #%68 | id=h | Ffox, IE11, Safari, Chrome |
| IRI as directly typed character: | #h | id=h | Ffox, IE11, Chrome, Safari |
Table 10: Idref targets for test URLs in table 9.
| Target: |
| id=h |
Back to table 9
#6: Testing id="%78" (but not id="x").
This test checks whether results are different from table 5. IE11 shows the same behavior as in test set 4.
Table 11: Test URLs
| URL test | Fragment | Expected idref to be targeted | Actually targeted: none | Actually targeted: id=%78 |
| IRI as URI: | #%78 | none | Ffox | IE11, Safari, Chrome |
| IRI as directly typed character: | #x | none | Ffox, IE11, Chrome, Safari | |
Table 12: Idref targets for test URLs in table 11.
| Target: |
| id=%78 |
Back to table 11