On the IDN issues (on ISSUE-8)

We had a discussion at the f2f on how to treat IDN-s in IRI-s; remember the issue with http://résumé.example.org vs. http://xn--rsum-bpad.example.org. The question came up whether we should modify our current approach to the equality of IRI-s and consider these two to be identical.

Just for completeness: this question arises because various tools behave differently in this respect. If I copy-paste http://résumé.example.org into my IRC client (Colloquy on Mac), the client translates it into http://xn--rsum-bpad.example.org and displays that, whereas the client Sandro used on his Linux machine did not. If you put http://résumé.example.org into the browser's address bar, you of course get an error message from the browser because the domain name cannot be resolved. Safari and Firefox 4 display the original IRI on the error page and in the address bar, whereas Chrome shows the converted version. I am sure there are other discrepancies.
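
To make the difference concrete, here is a minimal Python sketch of the conversion that some of these clients apply. The function names are mine and purely illustrative, and the sketch assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules and may not match what any particular browser or IRC client does. It ignores user-info components for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

# The two forms discussed above ("\u00e9" is the precomposed "é").
IRI_UNICODE = "http://r\u00e9sum\u00e9.example.org"
IRI_PUNYCODE = "http://xn--rsum-bpad.example.org"


def host_to_punycode(iri: str) -> str:
    """Return the IRI with its host converted to the ASCII (punycode) form.

    Hypothetical helper, for illustration only: only the host is touched,
    and user-info (user:password@) is not handled.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Python's "idna" codec converts each label to its "xn--" form if needed.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def same_after_idn_conversion(a: str, b: str) -> bool:
    """Compare two IRIs after converting both hosts to punycode."""
    return host_to_punycode(a) == host_to_punycode(b)


if __name__ == "__main__":
    # Plain character-by-character comparison: the two IRIs differ.
    print(IRI_UNICODE == IRI_PUNYCODE)
    # After the optional IDN conversion they coincide.
    print(same_after_idn_conversion(IRI_UNICODE, IRI_PUNYCODE))
```

With simple string comparison the two IRIs are different; they only coincide once a tool chooses to apply the IDN conversion, which is exactly the step on which the clients above disagree.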

I have therefore contacted Felix Sasaki[1], who used to be on the W3C team in the internationalization area and also has a good knowledge of RDF, to get his advice. Here is what he said:

[[[
You are safe if you refer to "RFC 3987[2] or its successor".

The update of RFC 3987 (the IRI spec) will harmonize what has diverged e.g. between XML (LEIRIs) and HTML5. But, as RFC 3987, it will not change the general role of IRIs: like URIs, they are a protocol element used to identify resources. Of course in practice HTTP URIs are often used with HTTP, and the step within clients before the DNS resolution more and more relies on IDN processing. That is anticipated in RFC 3987, but there is no strong connection. Section 5.3.3 of RFC 3987 says:

Scheme-based normalization may also consider IDN components and their conversions to punycode as equivalent.  As an example,
   "http://résumé.example.org" may be considered equivalent to
   "http://xn--rsum-bpad.example.org".

As you can see, this statement does not contain even a MAY or a SHOULD. In terms of the relation between HTTP and DNS resolution versus IRI the identifier protocol element, and for the reasons of divergence (among browsers and other clients) stated above, I think this makes sense.

So my advice would be "just" to cite the IRI spec, and that's it. Note that other specifications who rely on "a protocol" for resolving resources (see the "doc" function in XPath 2.0, http://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-doc) also don't go down the level to IDNs.
]]]

I would propose to go ahead with Felix's advice. I am not sure whether there are any other open issues with IRI-s; I vaguely remember issues around white spaces, but those can be settled.

Ivan

[1] http://www.sasakiatcf.com/felix/cv/
[2] http://www.apps.ietf.org/rfc/rfc3987.html

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 20 April 2011 07:56:07 UTC