Test results: IDN display

These tests check whether a user agent displays IDNs (Internationalized Domain Names) as Unicode or punycode in the status bar. User agents that try to detect possible homograph attacks do so in different ways, and these tests explore some of those approaches. They are not exhaustive, and the results may change over time: there is no standard for deciding when to display Unicode, and some of the tests are based on lists that may change.

For more information about what to expect see the article An Introduction to Multilingual Web Addresses.
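To make the Unicode versus punycode distinction concrete, the following minimal sketch converts a hostname between the two forms using Python's built-in `idna` codec. The function names and the `bücher.de` hostname are illustrative assumptions and are not part of the tests; the codec implements the older IDNA 2003 rules, which may differ from what a given user agent does.

```python
def to_punycode(hostname: str) -> str:
    """Return the ASCII (punycode) form of a hostname, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the Unicode form of a punycode hostname."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_punycode("bücher.de"))        # xn--bcher-kva.de
    print(to_unicode("xn--bcher-kva.de"))  # bücher.de
```

A user agent that decides an IDN is suspicious shows the first form in the status bar; otherwise it shows the second.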

Summary & conclusions

See the results below for user agents tested. This section summarizes the results of those tests.

IE7 ignores the TLD of the IDN, but it does prevent mixing of scripts in most cases. ASCII can be mixed with one other script, but not with Cyrillic or Greek. As an exception, Japanese kanji, hiragana and ASCII can all be mixed.

IE7 does, however, behave differently depending on which languages are declared in the browser preferences. The other user agents tested do not. This means that if you are dealing with a language that is not defined in IE7's selection list, you will only see punycode.
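The script-mixing rules described for IE7 can be sketched roughly as follows. This is not IE7's actual algorithm: `ie7_style_shows_unicode` and `script_of` are hypothetical names, the script of a character is guessed from the first word of its Unicode character name (a heuristic, since the Python standard library has no script property), and the dependence on the configured browser languages is deliberately ignored. The function expects a single label, without the TLD.

```python
import unicodedata

# Per the summary above: scripts that IE7 will not mix with ASCII.
NO_ASCII_MIXING = {"CYRILLIC", "GREEK"}
# Japanese scripts that may be combined with each other and with ASCII.
JAPANESE = {"CJK", "HIRAGANA"}


def script_of(ch: str) -> str:
    """Heuristic script guess: the first word of the Unicode character name."""
    if ch.isascii():
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def ie7_style_shows_unicode(label: str) -> bool:
    """True if the label would stay in Unicode under the rules sketched here."""
    scripts = {script_of(ch) for ch in label}
    others = scripts - {"ASCII"}
    if others <= JAPANESE:
        return True   # pure ASCII, or kanji/hiragana with optional ASCII
    if len(others) > 1:
        return False  # two or more non-ASCII scripts mixed
    if "ASCII" in scripts and others & NO_ASCII_MIXING:
        return False  # Cyrillic or Greek mixed with ASCII
    return True
```

Under these assumptions a label such as `漢字かなascii` is accepted, while `кириллицаascii` and `漢字chará` are rejected, because the accented á counts as a non-ASCII Latin character.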

Firefox allows any combination of characters, provided that the TLD is on its approved list.
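A TLD-based check of this kind can be sketched in a few lines. The function name `display_form` and the set `SAMPLE_IDN_TLDS` are assumptions made for illustration; the sample set is not Firefox's real list, it only contains two TLDs that behaved as supported in the results on this page.

```python
# Illustrative only: two TLDs that behaved as supported in these tests.
SAMPLE_IDN_TLDS = {"de", "jp"}


def display_form(hostname: str, idn_tlds=SAMPLE_IDN_TLDS) -> str:
    """Show Unicode if the TLD is on the list, otherwise punycode."""
    tld = hostname.rsplit(".", 1)[-1].lower()
    if tld in idn_tlds:
        return hostname
    # Python's built-in codec (IDNA 2003) produces the xn-- form.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(display_form("bücher.de"))   # bücher.de
    print(display_form("bücher.com"))  # xn--bcher-kva.com
```

Note that the characters in the name play no part in the decision; only the TLD is consulted.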

Opera also allows any combination of characters for TLDs on its whitelist. For TLDs not on the whitelist, the situation is not always clear. From the tests here it seems that any combination of characters is also allowed for many TLDs not on the whitelist - not just Latin1 characters as stated in Opera's description. The exception is that combinations of Greek or Cyrillic with Latin characters are displayed as punycode if the TLD isn't on the whitelist. On the other hand, Devanagari is displayed as punycode if the TLD is not whitelisted, unless it is combined with ASCII or accented Latin characters (which seems odd).

Safari displays any IDN containing only characters from one or more scripts in the whitelist as Unicode, and any other IDN as punycode.
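The script-whitelist approach can be sketched as below. The names `safari_style_shows_unicode` and `SAMPLE_SCRIPT_WHITELIST` are hypothetical, the whitelist contents are an assumption chosen to be consistent with the results on this page rather than Safari's real default list, and the script of a character is again guessed from the first word of its Unicode character name.

```python
import unicodedata

# Assumed sample whitelist; Cyrillic, Greek and Ethiopic are deliberately absent.
SAMPLE_SCRIPT_WHITELIST = {
    "LATIN", "CJK", "HIRAGANA", "DEVANAGARI", "ARMENIAN", "THAI",
}


def script_of(ch: str) -> str:
    """Heuristic script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def safari_style_shows_unicode(hostname: str,
                               whitelist=SAMPLE_SCRIPT_WHITELIST) -> bool:
    """True if every non-ASCII character belongs to a whitelisted script."""
    return all(ch.isascii() or script_of(ch) in whitelist for ch in hostname)
```

Unlike the IE7-style check, mixing is not an issue here: any combination of whitelisted scripts is accepted, and a single character from a script outside the whitelist is enough to force punycode.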

The famous pаypal.com IDN, which has a Cyrillic а (U+0430) after the first p, is displayed as punycode by IE7 because it mixes disallowed scripts, by Firefox because .com is not on Firefox's list of supported TLDs, and by Safari because Cyrillic is not on the default whitelist of scripts. Opera, however, displays this as Unicode, since .com is on its whitelist.
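The difference between the genuine and the spoofed name is invisible in most fonts, but easy to expose programmatically. The following sketch, with the hypothetical helper `non_ascii_chars`, lists the non-ASCII characters in a hostname; the Cyrillic letter is written as an escape so that it cannot be confused with the ASCII letter in the code itself.

```python
import unicodedata

genuine = "paypal.com"
spoof = "p\u0430ypal.com"  # second letter is U+0430, not ASCII "a"


def non_ascii_chars(hostname: str):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(hostname)
        if not ch.isascii()
    ]


if __name__ == "__main__":
    print(genuine == spoof)          # False
    print(non_ascii_chars(spoof))    # [(1, 'U+0430', 'CYRILLIC SMALL LETTER A')]
    print(spoof.encode("idna"))      # the xn-- form a cautious user agent shows
```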

Latest results

The following user agents were tested on Windows XP.

When Internet Explorer had only English set in the browser preferences, all of the tests produced punycode in the status bar. For the results below the following languages were set in the preferences: Russian, Japanese, German, Greek, Hindi.

At the time the tests were run, Firefox's list and the Opera whitelist in opera6.ini were configured as follows:

The columns represent user agents tested on a given version and date. The cells contain P if the test to the left produced punycode. The notes below the tables attempt explanations of certain aspects of the tests. For Firefox and Opera, colour-coding is used to indicate which TLDs are in their whitelist (green) and which are not (red). For Safari, the same colours are used to indicate whether the IDN contains characters from a script not in the whitelist (red) or not (green).

Latin characters

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
charþ.is - - - -
charő.hu - - - -
charþ.hu - - - -
charő.is - - - -
charþ.com - P - -
charő.com - P - -
charþ.xy - P - -
charő.xy - P - -
charþ.fi - - - -
charő.fi - - - -

Notes:

  1. Firefox: The .com TLD is not in Firefox's list of supported TLDs, and the .xy TLD doesn't exist, so IDN is not supported for those domains.
  2. Opera: The .is, .fi and .xy TLDs were not on the Opera whitelist, but the tests still produced Unicode characters in the status bar. Opera's explanation indicates that Latin1 Western European characters only are allowed in a non-whitelisted domain, but ő is an extended Latin character, so this may indicate that a wider range of Latin characters is acceptable.

Non-Latin characters

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
кириллица.ru - P - P
ελληνικά.gr - - - P
漢字.jp - - - -
かな.jp - - - -
यूनिकोड.in - P P -
кириллица.fi - - - P
ελληνικά.fi - - - P
漢字.fi - - - -
यूनिकोड.fi - - P -
यूनिकोड.de - - - -
Հայերեն.de - - - -
Հայերեն.am - P - -
ภาษาไทย.th - - - -
ภาษาไทย.com - P - -
ህሔራዊነት.de P - - P
ህሔራዊነት.er P P - P

Notes:

  1. IE7: The Amharic text, in Ethiopic script, is displayed as punycode. IE7 doesn't allow you to select Amharic from its list in the browser language preference settings, and adding a user-defined 'am' (the language subtag for Amharic) doesn't help either.
  2. Firefox: The .am, .com, .er, .ru and .in TLDs are not in Firefox's list of supported TLDs, so IDN is not supported for those domains. Note that the characters supported for a given TLD bear no relation to the characters defined as valid by that TLD.
  3. Opera: Although .am, .th, .er, .fi, .gr and .ru are not on the whitelist, the Armenian, Thai, Ethiopic, Japanese, Greek and Cyrillic characters are displayed as Unicode. This and most of the other results indicate that non-whitelisted TLDs support far more than just Latin1. The oddity is the Devanagari script, which is displayed as punycode when the TLD is not on the whitelist. This is particularly strange, given that combinations of Devanagari and Latin text in the next test display as Unicode.
  4. Safari: Cyrillic and Greek are not in the default script whitelist, so all examples containing Cyrillic and Greek are shown as punycode. Ethiopic is also absent from the whitelist, and so also appears as punycode.

Non-Latin characters mixed with Latin

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
кириллицаascii.ru P P P P
ελληνικάascii.gr P - P P
漢字ascii.jp - - - -
かなascii.jp - - - -
यूनिकोडascii.in - P - -
кириллицаascii.de P - - P
ελληνικάascii.de P - - P
漢字ascii.de - - - -
かなascii.de - - - -
यूनिकोडascii.de - - - -
кириллицchará.ru P P P P
ελληνικάchará.gr P - P P
漢字chará.jp P - - -
かなchará.jp P - - -
यूनिकोडchará.in P P - -
кириллицchará.de P - - P
ελληνικάchará.de P - - P
漢字chará.de P - - -
かなchará.de P - - -
यूनिकोडchará.de P - - -
pаypal.com P P - P

Notes:

  1. IE7: Cyrillic and Greek scripts are not on the list of scripts that IE supports when mixed with ASCII characters. The others are. Only ASCII can be mixed with other scripts, which is why every example containing the non-ASCII character á is displayed as punycode.
  2. Firefox: .ru, .in and .com are not in the list of supported TLDs for Firefox. The rest are. Firefox makes no special case for mixing these scripts with ASCII.
  3. Opera: .ru and .gr TLDs (not on the whitelist) display ASCII mixed with Cyrillic and Greek scripts as punycode. Note that in the previous test similar, non-mixed IDNs were displayed as Unicode, so the mixture is significant. Mixed scripts are not an issue in the .de TLD, which is on the whitelist. The .in TLD (also not on the whitelist) displays characters as Unicode, which is odd since in the previous test the Devanagari was displayed as punycode.
  4. Safari: Cyrillic and Greek are not in the default script whitelist, so all examples containing Cyrillic and Greek are shown as punycode.

Kanji and kana characters mixed

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
漢字かな.jp - - - -
漢字かな.de - - - -
漢字かな.ru - P - -
漢字かな.in - P - -
漢字かなascii.jp - - - -
漢字かなchará.jp P - - -

Notes:

  1. IE7: Supports mixing of kanji, hiragana and ASCII, but displays punycode once the accented character á is added to the mixture.
  2. Firefox: .ru and .in are not in the list of supported TLDs for Firefox. The rest are.
  3. Opera: .ru and .in TLDs are not on the whitelist, but the Japanese IDNs in those domains are still displayed as Unicode.

Non-Latin mixtures

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
кириллица漢字.ru P P P P
кириллица漢字.jp P - - P
यूनिकोड漢字.in P P P -
यूनिकोड漢字.jp P - - -
ελληνικά漢字.jp P - - P
ελληνικά漢字.gr P - P P

Notes:

  1. IE7: Only ASCII can be mixed with another script (with the exception of mingled Japanese scripts), so all of these are displayed as punycode.
  2. Firefox: .ru and .in are not in the list of supported TLDs for Firefox. The rest are.
  3. Opera: .ru, .gr and .in TLDs are not on the whitelist.
  4. Safari: Cyrillic and Greek are not in the default script whitelist, so all examples containing Cyrillic and Greek are shown as punycode.

Unusual characters

IDN IE7 Firefox Opera Safari
Version n/a 2.0.0.3 9.10 2.0.1
Date 20070323 20070323 20070323 20070324
example.com⁄foo.museum P P illegal -
I♥NY.museum P - - -

Notes:

  1. Opera: Rather than displaying punycode for example.com⁄foo.museum in the status bar, Opera displays "opera:illegal-uri-0".