IRIs Beyond the Napkin: A Survey of Internationalized Resource Identifier Issues and Implementation

Event details

Date:
Coordinated Universal Time
Location:
Santa Clara, CA, USA
Speakers:
Martin Dürst and Addison Phillips

If the Latin alphabet is not your (or your customers') main script, there are many good reasons to include non-Latin characters in a Web address (URL/URI). This presentation will tell you why, when, and how you can and should do this, and will provide the background needed to make things work for servers and clients.

Non-ASCII characters have been used in Web addresses for more than a decade. Such Web addresses are called Internationalized Resource Identifiers (IRIs) and have been specified in RFC 3987 since 2005. Early this year, the IETF chartered a Working Group to update RFC 3987.

The presentation will first explain the basic rules for working with IRIs, in particular the conversion to URIs via UTF-8 and percent-encoding. To provide a deeper understanding, we will then concentrate on the major issues that the IRI Working Group is addressing:

  • Moving from defining IRIs as a presentation element, while restricting protocols to using URIs, to defining IRIs as protocol elements on par with URIs.
  • Balancing syntactic uniformity, for long-term simplicity, against backward compatibility with established browser behavior, in particular for the domain name and fragment identifier parts of an IRI.
  • Moving the specification from a before-after descriptive style to a more procedural style that covers edge cases of implementations existing in the wild.
  • Comparison, normalization, and security issues for IRIs.
  • Restrictions and display advice for bidirectional IRIs.
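
The basic IRI-to-URI conversion mentioned above (UTF-8 followed by percent-encoding) can be illustrated with a minimal Python sketch. It is not taken from the talk: the function name iri_to_uri and the example.org addresses are assumptions made for illustration. The sketch only percent-encodes the UTF-8 octets of non-ASCII characters and leaves every ASCII character untouched; it assumes the input is already a well-formed, normalized IRI and does not give the domain name part any special treatment.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters.

    Hypothetical helper for illustration only. ASCII characters
    (including existing percent-escapes and delimiters) are copied
    through unchanged; each non-ASCII character is encoded as UTF-8
    and every resulting octet is written as %HH.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII: already valid in a URI as far as this sketch checks.
            pieces.append(ch)
        else:
            # Non-ASCII: UTF-8 encode, then percent-encode each octet.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/D\u00fcrst"))
```

For example, the character "ü" (U+00FC) becomes the two UTF-8 octets C3 and BC, so it appears in the resulting URI as "%C3%BC". A production converter would also have to decide how to handle the domain name part, which is one of the open issues listed above.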