W3C webarch section/Identifiers

From W3C Wiki

Identifiers are the most visible part of the Web’s inner workings. You probably see them every day scratched on pieces of paper, flashing by in advertisements, and in the address bar of your browser.

Identifiers on the Web serve two purposes. On one hand, they are global names that help machines and people refer to the same things unambiguously. On the other hand, they locate things in the Web: by knowing just the identifier of a page, your Web browser can fetch and display it for you from wherever it happens to be located on the globe. This is why identifiers are sometimes informally called addresses.

What are URIs and URLs?

URI stands for Uniform Resource Identifier. A URI is a string of characters that identifies a Web page or other thing. For example, http://www.w3.org/standards/webarch/identifiers is the URI of the page you are reading right now. Programs, devices, and people can all refer to this page by this URI, which lets them understand each other.

On the Web, we mostly deal with a kind of URI known as a URL, or Uniform Resource Locator. A URL tells a computer how to reach information via the Internet. The URI given above is also an example of a URL:

  • the http part before the :// separator is the scheme, and it tells the computer to use the HTTP protocol;
  • the www.w3.org part is the hostname, and it pinpoints the Web server’s location in the global network;
  • the /standards/webarch/identifiers part is the path, and it points to a particular page among the thousands of pages that this Web server holds.
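The three parts listed above can be pulled out of the example URI with a standard URL parser. The following is a minimal sketch using Python's urllib.parse module; the variable names are illustrative only. Note that the parser reports the scheme without the "://" separator.

```python
from urllib.parse import urlsplit

# The example URI from the text above.
uri = "http://www.w3.org/standards/webarch/identifiers"

# urlsplit breaks a URL into its generic components.
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.hostname)  # www.w3.org
print(parts.path)      # /standards/webarch/identifiers
```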

The hostname part of such URLs relies on the Internet’s Domain Name System (DNS). Anyone can register a name in this global system, and thus obtain authority over a section of the URL space. This provides a clear, decentralized mechanism for managing URLs and avoiding collisions.

URIs are used for many things beyond Web pages. They are integral to the XML ecosystem, where they identify language elements in an open-ended, extensible way. A URI’s ability to combine name and location in one short string is important to the Linked Data effort, which aims to connect machine-readable descriptions of things, so that computers can learn about the world and provide better service to humans.
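XML namespaces are one place where a URI acts purely as a name. As a rough illustration, the sketch below parses a small made-up document (the note element and its contents are hypothetical) with Python's xml.etree.ElementTree, which reports each element's name qualified by its namespace URI. Two elements that share the local name p remain distinguishable because only one of them belongs to the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace URI; here it serves as a name, not as something to fetch.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical document mixing a namespaced <h:p> with a plain <p>.
doc = ET.fromstring(
    f'<note xmlns:h="{XHTML_NS}"><h:p>Hello</h:p><p>Other</p></note>'
)

# ElementTree expands prefixes into "{namespace-uri}local-name" form.
tags = [child.tag for child in doc]
print(tags)  # ['{http://www.w3.org/1999/xhtml}p', 'p']
```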

What is an IRI?

A URI can contain only a limited set of characters: unaccented Latin letters, digits, and a few punctuation marks. This makes URIs unsuitable for naming things in most of the world’s languages. The IRI, or Internationalized Resource Identifier, lifts this restriction, allowing characters from across the Unicode repertoire to be used. Behind the scenes, IRIs are usually converted to plain URIs for the sake of older systems that can’t handle Unicode. Thus users can seamlessly benefit from IRIs today.
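The conversion from an IRI to a plain URI can be sketched roughly as follows. This is a simplified illustration, not a complete implementation of the mapping in RFC 3987: the function name iri_to_uri is hypothetical, any user information in the authority is dropped, and IPv6 literals are not handled. The host name is encoded with Python's built-in "idna" codec, and non-ASCII characters elsewhere are percent-encoded as UTF-8 bytes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%" so that
# existing percent-escapes are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=?"

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)

    # Host names use the IDNA (Punycode) encoding rather than percent-escapes.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query, and fragment: percent-encode the UTF-8 bytes of any
    # character outside the allowed ASCII set.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)

    return urlunsplit((parts.scheme, host, path, query, fragment))

print(iri_to_uri("http://example.org/wiki/Zürich"))
# http://example.org/wiki/Z%C3%BCrich
```

The resulting string contains only ASCII characters, so it can be passed to software that understands URIs but not IRIs.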

What is Web Services Addressing?

Web services have special requirements for identifying communicating parties and the messages that pass through the system. The Web Services Addressing suite of specifications covers these needs, with IRIs at its core.

Learn More

Tim Berners-Lee has written an article on why cool URIs don’t change. The “Choose URIs wisely” and “Managing URIs” articles from W3C QA provide further guidance on how to best exploit the nature of URIs.

Current Status of Specifications

URIs and IRIs are defined by the IETF, the Internet Engineering Task Force, in RFC 3986 and RFC 3987, respectively. For supplementary documents published by the W3C, see URI Current Status.

See also Web Services Addressing Current Status.