W3C

Architecture of the World Wide Web, First Edition

W3C Editor's Draft 10 May 2004

This version:
http://www.w3.org/2001/tag/2004/webarch-20040510/
Latest editor's draft:
http://www.w3.org/2001/tag/webarch/
Previous version:
http://www.w3.org/2001/tag/2004/webarch-20040507/
Latest version:
http://www.w3.org/TR/webarch/
Editor:
Ian Jacobs, W3C
Authors:
See acknowledgments .

Abstract

The World Wide Web is a network-spanning information space of resources interconnected by links. This information space is the basis of, and is shared by, a number of information systems. Within each of these systems, agents (people and software) retrieve, create, display, analyze, and reason about resources.

Web architecture includes the definition of the information space in terms of identification and representation of its contents, and of the protocols that support the interaction of agents in an information system making use of the space. Web architecture is influenced by social requirements and software engineering principles. These lead to design choices and constraints on the behavior of systems that use the Web in order to achieve desired properties of the shared information space: efficiency, scalability, and the potential for indefinite growth across languages, cultures, and media. Good practice by agents in the system is also important to the success of the system. This document reflects the three bases of Web architecture: identification, interaction, and representation.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 10 May 2004 Editor's Draft of "Architecture of the World Wide Web, First Edition." This draft takes into account a few additional TAG resolutions that were omitted from the 7 May draft; see the public W3C TAG mailing list public-webarch-comments@w3.org (archive).

This document has been developed by W3C's Technical Architecture Group (TAG) (charter). A complete list of changes to this document since the first public Working Draft is available on the Web.

The TAG charter describes a process for issue resolution by the TAG. In accordance with those provisions, the TAG maintains a running issues list . The First Edition of "Architecture of the World Wide Web" does not address every issue that the TAG has accepted since it began work in January 2002. The TAG has selected a subset of issues that the First Edition does address to the satisfaction of the TAG; those issues are identified in the TAG's issues list. The TAG intends to address the remaining (and future) issues after publication of the First Edition as a Recommendation.

This document uses the concepts and terms regarding URIs as defined in draft-fielding-uri-rfc2396bis-03, preferring them to those defined in RFC 2396. The IETF Internet Draft draft-fielding-uri-rfc2396bis-03 is expected to obsolete RFC 2396, which is the current URI standard. The TAG is tracking the evolution of draft-fielding-uri-rfc2396bis-03.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress." The latest information regarding patent disclosures related to this document is available on the Web.

Table of Contents

List of Principles, Constraints, and Good Practice Notes

The following principles, constraints, and good practice notes are discussed in this document and listed here for convenience. There is also a free-standing summary.

General Architecture Principles
Identification
Interaction
Data Formats

1. Introduction

The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI).

A travel scenario is used throughout this document to illustrate typical behavior of Web agents — people or software (on behalf of a person, entity, or process) acting on this information space. Software agents include servers, proxies, spiders, browsers, and multimedia players.

Story

While planning a trip to Mexico, Nadia reads "Oaxaca weather information: 'http://weather.example.com/oaxaca'" in a glossy travel magazine. Nadia has enough experience with the Web to recognize that "http://weather.example.com/oaxaca" is a URI. Given the context in which the URI appears, she expects that it allows her to access weather information. When Nadia enters the URI into her browser:

  1. The browser performs an information retrieval action in accordance with its configured behavior for resources identified via the "http" URI scheme.
  2. The authority responsible for "weather.example.com" provides information in a response to the retrieval request.
  3. The browser displays the retrieved information, which includes hypertext links to other information. Nadia can follow these hypertext links to retrieve additional information.

This scenario illustrates the three architectural bases of the Web that are discussed in this document:

  1. Identification. Each resource is identified by a URI. In this travel scenario, the resource is a periodically-updated report on the weather in Oaxaca, and the URI is "http://weather.example.com/oaxaca".
  2. Interaction. Protocols define the syntax and semantics of messages exchanged by agents over a network. Web agents communicate information about the state of a resource through the exchange of representations. In the travel scenario, Nadia (by clicking on a hypertext link) tells her browser to request a representation of the resource identified by the URI in the hypertext link. The browser sends an HTTP GET request to the server at "weather.example.com". The server responds with a representation that includes XHTML data and the Internet media type "application/xhtml+xml".
  3. Formats. Representations are built from a non-exclusive set of data formats, used separately or in combination (including XHTML, CSS, PNG, XLink, RDF/XML, SVG, and SMIL animation). In this scenario, the representation data format is XHTML. While interpreting the XHTML representation data, the browser retrieves and displays weather maps identified by URIs within the XHTML.
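The interaction step in the scenario can be sketched in code. The following is a minimal illustration, not a description of any particular browser: the function name `build_get_request` and the choice to send an `Accept` header are assumptions made for this example. It only builds the text of the request message and does not contact a server.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str, accept: str = "application/xhtml+xml") -> str:
    """Return the text of a minimal HTTP/1.1 GET request for an "http" URI.

    Hypothetical helper for illustration; a real user agent sends more headers.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' URI scheme")
    # The request target is the path (plus any query) from the URI.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",   # the authority named in the URI
        f"Accept: {accept}",       # media type the agent would like to receive
        "",
        "",                        # blank line ends the header section
    ]
    return "\r\n".join(lines)


request_text = build_get_request("http://weather.example.com/oaxaca")
print(request_text)
```

The URI supplies everything the agent needs to form the message: the scheme selects the protocol, the authority names the server, and the path names the resource on that server.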

The following illustration shows the relationship between identifier, resource, and representation.

A resource (Oaxaca Weather Info) is identified by a particular URI and is represented by pseudo-HTML content


In the remainder of this document, we highlight important architectural points regarding Web identifiers, protocols, and formats.

1.1. About this Document

This document describes the properties we desire of the Web and the design choices that have been made to achieve them.

This document promotes re-use of existing standards when suitable, and gives guidance on how to innovate in a manner consistent with the Web architecture.

The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in the principles, constraints, and good practice notes in accordance with RFC 2119 [RFC2119]. However, this document does not include conformance provisions for these reasons:

1.1.1. Audience of this Document

This document is intended to inform discussions about issues of Web architecture. The intended audience for this document includes:

  1. Participants in W3C Activities; i.e., designers of Web technologies and specifications in W3C
  2. Other groups and individuals designing technologies to be integrated into the Web
  3. Implementers of W3C specifications
  4. Web content authors and publishers

Readers will benefit from familiarity with the Requests for Comments (RFC) series from the IETF, some of which define pieces of the architecture discussed in this document.

Note: This document does not distinguish in any formal way the terms "language" and "format." Context determines which term is used. The phrase "specification designer" encompasses language, format, and protocol designers.

1.1.2. Scope of this Document

This document presents the general architecture of the Web. Other groups inside and outside W3C also address specialized aspects of Web architecture, including accessibility, internationalization, device independence, and Web Services. The section on Architectural Specifications includes references.

This document strikes a balance between brevity and precision while including illustrative examples. TAG findings are informational documents that complement the current document by providing more detail about selected topics. This document includes some excerpts from the findings. Since the findings evolve independently, this document also includes references to approved TAG findings. For other TAG issues covered by this document but without an approved finding, references are to entries in the TAG issues list.

Many of the examples in this document that involve human activity suppose the familiar Web interaction model where a person follows a link via a user agent, the user agent retrieves and presents data, the user follows another link, etc. This document does not discuss in any detail other interaction models such as voice browsing. For instance, when a graphical user agent running on a laptop computer or hand-held device encounters an error, the user agent can report errors directly to the user through visual and audio cues, and present the user with options for resolving the errors. On the other hand, when someone is browsing the Web through voice input and audio-only output, stopping the dialog to wait for user input may reduce usability since it is so easy to "lose one's place" when browsing with only audio output. This document does not discuss how the principles, constraints, and good practices identified here apply in all interaction contexts.

1.1.3. Principles, Constraints, and Good Practice Notes

The important points of this document are categorized as follows:

Principle
An architectural principle is a fundamental rule that applies to a large number of situations and variables. Architectural principles include "separation of concerns", "generic interface", "self-descriptive syntax," "visible semantics," "network effect" (Metcalfe's Law), and Amdahl's Law: "The speed of a system is limited by its slowest component."
Constraint
In the design of the Web, some design choices, like the names of the p and li elements in HTML, or the choice of the colon (:) character in URIs, are somewhat arbitrary; if paragraph had been chosen instead of p or asterisk (*) instead of colon, the large-scale result would, most likely, have been the same. Other design choices are more fundamental; these are the focus of this document. Design choices can lead to constraints, i.e., restrictions in behavior or interaction within the system. Constraints may be imposed for technical, policy, or other reasons to achieve certain properties of the system, such as accessibility and global scope, and non-functional properties, such as relative ease of evolution, re-usability of components, efficiency, and dynamic extensibility.
Good practice
Good practice (by software developers, content authors, site managers, users, and specification designers) increases the value of the Web.

This categorization is derived from Roy Fielding's work on "Representational State Transfer" [REST].

1.2. General Architecture Principles

A number of general architecture principles apply to all three bases of Web architecture.

1.2.1. Independent Specifications

Identification, interaction, and representation are independent (or "orthogonal", or "loosely coupled") concepts:

  • one identifies a resource with a URI. One may publish and use a URI without building any representations of the resource or determining whether any representations are available.
  • a generic URI syntax allows agents to function in many cases without knowing specifics of URI schemes.
  • in many cases one may change the representation of a resource without disrupting references to the resource.

Independence of specifications facilitates a flexible design that can evolve over time. For example, one may refer to an image with a URI without worrying about the format chosen to represent the image. This independence has allowed the introduction of formats such as PNG and SVG without disrupting references to resources.

Independent abstractions benefit from independent specifications. Specifications should clearly indicate those features that simultaneously access information from otherwise independent abstractions. For example, a specification should draw attention to a feature that requires information from both the header and the body of a message.

Although the HTTP, HTML, and URI specifications are independent for the most part, they are not completely independent. Experience demonstrates that where they are not, problems have arisen:

  • The HTML specification includes a protocol extension of sorts: it specifies how a user agent sends HTML form data to a server (as a URI query string). The design works reasonably well, although there are limitations related to internationalization (see the TAG finding "URIs, Addressability, and the use of HTTP GET and POST") and the query string design impinges on the server design. Software developers (for example, of [CGI] applications) might have an easier time finding the specification if it were published separately and then cited from the HTTP, URI, and HTML specifications.
  • The HTML specification allows content providers to instruct HTTP servers to build response headers from META element instances. This is an abstraction violation; the software developer community would benefit from being able to find all HTTP headers from the HTTP specification (including any associated extension registries and specification updates per IETF process). Perhaps as a result, this feature of the HTML specification is not widely deployed. Furthermore, this design has led to confusion in user agent development. The HTML specification states that META in conjunction with http-equiv is intended for HTTP servers, but many HTML user agents interpret http-equiv='refresh' as a client-side instruction.
  • Some content authors use the META / http-equiv approach to declare the character encoding scheme of an HTML document. By design, this is a hint that an HTTP server should emit a corresponding "Content-Type" header field. In practice, the use of the hint in servers is not widely deployed. Furthermore, many user agents use this information to override the "Content-Type" header sent by the server. This works against the principle of authoritative representation metadata.
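The last point can be illustrated with a small sketch of how an agent might choose a character encoding without overriding the server. This is one possible policy under stated assumptions, not behavior required by any specification cited here: the function names are hypothetical, and falling back to the META value only when the header carries no charset parameter is an assumption of this example.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter (lower-cased) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def effective_charset(http_content_type: Optional[str],
                      meta_content_type: Optional[str]) -> Optional[str]:
    """Prefer the charset in the HTTP header; never let META override it."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    # Assumption: consult the in-document hint only when the header is silent.
    if meta_content_type:
        return charset_from_content_type(meta_content_type)
    return None


header_wins = effective_charset("text/html; charset=ISO-8859-1",
                                "text/html; charset=utf-8")
hint_used = effective_charset("text/html", "text/html; charset=UTF-8")
print(header_wins, hint_used)
```

In this sketch the header sent by the server stays authoritative, and the in-document declaration is treated only as a hint.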

1.2.2. Extensibility

The information in the Web and the technologies used to represent that information change over time. Some examples of successful technologies designed to allow change while minimizing disruption include:

  • the fact that URI schemes are independently specified;
  • the use of an open set of Internet media types in mail and HTTP to specify document interpretation;
  • the separation of the generic XML grammar and the open set of XML namespaces for element and attribute names;
  • extensibility models in Cascading Style Sheets (CSS), XSLT 1.0, and SOAP;
  • user agent plug-ins.

Below we discuss the property of "extensibility," exhibited by URIs and some data and message formats, which promotes technology evolution and interoperability.

Language subset: one language is a subset (or, "profile") of a second language if any document in the first language is also a valid document in the second language and has the same interpretation in the second language.

Language extension: one language is an extension of a second language if the second is a language subset of the first (thus, the extension is a superset). Clearly, creating a language extension is better for interoperability than creating an incompatible language.

Ideally, many instances of a superset language can be safely and usefully processed as though they were in the language subset. Languages that exhibit this property are said to be "extensible." Language designers can facilitate extensibility by defining how implementations must handle unknown extensions -- for example, that they be ignored (in some way) or should be considered errors.

For example, from early on in the Web, HTML agents followed the convention of ignoring unknown elements. This choice left room for innovation (i.e., non-standard elements) and encouraged the deployment of HTML. However, interoperability problems arose as well. In this type of environment, there is an inevitable tension between interoperability in the short term and the desire for extensibility. Experience shows that designs that strike the right balance between allowing change and preserving interoperability are more likely to thrive and are less likely to disrupt the Web community. Independent specifications help reduce the risk of disruption.
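The "ignore unknown elements" convention can be sketched as follows. This is an illustration only: the two-element vocabulary in `KNOWN_ELEMENTS` and the class name `TolerantRenderer` are hypothetical, and real HTML agents apply far more elaborate rules.

```python
from html.parser import HTMLParser

# Hypothetical tiny vocabulary standing in for "the elements this agent knows".
KNOWN_ELEMENTS = {"p", "em"}


class TolerantRenderer(HTMLParser):
    """Keep known elements and all text; skip the tags of unknown elements."""

    def __init__(self):
        super().__init__()
        self.output = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_ELEMENTS:
            self.output.append(f"<{tag}>")
        else:
            self.ignored.append(tag)  # remembered so the agent could report it

    def handle_endtag(self, tag):
        if tag in KNOWN_ELEMENTS:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is kept even when it sits inside an unknown element.
        self.output.append(data)


def render(markup: str):
    renderer = TolerantRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.output), renderer.ignored


text, ignored = render("<p>Sunny, <blink>very</blink> hot</p>")
print(text, ignored)
```

A document that uses an extension element is still processed usefully as though it were written in the smaller language, which is the property the text above calls extensibility.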

For further discussion, see the section on versioning and extensibility. See also TAG issue xmlProfiles-29.

1.2.3. Error Handling

Errors occur in networked information systems. The manner in which they are dealt with depends on application context. A user agent acts on behalf of the user and therefore is expected to help the user understand the nature of errors, and possibly overcome them. User agents that correct errors without the consent of the user are not acting on the user's behalf.

Principle: Error recovery

Agent recovery from error without user consent is harmful.

Consent does not necessarily imply that the receiving agent must interrupt the user and require selection of one option or another. The user may indicate consent through pre-selected configuration options, modes, or selectable user interface toggles, with appropriate reporting to the user when the agent detects an error.

To promote interoperability, specification designers should set expectations about behavior in the face of known error conditions. Experience has led to the following observations about error-handling approaches.

  • Protocol designers should provide enough information about the error condition so that an agent can address the error condition. For instance, an HTTP 404 message ("resource not found") is useful because it allows user agents to present relevant information to users, enabling them to contact the representation provider in case of problems.
  • Experience with the cost of building a user agent to handle the diverse forms of ill-formed HTML content convinced the designers of the XML specification to require that agents fail upon encountering ill-formed content. Because users are unlikely to tolerate such failures, this design choice has pressured all parties into respecting XML's constraints, to the benefit of all.
  • An agent that encounters unrecognized content may handle it in a number of ways, including as an error; see also the section on extensibility and versioning.
  • Error behavior that is appropriate for a person may not be appropriate for software. People are capable of exercising judgement in ways that software applications generally cannot. An informal error response may suffice for a person but not for a processor.
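The XML design choice described above can be seen in a short sketch. The helper `parse_strict` is a hypothetical name introduced for this example; it relies on the standard library parser, which rejects content that is not well-formed, and it reports the failure rather than silently repairing the input.

```python
import xml.etree.ElementTree as ET


def parse_strict(document: str):
    """Return (root, None) on success, or (None, message) if not well-formed."""
    try:
        return ET.fromstring(document), None
    except ET.ParseError as err:
        # Report the error so that the user (or calling software) can act on it.
        return None, f"not well-formed: {err}"


root, error = parse_strict("<report><city>Oaxaca</city></report>")
bad_root, bad_error = parse_strict("<report><city>Oaxaca</report>")
print(error, bad_error)
```

The second call fails because the city element is never closed. The agent does not guess at a repair; it surfaces enough information for someone to address the error.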

See the TAG issues contentTypeOverride-24 and errorHandling-20.

1.2.4. Protocol-based Interoperability

The Web follows Internet tradition in that its important interfaces are defined in terms of protocols, by specifying the syntax, semantics, and sequence of the messages interchanged. The technology shared among Web agents lasts longer than the agents themselves.

It is common for programmers working with the Web to write code that generates and parses these messages directly. It is less common, but not unusual, for end users to have direct exposure to these messages. It is often desirable to provide users with access to format and protocol details: allowing them to "view source," whereby they may gain expertise in the workings of the system.

2. Identification

Parties who wish to communicate effectively must agree (to a reasonable extent) upon a shared set of identifiers and on their meanings. The ability to use common identifiers across communities motivates global identifiers in Web architecture. Thus, Uniform Resource Identifiers ([URI], currently being revised), which are global identifiers in the context of the Web, are central to Web architecture.

Constraint: Identify with URIs

The identification mechanism for the Web is the URI.

A URI must be assigned to a resource in order for agents to be able to refer to the resource. It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it.

Formats that allow content authors to use URIs instead of local identifiers foster the "network effect": the value of these formats grows with the size of the deployed Web.

Resources exist before URIs; a resource may be identified by zero URIs. However, there are many benefits to assigning a URI to a resource, including linking, bookmarking, caching, and indexing by search engines. Software developers should expect that it will prove useful to be able to share a URI across applications, even if that utility is not initially evident.

The scope of a URI is global; the resource identified by a URI does not depend on the context in which the URI appears (see also the section about URIs in other roles). Of course, what an agent does with a URI may vary. The TAG finding "URIs, Addressability, and the use of HTTP GET and POST" discusses additional benefits and considerations of URI addressability.

Principle: URI assignment

One should assign a URI to anything that others will expect to refer to.

This principle dates back at least as far as Douglas Engelbart's seminal work on open hypertext systems; see section Every Object Addressable in [ Eng90 ].

2.1. URI Comparisons

The most straightforward way of establishing that two parties are referring to the same resource is to compare, character-by-character, the URIs they are using. Two URIs that are identical (character for character) refer to the same resource. However, Web architecture allows people to assign more than one URI to a resource.

Constraint: URI multiplicity

Web architecture does not constrain a resource to be identified by a single URI.

Consequently, two URIs that are not identical (character for character) can still refer to the same resource (i.e., they do not necessarily refer to different resources).

To reduce the risk of a false negative comparison (i.e., an incorrect conclusion that two URIs do not refer to the same resource) or a false positive comparison (i.e., an incorrect conclusion that two URIs do refer to the same resource), certain specifications license applications to apply tests in addition to character-by-character comparison. For example, for "http" URIs, the authority component (the part after "//" and before the next "/") is defined to be case-insensitive. Thus, the "http" URI specification licenses applications to conclude that authority components in two "http" URIs are equivalent when those strings are character-by-character equivalent or differ only by case. By following the "http" URI specification, agents are licensed to conclude that "http://Weather.Example.Com/Oaxaca" and "http://weather.example.com/Oaxaca" identify the same resource.

Agents that reach conclusions based on comparisons that are not licensed by relevant specifications take responsibility for any problems that result. Agents should not assume, for example, that "http://weather.example.com/Oaxaca" and "http://weather.example.com/OAXACA" identify the same resource, since none of the specifications involved states that the path component of an "http" URI is case-insensitive.
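The comparison rules described above can be sketched as a small function. This is a minimal illustration, assuming only the two tests named in the text (character-by-character identity, and a case-insensitive host for "http" URIs); the function name `licensed_equivalent` is hypothetical, and section 6 of [URI] describes further normalizations that this sketch does not attempt.

```python
from urllib.parse import urlsplit


def _authority(parts):
    # urlsplit exposes the host name already lower-cased via .hostname.
    return (parts.username, parts.password, parts.hostname, parts.port)


def licensed_equivalent(uri_a: str, uri_b: str) -> bool:
    """True if this sketch may conclude that both URIs identify one resource.

    False means "no conclusion is licensed", not "the resources differ".
    """
    if uri_a == uri_b:  # character-by-character comparison
        return True
    a, b = urlsplit(uri_a), urlsplit(uri_b)
    if a.scheme != "http" or b.scheme != "http":
        return False
    # Path, query, and fragment are compared exactly, including case.
    return (_authority(a) == _authority(b)
            and (a.path, a.query, a.fragment) == (b.path, b.query, b.fragment))


same = licensed_equivalent("http://Weather.Example.Com/Oaxaca",
                           "http://weather.example.com/Oaxaca")
unknown = licensed_equivalent("http://weather.example.com/Oaxaca",
                              "http://weather.example.com/OAXACA")
print(same, unknown)
```

The first pair differs only in the case of the authority, so equivalence may be concluded. The second pair differs in the path, where no specification involved licenses a case-insensitive comparison.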

Section 6 of [URI] provides more information about comparing URIs and reducing the risk of false negatives and positives. See the section below on approaches other than string comparison that allow different parties to assert that two URIs identify the same resource.

2.1.1. URI Aliases

There are many benefits to ensuring that software can determine, by following specifications, that two URIs refer to the same resource. URI producers should be conservative about the number of different URIs they produce for the same resource, especially when software cannot determine the equivalence of those URIs. For example, the parties responsible for weather.example.com should not use both "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" to refer to the same resource; software will not detect the equivalence relationship by following specifications.

Good practice: Avoiding URI aliases

A URI owner should not create arbitrarily different URIs for the same resource.

There may, of course, be good reasons for creating similar-looking URIs. For instance, one might reasonably create URIs that begin with "http://www.example.com/tempo" and "http://www.example.com/tiempo" to provide access to resources by users who speak Italian and Spanish.

Likewise, URI consumers should ensure URI consistency. For instance, when transcribing a URI, agents should not gratuitously escape characters. The term "character" refers to URI characters as defined in section 2 of [ URI ].

Good practice: Consistent URI usage

If a URI has been assigned to a resource, agents SHOULD refer to the resource using the same URI, character for character.

When a URI alias does become common currency, the URI owner should use protocol techniques such as server-side redirects to connect the two resources. The community benefits when the URI owner supports both the "unofficial" URI and the alias.
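A server-side redirect for an alias can be sketched as a simple lookup. Everything specific here is an assumption made for illustration: the `ALIASES` table, the function name `redirect_for`, and the choice of HTTP status 301 are not prescribed by this document.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table mapping an alias path to the path the URI owner prefers.
ALIASES: Dict[str, str] = {"/Oaxaca": "/oaxaca"}

SITE = "http://weather.example.com"


def redirect_for(path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for an aliased path, or None to serve normally."""
    target = ALIASES.get(path)
    if target is None:
        return None
    # Assumption: a permanent redirect tells agents which URI to use in future.
    return 301, {"Location": SITE + target}


alias_response = redirect_for("/Oaxaca")
normal_response = redirect_for("/oaxaca")
print(alias_response, normal_response)
```

Agents that follow the redirect learn, through the protocol itself, which URI the owner prefers for the resource.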

2.2. URI Overloading

At times, different agents intentionally or unintentionally use the same URI to identify different resources. URI overloading refers to the use, in the context of Web protocols and formats, of one URI to refer to more than one resource. Just as promoting a shared vocabulary has tangible value, overloading often imposes a cost in communication.

Suppose that one organization uses a URI on their site to refer to the movie "The Sting", and another organization uses the same URI to refer to a resource that talks about "The Sting." Inconsistent use of the URI creates confusion about what the URI identifies. In many contexts, inconsistent use may not lead to error or cause harm. However, in some contexts such as the Semantic Web, software relies on consistent use of URIs. If one wanted to talk about the creation date of the resource identified by the URI, for instance, it would not be clear whether this meant "when the movie was created" or "when the resource about the movie was created."

Good practice: Avoiding URI Overloading

Avoid URI overloading.

The section below on URI ownership examines approaches for establishing the authoritative source of information about what resource a URI identifies.

2.2.1. URIs in other Roles

In Web architecture, URIs identify resources. Outside the context of Web architecture specifications, URIs can be useful for other purposes, for example, as database keys. For instance, the organizers of a conference might use "mailto:nadia@example.com" to refer to Nadia. While this usage is not licensed by Web architecture specifications, in the context of the conference, all parties may agree to that local policy and understand one another. Certain properties of URIs, such as their potential for global uniqueness, make them appealing as general-purpose identifiers. In the Web architecture, "mailto:nadia@example.com" identifies an Internet mailbox; that is what is licensed by the "mailto" URI scheme specification. The fact that the URI serves other purposes in non-Web contexts does not lead to URI overloading. URI overloading arises when a URI is used to identify two different resources within the context of Web protocols and formats.

2.3. URI Ownership

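The requirement that URIs not be overloaded (explained above) demands that different agents do not assign the same URI to different resources. URI scheme specifications assure this using a variety of techniques, including:

  • Hierarchical delegation of authority. This approach, exemplified by the "http" and "mailto" schemes, allows the assignment of a part of URI space to one party, reassignment of a piece of that space to another, and so forth.
  • Random numbers. The generation of a fairly large random number, used in the "uuid" scheme, reduces the risk of ambiguity to a calculated small risk.
  • Checksums. The generation of a URI as a checksum based on a data object has similar properties to the random number approach. This is the approach taken by the "md5" scheme.
  • Combination of approaches. The "mid" and "cid" schemes combine some of the above approaches.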
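The random-number and checksum techniques named in this section can be sketched in Python as follows. This is only an illustration: the function names are invented here, the `urn:uuid:` form is what Python's `uuid` module produces rather than the "uuid" scheme syntax itself, and the `md5:` prefix followed by a hexadecimal digest is an assumed spelling, not taken from any scheme specification.

```python
import hashlib
import uuid


def random_number_uri() -> str:
    """Mint an identifier from a large random number (a version 4 UUID)."""
    return uuid.uuid4().urn  # a string beginning with "urn:uuid:"


def checksum_uri(data: bytes) -> str:
    """Mint an identifier from an MD5 checksum of the data object."""
    return "md5:" + hashlib.md5(data).hexdigest()


print(random_number_uri())
print(checksum_uri(b"representation data"))
```

Two independent calls to `random_number_uri` are overwhelmingly unlikely to collide, while `checksum_uri` yields the same identifier for the same data no matter who computes it, which is why the checksum case involves no unique owner.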

The approach taken for the "http" URI scheme follows the pattern whereby the Internet community delegates authority, via the IANA URI scheme registry [ IANASchemes ] and the DNS, over a set of URIs with a common prefix to one particular owner. One consequence of this approach is the Web's heavy reliance on the central DNS registry.

Except when a URI is constructed from a checksum, all of the techniques seek to establish a unique relationship between a social entity and a URI. This relationship is called URI ownership. In this document, the phrase "authority responsible for domain X" indicates that the same entity owns those URIs where the authority component is domain X. This document does not address how the benefits and responsibilities of URI ownership may be delegated to other parties (e.g., to individuals managing an HTTP server).

A URI owner may provide representations of the resource identified by the URI upon request. When the HTTP protocol is used to provide representations, the HTTP origin server (defined in [ RFC2616 ]) is the software agent acting on behalf of the URI owner. The URI owner has a privileged position in the Web architecture as the entity that assigns authoritative metadata to such representations; see the section on authoritative metadata for more information. There are also social expectations for responsible representation management by URI owners. Additional social implications of URI ownership are not discussed here. However, the success or failure of these different approaches depends on the extent to which there is consensus in the Internet community on abiding by the defining specifications.

2.4. URI Schemes

In the URI "http://weather.example.com/", the "http" that appears before the colon (":") names a URI scheme. Each URI scheme has a normative specification that explains how identifiers are assigned within that scheme. The URI syntax is thus a federated and extensible naming mechanism wherein each scheme's specification may further restrict the syntax and semantics of identifiers within that scheme.
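For readers who want to see the scheme component isolated, here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. The helper name `scheme_of` is an assumption for this example.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme name, i.e. the part before the first colon."""
    return urlsplit(uri).scheme


for uri in ("http://weather.example.com/", "mailto:nadia@example.com"):
    print(uri, "->", scheme_of(uri))
```

Everything after the colon is then interpreted according to the specification of the scheme that was found.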

Examples of URIs from various schemes include:

While the Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a significant amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [ RFC2718 ] for other considerations and costs related to URI scheme design.

Because of these costs, if a URI scheme exists that meets the needs of an application, designers should use it rather than invent one.

Good practice: New URI schemes

A specification SHOULD NOT introduce a new URI scheme when an existing scheme provides the desired properties of identifiers and their relation to resources.

Consider our travel scenario: should the agent providing information about the weather in Oaxaca register a new URI scheme "weather" for the identification of resources related to the weather? They might then publish URIs such as "weather://travel.example.com/oaxaca". When a software agent dereferences such a URI, if what really happens is that HTTP GET is invoked to retrieve a representation of the resource, then an "http" URI would have sufficed.

If the motivation behind registering a new scheme is to allow a software agent to launch a particular application when retrieving a representation, such dispatching can be accomplished at lower expense via Internet media types. When designing a new data format, the appropriate mechanism to promote its deployment on the Web is the Internet media type.

Note that even if an agent cannot process representation data in an unknown format, it can at least retrieve it. The data may contain enough information to allow a user or user agent to make some use of it. When an agent does not handle a new URI scheme, it cannot retrieve a representation.
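Dispatching on the Internet media type rather than on a new URI scheme might look like the following Python sketch. The handler functions, the `HANDLERS` table, and the fallback string are hypothetical; the point is only that an unknown format can still be retrieved and handed to the user.

```python
def handle_html(data: bytes) -> str:
    return "render as hypertext"


def handle_jpeg(data: bytes) -> str:
    return "render as image"


# Hypothetical table mapping Internet media types to handlers.
HANDLERS = {"text/html": handle_html, "image/jpeg": handle_jpeg}


def dispatch(content_type: str, data: bytes) -> str:
    """Choose a handler from the media type, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        # Unknown format: the data was still retrieved and can be saved.
        return "save to disk"
    return handler(data)


print(dispatch("image/jpeg", b""))
print(dispatch("application/x-unknown", b""))
```

Supporting a new data format then means adding one table entry, with no change to the URI handling code.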

2.4.1. URI Scheme Registration

The Internet Assigned Numbers Authority ( IANA ) maintains a registry [ IANASchemes ] of mappings between URI scheme names and scheme specifications. For instance, the IANA registry indicates that the "http" scheme is defined in [ RFC2616 ]. The process for registering a new URI scheme is defined in [ RFC2717 ].

The use of unregistered URI schemes is discouraged for a number of reasons:

  • There is no generally accepted way to locate the scheme specification.
  • Someone else may be using the scheme for other purposes.
  • One should not expect that general-purpose software will do anything useful with URIs of this scheme beyond URI comparison; the network effect is lost.

Note: Some URI scheme specifications (such as the "ftp" URI scheme specification) use the term "designate" where the current document uses "identify."

TAG issue siteData-36 is about expropriation of naming authority.

2.5. URI Opacity

It is tempting to guess the nature of a resource by inspection of a URI that identifies it. However, the Web is designed so that agents communicate resource state through representations, not identifiers. In general, one cannot determine the Internet media type of representations of a resource by inspecting a URI for that resource. For example, the ".html" at the end of "http://example.com/page.html" provides no guarantee that representations of the identified resource will be served with the Internet media type "text/html". The HTTP protocol does not constrain the Internet media type based on the path component of the URI; the URI owner is free to configure the server to return a representation using PNG or any other data format.

Resource state may evolve over time. Requiring a URI owner to publish a new URI for each change in resource state would lead to a significant number of broken links. For robustness, Web architecture promotes independence between an identifier and the identified resource.

Good practice: URI opacity

Agents making use of URIs MUST NOT attempt to infer properties of the referenced resource except as licensed by relevant specifications.

The example URI used in the travel scenario ("http://weather.example.com/oaxaca") suggests that the identified resource has something to do with the weather in Oaxaca. A site reporting the weather in Oaxaca could just as easily be identified by the URI "http://vjc.example.com/315". And the URI "http://weather.example.com/vancouver" might identify the resource "my photo album."

On the other hand, the URI "mailto:joe@example.com" indicates that the URI refers to a mailbox. The "mailto" URI scheme specification authorizes agents to infer that URIs of this form identify Internet mailboxes.

In some cases, relevant technical specifications license URI assignment authorities to publish assignment policies. For more information about URI opacity, see TAG issue metaDataInURI-31 .
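The difference between guessing from a URI and reading authoritative metadata can be shown with a small Python sketch. The function names and the sample header dictionary are assumptions for illustration; `mimetypes.guess_type` is the standard-library heuristic that maps a file suffix to a media type.

```python
import mimetypes


def guessed_from_uri(uri: str):
    """What the URI's spelling suggests; a guess, not licensed by the HTTP specification."""
    return mimetypes.guess_type(uri)[0]


def declared_by_server(headers: dict) -> str:
    """The media type the server actually declared in its response headers."""
    return headers["Content-Type"].split(";", 1)[0].strip().lower()


uri = "http://example.com/page.html"
response_headers = {"Content-Type": "image/png"}  # hypothetical response

print(guessed_from_uri(uri))
print(declared_by_server(response_headers))
```

When the two values disagree, an agent following the good practice above relies on the declared media type and treats the suffix as carrying no meaning.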

2.6. Fragment Identifiers

Story

When navigating within the XHTML data that Nadia receives as a representation of the resource identified by "http://weather.example.com/oaxaca", Nadia finds that the URI "http://weather.example.com/oaxaca#tom" refers to information about tomorrow's weather in Oaxaca. This URI includes the fragment identifier "tom" (the string after the "#").

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. The interpretation of fragment identifiers is discussed in the section on media types and fragment identifier semantics.
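Separating the fragment identifier from the rest of a URI can be sketched in Python with the standard library's `urllib.parse.urldefrag`. The variable names are chosen for this example.

```python
from urllib.parse import urldefrag

# Split the URI from the story into the primary-resource URI and the fragment.
primary, fragment = urldefrag("http://weather.example.com/oaxaca#tom")

print(primary)
print(fragment)
```

Only the first part is used to retrieve a representation; what the fragment means is then determined by the media type of the representation that comes back.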

See TAG issues abstractComponentRefs-37 and DerivedResources-43 .

2.7. Future Directions for Identifiers

There remain open questions regarding identifiers on the Web. The following sections identify a few areas of future work in the Web community.

2.7.1. Internationalized Identifiers

The integration of internationalized identifiers (i.e., composed of characters beyond those allowed by [ URI ]) into the Web architecture is an important and open issue. See TAG issue IRIEverywhere-27 for discussion about work going on in this area.

2.7.2. Assertion that Two URIs Identify the Same Resource

Emerging Semantic Web technologies, including the "Web Ontology Language (OWL)" [ OWL10 ], define RDF [ RDF10 ] properties such as sameAs to assert that two URIs identify the same resource or functionalProperty to imply it.

One consequence of this direction is that syntactically different URIs can be used to identify the same resource. This means that multiple parties may create representations of the (same) resource, all available for retrieval using multiple URIs. A URI owner's rights (e.g., to provide authoritative representation metadata) extend only to the representations served for requests given that URI.

Note also that declaring two URIs to be sameAs one another does not mean they are interchangeable. For instance, suppose that two different organizations own the URIs "http://weather.example.org/stations/oaxaca#ws17a" and "http://weather.example.com/rdfdump?region=oaxaca&station=ws17a". The URIs might both identify the same resource, a certain collection of weather-measuring equipment shared by the two organizations. Although the URIs might be declared "owl:sameAs" each other, the two URI owners might provide very different content when the URIs are dereferenced.

3. Interaction

Communication between agents over a network about resources involves URIs, messages, and data.

Story

Nadia follows a hypertext link labeled "satellite image" expecting to retrieve a satellite photo of the Oaxaca region. The link to the satellite image is an XHTML link encoded as <a href="http://example.com/satimage/oaxaca">satellite image</a> . Nadia's browser analyzes the URI and determines that its scheme is "http". The browser configuration determines how it locates the identified information, which might be via a cache of prior retrieval actions, by contacting an intermediary (such as a proxy server), or by direct access to the server identified by a portion of the URI. In this example, the browser opens a network connection to port 80 on the server at "example.com" and sends a "GET" message as specified by the HTTP protocol, requesting a representation of the resource identified by "/satimage/oaxaca".

The server sends a response message to the browser, once again according to the HTTP protocol. The message consists of several headers and a JPEG image. The browser reads the headers, learns from the "Content-Type" field that the Internet media type of the representation is "image/jpeg", reads the sequence of octets that make up the representation data, and renders the image.
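The exchange in the story can be approximated without any network access by the following Python sketch, which builds the request text and reads the "Content-Type" field from a response header block. The function names and the sample response are assumptions for illustration, and the request omits headers that a real browser would send.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> str:
    """Build a minimal HTTP/1.1 GET request message for an "http" URI."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def content_type_of(header_block: str) -> str:
    """Return the value of the Content-Type field, or "" if it is absent."""
    for line in header_block.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.strip()
    return ""


print(build_get_request("http://example.com/satimage/oaxaca"))

# Hypothetical response headers for the satellite image.
sample_headers = "HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n"
print(content_type_of(sample_headers))
```

The browser in the story would then pass the octets that follow the headers to whatever component renders "image/jpeg".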

This section describes the architectural principles and constraints regarding interactions between agents, including such topics as network protocols and interaction styles, along with interactions between the Web as a system and the people that make use of it. The fact that the Web is a highly distributed system affects architectural constraints and assumptions about interactions.

See the related TAG issue httpRange-14 .

3.1. Using a URI to Access a Resource

Agents may use a URI to access the referenced resource; this is called dereferencing the URI . Access may take many forms, including retrieving a representation of the state of the resource (for instance, by using HTTP GET or HEAD), adding or modifying a representation of the state of the resource (for instance, by using HTTP POST or PUT, which in some cases may change the actual state of the resource if the submitted representations are interpreted as instructions to that end), and deleting some or all representations of the state of the resource (for instance, by using HTTP DELETE, which in some cases may result in the deletion of the resource itself).

There may be more than one way to access a resource for a given URI; application context determines which access mechanism an agent uses. For instance, a browser might use HTTP GET to retrieve a representation of a resource, whereas a link checker might use HTTP HEAD on the same URI simply to establish whether a representation is available. Some URI schemes set expectations about available access mechanisms, others (such as the URN scheme [ RFC 2141 ]) do not. Section 1.2.2 of [ URI ] discusses the separation of identification and interaction in more detail. For more information about relationships between multiple access mechanisms and URI addressability, see the TAG finding " URIs, Addressability, and the use of HTTP GET and POST " .

Although many URI schemes are named after protocols, this does not imply that use of such a URI will necessarily result in access to the resource via the named protocol. Even when an agent uses a URI to retrieve a representation, that access might be through gateways, proxies, caches, and name resolution services that are independent of the protocol associated with the scheme name.

Dereferencing a URI generally involves a succession of steps as described in multiple independent specifications and implemented by the agent. The following example illustrates the series of specifications that are involved when a user instructs a user agent to follow a hypertext link that is part of an SVG document. In this example, the URI is "http://weather.example.com/oaxaca" and the application context calls for the user agent to retrieve and render a representation of the identified resource.

  1. Since the URI is part of a hypertext link in an SVG document, the first relevant specification is the SVG 1.1 Recommendation [ SVG11 ]. Section 17.1 of this specification imports the link semantics defined in XLink 1.0 [ XLink10 ]: "The remote resource (the destination for the link) is defined by a URI specified by the XLink href attribute on the 'a' element." The SVG specification goes on to state that interpretation of an a element involves retrieving a representation of a resource, identified by the href attribute in the XLink namespace: "By activating these links (by clicking with the mouse, through keyboard input, voice commands, etc.), users may visit these resources."
  2. The XLink 1.0 [ XLink10 ] specification, which defines the href attribute in section 5.4, states that "The value of the href attribute must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure described below is applied."
  3. The URI specification [ URI ] states that "Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme." The URI scheme name in this example is "http".
  4. [ IANASchemes ] states that the "http" scheme is defined by the HTTP/1.1 specification (RFC 2616 [ RFC2616 ], section 3.2.2).
  5. In this SVG context, the agent constructs an HTTP GET request (per section 9.3 of [ RFC2616 ]) to retrieve the representation.
  6. Section 6 of [ RFC2616 ] defines how the server constructs a corresponding response message, including the 'Content-Type' field.
  7. Section 1.4 of [ RFC2616 ] states "HTTP communication usually takes place over TCP/IP connections." This example does not address that step in the process, or other steps such as Domain Name System ( DNS ) resolution.
  8. The agent interprets the returned representation according to the data format specification that corresponds to the representation's Internet Media Type (the value of the HTTP 'Content-Type') in the relevant IANA registry [ MEDIATYPEREG ].
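The chain of specifications in the steps above can be pictured as two table lookups, one keyed by URI scheme and one keyed by Internet media type. The Python sketch below is only a model of that idea: both tables are hypothetical stand-ins for the IANA registries, their entries are labels rather than real registry data, and the function name `specs_consulted` is invented for this example.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the registries consulted in steps 4 and 8.
SCHEME_SPECS = {"http": "RFC2616"}
MEDIA_TYPE_SPECS = {"image/svg+xml": "SVG11"}


def specs_consulted(uri: str, content_type: str) -> list:
    """List the specifications that govern the scheme and the returned format."""
    scheme = urlsplit(uri).scheme                      # step 3: find the scheme
    scheme_spec = SCHEME_SPECS.get(scheme)             # step 4: scheme registry
    if scheme_spec is None:
        raise ValueError(f"no registered specification for scheme {scheme!r}")
    media_type = content_type.split(";", 1)[0].strip().lower()
    format_spec = MEDIA_TYPE_SPECS.get(media_type, "unknown format")  # step 8
    return [scheme_spec, format_spec]


print(specs_consulted("http://weather.example.com/oaxaca", "image/svg+xml"))
```

Each lookup hands control to an independent specification, which is what allows new schemes and formats to be added without revising the others.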

3.2. Messages and Representations

The Web's protocols (including HTTP, FTP, SOAP, NNTP, and SMTP) are based on the exchange of messages. A message may include representation data as well as metadata about the resource (such as the "Alternates" and "Vary" HTTP headers), the representation, and the me