The World Wide Web uses relatively
simple technologies with sufficient scalability, efficiency and
utility that they have resulted in a remarkable information space
of interrelated resources, growing across languages, cultures, and
media. In an effort to preserve these properties of the information
space as the technologies evolve, this architecture document
discusses the core design components of the Web. They are
identification of resources, representation of resource state, and
the protocols that support the interaction between agents and
resources in the space.
Web architecture is
influenced by social requirements and software engineering
principles. These lead to design choices and constraints on
the behavior of systems that use
the Web in order to achieve desired properties of the shared
information space: efficiency, scalability, and the potential for
indefinite growth across languages, cultures, and media. Good
practice by agents in the system is also important
to the success of the system. We relate core design components,
constraints, and good practices to the principles and properties
they support.
This section describes the status
of this document at the time of its publication. Other documents
may supersede this document. A list of current W3C publications and
the latest revision of this technical report can be found in the
W3C technical reports
index at http://www.w3.org/TR/.
Publication as a Proposed
Recommendation does not imply endorsement by the
W3C Membership.
This is a draft document and may be updated, replaced or obsoleted
by other documents at any time. It is inappropriate to cite this
document as other than "work in progress."
This is the 5 November 2004 Proposed
Recommendation of “Architecture of the World Wide Web, First
Edition.” Publication as a Proposed Recommendation indicates that W3C seeks
endorsement of the stable
technical report. The W3C Membership and
other interested parties are invited to
review the document and
send comments to public-webarch-comments@w3.org (with public archive) through
2 December 2004. Advisory
Committee Representatives should consult their
WBS questionnaires. Note that substantive technical comments
were expected during the Last Call review period that ended 17
September 2004. A complete
list of changes since the Last Call
draft (and
earlier drafts) is available.
This document has been developed by
W3C's Technical
Architecture Group (TAG), which, by charter,
maintains a list of architectural
issues. The scope of this document
is a useful
subset of those issues; it is not intended to address
all of them. The TAG intends to address the remaining
(and future)
issues after publication of the First Edition as a Recommendation.
As noted in the TAG's Proposed Recommendation transition request, a few points
of outstanding dissent
regarding this document remain:
- Stickler on "information resource" in the URI/Resource Relationships (§2.2)
section
- Kopecky on Representation of a secondary resource
in the Fragment Identifiers
(§2.6) section
- HTML WG on XLink in the Links in XML (§4.5.2) section. In this revision,
that section has been changed to accommodate the HTML WG's request,
at least in part.
This document uses the concepts and terms
regarding URIs as defined
in draft-fielding-uri-rfc2396bis-06, preferring them to those defined
in RFC 2396. The IETF draft-fielding-uri-rfc2396bis-06
was endorsed, in an 18 Oct 2004 announcement,
as an IETF Specification, though the
latest published
draft as
of this writing is draft-fielding-uri-rfc2396bis-07. The [URI] citation should reflect publication
of the relevant RFC in future revisions.
This document was produced under the
W3C IPR policy of the July
2001 Process Document. The
TAG maintains a public list
of patent disclosures relevant to this document; that page also
includes instructions
for disclosing a patent. An individual who
has actual knowledge of a patent which the individual believes
contains Essential Claim(s) with respect to this specification
should disclose the information in accordance with section 6 of the W3C Patent Policy.
The World Wide Web (WWW, or
simply Web) is an
information
space in which the items of interest, referred to as resources, are
identified by global identifiers called Uniform Resource
Identifiers (URI).
Examples such as the
following
travel scenario
are used throughout this document to illustrate typical behavior of
Web
agents:
people or software (on behalf of a person,
entity, or process) acting on this information
space. A user
agent acts on behalf of a user. Software agents include
servers, proxies, spiders, browsers, and multimedia players.
Story
While planning a trip to Mexico,
Nadia reads
“Oaxaca weather information:
'http://weather.example.com/oaxaca'” in a glossy travel magazine.
Nadia has enough experience with the Web to recognize that
"http://weather.example.com/oaxaca" is a URI and that she is likely
to be able to retrieve associated
information with her Web browser.
When Nadia enters the URI into her browser:
- The browser recognizes that what Nadia typed is a URI.
- The browser performs an information retrieval action in
accordance with its configured behavior for resources identified
via the "http" URI scheme.
- The authority responsible for "weather.example.com" provides
information in a response to the retrieval request.
- The browser interprets the response, identified as XHTML by the
server, and performs additional retrieval actions for inline
graphics and other content as necessary.
- The browser displays the retrieved information, which includes
hypertext links to other information. Nadia can follow these
hypertext links to retrieve additional information.
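As a rough illustration of the first two steps in the list above, the following Python sketch splits the URI from the story into its components and selects a retrieval behavior by scheme. The RETRIEVERS table and the plan_retrieval function are hypothetical names used only for this illustration; a real browser does considerably more.

```python
from urllib.parse import urlsplit

# Hypothetical table: the agent's configured retrieval behavior per URI scheme.
RETRIEVERS = {
    "http": "send an HTTP GET request to the authority",
    "ftp": "open an FTP session with the authority",
}

def plan_retrieval(uri: str) -> str:
    """Describe the retrieval action an agent might take for a URI."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"not an absolute URI: {uri!r}")
    action = RETRIEVERS.get(parts.scheme)
    if action is None:
        raise ValueError(f"no configured behavior for scheme {parts.scheme!r}")
    # netloc is the authority component; an empty path is treated as "/".
    return f"{action} ({parts.netloc}), asking for {parts.path or '/'}"

print(plan_retrieval("http://weather.example.com/oaxaca"))
```

The sketch stops before any network activity; it only shows that the scheme selects the behavior and the authority names the party that answers.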
This scenario illustrates the three
architectural bases of the Web that are discussed in this
document:
-
Identification (§2). URIs are used to identify
resources.
In this travel scenario, the resource is a periodically
updated report on the weather in Oaxaca, and the URI is
"http://weather.example.com/oaxaca".
-
Interaction (§3). Web agents communicate using
standardized protocols that enable interaction through the exchange
of messages which adhere to a defined
syntax and semantics. By
entering a URI into a retrieval
dialog or selecting a hypertext
link, Nadia tells her browser to
perform a retrieval action for the
resource identified by the URI.
In this example, the browser sends
an HTTP GET request (part of the HTTP protocol) to the server at
"weather.example.com", via TCP/IP port 80,
and the server sends
back a message containing what it determines to be a
representation
of the resource as of the time that representation was
generated.
Note that this example is specific to hypertext browsing of
information; other kinds
of interaction are possible,
both within
browsers and through the use
of other types of Web agent. Our
example is
intended to illustrate one common interaction, not
define the range of possible interactions or limit the ways in
which agents might use the Web.
-
Formats (§4). Most protocols used for representation
retrieval and/or submission make use of a
sequence of one or more
messages, which taken together contain a payload of representation
data and metadata, to transfer the representation
between agents.
The choice of interaction protocol places limits on the formats of
representation data and metadata that can be transmitted. HTTP, for
example, typically transmits a single octet stream plus metadata,
and uses the "Content-Type" and "Content-Encoding" header fields to
further identify the format of the representation.
In this
scenario, the representation transferred is
in XHTML, as identified
by the "Content-type" HTTP header field containing the registered
Internet media type name, "application/xhtml+xml". That Internet
media type name indicates that the representation data can be
processed according to the XHTML specification.
Nadia's browser is configured and
programmed to interpret the receipt of an "application/xhtml+xml"
typed representation as an instruction to render the content of
that representation according to the XHTML rendering model,
including any subsidiary interactions (such as requests for
external style sheets or in-line images) called for by the
representation. In the scenario, the XHTML representation data
received from the initial request instructs Nadia's
browser to also
retrieve and render in-line the weather maps, each identified by a
URI and thus causing an additional retrieval action, resulting in
additional representations that are
processed by the browser
according to their own data formats (e.g., "application/svg+xml"
indicates the SVG data format), and this process continues until
all of the data formats have been rendered. The result of all of
this processing, once the browser has reached an application
steady-state that completes Nadia's initial requested action, is
commonly referred to as a "Web page".
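The role of the media type in selecting how representation data is processed can be sketched as follows. This is a minimal Python illustration, not a description of any actual browser: the PROCESSORS table and the function names are assumptions made for the example, and only the two media type names come from the text above.

```python
from email.message import Message

# Hypothetical mapping from Internet media type to the agent's processing step.
PROCESSORS = {
    "application/xhtml+xml": "render with the XHTML rendering model",
    "application/svg+xml": "render with the SVG rendering model",
}

def media_type(content_type_header: str) -> str:
    """Extract the lowercased media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset.
    return msg.get_content_type()

def choose_processing(content_type_header: str) -> str:
    """Pick a processing step, with a fallback for unknown formats."""
    return PROCESSORS.get(media_type(content_type_header),
                          "offer to save the data unprocessed")

print(choose_processing("application/xhtml+xml; charset=utf-8"))
```

Each in-line weather map would go through the same lookup with its own header value, which is how one retrieval fans out into several differently processed representations.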
The following illustration shows the
relationship between identifier, resource, and representation.
In the remainder of this document, we
highlight important architectural points regarding Web identifiers,
protocols, and formats. We also discuss some important general architectural principles (§5)
and how they apply to the Web.
This document describes the
properties we desire of the Web and the design choices that have
been made to achieve them. It promotes the reuse of existing
standards when suitable, and gives guidance on how to innovate in a
manner consistent with Web architecture.
The terms MUST, MUST NOT, SHOULD,
SHOULD NOT, and MAY are used in the principles, constraints, and
good practice notes in accordance with RFC 2119 [RFC2119].
However, this document does not
include
conformance provisions for these reasons:
- Conforming software is expected to be so diverse that it would
not be useful to be able to refer to the class of conforming
software agents.
- Some of the good practice notes concern people; specifications
generally define conformance for software, not people.
- The addition of a conformance section is
not likely to increase the utility of the document.
This document is intended to inform
discussions about issues of Web architecture. The intended audience
for this document includes:
- Participants in W3C Activities
- Other groups and individuals designing technologies to be
integrated into the Web
- Implementers of W3C specifications
- Web content authors and publishers
Readers will benefit from familiarity with the Requests for Comments
(RFC) series from the IETF, some of which define pieces of the
architecture discussed in this document.
Note: This document
does not distinguish in any formal way the terms "language" and
"format." Context determines which term is used. The phrase
"specification designer" encompasses language, format, and protocol
designers.
This document presents the general
architecture of the Web. Other groups inside and outside W3C also
address specialized aspects of Web architecture, including
accessibility, quality assurance, internationalization, device
independence, and Web Services. The section on Architectural Specifications (§7.1)
includes
references to these related specifications.
This document strives for a balance
between brevity and precision while including illustrative
examples. TAG findings are informational documents that complement
the current document by providing more detail about selected
topics. This document includes some excerpts from the findings.
Since the findings evolve independently, this document also includes
references to approved TAG findings. For other TAG issues covered
by this document but without an approved finding, references are to
entries in the TAG issues list.
Many of the examples in this document
that involve human activity suppose the familiar Web interaction
model (illustrated at the beginning of the Introduction) where a
person follows a link via a user agent, the user agent retrieves
and presents data, the user follows another link, etc. This
document does not discuss in any detail other interaction models
such as voice browsing (see, for example, [VOICEXML2]). The choice of interaction model may
have an impact on expected agent behavior. For instance, when a
graphical user agent running on a laptop computer or
hand-held
device encounters an error, the user agent can report errors
directly to the user through visual and audio cues, and present the
user with options for resolving the errors. On the other hand, when
someone is browsing the Web through voice input and audio-only
output, stopping the dialog to wait for user input may reduce
usability since it is so easy to "lose one's place" when browsing
with only audio-output. This document does not discuss how the
principles, constraints, and good practices identified here
apply
in all interaction contexts.
The important points of this document
are categorized as follows:
- Principle
- An architectural principle is a fundamental rule that applies
to a large number of situations and variables. Architectural
principles include "separation of concerns", "generic interface",
"self-descriptive syntax," "visible semantics," "network effect"
(Metcalfe's Law), and Amdahl's Law: "The speed of a system is
limited by its slowest component."
- Constraint
- In the design of the Web, some choices, like the names of the
p and li elements in HTML, the choice of
the colon (:) character in URIs, or grouping bits into eight-bit
units (octets), are somewhat arbitrary; if paragraph
had been chosen instead of p or asterisk (*)
instead
of colon, the large-scale result would, most likely, have been the
same. This document focuses on more
fundamental design choices:
design
choices that lead to constraints, i.e., restrictions in
behavior or interaction within the system. Constraints may be
imposed for technical, policy, or other reasons to achieve
desirable
properties in the system, such as accessibility and global
scope, and
non-functional properties, such as relative ease of evolution,
re-usability of components, efficiency, and dynamic
extensibility.
- Good
practice
- Good practice—by software developers, content authors, site
managers, users, and specification designers—increases the value of
the Web.
In order to communicate internally, a
community agrees (to a reasonable extent) on a set of terms and
their meanings.
One goal of the Web, since its inception, has been
to build a global community in which any party can share
information with any other party. To achieve this goal, the Web
makes use of a single global identification system: the URI. URIs
are a cornerstone of Web architecture, providing identification
that is common across the Web. The global scope of URIs promotes
large-scale "network effects": the value of an identifier increases
the more it is used consistently (for example, the more it is used
in hypertext
links (§4.4)).
Principle:
Global Identifiers
Global naming
leads to global network effects.
This principle dates back at least as
far as Douglas Engelbart's seminal work on open hypertext systems;
see section Every Object Addressable in [Eng90].
The choice of syntax for global
identifiers is somewhat arbitrary; it is their global scope that is
important. The Uniform Resource
Identifier ([URI], currently being revised) has
been successfully deployed since the creation of the Web. There are
substantial benefits to participating in the existing network of
URIs, including linking, bookmarking, caching, and indexing by
search engines, and there are substantial costs to creating a new
identification system that has the same properties as URIs.
Good
practice: Identify with URIs
To benefit from and
increase the value of the World Wide Web, agents should provide
URIs as identifiers for resources.
A resource should have an associated
URI if another party might reasonably want to create a hypertext
link to it, make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it. Software developers should expect that sharing URIs across
applications will be useful, even if that utility is not initially
evident. The TAG finding "URIs, Addressability, and the use of HTTP GET and
POST" discusses additional benefits and considerations
of URI addressability.
Note:
Some URI
schemes (such as the "ftp" URI scheme specification) use the term
"designate" where this
document uses "identify."
By design a URI identifies one
resource.
We do not limit the scope of what might be a resource. The term
"resource" is used in a general sense for whatever might be
identified by a URI. It is conventional on the hypertext Web to
describe Web pages, images, product catalogs, etc. as “resources”.
The distinguishing characteristic of these resources is that all of
their essential characteristics can be conveyed in a message. We
identify this set as “information
resources”.
This document is an example of an
information resource. It consists of words and punctuation symbols
and
graphics and
other artifacts that can be encoded, with varying
degrees
of fidelity, into a sequence of bits. There is nothing
about the essential information content of this document that
cannot in principle be transferred in a representation.
However, our use of the term
resource
is intentionally more broad. Other things, such as cars and dogs
(and, if you've printed this document on physical sheets of paper,
the artifact that you are holding in your hand), are resources too.
They are not information resources, however, because their essence
is not information. Although it is possible to describe a great
many things about a car or a dog in a sequence of bits, the sum of
those things will invariably be an approximation of the essential
character of the resource.
We define the term “information
resource” because we observe that it is useful in discussions of
Web technology and may be useful in constructing specifications for
facilities built for use on the Web.
Constraint: URIs Identify a Single
Resource
Assign
distinct
URIs to distinct resources.
Since the scope of a URI is global,
the resource identified by a URI does not depend on the context in
which the URI appears (see also the section about indirect
identification (§2.2.3)).
[URI]
is an agreement about how the Internet community allocates names
and associates them with the resources they identify. URIs are
divided into schemes (§2.4)
that define, via their scheme specification, the mechanism by which
scheme-specific identifiers are associated with resources.
For
example, the "http" URI scheme ([RFC2616]) uses DNS and TCP-based HTTP servers for the
purpose of identifier allocation and resolution. As a result,
identifiers such as "http://example.com/somepath#someFrag" often
take on meaning through the community experience of performing an
HTTP GET request on the identifier and, if given a successful
response, interpreting the response as a representation of the
identified resource. (See also Fragment Identifiers (§2.6).) Of course, a retrieval
action like GET is not the only way to obtain information about a
resource. One might also publish a document that purports to define
the meaning of a particular URI. These other sources of information
may suggest meanings for such identifiers, but it's a local policy
decision whether those suggestions should be heeded, whereas the result obtained through HTTP
GET is, by Internet-wide agreement, authoritative.
Just as one might wish to refer to a
person by different names (by full name, first name only, sports
nickname, romantic nickname, and so forth), Web architecture allows
the association of more than one URI with a resource. URIs that
identify the same resource are called URI aliases. The section on
URI aliases (§2.3.1)
discusses some of the potential costs of creating multiple URIs for
the same resource.
Several sections of this document
address questions about the relationship between URIs and
resources.
By design, a URI identifies one
resource. Using the same URI to directly identify different
resources produces a URI collision.
Collision often
imposes a cost in communication due to the effort required to
resolve ambiguities.
Suppose, for example, that one
organization makes use of a URI to refer to the movie The
Sting, and another organization uses the same URI to refer
to a discussion forum about The Sting. To
a third
party, aware of both organizations, this collision creates
confusion about what the URI identifies, undermining the value of
the URI.
If one wanted to talk about the creation date
of the
resource identified by the URI, for instance, it
would not be clear
whether this meant "when the movie was created" or "when
the
discussion forum about the movie was created."
Social and technical solutions have
been devised to help avoid URI collision. However, the success or
failure of these different
approaches depends on the extent to
which there is consensus in
the Internet community on abiding by
the defining specifications.
The section on URI allocation (§2.2.2)
examines approaches for establishing the authoritative source of
information about what resource a URI identifies.
URIs are sometimes used for indirect identification
(§2.2.3). This does not necessarily lead to collisions.
URI allocation is the
process of
associating a
URI with a resource. Allocation can be performed both
by resource owners and by other parties. It is important to avoid
URI collision
(§2.2.1).
URI ownership is a relation
between a URI and a social
entity, such as a person, organization,
or specification. URI ownership gives the relevant social entity
certain rights, including:
- to pass on ownership
of some or all owned URIs to another
owner (delegation); and
- to associate a resource with an owned URI (URI allocation).
By social convention, URI ownership
is delegated from the IANA URI scheme registry [IANASchemes], itself a social
entity, to IANA-registered URI scheme specifications. Some URI
scheme specifications further delegate ownership to
subordinate
registries or to other nominated
owners, who may further delegate
ownership. In the case
of a specification, ownership ultimately
lies with the community that maintains the specification.
The approach taken for the "http" URI
scheme, for example,
follows the pattern whereby the Internet
community delegates authority, via the
IANA URI scheme registry and
the DNS, over a set of
URIs with a common prefix to one particular
owner. One consequence of this approach is the Web's heavy reliance
on the central
DNS registry. A different approach is taken by the
URN
Syntax scheme [RFC2141]
which delegates ownership of portions of URN space
to URN Namespace
specifications
which themselves are registered in an
IANA-maintained registry of URN Namespace Identifiers.
URI owners are responsible for
avoiding the assignment of equivalent URIs to multiple resources.
Thus, if a URI scheme specification does provide for the delegation
of individual or
organized sets of URIs, it should take pains to
ensure that ownership ultimately resides in the hands of a
single
social entity. Allowing multiple owners increases the likelihood of
URI collisions.
URI owners may organize or deploy
infrastructure to ensure that representations of associated
resources are available and, where appropriate, interaction with
the resource is possible through the exchange of representations.
There are social expectations for responsible representation
management (§3.5) by URI owners. Additional social implications
of URI ownership are not discussed
here.
See TAG issue siteData-36, which concerns the expropriation of naming
authority.
Some schemes use techniques other
than delegated ownership to avoid collision. For example, the
specification for the data URL (sic) scheme [RFC2397] specifies that the resource identified by
a data scheme URI has only
one possible representation. The
representation data makes up the URI that identifies that resource.
Thus, the specification itself determines how data URIs are
allocated; no delegation is possible.
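To make the point about the data scheme concrete, the following Python sketch builds a data URI from some representation data and recovers the same data from the URI alone, without consulting any owner or server. The function names are invented for this illustration, and the parsing handles only the simple forms shown, not every case permitted by [RFC2397].

```python
import base64
from urllib.parse import unquote_to_bytes

# Default media type when a data URI omits one, per RFC 2397.
DEFAULT_TYPE = "text/plain;charset=US-ASCII"

def make_data_uri(data: bytes, media_type: str = "text/plain") -> str:
    """Embed representation data directly in a base64 data URI."""
    return f"data:{media_type};base64,{base64.b64encode(data).decode('ascii')}"

def read_data_uri(uri: str) -> tuple[str, bytes]:
    """Recover (media type, data) from a data URI; simple forms only."""
    header, sep, payload = uri.partition(",")
    if not header.startswith("data:") or not sep:
        raise ValueError("not a data URI")
    meta = header[len("data:"):]
    if meta.endswith(";base64"):
        return meta[:-len(";base64")] or DEFAULT_TYPE, base64.b64decode(payload)
    # Without ";base64" the payload is percent-encoded.
    return meta or DEFAULT_TYPE, unquote_to_bytes(payload)

uri = make_data_uri(b"Sunny in Oaxaca")
print(uri)
print(read_data_uri(uri))
```

Because the data is the identifier, two parties who build a data URI from the same bytes and media type in the same way arrive at the same URI, which is why no delegation step is needed.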
Other schemes (such as
"news:comp.text.xml") rely on a social process.
To say that the URI
"mailto:nadia@example.com" identifies both an Internet mailbox and
Nadia, the person, introduces a URI collision. However, we can use
the URI to indirectly identify Nadia. Identifiers are commonly used
in this way.
Listening to a news broadcast, one
might hear a report on Britain that begins, "Today, 10 Downing
Street announced a series of new economic measures." Generally, "10
Downing Street" identifies the official residence of Britain's
Prime Minister. In this context, the news reporter is using it (as
English rhetoric allows) to indirectly identify the British
government. Similarly, URIs identify resources, but they can also
be used in many constructs to indirectly identify other resources.
Globally adopted assignment policies make some URIs appealing as
general-purpose identifiers. Local policy establishes what they
indirectly identify.
For example, the URI "mailto:nadia@example.com" identifies an
Internet mailbox (as specified by the "mailto" URI scheme). Suppose
that
nadia@example.com is Nadia's email address. The
organizers of a conference Nadia attends might use
"mailto:nadia@example.com" to refer indirectly to her (e.g., by
using the URI as a database key in their database of conference
participants). This does not introduce a URI collision.
URIs that are identical,
character-by-character, refer to the same resource. Since Web
Architecture allows the association of multiple URIs with a given
resource, two URIs that are not character-by-character identical
may still refer to the same resource. Different URIs do not
necessarily refer to different resources but there is generally a
higher computational cost to determine that
different URIs refer to
the same resource.
To reduce the risk of a false
negative (i.e., an incorrect conclusion that two URIs do not refer
to the same resource) or a false positive (i.e., an incorrect
conclusion that two URIs do refer to the same resource), some
specifications describe equivalence tests in addition to character-by-character comparison. For example,
for "http" URIs, the authority component (the part after "//" and
before the next "/") is defined to be case-insensitive. Thus, the
"http" URI specification allows agents to conclude that
authority components in two "http" URIs are equivalent
when those strings are identical character-by-character or differ only by
case. Agents that reach conclusions
based on comparisons that are not licensed by the relevant
specifications take responsibility for any problems that result;
see the section on error handling (§5.3)
for more information
about
responsible behavior when reaching unlicensed conclusions.
Section
6 of [URI] provides
more
information about comparing URIs and reducing the risk of false
negatives and positives.
See also the section on approaches other than string comparison
that allow agents to assert that two URIs identify
the same
resource (§2.7.2).
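The difference between character-by-character comparison and a scheme-licensed equivalence test can be sketched in Python as follows. The function names are illustrative only, and the test is deliberately simplified: it lowercases the whole authority component, as the example in the text describes, and applies none of the other normalizations discussed in section 6 of [URI].

```python
from urllib.parse import urlsplit

def identical(a: str, b: str) -> bool:
    """Character-by-character comparison: never yields a false positive."""
    return a == b

def http_equivalent(a: str, b: str) -> bool:
    """Simplified equivalence test for "http" URIs.

    Only the authority component is compared case-insensitively;
    path, query, and fragment must still match exactly.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    if pa.scheme != "http" or pb.scheme != "http":
        # No licensed rule known for other schemes: fall back to strings.
        return identical(a, b)
    return (pa.netloc.lower() == pb.netloc.lower()
            and (pa.path, pa.query, pa.fragment)
            == (pb.path, pb.query, pb.fragment))

a = "http://Weather.Example.com/oaxaca"
b = "http://weather.example.com/oaxaca"
print(identical(a, b), http_equivalent(a, b))
```

An agent that went further, for instance by also ignoring case in the path, would be drawing a conclusion the specification does not license.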
Although there are benefits (such as
naming flexibility) to URI aliases, there are also costs.
URI
aliases are harmful when they divide the Web of related resources.
A corollary of Metcalfe's Principle (the "network effect") is that
the value of a given resource can be measured by the number and
value of other resources
in its network neighborhood, that is, the resources that link to it.
This type of valuation is commonly used to assess the relative
value of search results, because people tend to
create links relating a given topic to those resources that
they feel best reflect that topic; hence the number of
inbound references reflects the extent to which
the community values a resource.
The problem with aliases is that if
half of the neighborhood points to one URI for a given
resource,
and the other half points to a second, different URI for that same
resource, the neighborhood is divided. Not only is the aliased
resource undervalued because of this split, the entire neighborhood
of resources loses value because of the missing second-order
relationships that should have existed among the referring
resources by virtue of their references to the aliased
resource.
Good
practice:
Avoiding URI aliases
A URI owner SHOULD
NOT associate arbitrarily different URIs with the same
resource.
URI consumers also have a role in
ensuring URI consistency. For instance, when transcribing a URI,
agents should not gratuitously percent-encode characters. The term
"character" refers to URI characters as defined in section 2 of
[URI]; percent-encoding is
discussed in section 2.1 of that specification.
Good
practice:
Consistent URI usage
An agent that
receives a URI SHOULD refer to the associated resource using
the
same URI, character-by-character.
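The effect of gratuitous percent-encoding can be seen in a small Python sketch. The second URI below is a hypothetical transcription in which the letter "o" has been needlessly encoded as "%6F"; the two strings no longer match character-by-character, so an agent comparing them as strings would treat them as different identifiers.

```python
from urllib.parse import unquote

original = "http://weather.example.com/oaxaca"
# Hypothetical careless transcription: "o" percent-encoded without need.
transcribed = "http://weather.example.com/%6Faxaca"

# Plain string comparison, the only comparison every agent can rely on.
print(original == transcribed)

# Decoding shows the two strings were meant to be the same URI. Decoding
# a whole URI like this is for illustration only; it is not a safe
# general-purpose normalization, because reserved characters change
# meaning when decoded.
print(unquote(transcribed) == original)
```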
When a URI alias does become common
currency, the URI owner
should use protocol techniques such as server-side redirects to
relate the two resources. The community benefits when the URI owner
supports redirection of an aliased URI to the corresponding
"official" URI. For more information on redirection, see
section 10.3, Redirection, in [RFC2616]. See also [CHIPS] for a discussion of some best practices for
server administrators.
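A server-side redirect from an alias to the "official" URI might be sketched as follows, using Python's standard http.server module. The alias path "/weather/oaxaca", the table ALIASES, and the names official_uri and AliasRedirectHandler are assumptions made for illustration; a production server would normally configure redirects in the server software itself.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical table mapping alias paths to the "official" URI.
ALIASES = {
    "/weather/oaxaca": "http://weather.example.com/oaxaca",
}


def official_uri(path):
    """Return the official URI for an alias path, or None if not an alias."""
    return ALIASES.get(path)


class AliasRedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for alias paths with a permanent redirect."""

    def do_GET(self):
        target = official_uri(self.path)
        if target is None:
            # This sketch serves nothing except redirects.
            self.send_error(404)
            return
        self.send_response(301)  # Moved Permanently
        self.send_header("Location", target)
        self.end_headers()

# To try it locally (assumed host and port):
#   from http.server import HTTPServer
#   HTTPServer(("localhost", 8000), AliasRedirectHandler).serve_forever()
```

Agents that follow the redirect learn the official URI and can use it in later references, which helps the neighborhood converge on a single identifier.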
URI aliasing only occurs when more
than one URI is used to identify the
same resource. The fact that
different resources sometimes have the same representation does not
make the URIs for those resources aliases.
Story
Dirk would like to add a link from
his Web site to the Oaxaca weather site. He uses the
URI
http://weather.example.com/oaxaca and labels his link “weather
in
Oaxaca on 1 August 2004”. Nadia points out to Dirk that
he is setting misleading expectations for the URI he has
used. The
Oaxaca weather site policy is that the URI in question identifies
the current weather in Oaxaca (on any given day) and not
the weather
on 1 August. Of course, on the first of August in 2004, Dirk's link
will be correct, but the rest of the time he will be misleading
visitors to his Web site. Nadia points out to Dirk that the weather
site does make available a different URI permanently assigned to a
resource describing the weather on 1 August 2004.
In this story, there are two
resources: “the current weather in Oaxaca” and “the weather in
Oaxaca on 1 August 2004”. The Oaxaca weather site assigns
two URIs to these two different resources. On
1 August 2004, the representations for these resources
are identical. The fact that dereferencing two different URIs
produces identical representations does not imply that the two URIs
are aliases.
In the URI
"http://weather.example.com/", the "http" that appears before the
colon (":") names a URI scheme. Each URI scheme has a specification
that explains the scheme-specific
details of how scheme identifiers
are allocated and become associated with a resource. The URI syntax
is thus a federated and extensible naming system wherein each
scheme's specification may further restrict the syntax and
semantics of identifiers within that scheme.
Examples of URIs from various schemes
include:
- mailto:joe@example.org
- ftp://example.org/aDirectory/aFile
- news:comp.infosystems.www
- tel:+1-816-555-1212
- ldap://ldap.example.org/c=GB?objectClass?one
- urn:oasis:names:tc:entity:xmlns:xml:catalog
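The scheme name is the part of each URI before the first colon. A minimal Python sketch, using the standard urllib.parse module and a hypothetical helper named scheme_of, extracts it from the examples above:

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme name of a URI (hypothetical helper)."""
    return urlsplit(uri).scheme


examples = [
    "mailto:joe@example.org",
    "ftp://example.org/aDirectory/aFile",
    "news:comp.infosystems.www",
    "tel:+1-816-555-1212",
    "ldap://ldap.example.org/c=GB?objectClass?one",
    "urn:oasis:names:tc:entity:xmlns:xml:catalog",
]

for uri in examples:
    # Everything after the colon is interpreted according to the
    # specification of the scheme named here.
    print(scheme_of(uri), "<-", uri)
```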
While Web architecture allows the
definition of new schemes, introducing a new scheme is costly. Many
aspects of URI processing are scheme-dependent, and a large amount
of deployed software already processes URIs of well-known schemes.
Introducing a new URI scheme requires the development and
deployment not only of client software to handle the scheme, but
also of ancillary agents such as gateways, proxies, and caches. See
[RFC2718] for other
considerations and costs related to URI scheme design.
Because of these costs, if a URI
scheme exists that meets the needs of an application, designers
should use it rather than invent one.
Good
practice: Reuse URI schemes
A specification
SHOULD reuse an existing URI scheme (rather than create a new one)
when it provides the desired properties of identifiers and
their
relation to resources.
Consider our travel scenario: should the agent providing
information about the weather in Oaxaca register a new URI scheme
"weather" for the identification of resources related to the
weather? They might then publish URIs such as
"weather://travel.example.com/oaxaca". When a software agent
dereferences such a URI, if what really happens is that HTTP GET is
invoked to retrieve a representation of the resource, then an
"http" URI would have sufficed.
If the motivation behind registering a new scheme is to allow a
software agent to launch a particular application when retrieving a
representation, such dispatching can be accomplished at lower expense
via Internet media types. When designing a new data format, the
appropriate mechanism to promote its deployment on the Web is the
Internet media type. Media types also provide a means for building new information space
applications, described below (§4.6).
Note that even if an agent cannot process representation data in an
unknown format, it can at least retrieve it. The data may contain
enough information to allow a user or user agent to make some use
of it. When an agent does not handle a new URI scheme, it cannot
retrieve a representation.
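Dispatching on the Internet media type rather than on a new URI scheme can be sketched in Python as follows. The handler functions, the HANDLERS table, and the fallback behavior are hypothetical; the point is only that the agent chooses an application after retrieval, based on the declared type of the representation.

```python
def render_html(data: bytes) -> str:
    """Hypothetical handler for HTML representations."""
    return f"HTML renderer: {len(data)} octets"


def render_jpeg(data: bytes) -> str:
    """Hypothetical handler for JPEG representations."""
    return f"image viewer: {len(data)} octets"


# Hypothetical table mapping Internet media types to applications.
HANDLERS = {
    "text/html": render_html,
    "image/jpeg": render_jpeg,
}


def dispatch(content_type: str, data: bytes) -> str:
    """Choose a handler from the Content-Type value, not from the URI."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        # Unknown format: the data was still retrieved and can be kept.
        return f"saved {len(data)} octets of unhandled type {media_type}"
    return handler(data)


print(dispatch("text/html; charset=utf-8", b"<p>hi</p>"))
```

Even in the fallback branch the agent holds the retrieved data, which is the contrast with an unknown URI scheme, where no retrieval is possible at all.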
The Internet Assigned Numbers
Authority (IANA) maintains a registry [IANASchemes] of mappings between
URI scheme names and scheme specifications. For instance, the IANA
registry indicates that the "http" scheme is defined in [RFC2616]. The process for registering a
new URI scheme is defined in [RFC2717].
Unregistered URI schemes SHOULD NOT
be used for a number of reasons:
- There is no generally accepted way to locate the scheme
specification.
- Someone else may be using the scheme for other purposes.
- One should not expect that general-purpose software will do
anything useful with URIs of this scheme beyond URI
comparison.
One misguided motivation for
registering a new URI scheme is to allow a software agent to launch
a particular application when retrieving a representation. The same
thing can be accomplished at lower expense by dispatching instead
on the type of the representation, thereby allowing use of existing
transfer protocols and implementations.
Even if an agent cannot process
representation data in an unknown format, it can at least retrieve
it. The data may contain enough information to allow a user or user
agent to make some use of it. When an agent does not handle a new
URI scheme, it cannot retrieve a representation.
When designing a new data format, the
preferred mechanism to promote its deployment on the Web is the
Internet media type (see Representation Types and Internet Media Types (§3.2)).
Media types also provide a means for building new information
applications, as described in future directions for data formats (§4.6).
It is tempting to guess the nature of
a resource by inspection of a URI that identifies it. However, the
Web is designed so that agents communicate resource information
state through representations, not identifiers. In general, one cannot
determine the Internet media type of
representations of a resource by inspecting a URI
for that resource. For example, the ".html" at the end of
"http://example.com/page.html" provides no guarantee that
representations of the identified resource will be served with the
Internet media type "text/html". The publisher is free to allocate
identifiers and define how they are served. The HTTP protocol does
not constrain the Internet media type based on the path component
of the URI; the URI owner is free to configure the server to return
a representation using PNG or any other data format.
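A short Python sketch makes the point concrete: the media type is read from the metadata that accompanies the representation, never from the spelling of the URI. The helper name declared_media_type and the sample header value are assumptions; the standard email.message module is used here only as a convenient parser for a Content-Type value.

```python
from email.message import Message


def declared_media_type(content_type_header: str) -> str:
    """Return the media type declared by a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() returns "maintype/subtype" in lowercase and
    # ignores parameters such as charset.
    return msg.get_content_type()


uri = "http://example.com/page.html"
# Hypothetical response metadata: the URI owner has chosen to serve PNG.
header = "image/png"

# The ".html" suffix of the URI plays no part in the answer.
print(uri, "->", declared_media_type(header))
```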
Resource state may evolve over time.
Requiring a URI owner to publish a new URI for each change in
resource state would lead to a significant number of broken
references. For robustness, Web architecture promotes
independence
between an identifier and the state of the identified resource.
Good
practice: URI opacity
Agents making use
of URIs SHOULD NOT attempt to infer properties of the referenced
resource.
In practice, a small number of
inferences can be made because they are explicitly licensed by the
relevant specifications. Some of these inferences are discussed in
the details of
retrieving a representation (§3.1.1).
The example URI used in the
travel scenario
("http://weather.example.com/oaxaca") suggests to a human reader
that the identified resource has something to do with the weather
in Oaxaca. A site reporting the weather in Oaxaca could just as
easily be identified by the URI "http://vjc.example.com/315". And
the URI "http://weather.example.com/vancouver" might identify the
resource "my photo album."
On the other hand, the URI
"mailto:joe@example.com" indicates that the URI refers to a
mailbox. The "mailto" URI scheme specification authorizes agents to
infer that URIs of this form identify Internet mailboxes.
Some URI assignment authorities
document and publish their URI assignment policies. For more
information about URI opacity, see TAG issues metaDataInURI-31 and siteData-36.
Story
When browsing the XHTML document
that Nadia receives as a representation of the resource identified
by "http://weather.example.com/oaxaca", she finds that the
URI
"http://weather.example.com/oaxaca#weekend" refers to the part of
the representation that conveys information about the weekend
outlook. This URI includes the fragment identifier "weekend" (the
string after the "#").
The fragment identifier component of a URI
allows indirect identification of a secondary
resource by reference to a primary resource and
additional identifying information. The secondary resource may be
some portion or subset of the primary resource, some view on
representations of the primary resource, or some other resource
defined or described by those representations. The terms "primary
resource" and "secondary resource" are defined in section 3.5 of
[URI].
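Separating a URI into the part that identifies the primary resource and its fragment identifier can be sketched with Python's standard urllib.parse.urldefrag function. The variable names are illustrative only.

```python
from urllib.parse import urldefrag

uri = "http://weather.example.com/oaxaca#weekend"

# urldefrag splits at the first "#": the part before it identifies the
# primary resource, the part after it is the fragment identifier.
primary_uri, fragment = urldefrag(uri)

print(primary_uri)
print(fragment)
```

Only the primary URI is used when retrieving a representation; the fragment identifier is interpreted afterwards, against the representation that comes back.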
The terms “primary” and “secondary”
in this context do not limit the nature of the resource; they are
not classes. In this context, primary and secondary simply indicate
that there is a relationship between the resources for the purposes
of one URI: the URI with a fragment identifier. Any resource can be
identified as a secondary resource. It might also be identified
using a URI without a fragment identifier, and a resource may be
identified as a secondary resource via multiple URIs. The purpose
of these terms is to enable discussion of the relationship between
such resources, not to limit the nature of a resource.
The interpretation of fragment
identifiers is discussed in the section on media types and fragment
identifier semantics (§3.2.1).
See TAG issue abstractComponentRefs-37, which concerns the use
of fragment identifiers with namespace names to identify abstract
components.
There remain open questions
regarding identifiers on the Web.
The following sections identify a few areas of future work in the
Web community.
The integration of
internationalized identifiers (i.e., composed of characters beyond
those allowed by [URI]) into the
Web architecture is an important and open issue. See TAG issue
IRIEverywhere-27 for discussion about work going
on in this area.
Emerging Semantic Web technologies,
including the "Web Ontology Language (OWL)" [OWL10], define RDF properties such as
sameAs to assert that two URIs identify the same
resource or inverseFunctionalProperty to imply it.
Communication between agents over a
network about resources involves URIs, messages, and data. The
Web's protocols (including HTTP, FTP, SOAP, NNTP, and SMTP) are
based on the exchange of messages. A message may include data as well as
metadata about a resource (such as the "Alternates" and "Vary" HTTP
headers), the message data, and the message itself (such as the
"Transfer-encoding" HTTP header). A message may even include
metadata about the message metadata (for message-integrity checks,
for instance). Two important
classes of message are those that request a representation of an Information Resource, and those that
return the result of such a request.
Story
Nadia follows a hypertext link
labeled "satellite image" expecting to retrieve a satellite photo
of the Oaxaca region. The link to the satellite image is an XHTML
link encoded as <a
href="http://example.com/satimage/oaxaca">satellite
image</a>. Nadia's browser analyzes the URI and
determines that its scheme
is "http". The browser configuration determines how it locates the
identified information, which might be via a cache of prior
retrieval actions, by contacting an intermediary (such as a proxy
server), or by direct access to the server identified by a portion
of the URI. In this example, the browser opens a network connection
to port 80 on the server at "example.com" and sends a "GET" message
as specified by the HTTP protocol, requesting a representation of
the resource.
The server sends a response message
to the browser, once again according to the HTTP protocol. The
message consists of several headers and a JPEG image. The browser
reads the headers, learns from the "Content-Type" field that the
Internet media type of the representation is "image/jpeg", reads
the sequence of octets that make up the representation data, and
renders the image.
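The "GET" message in this story can be sketched as plain text. The following Python function, with the hypothetical name build_get_request, only constructs the octets a browser might send to port 80 on "example.com"; it does not open a connection, and the "Connection: close" header is an assumption added to keep the exchange simple.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request message for an "http" URI."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",   # request line: method, target, version
        f"Host: {parts.netloc}",    # required by HTTP/1.1
        "Connection: close",        # assumption: one request per connection
        "",                         # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("http://example.com/satimage/oaxaca").decode("ascii"))
```

The server's response has the same overall shape: a status line, header fields such as "Content-Type: image/jpeg", a blank line, and then the representation data.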
This section describes the
architectural principles and constraints regarding interactions
between agents, including such topics as network protocols and
interaction styles, along with interactions between the Web as a
system and the people that make use of it. The fact that the Web is
a highly distributed system affects architectural constraints and
assumptions about interactions.
3.1. Information
Resources and Representations
The term
Information Resource refers to
resources that convey information. Any resource that has a representation
is an information resource.
A representation
consists
logically of two parts: data (expressed in one or more formats used separately or in combination) and
metadata (such as the Internet media
type of the data).
The Information Resource provides the foundation for the familiar
hypertext Web, where agents use representations to modify as well as
retrieve information state. Much of this document describes
architecture specific to Information Resources. For instance, the
techniques of caching and content negotiation,
and the social processes of publishing, apply to Information
Resources.
Agents may use a URI to access the
referenced resource; this is called dereferencing the URI. Access may
take many forms, including retrieving a representation of the
resource (for instance, by using HTTP GET or HEAD), adding or
modifying a representation of the resource (for instance, by using
HTTP POST or PUT, which in some cases may change the actual state
of the resource if the submitted representations are interpreted as
instructions to that end), and deleting some or all representations
of the resource (for instance, by using HTTP DELETE, which in some
cases may result in the deletion of the resource itself).
There may be more than one way to
access a resource for a given URI; application context determines
which access method an agent uses. For instance, a browser might
use HTTP GET to retrieve a representation of a resource, whereas a
hypertext link checker might use HTTP HEAD on the same URI simply
to establish whether a representation is available. Some URI
schemes set expectations about available access methods, others
(such as the URN scheme [RFC
2141]) do not. Section 1.2.2 of [URI] discusses the separation of identification and
interaction in more detail. For more information about
relationships between multiple access methods and URI
addressability, see the TAG finding "URIs, Addressability, and the use of HTTP GET and
POST".
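The browser and link-checker cases can be contrasted in a short Python sketch using the standard urllib.request.Request class. The requests are only constructed here, not sent, and the variable names are illustrative.

```python
from urllib.request import Request

uri = "http://weather.example.com/oaxaca"

# A browser wants the representation itself: GET is the default method
# when a Request carries no data.
browser_request = Request(uri)

# A link checker only needs to know whether a representation is
# available, so it asks for the headers alone with HEAD.
checker_request = Request(uri, method="HEAD")

print(browser_request.get_method(), browser_request.full_url)
print(checker_request.get_method(), checker_request.full_url)
```

Both requests carry the same URI; only the access method differs, which is the separation of identification from interaction that the paragraph above describes.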
Although many URI schemes (§2.4) are named after protocols, this
does not imply that use of such a URI will necessarily result in
access to the resource via the named protocol. Even when an agent
uses a URI to retrieve a representation, that access might be
through gateways, proxies, caches, and name resolution services
that are independent of the protocol associated with the scheme
name.
Many URI schemes define a default
interaction protocol for attempting access to the identified
resource. That interaction protocol is often the basis for
allocating identifiers within that scheme, just as "http" URIs are
defined in terms of TCP-based HTTP servers. However, this does not
imply that all interaction with such resources is limited to the
default interaction protocol. For example, information retrieval
systems often make use of proxies to interact with a multitude of
URI schemes, such as HTTP proxies being used to access "ftp" and
"wais" resources. Proxies can also provide enhanced services,
such as annotation proxies that combine normal information
retrieval with additional metadata retrieval to provide a seamless,
multidimensional view of resources using the same protocols and
user agents as the non-annotated Web. Likewise, future protocols
may be defined that encompass our current systems, using entirely
different interaction mechanisms, without changing the existing
identifier schemes. See also the principle of orthogonal specifications (§5.1).
Dereferencing a URI generally
involves a succession of steps as described in multiple
specifications and implemented by the agent. The following example
illustrates the series of specifications that governs the process
when a user agent is instructed to follow a hypertext link (§4.4) that is part of an SVG
document. In this example, the URI is
"http://weather.example.com/oaxaca" and the application context
calls for the user agent to retrieve and render a representation of
the identified resource.
- Since the URI is part of a hypertext link in an SVG document,
the first relevant specification is the SVG 1.1 Recommendation
[SVG11]. Section 17.1 of
this specification imports the link semantics defined in XLink 1.0
[XLink10]: "The remote
resource
(the destination for the link) is defined by a URI specified by the
XLink
href attribute on the 'a'
element."
The SVG specification goes on to state that interpretation of an
a element involves retrieving a representation of a
resource, identified by the href attribute in the
XLink namespace: "By activating these links (by clicking with the
mouse, through keyboard input, voice commands, etc.), users
may
visit these resources."
- The XLink 1.0 [XLink10]
specification, which defines the
href attribute in
section 5.4, states that "The value of the href attribute must be a
URI reference as defined in [IETF RFC 2396], or must result in a
URI reference after the escaping procedure described below is
applied."
- The URI specification [URI]
states that "Each URI begins with a scheme name that refers to a
specification for assigning identifiers within that scheme." The
URI scheme name in this example is "http".
- [IANASchemes] states
that the "http" scheme is defined by the HTTP/1.1 specification
(RFC 2616 [RFC2616],
section
3.2.2).
- In this SVG context, the agent constructs an HTTP GET
request
(per section 9.3 of [RFC2616])
to retrieve the representation.
- Section 6 of [RFC2616]
defines how the server constructs a corresponding response message,
including the 'Content-Type' field.
- Section 1.4 of [RFC2616]
states "HTTP communication usually takes place over TCP/IP
connections." This example does not address that step in the
process or other steps such as Domain Name System
(DNS) resolution.
- The agent interprets the returned representation according to
the data format specification that corresponds to
the
representation's Internet Media Type (§3.2) (the value of the HTTP
'Content-Type') in the relevant IANA registry [MEDIATYPEREG].
Precisely which representation(s)
are retrieved depends on a number of factors, including:
- Whether the URI owner makes available any representations at
all;
- Whether the agent making the request has access privileges for
those representations (see the section on linking and access control (§3.5.2));
- If the URI owner has provided more than one representation (in
different formats such as HTML, PNG, or RDF; in different languages
such as English and Spanish; or transformed dynamically according
to the hardware or software capabilities of the recipient), the
resulting representation may depend on negotiation between the user
agent and server.
- The time of the request; the world changes over time, and so
representations of resources are also likely to change over
time.
Assuming that a representation has
been successfully retrieved, the expressive power of the
representation's format will affect how precisely the
representation provider communicates resource state. If the
representation communicates the state of the resource inaccurately,
this inaccuracy or ambiguity may lead to confusion about what the
resource is. If different users reach different conclusions about
what the resource is, this may lead to UR