Copyright © 2002 W3C ® ( MIT , INRIA , Keio ), All Rights Reserved. W3C liability , trademark , document use , and software licensing rules apply.
The World Wide Web is a networked information system. Web Architecture consists of the requirements, constraints, principles, and design choices that influence the design of the system and the behavior of agents within the system. When followed, the large-scale effect is that of a shared information space. This document organizes the technical discussion of the system in three parts: identification, representation, and interaction. This document also addresses some non-technical (social) issues that contribute to the shared information space.
This document strives to establish a reference set of requirements, constraints, principles, and design choices for Web architecture.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This draft incorporates suggestions from Roy Fielding and others who have sent comments to www-tag. It is intended for review by the TAG and does not represent consensus within the TAG. This document has been developed by W3C's Technical Architecture Group (TAG) ( charter ). A list of changes in this document is available.
This draft remains incomplete; sections 1 and 2 are the most developed, 3 and 4 the least. The TAG has published a number of findings that address specific architecture issues. Parts of those findings may appear in subsequent drafts. Please also consult the list of issues under consideration by the TAG.
This draft includes some editorial notes and also references to open TAG issues . These do not represent all open issues in the document. They are expected to disappear from future drafts.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress."
The latest information regarding patent disclosures related to this document is available on the Web. As of this publication, there are no disclosures.
Please send comments on this document to the public W3C TAG mailing list www-tag@w3.org ( archive ).
A list of current W3C Recommendations and other technical documents can be found at the W3C Web site.
The World Wide Web (or, Web) is a networked information system consisting of agents (programs acting on behalf of another person, entity, or process) that exchange information.
This document organizes Web architecture into:
The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in accordance with RFC 2119 [ RFC2119 ].
The intended audience for this document includes:
The authors have made every effort to keep this document terse, with the expectation that additional documents will elaborate on the required properties, constraints, and principles, and will provide rationale and examples.
Readers will benefit from familiarity with the Requests for Comments ( RFC ) series from the IETF , some of which define pieces of the architecture discussed in this document.
This document focuses on the architecture of the Web; for instance, the principles enumerated in this document are those closely related to the Web. General design principles, such as minimal constraint (fewer rules make the system more flexible), modularity, minimum redundancy, extensibility, simplicity, and robustness, are not discussed in detail.
Other groups within W3C are addressing architectural design goals in the following areas:
For information about architectural principles of the Internet, refer to [ RFC1958 ].
In the design of the Web, some design decisions, like the names of the <p> and <li> elements in HTML, or the choice of the colon character in URIs, are somewhat arbitrary; if <par>, <elt>, or * had been chosen instead, the large-scale result would, most likely, have been the same. Other design choices are more fundamental; these are the focus of this document.
The terms used in the following list are elaborated on in the document.
Some of the items in the above list may conflict with current practice, and so education and outreach will be required to improve on that practice. Other items may fill in gaps in published specifications or may call attention to known weaknesses in those specifications.
The architecture described in this document is the result of experience. There has been some theoretical and modeling work in the area of Web Architecture, notably Roy Fielding's work on "Representational State Transfer" [ REST ].
The Web is a universe of resources. A resource is defined by [ RFC2396 ] to be anything that has identity. Examples include documents, files, menu items, machines, and services, as well as people, organizations, and concepts. Web architecture starts with a uniform syntax for resource identifiers, so that we can refer to resources, access them, describe them, and share them. The Uniform Resource Identifier (URI) syntax employs an extensible set of URI schemes. Several URI schemes incorporate identification mechanisms that pre-date the Web into this (generic URI) syntax:
mailto:nobody@example.org
ftp://example.org/aDirectory/aFile
news:comp.infosystems.www
tel:+1-816-555-1212
Other URI schemes have been introduced since the advent of the Web, including those introduced as a consequence of new protocols. Examples of URIs for these schemes include:
http://www.example.org/something?with=arg1;and=arg2
ldap://ldap.itd.umich.edu/c=GB?objectClass?one
urn:oasis:SAML:1.0
One can append a fragment identifier to a URI to yield an identifier for part of, or a view of, a resource 2 . The following URIs include fragment identifiers:
ftp://example.org/aDirectory/aDocument#section1
http://www.example.org/states#texas
Note that while this composition is syntactically fully general, it is meaningless in some URI schemes. The URI mailto:nobody@example.org#abc is meaningless in practice.
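Appending or stripping a fragment identifier is a purely syntactic operation, which is why it succeeds even where the result is meaningless. A minimal sketch, assuming Python 3.9+ and the standard-library urllib.parse.urldefrag; the helper name split_fragment is hypothetical:

```python
from urllib.parse import urldefrag


def split_fragment(uri: str) -> tuple[str, str]:
    """Split a URI into (URI without fragment, fragment identifier).

    The split is syntactic only: it also "works" for mailto URIs,
    where the fragment has no useful meaning.
    """
    result = urldefrag(uri)
    return result.url, result.fragment


for example in [
    "ftp://example.org/aDirectory/aDocument#section1",
    "http://www.example.org/states#texas",
    "mailto:nobody@example.org#abc",
]:
    print(split_fragment(example))
```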
A generic syntax for URIs is defined by [ RFC2396 ]. The current document uses the term "URI" to mean, in RFC2396 terms, an absolute URI reference 3 optionally followed by a fragment identifier . The TAG is working actively to convince the IETF to revise RFC2396 so that the definition of "URI" aligns with the current document.
When one resource refers to another via a URI, a link is formed. When many resources are linked this way, the large-scale effect is a shared information space, where resources are identifiable by URI. The value of the Web increases with the number of resources identified by URI; this is due to the "network effect." In turn, resources are more valuable when they are identifiable on the Web. Hence:
Constraint
Use URIs: All important resources SHOULD be identified by a URI. 4
There are many benefits to making resources identifiable by URI. Some are by design (e.g., linking and bookmarking), while others have arisen naturally (e.g., global search services). See the TAG finding "URIs, Addressability, and the use of HTTP GET" for some details about how this principle applies to HTTP application design.
The two primary operations on URIs are:
There may be applications (e.g., XML namespace names [ XMLNS ]) where comparison is expected to be the sole or primary operation on a URI. Certain URI schemes provide rules for determining the syntactic equivalence of URIs, i.e., whether two URIs are different spellings of the same identifier. These rules vary from scheme to scheme.
For example, URNs begin with two colon-delimited fields, the first of which is the string "urn" and the second of which is the "namespace identifier" (NID); in urn:ietf:example, the NID is "ietf". In URNs, these two fields are to be compared in a case-insensitive fashion. The remainder of the URN following the second colon is subject to rules dependent on the content of the second field (following the first colon); thus the equivalence rules may vary from one URN namespace identifier to another.
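The URN rule above can be sketched as follows. As a simplification, this assumes the namespace-specific string (everything after the second colon) is compared exactly; a given namespace may define looser rules. The function names are hypothetical.

```python
def urn_parts(urn: str) -> tuple:
    """Split a URN into ("urn", NID, NSS), normalizing the case-insensitive fields.

    Raises ValueError if the string has fewer than two colons or does not
    start with "urn".
    """
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    # "urn" and the NID are case-insensitive; the NSS is kept as-is
    # (assumption: exact comparison of the namespace-specific string).
    return ("urn", nid.lower(), nss)


def urn_equivalent(a: str, b: str) -> bool:
    return urn_parts(a) == urn_parts(b)


print(urn_equivalent("urn:ietf:example", "URN:IETF:example"))
print(urn_equivalent("urn:ietf:example", "urn:ietf:EXAMPLE"))
```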
Section 3.2.3 of the HTTP specification [ RFC2616 ] states that, when comparing two HTTP URIs, the host name part must be considered case-insensitive, so http://WWW.EXAMPLE/ and http://www.example/ identify the same resource.
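A sketch of the HTTP comparison rule, assuming Python's urllib.parse.urlsplit. Besides the host name, section 3.2.3 of [ RFC2616 ] also makes the scheme case-insensitive, treats an empty port as the default port 80, and treats an empty path as "/"; those rules are included here, while percent-encoding equivalence is deliberately left out. The path stays case-sensitive, in line with the "URI case" good practice. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def http_uris_equivalent(a: str, b: str) -> bool:
    """Compare two HTTP URIs using a subset of RFC 2616 section 3.2.3."""

    def normalize(uri: str) -> tuple:
        parts = urlsplit(uri)  # urlsplit lower-cases the scheme
        if parts.scheme != "http":
            raise ValueError(f"not an HTTP URI: {uri!r}")
        host = parts.hostname or ""  # .hostname is lower-cased
        port = parts.port or 80      # empty port means the default port
        path = parts.path or "/"     # empty path is equivalent to "/"
        # Path, query, and fragment are compared case-sensitively.
        return ("http", host, port, path, parts.query, parts.fragment)

    return normalize(a) == normalize(b)


print(http_uris_equivalent("http://WWW.EXAMPLE/", "http://www.example/"))
print(http_uris_equivalent("http://www.example/Page", "http://www.example/page"))
```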
Good practice
URI case: It SHOULD NOT be assumed that URIs which differ only in character case can be used interchangeably.
Note: Equivalence of URIs is not the same as consistent representations of a resource.
Issue : URIEquivalence-15 : When are two URI variants considered equivalent? See also issue IRIEverywhere-27 - Should W3C specifications start promoting IRIs?
To dereference a URI is to apply in succession a finite set of relevant specifications, beginning with the specification that governs the scheme of the URI.
A "representation" is a data object that represents or describes resource state, and is the vehicle for conveying the meaning of a resource. A resource is an abstraction for which there is a conceptual mapping to a (possibly empty) set of representations.
As an example of the application of specifications in succession, suppose the URI http://weather.yahoo.com/forecast/MXOA0069 is used within an <a> element of an SVG document. The sequence of specifications applied is:
a link involves retrieving a representation of a resource, identified by the XLink href attribute: "By activating these links (by clicking with the mouse, through keyboard input, and voice commands), users may visit these resources." This means that the GET method defined in HTTP/1.1 is used to retrieve the representation of the resource.
Representations, when transferred by a Web protocol , are often accompanied by metadata in the message (for example, HTTP headers). In particular, the value of the media type in the set of metadata is key to the correct interpretation of a resource representation, and governs the handling of fragment identifiers. See section 2 for more information about formats used to encode representations.
Depending on the protocol used, there may be several ways to dereference a URI. One of the most important operations for the Web is to retrieve a representation of a resource (such as with HTTP GET), that is, to retrieve a representation of the state of the resource. There are other ways to interact with a resource (such as with HTTP POST). Dereference mechanisms vary by URI scheme. For instance, the URN scheme [ RFC2141 ] does not specify a dereference procedure.
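The succession of specifications can be pictured as a dispatch on the URI scheme. The sketch below is illustrative only: it handles just the "http" scheme by building (not sending) an HTTP/1.1 GET request, and refuses "urn", for which no dereference procedure is specified. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def http_get_request(uri: str) -> str:
    """Build (but do not send) the HTTP/1.1 GET request for an http URI."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment identifier is never sent to the server: it is interpreted
    # by the client after retrieval, according to the representation's media type.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def dereference_request(uri: str) -> str:
    """Dispatch on the scheme, the first specification in the succession."""
    scheme = urlsplit(uri).scheme  # urlsplit lower-cases the scheme
    if scheme == "http":
        return http_get_request(uri)
    if scheme == "urn":
        raise NotImplementedError("the URN scheme specifies no dereference procedure")
    raise NotImplementedError(f"scheme {scheme!r} is not handled in this sketch")


print(dereference_request("http://weather.yahoo.com/forecast/MXOA0069"))
```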
Good practice
Resource descriptions: Owners of important resources SHOULD make available representations that describe the nature and purpose of those resources.
Issue : namespaceDocument-8 : What should a "namespace document" look like?
Principle
Safe retrieval: Agents do not incur obligations by retrieving a representation.
For instance, a user does not incur an obligation by following an HTML link that causes the user agent to retrieve a representation. Tools such as proxies and search engines can retrieve representations without user interaction; it would be harmful to the Web if such operations incurred obligations. See the TAG finding "URIs, Addressability, and the use of HTTP GET" for more information about safe retrieval.
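A crawler or proxy can honour this principle by restricting itself to the methods HTTP/1.1 describes as safe (GET and HEAD, section 9.1.1 of [ RFC2616 ]). A minimal sketch; the (method, URI) request format and the function names are assumptions:

```python
from typing import Iterable, List, Tuple

# Methods that HTTP/1.1 (RFC 2616, section 9.1.1) describes as safe.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def is_safe(method: str) -> bool:
    return method.upper() in SAFE_METHODS


def crawl_plan(requests: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (method, URI) pairs an automated agent may issue on its own."""
    return [(method, uri) for method, uri in requests if is_safe(method)]


plan = crawl_plan([
    ("GET", "http://www.example.org/states"),
    ("POST", "http://www.example.org/order"),  # could incur an obligation: dropped
])
print(plan)
```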
Issue : deepLinking-25 : What to say in defense of principle that deep linking is not an illegal act?
URIs represent a worldwide contract for who can create names and how the resources they designate take on meaning. In the case of HTTP URIs, for example, the agreement is that the authoritative meaning of the resource designated by the URI is established by retrieving a representation of the resource (per the HTTP specification [ RFC2616 ]) and then interpreting the representation according to the relevant specifications. The authoritative meaning of a resource is established by following specifications.
Representations of a resource may vary as a function of factors including time, the identity of the agent accessing the resource, data submitted to the resource when interacting with it, and changes external to the resource.
Consider again the URI http://weather.yahoo.com/forecast/MXOA0069. Representations of the designated resource (the weather in Oaxaca) depend on (at least) time, the user's expressed preference for Fahrenheit or Celsius, the identity of the user-agent software receiving the representation, and, presumably, the weather in Oaxaca.
Good practice
Consistent representations: It is confusing and costly when, for a given URI, representations vary in unpredictable ways.
For example, serving two images as equivalents through HTTP content negotiation, where one image represents a square and the other a circle, will undermine confidence in the URI used to retrieve those images.
A description of what a URI identifies should be unambiguous. For instance, saying that the URI http://www.example.com/moby identifies "Moby Dick" can lead to confusion because this might be interpreted as any one of the following very distinct resources: a particular printing of this work (say, by ISBN), the work itself in an abstract sense (for example, using RDF), the fictional white whale, a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalogue), the record in the library's electronic catalogue that contains the metadata about the work, or the Gutenberg project's online version.
Similarly, one should not use the same URI to refer to a person and to that person's mailbox. See issue httpRange-14.
There are thus strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called the persistence of the URI. Persistence is always a matter of policy and commitment on the part of authorities assigning URIs rather than a constraint imposed by technological means.
For example, each W3C technical report (e.g., "the SVG specification") is in fact a series of documents that mature over time (from Working Draft through Candidate Recommendation and Proposed Recommendation to Recommendation). W3C assigns a URI to the "latest version" in the series (e.g., http://www.w3.org/TR/SVG). W3C also assigns a URI to each specification in the series (called the "this version" URI, e.g., http://www.w3.org/TR/2001/PR-SVG-20010719/). W3C policy is that representations of the "latest version" resource will change over time (with each new publication of an SVG specification). W3C policy is also that representations of a specification designated by a "this version" identifier will not change over time, to the best of W3C's ability to maintain its archives intact.
HTTP [ RFC2616 ] has been designed to promote consistency. For example, HTTP redirection (via some of the 3xx response codes) permits a server to tell a client that further action is needed to fulfill the request (e.g., the resource has been assigned a new URI). Content negotiation also promotes consistency: a site manager need not define a new URI for each new format that is supported, as would be the case with protocols that do not support content negotiation, such as FTP.
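Server-driven content negotiation is what lets one URI offer several formats. The following is a simplified sketch, not a complete implementation of the HTTP Accept header: it honours q-values and the rule that the most specific matching media range wins, but ignores media-type parameters other than q and the other Accept-* headers. The variant lists and function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs; other parameters are ignored."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_range:
            prefs.append((media_range, q))
    return prefs


def specificity(media_range: str) -> int:
    """*/* is least specific, then type/*, then type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def quality(media_type: str, prefs: List[Tuple[str, float]]) -> float:
    """q-value of the most specific matching range; 0.0 if nothing matches."""
    best_spec, best_q = -1, 0.0
    for media_range, q in prefs:
        if matches(media_range, media_type) and specificity(media_range) > best_spec:
            best_spec, best_q = specificity(media_range), q
    return best_q


def choose_variant(accept: str, variants: List[str]) -> Optional[str]:
    """Pick the variant with the highest q; None if no variant is acceptable."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for variant in variants:
        q = quality(variant, prefs)
        if q > best_q:
            best, best_q = variant, q
    return best


print(choose_variant("text/html;q=0.5, application/pdf", ["text/html", "application/pdf"]))
```

When choose_variant returns None, a server might answer 406 Not Acceptable; in every other case the client keeps using the same URI regardless of the format it receives.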
For more discussion about persistence, refer to [ Cool ]. 5
It is confusing and costly when people use the same URI to refer to different resources (i.e., when usage is inconsistent with the authoritative meaning of the resource). Suppose company A uses http://example.com/coolcompany to refer to CoolCompany's home page, while company B uses http://example.com/coolcompany to refer to CoolCompany itself. Company A then buys company B, but when the two try to merge their databases, they cannot, because of this inconsistent usage of the URI.
Good practice
Consistent URIs: Indiscriminate use of a URI undermines its value and interferes with people who rely on it.
One important characteristic of a URI is its scheme (the string that precedes the first colon in a URI). For example, the scheme of the URI http://www.example.com/ is "http", and for ftp://ftp.example.com/ it is "ftp". It is common to classify URIs by scheme, calling the two preceding examples respectively an "HTTP URI" and an "FTP URI".
Since many aspects of URI processing are scheme-dependent, and since a huge range of software is expected to be able to process URIs, the cost of introducing a new URI scheme is very high.
Good practice
New URI schemes: Authors of specifications SHOULD avoid introducing new URI schemes when existing schemes can be used to meet the goals of the specifications.
While "myscheme:blort" is a URI that satisfies the syntactic constraints of [ RFC2396 ], if "myscheme" is not registered, you are not guaranteed that somebody else isn't already using it for something else.
The IANA registry [ IANASchemes ] lists registered URI schemes and the specifications that define them. For instance, the IANA registry indicates that the "http" scheme is defined by [ RFC2616 ]. Refer to RFC2717 for information about registering a new URI scheme.
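Extracting the scheme and checking it against a list of known schemes might look like the sketch below. The regular expression follows the scheme production of [ RFC2396 ] (a letter followed by letters, digits, "+", "-", or "."). KNOWN_SCHEMES is a hypothetical, hand-written set made of the schemes used in this document's examples; it is not a copy of the IANA registry, which remains the authoritative source.

```python
import re

# RFC 2396, section 3.1: scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Assumption: schemes taken from this document's examples, not the live registry.
KNOWN_SCHEMES = {"http", "ftp", "mailto", "news", "tel", "ldap", "urn"}


def uri_scheme(uri: str) -> str:
    """Return the scheme (text before the first colon), lower-cased."""
    match = SCHEME_RE.match(uri)
    if match is None:
        raise ValueError(f"no scheme in {uri!r}")
    # Scheme names are case-insensitive, so normalize to lower case.
    return match.group(1).lower()


def classify(uri: str) -> str:
    """Label a URI by scheme, e.g. "HTTP URI", flagging unknown schemes."""
    scheme = uri_scheme(uri)
    label = f"{scheme.upper()} URI"
    if scheme not in KNOWN_SCHEMES:
        label += " (scheme not in this sketch's list of known schemes)"
    return label


for uri in ["http://www.example.com/", "ftp://ftp.example.com/", "myscheme:blort"]:
    print(classify(uri))
```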
The deployment and use of different URI schemes may require varying degrees of central coordination and administration. For example, MAILTO, FTP, and HTTP URIs depend (in practice at least) on the use of the DNS infrastructure. Also, there is a central registry of URN namespace identifiers. 6
In some URI schemes it is meaningful for a URI to end with a fragment identifier. The fragment identifier is interpreted only after the retrieval of a representation. Section 4.1 of [ RFC2396 ] states that "the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result," that is, the representation.
For instance, if the representation is an HTML document, the fragment identifies a hypertext anchor. In the case of a graphics format, the fragment might identify a circle or spline. In the Resource Description Framework [ RDF10 ], fragments can be used to identify anything, be it abstract (e.g., a dream) or concrete (e.g., an automobile).
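Because the media type decides how a fragment is read, a client has to inspect the Content-Type of the retrieved representation before it can interpret the fragment. A hypothetical dispatch table follows; the standard-library email.message.Message is used only to parse the header, and the three media types and the descriptions returned are illustrative assumptions.

```python
from email.message import Message


def media_type(content_type_header: str) -> str:
    """Return the lower-cased type/subtype, dropping parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def interpret_fragment(content_type_header: str, fragment: str) -> str:
    """Describe what a fragment means, depending on the representation's media type."""
    mtype = media_type(content_type_header)
    if mtype == "text/html":
        return f"hypertext anchor named {fragment!r}"
    if mtype == "image/svg+xml":
        return f"SVG element with id {fragment!r}"
    if mtype == "application/rdf+xml":
        return f"RDF resource identified by #{fragment}"
    raise LookupError(f"no fragment semantics known for {mtype}")


print(interpret_fragment("text/html; charset=utf-8", "section1"))
```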
Good practice
Coneg with fragments: Authors SHOULD NOT use HTTP content negotiation for different media types that do not share the same fragment identifier semantics.
Editor's note : There has been some discussion but no agreement that new access protocols should provide a means to convert fragment identifiers according to media type.
The following generalities about URIs are included to answer some frequently asked questions about URIs. These are generalities because they hold for some, but not necessarily all, URI schemes.
http://www.example.com/lj45sr and know that it refers to "my old car" or "the weather forecast for Oaxaca."
Over time, we trust that some URIs will identify familiar resources, but that trust derives from social behavior, not the spelling of the identifier.
Data on the Web manifests itself through resource representations. A resource representation consists of:
A format specification describes the structure of the bit sequence.
Refer to other W3C format guidelines: Charmod, XAG, etc.
What is a format, and how does it relate to the concept of a document? Do all documents have a format? Is a document a collection of resources of different formats organized into a whole? Is a document the same as a resource? the same as a message body? as a non-multipart message body? What is the distinction between documents and data, if any? Does 'document' imply human readable and if so, does it imply presentation? Does it imply a hierarchically structured, report-like document with headings and subheadings? Is a catalog a document? Is a rave flyer a document?
Negotiation (stuff above might go here also) by network request, by listed alternatives in content any preference? Resource variants, foo.css and foo.html unlikely to be equivalent.
On the interpretation and processing of formats (see namespaceDocument-8 and mixedNamespaceMeaning-13 ):
@@Incomplete sections on specification design.@@
On using XML:
This section attempts to organize some areas of future discussion. Separating the concepts content, presentation, and interaction allows more easily composable specifications. For example, a markup language can be specified independently of a style sheet language. The separation facilitates alternate presentations of the same content, which is seen to have an accessibility advantage and to be more suited to the multiple modalities of Web access.
Issue : contentPresentation-26 : Separation of semantic and presentational markup, to the extent possible, is architecturally sound.
Composability (ns-meaning). Use of XML for tree structured content. Linking in general v. idref in one document. Human readable v. machine data. Served or not (hidden behind server: semantic firewall, accessibility). Linking into parts of the content, transclusion of parts. Compound documents, components from multiple servers: scalability, deep linking. Processing models, error handling.
Presentation by decoration (application of CSS to XML as presentation), and by derivation (creation of html/svg/etc as presentation). Linking (bidirectionally) between content and presentations. Inheritance of properties across namespaces. Consistency of property names. Subsets. 'Applies to' as opposed to 'set on'. Specificity of properties as attributes, chaining styling, restyling. Time-lines, linking to portions of a time-line.
Animation, scripting, events, client/server interaction. Declarative v. script based - accessibility, power; formalization of common functionality (loop animation, rollovers) in declarative form. DOM - making additional methods, add to rather than replacing XML DOM. Effect of script/programming language limitations on choice of element and attribute names. Linking to active components - XForms example with model and abstract form control, can be extended to presentational instantiation of form control.
As mentioned in the introduction, the Web is designed to create the large-scale effect of a shared information space that scales well and behaves predictably.
@@There may be some general principles that hold across all three previous chapters. Put them in an appendix and refer to them from each section?@@
When designing specifications that address independent functions of a system, avoidable references between the specifications are generally harmful, because they impede the independent evolution of the specifications.
For example, it is a strength of XML that XPath cannot query the HTTP header. It is a strength of HTTP that it does not depend on details of the underlying TCP to such an extent that it could not be run over a different transport service. Similarly, the RDF data graph has a significance that is independent of the actual serialization. However, there is a flaw: the embedded XML parseType="Literal" data type.
Sometimes it is necessary (and good for a given application) to break layers. For example, it is good for an HTTP client to be aware of TCP speeds and round-trip times to different mirror servers in order to optimize the choice of server. When designing specifications, identify the functionality that breaks layers so that it is clear when it is being used.
http://example/dir1/dir2/file1, the relative URI reference ../file2 is a shortened form of http://example/dir1/file2, and the relative URI reference #abc is a shortened form of http://example/dir1/dir2/file1#abc.
(Note 3 context.)
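Python's standard urllib.parse.urljoin performs this kind of resolution; a short check against the two examples in this note (the helper name resolve is hypothetical):

```python
from urllib.parse import urljoin

BASE = "http://example/dir1/dir2/file1"


def resolve(reference: str, base: str = BASE) -> str:
    """Expand a relative URI reference against a base URI."""
    return urljoin(base, reference)


print(resolve("../file2"))  # http://example/dir1/file2
print(resolve("#abc"))      # http://example/dir1/dir2/file1#abc
```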
The authors of this document are the participants of W3C's Technical Architecture Group: Tim Berners-Lee (Chair, W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton (Microsoft), Roy Fielding (Day Software), Chris Lilley (W3C), David Orchard (BEA Systems), Norman Walsh (Sun), and Stuart Williams (Hewlett-Packard).
The TAG thanks people for their thoughtful contributions on the TAG's public mailing list, www-tag ( archive ).