Copyright © 2002 W3C ® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
The World Wide Web is a networked information system. Web Architecture is the set of rules that all agents in the system follow that result in the large-scale effect of a shared information space. Identification, data formats, and protocols are the main technical components of Web Architecture, but the large-scale effect depends on social behaviours as well.
This document is a reference set of rules for Web Architecture.
This document has been superseded. See next version.
This document has been developed for discussion by the W3C Technical Architecture Group.
This draft is highly unstable. This draft represents substantial input from TAG participants, but does not yet represent consensus. It is a draft with no official standing. Once this document has undergone substantial revision, the TAG expects to develop it on the W3C Recommendation track.
Please send comments on this document to the public W3C TAG mailing list www-tag@w3.org (archive).
Publication of this document by W3C indicates no endorsement by W3C.
The World Wide Web ("Web" from here on) is a networked information system consisting of agents (clients, servers, and other programs) that exchange information. Open: Web Architecture is the set of rules that all agents in the system follow that result in the large-scale effect of a shared information space that scales well and behaves predictably.
This architecture consists of:
This document focuses on architectural principles specific to or fundamental to the Web. It does not address general principles of design, which are also important to the success of the Web. Indeed, behind many of the principles of Web Architecture lie these and other principles: minimal constraint (fewer rules make the system more flexible), modularity, minimum redundancy, extensibility, simplicity, robustness, etc.
This document does not address design goals covered by targeted W3C specifications:
According to [RFC2396] a resource is "anything that has identity." A resource is part of the Web when there is a URI that identifies it. (Open: issue httpRange-14: What is the range of the HTTP dereference function?)
UseURI: All important resources SHOULD be part of the Web, i.e., identified by a URI.
Open: The URI specification [RFC2396] represents a worldwide agreement on who can create identifiers and how they take on meaning in protocols and formats.
A number of identification mechanisms pre-date the Web, such as those for electronic mailboxes and ftp documents. URIs were designed to incorporate these existing naming schemes ('ftp', 'mailto', etc.) and a new scheme designed specially for the Web: 'http'.
A URI scheme defines the properties of URIs in that scheme. The IANA registry [IANASchemes] lists URI schemes and the specifications that define them. For instance, the HTTP URI scheme is defined in section 3.2.2 of the HTTP specification [RFC2616]. In a URI, the scheme name appears before the colon (":"), as in ftp://www.ietf.org/rfc/rfc2396.txt.
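As a minimal sketch of the point that the scheme name is the text before the colon, Python's standard `urllib.parse.urlsplit` can separate the scheme from the rest of a URI. The helper name `scheme_of` is an illustrative assumption and is not defined by any of the cited specifications.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme name, i.e. the text before the first colon."""
    return urlsplit(uri).scheme


print(scheme_of("ftp://www.ietf.org/rfc/rfc2396.txt"))  # ftp
print(scheme_of("mailto:www-tag@w3.org"))               # mailto
```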
Open: Some important properties vary by URI scheme, including the following:
As mentioned above, URI schemes may have different persistence properties. There are strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource. Persistence is always a matter of policy and commitment on the part of authorities assigning URIs rather than a constraint imposed by technological means.
For example, each W3C technical report (e.g., "the SVG specification") is in fact a series of documents that represent the maturation of the technical report (Working Drafts, Candidate Recommendations, Proposed Recommendations, and a Recommendation). W3C assigns a URI to the "latest version" in the specification series (e.g., http://www.w3.org/TR/SVG). W3C also assigns a URI for each specification in the series (called the "this version URI", as in http://www.w3.org/TR/2001/PR-SVG-20010719/). W3C policy is that representations of the "latest version" resource will change over time (with each new publication of an SVG specification). W3C policy is also that representations of a specification designated by a "this version" URI will not change over time (to the best of W3C's ability to maintain its archives intact).
RFC 2141 [RFC2141] defines the "Uniform Resource Name" (URN) URI scheme. URNs form a subset of URIs that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable (per section 1.2 of [RFC2396]). In practice, URNs cannot be dereferenced. URIs of other schemes (including HTTP) can also be managed to meet the goal of persistence, and can be dereferenced.
For more ideas on persistence policies, see "Cool URIs Don't Change" [Cool].
In general, to promote scalability, Web architecture should avoid centralized registries. There are exceptions (e.g., DNS may be acceptable). On the other hand, the TAG finding "Mapping between URIs and Internet Media Types" promotes the idea of using the Web as a repository for new Media Types. [TAG issue uriMediaType-9]
AvoidRegistries: Designers SHOULD avoid centralized registries but MAY rely on the continued existence and utility of the DNS.
To dereference a URI means to request a representation of the resource designated by the URI. The dereference mechanism varies according to URI scheme and must be defined by each scheme where dereferencing is a goal. See "Guidelines for new URL Schemes" [RFC2718]. The dereference mechanism for the HTTP URI scheme is GET [TAG issue whenToUseGet-7].
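The following is a minimal sketch, not a definitive implementation, of the idea that the dereference mechanism depends on the scheme and that, for HTTP URIs, it is GET. The function name `build_dereference_request` is hypothetical, and the request is only prepared, never sent, so the example runs without network access.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_dereference_request(uri: str) -> Request:
    """Prepare, but do not send, a GET request for an HTTP URI."""
    scheme = urlsplit(uri).scheme
    if scheme != "http":
        # Other schemes define their own dereference mechanism (or none);
        # this sketch covers only the HTTP case described above.
        raise ValueError(f"no dereference mechanism sketched for scheme {scheme!r}")
    return Request(uri, method="GET")


request = build_dereference_request("http://www.w3.org/TR/SVG")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual retrieval of a representation.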
UseGET: Agents SHOULD be able to dereference URIs for important resources.
Open: Since HTTP GET is defined and widely deployed, agents SHOULD use HTTP URIs.
Open: Say something here à la what Tim Bray said: "Don't build a world of resources that cannot be identified by URI."?
DescribeResource: Dereferencing a URI for an important abstract concept (for example, Internet protocol parameters) SHOULD return human and/or machine readable representations that describe the nature and purpose of those resources.
GETIsSafe: Dereferencing URIs is safe; i.e., agents do not incur obligations by following links. [TAG finding "URIs, Addressability, and the use of HTTP GET"]
Please refer to the TAG finding "URIs, Addressability, and the use of HTTP GET" for information about safe operations and using HTTP GET for addressability.
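To make the notion of a safe operation concrete, here is a small sketch assuming the classification in section 9.1.1 of [RFC2616], which names GET and HEAD as safe methods. The names `SAFE_METHODS` and `may_follow_without_obligation` are illustrative assumptions and do not come from the TAG finding.

```python
# Methods that RFC 2616 section 9.1.1 treats as safe (retrieval only).
SAFE_METHODS = frozenset({"GET", "HEAD"})


def may_follow_without_obligation(method: str) -> bool:
    """Return True when an agent can issue the method without incurring obligations."""
    return method.upper() in SAFE_METHODS


for method in ("GET", "POST", "PUT", "DELETE"):
    print(method, may_follow_without_obligation(method))
```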
Section 4 of RFC 2396 [RFC2396] introduces the term URI Reference to include absolute URIs and two other constructs also used for identification:
../main.html is an example of a relative URI reference. The meaning of a relative URI reference depends on the context where it is used, unlike absolute URIs, whose meaning is the same in any context.
There are thus four classes of identifiers that comprise URI References:
SYSTEM identifiers belong to this class.
Open:
UseOfURIReference: Authors of specifications MUST use the terms "URI" and "URI Reference" according to the definitions in RFC2396.
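The sketch below illustrates one plausible reading of the four classes, assuming they are the combinations of absolute or relative references, with or without a fragment identifier, as in section 4 of [RFC2396]. It also shows how a relative reference only acquires meaning against a base URI. The class labels, the base URI, and the fragment names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def classify(reference: str) -> str:
    """Name the class of a URI reference (labels are illustrative)."""
    kind = "absolute" if urlsplit(reference).scheme else "relative"
    suffix = "with fragment" if "#" in reference else "without fragment"
    return f"{kind}, {suffix}"


# A hypothetical base URI standing in for "the context where it is used".
BASE = "http://www.example.org/docs/guide/intro.html"

references = [
    "http://www.w3.org/TR/SVG",
    "http://www.w3.org/TR/SVG#intro",  # fragment name is made up
    "../main.html",
    "../main.html#top",                # fragment name is made up
]
for ref in references:
    # Absolute references resolve to themselves; relative ones depend on BASE.
    print(f"{classify(ref)}: {ref} -> {urljoin(BASE, ref)}")
```

With a different base URI, the two relative references would resolve to different absolute URIs, while the two absolute references would be unaffected.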
URIs that can be dereferenced can end with a fragment identifier (to form a URI reference). Section 4.1 of [RFC2396] states that "the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result," that is, the representation. For instance, if the representation is an HTML document, the fragment identifier designates a hypertext anchor. In the case of a graphics format, a URI reference might designate a circle or spline. In the case of RDF, a URI reference can designate anything, be it abstract (e.g., a dream) or concrete (e.g., my car). The plain text media type does not define semantics for fragment identifiers.
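A brief sketch of this dependency: the fragment is split off the URI reference, and its meaning is looked up by the media type of the representation. The table `FRAGMENT_SEMANTICS` merely paraphrases the paragraph above and is not a registry; the function name and the example URI are hypothetical.

```python
from urllib.parse import urldefrag

# Illustrative only: paraphrases the text above, not any official registry.
FRAGMENT_SEMANTICS = {
    "text/html": "designates a hypertext anchor",
    "text/plain": "no semantics defined",
}


def describe_fragment(reference: str, media_type: str) -> str:
    """Explain what the fragment of `reference` means for a representation."""
    uri, fragment = urldefrag(reference)
    if not fragment:
        return f"{uri}: no fragment identifier"
    meaning = FRAGMENT_SEMANTICS.get(media_type, "defined by the media type")
    return f"{uri} + '{fragment}': {meaning} ({media_type})"


print(describe_fragment("http://www.example.org/doc#section2", "text/html"))
print(describe_fragment("http://www.example.org/doc#section2", "text/plain"))
```

The same URI reference yields a different interpretation for each media type, which is why the ConegFragment principle below cautions against negotiating between types with incompatible fragment semantics.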
ConegFragment: Authors SHOULD NOT use HTTP content negotiation for different media types that do not share the same fragment identifier semantics.
Open: New access protocols should provide a means to convert fragment identifiers according to media type.
@@Ideas:@@
As mentioned in the introduction, the Web is designed to create the large-scale effect of a shared information space that scales well and behaves predictably. The architectural style known as Representational State Transfer [REST] encapsulates this notion of a shared information space. According to Fielding:
REST provides a set of architectural constraints that, when applied as a whole, emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.
-- Roy Fielding, Section 5.5 of [REST]
HTTP has been specially designed for REST interactions. HTTP has a variety of methods designed to manipulate resource state through representation transfer between agents. These methods include GET (covered in section 1.2), POST, PUT, and DELETE.
This chapter uses the REST model to explain how Web protocols take into account the properties of resources and URIs, as well as real-world time and space constraints, in order to improve the user's Web experience.
Relevant issues, findings:
Do not make assumptions about a resource based on the spelling of a URI that refers to it (other than what is defined in specifications for the URI scheme). Since URIs are opaque, it is an error to assume, for example, that a URI that happens to end with the string ".html" refers to a resource that has an HTML representation. Though people must not infer anything about the nature of a resource representation from a URI ending in ".html", resource owners must not create confusion by purposely misassigning suffixes and representation types.
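A short sketch of the opacity principle: the media type of a representation is taken from what the server declares (here, a Content-Type header value), never from the spelling of the URI. The URI, the header value, and the function name `media_type_of` are hypothetical.

```python
from email.message import Message


def media_type_of(content_type_header: str) -> str:
    """Extract the media type from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_type()


# Hypothetical response: the URI ends in ".html", yet the server declares PDF.
uri = "http://www.example.org/report.html"
served_as = media_type_of("application/pdf")
print(uri, "is served as", served_as)
```

An agent that guessed HTML from the ".html" suffix would mishandle this representation; relying on the declared media type avoids the error.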
At times it is useful or necessary to reveal a URI (e.g., in an advertisement on the side of a bus), in which case, good social behavior requires that the URI be easy to use. But in general, just as "children should be seen but not heard", URIs should be used but not seen. In general, URIs should be hidden from view since they are ugly to look at and they tend to lure us into thinking they hold definitive meaning about a resource.
Open: Canonical form of URIs. See issue URIEquivalence-15.
Authors should not use a URI to identify more than one resource.
Nothing prevents us from considering "a representation of the novel Moby Dick" to be a resource itself (and thus to have an assigned URI). Authors should not use the same URI to refer to the resource "Moby Dick" and to the particular representation of that resource. Similarly, authors should not use the same URI to refer to a person and to that person's mailbox.