Architecture of the World Wide Web

W3C Editor's Draft 12 November 2002

This version:: http://www.w3.org/2001/tag/2002/webarch-20021112
Previous version:: http://www.w3.org/2001/tag/2002/WD-webarch-20021107
Latest TAG draft:: http://www.w3.org/2001/tag/webarch/
Latest TR page draft:: http://www.w3.org/TR/webarch/
Editor:: Ian Jacobs, W3C
Authors:: See acknowledgments .

Abstract

The World Wide Web is a networked information system. Web Architecture consists of the requirements, constraints, principles, and design choices that influence the design of the system and the behavior of agents within the system. When followed, the large-scale effect is that of a shared information space. This document organizes the technical discussion of the system in three parts: identification, representation, and interaction. This document also addresses some non-technical (social) issues that contribute to the shared information space.

This document strives to establish a reference set of requirements, constraints, principles, and design choices for Web architecture.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This draft incorporates suggestions from Roy Fielding and others who have sent comments to www-tag. It is intended for review by the TAG. It does not represent consensus within the TAG. This document has been developed by W3C's Technical Architecture Group (TAG) (charter). A list of changes in this document is available.

This draft remains incomplete; sections 1 and 2 are the most developed, 3 and 4 the least. The TAG has published a number of findings that address specific architecture issues. Parts of those findings may appear in subsequent drafts. Please also consult the list of issues under consideration by the TAG.

This draft includes some editorial notes and also references to open TAG issues . These do not represent all open issues in the document. They are expected to disappear from future drafts.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress."

The latest information regarding patent disclosures related to this document is available on the Web. As of this publication, there are no disclosures.

Please send comments on this document to the public W3C TAG mailing list www-tag@w3.org ( archive ).

A list of current W3C Recommendations and other technical documents can be found at the W3C Web site.

1. Introduction
2. Identification and resources
3. Representations
4. Interaction
- 4.1. HTTP and REST
- 4.2. Ideas and issues
5. General design principles
- 5.1. Information hiding
6. Glossary
7. References
- 7.1. Normative References
- 7.2. Non-Normative References
8. End notes
9. Acknowledgments

1. Introduction

The World Wide Web (or, Web) is a networked information system consisting of agents ( programs acting on behalf of another person, entity, or process ) that exchange information.

This document organizes Web architecture into:

Identification . Agents identify objects in the system (called "resources") with Uniform Resource Identifiers ( URIs ), defined in [ RFC2396 ].
Representation . Agents represent resources using a nonexclusive set of data formats, used separately or in combination (e.g., XHTML, CSS, PNG, XLink, RDF/XML, SMIL animation). This section also discusses technologies for building new data formats (XML, XML Namespaces).
Interaction . Agents exchange representations via protocols, including HTTP [ RFC2616 ], FTP, and SMTP ¹. Several of these protocols share a reliance on the Multipurpose Internet Mail Extensions ( MIME ) standards for the format of message bodies [ RFC2045 ] and for Internet Media Types [ RFC2046 ].

The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in accordance with RFC 2119 [ RFC2119 ].

1.1. Audience of this document

The intended audience for this document includes:

Participants in W3C groups,
Other groups and individuals developing technologies to be integrated into the Web.

The authors have made every effort to keep this document terse, with the expectation that additional documents will elaborate on the required properties, constraints, and principles, rationale, and examples.

Readers will benefit from familiarity with the Requests for Comments (RFC) series from the IETF, some of which define pieces of the architecture discussed in this document.

1.2. Scope of this document

This document focuses on the architecture of the Web. For instance, the principles enumerated in this document are those closely related to the Web. General design principles, such as minimal constraint (fewer rules make the system more flexible), modularity, minimum redundancy, extensibility, simplicity, and robustness, are not discussed in detail.

Other groups within W3C are addressing architectural design goals in the following areas:

Internationalization; see W3C's Internationalization Activity .
Accessibility; see W3C's Web Accessibility Initiative .
Device independence; see W3C's Device Independence Activity .

For information about architectural principles of the Internet, refer to [ RFC1958 ].

1.3. Summary of required properties, constraints, principles, and good practice notes

In the design of the Web, some design decisions, like the names of the <p> and <li> elements in HTML, or the choice of the colon character in URIs, are somewhat arbitrary; if <par>, <elt>, or * had been chosen instead, the large-scale result would, most likely, have been the same. Other design choices are more fundamental; these are the focus of this document.

The terms used in the following list are elaborated on in the document.

Use URIs: [constraint]: All important resources SHOULD be identified by a URI.
URI case: [practice]: It SHOULD NOT be assumed that URIs which differ only in character case can be used interchangeably.
Resource descriptions: [practice]: Owners of important resources SHOULD make available representations that describe the nature and purpose of those resources.
Safe retrieval: [principle]: Agents do not incur obligations by retrieving a representation.
Consistent representations: [practice]: It is confusing and costly when, for a given URI, representations vary in unpredictable ways.
Consistent URIs: [practice]: Indiscriminate use of a URI undermines its value and interferes with people who rely on it.
New URI schemes: [practice]: Authors of specifications SHOULD avoid introducing new URI schemes when existing schemes can be used to meet the goals of the specifications.
Coneg with fragments: [practice]: Authors SHOULD NOT use HTTP content negotiation for different media types that do not share the same fragment identifier semantics.

Some of the items in the above list may conflict with current practice, and so education and outreach will be required to improve on that practice. Other items may fill in gaps in published specifications or may call attention to known weaknesses in those specifications.

The architecture described in this document is the result of experience. There has been some theoretical and modeling work in the area of Web Architecture, notably Roy Fielding's work on "Representational State Transfer" [ REST ].

2. Identification and resources

The Web is a universe of resources. A resource is defined by [ RFC2396 ] to be anything that has identity. Examples include documents, files, menu items, machines, and services, as well as people, organizations, and concepts. Web architecture starts with a uniform syntax for resource identifiers, so that we can refer to resources, access them, describe them, and share them. The Uniform Resource Identifier (URI) syntax employs an extensible set of URI schemes. Several URI schemes incorporate some identification mechanisms that pre-date the Web into this (generic URI) syntax:

MAILTO URIs identify mailboxes:
mailto:nobody@example.org
FTP URIs identify FTP files and directories:
ftp://example.org/aDirectory/aFile
NEWS URIs identify USENET newsgroups:
news:comp.infosystems.www
TEL URIs identify terminals on the telephone network:
tel:+1-816-555-1212

Other URI schemes have been introduced since the advent of the Web, including those introduced as a consequence of new protocols. Examples of URIs for these schemes include:

http://www.example.org/something?with=arg1;and=arg2
ldap://ldap.itd.umich.edu/c=GB?objectClass?one
urn:oasis:SAML:1.0

One can append a fragment identifier to a URI to yield an identifier for part of, or a view of, a resource ². The following URIs include fragment identifiers:

ftp://example.org/aDirectory/aDocument#section1
http://www.example.org/states#texas

Note that while this composition is syntactically fully general, it is meaningless in some URI schemes. The URI mailto:nobody@example.org#abc is meaningless in practice.
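
The decomposition described above can be sketched in a few lines of Python. This is only an illustration, assuming Python's standard urllib.parse module (which this document does not mention); the helper name split_uri is hypothetical. It shows that a fragment identifier can be split off syntactically for any scheme, whether or not the result is meaningful.

```python
from urllib.parse import urlsplit

def split_uri(uri):
    """Return (scheme, remainder, fragment) for a URI string.

    The scheme is the text before the first colon, the fragment is the
    text after the first '#', and the remainder is whatever lies between.
    """
    parts = urlsplit(uri)
    without_fragment = uri.split("#", 1)[0]
    remainder = without_fragment.split(":", 1)[1]
    return parts.scheme, remainder, parts.fragment

# Example URIs taken from this section.
for uri in (
    "mailto:nobody@example.org",
    "ftp://example.org/aDirectory/aDocument#section1",
    "tel:+1-816-555-1212",
    "mailto:nobody@example.org#abc",  # syntactically valid, meaningless in practice
):
    print(split_uri(uri))
```

The split itself is purely syntactic; deciding whether a fragment means anything is left to the scheme and, after retrieval, to the media type of the representation.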

A generic syntax for URIs is defined by [ RFC2396 ]. The current document uses the term "URI" to mean, in RFC2396 terms, an absolute URI reference ³ optionally followed by a fragment identifier . The TAG is working actively to convince the IETF to revise RFC2396 so that the definition of "URI" aligns with the current document.

2.1. Resources, URIs, and the shared information space

When one resource refers to another via a URI, a link is formed. When many resources are linked this way, the large-scale effect is a shared information space, where resources are identifiable by URI. The value of the Web increases with the number of resources identified by URI; this is due to the "network effect." In turn, resources are more valuable when they are identifiable on the Web. Hence:

Constraint

Use URIs: All important resources SHOULD be identified by a URI. ⁴

There are many benefits to making resources identifiable by URI. Some are by design (e.g., linking and bookmarking), while others have arisen naturally (e.g., global search services). See the TAG finding URIs, Addressability, and the use of HTTP GET for some details about the interaction of this principle in HTTP application design.

2.2. Operations on URIs

The two primary operations on URIs are:

To compare identifiers
To dereference a URI

2.2.1. Comparing identifiers

There may be applications (e.g., XML namespace names [ XMLNS ]) where comparison is expected to be the sole or primary operation on a URI. Certain URI schemes provide rules for determining the syntactic equivalence of URIs, i.e., whether two URIs are different spellings of the same identifier. These rules vary from scheme to scheme.

For example, URNs begin with two colon-delimited fields, the first of which is the string urn and the second is the "namespace identifier" (NID). In URNs, these two fields are to be compared in a case-insensitive fashion. The remainder of the URN following the second colon is subject to rules dependent on the content of the second field (following the first colon); thus the equivalence rules may vary across URN namespace identifiers.

Section 3.2.3 of the HTTP specification [ RFC2616 ] states that, when comparing two HTTP URIs, the host name part must be considered case-insensitive, so http://WWW.EXAMPLE/ and http://www.example/ identify the same resource.

Good practice

URI case: It SHOULD NOT be assumed that URIs which differ only in character case can be used interchangeably.

Note: Equivalence of URIs is not the same as consistent representations of a resource.
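
The scheme-dependent comparison rules above can be sketched as follows. This is a simplified illustration, not a complete implementation of either specification: the function names are hypothetical, percent-encoding equivalences are not modelled, and the URN remainder is compared exactly, which is an assumption since each namespace identifier may define its own rules.

```python
from urllib.parse import urlsplit

def http_uris_equivalent(a, b):
    """Compare two HTTP URIs with a case-insensitive scheme and host."""
    def key(uri):
        p = urlsplit(uri)
        # urlsplit lowercases the scheme; .hostname lowercases the host.
        # An absent port is treated as 80 and an empty path as "/".
        return (p.scheme, p.hostname, p.port or 80, p.path or "/",
                p.query, p.fragment)
    return key(a) == key(b)

def urns_equivalent(a, b):
    """Compare two URNs: 'urn' and the NID case-insensitively,
    the remainder exactly (assumption, see lead-in)."""
    fa, fb = a.split(":", 2), b.split(":", 2)
    if len(fa) != 3 or len(fb) != 3:
        return False
    same_prefix = (fa[0].lower() == fb[0].lower() == "urn"
                   and fa[1].lower() == fb[1].lower())
    return same_prefix and fa[2] == fb[2]

print(http_uris_equivalent("http://WWW.EXAMPLE/", "http://www.example/"))
print(http_uris_equivalent("http://www.example/A", "http://www.example/a"))
print(urns_equivalent("URN:IETF:example", "urn:ietf:example"))
```

Only the host comparison ignores case in the HTTP sketch; the path is compared as written, in line with the "URI case" practice above.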

Issue : URIEquivalence-15 : When are two URI variants considered equivalent? See also issue IRIEverywhere-27 - Should W3C specifications start promoting IRIs?

2.2.2. Dereferencing a URI

To dereference a URI is to apply in succession a finite set of relevant specifications, beginning with the specification that governs the scheme of the URI.

A "representation" is a data object that represents or describes a resource state, and is the vehicle for conveying the meaning of a resource. A resource is an abstraction for which there is a conceptual mapping to a (possibly empty) set of representations.

As an example of the application of specifications in succession, suppose that the URI http://weather.yahoo.com/forecast/MXOA0069 is used within an a element of an SVG document. The sequence of specifications applied is:

The URI specification [ RFC2396 ]. This specification says (in section 3.1) that the scheme defines "the semantics for the remainder of the URI string." In this case, the URI scheme is HTTP.
The HTTP/1.1 protocol. Section 3.2.2 of RFC2616 [ RFC2616 ] explains the semantics of HTTP URIs.
The SVG 1.0 Recommendation [ SVG10 ], which imports the link semantics defined by XLink 1.0 [ XLink10 ]. Section 17.1 of the SVG specification suggests that interaction with an a link involves retrieving a representation of a resource, identified by the XLink href attribute: "By activating these links (by clicking with the mouse, through keyboard input, and voice commands), users may visit these resources." This means that the GET method defined in HTTP/1.1 is used to retrieve the representation of the resource.
Once the representation has been retrieved, the media type of the representation governs its interpretation (here, for rendering).

Representations, when transferred by a Web protocol, are often accompanied by metadata in the message (for example, HTTP headers). In particular, the value of the media type in the set of metadata is key to the correct interpretation of a resource representation, and governs the handling of fragment identifiers. See section 3 for more information about formats used to encode representations.
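
The role of the media type in the last step can be illustrated with a small sketch. It assumes that the metadata arrives as an HTTP Content-Type header value and uses Python's standard email.message module to parse it; the handler table and the function names are hypothetical and stand in for whatever processing an agent actually performs.

```python
from email.message import Message

def media_type_of(content_type_header):
    """Return (media_type, charset) parsed from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type(), msg.get_param("charset")

# Hypothetical handlers: the media type, not the URI, selects one.
HANDLERS = {
    "text/html": lambda body: "render as HTML",
    "application/xhtml+xml": lambda body: "render as XHTML",
    "image/svg+xml": lambda body: "render as SVG",
}

def interpret(content_type_header, body):
    """Pick the interpretation of a representation from its metadata."""
    media_type, _charset = media_type_of(content_type_header)
    handler = HANDLERS.get(media_type)
    if handler is None:
        return "no handler for " + media_type
    return handler(body)

print(media_type_of("text/html; charset=utf-8"))
print(interpret("image/svg+xml", b"<svg/>"))
```

Nothing in the sketch inspects the URI string itself; the same URI could be served with any of the media types in the table.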

2.2.3. Retrieving a representation

Depending on the protocol used, there may be several ways to dereference a URI. One of the most important operations for the Web is to retrieve a representation of a resource (such as with HTTP GET), which means to retrieve a representation of the state of the resource. There are other ways to interact with a resource (such as with HTTP POST). Dereference mechanisms vary by URI scheme . For instance, the URN scheme [ RFC 2141 ] does not specify a dereference procedure.

Good practice

Resource descriptions: Owners of important resources SHOULD make available representations that describe the nature and purpose of those resources.

Issue : namespaceDocument-8 : What should a "namespace document" look like?

Principle

Safe retrieval: Agents do not incur obligations by retrieving a representation.

For instance, a user does not incur an obligation by following an HTML link that causes the user agent to retrieve a representation. Tools such as proxies and search engines can retrieve representations without user interaction; it would be harmful to the Web if such operations incurred obligations. See the TAG finding " URIs, Addressability, and the use of HTTP GET" for more information about safe retrieval.
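
As a rough sketch of how an automated agent such as a crawler might respect this principle, the following code only issues requests whose method is one that HTTP/1.1 treats as safe (GET and HEAD, per section 9.1.1 of RFC2616). It uses Python's standard urllib.request.Request class to build requests without sending them; the function names are hypothetical.

```python
from urllib.request import Request

# Methods that HTTP/1.1 (RFC2616, section 9.1.1) describes as safe.
SAFE_METHODS = frozenset({"GET", "HEAD"})

def build_retrieval_request(uri):
    """Build (but do not send) a request that retrieves a representation."""
    return Request(uri, method="GET")

def is_safe(request):
    """True if the request uses a method that should not incur obligations."""
    return request.get_method() in SAFE_METHODS

retrieval = build_retrieval_request("http://www.example.org/something")
submission = Request("http://www.example.org/something", data=b"order=1")

print(retrieval.get_method(), is_safe(retrieval))
print(submission.get_method(), is_safe(submission))
```

A request that carries data defaults to POST in this library, so an agent that follows links without user interaction would skip it.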

Issue : deepLinking-25 : What to say in defense of principle that deep linking is not an illegal act?

2.2.4. Consistent representations and persistence

URIs represent a worldwide contract for who can create names and how the resources they designate take on meaning. In the case of HTTP URIs, for example, the agreement is that the authoritative meaning of the resource designated by the URI is established by retrieving a representation of the resource (per the HTTP specification [ RFC2616 ]) and then interpreting the representation according to the relevant specifications. The authoritative meaning of a resource is established by following specifications.

Representations of a resource may vary as a function of factors including time, the identity of the agent accessing the resource, data submitted to the resource when interacting with it, and changes external to the resource. Consider the previous URI http://weather.yahoo.com/forecast/MXOA0069: representations for the designated resource (the weather in Oaxaca) depend on (at least) time, the expressed preference of the user for Fahrenheit or Celsius, the identity of the user-agent software receiving the representation, and, presumably, the weather in Oaxaca.

Good practice

Consistent representations: It is confusing and costly when, for a given URI, representations vary in unpredictable ways.

For example, serving two images as equivalents through HTTP content negotiation, where one image represents a square and the other a circle, will undermine confidence in the URI used to retrieve those images.
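
To make the idea of varying but equivalent representations concrete, here is a deliberately simplified sketch of server-side selection among available formats based on an Accept header. It is not the full HTTP content negotiation algorithm: wildcards such as */* and tie-breaking rules are ignored, and the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media_type, q))
    return prefs

def choose(available, accept_header):
    """Return the available media type with the highest q, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header):
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best

# One URI, several formats of the same forecast.
formats = {"text/html", "image/svg+xml"}
print(choose(formats, "image/svg+xml;q=0.8, text/html"))
print(choose(formats, "application/pdf"))
```

The selection is predictable because every candidate is meant to convey the same forecast; the square-versus-circle case above is exactly what such a table should never contain.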

A description of what a URI identifies should be unambiguous. For instance, saying that the URI http://www.example.com/moby identifies "Moby Dick" can lead to confusion because this might be interpreted as any one of the following very distinct resources: a particular printing of this work (say, by ISBN), or the work itself in an abstract sense (for example, using RDF), or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalogue), or the record in the library's electronic catalogue which contains the metadata about the work, or the Gutenberg project's online version . Similarly, one should not use the same URI to refer to a person and to that person's mailbox. See issue httpRange-14 .

There are thus strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called the persistence of the URI. Persistence is always a matter of policy and commitment on the part of authorities assigning URIs rather than a constraint imposed by technological means.

For example, each W3C technical report (e.g., "the SVG specification") is in fact a series of documents that mature over time (from Working Drafts, Candidate Recommendations, Proposed Recommendations, to Recommendation). W3C assigns a URI to the "latest version" in the series (e.g., http://www.w3.org/TR/SVG ). W3C also assigns a URI for each specification in the series (called the "this version URI", e.g., http://www.w3.org/TR/2001/PR-SVG-20010719/ ). W3C policy is that representations of the "latest version" resource will change over time (with each new publication of an SVG specification). W3C policy is also that representations of a specification designated by a "this version" identifier will not change over time, to the best of W3C's ability to maintain its archives intact.

HTTP [ RFC2616 ] has been designed to promote consistency. For example, HTTP redirection (via some of the 3xx response codes) permits servers to tell a client that further action needs to be taken by the client in order to fulfill the request (e.g., the resource has been assigned a new URI). In addition, content negotiation also promotes consistency, as a site manager would not be required to define new URIs for each new format that is supported, as would be the case with protocols that don't support content negotiation, such as FTP.

For more discussion about persistence, refer to [ Cool ]. ⁵

2.2.5. Consistent use of URIs

It is confusing and costly when people use the same URI to refer to different resources (i.e., where there is some inconsistency in usage compared to the authoritative meaning of the resource). Suppose company A uses http://example.com/coolcompany to refer to CoolCompany's home page, while company B uses http://example.com/coolcompany to refer to CoolCompany. Company A then buys company B, but when they try to merge their databases, they cannot due to this inconsistent usage of the URI.

Good practice

Consistent URIs: Indiscriminate use of a URI undermines its value and interferes with people who rely on it.

2.3. URI Schemes

One important characteristic of a URI is its scheme (the string that precedes the first colon in a URI). For example the scheme of the URI http://www.example.com/ is "http", and for ftp://ftp.example.com/ it is "ftp". It is common to classify URIs by scheme, calling the two preceding examples respectively an "HTTP URI" and an "FTP URI".

Since many aspects of URI processing are scheme-dependent, and since a huge range of software is expected to be able to process URIs, the cost of introduction of new URI schemes is very high.

Good practice

New URI schemes: Authors of specifications SHOULD avoid introducing new URI schemes when existing schemes can be used to meet the goals of the specifications.

While "myscheme:blort" is a URI that satisfies the syntactic constraints of [ RFC2396 ], if "myscheme" is not registered, you are not guaranteed that somebody else isn't already using it for something else.

The IANA registry [ IANASchemes ] lists registered URI schemes and the specifications that define them. For instance, the IANA registry indicates that the "http" scheme is defined by [ RFC2616 ]. Refer to RFC2717 for information about registering a new URI scheme.
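
The difference between a syntactically valid scheme and a known one can be sketched as below. The regular expression follows the scheme grammar of [ RFC2396 ] (a letter followed by letters, digits, "+", "-", or "."). The set of known schemes is only an illustrative subset drawn from the examples in this document, not the IANA registry, and the function names are hypothetical.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

# Illustrative subset only; a real agent would consult the registry.
KNOWN_SCHEMES = {"http", "ftp", "mailto", "news", "tel", "ldap", "urn"}

def scheme_of(uri):
    """Return the lowercased scheme of a URI, or None if there is none."""
    scheme, sep, _rest = uri.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        return None
    return scheme.lower()

def classify(uri):
    """Report whether the URI's scheme is in the illustrative known set."""
    scheme = scheme_of(uri)
    if scheme is None:
        return "no valid scheme"
    return "known" if scheme in KNOWN_SCHEMES else "unrecognized"

print(classify("http://www.example.com/"))
print(classify("myscheme:blort"))
```

The second URI passes the syntax check but gives software nothing to act on, which is one way of seeing why new schemes are costly.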

The deployment and use of different URI schemes may require varying degrees of central coordination and administration. For example, MAILTO, FTP, and HTTP URIs depend (in practice at least) on the use of the DNS infrastructure. Also, there is a central registry of URN namespace identifiers.

2.4. Fragment identifiers

In some URI schemes it is meaningful for a URI to end with a fragment identifier. The fragment identifier is interpreted only after the retrieval of a representation. Section 4.1 of [ RFC2396 ] states that "the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result," that is, the representation.

For instance, if the representation is an HTML document, the fragment identifies a hypertext anchor. In the case of a graphics format, the fragment might identify a circle or spline. In the Resource Description Framework [ RDF10 ], fragments can be used to identify anything, be it abstract (e.g., a dream) or concrete (e.g., an automobile).

Good practice

Coneg with fragments: Authors SHOULD NOT use HTTP content negotiation for different media types that do not share the same fragment identifier semantics.
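
The following sketch illustrates why fragment handling depends on the media type: the fragment is split off before retrieval and only interpreted once the media type of the representation is known. It assumes Python's standard urllib.parse and html.parser modules; the class and function names are hypothetical, and only an HTML rule (match an id attribute, or the name attribute of an a element) is modelled.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class AnchorFinder(HTMLParser):
    """Record the first element that the given fragment identifies."""
    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_target = (attributes.get("id") == self.fragment
                     or (tag == "a" and attributes.get("name") == self.fragment))
        if self.found is None and is_target:
            self.found = tag

def resolve_fragment(uri, media_type, body):
    """Interpret the fragment of uri against a retrieved representation."""
    _base, fragment = urldefrag(uri)
    if not fragment:
        return None
    if media_type == "text/html":
        finder = AnchorFinder(fragment)
        finder.feed(body)
        return finder.found
    # Other media types define their own fragment semantics.
    raise NotImplementedError("no fragment rule for " + media_type)

page = '<h1>States</h1><h2 id="texas">Texas</h2>'
print(resolve_fragment("http://www.example.org/states#texas", "text/html", page))
```

If the same URI were also negotiated to a media type with different fragment semantics, the call would take a different branch, and "texas" might identify something else or nothing at all.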

Editor's note : There has been some discussion but no agreement that new access protocols should provide a means to convert fragment identifiers according to media type.

2.5. Some generalities about URIs

The following generalities about URIs are included to answer some frequently asked questions about URIs. These are generalities because they hold for some, but not necessarily all, URI schemes.

It is not possible to inspect a URI and determine what resource it identifies. For example, in general, one cannot look at http://www.example.com/lj45sr and know that it refers to "my old car" or "the weather forecast for Oaxaca."
Over time, we trust that some URIs will identify familiar resources, but that trust derives from social behavior, not the spelling of the identifier.
Several different URIs can identify the same resource.
It is possible to compare two URIs to see whether they are spelled equivalently; see the section on comparison of identifiers for more details.
It is not possible to inspect two URIs that are spelled differently and determine whether they identify the same resource. This does not prevent some URI schemes from mandating equivalence for particular sets of URIs using that scheme.
It is not possible to inspect a URI and know the media type of representation(s) of that resource. For example, do not assume that a URI that ends with the string ".html" refers to a resource that has an HTML representation. Of course, resource owners should not publish URIs likely to cause confusion.

3. Representations

Data on the Web manifests itself through resource representations. A resource representation consists of:

An Internet Media Type
A sequence of bits

A format specification describes the structure of the bit sequence.

Refer to other W3C format guidelines: Charmod, XAG, etc.

3.1. Scope

What is a format, and how does it relate to the concept of a document? Do all documents have a format? Is a document a collection of resources of different formats organized into a whole? Is a document the same as a resource? The same as a message body? As a non-multipart message body? What is the distinction between documents and data, if any? Does 'document' imply human readable and if so, does it imply presentation? Does it imply a hierarchically structured, report-like document with headings and subheadings? Is a catalog a document? Is a rave flyer a document?

Negotiation (stuff above might go here also) by network request, by listed alternatives in content any preference? Resource variants, foo.css and foo.html unlikely to be equivalent.

3.2. Processing model

On the interpretation and processing of formats (see namespaceDocument-8 and mixedNamespaceMeaning-13 ):

It's useful to say what xml:lang means in a very large number of cases, without too much effort
We also need to allow other specs to use xml:lang in other ways (e.g., xslt outputting it).

3.3. Format specification design guidelines

@@Incomplete sections on specification design.@@

3.3.1. When to use XML

Persistence; there is lots of redundancy
Internationalization
Clean error-handling; early detection of errors
Mix of structure and text content
Composability

On using XML:

Designers SHOULD use XML Namespaces when they use XML.

3.3.2. Content, Presentation, and Interaction

This section attempts to organize some areas of future discussion. Separating the concepts content, presentation, and interaction allows more easily composable specifications. For example, a markup language can be specified independently of a style sheet language. The separation facilitates alternate presentations of the same content, which is seen to have an accessibility advantage and to be more suited to the multiple modalities of Web access.

Issue : contentPresentation-26 : Separation of semantic and presentational markup, to the extent possible, is architecturally sound.

3.3.2.1. Content

Composability (ns-meaning). Use of XML for tree structured content. Linking in general v. idref in one document. Human readable v. machine data. Served or not (hidden behind server - semantic firewall, accessibility). Linking into parts of the content, transclusion of parts. Compound documents, components from multiple servers - scalability, deep linking. Processing models, error handling.

3.3.2.2. Presentation

Presentation by decoration (application of CSS to XML as presentation), and by derivation (creation of html/svg/etc as presentation). Linking (bidirectionally) between content and presentations. Inheritance of properties across namespaces. Consistency of property names. Subsets. 'Applies to' as opposed to 'set on'. Specificity of properties as attributes, chaining styling, restyling. Time-lines, linking to portions of a time-line.

3.3.2.3. Interactivity

Animation, scripting, events, client/server interaction. Declarative v. script based - accessibility, power; formalization of common functionality (loop animation, rollovers) in declarative form. DOM - making additional methods, add to rather than replacing XML DOM. Effect of script/programming language limitations on choice of element and attribute names. Linking to active components - XForms example with model and abstract form control, can be extended to presentational instantiation of form control.

3.4. Ideas and issues

For new format specifications, use XML family of specifications unless there's a good reason not to. Which XML specifications? Which particular family members?
Format designers should use URIs without constraining content providers to particular URI schemes.
Allow for Web-wide linking, not just internal document linking.
Namespaces. Issues namespaceDocument-8 , mixedNamespaceMeaning-13
Qnames: Issues rdfmsQnameUriMapping-6 , qnameAsId-18 and finding " Using QNames as Identifiers in Content "
Formatting properties: Issue formattingProperties-19 , contentPresentation-26
Error handling: Issue errorHandling-20
Media type registration: RFC3023Charset-21 , finding Internet Media Type registration, consistency of use . Also, make sure to define fragment identifier semantics.
Effect of Mobile on architecture - size, complexity, memory constraints. Binary infosets, storage efficiency. Composable subsets.
What is the scope of using XLink? xlinkScope-23
Can a specification include rules for overriding HTTP content type parameters? contentTypeOverride-24
Create formats that allow authors to hide URIs from view (e.g., behind link text). For authors: at times it is useful or necessary to reveal a URI (e.g., in an advertisement on the side of a bus), in which case, good social behavior requires that the URI be easy to use.

4. Interaction

As mentioned in the introduction, the Web is designed to create the large-scale effect of a shared information space that scales well and behaves predictably.

4.1. HTTP and REST

4.2. Ideas and issues

Consistency of media types and message contents (from "TAG Finding: Internet Media Type registration, consistency of use").
Consistency of communicating character encoding (same source).
HTTP as a substrate protocol [ TAG issue HTTPSubstrate-16 ]

5. General design principles

@@There may be some general principles that hold across all three previous chapters. Put them in an appendix and refer to them from each section?@@

5.1. Information hiding

When designing specifications that address independent functions of a system, avoidable references between the specifications are in general harmful. They are harmful because they impede the independent evolution of the specifications.

For example, it is a strength of XML that XPath cannot query the HTTP header. It is a strength of HTTP that it does not refer to details of the underlying TCP to the extent that it cannot be run over a different transport service. Similarly, the RDF data graph has a significance that is independent of the actual serialization. However, there is a flaw: the embedded XML parsetype="Literal" data type.

Sometimes it is necessary (and good for a given application) to break layers. For example, it is good for an HTTP client to be aware of TCP speeds and round trip times to different mirror servers in order to optimize the choice of server. When designing a specification, identify the functionalities that break layers so that it is clear when they are being used.
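
The layering point above can be sketched in code. In this illustration the request logic depends only on a small byte-carrying interface, so a different transport can be substituted without changing it. All names here (Transport, InMemoryTransport, fetch_status) are hypothetical and chosen for this example; the sketch does not describe any particular HTTP implementation.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical interface: anything that can carry bytes both ways."""

    def send(self, data: bytes) -> None: ...

    def receive(self) -> bytes: ...


def build_request(host: str, path: str) -> bytes:
    """Format a minimal HTTP/1.1 GET request; no transport details appear here."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")


def parse_status(response: bytes) -> int:
    """Extract the numeric status code from the status line."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def fetch_status(transport: Transport, host: str, path: str) -> int:
    """Send a request over whatever transport is supplied and return the status."""
    transport.send(build_request(host, path))
    return parse_status(transport.receive())


class InMemoryTransport:
    """Stand-in transport that records what was sent and replays a canned reply."""

    def __init__(self, canned_response: bytes) -> None:
        self.sent = b""
        self._response = canned_response

    def send(self, data: bytes) -> None:
        self.sent += data

    def receive(self) -> bytes:
        return self._response


loopback = InMemoryTransport(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
status = fetch_status(loopback, "example.org", "/")
```

A socket-backed class with the same two methods could replace InMemoryTransport. A client that also measured round trip times would be breaking the layer deliberately, and the interface makes that dependency visible.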

6. Glossary

Agents: programs acting on behalf of another person, entity, or process
Dereference: To dereference a URI is to apply in succession a finite set of relevant specifications, beginning with the specification that governs the scheme of the URI.
Link: When one resource refers to another via a URI, a link is formed.
MIME: standards for the format of message bodies [RFC2045] and for Internet Media Types [RFC2046].
Persistence: There are strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called the persistence of the URI.
Resource: A resource is defined by [RFC2396] to be anything that has identity.
Retrieve a representation: to retrieve a representation of the state of the resource.
URI Scheme: One important characteristic of a URI is its scheme (the string that precedes the first colon in a URI).
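
As a small illustration of the URI Scheme entry, the scheme can be read off as the text before the first colon. The sketch below assumes Python's standard urllib.parse module; the helper name scheme_of and the sample URIs are chosen for illustration only.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme of a URI: the string that precedes the first colon."""
    # urlsplit also lowercases the scheme.
    return urlsplit(uri).scheme


sample_uris = [
    "http://www.w3.org/TR/webarch/",
    "mailto:www-tag@w3.org",
    "urn:ietf:rfc:2141",
]
schemes = [scheme_of(uri) for uri in sample_uris]
```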

7. References

7.1. Normative References

IANASchemes: IANA's online registry of URI Schemes is available at http://www.iana.org/assignments/uri-schemes. Dan Connolly's list of URI schemes is a useful resource for finding out which references define various URI schemes.
RFC2045: IETF " RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies ", N. Freed, N. Borenstein, November 1996. Available at http://www.ietf.org/rfc/rfc2045.txt.
RFC2046: IETF " RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types ", N. Freed, N. Borenstein, November 1996. Available at http://www.ietf.org/rfc/rfc2046.txt.
RFC2119: IETF " RFC 2119: Key words for use in RFCs to Indicate Requirement Levels ", S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
RFC2396: IETF " RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax ", T. Berners-Lee, R. Fielding, L. Masinter, August 1998. Available at http://www.ietf.org/rfc/rfc2396.txt.
RFC2616: IETF " RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1 ", J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.
RFC2717: IETF " RFC 2717: Registration Procedures for URL Scheme Names ", R. Petke, I. King, November 1999. Available at http://www.ietf.org/rfc/rfc2717.txt.

7.2. Non-Normative References

Axioms: " Universal Resource Identifiers - Axioms of Web Architecture ", T. Berners-Lee, living document dated December 1996. Available at http://www.w3.org/DesignIssues/Axioms.
Cool: " Cool URIs don't change ", T. Berners-Lee, W3C, 1998. Available at http://www.w3.org/Provider/Style/URI.
CSS2: " Cascading Style Sheets, level 2 ", B. Bos, H. Lie, C. Lilley, I. Jacobs, 12 May 1998. This W3C Recommendation is available at http://www.w3.org/TR/1998/REC-CSS2-19980512/.
Eng90: " Knowledge-Domain Interoperability and an Open Hyperdocument System ", D. C. Engelbart, June 1990.
Fielding: " Principled Design of the Modern Web Architecture ", R.T. Fielding and R.N. Taylor, UC Irvine. In Proceedings of the 2000 International Conference on Software Engineering (ICSE 2000), Limerick, Ireland, June 2000, pp. 407-416. This document is available at http://www.ics.uci.edu/~fielding/pubs/webarch_icse2000.pdf.
Fragments: " Fragment Identifiers on URIs ", T. Berners-Lee, living document dated April 1997. Available at http://www.w3.org/DesignIssues/Fragment.
HTML40: " HTML 4.01 Specification ", D. Raggett, A. Le Hors, I. Jacobs, 24 December 1999. This W3C Recommendation is available at http://www.w3.org/TR/1999/REC-html401-19991224/.
P3P10: " The Platform for Privacy Preferences 1.0 (P3P1.0) Specification ", M. Marchiori, ed., 16 April 2002. This W3C Recommendation is available at http://www.w3.org/TR/2002/REC-P3P-20020416/.
RDF10: " Resource Description Framework (RDF) Model and Syntax Specification ", O. Lassila, R. R. Swick, eds., 22 February 1999. This W3C Recommendation is available at http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.
REST: " Representational State Transfer (REST) ", Chapter 5 of "Architectural Styles and the Design of Network-based Software Architectures", Doctoral Thesis of R. T. Fielding, 2000. Available at http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.
RFC1958: IETF " RFC 1958: Architectural Principles of the Internet ", B. Carpenter, June 1996. Available at http://www.ietf.org/rfc/rfc1958.txt.
RFC2141: IETF " RFC 2141: URN Syntax ", R. Moats, May 1997. Available at http://www.ietf.org/rfc/rfc2141.txt.
RFC2718: IETF " RFC 2718: Guidelines for new URL Schemes ", L. Masinter, H. Alvestrand, D. Zigmond, R. Petke, November 1999. Available at http://www.ietf.org/rfc/rfc2718.txt.
RFC3236: IETF " RFC 3236: The 'application/xhtml+xml' Media Type ", M. Baker, P. Stark, January 2002. Available at http://www.rfc-editor.org/rfc/rfc3236.txt.
SVG10: " Scalable Vector Graphics (SVG) 1.0 Specification ", J. Ferraiolo, ed., 4 Sep 2001. This W3C Recommendation is available at http://www.w3.org/TR/2001/REC-SVG-20010904/.
UniqueDNS: " IAB Technical Comment on the Unique DNS Root ", B. Carpenter, 27 Sep 1999. Available at http://www.icann.org/correspondence/iab-tech-comment-27sept99.htm.
XHTML10: " XHTML 1.0: The Extensible HyperText Markup Language: A Reformulation of HTML 4 in XML 1.0 ", S. Pemberton et al., 26 January 2000, revised 1 August 2002. Available at http://www.w3.org/TR/2002/REC-xhtml1-20020801/.
XLink10: " XML Linking Language (XLink) Version 1.0 ", S. DeRose, E. Maler, D. Orchard, 27 June 2001. This W3C Recommendation is available at http://www.w3.org/TR/2001/REC-xlink-20010627/.
XML10: " Extensible Markup Language (XML) 1.0 (Second Edition) ", T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler, 6 October 2000. This W3C Recommendation is available at http://www.w3.org/TR/2000/REC-xml-20001006.
XMLNS: " Namespaces in XML ", T. Bray, D. Hollander, A. Layman, 14 Jan 1999. This W3C Recommendation is available at http://www.w3.org/TR/1999/REC-xml-names-19990114/.
W3CPROCESS: " W3C Process Document ", 19 July 2001 Version. Available at http://www.w3.org/Consortium/Process-20010719/.

8. End notes

@@Text here on why SMTP part of Web@@ ( Note 1 context. )
When comparison is expected to be the sole or primary operation on a URI, it does not matter whether one has chosen a URI with or without a fragment identifier. However, when one expects to interact with a resource, there are some advantages to using a URI without a fragment identifier: only URIs without fragment identifiers work with intermediaries in the Web architecture (e.g., proxies) or with redirection (in HTTP, for example). ( Note 2 context. )
[ RFC2396 ] defines a URI reference to be either an absolute URI reference or a relative URI reference. The syntax for a relative URI reference is a shortened form of that for an absolute URI reference, where some prefix of the URI is missing and certain path components ("." and "..") have a special meaning when, and only when, interpreting a relative path. For example, in a document whose base URI is http://example/dir1/dir2/file1, the relative URI reference ../file2 is a shortened form of http://example/dir1/file2 and the relative URI reference #abc is a shortened form for http://example/dir1/dir2/file1#abc. ( Note 3 context. )
This principle dates back at least as far as Douglas Engelbart's seminal work on open hypertext systems; see section Every Object Addressable in [ Eng90 ]. ( Note 4 context. )
The title is somewhat misleading. It's not the URIs that change, it's what they identify. ( Note 5 context. )
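
The resolution described in Note 3 can be checked with a short sketch. It assumes Python's standard urllib.parse module, whose urljoin function resolves a relative URI reference against a base URI; the variable names are illustrative. The last lines split off a fragment identifier with urldefrag, which relates to Note 2: in HTTP, the part before the "#" is what a client sends to a server or proxy.

```python
from urllib.parse import urldefrag, urljoin

# Base URI taken from the example in Note 3.
base_uri = "http://example/dir1/dir2/file1"

# "../file2" climbs one directory from dir2 and names file2 in dir1.
resolved_path = urljoin(base_uri, "../file2")

# "#abc" keeps the whole base URI and appends the fragment identifier.
resolved_fragment = urljoin(base_uri, "#abc")

# Separate the URI proper from its fragment identifier.
without_fragment, fragment = urldefrag(resolved_fragment)
```

Here resolved_path is http://example/dir1/file2 and resolved_fragment is http://example/dir1/dir2/file1#abc, matching the two cases given in the note.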

9. Acknowledgments

The authors of this document are the participants of W3C's Technical Architecture Group: Tim Berners-Lee (Chair, W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton (Microsoft), Roy Fielding (Day Software), Chris Lilley (W3C), David Orchard (BEA Systems), Norman Walsh (Sun), and Stuart Williams (Hewlett-Packard).

The TAG thanks people for their thoughtful contributions on the TAG's public mailing list, www-tag ( archive ).

Table of Contents

1. Introduction

1.1. Audience of this document

1.2. Scope of this document

1.3. Summary of required properties, constraints, principles, and good practice notes

2. Identification and resources

2.1. Resources, URIs, and the shared information space

2.2. Operations on URIs

2.2.1. Comparing identifiers

2.2.2. Dereferencing a URI

2.2.3. Retrieving a representation

2.2.4. Consistent representations and persistence

2.2.5. Consistent use of URIs

2.3. URI Schemes

2.4. Fragment identifiers

2.5. Some generalities about URIs

3. Representations

3.1. Scope

3.2. Processing model

3.3. Format specification design guidelines

3.3.1. When to use XML

3.3.2. Content, Presentation, and Interaction

3.3.2.1. Content

3.3.2.2. Presentation

3.3.2.3. Interactivity

3.4. Ideas and issues

4. Interaction

4.1. HTTP and REST

4.2. Ideas and issues

5. General design principles

5.1. Information hiding

6. Glossary

7. References

7.1. Normative References

7.2. Non-Normative References

8. End notes

9. Acknowledgments