DRAFT: Introduction to Web Architecture

1 April 2002

This version: http://www.w3.org/2001/tag/2002/0330-intro
Superseded by: http://www.w3.org/2001/tag/2002/0508-intro
Editor: Ian Jacobs, W3C

Abstract

This is an introduction to World Wide Web ("Web" from here on) Architecture, as seen by the W3C's Technical Architecture Group (TAG). This introduction has two purposes: to give the reader a general sense of what the TAG means by Web Architecture, and to call out some of the principles regarded as fundamental to the success of the Web. This introduction does not go into detail about any one piece of Web Architecture. That will be the TAG's primary role going forward: to write down the details and keep watering and pruning the model as the Web grows.

Status of this document

This document has been superseded. See next version.

This document has been developed for discussion by the W3C Technical Architecture Group.

This document is the work of the editor. It incorporates the work of the other TAG participants by sewing together pieces they have written. It is a draft with no official standing. It does not necessarily represent the consensus opinion of the TAG.

Comments may be directed to the W3C TAG mailing list www-tag@w3.org (archive).

Publication of this document by W3C indicates no endorsement by W3C.

Why Web Architecture?

By any measure, the Web has been a successful instrument for human communication. Broadly speaking, there are a number of reasons for this success, including these important ones:

It is easy to use.
It can grow (scale) rapidly since it is not centrally controlled.
It allows people to share just about whatever they want (poetry, commerce, music, and so on).

The purpose of Web Architecture is to understand why the Web is easy to use, why it can grow, and why it is so flexible as a communications tool. The Web is not always easy to use (for example, for some people with disabilities, or because it's hard to find what you're looking for, or because your browser acts in an unexpected manner). Web Architecture also helps us understand why the Web infrastructure sometimes does not meet our needs, or why we lose confidence in it.

One important principle of Web Architecture is that in order to allow maximum flexibility in how the Web works, there should be as few rules as possible that must be obeyed in order for it to work. However, there have to be some rules to ensure that people and computers can communicate at all (interoperate). One reason to write down this model of Web Architecture is to provide guidance to organizations (W3C and others) developing technologies, so that these technologies build on principles known to promote a healthy Web.

You might already be asking yourself, "Shouldn't the Web Architecture have been worked out in advance? Isn't it a little late to be doing this?" Web Architecture principles have been around since the beginning of the Web (hence its success so far). But the Web continues to evolve in scope, and the means of access change (e.g., increased access through mobile devices). Confirming what we believe the Web to be, and planning for its expansion to the extent that we can, is therefore an ongoing process.

The Web is far from perfect. And the Web Architecture that will be described by the TAG is neither perfect nor complete. But just as the Web is very useful though imperfect, this Web Architecture is expected to be a useful foundation to drive consensus on how to build a better Web.

What is the Web?

A definition of the Web that motivates the particular Web Architecture that is described here is: "The Web is a shared information space." This definition is convenient because looking at the Web from afar, and over a long period of time, gives the illusion of slow change. When URIs break, or a Web site that sold books one day does something unrelated the next, or an author tells you that you need a special tool to use their site, or a page takes too long to download, your confidence in the Web falls a little. Shared understanding relies on stability and ongoing confirmation. Web Architecture is the set of rules followed by Web agents (browsers, multimedia players, and other clients and servers that exchange information) that results in the large-scale effect of a shared information space.

We call the things we share on the Web "resources". Anything that can be named can be a Web resource: a person, idea, dream, or physical object. Chapter one of this document explains how we identify Web resources with Uniform Resource Identifiers (URIs) and some properties of URIs that help the Web scale. The URI is the most fundamental piece of the Web Architecture, which is perhaps why it has aroused such lengthy debates (names are important!) and why it must be thoroughly explained and understood.

URIs identify resources that change over time. If you request the W3C home page today and again next week, you are likely to get two different results since W3C announces news there several times a week. In one sense, this resource is still the W3C home page; that is stable. But one request for the home page may result in a different representation than a later request. Note that fewer changes imply more stability, but this is a resource we want to change; otherwise it would be boring. Some resources should change more often (e.g., weather resources) and some less (e.g., books written long ago).

Web agents communicate about resources by exchanging representations of them -- we call these documents in a broad sense -- and messages about these documents. Resources and representations are different, but sometimes choice of language leads to confusion that they are the same thing (just as the expression "the phone book" can mean an evolving set of phone numbers for a city but also a specific frozen copy of the phone book that's in the kitchen drawer).
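
The distinction can be made concrete with a small sketch. The following Python is illustrative only: the class names `Resource` and `Representation`, the `represent` method, and the news strings are assumptions made for this example, not part of any Web specification. It models a resource whose identifier stays fixed while successive requests yield different representations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Representation:
    """A frozen snapshot of a resource's state (the copy in the kitchen drawer)."""
    retrieved_on: date
    body: str


@dataclass
class Resource:
    """Something named by a URI; its state may evolve while the URI stays the same."""
    uri: str
    _state: str = ""

    def update(self, new_state: str) -> None:
        # The resource changes; its identifier does not.
        self._state = new_state

    def represent(self, on: date) -> Representation:
        # Each request yields a snapshot of the state at that moment.
        return Representation(retrieved_on=on, body=self._state)


# Hypothetical content; only the idea of a changing home page comes from the text.
home = Resource("http://www.w3.org/")
home.update("News: item A")
first = home.represent(date(2002, 4, 1))

home.update("News: item B")
second = home.represent(date(2002, 4, 8))

print(home.uri)                    # the same identifier both times
print(first.body != second.body)   # the two representations differ
```

The earlier snapshot is unaffected by later updates to the resource, which is the sense in which a representation is "frozen" while the resource continues to evolve.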

We construct representations of resources using data formats (such as HTML) and agents exchange these representations and messages about them according to well-defined protocols (such as HTTP). A representation consists of data, data describing the data (which we call metadata), and, on occasion, metadata to describe the metadata. Chapter two discusses:

a nonexclusive set of data formats designed for interchange between Web agents. This includes several formats used in isolation or in combination (e.g., XHTML, PNG, XLink, RDF, SMIL animation, Ruby), as well as technologies for designing new formats (XML, Namespaces, DOM). Some of these formats are sufficiently ubiquitous that their semantics may be considered part of the Web Architecture: examples include the elements inside HTML's "head" element and the prescribed error-handling behavior for XML processors. The data format of a representation is known as a media type (or MIME type, from the title of the defining specification).
a small and nonexclusive set of protocols for interchanging information between agents: HTTP comes to mind first, but SMTP and others are also important. Several of these protocols share a reliance on the MIME metadata/packaging system.
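
As a rough illustration of a representation as data plus metadata, the sketch below uses Python's standard `email` package (an implementation of the MIME conventions) to read a media type and character encoding from a header and then interpret the bytes accordingly. The function name `interpret`, the sample header, and the sample body are assumptions made for this example; real agents handle many more cases.

```python
from email.message import Message


def interpret(headers, body):
    """Combine metadata (headers) with data (bytes) to recover readable text."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_content_type() falls back to text/plain when no Content-Type is given.
    media_type = msg.get_content_type()
    # Assumption for this sketch: default to UTF-8 when no charset is declared.
    charset = msg.get_content_charset() or "utf-8"
    return {
        "media_type": media_type,
        "charset": charset,
        "text": body.decode(charset),
    }


# Hypothetical representation: the metadata says how the bytes are to be read.
headers = {"Content-Type": "text/html; charset=ISO-8859-1"}
body = "<title>Café</title>".encode("iso-8859-1")

result = interpret(headers, body)
print(result["media_type"], result["charset"])
```

Without the metadata, the same bytes could be decoded differently by different agents, which is one reason the media type travels with the data.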

Perhaps most importantly, chapter two explains how we actually succeed in understanding one another through these data formats and protocols.

Chapter three returns to the question of how we make the Web look fairly static even as it teems with innumerable messages and actions of document creation, modification, and removal. Users benefit from a Web Architecture that hides details of operation under the hood. For instance, HTML was designed so that users would not have to see URIs; they are to be hidden behind an interface of telling link text. To explain what constraints on URIs, protocols, and data formats enable this mass hallucination that the Web is stable, we use a model called REST, for Representational State Transfer.

Chapters one, two, and three focus on the Web as a quasi-static distribution of the state of resources. In chapter three, messages go on "under the hood" and we use REST to hide the implementation details from our view of the Web. In chapter four, we consider those messages that we want to rise to the surface, to be part of the user's Web experience, and how this affects the Web Architecture. While in chapter two we describe the interpretation of documents, in chapter four we describe the interpretation of messages.

Chapter 1: What does a URI identify?

@@This section and others to be integrated soon@@

Chapter 2: What does a document mean?

Chapter 3: The illusion of a stable Web (REST)

Chapter 4: What does a message mean?

Ian Jacobs
Last modified $Date: 2002/05/15 21:07:52 $