Web Architecture 101 - Keio SFC

Lecture Notes. April 2007.

Status: Work in Progress. Companion: Slides (Part 1) (PDF, 16 MB), Slides (Part 2) (PDF, 12 MB)


Architecture (noun) 建築学
• The art or practice of designing and constructing buildings.
• The style in which a building is designed or constructed, esp. with regard to a specific period, place, or culture : Victorian architecture.
• The complex or carefully designed structure of something : the chemical architecture of the human brain.
• The conceptual structure and logical organization of a computer or computer-based system : a client/server architecture.

Oxford American Dictionary

Architecture is the art of designing and building useful and usable things. This can refer to buildings or objects. Architecture is not technology for the sake of technology, nor design for the sake of design: a building needs a roof to protect people from rain and snow; it needs foundations to avoid collapsing; it needs doors if the inhabitants need or want privacy.

Likewise, the Architecture of the World Wide Web is not arbitrary. It comes from some specific needs (organizing and enriching information into a cohesive space) and some ideas to answer these needs.

This is why we will not be talking a lot about technology in this lecture. The What and How come second. First comes the Why.

Names and Identifiers

And if they were able to converse with one another, do you not think that they would be in the habit of giving names to the objects they saw before them?

Republic, Plato

Naming is an important feature of language. It allows us to identify the subject or object of what is being talked about. It allows us to refer to objects or subjects, even when they are not present.

Naming usually only works in a local context (e.g., the first name of a person is enough for a conversation within a small group, but ambiguous in wider contexts).

When a name permits unambiguous reference to a thing, regardless of context, we can call it a Universal Identifier.

Examples of identifiers: Bar Code, ISBN, SSN...

Examples of identifiers given by students

Identifier Context Issues
Cow tracking number
Fingerprint
Telephone number
Geographical coordinates
Car plate numbers
Student ID
Chemical formula
Unix login name

Identifying Concepts

Does a name (or an identifier) refer to a physical thing? When I say "this is a chair" (pointing at an object in the room), does "chair" identify the physical object? Not quite: "chair" can also be used to refer to any other object in the room that has legs and on which I can sit. This is because "chair" refers not to a physical object, but to the concept of a "seat for one person, typically with a back and four legs".

…but the word "chair" also refers, indirectly, to the physical object made of wood. Let us look a little more closely at the distinction between the concept and its object(s).

From Concept to Physical Object

Read Plato's allegory of the cave (or "イデア論", the theory of Ideas, in Japanese). A concept is the intellectual idea. There can be many equivalent physical objects for a given concept.

For example, "Fruit" is a concept. The specific fruits I eat (e.g., this strawberry, that apple) are physical objects which relate to the concept of "Fruit". If I say "pick a fruit from the basket", not everyone will choose the same thing. But they will all be correct: there are several physical "fruit" objects that are equivalent physical representations of the concept "fruit in the basket".

On the Web

  1. "Concepts" are called "Resources"
  2. "Identifiers" are called "URIs"
  3. "Physical Objects" are called "Representations"

… everything else that has been said so far is still valid, now translated into the words of the Web:

Axioms of Web Architecture: URIs

Some Axioms of Web Architecture regarding URIs:

Universality

Any resource anywhere can be given a URI. Reciprocally, when I want to refer to a resource on the Web, I should be using a URI.

Universal Resource Identifiers -- Axioms of Web Architecture (1996), Tim Berners-Lee

Global Scope

It doesn't matter to whom or where you specify that URI, it will have the same meaning.

Universal Resource Identifiers -- Axioms of Web Architecture (1996), Tim Berners-Lee

Sameness

a URI will repeatably refer to "the same" thing

Universal Resource Identifiers -- Axioms of Web Architecture (1996), Tim Berners-Lee

Opacity

It is tempting to guess the nature of a resource by inspection of a URI that identifies it. However, the Web is designed so that agents communicate resource information state through representations, not identifiers. In general, one cannot determine the type of a resource representation by inspecting a URI for that resource.

Architecture of the World Wide Web, Volume One
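
As a minimal sketch of this principle, the Python below decides what kind of representation it received from the Content-Type header of a response, never from the shape of the URI. The URIs and header values are made-up examples, and media_type is a hypothetical helper written for this illustration.

```python
from email.message import Message


def media_type(content_type_header: str) -> str:
    """Return the bare media type (e.g. 'text/html') from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


# Hypothetical responses: the URI "looks like" one thing, the server says another.
responses = {
    "http://example.org/report.html": "application/pdf",
    "http://example.org/photo": "image/png",
    "http://example.org/data.php": "text/html; charset=utf-8",
}

for uri, header in responses.items():
    # Only the header is consulted; the URI is treated as an opaque string.
    print(uri, "->", media_type(header))
```

A URI ending in ".html" may well identify a PDF document; an agent that guessed from the suffix would get it wrong.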

Benefits of using URIs as Identifiers

There are substantial benefits to participating in the existing network of URIs, including linking, bookmarking, caching, and indexing by search engines, and there are substantial costs to creating a new identification system that has the same properties as URIs.

Architecture of the World Wide Web, Volume One

Hypertext and the Web (a very short History)

See A Little History of the World Wide Web. Tim Berners-Lee made his proposal for an Information Management system at CERN in 1989, and launched the WorldWideWeb project in 1991. He was building on technologies and concepts developed since (at least) 1945.

Pioneers: Vannevar Bush (As We May Think, 1945), Ted Nelson (Literary Machines, 1981).

Key concepts: HyperText, HyperMedia. Linked data.

This is why a "web" of notes with links (like references) between them is far more useful than a fixed hierarchical system. When describing a complex system, many people resort to diagrams with circles and arrows. Circles and arrows leave one free to describe the interrelationships between things in a way that tables, for example, do not. The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything.

Information Management: A Proposal (1989) Tim Berners-Lee, CERN

Document Formats

Formats enabling hyperlinking

Literary works have long used devices for non-linear reading, such as footnotes:

So, that notion of hypertext seemed to me immediately obvious because footnotes were already the ideas wriggling, struggling to get free, like a cat trying to get out of your arms.

BBC Interview, Ted Nelson

An example of non-linear text: the Talmud.

On the web, non-linear data is Linked Data:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.

Design Issues: Linked Data (2006), Tim Berners-Lee
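
The four rules can be illustrated with a toy model. The sketch below is not a real HTTP client: the "web" is an in-memory dictionary, look_up stands in for an HTTP lookup, and every URI and description is invented for the example. It only shows how following links lets an agent discover more things from a single starting URI.

```python
from collections import deque

# A toy "web": each URI maps to some useful information plus links to other URIs.
# All URIs and descriptions here are invented for illustration.
WEB = {
    "http://example.org/people/alice": {
        "info": "Alice, a student",
        "links": ["http://example.org/schools/sfc"],
    },
    "http://example.org/schools/sfc": {
        "info": "A university campus",
        "links": [
            "http://example.org/places/fujisawa",
            "http://example.org/people/alice",
        ],
    },
    "http://example.org/places/fujisawa": {
        "info": "A city",
        "links": [],
    },
}


def look_up(uri):
    """Stand-in for an HTTP lookup: return the description for a URI, or None."""
    return WEB.get(uri)


def discover(start_uri):
    """Follow links breadth-first and return every URI reachable from start_uri."""
    seen = [start_uri]
    queue = deque([start_uri])
    while queue:
        description = look_up(queue.popleft())
        if description is None:
            continue  # a dangling link: nothing to follow
        for linked in description["links"]:
            if linked not in seen:
                seen.append(linked)
                queue.append(linked)
    return seen


print(discover("http://example.org/people/alice"))
```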

Orthogonality, separation of Style and Content

Take the example of a video with sound, image and subtitles. Can one of them be changed without touching the others?

This depends on the format used. Building the video with orthogonal technologies ensures that independent features stay separate, while separating form (style) from content makes both components easier to maintain and reuse.

The Web is a heterogeneous environment where a wide variety of agents provide access to content to users with a wide variety of capabilities. It is good practice for authors to create content that can reach the widest possible audience, including users with graphical desktop computers, hand-held devices and mobile phones, users with disabilities who may require speech synthesizers, and devices not yet imagined. Furthermore, authors cannot predict in some cases how an agent will display or process their content. Experience shows that the separation of content, presentation, and interaction promotes the reuse and device-independence of content; this follows from the principle of orthogonal specifications

Architecture of the World Wide Web, Volume One

When two specifications are orthogonal, one may change one without requiring changes to the other, even if one has dependencies on the other.

Architecture of the World Wide Web, Volume One

Interacting with the Information Space

First we defined resources. Little islands of information, isolated. We gave them Identifiers, Universal Identifiers so that any other resource could know about them, talk about them. Then came hyperlinking, a way to refer to the other resources using the identifiers: we could think of hyperlinking as defining routes between the islands. Defining a route is a good thing, but we still need boats to follow the route: we still need a technology to actually interact with this information space of linked resources.

Enter HTTP.

HTTP: The Web's Toolbox

HTTP (short for Hypertext Transfer Protocol) is a client/server protocol for the retrieval and manipulation of resources on the Web.

At the beginning of the lecture we likened URIs to nouns in a natural language. HTTP provides the verbs.

The main HTTP verbs

HTTP Method Description
POST Send data for processing (or storage)
GET Retrieve a representation of a resource
PUT Update (or create) a resource
DELETE Delete a resource

PUT

The PUT HTTP Verb requests that some content be stored at the given URI.

POST

The POST HTTP Verb is used for actions requiring processing by the server. It is a way for the client to pass some data to the server, and say "here, take this, do your job". The job may involve storing data but it can be any kind of processing.

Wait... PUT or POST?

This is a tricky distinction, often misunderstood. PUT and POST do look alike, because POST can be used to store data at the given URI. The HTTP specification could probably disambiguate things further. What it says is:

The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource.

HTTP 1.1 Specification

In other words, the important distinction is:

  1. When using PUT, I am clearly storing data under the given URI
  2. When using POST, I may be storing data, but this storing of data may be anywhere, not just in the resource I am targeting. It all depends on the process behind the POST action.

Elliotte Rusty Harold also wrote an excellent note on POST vs. PUT, well worth reading.
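
The distinction can be sketched with a toy in-memory model. This is not an HTTP server: store, put and post are hypothetical names, and the behaviour chosen for post (appending to a collection under a server-chosen URI) is only one of many kinds of processing a POST target might perform.

```python
import itertools

store = {}                 # URI -> stored data; stands in for the server's resources
_ids = itertools.count(1)  # source of fresh identifiers for POSTed items


def put(uri, data):
    """PUT: store the data under exactly the URI the client named."""
    store[uri] = data
    return uri


def post(uri, data):
    """POST: hand the data to the resource at `uri`, which decides what to do.

    In this sketch the processing is "append to a collection": the server
    picks a new URI under the target. Other processes might store nothing at all.
    """
    new_uri = f"{uri}/{next(_ids)}"
    store[new_uri] = data
    return new_uri


put("/notes/today", "draft")
put("/notes/today", "draft")      # repeating the PUT changes nothing
first = post("/notes", "hello")
second = post("/notes", "hello")  # repeating the POST creates a second item
print(sorted(store))
```

With PUT the client knows where the data ends up; with POST it only learns the outcome from the server's answer.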

GET

The GET HTTP Verb is used to retrieve a representation of a resource on the Web.

GET comes with parameters (the Accept* Headers) which the server can use to determine the best representation of the resource, if several are available. This process is called Content Negotiation, and it can be very useful for serving equivalent representations in different formats (Format Negotiation) or languages (Language Negotiation).
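
A simplified sketch of Format Negotiation follows. It assumes the server holds a fixed list of media types for one resource and only looks at the Accept header and its q values; parse_accept, quality and negotiate are hypothetical helpers, and real servers handle more cases (other Accept* headers, media type parameters, server-side preferences).

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range matching media_type."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, available):
    """Pick the available media type the client prefers, or None if none fits."""
    if not available:
        return None
    ranges = parse_accept(accept_header)
    best = max(available, key=lambda t: quality(t, ranges))
    return best if quality(best, ranges) > 0 else None


# One resource, two hypothetical representations of it.
representations = {
    "text/html": "<p>Hello</p>",
    "text/plain": "Hello",
}

chosen = negotiate("text/plain;q=0.5, text/html, */*;q=0.1", list(representations))
print(chosen, "->", representations[chosen])
```

The same URI identifies the same resource in every case; only the representation sent back changes.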

One important thing to remember is that GET operations must be both safe and idempotent (basically: they must not have any side effects). Conversely, any resource retrieval without side effects should use GET.
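
The table below, written as a Python dictionary, summarizes which of the methods discussed in this lecture are safe and which are idempotent according to the HTTP specification. The helpers can_prefetch and can_retry are hypothetical names, added only to show what each property allows a client to do.

```python
# Safe: the request does not ask for any change on the server.
# Idempotent: repeating the request has the same effect as sending it once.
METHOD_PROPERTIES = {
    "GET":    {"safe": True,  "idempotent": True},
    "HEAD":   {"safe": True,  "idempotent": True},
    "PUT":    {"safe": False, "idempotent": True},
    "DELETE": {"safe": False, "idempotent": True},
    "POST":   {"safe": False, "idempotent": False},
}


def can_prefetch(method):
    """A crawler or cache may issue a request on its own only if the method is safe."""
    return METHOD_PROPERTIES[method.upper()]["safe"]


def can_retry(method):
    """A client may blindly repeat a request only if the method is idempotent."""
    return METHOD_PROPERTIES[method.upper()]["idempotent"]


for method in METHOD_PROPERTIES:
    print(method, "prefetch:", can_prefetch(method), "retry:", can_retry(method))
```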

See also: URIs, Addressability, and the use of HTTP GET and POST by the W3C Technical Architecture Group.

GET's little brother is HEAD. The HEAD HTTP Verb is used to retrieve only information about a given resource and representation, but not the content of the representation itself.

DELETE

The DELETE HTTP Verb is used to request the deletion of the resource from the Web. It is important to note that this is not equivalent to removing a file on the server.

... and more

There are a few other verbs. See the specifications of HTTP Methods (or verbs) for more. HTTP Verbs have also been extended by WebDAV.

REST: an architectural style based on the power of HTTP

Karl Dubost, Olivier Thereaux. W3C / Keio University