Architecture of the World Wide Web

Ian Jacobs, TAG Editor
World Wide Web Consortium

Meeting of xml.gov XML Working Group
21 January 2004, Washington, D.C.

These slides online at:
http://www.w3.org/2004/Talks/0121-ij-xmlgov/
(zip file).

W3C's Technical Architecture Group (TAG)
Overview of Architecture of the World Wide Web
Architecture Review of US Patent Office Web Site

W3C's Technical Architecture Group

W3C's Technical Architecture Group was chartered in 2001:

"to document and build consensus around principles of Web architecture and to interpret and clarify these principles when necessary."

Roles: write, coordinate, mediate

TAG Participants

TAG participants elected and appointed:

[Photo: the TAG, posing in front of the same motorbike as in Sep 2002, in Vancouver]

Why an Architecture Document?

To distill ten years of experience with the hypertext Web
To help developers of Web technologies avoid pitfalls
To provide guidance to users, site managers, software designers on promoting a robust Web
To build consensus around concepts and terms
To learn humility...

Community Brings Issues to TAG

23 Jan 2002: Potential architecture issue brought to TAG's attention on public list www-tag@w3.org (archive):
- HTTP GET deprecated in XForms Last Call Working Draft (background)
29 Jan 2002: TAG accepts issue whenToUseGet-7.

Teleconference and mailing list discussions ensue.

TAG Explores Problem Space

What makes HTTP GET important?

HTTP GET designed so that URI alone encodes interaction; allows linking
Safe/unsafe distinction in protocol enables user agent support
Requests with no side-effects enable caching of results:
- At global networking scale, require fast data exchange
- Use caching to improve performance (see HTTP/1.1 study)
- Design protocols that support caching
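
As a rough illustration of the caching point above, the following Python sketch keeps results keyed by URI, which only works because a safe GET has no side effects. `GetCache` and `fake_origin` are hypothetical names and the URI is a placeholder; real HTTP caches also apply expiry and validation rules that are omitted here.

```python
from typing import Callable, Dict


class GetCache:
    """Toy cache: a repeated safe GET for the same URI reuses the stored result."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._store: Dict[str, str] = {}
        self.origin_hits = 0  # how many times the origin was actually contacted

    def get(self, uri: str) -> str:
        if uri not in self._store:
            self.origin_hits += 1
            self._store[uri] = self._fetch(uri)
        return self._store[uri]


def fake_origin(uri: str) -> str:
    # Stand-in for a network fetch (assumption: no real HTTP is performed).
    return f"representation of {uri}"


cache = GetCache(fake_origin)
first = cache.get("http://example.org/weather/oaxaca")
second = cache.get("http://example.org/weather/oaxaca")
```

The second call is answered from the cache, so the origin is contacted only once.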

8 Apr 2002: Dan Connolly receives assignment to write strawman proposal. This evolves into a draft finding.

TAG Coordinates to Build Consensus

4 May 2002: Making connection to Web Services, David Orchard proposes SOAP HTTP GET Binding Version 0.1
3 Jun 2002: David Orchard receives assignment to ask the XMLP Working Group to include the GET method in the SOAP 1.2 (then a Working Draft) HTTP binding.
10 Jun 2002: TAG approves finding URIs, Addressability, and the use of HTTP GET.
10 Nov 2002: TAG announces agreement regarding use of GET.

Groups Document Consensus

24 Jun 2003: SOAP Version 1.2 Part 2: Adjuncts becomes a W3C Recommendation, with GET as part of HTTP binding (see section 4.1.2)
22 Sep 2003: TAG accepts revised finding URIs, Addressability, and the use of HTTP GET and POST
14 Oct 2003: XForms 1.0 becomes a W3C Recommendation, with support for HTTP GET.
9 Dec 2003: Architecture Document to Last Call with discussion of safe interactions in section 3.5

Ongoing: Marking safe operations in WSDL

What type of information is in the Architecture Document, Findings?

Properties we desire of the Web, and
Design choices to achieve them.

Example related to previous issue:

Principles, Constraints, Good Practice Notes

Agents do not incur obligations by retrieving a representation ("GET is safe").

Rationale

Benefits of URI addressability: linking, bookmarking, caching
Benefits of distinction in protocols of safe/unsafe: user agent alerts, caching

Stories and examples

Examples of safe (lookup) and unsafe (credit card purchase) interactions
Considerations for sensitive data
Practical considerations, ephemeral limitations

Architecture Tripod

Identification
Interaction
Representation

Identification I: Why URIs?

Value of common syntax for global identifiers:

"Great multiplicative power of reuse derives from the fact that all languages use URIs as identifiers: This allows things written in one language to refer to things defined in another language. The use of URIs allows a language to leverage the many forms of persistence, identity, and various forms of equivalence." -- URIs, Addressability, and the use of HTTP GET and POST

Power in the network effect.
Extension through URI schemes

Identification II: URI Usage

Comparison. Key to Semantic Web, caches
Dereference. Discussed below in Interactions

Due to global scope, URIs also used outside of Web protocols (e.g., as database keys).

Interaction I: Dereferencing a URI

Communication between agents involves URIs, messages, data
Dereference a URI, get back a representation of resource state
Representation consists of representation data and metadata (e.g., Internet Media Type).
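
The relationship between a URI and a representation can be sketched in Python as follows. This is only a model under stated assumptions: `Representation`, `dereference`, and the in-memory `_SERVER_STATE` table are hypothetical, the URI is a placeholder, and no network request is made.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Representation:
    media_type: str  # metadata, e.g. the Internet Media Type
    data: bytes      # representation data


# Hypothetical server-side state: one resource, identified by one URI.
_SERVER_STATE: Dict[str, Representation] = {
    "http://weather.example.com/oaxaca": Representation(
        media_type="text/html",
        data=b"<html><body>Oaxaca weather (placeholder)</body></html>",
    ),
}


def dereference(uri: str) -> Representation:
    """Return a representation of the current state of the identified resource."""
    return _SERVER_STATE[uri]


rep = dereference("http://weather.example.com/oaxaca")
```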

Interaction II: Dereferencing a URI (illustration)

A resource (Oaxaca Weather Info) is identified by a particular URI and is represented by pseudo-HTML content.

Interaction III: Managing Representations

Internet Media Type
Representations evolve over time as resource, technology evolves
Consistency in representation increases trust in URI
Content negotiation facilitates evolution
Fragment identifier semantics and content negotiation
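
A minimal sketch of content negotiation, assuming a server that holds several variants of one resource and a client that sends an Accept header with optional q-values. `parse_accept`, `negotiate`, and the `variants` table are hypothetical; wildcards such as `*/*` are deliberately not handled.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header: str, available: Dict[str, bytes]) -> Optional[Tuple[str, bytes]]:
    """Pick the most preferred media type the server can actually provide."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type, available[media_type]
    return None


variants = {"text/html": b"<p>Sunny</p>", "text/plain": b"Sunny"}
chosen = negotiate(
    "application/xhtml+xml, text/plain;q=0.5, text/html;q=0.9", variants
)
```

Because the same URI can serve new formats to agents that ask for them, a provider can add variants without changing the identifier.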

Interaction IV: Issues Raised by Interaction

Safe, unsafe interactions
Sensitive data
Access control independent of identification (cf. Deep Linking Finding)
Metadata from representation provider authoritative. Other behavior ok, but requires transparency for user.

Representation: Data formats

Data formats used to organize representation data
Data format considerations: binary v. text, extensibility, versioning, composition, modularization
Hypertext
XML-based data formats: links, namespaces, qnames, media types

Architecture Review of US Patent Office Web Site

Review of the United States Patent and Trademark Office Web site revealed:

HTTP GET used for database lookup (good)
HTTP GET used for unsafe interactions (not good)
URI for patent is actually URI for search (not optimal)
POST used to protect sensitive login data (design choice)

HTTP GET used for database lookup

HTML "GET" form used for database lookup:

   <form action="/netacgi/nph-Parser" method="GET">

Use GET for queries, searches, database lookups.
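
What submitting such a GET form amounts to can be sketched in Python: the query is encoded entirely in the URI, so the lookup can be linked and bookmarked. The base URI `http://example.org/lookup` and the parameter name `q` are placeholders, not the patent site's actual interface.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def lookup_uri(base: str, **params: str) -> str:
    """Build the URI a GET form submission would request."""
    return f"{base}?{urlencode(params)}"


uri = lookup_uri("http://example.org/lookup", q="hypertext systems")

# Anyone holding the URI can recover the same query from it.
recovered = parse_qs(urlsplit(uri).query)
```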

HTTP GET used for unsafe interactions (not good)

Modifying the state of the shopping cart is unsafe since it produces a side effect:

Search with keyword "hypertext"
Select Patent 6,678,889: "Systems, methods and computer program ...."
Add to cart, view "Quantity" (1)
Hit back button, add to cart, view "Quantity" (2)

"Add to Cart" an HTML link:

   <a href=".../AddToShoppingCart?docNumber=6,678,889...">...

I cannot link to the shopping cart from this slide; a search engine or pre-fetching agent might increment the counter (cf. SVG 1.2, section 11.8).

In HTML, use "POST" form for unsafe operations.
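
The safe/unsafe split can be sketched with a toy server-side handler in Python. `CartResource` is a hypothetical name and this is not how the reviewed site is implemented; the point is only that GET reads state while POST is the method allowed to change it.

```python
from typing import Dict


class CartResource:
    """Toy cart: GET only reads the quantity, POST is the only way to change it."""

    def __init__(self) -> None:
        self._quantities: Dict[str, int] = {}

    def handle(self, method: str, doc_number: str) -> int:
        if method == "GET":
            # Safe: repeating this request never changes the cart.
            return self._quantities.get(doc_number, 0)
        if method == "POST":
            # Unsafe: the user asked for a change and is accountable for it.
            self._quantities[doc_number] = self._quantities.get(doc_number, 0) + 1
            return self._quantities[doc_number]
        raise ValueError(f"unsupported method: {method}")


cart = CartResource()
cart.handle("GET", "6678889")  # e.g. a search engine following a link
cart.handle("GET", "6678889")  # e.g. a pre-fetching agent
after_prefetch = cart.handle("GET", "6678889")
after_purchase = cart.handle("POST", "6678889")
```

With this split, crawlers and the back button can repeat GET requests freely without incrementing the quantity.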

URI for patent is actually URI for search (not optimal)

What might a URI for a patent look like?

   http://www.uspto.gov/patents/p6678889

Note that this is globally unambiguous; better than "6678889"
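
A small Python sketch of minting such an identifier from a patent number. The URI pattern is the hypothetical one shown on this slide, not an actual USPTO address, and `patent_uri` is an illustrative helper.

```python
def patent_uri(number: str) -> str:
    """Map a patent number, with or without commas, to one canonical URI."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits in patent number: {number!r}")
    # Hypothetical URI space taken from the slide's example.
    return f"http://www.uspto.gov/patents/p{digits}"


with_commas = patent_uri("6,678,889")
without_commas = patent_uri("6678889")
```

Both spellings of the number yield the same string, so agents can compare the identifiers character by character.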

Search produced this URI for search on "hypertext":

   http://patft.uspto.gov/...s1=hypertext&OS=hypertext...

Search produced this URI for search by patent number 6,678,889:

   http://patft.uspto.gov/...s1=6,678,889.WKU....

Why are these URIs different if this is the same patent?

Cost of Arbitrarily Different URIs

At first, I thought these URIs were arbitrarily different URIs for the same resource. If so, machines cannot compare reliably, so:

Interferes with caching
Semantic Web does not work
Site management more complex

Identify Results of Search, not Search

Resource only indirectly identified as query result.

I expect to bookmark the result of a search, not the search itself
Search might return different results another day; I want to refer to the patent.

POST used to protect sensitive login data (design choice)

GET allows URIs (bookmarking, back button), but we don't want sensitive data in URI.
Choices include:
- GET with HTTP Basic Authentication over SSL: Sensitive data in HTTP headers, so allows bookmarking. User agent manages passwords.
- POST over SSL
However, SSL has a cost as well
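
To illustrate the first choice, this Python sketch shows where the credentials travel under HTTP Basic Authentication: in a request header, not in the URI. `basic_auth_header` is a hypothetical helper and the user name and password are sample values. Base64 is an encoding, not encryption, which is why the slide pairs this option with SSL.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# The URI stays free of sensitive data, so it remains safe to bookmark.
uri = "https://example.org/account"
headers = basic_auth_header("Aladdin", "open sesame")
```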

Think about these architecture issues, tradeoffs during design! See URIs, Addressability, and the use of HTTP GET and POST

Future work

Issues the TAG has not resolved for First Edition
Other systems that make use of URI space: Web Services, Semantic Web
Internationalized URIs (IRIEverywhere-27)
XML canonicalization
Binary XML (binaryXML-30)
Mixing XML Namespaces (mixedUIXMLNamespace)

Questions

Review period for Architecture of the World Wide Web open until 5 March 2004; see Call for Review.
