Web Architecture

Dan Connolly
Park University Department of Information and Computer Science
26 Oct 2005

with thanks to Ian Jacobs for his xml.gov presentation

postscript: The Park ACM club is hosting an archived video stream of the presentation.

$Revision: 1.4 $ of $Date: 2005/10/28 17:30:21 $

Overview

  1. W3C's Technical Architecture Group (TAG)
  2. Overview of Architecture of the World Wide Web
  3. Architecture Review of US Patent Office Web Site

W3C's Technical Architecture Group

W3C's Technical Architecture Group was chartered in 2001:

"to document and build consensus around principles of Web architecture and to interpret and clarify these principles when necessary."

Roles: write, coordinate, mediate

TAG Participants

TAG participants elected and appointed at our September 2002 meeting in Vancouver:

TAG, posing in front of the same motorbike as in Sep 2002, in Vancouver
  1. Norman Walsh, Sun Microsystems. DocBook guy
  2. Paul Cotton, Microsoft. Full-text search industry
  3. Chris Lilley, W3C. SVG lead
  4. David Orchard, BEA. Web Services
  5. Roy Fielding. HTTP spec editor. REST thesis
  6. Tim Berners-Lee, W3C Director
  7. Stuart Williams, HP
  8. Dan Connolly, W3C
  9. Tim Bray, Sun. XML 1.0 co-editor
  10. Ian Jacobs, W3C. Tech writer

TAG Participants, all dressed up

TAG members in ties!
  1. Henry Thompson, W3C and U. Edinburgh. XML Schema editor
  2. Norman Walsh
  3. David Orchard
  4. Vincent Quint, INRIA
  5. Tim Berners-Lee
  6. Dan Connolly
  7. Roy Fielding
  8. Noah Mendelsohn, IBM

Why an Architecture Document?

Community Brings Issues to TAG

Teleconference and mailing list discussions ensue.

TAG Explores Problem Space

What makes HTTP GET important?

8 Apr 2002: Dan Connolly receives an assignment to write a strawman proposal, which evolves into a draft finding.

TAG Coordinates to Build Consensus

Groups Document Consensus

Ongoing: Marking safe operations in WSDL

Negotiation tactics

It's a REC! Party!

Connollys on the Plaza

What types of information are in the Architecture Document and the Findings?

Example related to the previous issue:

Principles, Constraints, Good Practice Notes
  • Agents do not incur obligations by retrieving a representation ("GET is safe").
Rationale
  • Benefits of URI addressability: linking, bookmarking, caching
  • Benefits of distinction in protocols of safe/unsafe: user agent alerts, caching
Stories and examples
  • Examples of safe (lookup) and unsafe (credit card purchase) interactions
  • Considerations for sensitive data
  • Practical considerations, ephemeral limitations
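The "GET is safe" principle can be made concrete as a check an agent runs before acting on its own initiative. This is a minimal sketch, assuming the list of safe methods from the current HTTP specification (RFC 9110, section 9.2.1: GET, HEAD, OPTIONS, TRACE); the helper name `is_safe` is hypothetical.

```python
# Methods that HTTP defines as safe: a client using them does not request,
# and is not held responsible for, any change of state on the server.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def is_safe(method: str) -> bool:
    """Hypothetical helper: True if a crawler or prefetcher may send this method
    without the user's say-so, because the method is defined as read-only."""
    return method.upper() in SAFE_METHODS


print(is_safe("GET"))   # True: a lookup, e.g. today's weather
print(is_safe("POST"))  # False: e.g. a credit card purchase
```

Method names in HTTP are case-sensitive, so the `upper()` call is a convenience for this sketch rather than protocol behavior.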

Architecture Tripod

  1. Identification
  2. Interaction
  3. Representation

Identification I: Why URIs?

Value of common syntax for global identifiers:

"Great multiplicative power of reuse derives from the fact that all languages use URIs as identifiers: This allows things written in one language to refer to things defined in another language. The use of URIs allows a language to leverage the many forms of persistence, identity, and various forms of equivalence." -- URIs, Addressability, and the use of HTTP GET and POST
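One practical payoff of a common syntax is that a single generic parser handles identifiers from any scheme. The sketch below uses Python's standard `urllib.parse.urlsplit`; the sample URIs other than the Architecture Document's own address are illustrative assumptions.

```python
from urllib.parse import urlsplit


def components(uri: str) -> tuple:
    """Split any URI into scheme, authority and path using the generic syntax."""
    parts = urlsplit(uri)
    return (parts.scheme, parts.netloc, parts.path)


# The same parser works across schemes; only the scheme-specific parts differ.
for uri in [
    "http://www.w3.org/TR/webarch/",
    "mailto:someone@example.org",        # illustrative address
    "ftp://ftp.example.org/pub/a.txt",   # illustrative file
]:
    print(components(uri))
```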

Identification II: URI Usage

Because of their global scope, URIs are also used outside Web protocols (e.g., as database keys).
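As a small illustration of URIs as database keys, here is a sketch using Python's built-in `sqlite3` with an in-memory database; the table name and columns are hypothetical. Because the key is globally scoped, rows from different databases that mention the same URI can be matched without a mapping table.

```python
import sqlite3

# Hypothetical table keyed by URI instead of a locally assigned integer ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (uri TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "INSERT INTO resource VALUES (?, ?)",
    ("http://www.w3.org/TR/webarch/",
     "Architecture of the World Wide Web, Volume One"),
)

# Any other system that knows the URI can look the record up directly.
row = conn.execute(
    "SELECT title FROM resource WHERE uri = ?",
    ("http://www.w3.org/TR/webarch/",),
).fetchone()
print(row[0])
```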

Interaction I: Dereferencing a URI

Interaction II: Dereferencing a URI (illustration)

A resource (Oaxaca Weather Info) is identified by a particular URI and is represented by pseudo-HTML content.
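Dereferencing separates three things: the URI, the resource it identifies, and the representation that comes back. The sketch below models the Web as an in-memory table so it runs offline; the URI `http://weather.example.com/oaxaca`, the content, and the function names are hypothetical stand-ins, not a real network client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    media_type: str  # e.g. "text/html"
    content: str     # the bytes a server would send, here as text


# Hypothetical stand-in for the Web: what a server would return for each URI.
WEB = {
    "http://weather.example.com/oaxaca": Representation(
        "text/html",
        "<html><head><title>Oaxaca Weather Info</title></head>"
        "<body>Weather report (illustrative)</body></html>",
    ),
}


def dereference(uri: str) -> Representation:
    """Follow a URI to a representation of the resource it identifies."""
    try:
        return WEB[uri]
    except KeyError:
        raise LookupError(f"404 Not Found: {uri}") from None


rep = dereference("http://weather.example.com/oaxaca")
print(rep.media_type)  # text/html
```

The agent never receives "the weather" itself, only a representation of it in some data format.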

Interaction III: Managing Representations

Interaction IV: Issues Raised by Interaction

Representation: Data formats

Architecture Review of US Patent Office Web Site

A review of the United States Patent and Trademark Office Web site revealed:

  1. HTTP GET used for database lookup (good)
  2. HTTP GET used for unsafe interactions (not good)
  3. URI for patent is actually URI for search (not optimal)
  4. POST used to protect sensitive login data (design choice)

HTTP GET used for database lookup

An HTML "GET" form is used for database lookup:

   <form action="/netacgi/nph-Parser" method="GET">

Use GET for queries, searches, database lookups.
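Submitting a `method="GET"` form turns the query into a URI, which is why the result can be linked, bookmarked, and cached. This sketch shows the URI a browser would build; the base page `http://patft.uspto.gov/` and the single field `s1` are assumptions taken from the search URIs quoted on a later slide, and the real form sends more fields than this.

```python
from urllib.parse import urlencode, urljoin


def form_get_uri(page_uri: str, action: str, fields: dict) -> str:
    """URI requested when a GET form is submitted: the action resolved against
    the page's URI, plus the fields encoded as application/x-www-form-urlencoded."""
    return urljoin(page_uri, action) + "?" + urlencode(fields)


uri = form_get_uri("http://patft.uspto.gov/", "/netacgi/nph-Parser", {"s1": "hypertext"})
print(uri)  # http://patft.uspto.gov/netacgi/nph-Parser?s1=hypertext
```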

HTTP GET used for unsafe interactions (not good)

Modifying the state of a shopping cart is unsafe because it produces a side effect:

"Add to Cart" is an HTML link:

   <a href=".../AddToShoppingCart?docNumber=6,678,889...">...

I cannot link to the shopping cart from this slide; a search engine or prefetching agent might follow the link and increment the counter (cf. SVG 1.2, section 11.8).

In HTML, use a "POST" form for unsafe operations.
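On the server side, the same rule means an unsafe action should refuse GET. This is a minimal sketch with a hypothetical handler `add_to_cart` and an in-memory cart; it is not the USPTO's code. A prefetcher that follows a GET link gets 405 Method Not Allowed and the cart is unchanged.

```python
from http import HTTPStatus


def add_to_cart(method: str, doc_number: str, cart: list) -> HTTPStatus:
    """Hypothetical "Add to Cart" handler: changing the cart is a side effect,
    so only POST is accepted."""
    if method.upper() != "POST":
        # A real server would also send the header "Allow: POST".
        return HTTPStatus.METHOD_NOT_ALLOWED
    cart.append(doc_number)
    return HTTPStatus.OK


cart = []
add_to_cart("GET", "6678889", cart)   # prefetcher follows a link: no change
add_to_cart("POST", "6678889", cart)  # user submits the form: item added
print(cart)  # ['6678889']
```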

URI for patent is actually URI for search (not optimal)

What might a URI for a patent look like?

   http://www.uspto.gov/patents/p6678889

Note that this URI is globally unambiguous, which is better than the bare number "6678889".
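A server that mints one URI per patent can map every way of writing the number to the same identifier. This sketch assumes the hypothetical pattern `http://www.uspto.gov/patents/p<digits>` proposed on this slide; the function name `patent_uri` is also hypothetical.

```python
import re


def patent_uri(number: str) -> str:
    """Mint the (hypothetical) canonical URI for a patent, accepting the number
    with or without thousands separators."""
    digits = re.sub(r"[,\s]", "", number)
    if not digits.isdigit():
        raise ValueError(f"not a patent number: {number!r}")
    return f"http://www.uspto.gov/patents/p{digits}"


print(patent_uri("6,678,889"))  # http://www.uspto.gov/patents/p6678889
print(patent_uri("6678889"))    # http://www.uspto.gov/patents/p6678889
```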

A search on "hypertext" produced this URI:

   http://patft.uspto.gov/...s1=hypertext&OS=hypertext...

A search by patent number 6,678,889 produced this URI:

   http://patft.uspto.gov/...s1=6,678,889.WKU....

Why are these URIs different if this is the same patent?

Cost of Arbitrarily Different URIs

At first, I thought these were arbitrarily different URIs for the same resource. If so, machines cannot compare them reliably, so:
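Without dereferencing, an agent can only compare URIs syntactically. This sketch applies a few safe normalizations (case of scheme and host) and then compares; the two search URIs are hypothetical stand-ins, since the real ones above are elided. Two different query strings never compare equal, even if both lead to the same patent.

```python
from urllib.parse import urlsplit


def same_uri(a: str, b: str) -> bool:
    """Syntax-based comparison: urlsplit lowercases the scheme and .hostname
    lowercases the host; everything else must match character for character."""
    pa, pb = urlsplit(a), urlsplit(b)
    return ((pa.scheme, pa.hostname, pa.port, pa.path, pa.query, pa.fragment)
            == (pb.scheme, pb.hostname, pb.port, pb.path, pb.query, pb.fragment))


# Hypothetical search URIs that both lead to the same patent page.
by_keyword = "http://search.example.org/query?s1=hypertext&hit=3"
by_number = "http://search.example.org/query?s1=6678889"

print(same_uri(by_keyword, by_number))  # False
```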

Identify Results of Search, not Search

The resource is identified only indirectly, as a query result.
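"Identify results of search, not search" suggests a search that answers with the URIs of the things found, so a keyword search and a number search agree. This is a sketch over a hypothetical in-memory index that reuses the illustrative patent URI pattern from an earlier slide; it is not how the USPTO search works.

```python
# Hypothetical index: canonical patent URI -> terms that should find it.
INDEX = {
    "http://www.uspto.gov/patents/p6678889": {"6678889", "hypertext"},
}


def search(term):
    """Return the canonical URIs of matching patents (sorted for stable output)."""
    term = term.replace(",", "").strip().lower()
    return sorted(uri for uri, terms in INDEX.items() if term in terms)


print(search("hypertext"))  # ['http://www.uspto.gov/patents/p6678889']
print(search("6,678,889"))  # ['http://www.uspto.gov/patents/p6678889']
```

Each search can keep its own query URI; what matters is that the results it links to are stable and shared.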

Related material in the Architecture Document:

POST used to protect sensitive login data (design choice)

Think about these architecture issues and tradeoffs during design! See "URIs, Addressability, and the use of HTTP GET and POST".
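The login design choice comes down to where the credentials travel. This sketch uses Python's standard `urllib.request.Request` without sending anything; the endpoint `https://login.example.org/session` and the credentials are hypothetical. With GET the password becomes part of the URI, which can end up in history, bookmarks, server logs, and Referer headers; with POST it goes in the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN = "https://login.example.org/session"        # hypothetical endpoint
creds = {"user": "alice", "password": "s3cret"}    # hypothetical credentials

# GET: credentials are encoded into the URI itself.
get_req = Request(LOGIN + "?" + urlencode(creds))

# POST: Request switches to POST when a body (data) is supplied.
post_req = Request(LOGIN, data=urlencode(creds).encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

POST keeps the password out of the URI but does not encrypt it; confidentiality on the wire still depends on using HTTPS.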

Future work

Questions?

For more info: