
Author: Joseph Reagle
Audience: 4.297
Question: Basic Description of Web Architectural Principles
References:
Joseph M. Reagle Jr.
<reagle@w3.org>
Web Architecture in 1 line
http://www.w3.org/2001/Talks/0223-web-architecture/all.htm#WebArchitecture
- When Tim Berners-Lee (TimBL) invented the Web he wrote three things: HTTP, HTML and the
URI. The URI was the most important because it allowed nearly everything to relate to
nearly everything else. A URI has the form:
- scheme://host/path/resource.extension#fragment
http is a scheme
- "Just as there are many different methods of access to resources, there are a
variety of schemes for identifying such resources." [RFC2396]
- There are few schemes, and they are centrally allocated by IETF/IANA.
- www.w3.org is a Domain Name and corresponds to an
Internet address
- this name is allocated via hierarchical delegation
- the .org registrar gave us w3, and we created www. (www is not required, merely a common
convention. info.cern.ch was the first web host)
- many, many, debates about who controls this system.
/2001/Talks/0223-web-architecture/ is a path
- this frequently corresponds to a file system path, but doesn't have to as you can
configure the server to redirect or serve anything for a given URI.
- the host determines this, and it usually isn't that contentious, though many
organizations have strict publishing requirements internally.
all.htm is a resource
- Typically there is some resource that corresponds to a URI, that is dereferenced and
used.
- This example happens to be an HTML file; it could be a GIF, JPEG, flash, or mpeg file
(these are registered as MIME types).
- HTML is an application of SGML: SGML is a language for allowing others to create markup
applications. SGML tells you how to create an element or attribute: HTML is a set of
specific elements and attributes with defined meanings; for example, <H1>foo</H1> is a heading.
- XML is a simplification and extension of SGML that has an initial low entry cost, and
more complex, but cool, functionality.
#WebArchitecture is a fragment
- A fragment doesn't mean anything to the scheme, it's a function of the resource's MIME
type.
- HTTP sends the whole resource even if the user only wants to look at the fragment.
- The browser, knowing it is an HTML file, does the appropriate thing: in this case it
scrolls to the correct portion of the resource.
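The anatomy above can be checked mechanically. What follows is a minimal sketch, assuming Python's standard urllib.parse module (a choice made here for illustration, not something the talk uses). It splits the example URI into the parts named above and shows that the fragment is separated out before anything would be requested from the server.

```python
from urllib.parse import urlsplit, urldefrag

# The example URI from this talk.
uri = "http://www.w3.org/2001/Talks/0223-web-architecture/all.htm#WebArchitecture"

parts = urlsplit(uri)
scheme = parts.scheme        # the access scheme
host = parts.hostname        # the domain name
# Separate the directory-like path from the final resource name.
path, _, resource = parts.path.rpartition("/")
path += "/"
fragment = parts.fragment    # interpreted by the client, per the MIME type

# What a client would ask the server for: the URI without its fragment.
# The fragment stays on the client side and is applied after retrieval.
request_uri, client_side_fragment = urldefrag(uri)

print(scheme, host, path, resource, fragment, sep="\n")
print(request_uri)
```

The split between "path" and "resource" here is only the convention this talk uses; the URI syntax itself treats everything after the host and before the fragment as one path.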
Why are these things important?
- Does the design lead to centralized assignment, or distributed flexibility? If someone
doesn't like your site, could your registrar take it away from you? (What are the
policy/scalability implications?)
- What happens if someone doesn't do the design correctly? Have you ever tried to bookmark
part of a Flash site? You can't.
- Interests try to bias the architecture in their own favor. New York City planner
Robert Moses designed his roads and overpasses so as to exclude the 12-foot-high public
transit buses that carried people -- often poor or of color -- to the parks and beaches he
also designed. AvantGo uses a conduit to make PDA-accessible content unavailable on the
Web. (Give me the URL to the Palm-accessible New York Times? You probably can't find it.
Balkanization of the Web serves their interest.)
- Simplicity -- "Keep it simple, stupid!"
- Modular Design
- Tolerance -- "Be liberal in what you require but conservative in what you do"
- Decentralization
- Test of Independent Invention -- If someone else had already invented your system, would
theirs work with yours?
- Principle of Least Power
- Universality -- Any resource anywhere can be given a URI
- Global uniqueness -- It doesn't matter to whom or where you specify that URI, it will
have the same meaning.
- Sameness -- a URI will repeatably refer to "the same" thing (does this mean the NYTimes
front page, or a specific article in their archive?)
- Identity -- the significance of identity for a given URI is determined by the person who
owns the URI, who first determined what it points to.
- Not a unique space, just universal -- URI space does not have to be the only universal
space
Co-existing/Competing Concepts
Simple axioms, but great philosophical wars over the details:
- Universal Resource Identifier (URI) -- includes all of the following. Your identity comes
from the fact that you have a unique identifier: 02139-0405
- Uniform Resource Locators (URL) -- a location that can be dereferenced. Your identity
comes from the characteristics of your "location": Main St., Technology Square.
- Uniform Resource Names (URN) -- "are intended to serve as persistent,
location-independent, resource identifiers." Your identity comes from a name allocated by a
recognized authority: NE43-350.
- Uniform Resource Characteristics (URC) -- Your probabilistic identity comes from your
characteristics: tall ugly building in the middle of all the construction.
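The difference between a locator and a name is also visible in the syntax. This is a minimal sketch, assuming Python's standard urllib.parse module. The URN is the example given in RFC 3986, not one from this talk, and the helper describe is hypothetical.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Return the scheme, host (if any), and remainder of a URI."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.hostname, "rest": parts.path}

# A locator names a host that can be contacted to dereference it.
url = describe("http://www.w3.org/2001/Talks/0223-web-architecture/all.htm")

# A name carries no host: it is only a scheme plus a namespace-specific string.
urn = describe("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")

print(url)
print(urn)
```

Both strings parse under the same generic syntax, which is the sense in which URI "includes all of the following"; only the locator says where to go.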