Copyright © 2002-2003 W3C ® ( MIT , ERCIM , Keio ), All Rights Reserved. W3C liability , trademark , document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.
The World Wide Web is a networked information system. Web Architecture consists of the requirements, constraints, principles, and choices that influence the design of the system and the behavior of agents within the system. When Web Architecture is followed, the large-scale effect is that of an efficient, scalable, shared information space. The organization of this document reflects the three divisions of Web architecture: identification, representation, and interaction. This document also addresses some non-technical (social) issues that play a role in building the shared information space.
This document strives to establish a reference set of requirements, constraints, principles, and design choices for Web architecture.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This document has been developed by W3C's Technical Architecture Group (TAG) ( charter ).
The primary changes in this draft are editorial revisions to sections 2 and 3 based on comments from Norm Walsh, Dan Connolly and Roy Fielding. A complete list of changes is available on the Web.
This draft remains incomplete; sections 1, 2, and 3 are the most developed; 4 the least. The TAG has published a number of findings that address specific architecture issues. Parts of those findings may appear in subsequent drafts. Please also consult the list of issues under consideration by the TAG.
This draft includes some editorial notes and also references to open TAG issues . These do not represent all open issues in the document. They are expected to disappear from future drafts.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress."
The latest information regarding patent disclosures related to this document is available on the Web. As of this publication, there are no disclosures.
Please send comments on this document to the public W3C TAG mailing list www-tag@w3.org ( archive ).
A list of current W3C Recommendations and other technical documents can be found at the W3C Web site.
Highlighted entries in this table of contents link to principles, constraints, good practice notes, and design choices emphasized in the document.
The World Wide Web (or, Web) is a networked information system consisting of agents (programs acting on behalf of a person, entity, or process) that exchange information. Here's a simple travel scenario illustrating a common Web interaction:
http://weather.example.com/oaxaca" in a glossy travel magazine. Dan has enough experience with the Web to recognize that http://weather.example.com/oaxaca is a URI. He can expect that the URI should allow him to access relevant weather information.
weather.example.com.
This scenario illustrates the three architectural divisions of the Web that are discussed in this document:
http://weather.example.com/oaxaca.
Editor's note : Todo: Introduce notions of client and server. Relation of client to agent and user agent. Relation of server to resource owner.
The intended audience for this document includes:
This document is designed to balance the value of brevity and precision with the value of illustrative examples. TAG findings provide more background, motivation, and examples.
Readers will benefit from familiarity with the Requests for Comments ( RFC ) series from the IETF , some of which define pieces of the architecture discussed in this document.
The architecture described in this document is principally the result of experience. There has been some theoretical and modeling work in the area of Web Architecture, notably Roy Fielding's work on "Representational State Transfer" [ REST ].
The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in accordance with RFC 2119 [ RFC2119 ].
Throughout this document, we elaborate on the travel scenario to introduce and illustrate architectural principles.
Editor's note : The scenario has not yet been well-integrated into sections 3 and 4.
This document focuses on the architecture of the Web. We assume the reader is familiar with the rationale for some of the general design principles: minimal constraints (fewer rules make the system more flexible), modularity, minimum redundancy, extensibility, simplicity, and robustness.
Other groups inside and outside W3C are writing down principles related to specialized aspects of Web architecture, including accessibility, internationalization, device independence, and Web Services. The section on Architectural Specifications includes some references.
The TAG intends for this document to inform discussions about issues of Web Architecture. Where current practice conflicts with this document, the TAG expects to engage in constructive discussion with other parties. Some parts of this document may fill in gaps in published specifications or may call attention to known weaknesses in those specifications.
This document promotes reuse of existing standards when suitable, and gives some guidance on how to innovate in a manner consistent with the Web architecture.
Web architecture starts with Uniform Resource Identifiers (URI), defined by "Uniform Resource Identifiers (URI): Generic Syntax" [ URI ]. Parties who wish to communicate about something will establish a shared vocabulary, i.e. a shared set of bindings between identifiers and things. This shared vocabulary has a tangible value: it reduces the cost of communication. The ability to use common identifiers across communities is what motivates global naming in Web Architecture.
URIs identify resources. When a representation of one resource refers to another resource with a URI, a link is formed between the two resources. The networked information system is built of linked resources, and the large-scale effect is a shared information space. The value of the Web grows exponentially as a function of the number of linked resources (the "network effect").
Principle
Use URIs: All important resources SHOULD be identified by a URI.
There are many benefits to making resources identifiable by URI. Some are by design (e.g., linking, bookmarking, and caching), while others (e.g., global search services) were not predicted.
An important aspect of communication is to be able to establish when two parties are talking about the same thing. In the context of the Web, this means when two parties identify the same resource. The most common way to establish that two parties are identifying the same resource is to compare the spelling (i.e., as strings) of the identifiers the parties are using. Section 6 of [ URI ] discusses this type of analysis. In that specification, determination of equivalence or difference of URIs is based on string comparison, perhaps augmented by reference to additional rules provided by URI scheme definitions (e.g., for HTTP URIs, the authority component is case-insensitive). Depending on the application, an agent may invest more processing effort to reduce the likelihood of a false negative (i.e., two URIs identify the same resource, but that was not detected).
There may be other ways to establish that two parties are identifying the same resource that are not based on string comparison; see the section on future directions for determining that two URIs identify the same resource .
Editor's note : Dan Connolly has suggested the term "coreference" instead of "equivalence" to communicate that two URIs are referring to the same resource.
Agents that reach conclusions about identity beyond what they are licensed to do (e.g., by specification, or community convention, or site-specific convention) take responsibility for any problems that result. For instance, agents should not assume that http://weather.example.com/Oaxaca and http://weather.example.com/oaxaca identify the same resource, since none of the specifications involved states that the path part of an HTTP URI is case-insensitive. Web servers may vary in how they are configured to handle case-sensitivity. Agents that assume these URIs identify the same resource take responsibility for any resulting problems.
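As a rough illustration of comparison by spelling, the following Python sketch treats the scheme and authority as case-insensitive and every other component as significant. The function name `same_spelling` is an assumption made for this example, as is the simplification that the authority carries no userinfo. It is a conservative approximation, not the comparison procedure of [ URI ] section 6.

```python
from urllib.parse import urlsplit


def same_spelling(uri_a: str, uri_b: str) -> bool:
    """Return True only when two URIs match under a conservative comparison.

    Scheme and authority are compared case-insensitively; path, query, and
    fragment are compared exactly. Assumes the authority holds no userinfo
    (hypothetical simplification for this sketch).
    """
    a, b = urlsplit(uri_a), urlsplit(uri_b)
    return (
        a.scheme.lower() == b.scheme.lower()
        and a.netloc.lower() == b.netloc.lower()
        and (a.path, a.query, a.fragment) == (b.path, b.query, b.fragment)
    )
```

Under this sketch, http://weather.example.com/Oaxaca and http://weather.example.com/oaxaca compare as different. A result of False means only that no match was detected; it does not show that the two URIs identify different resources.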
Although it is possible to determine that two URIs are equivalent, it is generally not possible by mere inspection of two URIs to be sure that they identify different resources. Web architecture does not constrain resources to be uniquely named.
Good practice
Spelling URIs: If an agent has been provided with a URI to refer to a resource, the agent SHOULD use the spelling of the URI as it was originally provided.
To help parties know when they are referring to the same resource, it follows that URI producers should be conservative about the number of different URIs they produce for the same resource. For instance, the parties responsible for weather.example.com have no reason to use both http://weather.example.com/Oaxaca and http://weather.example.com/oaxaca to refer to the same resource; agents will not detect the equivalence relationship. In this case, one URI should be chosen and used consistently. See section 6.3 of [ URI ] for further advice on how to reduce the risk of false negatives.
URI consumers cannot, in general, determine the meaning of a resource by inspection of a URI that identifies it. In our travel scenario, the example URI (http://weather.example.com/oaxaca) suggests that the identified resource has something to do with the weather in Oaxaca. Although short, meaningful URIs benefit people, URI consumers must not rely on the URI string to communicate the meaning of a resource. A site reporting the weather in Oaxaca could just as easily be identified by the URI http://vjc.example.com/315. And the URI http://weather.example.com/vancouver might identify the resource "my photo album." See the section on retrieving a representation for information about how the meaning of a resource is conveyed.
Editor's note : When finding available on URI opacity, link from here.
In the URI http://weather.example.com/, the "http" that appears before the colon (":") is a URI scheme name. There are other scheme names, such as "mailto" and "ftp". It is common to classify URIs by scheme; a URI with scheme "http" is called an "HTTP URI."
Each URI begins with a URI scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme. Furthermore, the URI scheme specification specifies how an agent can dereference the URI.
Several URI schemes incorporate identification mechanisms that pre-date the Web into this syntax:
mailto:nobody@example.org
ftp://example.org/aDirectory/aFile
news:comp.infosystems.www
tel:+1-816-555-1212
Other URI schemes have been introduced since the advent of the Web, including those introduced as a consequence of new protocols. Examples of URIs for these schemes include:
http://www.example.org/something?with=arg1;and=arg2
ldap://ldap.itd.umich.edu/c=GB?objectClass?one
urn:oasis:SAML:1.0
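To make the classification by scheme concrete, here is a minimal Python sketch that reads the scheme name off each of the example URIs listed above. The names `scheme_of`, `EXAMPLES`, and `SCHEMES` are assumptions made for this illustration. The sketch takes the text before the first colon and does not validate it against the scheme grammar in [ URI ].

```python
def scheme_of(uri: str) -> str:
    """Return the scheme name: the text before the first colon, lowercased."""
    scheme, colon, _rest = uri.partition(":")
    if not colon:
        raise ValueError(f"no scheme found in {uri!r}")
    return scheme.lower()


# The example URIs from the two lists above.
EXAMPLES = [
    "mailto:nobody@example.org",
    "ftp://example.org/aDirectory/aFile",
    "news:comp.infosystems.www",
    "tel:+1-816-555-1212",
    "http://www.example.org/something?with=arg1;and=arg2",
    "ldap://ldap.itd.umich.edu/c=GB?objectClass?one",
    "urn:oasis:SAML:1.0",
]

SCHEMES = [scheme_of(uri) for uri in EXAMPLES]
```

An agent would then look up the specification that governs each scheme before interpreting the remainder of the URI.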
The Internet Assigned Numbers Authority ( IANA ) maintains a registry [ IANASchemes ] of mappings between URI scheme names and their specifications. For instance, the IANA registry indicates that the "http" scheme is defined by [ RFC2616 ]. The process for registration of new URI schemes is defined by RFC2717.
Since many aspects of URI processing are scheme-dependent, and since a huge amount of deployed software already processes URIs of well-known schemes, the cost of introduction of new URI schemes is high. We note in passing that even more expensive than introducing a new URI scheme is introducing a new identification mechanism for the Web; this is considered prohibitively expensive.
Good practice
New URI schemes: Authors of specifications SHOULD avoid introducing new URI schemes when existing schemes can be used to meet the goals of the specifications.
Consider our travel scenario: should the authority providing information about the weather in Oaxaca register a new URI scheme "weather" for the identification of resources related to the weather? They might then publish URIs such as weather://travel.example.com/oaxaca. While the Web Architecture allows the definition of new schemes, there is a cost to registration and especially deployment of new schemes. When an agent dereferences such a URI, if what really happens is that HTTP GET is invoked to retrieve an HTML representation of the resource, then an HTTP URI would have sufficed. If a URI scheme exists that meets the needs of an application, designers should use it rather than invent one. Furthermore, designers should expect that it will prove useful to be able to share a URI across applications, even if that utility is not initially evident.
If the motivation behind registering a new scheme is to allow an agent to launch a particular application when retrieving a representation, such dispatching can be accomplished at lower expense by registering a new MIME type instead. Reasons for this include:
Editor's note : When finding available based on Tim Bray's discussion of this topic, link from here.
The use of unregistered URI schemes is discouraged for a number of reasons:
In the URI http://weather.example.com/, the string weather.example.com (between "//" and the next "/") is called the authority component. Many URI schemes include a hierarchical element for a naming authority such that governance of the name space defined by the remainder of the URI is delegated to that authority (which may, in turn, delegate it further). The generic syntax provides a common means for distinguishing an authority based on a registered domain name or server address. See section 3.2 of [ URI ] for more information about the authority portion of a URI.
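The position of the authority component can be shown with a short Python sketch that uses the standard library's generic URI parser. The function name `authority_of` is an assumption made for this example.

```python
from urllib.parse import urlsplit


def authority_of(uri: str) -> str:
    """Return the authority component (the text between "//" and the next "/").

    URIs that have no "//" part yield an empty string.
    """
    return urlsplit(uri).netloc
```

For http://weather.example.com/ the sketch returns weather.example.com. For mailto:nobody@example.org it returns an empty string, because that URI has no "//" part in the generic syntax.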
How authority is delegated depends on the URI scheme. The deployment and use of different URI schemes may require varying degrees of central coordination and administration. For example, MAILTO, FTP, and HTTP URIs depend on the use of the DNS and IANA infrastructure; see "ICP-1: Internet Domain Name System Structure and Delegation" [ IANAICP1 ] for more information about how the IANA manages delegation of domain names.
Successful communication between two parties about a piece of information relies on shared understanding of the meaning of the information. On the Web, thousands of independent parties can identify and communicate about a Web resource. To give these parties the confidence that they are all talking about the same thing when they refer to "the resource identified by the following URI ..." the design choice for the Web is, in general, that the owner of a resource assigns its authoritative meaning and the URIs that refer to it. See the draft TAG finding " Client handling of MIME headers" for related discussion.
In our travel scenario, the agent responsible for weather.example.com has license to assign the meaning of the resource and to create the authoritative representations of this resource.
In our travel scenario the server returns an XHTML representation when Dan dereferences the URI http://weather.example.com/oaxaca. Then, by navigating within the XHTML content, Dan finds that the URI http://weather.example.com/oaxaca#tom refers to information about tomorrow's weather in Oaxaca. This URI includes the fragment identifier "tom" (the string after the "#").
The fragment identifier component of a URI allows indirect identification of a secondary resource, by reference to a primary resource and additional identifying information that is selective with respect to that resource. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource that is merely named with respect to the primary resource.
Although the generic URI syntax allows any URI to end with a fragment identifier, some URI schemes do not specify the use of fragment identifiers. For instance, fragment identifier semantics are not specified for MAILTO URIs.
For URI schemes that do specify the use of fragment identifiers, the syntax and semantics of those identifiers are defined by the set of representations that might result from a retrieval action on the primary resource. The presence of a fragment identifier component in a URI does not imply that a retrieval action will take place.
Interpretation of the fragment identifier during a retrieval action is performed solely by the user agent; the fragment identifier is not passed to other systems during the process of retrieval. This means that some intermediaries in the Web architecture (e.g., proxies) have no effect on fragment identifiers and that redirection (in HTTP [ RFC2616 ], for example) does not account for them.
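The division of labor described here can be sketched in Python: before a retrieval, the user agent separates the fragment identifier from the rest of the URI, requests only the latter, and keeps the fragment for local interpretation. The function name `split_for_retrieval` is an assumption made for this example.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_for_retrieval(uri: str) -> Tuple[str, str]:
    """Return (URI to request, fragment identifier kept by the user agent)."""
    result = urldefrag(uri)
    return result.url, result.fragment
```

For http://weather.example.com/oaxaca#tom, the request would be made for http://weather.example.com/oaxaca, and the user agent would apply "tom" to whatever representation comes back.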
Suppose that the managers of weather.example.com provide a visual map of the meteorological conditions in Oaxaca as part of the representation served for http://weather.example.com/oaxaca. They might encode the same visual map in a number of image formats to meet different needs (e.g., they might serve PNG, SVG, and JPEG/JFIF). Dan's user agent and the server engage in HTTP content negotiation, so that Dan receives the best image format his user agent can handle. The URI http://weather.example.com/oaxaca/map#zicatela refers to a portion of the weather map that shows the Zicatela Beach, where Dan intends to go surfing. This URI makes sense for the SVG representation, since SVG defines fragment identifier semantics. However, the URI does not make sense for the PNG and JPEG/JFIF representations; those specifications do not define fragment identifier semantics.
Good practice
Content negotiation with fragments: Authors SHOULD NOT use HTTP content negotiation for different media types that have incompatible fragment identifier semantics.
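One way an author might check a set of negotiated variants against this good practice is sketched below in Python. The table `DEFINES_FRAGMENTS` is an assumption for illustration only: it records just the three formats from the map example and what the text above says about them, and it is not an authoritative registry. The function name is likewise hypothetical.

```python
from typing import Iterable, List

# Assumption for illustration: whether each media type defines fragment
# identifier semantics, as stated in the weather map example above.
DEFINES_FRAGMENTS = {
    "image/svg+xml": True,
    "image/png": False,
    "image/jpeg": False,
}


def variants_without_fragment_semantics(media_types: Iterable[str]) -> List[str]:
    """List the variants for which a URI with a fragment would not make sense.

    Media types missing from the table are treated as not defining
    fragment semantics.
    """
    return [t for t in media_types if not DEFINES_FRAGMENTS.get(t, False)]
```

A non-empty result suggests that a URI such as http://weather.example.com/oaxaca/map#zicatela should not be published for a resource negotiated across those formats.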
Given a URI, a system may attempt to perform a variety of operations on the resource, as might be characterized by such words as "access", "update", "replace", or "find attributes". Such operations are defined by the formats and protocols that make use of URIs. The URI specification (in [ URI ], section 1.2.2) defines the following terms related to interactions through a URI.
During URI resolution, an agent applies in succession a finite set of relevant specifications, beginning with the specification of the context in which the URI is found (e.g., a format or protocol specification, or an application). Any one of these specifications may define more than one access mechanism (e.g., the HTTP protocol defines a number of access methods, including GET, HEAD, and POST). Note that the information governing the choice of access mechanism may be found in the context, not the URI itself (e.g., the choice of HTTP GET v. HTTP HEAD). The draft TAG finding " URIs, Addressability, and the use of HTTP GET and POST." discusses issues surrounding multiple access mechanisms and the relation to URI addressability.
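The point that the access mechanism comes from the context rather than from the URI can be illustrated with Python's standard library, which lets an application build two requests for the same URI with different HTTP methods. Nothing is sent over the network in this sketch; the variable names are assumptions made for the example.

```python
from urllib.request import Request

URI = "http://weather.example.com/oaxaca"

# The same URI, paired with two different access methods chosen by the
# application. The URI string itself does not say which one to use.
get_request = Request(URI, method="GET")
head_request = Request(URI, method="HEAD")

same_uri = get_request.full_url == head_request.full_url
methods = (get_request.get_method(), head_request.get_method())
```

Both request objects carry an identical URI; only the method, supplied by the surrounding application, differs.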
Some URI schemes (e.g., the URN scheme [ RFC 2141 ]) do not define dereference mechanisms.
TAG issue metadataInURI-31 : Should metadata (e.g., versioning information) be encoded in URIs?
TAG issue siteDate-26 : Web site metadata improving on robots.txt, w3c/p3p and favicon etc.
One of the most important actions on the Web is to retrieve a representation of a resource (for example, by using HTTP GET). As stated above, the authority responsible for a URI determines what the URI identifies and which representations are used for interaction with the resource. The representations communicate the meaning of the resource.
Good practice
Resource descriptions: Owners of important resources SHOULD make available representations that communicate the meaning of those resources.
As an example of representation retrieval, suppose that the URI http://weather.example.com/oaxaca is used within an a element of an SVG document. The sequence of specifications applied is:
a link involves retrieving a representation of a resource, identified by the XLink href attribute: "By activating these links (by clicking with the mouse, through keyboard input, and voice commands), users may visit these resources."
xlink:href is defined in section 5.4 of the XLink 1.0 [ XLink10 ] specification, which states that "The value of the href attribute must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure described below is applied."
Note that, in general, one cannot determine the media type(s) of representation(s) of a resource by inspecting a URI for that resource. For example, do not assume that all representations of http://example.com/page.html are HTML. The HTTP protocol does not constrain the media type based on the path component of the URI; the server is free to return a PNG image representation.
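In practice this means an agent reads the media type from the metadata that accompanies the representation, not from the URI. The Python sketch below extracts it from a mapping of response headers; the function name `media_type` and the use of a plain mapping for headers are assumptions made for this illustration.

```python
from typing import Mapping, Optional


def media_type(headers: Mapping[str, str]) -> Optional[str]:
    """Return the media type named by the Content-Type header, if present.

    Header names are matched case-insensitively, and parameters such as
    charset are dropped. The request URI is deliberately not consulted.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.split(";", 1)[0].strip().lower()
    return None
```

A response for http://example.com/page.html that carries "Content-Type: image/png" is therefore treated as a PNG image, whatever the path suggests.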
Dan's retrieval of weather information qualifies as a "safe" interaction; a safe interaction is one where the user agent does not commit to anything beyond the interaction and is not responsible for any consequences other than the interaction itself (e.g., a read-only query or lookup). Other Web interactions resemble orders more than queries. These unsafe interactions may cause a change to the state of a resource; the user may be held responsible for the consequences of these interactions. Unsafe interactions include subscription services, posting to a list, or modifying a database.
Safe interactions are important because these are interactions where users can browse with confidence and where software programs (e.g., search engines and browsers that pre-cache data for the user) can follow links safely. Users (or software agents acting on their behalf) do not commit themselves to anything by querying a resource or following a link.
Principle
Safe retrieval: Agents do not incur obligations by retrieving a representation.
For instance, suppose in our travel scenario that the managers of weather.example.com offer a monthly newsletter available by subscription. It is incorrect and harmful to publish a page http://example.com/oxaca/aboutNewsLetter that states "... terms and conditions..." with a link to http://example.com/oxaca/newsLetter because search services may link directly to http://example.com/oxaca/newsLetter and readers that follow such links may not have seen, let alone agreed to, the terms and conditions.
For more information about safe and unsafe operations using HTTP GET and POST, and handling security concerns around the use of HTTP GET, see the draft TAG finding " URIs, Addressability, and the use of HTTP GET and POST."
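A software agent such as a search engine or a pre-caching browser can respect this distinction with a very small check. The Python sketch below treats GET and HEAD as the safe HTTP methods, following [ RFC2616 ]; the names `SAFE_METHODS` and `may_follow_automatically` are assumptions made for this example.

```python
# GET and HEAD are the methods that [RFC2616] describes as safe.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def may_follow_automatically(method: str) -> bool:
    """Return True when an agent can issue the request without user consent."""
    return method.upper() in SAFE_METHODS
```

A subscription form that uses POST would fail this check, so an automated agent would leave it for the user to submit deliberately.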
The value of a URI increases with the predictability of interactions using that URI.
Good practice
URI Persistence: Parties responsible for a URI SHOULD service that URI predictably and consistently.
Service breakdowns include:
There are strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called URI persistence . URI persistence is always a matter of policy and commitment on the part of authorities servicing URIs rather than a constraint imposed by technological means.
URI persistence also improves when ambiguity is removed about what the URI identifies. For instance, saying that the URI http://www.example.com/moby identifies "Moby Dick" can lead to confusion because this might be interpreted as any one of the following very distinct resources: a particular printing of this work (say, by ISBN), or the work itself in an abstract sense (for example, using RDF), or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalog), or the record in the library's electronic catalog which contains the metadata about the work, or the Gutenberg project's online version. Similarly, one should not use the same URI to refer to a person and to that person's mailbox.
Ambiguous descriptions of what a URI identifies increase the likelihood that two parties will think the same URI identifies different resources, and thus that the parties will use the URI inconsistently. This can be costly, as in the case of two databases in which the same URI is used inconsistently; merging the two databases might lead to confusion or errors.
HTTP [ RFC2616 ] has been designed to help service URIs. For example, HTTP redirection (via some of the 3xx response codes) permits servers to tell an agent that further action needs to be taken by the agent in order to fulfill the request (e.g., the resource has been assigned a new URI). In addition, content negotiation also promotes consistency, as a site manager would not be required to define new URIs for each new format that is supported, as would be the case with protocols that don't support content negotiation, such as FTP.
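As an illustration of how an agent might act on a redirection, the following Python sketch computes the URI for the follow-up request from the status code and the Location header. The set `REDIRECT_CODES`, the function name `next_uri`, and the target path /mexico/oaxaca used in the note below are assumptions made for this example. The sketch covers only the common case and ignores details such as which request method to use for the follow-up.

```python
from typing import Optional
from urllib.parse import urljoin

# A subset of the 3xx codes in [RFC2616] that name a new URI in Location.
REDIRECT_CODES = frozenset({301, 302, 303, 307})


def next_uri(request_uri: str, status: int, location: Optional[str]) -> Optional[str]:
    """Return the URI to request next, or None when no redirection applies.

    A relative Location value is resolved against the URI that was requested.
    """
    if status in REDIRECT_CODES and location:
        return urljoin(request_uri, location)
    return None
```

If a request for http://weather.example.com/oaxaca were answered with status 301 and a Location of /mexico/oaxaca, the sketch would yield http://weather.example.com/mexico/oaxaca, so existing links to the original URI keep working.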
For more discussion about URI persistence, refer to [ Cool ].
As we have seen, identification of a resource on the Web is distinct from interacting with that resource. It is reasonable to control access to the resource (e.g., for security reasons), but it is unreasonable to prohibit others from merely identifying the resource.
As an analogy: A building might have a policy that the public may only enter via the main front door, and only during business hours. People employed in the building and in making deliveries to it might use other doors as appropriate. Such a policy would be enforced by a combination of security personnel and mechanical devices such as locks and pass-cards. One would not enforce this policy by hiding some of the building entrances, nor by requesting legislation requiring the use of the front door and forbidding anyone to reveal the fact that there are other doors to the building.
In the travel scenario, imagine that Dan and Norm both subscribe to the weather.example.com newsletter. Dan wishes to point out an article of particular interest to Norm, using a URI. The managers of weather.example.com can offer Dan and Norm the benefits of URIs (e.g., bookmarking and linking) and still control access to the newsletter by authorized parties. The Web provides several mechanisms to control access to resources, none of which relies on hiding or suppressing URIs for those resources. For more information on identification and access control, please refer to the TAG finding "'Deep Linking' in the World Wide Web."
The integration of internationalized identifiers (i.e., composed of characters beyond those allowed by [ URI ]) into the Web Architecture is an important and open issue. See TAG issue IRIEverywhere-27 for discussion about work going on in this area.
Emerging Semantic Web technologies, including "DAML+OIL" [ DAMLOIL ] and "Web Ontology Language (OWL)" [ OWL10 ], define RDF properties such as equivalentTo and FunctionalProperty to state -- or at least claim -- formally that two URIs identify the same resource.
There has been some discussion but no agreement that new access protocols should provide a means to convert fragment identifiers according to media type.
Fragment identifier semantics may differ among formats. See related TAG issues httpRange-14 and RDFinXHTML-35 and abstractComponentRefs-37 .
The Dynamic Delegation Discovery System ( DDDS ) ([ RFC3401 ] and related RFCs) is used to implement lazy binding of strings to data, in order to support dynamically configured delegation systems. This system is designed to allow resolution of any type of URI, in particular URNs.
One area of work involves the creation of globally unique identifiers in a file-sharing system without centralized or hierarchical administration.
A representation is data that represents or describes the state of a resource. It consists of:
Web agents use representations to modify as well as read resource state.
In our previous travel scenario the representation Dan receives (and whether he receives one at all) depends on a number of factors, including:
Whether the managers of weather.example.com respond to requests at all;
Whether the managers of weather.example.com make available one or more representations for the resource identified by http://weather.example.com/oaxaca;
If the managers of weather.example.com have provided more than one representation (in different formats such as HTML, PNG, or RDF, in different languages such as English and Spanish, etc.), the resulting representation may depend on negotiation between the user agent and server that occurs as part of the HTTP transaction.
We discuss these issues in more detail below.
As discussed above, the owner of a resource assigns its authoritative meaning and the URIs that refer to it. This meaning is communicated in part through metadata that is part of the representation, notably the Internet Media Type. At times there may be inconsistencies between metadata and what is specified in a format. Examples of inconsistencies between headers and format data that have been observed on the Web include:
User agents should detect such inconsistencies but should not resolve them without involving the user (e.g., by securing permission or at least providing notification). User agents must not silently ignore authoritative server metadata.
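One kind of inconsistency a user agent can detect is a charset parameter in the Content-Type header that disagrees with the encoding declared inside an XML document. The sketch below only reports the mismatch and leaves the decision to the caller, in keeping with the guidance above. The function names are hypothetical, and the regular expressions are a deliberately loose approximation rather than a full header or XML parser.

```python
import re
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    match = re.search(r'charset\s*=\s*"?([^\s";]+)"?', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def xml_declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', body
    )
    return match.group(1).decode("ascii").lower() if match else None


def find_charset_inconsistency(content_type: str, body: bytes) -> Optional[str]:
    """Describe a header/document charset conflict, or return None.

    The function detects and reports only; it does not pick a winner,
    so the server metadata is never silently overridden.
    """
    from_header = header_charset(content_type)
    from_body = xml_declared_encoding(body)
    if from_header and from_body and from_header != from_body:
        return f"header says {from_header}, document says {from_body}"
    return None
```

A user agent built along these lines would surface the returned message to the user (or log it) before deciding how to decode the representation.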
See the draft TAG finding " Client handling of MIME headers" for more in-depth discussion and examples.
The Web can be used to interchange resource representations in any format. This flexibility is important, since there is continuing progress in the development of new data formats for new applications and the refinement of existing ones.
For a format to be usefully interoperable between two parties, the parties must have a shared understanding of its syntax and semantics. This is not to imply that a sender of data can count on constraining its treatment by a receiver; simply that making good use of electronic data usually requires knowledge of its designers' intentions.
For a format to be widely interoperable across the Web:
Although the Web architecture allows for the deployment of new data formats, the creation and deployment of new formats (and software able to handle them) can be very expensive. Thus, before inventing a new data format, designers should carefully consider re-using one that is already available. For example, if a format is required to contain human-readable text with embedded hyperlinks, it is almost certainly better to use HTML for this purpose than to invent a new format.
As noted above, the utility of data formats starts with the availability of a normative specification. Some of the desirable characteristics of a format include:
The section on architectural specifications includes references to additional format specification guidelines.
Other design issues:
This section discusses important characteristics of data formats which can together be used to describe and understand them.
A textual data format is one in which the data is specified as a linear sequence of characters. HTML, Internet e-mail, and all XML-based languages are textual. In modern textual data formats, the characters are usually taken from the Unicode repertoire.
Binary data formats are those in which portions of the data are encoded for direct use by computer processors, for example as thirty-two bit little-endian two's-complement integers and sixty-four bit IEEE double-precision floating-point numbers. The portions of data so represented include numeric values, pointers, and compressed data of all sorts.
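To make the distinction concrete, the following sketch encodes the same two values once in a binary layout (a 32-bit little-endian integer followed by a 64-bit IEEE double) and once as text. The record layout and the variable names are invented for this illustration and do not correspond to any real format.

```python
import struct

# Hypothetical record: a station number and a temperature reading.
station_id = 1024
temperature = 27.5

# Binary form: "<" = little-endian with no padding,
# "i" = 32-bit two's-complement integer, "d" = 64-bit IEEE double.
binary_form = struct.pack("<id", station_id, temperature)

# Textual form: the same values as a sequence of characters.
text_form = f"{station_id} {temperature}".encode("utf-8")

# Reading the binary form back requires knowing the exact layout.
decoded = struct.unpack("<id", binary_form)

# Reading the textual form back requires only splitting and parsing.
fields = text_form.decode("utf-8").split()
parsed = (int(fields[0]), float(fields[1]))
```

In this particular case the binary form occupies 12 bytes and the textual form 9, which is one small reminder that size comparisons between the two kinds of format depend on the data at hand.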
In principle, all data can be represented using textual formats.
The trade-offs between binary and textual data formats are complex and application-dependent. Binary formats can be substantially more compact, particularly for complex pointer-rich data structures. Also, they can be consumed more rapidly by software in those cases where they can be loaded into memory and used with little or no conversion.
Textual formats are often more portable and interoperable, since there are fewer choices for representation of the basic units (characters), and those choices are well-understood and widely implemented.
Textual formats also have the considerable advantage that they can be directly read and understood by human beings. This can simplify the tasks of creating and maintaining processing software, and allow the direct intervention of humans in the processing chain without recourse to tools any more complex than the ubiquitous text editor. Finally, it simplifies the necessary human task of learning about new data formats (the "View Source" effect).
All things being equal (a rare state of affairs) textual formats are generally preferable to binary ones in Web applications.
It is important to emphasize that intuition as to such matters as data size and processing speed is not a reliable guide in data format design; quantitative studies are essential to a correct understanding of the trade-offs.
TAG issue binaryXML-30 : Effect of Mobile on architecture - size, complexity, memory constraints. Binary infosets, storage efficiency.
Final-form data formats are not designed to allow modification or uses other than those intended by their designers. An example would be PDF, which is designed to support the presentation of page images on either screen or paper, and is not readily used in any other way. XSL Formatting Objects (XSL-FO) share this characteristic.
XHTML, on the other hand, can be and is put to a variety of uses including direct display (with highly flexible display semantics), processing by network-sensitive Web spiders to support search and retrieval operations, and reprocessing into a variety of derivative forms.
In general XML-based data formats are more re-usable and repurposable than the alternatives, although the example of XSL-FO shows that this is not an absolute.
There are many cases where final-form is an application requirement; representations which embody legally-binding transactions are an obvious example. In such cases, the use of digital signatures may be appropriate to achieve immutability, whether the format is naturally final-form or some XML vocabulary.
On the other hand, where such requirements are not in play, representations that are reusable and repurposable are in general higher in value, particularly in the case where the information's utility may be long-lived.
Some data formats are explicitly designed to be used in combination with others, while some are designed for standalone use. An example of a standalone data format is PDF; it is not typically embedded in representations encoded in other formats.
At the other extreme is SOAP, which is designed explicitly to contain a "payload" in some non-SOAP vocabulary. Another example is SVG, which is designed to be included in compound documents, and which may in turn contain information encoded in other XML vocabularies.
This characteristic is related to, but distinct from, the final-form/reusable distinction discussed above. For example, one can certainly imagine cases where it is useful for a representation to include data in multiple different formats, but be considered immutable and display-only.
TAG issue xmlProfiles-29 : When, whither and how to profile W3C specifications in the XML Family?
TAG issue mixedUIXMLNamespace-33 : Composability for user interface-oriented XML namespaces
TAG issue xmlFunctions-34 : XML Transformation and composability (e.g., XSLT, XInclude, Encryption)
TAG issue RDFinXHTML-35 : Syntax and semantics for embedding RDF in XHTML
More incoming from D. Orchard
Editor's note : Expect to add reference Web Architecture: Extensible Languages .
In many cases, the information contained in a representation is logically separable from the choice of ways in which it may be presented to a human, and the modes of interaction it may support.
While such separation is, where possible, often advantageous, it is clearly not always possible and in some cases not desirable either.
More incoming from C. Lilley
One of the greatest strengths of HTML as a resource representation format is that it allows authors to embed cross-references (hyperlinks). The simplicity of <a href="#foo"> as a link to "foo" and <a name="foo"> as the anchor "foo" is partly (perhaps largely) responsible for the birth of the hypertext Web as we know it today.
Simple, single-ended, single-direction, inline links are not the most powerful linking paradigm imaginable. But they are very easy to understand. And they can be authored by individuals (or other agents) that have no control or write access to the other end point.
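The simplicity of these inline links also makes them easy for software to find. As a minimal sketch, the class below collects link targets and named anchors from an HTML fragment using Python's standard html.parser module; the class name LinkCollector and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets and named anchors from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # values of href attributes, in document order
        self.anchors = []  # values of name attributes, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("href") is not None:
            self.links.append(attributes["href"])
        if attributes.get("name") is not None:
            self.anchors.append(attributes["name"])


# Hypothetical sample document for illustration.
SAMPLE = (
    '<p><a name="foo">Forecast</a> '
    '<a href="#foo">jump to forecast</a> '
    '<a href="http://weather.example.com/oaxaca">Oaxaca report</a></p>'
)

collector = LinkCollector()
collector.feed(SAMPLE)
```

Note that the author of SAMPLE needs no control over weather.example.com in order to link to it, which is the property described above.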
More sophisticated linking mechanisms have been invented for the Web. XPointer allows links to address content that does not have an explicit, named anchor. XLink allows links to have multiple ends and to be expressed either inline or in "link bases" stored external to any or all of the resources identified by the links it contains.
All of the current common linking mechanisms identify resources by URI and optionally identify portions (or views) of a resource with the fragment identifier. The almost universal appeal of linking between resources suggests that inventors of new representation formats SHOULD provide mechanisms for identifying links to other resources. Representation formats based on XML SHOULD examine XPointer and XLink for inspiration.
The common need to point into a resource representation, that is, to identify some portion of its content (or some view of its content) rather than the entire, monolithic resource representation, suggests that inventors of new representation formats SHOULD provide mechanisms for identifying portions of their format. This can most often be achieved by describing the fragment identifier syntax for the media type that identifies their resource format. Representation formats based on XML SHOULD use at least the XPointer Framework and XPointer element() Schemes for their fragment identifier syntax.
If a future revision of RFC 3023 identifies the XPointer Framework, element(), and perhaps other ancillary schemes as the fragment identifier syntax for XML documents, authors will be able to rely on at least those schemes for all XML documents.
TAG issue: What is the scope of using XLink? xlinkScope-23 .
Many resource representations are encoded in formats which are XML vocabularies. This section discusses issues that are specific to such data formats.
Anyone seeking guidance in this area is urged to consult the "Guidelines For The Use of XML in IETF Protocols" [IETFXML] for the use of XML in Internet Protocols. This document contains a very thorough discussion of the considerations that govern whether or not XML ought to be used, as well as specific guidelines on how it ought to be used. While it is directed at Internet applications with specific reference to protocols, the discussion is generally applicable to Web scenarios as well.
The discussion here should be seen as ancillary to the content of the IETF BCP. Refer also to "XML Accessibility Guidelines" [XAG] for help designing XML formats that lower barriers to Web accessibility for people with disabilities.
XML defines textual data formats that are naturally suited to describing data objects which are hierarchical and processed in an in-order sequence. It is widely but not universally applicable for format specifications. For example, an audio or video format is unlikely to be well suited to representation in XML. Design constraints that would suggest the use of XML include:
Editor's note : Which XML Specifications make up the XML Family?
The Web is significantly a networked information system. Authors and applications can use URIs uniformly to identify different resources on the Web. After representations of these resources have been retrieved, they may be processed in a variety of ways. Some applications (and some users) will undoubtedly build new resources by combining several representations together. This is particularly easy, and potentially useful, when XML representations are available for all the resources.
However, combining representations in this way moves them out of their original context and places them in a new context. This change of context introduces the possibility of information loss. Any information that depended on the local context will no longer be available.
What is needed is a mechanism for establishing a global context for the elements and attributes in the XML resources. This problem bears a strong resemblance to the distinction between relative and absolute URIs. While the many hundreds of relative URI references to "index.html" on a typical web server may be entirely unambiguous in their respective contexts, they have no unambiguous global meaning. But each such relative URI has an unambiguous absolute URI that can be established in its local context and used when a document is moved. This solves the problem for URI references.
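The relative/absolute distinction can be sketched with Python's standard urllib.parse.urljoin. The base URIs below are hypothetical; the point is only that the same relative reference resolves to different absolute URIs depending on its context.

```python
from urllib.parse import urljoin

# The same relative reference, used in two hypothetical contexts.
relative_reference = "index.html"
bases = [
    "http://weather.example.com/oaxaca/",
    "http://weather.example.com/tokyo/",
]

# Resolve the reference against each base to obtain its absolute form.
absolute_forms = {base: urljoin(base, relative_reference) for base in bases}
```

Once resolved, each absolute URI keeps its meaning even if the document that contained the relative reference is moved elsewhere.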
The names of elements and attributes can be seen as analogous to relative URIs. Within their original context, they have meanings that are clear and entirely unambiguous. Namespaces in XML provides a mechanism for establishing a globally unique name that can be understood in any context.
The "absolute" form of an XML element or attribute name is the combination of its namespace URI and its local name. This is represented lexically in documents by associating namespace names with (optional) prefixes and combining prefixes and local names with a colon as described in "Namespaces in XML" [ XMLNS ].
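As an illustration, Python's standard xml.etree.ElementTree exposes this "absolute" form directly, writing it as {namespace-URI}local-name. In the sketch below the namespace URI http://weather.example.com/ns and the element names are invented for the example; the two documents use different prefixes but yield the same expanded names.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents using the same namespace with different prefixes.
PREFIXED = (
    '<w:report xmlns:w="http://weather.example.com/ns">'
    "<w:high>30</w:high></w:report>"
)
DEFAULTED = (
    '<report xmlns="http://weather.example.com/ns">'
    "<high>30</high></report>"
)
# The same local names with no namespace at all.
UNQUALIFIED = "<report><high>30</high></report>"


def expanded_names(xml_text):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]
```

Here expanded_names(PREFIXED) and expanded_names(DEFAULTED) are equal, while expanded_names(UNQUALIFIED) differs: the prefix is only a lexical shorthand, and the name without a namespace has no global context.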
Designers that use namespaces are thus providing a global context for documents authored with their schema. Establishing this global context allows their documents (and portions of their documents) to be reused and combined in novel ways not yet imagined. Failure to provide a namespace makes such reuse more difficult, perhaps impractical in some cases.
The most significant technical drawback to using namespaces is that they do not interact well with DTDs. DTDs perform validation based on the lexical form of the name, making prefixes semantically significant in ways that are not desirable. As other schema language technologies become widely deployed, this drawback will diminish in significance.
Namespace designers SHOULD make available human-readable material to meet the needs of those who will use the namespace vocabulary. The simplest way to achieve this is for the namespace name to be an HTTP URI which may be dereferenced to access this material. The resource identified by such a URI is called a "namespace document."
There are many reasons why a person or agent might want more information about the namespace. A person might want to:
A namespace document should also support the automatic retrieval of other Web resources that support the processing of markup from this vocabulary. Useful information to processors includes:
It follows that there is, in general, no single type of resource that can be returned in response to a request for the namespace name that will always be the most appropriate; see the section on future work regarding namespace document formats for more information.
Issue : namespaceDocument-8 : What should a "namespace document" look like?
Issue : abstractComponentRefs-37 : Definition of abstract components with namespace names and frag ids
Editor's note : Where should we put a section on mixing namespaces; is the section on processing model more appropriate? See issue mixedUIXMLNamespace-33 .
Suppose that the URI http://example.com/oaxaca identifies a resource with representations encoded in XML. What, then, is the interpretation of the URI http://example.com/oaxaca#weather?
The URI specification [ URI ] makes it clear that the interpretation depends on the context of the media type of the representation. It follows from this that designers of XML-based data formats SHOULD include the semantics of fragment identifiers in their designs. The XPointer Framework [ XPTRFR ] provides a syntax designed for use in such fragment identifiers, and it SHOULD be used for this purpose.
When a representation is provided whose media type is application/xml, there are no semantics defined for fragment identifiers, and thus they SHOULD NOT be provided for such representations. This is also the case if the representation is known to be XML because the media type has a suffix of +xml as described in "XML Media Types" [ RFC3023 ], but there is no normative specification of fragment semantics.
It is common practice to assume that when an element has an attribute that is declared in a DTD to be of type ID, then the fragment identifier #abc identifies the element which has an attribute of that type whose value is "abc". However, there is no normative support for this assumption and it is problematic in practice, since the only defined way to establish that an attribute is of type ID is via a DTD, which may not exist or may not be available.
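The fragility of this practice shows up as soon as one tries to implement it without a DTD: the code has to guess which attribute plays the role of an ID. The sketch below makes that guess an explicit parameter. The function name, the default attribute name "id", and the sample document are all assumptions for illustration; nothing here is normative fragment semantics.

```python
import xml.etree.ElementTree as ET


def resolve_bare_name(xml_text, fragment, assumed_id_attribute="id"):
    """Find the element a bare-name fragment such as "#abc" might identify.

    Without a DTD there is no way to know which attribute has type ID,
    so the caller must supply (or accept) an assumption about its name.
    """
    name = fragment.lstrip("#")
    for element in ET.fromstring(xml_text).iter():
        if element.get(assumed_id_attribute) == name:
            return element
    return None


# Hypothetical document with no DTD; "id" and "key" are ordinary attributes.
SAMPLE = '<forecast><day id="abc">sunny</day><day key="abc">rain</day></forecast>'
```

Resolving "#abc" against SAMPLE gives a different element depending on whether "id" or "key" is assumed to be the ID attribute, which is exactly the ambiguity described above.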
TAG issue fragmentInXML-28 : Do fragment identifiers refer to a syntactic element (at least for XML content), or can they refer to abstractions? See TAG issue.
TAG issue xmlIDSemantics-32 : How should the problem of identifying ID semantics in XML languages be addressed in the absence of a DTD? See draft TAG finding " How should the problem of identifying ID semantics in XML languages be addressed in the absence of a DTD? " .
RFC 3023 defines the media types application/xml and text/xml, and describes a convention whereby XML-based data formats use media types with a +xml suffix, for example image/svg+xml.
In general, media types beginning with text/ SHOULD NOT be used for XML representations. They create two problems:
First, intermediate agents in the Web are allowed to "transcode", i.e., convert one character encoding to another. Since XML documents are designed to allow them to be self-describing, and since this is a good and widely-followed practice, any such transcoding will make the self-description false.
Secondly, representations whose media types begin with text/ are required, unless the charset parameter is specified, to be considered to be encoded in US-ASCII. In the case of XML, since it is self-describing, it is good practice to omit the charset parameter, and since XML is very often not encoded in US-ASCII, the use of "text/" media types effectively precludes this good practice.
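The second problem can be sketched as follows. The function assumed_charset is a hypothetical helper that encodes only the rule stated above (an explicit charset wins; otherwise text/ types default to US-ASCII); it is not a complete implementation of any media type specification. It uses Python's standard email.message module to read the header parameters.

```python
from email.message import Message


def assumed_charset(content_type):
    """Return the charset a recipient must assume from the header alone.

    None means the header imposes no charset, so the recipient can rely
    on the document's own self-description.
    """
    header = Message()
    header["Content-Type"] = content_type
    explicit = header.get_param("charset")
    if explicit is not None:
        return explicit
    if header.get_content_maintype() == "text":
        return "US-ASCII"  # the default described above for text/ types
    return None


# A self-describing XML document encoded in UTF-8 (sample data).
BODY = '<?xml version="1.0" encoding="UTF-8"?><city>Juárez</city>'.encode("utf-8")


def decodes_under(body, charset):
    """Report whether the bytes are valid in the given charset."""
    try:
        body.decode(charset)
        return True
    except UnicodeDecodeError:
        return False
```

Served as text/xml with no charset parameter, BODY would have to be treated as US-ASCII, under which it does not even decode; served as application/xml, the header imposes nothing and the encoding declaration in the document can be honored.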
This section will describe the architectural principles and constraints regarding interactions between components, including such topics as network protocols and interaction styles, along with interactions between the Web as a system and the people that make use of it. This will include the role of architectural styles, such as REST and SOAP, and the impact of meta-architectures, such as Web Services and the Semantic Web.
Good practice
Glossary not yet completed .
Editor's note : The TAG is still experimenting with the categorization of points in this document. This list is likely to change. It has also been suggested that the categories clearly indicate their primary audience.
The important points of this document are categorized as follows:
"*" had been chosen instead, the large-scale result would, most likely, have been the same. Other design choices are more fundamental; these are the focus of this document.
Editor's note : The usage of a normative reference in this document needs clarification.
The authors of this document are the participants of W3C's Technical Architecture Group: Tim Berners-Lee (Chair, W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton (Microsoft), Roy Fielding (Day Software), Chris Lilley (W3C), David Orchard (BEA Systems), Norman Walsh (Sun), and Stuart Williams (Hewlett-Packard).
The TAG thanks people for their thoughtful contributions on the TAG's public mailing list, www-tag ( archive ).