The
World
Wide
Web
is
a
network-spanning
information
space
of
resources
interconnected
by
links.
This
information
space
is
the
basis
of,
and
is
shared
by,
a
number
of
information
systems.
Within
each
of
these
systems,
agents
(people
and
software)
retrieve,
create,
display,
analyze,
and
reason
about
resources.
Web
architecture
includes
the
definition
of
the
information
space
in
terms
of
identification
and
representation
of
its
contents,
and
of
the
protocols
that
support
the
interaction
of
agents
in
an
information
system
making
use
of
the
space.
Web
architecture
is
influenced
by
social
requirements
and
software
engineering
principles.
These
lead
to
design
choices
and
constraints
on
the
behavior
of
systems
that
use
the
Web
in
order
to
achieve
desired
properties
of
the
shared
information
space:
efficiency,
scalability,
and
the
potential
for
indefinite
growth
across
languages,
cultures,
and
media.
Good
practice
by
agents
in
the
system
is
also
important
to
the
success
of
the
system.
This
document
reflects
the
three
bases
of
Web
architecture:
identification,
interaction,
and
representation.
This
section
describes
the
status
of
this
document
at
the
time
of
its
publication.
Other
documents
may
supersede
this
document.
A
list
of
current
W3C
publications
and
the
latest
revision
of
this
technical
report
can
be
found
in
the
W3C
technical
reports
index
at
http://www.w3.org/TR/.
This
is
the
10
May
2004
Editor's
Draft
of
"Architecture
of
the
World
Wide
Web,
First
Edition."
This
draft
takes
into
account
a
number
of
changes
based
on
Last
Call
comments and a few
additional
TAG
resolutions
that
were
omitted
from
the
7
May
draft;
see
the
TAG
mailing
list
public-webarch-comments@w3.org (archive).
This
document
has
been
developed
by
W3C's
Technical
Architecture
Group
(TAG) (charter).
A
complete
list
of
changes
to
this
document
since
the
first
public
Working
Draft
is
available
on
the
Web.
The
TAG
charter
describes
a
process
for
issue
resolution
by
the
TAG.
In
accordance
with
those
provisions,
the
TAG
maintains
a
running
issues
list.
The
First
Edition
of
"Architecture
of
the
World
Wide
Web"
does
not
address
every
issue
that
the
TAG
has
accepted
since
it
began
work
in
January
2002.
The
TAG
has
selected
a
subset
of
issues
that
the
First
Edition
does
address
to
the
satisfaction
of
the
TAG;
those
issues
are
identified
in
the
TAG's
issues
list.
The
TAG
intends
to
address
the
remaining
(and
future)
issues
after
publication
of
the
First
Edition
as
a
Recommendation.
This
document
uses
the
concepts
and
terms
regarding
URIs
as
defined
in
draft-fielding-uri-rfc2396bis-03,
preferring
them
to
those
defined
in
RFC
2396.
The
IETF
Internet
Draft
draft-fielding-uri-rfc2396bis-03
is
expected
to
obsolete
RFC
2396,
which
is
the
current
URI
standard.
The
TAG
is
tracking
the
evolution
of
draft-fielding-uri-rfc2396bis-03.
Publication
as
a
Working
Draft
does
not
imply
endorsement
by
the
W3C
Membership.
This
is
a
draft
document
and
may
be
updated,
replaced
or
obsoleted
by
other
documents
at
any
time.
It
is
inappropriate
to
cite
this
document
as
other
than
"work
in
progress."
The
latest
information
regarding
patent
disclosures
related
to
this
document
is
available
on
the
Web.
The World Wide Web (WWW, or simply Web)
is
an
information
space
in
which
the
items
of
interest,
referred
to
as
resources,
are
identified
by
global
identifiers
called
Uniform
Resource
Identifiers
(URI).
A
travel
scenario
is
used
throughout
this
document
to
illustrate
typical
behavior
of
Web
agents
—
people
or
software
(on
behalf
of
a
person,
entity,
or
process)
acting
on
this
information
space.
Software
agents
include
servers,
proxies,
spiders,
browsers,
and
multimedia
players.
Story
While
planning
a
trip
to
Mexico,
Nadia
reads
"Oaxaca
weather
information:
'http://weather.example.com/oaxaca'"
in
a
glossy
travel
magazine.
Nadia
has
enough
experience
with
the
Web
to
recognize
that
"http://weather.example.com/oaxaca"
is
a
URI.
Given
the
context
in
which
the
URI
appears,
she
expects
that
it
allows
her
to
access
weather
information.
When
Nadia
enters
the
URI
into
her
browser:
-
The
browser
performs
an
information
retrieval
action
in
accordance
with
its
configured
behavior
for
resources
identified
via
the
"http"
URI
scheme.
-
The
authority
responsible
for
"weather.example.com"
provides
information
in
a
response
to
the
retrieval
request.
-
The
browser
displays
the
retrieved
information,
which
includes
hypertext
links
to
other
information.
Nadia
can
follow
these
hypertext
links
to
retrieve
additional
information.
This
scenario
illustrates
the
three
architectural
bases
of
the
Web
that
are
discussed
in
this
document:
-
Identification.
Each
resource
is
identified
by
a
URI.
In
this
travel
scenario,
the
resource
is
a
periodically-updated
report
on
the
weather
in
Oaxaca,
and
the
URI
is
"http://weather.example.com/oaxaca".
-
Interaction.
Protocols
define
the
syntax
and
semantics
of
messages
exchanged
by
agents
over
a
network.
Web
agents
communicate
information
about
the
state
of
a
resource
through
the
exchange
of
representations.
In
the
travel
scenario,
Nadia
(by
clicking
on
a
hypertext
link)
tells
her
browser
to
request
a
representation
of
the
resource
identified
by
the
URI
in
the
hypertext
link.
The
browser
sends
an
HTTP
GET
request
to
the
server
at
"weather.example.com".
The
server
responds
with
a
representation
that
includes
XHTML
data
and
the
Internet
media
type
"application/xhtml+xml".
-
Formats.
Representations
are
built
from
a
non-exclusive
set
of
data
formats,
used
separately
or
in
combination
(including
XHTML,
CSS,
PNG,
XLink,
RDF/XML,
SVG,
and
SMIL
animation).
In
this
scenario,
the
representation
data
format
is
XHTML.
While
interpreting
the
XHTML
representation
data,
the
browser
retrieves
and
displays
weather
maps
identified
by
URIs
within
the
XHTML.
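The interaction step of the scenario can be sketched in Python. This is only an illustration under stated assumptions: the helper name `build_get_request` is hypothetical, the `Accept` header is an assumed choice, and the request is assembled as text rather than sent over a network.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> str:
    """Assemble the text of an HTTP/1.1 GET request for an "http" URI.

    Hypothetical helper for illustration; it builds the message only
    and does not open a network connection.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' URI scheme")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The authority component names the server that receives the request.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Accept: application/xhtml+xml\r\n"  # assumed preference
        "\r\n"
    )


request = build_get_request("http://weather.example.com/oaxaca")
print(request)
```

The identifier supplies the server name and path, while the representation that comes back is described by the response's Internet media type, not by the URI.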
The
following
illustration
shows
the
relationship
between
identifier,
resource,
and
representation.
This
document
describes
the
properties
we
desire
of
the
Web
and
the
design
choices
that
have
been
made
to
achieve
them.
This
document
promotes
re-use
of
existing
standards
when
suitable,
and
gives
guidance
on
how
to
innovate
in
a
manner
consistent
with
the
Web
architecture.
The
terms
MUST,
MUST
NOT,
SHOULD,
SHOULD
NOT,
and
MAY
are
used
in
the
principles,
constraints,
and
good
practice
notes
in
accordance
with
RFC
2119
[RFC2119].
However,
this
document
does
not
include
conformance
provisions
for
these
reasons:
-
Conforming
software
is
expected
to
be
so
diverse
that
it
would
not
be
useful
to
be
able
to
refer
to
the
class
of
conforming
software
agents.
-
Some
of
the
good
practice
notes
concern
people;
specifications
generally
define
conformance
for
software,
not
people.
-
The
addition
of
a
conformance
section
is
not
likely
to
increase
the
utility
of
the
document.
This
document
is
intended
to
inform
discussions
about
issues
of
Web
architecture.
The
intended
audience
for
this
document
includes:
-
Participants
in
W3C
Activities;
i.e.,
designers
of
Web
technologies
and
specifications
in
W3C
-
Other
groups
and
individuals
designing
technologies
to
be
integrated
into
the
Web
-
Implementers
of
W3C
specifications
-
Web
content
authors
and
publishers
Readers
will
benefit
from
familiarity
with
the
Requests
for
Comments
(RFC) series from the IETF,
some
of
which
define
pieces
of
the
architecture
discussed
in
this
document.
Note:
This
document
does
not
distinguish
in
any
formal
way
the
terms
"language"
and
"format."
Context
determines
which
term
is
used.
The
phrase
"specification
designer"
encompasses
language,
format,
and
protocol
designers.
This
document
presents
the
general
architecture
of
the
Web.
Other
groups
inside
and
outside
W3C
also
address
specialized
aspects
of
Web
architecture,
including
accessibility,
internationalization,
device
independence,
and
Web
Services.
The
section
on
Architectural
Specifications
includes
references.
This
document
strikes
a
balance
between
brevity
and
precision
while
including
illustrative
examples.
TAG
findings
are
informational
documents
that
complement
the
current
document
by
providing
more
detail
about
selected
topics.
This
document
includes
some
excerpts
from
the
findings.
Since
the
findings
evolve
independently,
this
document
also
includes
references
to
approved
TAG
findings.
For
other
TAG
issues
covered
by
this
document
but
without
an
approved
finding,
references
are
to
entries
in
the
TAG
issues
list.
Many
of
the
examples
in
this
document
that involve
human
activity
suppose
the
familiar
Web
interaction
model
where
a
person
follows
a
link
via
a
user
agent,
the
user
agent
retrieves
and
presents
data,
the
user
follows
another
link,
etc.
This
document
does
not
discuss
in
any
detail
other
interaction
models
such
as
voice
browsing.
For
instance,
when
a
graphical
user
agent
running
on
a
laptop
computer
or
hand-held
device
encounters
an
error,
the
user
agent
can
report
errors
directly
to
the
user
through
visual
and
audio
cues,
and
present
the
user
with
options
for
resolving
the
errors.
On
the
other
hand,
when
someone
is
browsing
the
Web
through
voice
input
and
audio-only
output,
stopping
the
dialog
to
wait
for
user
input
may
reduce
usability
since
it
is
so
easy
to
"lose
one's
place"
when
browsing
with
only
audio-output.
This
document
does
not
discuss
how
the
principles,
constraints,
and
good
practices
identified
here
apply
in
all
interaction
contexts.
The
important
points
of
this
document
are
categorized
as
follows:
-
Principle
-
An
architectural
principle
is
a
fundamental
rule
that
applies
to
a
large
number
of
situations
and
variables.
Architectural
principles
include
"separation
of
concerns",
"generic
interface",
"self-descriptive
syntax,"
"visible
semantics,"
"network
effect"
(Metcalfe's
Law),
and
Amdahl's
Law:
"The
speed
of
a
system
is
limited
by
its
slowest
component."
-
Constraint
-
In
the
design
of
the
Web,
some
design
choices,
like
the
names
of
the
p
and
li
elements
in
HTML,
or
the
choice
of
the
colon
(:)
character
in
URIs,
are
somewhat
arbitrary;
if
paragraph
had
been
chosen
instead
of
p
or
asterisk
(*)
instead
of
colon,
the
large-scale
result
would,
most
likely,
have
been
the
same.
Other
design
choices
are
more
fundamental;
these
are
the
focus
of
this
document.
Design
choices
can
lead
to
constraints,
i.e.,
restrictions
in
behavior
or
interaction
within
the
system.
Constraints
may
be
imposed
for
technical,
policy,
or
other
reasons
to
achieve
certain
properties
of
the
system,
such
as
accessibility
and
global
scope,
and
non-functional
properties,
such
as
relative
ease
of
evolution,
re-usability
of
components,
efficiency,
and
dynamic
extensibility.
-
Good
practice
-
Good
practice
—
by
software
developers,
content
authors,
site
managers,
users,
and
specification
designers
—
increases
the
value
of
the
Web.
This
categorization
is
derived
from
Roy
Fielding's
work
on
"Representational
State
Transfer"
[REST].
A
number
of
general
architecture
principles
apply
to
all
three
bases
of
Web
architecture.
Identification,
interaction,
and
representation
are
independent
(or,
"orthogonal",
or
"loosely
coupled")
concepts:
-
one
identifies
a
resource
with
a
URI.
One
may
publish
and
use
a
URI
without
building
any
representations
of
the
resource
or
determining
whether
any
representations
are
available.
-
a
generic
URI
syntax
allows
agents
to
function
in
many
cases
without
knowing
specifics
of
URI
schemes.
-
in
many
cases
one
may
change
the
representation
of
a
resource
without
disrupting
references
to
the
resource.
Independence
of
specifications
facilitates
a
flexible
design
that
can
evolve
over
time.
For
example,
one
may
refer
to
an
image
with
a
URI
without
worrying
about
the
format
chosen
to
represent
the
image.
This
independence
has
allowed
the
introduction
of
image
formats
such
as
PNG
and
SVG
without
disrupting
references
to
image
resources.
Independent
abstractions
benefit
from
independent
specifications.
Specifications
should
clearly
indicate
those
features
that
simultaneously
access
information
from
otherwise
independent
abstractions.
For
example,
a
specification
should
draw
attention
to
a
feature
that
requires
information
from
both
the
header
and
the
body
of
a
message.
Although
the
HTTP,
HTML,
and
URI
specifications
are
independent
for
the
most
part,
they
are
not
completely
independent.
Experience
demonstrates
that
where
they
are
not,
problems
have
arisen:
-
The
HTML
specification
includes
a
protocol
extension
of
sorts:
it
specifies
how
a
user
agent
sends
HTML
form
data
to
a
server
(as
a
URI
query
string).
The
design
works
reasonably
well,
although
there
are
limitations
related
to
internationalization
(see
the
TAG
finding
"URIs, Addressability, and the use of HTTP GET and POST")
and
the
query
string
design
impinges
on
the
server
design.
Software
developers
(for
example,
of
[CGI]
applications)
might
have
an
easier
time
finding
the
specification
if
it
were
published
separately
and
then
cited
from
the
HTTP,
URI,
and
HTML
specifications.
-
The
HTML
specification
allows
content
providers
to
instruct
HTTP
servers
to
build
response
headers
from
META
element
instances.
This
is
an
abstraction
violation;
the
software
developer
community
would
benefit
from
being
able
to
find
all
HTTP
headers
from
the
HTTP
specification
(including
any
associated
extension
registries
and
specification
updates
per
IETF
process).
Perhaps
as
a
result,
this
feature
of
the
HTML
specification
is
not
widely
deployed.
Furthermore,
this
design
has
led
to
confusion
in
user
agent
development.
The
HTML
specification
states
that
META
in
conjunction
with
http-equiv
is
intended
for
HTTP
servers,
but
many
HTML
user
agents
interpret
http-equiv='refresh'
as
a
client-side
instruction.
-
Some
content
authors
use
the
META/http-equiv
approach
to
declare
the
character
encoding
scheme
of
an
HTML
document.
By
design,
this
is
a
hint
that
an
HTTP
server
should
emit
a
corresponding
"Content-Type"
header
field.
In
practice,
the
use
of
the
hint
in
servers
is
not
widely
deployed.
Furthermore,
many
user
agents
use
this
information
to
override
the
"Content-Type"
header
sent
by
the
server.
This
works
against
the
principle
of
authoritative
representation
metadata.
The
information
in
the
Web
and
the
technologies
used
to
represent
that
information
change
over
time.
Some
examples
of
successful
technologies
designed
to
allow
change
while
minimizing
disruption
include:
-
the
fact
that
URI
schemes
are
independently
specified;
-
the
use
of
an
open
set
of
Internet
media
types
in
mail
and
HTTP
to
specify
document
interpretation;
-
the
separation
of
the
generic
XML
grammar
and
the
open
set
of
XML
namespaces
for
element
and
attribute
names;
-
extensibility
models
in
Cascading
Style
Sheets
(CSS),
XSLT
1.0,
and
SOAP;
-
user
agent
plug-ins.
Below
we
discuss
the
property
of
"extensibility,"
exhibited
by
URIs
and
some
data
and
message
formats,
which
promotes
technology
evolution
and
interoperability.
Language subset:
one
language
is
a
subset
(or,
"profile")
of
a
second
language
if
any
document
in
the
first
language
is
also
a
valid
document
in
the
second
language
and
has
the
same
interpretation
in
the
second
language.
Language extension:
one
language
is
an
extension
of
a
second
language
if
the
second
is
a
language
subset
of
the
first
(thus,
the
extension
is
a
superset).
Clearly,
creating
a
language
extension
is
better
for
interoperability
than
creating
an
incompatible
language.
Ideally,
many
instances
of
a
superset
language
can
be
safely
and
usefully
processed
as
though
they
were
in
the
language
subset.
Languages
that
exhibit
this
property
are
said
to
be
"extensible."
Language
designers
can
facilitate
extensibility
by
defining
how
implementations
must
handle
unknown
extensions
--
for
example,
that
they
be
ignored
(in
some
way)
or be
considered
errors.
For
example,
from
early
on
in
the
Web,
HTML
agents
followed
the
convention
of
ignoring
unknown
elements.
This
choice
left
room
for
innovation
(i.e.,
non-standard
elements)
and
encouraged
the
deployment
of
HTML.
However,
interoperability
problems
arose
as
well.
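The "ignore unknown elements" convention can be sketched as follows. This is an illustration only, not how any particular HTML agent is implemented: the `KNOWN_PREFIXES` table and the `render` function are hypothetical, and the input is parsed as XML for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatting rules for the elements this agent understands.
KNOWN_PREFIXES = {"p": "", "li": "* "}


def render(element):
    """Render known elements with their rule; treat unknown tags as transparent."""
    lines = []
    text = (element.text or "").strip()
    if text:
        # An unknown element gets no special formatting, but its content is kept.
        lines.append(KNOWN_PREFIXES.get(element.tag, "") + text)
    for child in element:
        lines.extend(render(child))
    return lines


doc = ET.fromstring(
    "<body><p>Oaxaca</p><li>Sunny</li><blink>Windy</blink></body>"
)
rendered = render(doc)
print(rendered)
```

The non-standard `blink` element does not stop processing, which leaves room for innovation. It also means two agents may present the same document differently, which is the interoperability cost described above.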
In
this
type
of
environment,
there
is
an
inevitable
tension
between
interoperability
in
the
short
term
and
the
desire
for
extensibility.
Experience
shows
that
designs
that
strike
the
right
balance
between
allowing
change
and
preserving
interoperability
are
more
likely
to
thrive
and
are
less
likely
to
disrupt
the
Web
community.
Independent
specifications
help
reduce
the
risk
of
disruption.
For
further
discussion,
see
the
section
on
versioning
and
extensibility.
See
also
TAG
issue
xmlProfiles-29.
Errors
occur
in
networked
information
systems.
The
manner
in
which
they
are
dealt
with
depends
on
application
context.
A
user
agent
acts
on
behalf
of
the
user
and
therefore
is
expected
to
help
the
user
understand
the
nature
of
errors,
and
possibly
overcome
them.
User
agents
that
correct
errors
without
the
consent
of
the
user
are
not
acting
on
the
user's
behalf.
Principle:
Error
recovery
Agent
recovery
from
error
without
user
consent
is
harmful.
Consent
does
not
necessarily
imply
that
the
receiving
agent
must
interrupt
the
user
and
require
selection
of
one
option
or
another.
The
user
may
indicate consent
through
pre-selected
configuration
options,
modes,
or
selectable
user
interface
toggles,
with
appropriate
reporting
to
the
user
when
the
agent
detects
an
error.
To
promote
interoperability,
specification
designers
should
set
expectations
about
behavior
in
the
face
of
known
error
conditions.
Experience
has
led
to
the
following
observations
about
error-handling
approaches.
-
Protocol
designers
should
provide
enough
information
about
the
error
condition
so
that
an
agent
can
address
the
error
condition.
For
instance,
an
HTTP
404
message
("resource
not
found")
is
useful
because
it
allows
user
agents
to
present
relevant
information
to
users,
enabling
them
to
contact
the
representation
provider
in
case
of
problems.
-
Experience
with
the
cost
of
building
a
user
agent
to
handle
the
diverse
forms
of
ill-formed
HTML
content
convinced
the
designers
of
the
XML
specification
to
require
that
agents
fail
upon
encountering
ill-formed
content.
Because
users
are
unlikely
to
tolerate
such
failures,
this
design
choice
has
pressured
all
parties
into
respecting
XML's
constraints,
to
the
benefit
of
all.
-
An
agent
that
encounters
unrecognized
content
may
handle
it
in
a
number
of
ways,
including
as
an
error;
see
also
the
section
on
extensibility
and
versioning.
-
Error
behavior
that
is
appropriate
for
a
person
may
not
be
appropriate
for
software.
People
are
capable
of
exercising
judgement
in
ways
that
software
applications
generally
cannot.
An
informal
error
response
may
suffice
for
a
person
but
not
for
a
processor.
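The XML approach to ill-formed content mentioned in the list above can be illustrated with Python's standard XML parser. This is a minimal sketch; the function name `parse_strictly` is hypothetical, and printing the error stands in for whatever reporting an agent would actually give its user.

```python
import xml.etree.ElementTree as ET


def parse_strictly(data: str):
    """Return the parsed root element, or None if the data is ill-formed."""
    try:
        return ET.fromstring(data)
    except ET.ParseError as error:
        # Report the problem instead of guessing what the author meant.
        print(f"ill-formed XML, not processed: {error}")
        return None


well_formed = parse_strictly("<report><city>Oaxaca</city></report>")
ill_formed = parse_strictly("<report><city>Oaxaca</report>")
```

The second call fails because the `city` element is never closed. The agent reports the error rather than silently repairing the content.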
See
the
TAG
issues
contentTypeOverride-24
and
errorHandling-20.
The
Web
follows
Internet
tradition
in
that
its
important
interfaces
are
defined
in
terms
of
protocols,
by
specifying
the
syntax,
semantics,
and
sequence
of
the
messages
interchanged.
The
technology
shared
among
Web
agents
lasts
longer
than
the
agents
themselves.
It
is
common
for
programmers
working
with
the
Web
to
write
code
that
generates
and
parses
these
messages
directly.
It
is
less
common,
but
not
unusual,
for
end
users
to
have
direct
exposure
to
these
messages.
It
is
often
desirable
to
provide
users
with
access
to
format
and
protocol
details:
allowing
them
to
"view source,"
whereby
they
may
gain
expertise
in
the
workings
of
the
underlying
system.
Parties
who
wish
to
communicate
effectively
must
agree
(to
a
reasonable
extent)
upon
a
shared
set
of
identifiers
and
on
their
meanings.
The
ability
to
use
common
identifiers
across
communities
motivates
global
identifiers
in
Web
architecture.
Thus,
Uniform
Resource
Identifiers
([URI], currently being revised),
which
are
global
identifiers
in
the
context
of
the
Web,
are
central
to
Web
architecture.
Constraint:
Identify
with
URIs
The
identification
mechanism
for
the
Web
is
the
URI.
A
URI
must
be
assigned
to
a
resource
in
order
for
agents
to
be
able
to
refer
to
the
resource.
It
follows
that
a
resource
should
be
assigned
a
URI
if
a
third
party
might
reasonably
want
to
link
to
it,
make
or
refute
assertions
about
it,
retrieve
or
cache
a
representation
of
it,
include
all
or
part
of
it
by
reference
into
another
representation,
annotate
it,
or
perform
other
operations
on
it.
Formats
that
allow
content
authors
to
use
URIs
instead
of
local
identifiers
foster
the
"network
effect":
the
value
of
these
formats
grows
with
the
size
of
the
deployed
Web.
Resources
exist
before
URIs;
a
resource
may
be
identified
by
zero
URIs.
However,
there
are
many
benefits
to
assigning
a
URI
to
a
resource,
including
linking,
bookmarking,
caching,
and
indexing
by
search
engines.
Software
developers
should
expect
that
it
will
prove
useful
to
be
able
to
share
a
URI
across
applications,
even
if
that
utility
is
not
initially
evident.
The
scope
of
a
URI
is
global;
the
resource
identified
by
a
URI
does
not
depend
on
the
context
in
which
the
URI
appears
(see
also
the
section
about
URIs
in
other
roles).
Of
course,
what
an
agent
does
with
a
URI
may
vary.
The
TAG
finding
"URIs, Addressability, and the use of HTTP GET and POST"
discusses
additional
benefits
and
considerations
of
URI
addressability.
Principle:
URI
assignment
One
should
assign
a
URI
to
anything
that
others
will
expect
to
refer
to.
This
principle
dates
back
at
least
as
far
as
Douglas
Engelbart's
seminal
work
on
open
hypertext
systems;
see
section
Every
Object
Addressable
in
[Eng90].
The
most
straightforward
way
of
establishing
that
two
parties
are
referring
to
the
same
resource
is
to
compare,
character-by-character,
the
URIs
they
are
using.
Two
URIs
that
are
identical
(character
for
character)
refer
to
the
same
resource.
However,
Web
architecture
allows
people
to
assign
more
than
one
URI
to
a
resource.
Constraint:
URI
multiplicity
Web
architecture
does
not
constrain
a
resource
to
be
identified
by
a
single
URI.
Consequently,
two
URIs
that
are
not
identical
(character
for
character)
can
still
refer
to
the
same
resource
(i.e.,
they
do
not
necessarily
refer
to
different
resources).
To
reduce
the
risk
of
a
false
negative
comparison
(i.e.,
an
incorrect
conclusion
that
two
URIs
do
not
refer
to
the
same
resource)
or
a
false
positive
comparison
(i.e.,
an
incorrect
conclusion
that
two
URIs
do
refer
to
the
same
resource),
certain
specifications
license
applications
to
apply
tests
in
addition
to
character-by-character
comparison.
For
example,
for
"http"
URIs,
the
authority
component
(the
part
after
"//"
and
before
the
next
"/")
is
defined
to
be
case-insensitive.
Thus,
the
"http"
URI
specification
licenses
applications
to
conclude
that
authority
components
in
two
"http"
URIs
are
equivalent
when
those
strings
are
character-by-character
equivalent
or
differ
only
by
case.
By
following
the
"http"
URI
specification,
agents
are
licensed
to
conclude
that
"http://Weather.Example.Com/Oaxaca"
and
"http://weather.example.com/Oaxaca"
identify
the
same
resource.
Agents
that
reach
conclusions
based
on
comparisons
that
are
not
licensed
by
relevant
specifications
take
responsibility
for
any
problems
that
result.
Agents
should
not
assume,
for
example,
that
"http://weather.example.com/Oaxaca"
and
"http://weather.example.com/OAXACA"
identify
the
same
resource,
since
none
of
the
specifications
involved
states
that
the
path
component
of
an
"http"
URI
is
case-insensitive.
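A comparison that applies only the licensed test might look like the sketch below. The function name `http_uris_equivalent` is hypothetical. As a simplification, the whole authority component is lowercased, so the sketch assumes URIs without user information, and it performs no other normalization.

```python
from urllib.parse import urlsplit


def http_uris_equivalent(a: str, b: str) -> bool:
    """Compare two "http" URIs, ignoring case only in scheme and authority.

    Path, query, and fragment are compared character for character,
    because no specification says they are case-insensitive.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        and pa.netloc.lower() == pb.netloc.lower()
        and (pa.path, pa.query, pa.fragment) == (pb.path, pb.query, pb.fragment)
    )


same = http_uris_equivalent(
    "http://Weather.Example.Com/Oaxaca", "http://weather.example.com/Oaxaca"
)
different = http_uris_equivalent(
    "http://weather.example.com/Oaxaca", "http://weather.example.com/OAXACA"
)
print(same, different)
```

A `False` result here means only that equivalence could not be established by this test. The two URIs may still identify the same resource.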
Section
6
[URI]
provides
more
information
about
comparing
URIs
and
reducing
the
risk
of
false
negatives
and
positives.
See
the
section
below
on
approaches
other
than
string
comparison
that
allow
different
parties
to
assert
that
two
URIs
identify
the
same
resource.
There
are
many
benefits
to
ensuring
that
software
can
determine,
by
following
specifications,
that
two
URIs
refer
to
the
same
resource.
URI
producers
should
be
conservative
about
the
number
of
different
URIs
they
produce
for
the
same
resource,
especially
when
software
cannot
determine
the
equivalence
of
those
URIs.
For
example,
the
parties
responsible
for
weather.example.com
should
not
use
both
"http://weather.example.com/Oaxaca"
and
"http://weather.example.com/oaxaca"
to
refer
to
the
same
resource;
software
will
not
detect
the
equivalence
relationship
by
following
specifications.
Good
practice:
Avoiding
URI
aliases
A
URI
owner
should
not
create
arbitrarily
different
URIs
for
the
same
resource.
There
may,
of
course,
be
good
reasons
for
creating
similar-looking
URIs.
For
instance,
one
might
reasonably
create
URIs
that
begin
with
"http://www.example.com/tempo"
and
"http://www.example.com/tiempo"
to
provide
access
to
resources
by
users
who
speak
Italian
and
Spanish.
Likewise,
URI
consumers
should
ensure
URI
consistency.
For
instance,
when
transcribing
a
URI,
agents
should
not
gratuitously
escape
characters.
The
term
"character"
refers
to
URI
characters
as
defined
in
section
2
of
[URI].
Good
practice:
Consistent
URI
usage
If
a
URI
has
been
assigned
to
a
resource,
agents
SHOULD
refer
to
the
resource
using
the
same
URI,
character
for
character.
When
a
URI
alias
does
become
common
currency,
the
URI
owner
should
use
protocol
techniques
such
as
server-side
redirects
to
connect
the
two
resources.
The
community
benefits
when
the
URI
owner
supports
both
the
"official"
URI
and
the
alias.
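A server-side redirect of this kind can be sketched as a lookup table. Everything here is assumed for illustration: the `ALIASES` table, the `respond` function, and the choice of the 301 (Moved Permanently) status code.

```python
# Hypothetical alias table: each alias path maps to the URI the owner
# treats as the preferred one.
ALIASES = {
    "/Oaxaca": "http://weather.example.com/oaxaca",
}


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a request path."""
    if path in ALIASES:
        # 301 plus a Location header tells agents which URI to use instead.
        return 301, {"Location": ALIASES[path]}
    return 200, {"Content-Type": "application/xhtml+xml"}


alias_response = respond("/Oaxaca")
direct_response = respond("/oaxaca")
print(alias_response, direct_response)
```

Agents that follow the redirect converge on a single URI, so later character-by-character comparisons are more likely to succeed.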
At
times,
different
agents
intentionally
or
unintentionally
use
the
same
URI
to
identify
different
resources.
URI
overloading
refers
to
the
use,
in
the
context
of
Web
protocols
and
formats,
of
one
URI
to
refer
to
more
than
one
resource.
Just
as
promoting
a
shared
vocabulary
has
tangible
value,
overloading
often
imposes
a
cost
in
communication.
Suppose
that
one
organization
uses
a
URI
on
their
site
to
refer
to
the
movie
"The
Sting",
and
another
organization
uses
the
same
URI
to
refer
to
a
resource
that
talks
about
"The
Sting."
Inconsistent
use
of
the
URI
creates
confusion
about
what
the
URI
identifies.
In
many
contexts,
inconsistent
use
may
not
lead
to
error
or
cause
harm.
However,
in
some
contexts
such
as
the
Semantic
Web,
software
relies
on
consistent
use
of
URIs.
If
one
wanted
to
talk
about
the
creation
date
of
the
resource
identified
by
the
URI,
for
instance,
it
would
not
be
clear
whether
this
meant
"when
the
movie
was created"
or
"when
the
resource
about
the
movie
was
created."
Good
practice:
Avoiding
URI
Overloading
Avoid
URI
overloading.
The
section
below
on
URI
ownership
examines
approaches
for
establishing
the
authoritative
source
of
information
about
what
resource
a
URI
identifies.
In
Web
architecture,
URIs
identify
resources.
Outside
the
context
of
Web
architecture
specifications,
URIs
can
be
useful
for
other
purposes,
for
example,
as
database
keys.
For
instance,
the
organizers
of
a
conference
might
use
"mailto:nadia@example.com"
to
refer
to
Nadia.
While
this
usage
is
not
licensed
by
Web
architecture
specifications,
in
the
context
of
the
conference,
all
parties
may
agree
to
that
local
policy
and
understand
one
another.
Certain
properties
of
URIs,
such
as
their
potential
for
global
uniqueness,
make
them
appealing
as
general-purpose
identifiers.
In
the
Web
architecture,
"mailto:nadia@example.com"
identifies
an
Internet
mailbox;
that
is
what
is
licensed
by
the
"mailto"
URI
scheme
specification.
The
fact
that
the
URI
serves
other
purposes
in
non-Web
contexts
does
not
lead
to
URI
overloading.
URI
overloading
arises
when
a
URI
is
used
to
identify
two
different
resources
within
the
context
of
Web
protocols
and
formats.
The
requirement
that
URIs
not
be
overloaded
(explained
below)
demands
that
different
agents
do
not
assign
the
same
URI
to
different
resources.
URI
scheme
specifications
assure
this
using
a
variety
of
techniques,
including:
-
Hierarchical
delegation
of
authority.
This
approach,
exemplified
by
the
"http"
and
"mailto"
schemes,
allows
the
assignment
of
a
part
of
URI
space
to
one
party,
reassignment
of
a
piece
of
that
space
to
another,
and
so
forth.
-
Large
numbers.
The
generation
of
a
fairly
large
random
number
or
a
checksum
reduces
the
risk
of
URI
overloading
to
a
calculated
small
risk.
A
draft
"uuid"
scheme
adopted
this
approach;
one
could
also
imagine
a
scheme
based
on
md5
checksums.
-
Combination
of
approaches.
The
"mid"
and
"cid"
schemes
combine
some
of
the
above
approaches.
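The "large numbers" technique from the list above can be sketched with Python's standard library. Two assumptions apply: the `urn:uuid:` form produced by the `uuid` module stands in for the draft "uuid" scheme mentioned above, and the `md5:` prefix is purely hypothetical, not a registered scheme.

```python
import hashlib
import uuid

# A random 122-bit value rendered as a URN; collisions are possible in
# principle but very unlikely.
random_uri = uuid.uuid4().urn


def checksum_uri(content: bytes) -> str:
    """Derive an identifier from content using an MD5 checksum.

    The "md5:" prefix is a made-up scheme name for illustration only.
    """
    return "md5:" + hashlib.md5(content).hexdigest()


report_uri = checksum_uri(b"Oaxaca: sunny")
print(random_uri)
print(report_uri)
```

Identical content always yields the same checksum-based identifier, which is why this technique, unlike the others, does not tie the URI to a particular owner.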
The
approach
taken
for
the
"http"
URI
scheme
follows
the
pattern
whereby
the
Internet
community
delegates
authority,
via
the
IANA
URI
scheme
registry
[IANASchemes]
and
the
DNS,
over
a
set
of
URIs
with
a
common
prefix
to
one
particular
owner.
One
consequence
of
this
approach
is
the
Web's
heavy
reliance
on
the
central
DNS
registry.
Except
when
a
URI
is
constructed
from
a
checksum,
all
of
the
techniques
seek
to
establish
a
unique
relationship
between
a
social
entity
and
a
URI.
This
relationship
is
called
URI
ownership.
In
this
document,
the
phrase
"authority
responsible
for
domain
X"
indicates
that
the
same
entity
owns
those
URIs
where
the
authority
component
is
domain
X.
This
document
does
not
address
how
the
benefits
and
responsibilities
of
URI
ownership
may
be
delegated
to
other
parties
(e.g.,
to
individuals
managing
an
HTTP
server).
A
URI
owner
may
provide
representations
of
the
resource
identified
by
the
URI
upon
request.
When
the
HTTP
protocol
is
used
to
provide
representations,
the
HTTP
origin
server
(defined
in
[RFC2616])
is
the
software
agent
acting
on
behalf
of
the
URI
owner.
The
URI
owner
has
a
privileged
position
in
the
Web
architecture
as
the
entity
that
assigns
authoritative
metadata
to
such
representations;
see
the
section
on
authoritative
metadata
for
more
information.
There
are
also
social
expectations
for
responsible
representation
management
by
URI
owners.
Additional
social
implications
of
URI
ownership
are
not
discussed
here.
However,
the
success
or
failure
of
these
different
approaches
depends
on
the
extent
to
which
there
is
consensus
in
the
Internet
community
on
abiding
by
the
defining
specifications.
In
the
URI
"http://weather.example.com/",
the
"http"
that
appears
before
the
colon
(":")
names
a
URI
scheme.
Each
URI
scheme
has
a
normative
specification
that
explains
how
identifiers
are
assigned
within
that
scheme.
The
URI
syntax
is
thus
a
federated
and
extensible
naming
mechanism
wherein
each
scheme's
specification
may
further
restrict
the
syntax
and
semantics
of
identifiers
within
that
scheme.
Examples
of
URIs
from
various
schemes
include:
-
mailto:joe@example.org
-
ftp://example.org/aDirectory/aFile
-
news:comp.infosystems.www
-
tel:+1-816-555-1212
-
ldap://ldap.example.org/c=GB?objectClass?one
-
urn:oasis:names:tc:entity:xmlns:xml:catalog
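Because the generic syntax places the scheme name before the first colon, an agent can identify the scheme without understanding anything else about the URI. A minimal sketch using Python's standard library:

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "mailto:joe@example.org",
    "ftp://example.org/aDirectory/aFile",
    "news:comp.infosystems.www",
    "tel:+1-816-555-1212",
    "ldap://ldap.example.org/c=GB?objectClass?one",
    "urn:oasis:names:tc:entity:xmlns:xml:catalog",
]

# The scheme can be read off generically; interpreting the rest of each
# URI requires the relevant scheme specification.
schemes = [urlsplit(uri).scheme for uri in EXAMPLES]
for scheme, uri in zip(schemes, EXAMPLES):
    print(f"{scheme:7} {uri}")
```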
While
the
Web
architecture
allows
the
definition
of
new
schemes,
introducing
a
new
scheme
is
costly.
Many
aspects
of
URI
processing
are
scheme-dependent,
and
a
significant
amount
of
deployed
software
already
processes
URIs
of
well-known
schemes.
Introducing
a
new
URI
scheme
requires
the
development
and
deployment
not
only
of
client
software
to
handle
the
scheme,
but
also
of
ancillary
agents
such
as
gateways,
proxies,
and
caches.
See
[RFC2718]
for
other
considerations
and
costs
related
to
URI
scheme
design.
Because
of
these
costs,
if
a
URI
scheme
exists
that
meets
the
needs
of
an
application,
designers
should
use
it
rather
than
invent
one.
Good
practice:
New
URI
schemes
A
specification
SHOULD
NOT
introduce
a
new
URI
scheme
when
an
existing
scheme
provides
the
desired
properties
of
identifiers
and
their
relation
to
resources.
Consider
our
travel
scenario:
should
the
agent
providing
information
about
the
weather
in
Oaxaca
register
a
new
URI
scheme
"weather"
for
the
identification
of
resources
related
to
the
weather?
They
might
then
publish
URIs
such
as
"weather://travel.example.com/oaxaca".
When
a
software
agent
dereferences
such
a
URI,
if
what
really
happens
is
that
HTTP
GET
is
invoked
to
retrieve
a
representation
of
the
resource,
then
an
"http"
URI
would
have
sufficed.
If
the
motivation
behind
registering
a
new
scheme
is
to
allow
a
software
agent
to
launch
a
particular
application
when
retrieving
a
representation,
such
dispatching
can
be
accomplished
at
lower
expense
via
Internet
media
types.
When
designing
a
new
data
format,
the
appropriate
mechanism
to
promote
its
deployment
on
the
Web
is
the
Internet
media
type.
Note
that
even
if
an
agent
cannot
process
representation
data
in
an
unknown
format,
it
can
at
least
retrieve
it.
The
data
may
contain
enough
information
to
allow
a
user
or
user
agent
to
make
some
use
of
it.
When
an
agent
does
not
handle
a
new
URI
scheme,
it
cannot
retrieve
a
representation.
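The dispatching described above can be sketched as a lookup keyed on the Internet media type rather than on the URI scheme. This is a minimal sketch, not a description of any particular user agent: the handler functions, the `HANDLERS` table, and the fallback message are all hypothetical.

```python
from typing import Callable, Dict


def render_html(data: bytes) -> str:
    # Hypothetical handler standing in for an HTML renderer.
    return f"rendered {len(data)} bytes as HTML"


def render_png(data: bytes) -> str:
    # Hypothetical handler standing in for a PNG decoder.
    return f"rendered {len(data)} bytes as PNG"


# Media type -> handler. Supporting a new data format means adding an
# entry here; no new URI scheme is needed.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "text/html": render_html,
    "image/png": render_png,
}


def dispatch(content_type: str, data: bytes) -> str:
    """Choose a handler from the Content-Type value, ignoring parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        # The format is unknown, but the data was still retrieved and
        # can be offered to the user (for example, to save to disk).
        return f"no handler for {media_type}; offering to save {len(data)} bytes"
    return handler(data)
```

Note the asymmetry the text describes: an unknown media type still leaves the agent holding the data, whereas an unknown scheme would have prevented retrieval altogether.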
The
Internet
Assigned
Numbers
Authority
(
IANA
)
maintains
a
registry
[
IANASchemes
]
of
mappings
between
URI
scheme
names
and
scheme
specifications.
For
instance,
the
IANA
registry
indicates
that
the
"http"
scheme
is
defined
in
[
RFC2616
].
The
process
for
registering
a
new
URI
scheme
is
defined
in
[
RFC2717
].
The
use
of
unregistered
URI
schemes
is
discouraged
for
a
number
of
reasons:
-
There
is
no
generally
accepted
way
to
locate
the
scheme
specification.
-
Someone
else
may
be
using
the
scheme
for
other
purposes.
-
One
should
not
expect
that
general-purpose
software
will
do
anything
useful
with
URIs
of
this
scheme
beyond
URI
comparison;
the
network
effect
is
lost.
Note:
Some
URI
scheme
specifications
(such
as
the
"ftp"
URI
scheme
specification)
use
the
term
"designate"
where
the
current
document
uses
"identify."
TAG
issue
siteData-36
is
about
expropriation
of
naming
authority.
It
is
tempting
to
guess
the
nature
of
a
resource
by
inspection
of
a
URI
that
identifies
it.
However,
the
Web
is
designed
so
that
agents
communicate
resource
state
through
representations
,
not
identifiers.
In
general,
one
cannot
determine
the
Internet
media
type
of
representations
of
a
resource
by
inspecting
a
URI
for
that
resource.
For
example,
the
".html"
at
the
end
of
"http://example.com/page.html"
provides
no
guarantee
that
representations
of
the
identified
resource
will
be
served
with
the
Internet
media
type
"text/html".
The
HTTP
protocol
does
not
constrain
the
Internet
media
type
based
on
the
path
component
of
the
URI;
the
URI
owner
is
free
to
configure
the
server
to
return
a
representation
using
PNG
or
any
other
data
format.
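To make the distinction concrete, the sketch below contrasts a guess derived from the URI suffix with the media type actually supplied in the response metadata. It assumes Python's standard `mimetypes` module for the guess; the function names and the plain mapping used for headers are assumptions for illustration.

```python
import mimetypes
from typing import Mapping, Optional


def guessed_media_type(uri: str) -> Optional[str]:
    """A guess based on the URI's suffix. This is NOT authoritative."""
    return mimetypes.guess_type(uri)[0]


def served_media_type(headers: Mapping[str, str]) -> Optional[str]:
    """The media type given by the 'Content-Type' field, without parameters."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.split(";", 1)[0].strip().lower()
    return None


if __name__ == "__main__":
    uri = "http://example.com/page.html"
    # Hypothetical response headers: the URI owner chose to serve PNG.
    headers = {"Content-Type": "image/png"}
    print("guess from URI: ", guessed_media_type(uri))
    print("served metadata:", served_media_type(headers))
```

An agent that relied on the first function would misinterpret the representation; only the second reflects what the URI owner actually served.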
Resource
state
may
evolve
over
time.
Requiring
a
URI
owner
to
publish
a
new
URI
for
each
change
in
resource
state
would
lead
to
a
significant
number
of
broken
links.
For
robustness,
Web
architecture
promotes
independence
between
an
identifier
and
the
identified
resource.
Good
practice:
URI
opacity
Agents
making
use
of
URIs
MUST
NOT
attempt
to
infer
properties
of
the
referenced
resource
except
as
licensed
by
relevant
specifications.
The
example
URI
used
in
the
travel
scenario
("http://weather.example.com/oaxaca")
suggests
that
the
identified
resource
has
something
to
do
with
the
weather
in
Oaxaca.
A
site
reporting
the
weather
in
Oaxaca
could
just
as
easily
be
identified
by
the
URI
"http://vjc.example.com/315".
And
the
URI
"http://weather.example.com/vancouver"
might
identify
the
resource
"my
photo
album."
On
the
other
hand,
the
URI
"mailto:joe@example.com"
indicates
that
the
URI
refers
to
a
mailbox.
The
"mailto"
URI
scheme
specification
authorizes
agents
to
infer
that
URIs
of
this
form
identify
Internet
mailboxes.
In
some
cases,
relevant
technical
specifications
license
URI
assignment
authorities
to
publish
assignment
policies.
For
more
information
about
URI
opacity,
see
TAG
issue
metaDataInURI-31
.
Story
When
navigating
within
the
XHTML
data
that
Nadia
receives
as
a
representation
of
the
resource
identified
by
"http://weather.example.com/oaxaca",
Nadia
finds
that
the
URI
"http://weather.example.com/oaxaca#tom"
refers
to
information
about
tomorrow's
weather
in
Oaxaca.
This
URI
includes
the
fragment
identifier
"tom"
(the
string
after
the
"#").
The
fragment
identifier
component
of
a
URI
allows
indirect
identification
of
a
secondary
resource
by
reference
to
a
primary
resource
and
additional
identifying
information.
The
secondary
resource
may
be
some
portion
or
subset
of
the
primary
resource,
some
view
on
representations
of
the
primary
resource,
or
some
other
resource
defined
or
described
by
those
representations.
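The split between the primary resource's URI and the fragment identifier can be shown with the standard library function `urllib.parse.urldefrag`. The wrapper name `split_secondary` is an assumption for this sketch.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_secondary(uri: str) -> Tuple[str, str]:
    """Return (URI of the primary resource, fragment identifier)."""
    primary, fragment = urldefrag(uri)
    return primary, fragment


if __name__ == "__main__":
    primary, fragment = split_secondary("http://weather.example.com/oaxaca#tom")
    print(primary)   # URI of the primary resource
    print(fragment)  # additional identifying information
```

The function only separates the two parts syntactically; what the fragment identifies depends on the representations of the primary resource.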
The
interpretation
of
fragment
identifiers
is
discussed
in
the
section
on
media
types
and
fragment
identifier
semantics
.
See
TAG
issues
abstractComponentRefs-37
and
DerivedResources-43
.
There
remain
open
questions
regarding
identifiers
on
the
Web.
The
following
sections
identify
a
few
areas
of
future
work
in
the
Web
community.
The
integration
of
internationalized
identifiers
(i.e.,
composed
of
characters
beyond
those
allowed
by
[
URI
])
into
the
Web
architecture
is
an
important
and
open
issue.
See
TAG
issue
IRIEverywhere-27
for
discussion
about
work
going
on
in
this
area.
Emerging
Semantic
Web
technologies,
including
the
"Web
Ontology
Language
(OWL)"
[
OWL10
],
define
RDF
[
RDF10
]
properties
such
as
sameAs
to
assert
that
two
URIs
identify
the
same
resource
or
functionalProperty
to
imply
it.
One
consequence
of
this
direction
is
that
syntactically
different
URIs
can
be
used
to
identify
the
same
resource.
This
means
that
multiple
parties
may
create
representations
of
the
(same)
resource,
all
available
for
retrieval
using
multiple
URIs.
A
URI
owner's
rights
(e.g.,
to
provide
authoritative
representation
metadata)
extend
only
to
the
representations
served
for
requests
given
that
URI.
Note
also
that
declaring
two
URIs
to
be
sameAs
one
another
does
not
mean
they
are
interchangeable.
For
instance,
suppose
that
two
different
organizations
own
the
URIs
"http://weather.example.org/stations/oaxaca#ws17a"
and
"http://weather.example.com/rdfdump?region=oaxaca&station=ws17a".
The
URIs
might
both
identify
the
same
resource,
a
certain
collection
of
weather-measuring
equipment
shared
by
the
two
organizations.
Although
the
URIs
might
be
declared
"owl:sameAs"
each
other,
the
two
URI
owners
might
provide
very
different
content
when
the
URIs
are
dereferenced.
Communication
between
agents
over
a
network
about
resources
involves
URIs,
messages,
and
data.
Story
Nadia
follows
a
hypertext
link
labeled
"satellite
image"
expecting
to
retrieve
a
satellite
photo
of
the
Oaxaca
region.
The
link
to
the
satellite
image
is
an
XHTML
link
encoded
as
<a href="http://example.com/satimage/oaxaca">satellite image</a>
.
Nadia's
browser
analyzes
the
URI
and
determines
that
its
scheme
is
"http".
The
browser
configuration
determines
how
it
locates
the
identified
information,
which
might
be
via
a
cache
of
prior
retrieval
actions,
by
contacting
an
intermediary
(such
as
a
proxy
server),
or
by
direct
access
to
the
server
identified
by
a
portion
of
the
URI.
In
this
example,
the
browser
opens
a
network
connection
to
port
80
on
the
server
at
"example.com"
and
sends
a
"GET"
message
as
specified
by
the
HTTP
protocol,
requesting
a
representation
of
the
resource
identified
by
"/satimage/oaxaca".
The
server
sends
a
response
message
to
the
browser,
once
again
according
to
the
HTTP
protocol.
The
message
consists
of
several
headers
and
a
JPEG
image.
The
browser
reads
the
headers,
learns
from
the
"Content-Type"
field
that
the
Internet
media
type
of
the
representation
is
"image/jpeg",
reads
the
sequence
of
octets
that
make
up
the
representation
data,
and
renders
the
image.
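The first steps of this story (analyzing the URI, finding the host and port, and composing the "GET" message) can be sketched as follows. This is a simplified illustration that builds the request without sending it; the function name `build_get_request` and the choice of the `Connection: close` header are assumptions, and a real browser sends many more headers.

```python
from typing import Tuple
from urllib.parse import urlsplit


def build_get_request(uri: str) -> Tuple[str, int, bytes]:
    """Return (host, port, request message) for an 'http' URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' scheme")
    host = parts.hostname or ""
    port = parts.port or 80  # port 80 is the default for "http"
    # The request target is the path (and query); any fragment is omitted.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    message = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, message


if __name__ == "__main__":
    host, port, message = build_get_request("http://example.com/satimage/oaxaca")
    print(host, port)
    print(message.decode("ascii"))
```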
This
section
describes
the
architectural
principles
and
constraints
regarding
interactions
between
agents,
including
such
topics
as
network
protocols
and
interaction
styles,
along
with
interactions
between
the
Web
as
a
system
and
the
people
that
make
use
of
it.
The
fact
that
the
Web
is
a
highly
distributed
system
affects
architectural
constraints
and
assumptions
about
interactions.
See
the
related
TAG
issue
httpRange-14
.
Agents
may
use
a
URI
to
access
the
referenced
resource;
this
is
called
dereferencing
the
URI
.
Access
may
take
many
forms,
including
retrieving
a
representation
of
the
state
of
the
resource
(for
instance,
by
using
HTTP
GET
or
HEAD),
adding
or
modifying
a
representation
of
the
state
of
the
resource
(for
instance,
by
using
HTTP
POST
or
PUT,
which
in
some
cases
may
change
the
actual
state
of
the
resource
if
the
submitted
representations
are
interpreted
as
instructions
to
that
end),
and
deleting
some
or
all
representations
of
the
state
of
the
resource
(for
instance,
by
using
HTTP
DELETE,
which
in
some
cases
may
result
in
the
deletion
of
the
resource
itself).
There
may
be
more
than
one
way
to
access
a
resource
for
a
given
URI;
application
context
determines
which
access
mechanism
an
agent
uses.
For
instance,
a
browser
might
use
HTTP
GET
to
retrieve
a
representation
of
a
resource,
whereas
a
link
checker
might
use
HTTP
HEAD
on
the
same
URI
simply
to
establish
whether
a
representation
is
available.
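The browser and link-checker cases above might be expressed with Python's `urllib.request.Request` as shown below. The requests are only constructed here, not sent, and the variable names are assumptions for this example.

```python
from urllib.request import Request

URI = "http://weather.example.com/oaxaca"

# A browser retrieving a representation: GET is the default method
# when no request body is supplied.
browser_request = Request(URI)

# A link checker that only needs to know whether a representation is
# available: same URI, different access mechanism.
link_checker_request = Request(URI, method="HEAD")

if __name__ == "__main__":
    for request in (browser_request, link_checker_request):
        print(request.get_method(), request.get_full_url())
```

The URI is identical in both cases; only the application context, expressed here as the choice of method, differs.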
Some
URI
schemes
set
expectations
about
available
access
mechanisms;
others
(such
as
the
URN
scheme
[
RFC
2141
])
do
not.
Section
1.2.2
of
[
URI
]
discusses
the
separation
of
identification
and
interaction
in
more
detail.
For
more
information
about
relationships
between
multiple
access
mechanisms
and
URI
addressability,
see
the
TAG
finding
"
URIs,
Addressability,
and
the
use
of
HTTP
GET
and
POST
"
.
Although
many
URI
schemes
are
named
after
protocols,
this
does
not
imply
that
use
of
such
a
URI
will
necessarily
result
in
access
to
the
resource
via
the
named
protocol.
Even
when
an
agent
uses
a
URI
to
retrieve
a
representation,
that
access
might
be
through
gateways,
proxies,
caches,
and
name
resolution
services
that
are
independent
of
the
protocol
associated
with
the
scheme
name.
Dereferencing
a
URI
generally
involves
a
succession
of
steps
as
described
in
multiple
independent
specifications
and
implemented
by
the
agent.
The
following
example
illustrates
the
series
of
specifications
that
are
involved
when
a
user
instructs
a
user
agent
to
follow
a
hypertext
link
that
is
part
of
an
SVG
document.
In
this
example,
the
URI
is
"http://weather.example.com/oaxaca"
and
the
application
context
calls
for
the
user
agent
to
retrieve
and
render
a
representation
of
the
identified
resource.
-
Since
the
URI
is
part
of
a
hypertext
link
in
an
SVG
document,
the
first
relevant
specification
is
the
SVG
1.1
Recommendation
[
SVG11
].
Section
17.1
of
this
specification
imports
the
link
semantics
defined
in
XLink
1.0
[
XLink10
]:
"The
remote
resource
(the
destination
for
the
link)
is
defined
by
a
URI
specified
by
the
XLink
href
attribute
on
the
'a'
element."
The
SVG
specification
goes
on
to
state
that
interpretation
of
an
a
element
involves
retrieving
a
representation
of
a
resource,
identified
by
the
href
attribute
in
the
XLink
namespace:
"By
activating
these
links
(by
clicking
with
the
mouse,
through
keyboard
input,
voice
commands,
etc.),
users
may
visit
these
resources."
-
The
XLink
1.0
[
XLink10
]
specification,
which
defines
the
href
attribute
in
section
5.4,
states
that
"The
value
of
the
href
attribute
must
be
a
URI
reference
as
defined
in
[IETF
RFC
2396],
or
must
result
in
a
URI
reference
after
the
escaping
procedure
described
below
is
applied."
-
The
URI
specification
[
URI
]
states
that
"Each
URI
begins
with
a
scheme
name
that
refers
to
a
specification
for
assigning
identifiers
within
that
scheme."
The
URI
scheme
name
in
this
example
is
"http".
-
[
IANASchemes
]
states
that
the
"http"
scheme
is
defined
by
the
HTTP/1.1
specification
(RFC
2616
[
RFC2616
],
section
3.2.2).
-
In
this
SVG
context,
the
agent
constructs
an
HTTP
GET
request
(per
section
9.3
of
[
RFC2616
])
to
retrieve
the
representation.
-
Section
6
of
[
RFC2616
]
defines
how
the
server
constructs
a
corresponding
response
message,
including
the
'Content-Type'
field.
-
Section
1.4
of
[
RFC2616
]
states
"HTTP
communication
usually
takes
place
over
TCP/IP
connections."
This
example
does
not
address
that
step
in
the
process,
or
other
steps
such
as
Domain
Name
System
(
DNS
)
resolution.
-
The
agent
interprets
the
returned
representation
according
to
the
data
format
specification
that
corresponds
to
the
representation's
Internet
Media
Type
(the
value
of
the
HTTP
'Content-Type')
in
the
relevant
IANA
registry
[
MEDIATYPEREG
].
The
Web's
protocols
(including
HTTP,
FTP,
SOAP,
NNTP,
and
SMTP)
are
based
on
the
exchange
of
messages.
A
message
may
include
representation
data
as
well
as
metadata
about
the
resource
(such
as
the
"Alternates"
and
"Vary"
HTTP
headers),
the
representation,
and
the
message
itself
(such
as
the
"Transfer-encoding"
HTTP
header).
A
message
may
even
include
metadata
about
the
message
metadata
(for
message-integrity
checks,
for
instance).
Two
important
classes
of
message
are
those
that
request
a
representation
of
a
resource,
and
those
that
return
the
result
of
such
a
request.
Such
a
response
message
(for
example,
a
response
to
an
HTTP
GET)
includes
a
representation
of
the
resource.
A
representation
is
an
octet
sequence
that
consists
logically
of
two
parts:
-
Representation
data
,
data
about
resource
state,
expressed
in
one
or
more
formats
used
separately
or
in
combination,
and
-
Representation
metadata
.
One
important
piece
of
metadata
is
the
Internet
media
type
,
discussed
below.
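One way to picture the two logical parts of a representation is as a small data structure. The class below is a sketch under the assumption that metadata can be modeled as header-like name/value pairs; the class and attribute names are not taken from any specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Representation:
    """Representation data plus representation metadata."""

    data: bytes                                              # representation data
    metadata: Dict[str, str] = field(default_factory=dict)  # representation metadata

    @property
    def media_type(self) -> Optional[str]:
        """The Internet media type, if the metadata supplies one."""
        for name, value in self.metadata.items():
            if name.lower() == "content-type":
                return value.split(";", 1)[0].strip().lower()
        return None
```

For instance, `Representation(jpeg_octets, {"Content-Type": "image/jpeg"})` would model the satellite image from the earlier story, with `jpeg_octets` standing in for the image bytes.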
Agents
use
representations
to
modify
as
well
as
retrieve
resource
state.
Note
that
even
though
the
response
to
an
HTTP
POST
request
may
contain
the
above
types
of
data,
the
response
to
an
HTTP
POST
request
is
not
necessarily
a
representation
of
the
resource
identified
in
the
POST
request.
The
Internet
media
type
[
RFC2046
]
of
a
representation
determines
which
data
format
specification(s)
provide
the
authoritative
interpretation
of
the
representation
data
(including
fragment
identifier
syntax
and
semantics
,
if
any).
The
IANA
registry
[
MEDIATYPEREG
]
maps
media
types
to
data
formats
.
See
the
TAG
finding
"
Internet
media
type
registration,
consistency
of
use
"
for
more
information
about
media
type
registration.
Story
In
one
of
his
XHTML
pages,
Dirk
links
to
an
image
that
Nadia
has
published
on
the
Web.
He
creates
a
hypertext
link
with
<a href="http://www.example.com/images/nadia#hat">Nadia's hat</a>
.
Nadia
serves
an
SVG
representation
of
the
image
(with
Internet
media
type
"image/svg+xml"),
so
the
authoritative
interpretation
of
the
fragment
identifier
"hat"
depends
on
the
SVG
specification.
Per
[
URI
],
given
a
URI
"U#F",
and
a
representation
retrieved
by
dereferencing
URI
"U"
(which
is
authoritative),
the
(
secondary
)
resource
identified
by
"U#F"
is
determined
by
interpreting
"F"
according
to
the
specification
associated
with
the
Internet
media
type
of
the
representation
data.
Thus,
in
the
case
of
Dirk
and
Nadia,
the
authoritative
interpretation
of
the
fragment
identifier
is
given
by
the
SVG
specification,
not
the
XHTML
specification
(i.e.,
the
context
where
the
URI
appears).
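The rule that the retrieved representation's media type governs fragment interpretation can be sketched as a dispatch on that media type. The sketch is deliberately simplified: it assumes that, for the XML-based types shown, a bare-name fragment refers to the element whose `id` attribute has that value, and it ignores other fragment syntaxes those formats define. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Media types for which this sketch knows how to interpret a fragment.
_XML_ID_TYPES = {"image/svg+xml", "application/xhtml+xml"}


def resolve_fragment(media_type: str, data: bytes,
                     fragment: str) -> Optional[ET.Element]:
    """Interpret `fragment` per the representation's media type.

    Returns the identified element, or None when the format defines no
    semantics for the fragment or the fragment is not found.
    """
    if media_type in _XML_ID_TYPES:
        root = ET.fromstring(data)
        for element in root.iter():
            if element.get("id") == fragment:
                return element
        return None
    # Formats such as PNG define no fragment identifier semantics.
    return None


if __name__ == "__main__":
    svg = b'<svg xmlns="http://www.w3.org/2000/svg"><g id="hat"/></svg>'
    print(resolve_fragment("image/svg+xml", svg, "hat"))
```

The context in which the link appeared (Dirk's XHTML page) plays no part in the lookup; only the media type of Nadia's representation does.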
The
semantics
of
a
fragment
identifier
are
defined
by
the
set
of
representations
that
might
result
from
a
retrieval
action
on
the
primary
resource.
The
fragment's
format
and
resolution
are
therefore
dependent
on
the
media
type
[
RFC2046
]
of
a
potentially
retrieved
representation,
even
though
such
a
retrieval
is
only
performed
if
the
URI
is
dereferenced.
If
no
such
representation
exists,
then
the
semantics
of
the
fragment
are
considered
unknown
and,
effectively,
unconstrained.
Fragment
identifier
semantics
are
independent
of
the
URI
scheme
and
thus
cannot
be
redefined
by
URI
scheme
specifications.
Interpretation
of
the
fragment
identifier
during
a
retrieval
action
is
performed
solely
by
the
agent;
the
fragment
identifier
is
not
passed
to
other
systems
during
the
process
of
retrieval.
This
means
that
some
intermediaries
in
the
Web
architecture
(such
as
proxies)
have
no
interaction
with
fragment
identifiers
and
that
redirection
(in
HTTP
[
RFC2616
],
for
example)
does
not
account
for
them.
Note
also
that
since
dereferencing
a
URI
(e.g.,
using
HTTP)
does
not
involve
sending
a
fragment
identifier
to
a
server
or
other
agent,
certain
access
methods
(e.g.,
HTTP
PUT,
POST,
and
DELETE)
cannot
be
used
to
interact
with
secondary
resources.
As
with
any
URI,
use
of
a
fragment
identifier
component
does
not
imply
that
a
retrieval
action
will
take
place.
A
URI
with
a
fragment
identifier
may
be
used
to
refer
to
the
secondary
resource
without
any
implication
that
the
primary
resource
is
accessible
or
will
ever
be
accessed.
One
may
compare
URIs
with
fragment
identifiers
without
a
retrieval
action.
Parties
that
draw
conclusions
about
the
interpretation
of
a
fragment
identifier
without
retrieving
a
representation
do
so
at
their
own
risk;
such
interpretations
are
not
authoritative.
Story
Dirk
informs
Nadia
that
he
would
also
like
her
to
make
her
images
available
in
formats
other
than
SVG.
For
the
same
resource,
Nadia
makes
available
a
PNG
image
as
well.
Dirk's
user
agent
and
Nadia's
server
negotiate
so
that
the
user
agent
retrieves
a
suitable
representation.
Which
specification
specifies
the
authoritative
interpretation
of
the
"hat"
fragment
identifier,
the
PNG
specification
or
the
SVG
specification?
For
a
given
resource,
an
agent
may
have
the
choice
between
representation
data
in
more
than
one
data
format
(through
HTTP
content
negotiation,
for
example).
Individual
data
formats
may
define
their
own
restrictions
on,
or
structure
within,
the
fragment
identifier
syntax
for
specifying
different
types
of
subsets,
views,
or
external
references
that
are
identifiable
as
secondary
resources
by
that
media
type.
If
the
primary
resource
has
multiple
representations,
as
is
often
the
case
for
resources
whose
representation
is
selected
based
on
attributes
of
the
retrieval
request
("content
negotiation"),
then
whatever
is
identified
by
the
fragment
should
be
consistent
across
all
of
those
representations:
each
representation
should
either
define
the
fragment
such
that
it
corresponds
to
the
same
secondary
resource,
regardless
of
how
it
is
represented,
or
the
fragment
should
be
left
undefined
by
the
representation
(i.e.,
not
found).
Suppose,
for
example,
that
the
owner
of
"http://weather.example.com/oaxaca/map#zicatela"
provides
representations
of
the
resource
identified
by
http://weather.example.com/oaxaca/map
using
three
image
formats:
SVG,
PNG,
and
JPEG/JFIF.
The
SVG
specification
defines
semantics
for
fragment
identifiers
while
the
other
specifications
do
not.
It
is
not
considered
an
error
that
only
one
of
the
data
formats
specifies
semantics
for
the
fragment
identifier.
Because
the
Web
is
a
distributed
system
in
which
formats
and
agents
are
deployed
in
a
non-uniform
manner,
the
architecture
allows
this
sort
of
discrepancy.
This
design
allows
content
authors
to
take
advantage
of
new
data
formats
while
still
ensuring
reasonable
backward-compatibility
for
users
whose
agents
do
not
yet
implement
them.
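The negotiation mentioned above can be sketched as server-side selection driven by the request's "Accept" field. This is a simplified reading of HTTP content negotiation: it handles media ranges and `q` values only, and the function names are assumptions for this example.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept value into (media range, q) pairs."""
    preferences = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_range, q))
    return preferences


def _matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def _specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available media type the agent prefers most, if any."""
    preferences = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        candidates = [(_specificity(r), q) for r, q in preferences
                      if _matches(r, media_type)]
        if not candidates:
            continue
        q = max(candidates)[1]  # the most specific matching range decides
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With an Accept value of `image/svg+xml, image/png;q=0.8` and both formats available, this sketch selects SVG; an agent that lists only `image/png` receives PNG from the same URI.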
Good
practice:
Fragment
identifier
consistency
The
owner
of
a
URI
with
a
fragment
identifier
who
uses
content
negotiation
to
serve
multiple
representations
of
the
identified
resource
SHOULD
NOT
serve
representations
with
inconsistent
fragment
identifier
semantics.
URI
overloading
is
one
possible
consequence
of
inconsistent
fragment
identifier
semantics.
See
related
TAG
issues
httpRange-14
and
RDFinXHTML-35
.
Successful
communication
between
two
parties
using
a
piece
of
information
relies
on
shared
understanding
of
the
meaning
of
the
information.
Arbitrary
numbers
of
independent
parties
can
identify
and
communicate
about
a
resource.
To
give
these
parties
the
confidence
that
they
are
all
talking
about
the
same
thing
when
they
refer
to
"the
resource
identified
by
the
following
URI
..."
the
design
choice
for
the
Web
is,
in
general,
that
metadata
provided
by
a
message
sender
is
authoritative.
When
a
message
is
a
response
to
a
request
for
a
representation
of
a
resource
identified
by
a
given
URI,
the
representation
metadata
provided
by
the
owner
of
that
URI
is
authoritative
for
that
representation
data.
In
our
travel
scenario
,
the
owner
of
"http://weather.example.com/oaxaca"
provides
the
authoritative
metadata
for
representations
retrieved
for
that
URI.
Precisely
which
representation(s)
Nadia
receives
depends
on
a
number
of
factors,
including:
-
Whether
the
authority
responsible
for
"weather.example.com"
responds
to
requests
at
all;
-
Whether
the
authority
responsible
for
"weather.example.com"
makes
available
one
or
more
representations
for
the
resource
identified
by
"http://weather.example.com/oaxaca";
-
Whether
Nadia
has
access
privileges
to
such
representations
(see
the
section
on
linking
and
access
control
);
-
If
the
authority
responsible
for
"weather.example.com"
has
provided
more
than
one
representation
(in
different
formats
such
as
HTML,
PNG,
or
RDF;
in
different
languages
such
as
English
and
Spanish;
or
transformed
dynamically
according
to
the
hardware
or
software
capabilities
of
the
recipient),
the
resulting
representation
may
depend
on
negotiation
between
the
user
agent
and
server
that
occurs
as
part
of
the
HTTP
transaction.
-
When
Nadia
made
the
request.
Since
the
weather
in
Oaxaca
changes,
Nadia
should
expect
that
representations
will
change
over
time.
Note
that
the
choice
and
expressive
power
of
a
format
can
affect
how
precisely
a
representation
provider
communicates
resource
state.
The
use
of
natural
language
to
communicate
resource
state
may
lead
to
ambiguity
about
what
the
associated
resource
is.
This
ambiguity
can
in
turn
lead
to
URI
overloading
.
See
TAG
issues
contentTypeOverride-24
and
rdfURIMeaning-39
.
Inconsistencies
between
the
data
format
of
representation
data
and
assigned
representation
metadata
do
occur.
Examples
that
have
been
observed
in
practice
include:
-
The
actual
character
encoding
of
a
representation
(e.g.,
"iso-8859-1",
specified
by
the
encoding
attribute
in
an
XML
declaration)
is
inconsistent
with
the
charset
parameter
in
the
representation
metadata
(e.g.,
"utf-8",
specified
by
the
'Content-Type'
field
in
an
HTTP
header).
-
The
namespace
of
the
root
element
of
XML
representation
data
(e.g.,
as
specified
by
the
"xmlns"
attribute)
is
inconsistent
with
the
value
of
the
'Content-Type'
field
in
an
HTTP
header.
Agents
should
detect
such
inconsistencies
but
should
not
resolve
them
without
the
consent
of
the
user;
see
the
section
on
error
handling
for
more
information.
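Detection of the first inconsistency listed above (character encoding) might look like the following sketch. It assumes the XML declaration itself is readable as ASCII, which does not hold for every encoding, and it only reports the conflict; deciding what to do about it is left to the agent and the user. The function names are hypothetical.

```python
import re
from email.message import Message
from typing import Optional, Tuple

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of the data (ASCII-compatible encodings only).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def declared_xml_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, lowercased."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflict(content_type: str,
                     data: bytes) -> Optional[Tuple[str, str]]:
    """Return (metadata charset, declared encoding) if they disagree."""
    from_header = header_charset(content_type)
    from_data = declared_xml_encoding(data)
    if from_header and from_data and from_header != from_data:
        return from_header, from_data
    return None
```

A non-`None` result is a signal to notify the user, not a licence to override the metadata silently.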
Principle:
Authoritative
metadata
Agents
MUST
NOT
ignore
authoritative
metadata
without
the
consent
of
the
user.
Thus,
for
example,
if
the
parties
responsible
for
"weather.example.com"
mistakenly
label
the
satellite
photo
of
Oaxaca
as
"image/gif"
instead
of
"image/jpeg",
and
if
Nadia's
browser
detects
a
problem,
Nadia's
browser
must
not
ignore
the
problem
(e.g.,
by
simply
rendering
the
JPEG
image)
without
Nadia's
consent.
Nadia's
browser
can
notify
Nadia
of
the
problem
or
notify
Nadia
and
take
corrective
action.
Of
course,
user
agent
developers
should
not
ignore
usability
issues
when
handling
this
type
of
error;
notification
may
be
discreet,
and
handling
may
be
tuned
to
meet
the
user's
preferences.
See
the
TAG
finding
"
Client
handling
of
MIME
headers
"
for
more
in-depth
discussion
and
examples.
Furthermore,
representation
providers
can
help
reduce
the
risk
of
error
through
careful
assignment
of
representation
metadata
(especially
that
which
applies
across
representations).
The
section
on
media
types
for
XML
presents
an
example
of
reducing
the
risk
of
error
by
providing
no
metadata
about
character
encoding
when
serving
XML.
Nadia's
retrieval
of
weather
information
(an
example
of
a
read-only
query
or
lookup)
qualifies
as
a
"safe"
interaction;
a
safe
interaction
is
one
where
the
agent
does
not
incur
any
obligation
beyond
the
interaction.
An
agent
may
incur
an
obligation
through
other
means
(such
as
by
signing
a
contract).
If
an
agent
does
not
have
an
obligation
before
a
safe
interaction,
it
does
not
have
that
obligation
afterwards.
Other
Web
interactions
resemble
orders
more
than
queries.
These
unsafe
interactions
may
cause
a
change
to
the
state
of
a
resource
and
the
user
may
be
held
responsible
for
the
consequences
of
these
interactions.
Unsafe
interactions
include
subscribing
to
a
newsletter,
posting
to
a
list,
or
modifying
a
database.
Note:
In
this
context,
the
word
"unsafe"
does
not
mean
"dangerous";
the
term
"safe"
is
used
in
section
9.1.1
of
[
RFC2616
]
and
"unsafe"
is
the
natural
opposite.
Story
Nadia
decides
to
book
a
vacation
to
Oaxaca
at
"booking.example.com."
She
enters
data
into
a
series
of
online
forms
and
is
ultimately
asked
for
credit
card
information
to
purchase
the
airline
tickets.
She
provides
this
information
in
another
form.
When
she
presses
the
"Purchase"
button,
her
browser
opens
another
network
connection
to
the
server
at
"booking.example.com"
and
sends
a
message
composed
of
form
data
using
the
POST
method.
This
is
an
unsafe
interaction
;
Nadia
wishes
to
change
the
state
of
the
system
by
exchanging
money
for
airline
tickets.
The
server
reads
the
POST
request,
and
after
performing
the
booking
transaction
returns
a
message
to
Nadia's
browser
that
contains
a
representation
of
the
results
of
Nadia's
request.
The
representation
data
is
in
XHTML
so
that
it
can
be
saved
or
printed
out
for
Nadia's
records.
Safe
interactions
are
important
because
these
are
interactions
where
users
can
browse
with
confidence
and
where
agents
(including
search
engines
and
browsers
that
pre-cache
data
for
the
user)
can
follow
links
safely.
Users
(or
agents
acting
on
their
behalf)
do
not
commit
themselves
to
anything
by
querying
a
resource
or
following
a
link.
Principle:
Safe
retrieval
Agents
do
not
incur
obligations
by
retrieving
a
representation.
For
instance,
it
is
incorrect
to
publish
a
link
that,
when
followed,
subscribes
a
user
to
a
mailing
list.
Remember
that
search
engines
may
follow
such
links.
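An automated agent such as a crawler or a pre-caching browser can respect this principle by following only links that use safe methods. The sketch below assumes the set of safe methods is GET and HEAD, following section 9.1.1 of [RFC2616]; the function names and the `(method, uri)` link model are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# Methods treated as safe, per section 9.1.1 of RFC 2616.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def is_safe(method: str) -> bool:
    """True if the method is one an agent may invoke without obligation."""
    return method.upper() in SAFE_METHODS


def links_to_prefetch(links: Iterable[Tuple[str, str]]) -> List[str]:
    """From (method, uri) pairs, keep the URIs that may be followed
    automatically; anything unsafe is left for an explicit user action."""
    return [uri for method, uri in links if is_safe(method)]
```

This only works if publishers hold up their side: a subscription triggered by a plain link would reach the agent as a GET and be followed like any other.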
For
more
information
about
safe
and
unsafe
operations
using
HTTP
GET
and
POST,
and
handling
security
concerns
around
the
use
of
HTTP
GET,
see
the
TAG
finding
"
URIs,
Addressability,
and
the
use
of
HTTP
GET
and
POST
"
.
Story
Nadia
pays
for
her
airline
tickets
online
(through
a
POST
interaction
as
described
above).
She
receives
a
Web
page
with
confirmation
information
and
wishes
to
bookmark
it
so
that
she
can
refer
to
it
when
she
calculates
her
expenses.
Although
Nadia
can
print
out
the
results,
or
save
them
to
a
file,
she
would
also
like
to
bookmark
them.
Note
that
neither
the
data
transmitted
with
the
POST
nor
the
data
received
in
the
response
necessarily
correspond
to
any
resource
identified
by
a
URI.
Although
HTTP
includes
mechanisms
to
allow
representation
providers
to
assign
a
URI
to
POST
results,
the
mechanism
is
not
widely
deployed.
Thus,
in
practice,
Nadia
cannot
bookmark
her
commitment
to
pay
(expressed
via
the
POST
request)
or
the
airline
company's
acknowledgment
and
commitment
(expressed
via
the
response
to
the
POST).
It
is
a
breakdown
of
the
Web
architecture
if
agents
cannot
use
URIs
to
reconstruct
a
"paper
trail"
of
transaction
results,
i.e.,
to
refer
to
receipts
and
other
evidence
of
accepting
an
obligation.
Indeed,
each
electronic
mail
message
includes
a
unique
message
identifier,
one
reason
why
email
is
so
useful
for
managing
accountability
(since,
for
example,
email
can
be
copied
to
public
archives).
On
the
other
hand,
HTTP
servers
and
deployed
user
agents
do
not
generally
keep
records
of
POST
transactions,
making
it
difficult
for
all
parties
to
reconstruct
a
series
of
transactions.
There
are
mechanisms
in
HTTP,
not
widely
deployed,
to
remedy
this
situation.
HTTP
servers
can
assign
a
URI
to
the
results
of
a
POST
transaction
using
the
"Content-Location"
header
(described
in
section
14.14
of
[
RFC2616
]),
and
allow
authorized
parties
to
retrieve
a
record
of
the
transaction
thereafter
via
this
URI
(the
value
of
URI
persistence
is
apparent
in
this
case).
User
agents
can
provide
an
interface
for
managing
transactions
where
the
user
agent
has
incurred
an
obligation
on
behalf
of
the
user.
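As a sketch of the server-side remedy described above, the function below builds response headers for a POST result that include a "Content-Location" field. The `/transactions/...` path layout and the function name are purely hypothetical; the sketch shows only where the URI would appear, not how records are stored or access is authorized.

```python
from typing import Dict


def post_response_headers(transaction_id: str, body: bytes) -> Dict[str, str]:
    """Headers for the response to a POST, naming a URI for the result."""
    return {
        "Content-Type": "application/xhtml+xml",
        "Content-Length": str(len(body)),
        # Hypothetical URI at which the record of this transaction can
        # later be retrieved by authorized parties.
        "Content-Location": f"/transactions/{transaction_id}",
    }
```

A user agent that receives such a response has a URI it can bookmark, which is what Nadia was missing in the story.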
Story
Since
Nadia
finds
the
Oaxaca
weather
site
useful,
she
emails
a
review
to
her
friend
Dirk
recommending
that
he
check
out
'http://weather.example.com/oaxaca'.
Dirk
clicks
on
the
link
in
the
email
he
receives
and
is
frustrated
by
a
"404
page
not
found".
Dirk
tries
again
the
next
day
and
receives
a
representation
with
"news"
that
is
two
weeks
old.
He
tries
one
more
time
the
next
day
only
to
receive
a
representation
that
claims
that
the
weather
in
Oaxaca
is
sunny,
even
though
his
friends
in
Oaxaca
tell
him
by
phone
that
in
fact
it
is
raining
(and
he
trusts
them
more
than
he
trusts
the
Web
site
in
question).
Dirk
and
Nadia
conclude
that
the
URI
owners
are
unreliable
or
unpredictable.
Although
the
URI
owner
has
chosen
the
Web
as
a
communication
medium,
they
have
lost
two
customers
due
to
ineffective
resource
management.
The
usefulness
of
a
URI
depends
on
good
management
by
its
owner.
As
is
the
case
with
many
human
interactions,
confident
interactions
with
a
resource
depend
on
stability
and
predictability.
The
value
of
a
URI
increases
with
the
predictability
of
interactions
using
that
URI.
Avoiding
unnecessary
URI
aliases
is
one
aspect
of
proper
resource
management.
Good
practice:
Consistent
representation
A
URI
owner
SHOULD
provide
representations
of
the
identified
resource
consistently
and
predictably.
This
section
discusses
important
aspects
of
representation
management.
A
URI
owner
may
supply
zero
or
more
representations
of
the
resource
identified
by
that
URI.
That
agent
is
also
responsible
for
accepting
or
rejecting
requests
to
modify
the
resource
identified
by
that
URI,
for
example,
by
configuring
a
server
to
accept
or
reject
HTTP
PUT
data
based
on
Internet
media
type,
validity
constraints,
or
other
constraints.
Good
practice:
Available
representation
A
URI
owner
SHOULD
provide
representations
of
the
identified
resource.
For
example,
the
owner
of
an
XML
Namespace
should
provide
a
Namespace
Document;
below
we
discuss
useful
characteristics
of
a
Namespace
Document.
There
are
strong
social
expectations
that
once
a
URI
identifies
a
particular
resource,
it
should
continue
indefinitely
to
refer
to
that
resource;
this
is
called
URI
persistence
.
URI
persistence
is
a
matter
of
policy
and
commitment
on
the
part
of
authorities
servicing
URIs.
The
choice
of
a
particular
URI
scheme
provides
no
guarantee
that
those
URIs
will
be
persistent
or
that
they
will
not
be
persistent.
Since
representations
are
used
to
communicate
resource
state,
persistence
is
directly
affected
by
how
well
representations
are
served.
Service
breakdowns
include:
-
Inconsistent
representations
served.
Note
the
difference
between
a
URI
owner
changing
representations
predictably
in
light
of
the
nature
of
the
resource
(the
changing
weather
of
Oaxaca)
and
the
URI
owner
changing
representations
arbitrarily.
-
Improper
use
of
content
negotiation,
such
as
serving
two
images
as
equivalent
through
HTTP
content
negotiation,
where
one
image
represents
a
square
and
the
other
a
circle.
HTTP
[
RFC2616
]
has
been
designed
to
help
manage
URIs.
For
example,
HTTP
redirection
(using
the
3xx
response
codes)
permits
servers
to
tell
an
agent
that
further
action
needs
to
be
taken
by
the
agent
in
order
to
fulfill
the
request
(for
example,
the
resource
has
been
assigned
a
new
URI).
In
addition,
content
negotiation
also
promotes
consistency,
as
a
site
manager
is
not
required
to
define
new
URIs
when
adding
support
for
a
new
format
specification.
Protocols
that
do
not
support
content
negotiation
(such
as
FTP)
require
a
new
identifier
when
a
new
data
format
is
introduced.
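On the agent side, the redirection mechanism mentioned above amounts to noticing a 3xx status and resolving the "Location" value against the request URI. The sketch below covers only the status codes that name a new URI to request and uses the standard `urllib.parse.urljoin`; the function name and the plain dictionary of headers are assumptions.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# 3xx codes for which the agent is expected to request another URI.
REDIRECT_CODES = frozenset({301, 302, 303, 307})


def next_uri(request_uri: str, status: int,
             headers: Mapping[str, str]) -> Optional[str]:
    """Return the URI to request next, or None if no redirect applies."""
    location = headers.get("Location")
    if status in REDIRECT_CODES and location:
        # Location may be relative; resolve it against the request URI.
        return urljoin(request_uri, location)
    return None
```

Because the old URI keeps working by way of the redirect, existing links and bookmarks are not broken when the owner reorganizes the site.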
For
more
discussion
about
URI
persistence,
see
[
Cool
].
It
is
reasonable
to
limit
access
to
a
resource
(for
commercial
or
security
reasons,
for
example),
but
it
is
unreasonable
to
prohibit
others
from
merely
identifying
the
resource.
As
an
analogy:
The
owners
of
a
building
might
have
a
policy
that
the
public
may
only
enter
the
building
via
the
main
front
door,
and
only
during
business
hours.
People
who
work
in
the
building
and
who
make
deliveries
to
it
might
use
other
doors
as
appropriate.
Such
a
policy
would
be
enforced
by
a
combination
of
security
personnel
and
mechanical
devices
such
as
locks
and
pass-cards.
One
would
not
enforce
this
policy
by
hiding
some
of
the
building
entrances,
nor
by
requesting
legislation
requiring
the
use
of
the
front
door
and
forbidding
anyone
to
reveal
the
fact
that
there
are
other
doors
to
the
building.
Story
Nadia
and
Dirk
both
subscribe
to
the
"weather.example.com"
newsletter.
Nadia
wishes
to
point
out
an
article
of
particular
interest
to
Dirk,
using
a
URI.
The
authority
responsible
for
"weather.example.com"
can
offer
newsletter
subscribers
such
as
Nadia
and
Dirk
the
benefits
of
URIs
(such
as
bookmarking
and
linking)
and
still
limit
access
to
the
newsletter
to
authorized
parties.
The
Web
provides
several
mechanisms
to
control
access
to
resources;
these
mechanisms
do
not
rely
on
hiding
or
suppressing
URIs
for
those
resources.
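A minimal sketch of this idea, assuming a hypothetical subscriber list and a hypothetical `get_article` handler: the URI can be shared and bookmarked by anyone, while the decision to serve the content depends on who is asking.

```python
AUTHORIZED = {"nadia", "dirk"}  # hypothetical list of newsletter subscribers


def get_article(uri, user=None):
    """Return (status, body). The URI is not a secret; the content is protected."""
    if user is None:
        return 401, "authentication required"
    if user not in AUTHORIZED:
        return 403, "forbidden"
    return 200, f"newsletter article at {uri}"
```

The status codes 401 and 403 are the usual HTTP choices for "not authenticated" and "not permitted"; nothing in the sketch depends on the URI being hidden.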
For
more
information,
see
the
TAG
finding
"
'Deep
Linking'
in
the
World
Wide
Web
"
.
There
remain
open
questions
regarding
Web
interactions.
The
TAG
expects
future
versions
of
this
document
to
address
in
more
detail
the
relationship
between
the
architecture
described
herein,
Web
Services
,
the
Semantic
Web
,
peer-to-peer
systems
(including
Freenet
,
MLdonkey
,
and
NNTP
[
RFC977
]),
instant
messaging
systems
(including
[
XMPP
]),
and
voice-over-IP
(including
RTSP
[
RFC2326
]).
A
data
format
(including
XHTML,
CSS,
PNG,
XLink,
RDF/XML,
and
SMIL
animation)
specifies
the
interpretation
of
representation
data
.
The
first
data
format
used
on
the
Web
was
HTML.
Since
then,
data
formats
have
grown
in
number.
The
Web
architecture
does
not
constrain
which
data
formats
content
providers
can
use.
This
flexibility
is
important
because
there
is
constant
evolution
in
applications,
resulting
in
new
data
formats
and
refinements
of
existing
formats.
Although
the
Web
architecture
allows
for
the
deployment
of
new
data
formats,
the
creation
and
deployment
of
new
formats
(and
agents
able
to
handle
them)
is
expensive.
Thus,
before
inventing
a
new
data
format,
designers
should
carefully
consider
re-using
one
that
is
already
available.
For
a
data
format
to
be
usefully
interoperable
between
two
parties,
the
parties
must
agree
(to
a
reasonable
extent)
about
its
syntax
and
semantics.
Shared
understanding
of
a
data
format
promotes
interoperability
but
does
not
imply
constraints
on
usage;
for
instance,
a
data
sender
cannot
count
on
being
able
to
constrain
the
behavior
of
a
data
receiver.
Below
we
describe
some
characteristics
of
a
data
format
that
facilitate
integration
into
the
Web
architecture.
This
document
does
not
address
generally
beneficial
characteristics
of
a
specification
such
as
readability,
simplicity,
attention
to
programmer
goals,
attention
to
user
needs,
accessibility,
or
internationalization.
The
section
on
architectural
specifications
includes
references
to
additional
format
specification
guidelines.
Binary
data
formats
are
those
in
which
portions
of
the
data
are
encoded
for
direct
use
by
computer
processors,
for
example
thirty-two
bit
little-endian
two's-complement
and
sixty-four
bit
IEEE
double-precision
floating-point.
The
portions
of
data
so
represented
include
numeric
values,
pointers,
and
compressed
data
of
all
sorts.
A
textual
data
format
is
one
in
which
the
data
is
specified
as
a
sequence
of
characters.
HTML,
Internet
e-mail,
and
all
XML-based
formats
are
textual.
Increasingly,
internationalized
textual
data
formats
refer
to
the
Unicode
repertoire
[
UNICODE
]
for
character
definitions.
In
principle,
all
data
can
be
represented
using
textual
formats.
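To make the distinction concrete, the sketch below encodes the same two values both ways using Python's standard `struct` module. The particular values and the space-separated textual layout are assumptions chosen for illustration only.

```python
import struct

value_int, value_float = -2, 1.0

# Binary: a 32-bit little-endian two's-complement integer followed by a
# 64-bit IEEE double, with no padding between them.
binary = struct.pack("<id", value_int, value_float)

# Textual: the same values written as a sequence of characters.
textual = f"{value_int} {value_float}".encode("ascii")

# Reading the binary form back requires knowing the exact layout.
decoded_binary = struct.unpack("<id", binary)

# Reading the textual form back requires only splitting and parsing.
decoded_text = [float(token) for token in textual.decode("ascii").split()]
```

For these particular values the textual form happens to be shorter than the binary one, a small reminder that relative size depends on the data and should be measured rather than assumed.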
The
trade-offs
between
binary
and
textual
data
formats
are
complex
and
application-dependent.
Binary
formats
can
be
substantially
more
compact,
particularly
for
complex
pointer-rich
data
structures.
Also,
they
can
be
consumed
more
rapidly
by
agents
in
those
cases
where
they
can
be
loaded
into
memory
and
used
with
little
or
no
conversion.
Textual
formats
are
usually
more
portable
and
interoperable.
Textual
formats
also
have
the
considerable
advantage
that
they
can
be
directly
read
and
understood
by
human
beings.
This
can
simplify
the
tasks
of
creating
and
maintaining
software,
and
allow
the
direct
intervention
of
humans
in
the
processing
chain
without
recourse
to
tools
more
complex
than
the
ubiquitous
text
editor.
Finally,
it
simplifies
the
necessary
human
task
of
learning
about
new
data
formats;
this
is
called
the
"view
source"
effect
.
It
is
important
to
emphasize
that
intuition
as
to
such
matters
as
data
size
and
processing
speed
is
not
a
reliable
guide
in
data
format
design;
quantitative
studies
are
essential
to
a
correct
understanding
of
the
trade-offs.
Therefore,
designers
of
a
data
format
specification
should
make
a
considered
choice
between
binary
and
textual
format
design.
Note:
Text
(i.e.,
a
sequence
of
characters
from
a
repertoire)
is
distinct
from
serving
data
with
a
media
type
beginning
with
"text/".
Although
XML-based
formats
are
textual,
many
XML-based
formats
do
not
consist
primarily
of
phrases
in
natural
language.
See
the
section
on
media
types
for
XML
for
issues
that
arise
when
"text/"
is
used
in
conjunction
with
an
XML-based
format.
See
TAG
issue
binaryXML-30
.
Extensibility
and
versioning
are
strategies
to
help
manage
the
natural
evolution
of
information
on
the
Web
and
technologies
used
to
represent
that
information.
For
more
information
about
versioning
strategies
and
agent
behavior
in
the
face
of
unrecognized
extensions,
see
TAG
issue
XMLVersioning-41
and
"Web
Architecture:
Extensible
Languages"
[
EXTLANG
].
There
is
typically
a
(long)
transition
period
during
which
multiple
versions
of
a
format,
protocol,
or
agent
are
simultaneously
in
use.
Good
practice:
Version
information
A
format
specification
SHOULD
provide
for
version
information
in
language
instances.
Story
Nadia
and
Dirk
are
designing
an
XML
data
format
to
encode
data
about
the
film
industry.
They
provide
for
extensibility
by
using
XML
namespaces
and
creating
a
schema
that
allows
the
inclusion,
in
certain
places,
of
elements
from
any
namespace.
When
they
revise
their
format,
Nadia
proposes
a
new
optional
"lang"
attribute
on
the
"film"
element.
Dirk
feels
that
such
a
change
requires
them
to
assign
a
new
namespace
name,
which
might
require
changes
to
deployed
software.
Nadia
explains
to
Dirk
that
their
choice
of
extensibility
strategy
in
conjunction
with
their
namespace
policy
allows
certain
changes
that
do
not
affect
conformance
of
existing
content
and
software,
and
thus
no
change
to
the
namespace
identifier
is
required.
They
choose
this
policy
to
help
them
meet
their
goals
of
reducing
the
cost
of
change.
Dirk
and
Nadia
have
chosen
a
particular
namespace
change
policy
that
allows
them
to
avoid
changing
the
namespace
name
whenever
they
make
changes
that
do
not
affect
conformance
of
deployed
content
and
software.
They
might
have
chosen
a
different
policy,
for
example
that
any
new
element
or
attribute
has
to
belong
to
a
namespace
other
than
the
original
one.
Whatever
the
chosen
policy,
it
should
set
clear
expectations
for
users
of
the
format.
Good
practice:
Namespace
policy
A
format
specification
SHOULD
include
information
about
change
policies
for
XML
namespaces.
As
an
example
of
a
change
policy
designed
to
reflect
the
variable
stability
of
a
namespace,
consider
the
W3C
namespace
policy
for
documents
on
the
W3C
Recommendation
track.
The
policy
sets
expectations
that
the
Working
Group
responsible
for
the
namespace
may
modify
it
in
any
way
until
a
certain
point
in
the
process
("Candidate
Recommendation"),
at
which
point
W3C
constrains
the
set
of
possible
changes
to
the
namespace
in
order
to
promote
stable
implementations.
Note
that
since
namespace
names
are
URIs,
the
owner
of
a
namespace
URI
has
the
authority
to
decide
the
namespace
change
policy.
Designers
can
facilitate
the
transition
process
by
making
careful
choices
about
extensibility
during
the
design
of
a
language
or
protocol
specification.
Good
practice:
Extensibility
mechanisms
A
specification
SHOULD
provide
mechanisms
that
allow
any
party
to
create
extensions
that
do
not
interfere
with
conformance
to
the
original
specification.
Application
needs
determine
the
most
appropriate
extension
strategy
for
a
specification.
For
example,
applications
designed
to
operate
in
closed
environments
may
allow
specification
designers
to
define
a
versioning
strategy
that
would
be
impractical
at
the
scale
of
the
Web.
As
part
of
defining
an
extensibility
mechanism,
specification
designers
should
set
expectations
about
agent
behavior
in
the
face
of
unrecognized
extensions.
Good
practice:
Unknown
extensions
A
specification
SHOULD
specify
agent
behavior
in
the
face
of
unrecognized
extensions.
Two
strategies
have
emerged
as
being
particularly
useful:
-
"Must
ignore":
The
agent
ignores
any
content
it
does
not
recognize.
-
"Must
understand":
The
agent
treats
unrecognized
markup
as
an
error
condition.
A
powerful
design
approach
is
for
the
language
to
allow
either
form
of
extension,
but
to
distinguish
explicitly
between
them
in
the
syntax.
Additional
strategies
include
prompting
the
user
for
more
input,
automatically
retrieving
data
from
available
links,
and
falling
back
to
default
behavior.
More
complex
strategies
are
also
possible,
including
mixing
strategies.
For
instance,
a
language
can
include
mechanisms
for
overriding
standard
behavior.
Thus,
a
data
format
can
specify
"must
ignore"
semantics
but
also
allow
people
to
create
extensions
that
override
that
semantics
in
light
of
application
needs
(for
instance,
with
"must
understand"
semantics
for
a
particular
extension).
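The combination of strategies described above might look like the following sketch. The namespace URIs, the `mustUnderstand` attribute, and the set of known elements are all hypothetical names invented for the example; a real format would define its own syntax for flagging mandatory extensions.

```python
import xml.etree.ElementTree as ET

FILM_NS = "http://example.org/film"   # hypothetical base vocabulary
EXT_NS = "http://example.org/ext"     # hypothetical extension-control namespace
KNOWN = {f"{{{FILM_NS}}}film", f"{{{FILM_NS}}}title"}
MUST_UNDERSTAND = f"{{{EXT_NS}}}mustUnderstand"


def process(xml_text):
    """Collect recognized element names, applying "must ignore" by default."""
    seen = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in KNOWN:
            seen.append(elem.tag)
        elif elem.get(MUST_UNDERSTAND) == "true":
            # The extension overrides the default: treat it as an error.
            raise ValueError(f"unrecognized mandatory extension: {elem.tag}")
        # Otherwise the unrecognized element itself is ignored
        # (its descendants are still visited by iter()).
    return seen
```

Because the distinction is explicit in the syntax, an older agent can tell which unknown content it may safely skip and which it must reject.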
Extensibility
is
not
free.
Providing
hooks
for
extensibility
is
one
of
many
requirements
to
be
factored
into
the
costs
of
language
design.
Experience
suggests
that
the
long
term
benefits
of
extensibility
generally
outweigh
the
costs.
Many
modern
data
formats
include
mechanisms
for
composition.
For
example:
-
It
is
possible
to
embed
text
comments
in
some
image
formats,
such
as
JPEG/JFIF.
Although
these
comments
are
embedded
in
the
containing
data,
they
have
little
or
no
effect
on
the
display
of
the
image.
-
There
are
container
formats
such
as
SOAP
which
fully
expect
to
be
composed
from
multiple
namespaces
but
which
provide
an
overall
semantic
relationship
of
message
envelope
and
payload.
-
RDF
allows
well-defined
mixing
of
vocabularies,
and
allows
text
and
XML
to
be
used
as
data
type
values
within
a
statement
having
clearly
defined
semantics.
These
relationships
can
be
mixed
and
nested
arbitrarily.
In
principle,
a
SOAP
message
can
contain
an
SVG
image
that
contains
an
RDF
comment
which
refers
to
a
vocabulary
of
terms
for
describing
the
image.
Note,
however,
that
for
general
XML
there
is
no
semantic
model
that
defines
the
interactions
within
XML
documents
with
elements
and/or
attributes
from
a
variety
of
namespaces.
Each
application
must
define
how
namespaces
interact
and
what
effect
the
namespace
of
an
element
has
on
the
element's
ancestors,
siblings,
and
descendants.
See
TAG
issues
mixedUIXMLNamespace-33
,
xmlFunctions-34
,
and
RDFinXHTML-35
.
The
Web
is
a
heterogeneous
environment
where
a
wide
variety
of
agents
provide
access
to
content
to
users
with
a
wide
variety
of
capabilities.
It
is
good
practice
for
authors
to
create
content
that
can
reach
the
widest
possible
audience,
including
users
with
graphical
desktop
computers,
hand-held
devices
and
mobile
phones,
users
with
disabilities
who
may
require
speech
synthesizers,
and
devices
not
yet
imagined.
Furthermore,
authors
cannot
predict
in
some
cases
how
an
agent
will
display
or
process
their
content.
Experience
shows
that
the
separation
of
content,
presentation,
and
interaction
promotes
the
reuse
and
device-independence
of
content;
this
follows
from
the
principle
of
independent
specifications
.
For
more
information
about
principles
of
device-independence,
see
[
DIPRINCIPLES
].
Good
practice:
Separation
of
content,
presentation,
interaction
A
specification
SHOULD
allow
authors
to
separate
content
from
both
presentation
and
interaction
concerns.
Note
that
when
content,
presentation,
and
interaction
are
separated
by
design,
agents
need
to
recombine
them.
There
is
a
recombination
spectrum,
with
"client
does
all"
at
one
end
and
"server
does
all"
at
the
other.
There
are
advantages
to
each:
recombination
on
the
server
allows
the
server
to
send
out
generally
smaller
amounts
of
data
that
can
be
tailored
to
specific
devices
(such
as
mobile
phones).
However,
such
data
will
not
be
readily
reusable
by
other
clients
and
may
not
allow
client-side
agents
to
perform
useful
tasks
unanticipated
by
the
author.
When
a
client
does
the
work
of
recombination,
content
is
likely
to
be
more
reusable
by
a
broader
audience
and
more
robust.
However,
such
data
may
be
of
greater
size
and
may
require
more
computation
by
the
client.
Of
course,
it
may
not
always
be
desirable
to
reach
the
widest
possible
audience.
Designers
should
consider
appropriate
technologies
for
limiting
the
audience.
For
instance,
digital
signature
technology,
access
control
,
and
other
technologies
are
appropriate
for
controlling
access
to
content.
Some
data
formats
are
designed
to
describe
presentation
(including
SVG
and
XSL
Formatting
Objects).
Data
formats
such
as
these
demonstrate
that
one
can
only
separate
content
from
presentation
(or
interaction)
so
far;
at
some
point
it
becomes
necessary
to
talk
about
presentation.
Per
the
principle
of
independent
specifications,
these
data
formats
should
only
address
presentation
issues.
See
the
TAG
issues
formattingProperties-19
and
contentPresentation-26
.
A
defining
characteristic
of
the
Web
is
that
it
allows
embedded
references
to
other
resources
via
URIs.
The
simplicity
of
creating
links
using
absolute
URIs
(
<a
href="http://www.example.com/foo">
)
and
relative
URI
references
(
<a
href="foo">
and
<a
href="foo#anchor">
)
is
partly
(perhaps
largely)
responsible
for
the
birth
of
the
hypertext
Web
as
we
know
it
today.
When
one
resource
(representation)
refers
to
another
resource
with
a
URI,
this
constitutes
a
link
between
the
two
resources.
Additional
metadata
may
also
form
part
of
the
link
(see
[
XLink10
],
for
example).
Good
practice:
Link
mechanisms
A
specification
SHOULD
provide
mechanisms
for
identifying
links
to
other
resources
and
to
portions
of
representation
data
(via
fragment
identifiers).
Good
practice:
Web
linking
A
specification
SHOULD
provide
mechanisms
that
allow
Web-wide
linking,
not
just
internal
document
linking.
Good
practice:
Generic
URIs
A
specification
SHOULD
allow
content
authors
to
use
URIs
without
constraining
them
to
a
limited
set
of
URI
schemes.
What
agents
do
with
a
hypertext
link
is
not
constrained
by
Web
architecture
and
may
depend
on
application
context.
Users
of
hypertext
links
expect
to
be
able
to
navigate
links
among
representations.
Data
formats
that
do
not
allow
content
authors
to
create
hypertext
links
lead
to
the
creation
of
"terminal
nodes"
on
the
Web.
Good
practice:
Hypertext
links
A
data
format
SHOULD
incorporate
hypertext
links
if
hypertext
is
the
expected
user
interface
paradigm.
Links
are
commonly
expressed
using
URI
references
(defined
in
section
4.2
of
[
URI
]),
which
may
be
combined
with
a
base
URI
to
yield
a
usable
URI.
Section
5.1
of
[
URI
]
explains
different
mechanisms
for
establishing
a
base
URI
for
a
resource
and
establishes
a
precedence
among
the
various
mechanisms.
For
instance,
the
base
URI
may
be
a
URI
for
the
resource,
or
specified
in
a
representation
(see
the
base
elements
provided
by
HTML
and
XML,
and
the
HTTP
'Content-Location'
header).
See
also
the
section
on
links
in
XML
.
Agents
resolve
a
URI
reference
before
using
the
resulting
URI
to
interact
with
another
agent.
URI
references
help
in
content
management
by
allowing
content
authors
to
design
a
representation
locally,
i.e.,
without
concern
for
which
global
identifier
may
later
be
used
to
refer
to
the
associated
resource.
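Resolution of URI references against a base URI can be tried out with Python's standard `urllib.parse.urljoin`. The base URI below is an assumption chosen for the example; the relative references mirror the ones used earlier in this section.

```python
from urllib.parse import urljoin

# Assumed base URI, for instance the URI used to retrieve the representation.
base = "http://www.example.com/docs/index"

resolved = {
    ref: urljoin(base, ref)
    for ref in ("foo", "foo#anchor", "../other", "http://other.example.org/x")
}
```

The same representation could be served under a different base URI and its relative references would then resolve accordingly, which is what lets authors design content locally.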
Many
data
formats
are
XML-based
,
that
is
to
say
they
conform
to
the
syntax
rules
defined
in
the
XML
specification
[XML10]
.
This
section
discusses
issues
that
are
specific
to
such
formats.
Anyone
seeking
guidance
in
this
area
is
urged
to
consult
the
"Guidelines
For
the
Use
of
XML
in
IETF
Protocols"
[IETFXML]
,
which
contains
a
thorough
discussion
of
the
considerations
that
govern
whether
or
not
XML
ought
to
be
used,
as
well
as
specific
guidelines
on
how
it
ought
to
be
used.
While
it
is
directed
at
Internet
applications
with
specific
reference
to
protocols,
the
discussion
is
generally
applicable
to
Web
scenarios
as
well.
The
discussion
here
should
be
seen
as
ancillary
to
the
content
of
[IETFXML]
.
Refer
also
to
"XML
Accessibility
Guidelines"
[XAG]
for
help
designing
XML
formats
that
lower
barriers
to
Web
accessibility
for
people
with
disabilities.
XML
defines
textual
data
formats
that
are
naturally
suited
to
describing
data
objects
which
are
hierarchical
and
processed
in
a
chosen
sequence.
It
is
widely,
but
not
universally,
applicable
for
data
formats;
an
audio
or
video
format,
for
example,
is
unlikely
to
be
well
suited
to
expression
in
XML.
Design
constraints
that
would
suggest
the
use
of
XML
include:
-
Requirement
for
a
hierarchical
structure.
-
The
data's
usefulness
should
outlive
the
tools
currently
used
to
process
it
(though
obviously
XML
can
be
used
for
short-term
needs
as
well).
-
Ability
to
support
internationalization
in
a
self-describing
way
that
makes
confusion
over
coding
options
unlikely.
-
Early
detection
of
encoding
errors
with
no
requirement
to
"work
around"
such
errors.
-
A
high
proportion
of
human-readable
textual
content.
-
Potential
composition
of
the
data
format
with
other
XML-encoded
formats.
Sophisticated
linking
mechanisms
have
been
invented
for
XML
formats.
XPointer
allows
links
to
address
content
that
does
not
have
an
explicit,
named
anchor.
XLink
is
an
appropriate
specification
for
representing
links
in
hypertext
XML
applications.
XLink
allows
links
to
have
multiple
ends
and
to
be
expressed
either
inline
or
in
"link
bases"
stored
external
to
any
or
all
of
the
resources
identified
by
the
links
it
contains.
Designers
of
XML-based
formats
should
consider
using
XLink
and,
for
defining
fragment
identifier
syntax,
using
the
XPointer
framework
and
XPointer
element()
Schemes.
See
TAG
issue
xlinkScope-23
.
Story
The
authority
responsible
for
"weather.example.com"
realizes
that
it
can
provide
more
interesting
representations
by
creating
instances
that
consist
of
elements
defined
in
different
XML-based
formats
,
such
as
XHTML,
SVG,
and
MathML.
How
can
one
ensure
that
there
are
no
naming
conflicts
when
elements
from
different
XML-based
data
formats
are
mixed?
For
example,
suppose
that
one
designer
defines
the
para
element
in
an
XML
format
to
identify
paragraphs,
and
another
designer
defines
the
para
element
in
a
second
format
to
identify
parachutes.
"Namespaces
in
XML"
[
XMLNS
]
provides
a
mechanism
for
establishing
globally
unique
names.
Specification
designers
who
declare
namespaces
thus
provide
a
global
context
for
instances
of
the
data
format.
Establishing
this
global
context
allows
those
instances
(and
portions
thereof)
to
be
re-used
and
combined
in
novel
ways
not
yet
imagined.
Failure
to
provide
a
namespace
makes
such
re-use
more
difficult,
perhaps
impractical
in
some
cases.
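The "para" example can be illustrated with a short sketch. The two namespace URIs and the `report` wrapper element are hypothetical; the point is only that a namespace-aware parser keeps the two `para` elements distinct.

```python
import xml.etree.ElementTree as ET

doc = """
<report xmlns:doc="http://example.org/document"
        xmlns:gear="http://example.org/skydiving">
  <doc:para>Jump briefing at noon.</doc:para>
  <gear:para>Main canopy packed.</gear:para>
</report>
""".strip()

root = ET.fromstring(doc)

# ElementTree expands each name to "{namespace-URI}local-name", so the two
# elements no longer share a name even though both are written "para".
tags = [child.tag for child in root]
```

Here `tags` holds two different expanded names, one per namespace, so software that understands paragraphs will not mistake a parachute for one.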
Good
practice:
Namespace
adoption
A
specification
that
establishes
an
XML
vocabulary
SHOULD
place
all
element
names
and
global
attribute
names
in
a
namespace.
Attributes
are
always
scoped
by
the
element
on
which
they
appear.
An
attribute
that
is
"global,"
that
is,
one
that
might
meaningfully
appear
on
elements
of
any
type,
including
elements
in
other
namespaces,
should
be
explicitly
placed
in
a
namespace.
Local
attributes,
ones
associated
with
only
a
particular
element
type,
need
not
be
included
in
a
namespace
since
their
meaning
will
always
be
clear
from
the
context
provided
by
that
element.
The
xsi:type
attribute,
provided
by
W3C
XML
Schema
for
use
in
XML
instance
documents,
is
an
example
of
a
global
attribute.
It
can
be
used
by
authors
of
any
vocabulary
to
make
an
assertion
in
instance
data
about
the
type
of
the
element
on
which
it
appears.
The
type
attribute
occurs
in
the
W3C
XML
Schema
instance
namespace
"http://www.w3.org/2001/XMLSchema-instance"
and
must
always
be
fully
qualified.
The
frame
attribute
on
an
HTML
table
is
an
example
of
a
local
attribute.
There
is
no
value
in
placing
that
attribute
in
a
namespace
since
the
attribute
is
unlikely
to
be
useful
on
an
element
other
than
an
HTML
table.
Applications
that
rely
on
DTD
processing
must
impose
additional
constraints
on
the
use
of
namespaces.
DTDs
perform
validation
based
on
the
lexical
form
of
the
element
and
attribute
names
in
the
document.
This
makes
prefixes
syntactically
significant
in
ways
that
are
not
anticipated
by
[
XMLNS
].
4.5.4.
Namespace
Documents
Story
Nadia
receives
representation
data
from
"weather.example.com"
in
an
unfamiliar
data
format.
She
knows
enough
about
XML
to
recognize
which
XML
namespace
the
elements
belong
to.
Since
the
namespace
is
identified
by
the
URI
"http://weather.example.com/2003/format",
she
asks
her
browser
to
retrieve
a
representation
of
the
namespace
via
that
URI.
Nadia
is
requesting
the
namespace
document.
Nadia
gets
back
some
useful
data
that
allows
her
to
learn
more
about
the
data
format.
Nadia's
browser
may
also
be
able
to
perform
some
operations
automatically
(i.e.,
unattended
by
a
human
overseer)
given
data
that
has
been
optimized
for
software
agents.
For
example,
her
browser
might,
on
Nadia's
behalf,
download
additional
agents
to
process
and
render
the
format.
There
are
many
reasons
to
provide
information
about
a
namespace.
A
person
might
want
to:
-
understand
its
purpose,
-
learn
how
to
use
the
markup
vocabulary
in
the
namespace,
-
find
out
who
controls
it,
-
request
authority
to
access
schemas
or
collateral
material
about
it,
or
-
report
a
bug
or
situation
that
could
be
considered
an
error
in
some
collateral
material.
A
processor
might
want
to:
-
retrieve
a
schema,
for
validation,
-
retrieve
a
style
sheet,
for
presentation,
or
-
retrieve
ontologies,
for
making
inferences.
In
general,
there
is
no
established
best
practice
for
creating
a
namespace
document.
Application
expectations
will
influence
what
data
format
or
formats
are
used
to
create
a
namespace
document.
Application
expectations
will
also
influence
whether
relevant
information
appears
in
the
namespace
document
itself
or
is
referenced
from
it.
Good
practice:
Namespace
documents
The
owner
of
an
XML
namespace
name
SHOULD
make
available
material
intended
for
people
to
read
and
material
optimized
for
software
agents
in
order
to
meet
the
needs
of
those
who
will
use
the
namespace
vocabulary.
When
a
namespace
representation
is
provided
by
the
namespace
URI
owner,
that
material
is
authoritative.
The
following
are
examples
of
formats
used
to
create
namespace
documents:
[
OWL10
],
[
RDDL
],
[
XMLSCHEMA
],
and
[
XHTML11
].
Each
of
these
formats
meets
different
requirements
described
above
for
satisfying
the
needs
of
an
agent
that
wants
more
information
about
the
namespace.
Note,
however,
issues
related
to
fragment
identifiers
and
multiple
representations
if
content
negotiation
is
used
with
namespace
documents.
See
TAG
issues
namespaceDocument-8
and
abstractComponentRefs-37
.
Section
3
of
"Namespaces
in
XML"
[
XMLNS
]
provides
a
syntactic
construct
known
as
a
QName
for
the
compact
expression
of
qualified
names
in
XML
documents.
A
qualified
name
is
a
pair
consisting
of
a
URI,
which
names
a
namespace,
and
a
local
name
placed
within
that
namespace.
"Namespaces
in
XML"
provides
for
the
use
of
QNames
as
names
for
XML
elements
and
attributes.
Other
specifications,
starting
with
[
XSLT10
],
have
employed
the
idea
of
using
QNames
in
contexts
other
than
element
and
attribute
names,
for
example
in
attribute
values
and
in
element
content.
However,
general
XML
processors
cannot
reliably
recognize
QNames
as
such
when
they
are
used
in
attribute
values
and
in
element
content;
for
example,
the
syntax
of
QNames
overlaps
with
that
of
URIs.
Experience
has
also
revealed
other
limitations
to
QNames,
such
as
losing
namespace
bindings
after
XML
canonicalization.
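The overlap between QName syntax and URI syntax can be seen with two deliberately simplified patterns. Both regular expressions are ASCII-only approximations assumed for this sketch; real XML names and real URIs admit more characters.

```python
import re

# Approximation of prefix:local-name (assumption: ASCII names only).
QNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*:[A-Za-z_][A-Za-z0-9._-]*$")

# Approximation of "scheme:rest", the general shape of a URI.
URI_WITH_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:.*$")


def could_be_qname(value):
    return bool(QNAME.match(value))


def could_be_uri(value):
    return bool(URI_WITH_SCHEME.match(value))


# Strings such as these match both patterns, so syntax alone cannot tell
# a processor which interpretation the author intended.
ambiguous = [v for v in ("xs:string", "mailto:nadia", "http://example.org/")
             if could_be_qname(v) and could_be_uri(v)]
```

A processor that sees `xs:string` in an attribute value has no way, without knowledge of the specific format, to decide whether it is a qualified name or a URI with scheme `xs`.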
Good
practice:
QNames
Indistinguishable
from
URIs
A
specification
in
which
QNames
represent
URI/local-name
pairs
SHOULD
NOT
allow
both
QNames
and
URIs
in
attribute
values
or
element
content,
where
they
would
be
indistinguishable.
For
more
information,
see
the
TAG
finding
"
Using
QNames
as
Identifiers
in
Content
"
.
Because
QNames
are
compact,
some
specification
designers
have
adopted
the
same
syntax
as
a
means
of
identifying
resources.
Though
convenient
as
a
shorthand
notation,
this
usage
has
a
cost.
There
is
no
single,
accepted
way
to
convert
a
QName
into
a
URI
or
vice
versa.
Although
QNames
are
convenient,
they
do
not
replace
the
URI
as
the
identification
mechanism
of
the
Web.
The
use
of
QNames
to
identify
Web
resources
without
providing
a
mapping
to
URIs
is
inconsistent
with
Web
architecture.
Good
practice:
QName
Mapping
A
specification
in
which
QNames
serve
as
resource
identifiers
MUST
provide
a
mapping
to
URIs.
For
examples
of
QName-to-URI
mappings,
see
[
RDF10
].
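One possible mapping, sketched below, concatenates the namespace URI and the local name, which is the approach RDF takes. The `qname_to_uri` helper and the prefix table are assumptions made for illustration; other specifications may define different mappings.

```python
def qname_to_uri(qname, bindings):
    """Map prefix:local to a URI by concatenating namespace URI and local name."""
    prefix, local = qname.split(":", 1)
    return bindings[prefix] + local


# In-scope namespace bindings, as they might be declared in a document.
bindings = {"dc": "http://purl.org/dc/elements/1.1/"}
title_uri = qname_to_uri("dc:title", bindings)
```

The mapping only works when the prefix bindings in scope are known, which is one reason a specification needs to state the mapping rather than leave it implicit.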
See
also
TAG
issues
rdfmsQnameUriMapping-6
,
qnameAsId-18
,
and
abstractComponentRefs-37
.
Consider
the
following
fragment
of
XML:
<section
name="foo">
.
Does
the
section
element
have
what
the
XML
Recommendation
refers
to
as
the
ID
foo
?
One
cannot
answer
this
question
by
examining
the
element
and
its
attributes
alone.
In
XML,
the
quality
of
"being
an
ID"
is
associated
with
the
type
of
an
attribute,
not
its
name.
Finding
the
IDs
in
a
document
requires
additional
processing.
-
Processing
the
document
with
a
processor
that
recognizes
DTD
attribute
list
declarations
(in
the
external
or
internal
subset)
might
reveal
a
declaration
that
identifies
the
name
attribute
as
an
ID.
Note:
This
processing
is
not
necessarily
part
of
validation.
A
non-validating,
DTD-aware
processor
can
perform
ID
assignment.
-
Processing
the
document
with
a
W3C
XML
schema
might
reveal
an
element
declaration
that
identifies
the
name
attribute
as
an
xs:ID
.
-
In
practice,
processing
the
document
with
another
schema
language,
such
as
RELAX
NG
[
RELAXNG
],
might
reveal
the
attributes
declared
to
be
of
type
ID
in
the
XML
Schema
sense.
Many
modern
specifications
begin
processing
XML
at
the
Infoset
[
INFOSET
]
level
and
do
not
specify
normatively
how
an
Infoset
is
constructed.
For
those
specifications,
any
process
that
establishes
the
ID
type
in
the
Infoset
(and
Post
Schema
Validation
Infoset
(
PSVI
)
defined
in
[
XMLSCHEMA
])
may
usefully
identify
the
attributes
of
type
ID.
-
In
practice,
applications
may
have
independent
means
of
specifying
ID-ness
as
provided
for
and
specified
in
the
XPointer
specification.
To
further
complicate
matters,
DTDs
establish
the
ID
type
in
the
Infoset
whereas
W3C
XML
Schema
produces
a
PSVI
but
does
not
modify
the
original
Infoset.
This
leaves
open
the
possibility
that
a
processor
might
only
look
in
the
Infoset
and
consequently
would
fail
to
recognize
schema-assigned
IDs.
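The dependence of "ID-ness" on information outside the element can be sketched as follows. The `find_ids` helper and its `id_attributes` parameter are hypothetical; they stand in for whatever a DTD, schema, or application convention would declare.

```python
import xml.etree.ElementTree as ET


def find_ids(xml_text, id_attributes):
    """Map ID values to elements, given externally supplied
    (element name, attribute name) pairs declared to be of type ID."""
    ids = {}
    for elem in ET.fromstring(xml_text).iter():
        for attr, value in elem.attrib.items():
            if (elem.tag, attr) in id_attributes:
                ids[value] = elem
    return ids


doc = '<doc><section name="foo"/><section name="bar"/></doc>'

# Without any declarations, nothing in the document is an ID.
no_ids = find_ids(doc, set())

# With a declaration that "name" on "section" is an ID, two IDs appear.
with_ids = find_ids(doc, {("section", "name")})
```

The document text is identical in both calls; only the declarations differ, which is why two processors with different inputs can disagree about which IDs exist.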
The
TAG
expects
to
continue
to
work
with
other
groups
to
help
resolve
open
questions
about
establishing
"ID-ness"
in
XML
formats.
See
TAG
issue
xmlIDSemantics-32
.
RFC
3023
defines
the
Internet
media
types
"application/xml"
and
"text/xml",
and
describes
a
convention
whereby
XML-based
data
formats
use
Internet
media
types
with
a
"+xml"
suffix,
for
example
"image/svg+xml".
These
Internet
media
types
create
two
problems:
First,
for
data
identified
as
"text/*",
Web
intermediaries
are
allowed
to
"transcode",
i.e.,
convert
one
character
encoding
to
another.
Transcoding
may
make
the
self-description
false
or
may
cause
the
document
to
be
not
well-formed.
Good
practice:
XML
and
"text/*"
In
general,
a
representation
provider
SHOULD
NOT
assign
Internet
media
types
beginning
with
"text/"
to
XML
representations.
Second,
representations
whose
Internet
media
types
begin
with
"text/"
are
required,
unless
the
charset
parameter
is
specified,
to
be
considered
to
be
encoded
in
US-ASCII.
Since
the
syntax
of
XML
is
designed
to
make
documents
self-describing,
it
is
good
practice
to
omit
the
charset
parameter,
and
since
XML
is
very
often
not
encoded
in
US-ASCII,
the
use
of
"text/"
Internet
media
types
effectively
precludes
this
good
practice.
Good
practice:
XML
and
character
encodings
In
general,
a
representation
provider
SHOULD
NOT
specify
the
character
encoding
for
XML
data
in
protocol
headers
since
the
data
is
self-describing.
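A small sketch of what "self-describing" means in practice: the XML declaration carries the encoding, so a parser given only the bytes can decode them, while a recipient that assumes US-ASCII cannot. The sample document and the `decode_as_ascii` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The XML declaration names the encoding used for the bytes that follow.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Oaxaca de Ju\u00e1rez</city>'
data = text.encode("iso-8859-1")

# The parser reads the declaration and decodes the content accordingly.
city = ET.fromstring(data).text


def decode_as_ascii(raw):
    """Mimic a recipient that assumes US-ASCII, as "text/" types require
    when no charset parameter is given."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None
```

Here the parser recovers the city name from the bytes alone, whereas `decode_as_ascii(data)` fails on the accented character.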
The
section
on
media
types
and
fragment
identifier
semantics
discusses
the
interpretation
of
fragment
identifiers.
Designers
of
an
XML-based
data
format
specification
should
define
the
semantics
of
fragment
identifiers
in
that
format.
The
XPointer
Framework
[
XPTRFR
]
provides
an
interoperable
starting
point.
When
the
media
type
assigned
to
representation
data
is
"application/xml",
there
are
no
semantics
defined
for
fragment
identifiers,
and
authors
should
not
make
use
of
fragment
identifiers
in
such
data.
The
same
is
true
if
the
assigned
media
type
has
the
suffix
"+xml"
(defined
in
"XML
Media
Types"
[
RFC3023
]),
and
the
data
format
specification
does
not
specify
fragment
identifier
semantics.
In
short,
just
knowing
that
content
is
XML
does
not
provide
information
about
fragment
identifier
semantics.
Many
people
assume
that
the
fragment
identifier
#abc
,
when
referring
to
XML
data,
identifies
the
element
in
the
document
with
the
ID
"abc".
However,
there
is
no
normative
support
for
this
assumption.
See
TAG
issue
fragmentInXML-28
.