Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings.
This is untrue for some reasonable meanings of "meaning", as Pat Hayes
has argued from time to time. You could say instead:
"Parties who wish to communicate must agree on the practical
effects of using certain identifiers."
or
"Parties who wish to communicate must agree upon a shared set of
identifiers and (to a reasonable degree) on their meanings."
That is: some ambiguity of meaning is both reasonable and
unavoidable. I don't think an unqualified "agree" normally means
"partially agree".
Does http://weather.example.com/oaxaca identify the weather report for
just Oaxaca or for the Oaxaca region? When it starts to matter, you
can start to build a shared understanding of which it is. But you
can't banish those ambiguities until you notice them. There's also a
school of design where you choose not to banish them, even when you
see them, until you know they matter.
Overtaken by events.
issue hawke2: Section 2: Full agreement not required for communication
error decided 2004-02-13
[Section 2] assumes that identification and retrievability are the same thing.
Given the extensive use, starting with namespaces, but continuing with the
identification of XSLT and XQuery functions, and so on, of using URIs to
identify non-retrievable and abstract entities, this conflation is problematic
at best.
Decided at Ottawa f2f. (No action.) The new text about information
resources is believed to address this issue.
issue schema2: [Section 2] Unwise confluence of identification and retrievability
error decided 2004-03-04
Section 2, introductory paragraphs. In the introduction to this
section, the failure of the document to make any serious attempt to
define the term 'resource' begins to bite you -- and more to the
point, begins to cause problems for the reader. I recognize that it's
difficult to define 'resource' well, but I believe it essential that
you try. If definition proves absolutely impossible, you can of
course take it as an undefined primitive notion, but to make that
approach useful I think you would need to specify explicitly the
relations which are postulated as holding between resources and other
primitive notions.
In the current draft, you are making things too easy on yourselves;
the document suffers.
Some questions one might hope to have some light shed on by either a
definition or by a non-defining description of resource as a primitive
notion:
How many resources are there, or how many could there be?
Can resources be created or come into existence at a particular
point in time?
Can resources cease to exist?
Can a set of resources be a resource?
Can a part of a resource be a resource?
Do all users of the Web operate with the same set of resources,
or is it possible for one user to identify three resources
where another identifies only two, without either of them being
in error?
Who determines the identity of a resource?
If the question arises whether two URIs designate the same
resource, can there be an authoritative answer to the question,
or is it a judgement question like the question 'Is "love" an
adequate English rendering for the Greek word "agape"?', on which
every thoughtful observer may form an independent opinion?
It is clear that various parts of the architecture document assume
that some resources have owners. Do any resources have multiple
owners? Do any resources lack owners?
issue msm8: WD-webarch-20031209, Section 2, introductory paragraphs: The term 'resource' needs to be defined
error raised 2004-03-04
Section 2 para 3 says
When a representation uses a URI (instead of a local identifier)
as an identifier, then it gains great power from the vastness of
the choice of resources to which it can refer.
This suggests that URIs have the advantage, compared to local
identifiers, of being more numerous. But if we assume that both URIs
and local identifiers are finite-length strings without any length
restriction we need worry about, then both sets are enumerably
infinite and there is a one-to-one mapping between them, so that they
have exactly the same cardinality and neither is any more vast than
the other.
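The counting argument can be made concrete with a small sketch. Assuming, purely for illustration, a two-letter alphabet, the generator below lists every finite string in shortlex order (shortest first, then alphabetically), so each string's position in the listing is its natural-number index. The same construction works for any finite alphabet, including the characters allowed in URIs or in local identifiers. The name `shortlex` and the alphabet "ab" are assumptions of this sketch, not anything taken from the architecture document.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield every finite string over `alphabet`, shortest strings first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Pair the first few strings with natural numbers: the start of a bijection.
first_seven = list(islice(shortlex("ab"), 7))
for index, text in enumerate(first_seven):
    print(index, repr(text))
```

Every finite string appears exactly once in the listing, which is all that "enumerably infinite" requires.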
I suspect that what is meant here is that URIs have the advantage of
being dereferenceable; this is true of some URIs, but not, I think, of
all.
Overtaken by events.
issue msm9: WD-webarch-20031209, Section 2 para 3: The vastness of URI space
error decided 2004-03-04
Section 2, Principle: URI assignment says: "A resource owner SHOULD
assign a URI to each resource that others will expect to refer to."
In order to comply with this principle, it seems to be necessary for
resource owners to know what resources they own, or (equivalently) to
know, of each thing they own, whether it is a resource or not. It
doesn't seem plausible to expect compliance with this principle if
"resource" is not defined more informatively than it is defined in
this document.
It may also be noted in passing that this principle also requires that
resource owners predict what other actors will expect; it would be
nice if the principle could be reformulated without requiring owners
to perform such predictions.
Note also that if resources can be any "items of interest" (as stated
by section 1), it may be impossible for a resource owner to provide
URIs for every resource which may be an item of interest. If there is
an owner of the real numbers, for example, that owner cannot comply
with the principle enunciated here. If anyone owns an infinite set of
items of interest, and if sets of such items are thought to be
themselves potential items of interest, then that owner cannot, in
principle, provide URIs for all items of interest: the power set of an
enumerably infinite set is not enumerable, and neither URIs nor any
other finite names can be provided for all the members of a
non-enumerable set.
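The non-enumerability claim is Cantor's diagonal argument. As a sketch, taking the natural numbers as a stand-in for any enumerably infinite set of items (an assumption made only to fix notation):

```latex
Let $f : \mathbb{N} \to \mathcal{P}(\mathbb{N})$ be any attempted enumeration
of the subsets of $\mathbb{N}$, and define
\[
  D = \{\, n \in \mathbb{N} : n \notin f(n) \,\}.
\]
If $D = f(k)$ for some $k$, then
\[
  k \in D \iff k \notin f(k) \iff k \notin D,
\]
a contradiction. Hence no $f$ reaches every subset, whereas the finite
strings over a finite alphabet (URIs included) form an enumerable set.
```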
I wonder if some slightly less demanding principle ought to be
enunciated.
Overtaken by events.
issue msm10: WD-webarch-20031209, Section 2: Assigning URIs to resources others will expect to refer to
error decided 2004-03-04
Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings.
This is false. A baby communicates distress and discomfort to his or
her parents without there being any identifiers, or even any
identification going on on the part of the baby. I might be able to
communicate that this large boulder crushing my leg should be removed
by the stout and helpful non-English-speaking lass beside me by making
somewhat spastic gesticulations. Or, in a more structured way, I might
point at the boulder, or whap the boulder, and make a little rolling
motion with my hands.
A number of editorial comments followed.
Overtaken by events.
issue parsia5: LC Comment, Section 2: Agreement on identifiers
error decided 2004-03-05
The identification mechanism for the Web is the URI.
Presumably this isn't *quite* right, as there is a need for some
identification mechanisms that are not URI based in order to associate
(some, at least) URIs with resources for subsequent reidentification.
Also, for example, host names identify things very critical to the
functioning of the web, and yet, aren't URIs. Etc.
Overtaken by events.
issue parsia6: LC Comment, Section 2: Identification mechanism of the Web
error decided 2004-03-05
A URI must be assigned to a resource in order for agents to be able
to refer to the resource.
Even restricted to software agents, this is false.
_:x foaf:mbox <mailto:bparsia@isr.umd.edu>.
Allows an OWL Reasoner to refer to me (since foaf:mbox is an
InverseFunctionalProperty). (While there was a URI involved, it wasn't
assigned *to me*.) I can make or refute assertions about me in this
way.
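A minimal sketch of how such indirect reference can work, assuming statements are held as plain (subject, property, value) tuples: two nodes that share a value for an inverse functional property are treated as the same individual. The function name, the `ex:` properties, and the second node `_:y` are hypothetical, and a real OWL reasoner does far more than this.

```python
def canonical_nodes(triples, ifp="foaf:mbox"):
    """Map each node that has an `ifp` value to the first node seen with that value."""
    first_with_value = {}
    canonical = {}
    for subject, predicate, obj in triples:
        if predicate == ifp:
            canonical[subject] = first_with_value.setdefault(obj, subject)
    return canonical


# Hypothetical statements from two documents; neither node has a URI of its own.
triples = [
    ("_:x", "foaf:mbox", "mailto:bparsia@isr.umd.edu"),
    ("_:x", "ex:commentedOn", "WD-webarch-20031209"),
    ("_:y", "foaf:mbox", "mailto:bparsia@isr.umd.edu"),
    ("_:y", "ex:raisedIssue", "parsia7"),
]

canonical = canonical_nodes(triples)
# Rewrite every statement so that merged nodes share one subject.
merged = [(canonical.get(s, s), p, o) for s, p, o in triples]
```

After the merge, assertions made about either node apply to the same individual, even though the only URI involved identifies a mailbox. The sketch does not handle chains of merges across several property values.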
Overtaken by events.
issue parsia7: LC Comment, Section 2: On requirement to assign a URI to a resource
error decided 2004-03-05
Resources exist before URIs;
If URIs are strings, and strings are abstract mathematical entities
(i.e., a kind of data structure) independent of their physical
instantiation, then, reasonably, URIs have always existed, so any
particular URI has existed before some Resources that have only recently
come into existence. I'm not even sure of the point of such metaphysical
statements. Or imagine I have, oh, a programming language where I have
URI objects (a subclass of String). Let's say I want to use a URI to
identify some other objects in my system. Does this claim require that
(in pseudopython):
    my_object_uri = URI('http://blahblah.com/blah')  # The URI now exists!
    my_funky_object = FunkyObject()  # Now the Resource in question exists.
    my_object_uri.assigned_to(my_funky_object)
is broken in some way? Why would this matter?
issue parsia8: LC Comment, Section 2: On resources existing before URIs
clarification raised 2004-03-05
a resource may be identified by zero URIs.
Ah, this is what you mean? It's not very happy either. I take it you
mean that some resource might *not* be identified by *any* URI. Cool.
And given my above example, it might still be possible for agents to
refer to it. Naturally, it's often a good idea to give various
resources a URI! For example, I don't think it's possible (or, at
least, easy) to *link* to something without a URI in a machine-readable way in HTML.
So, give such resources URIs, please. I think it's quite possible to
make the sensible point without appeal to broken metaphysics.
Actually, the rest of the paragraph seems quite good and
sensible.
Overtaken by events.
issue parsia9: LC Comment, Section 2: On resources being able to have zero URIs
error decided 2004-03-05
Principle: URI assignment.
A resource owner SHOULD assign a URI to each resource that others will
expect to refer to.
I would recommend the TAG study FOAF because that community has made a
different choice (i.e., to rely a lot on inverseFunctionalProperties).
Aside from that, I think this principle misses an important point:
Formats and protocols should (often?) be designed to use URIs. This
encourages URI assignment by adding value to such assignment.
Overtaken by events.
issue parsia10: LC Comment, Section 2: On URI assignment
error decided 2004-03-05
First para: [[Parties who wish to communicate must agree
upon a shared set of identifiers and on their meanings.]]
"identifiers (names for things)"?
issue manola8: Add "(names for things)" after "identifiers"?
clarification raised 2004-03-10
Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings. The ability to use common
identifiers across communities motivates global identifiers in Web
architecture. Thus, Uniform
Resource Identifiers ([URI], currently being revised), which are global
identifiers in the context of the Web, are central to Web
architecture.
Constraint: Identify with URIs
The identification mechanism for the Web is
the URI.
A URI must be assigned to a resource in order for agents to be
able to refer to the resource. It follows that a resource should be
assigned a URI if a third party might reasonably want to link to
it, make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it.
When a representation uses a URI (instead of a local
identifier) as an identifier, then it gains great power from the
vastness of the choice of resources to which it can refer. The
phrase the "network effect" describes the fact that the usefulness
of the technology is dependent on the size of the deployed Web.
Resources exist before URIs; a resource may be identified by
zero URIs. However, there are many benefits to assigning a URI to a
resource, including linking, bookmarking, caching, and indexing by
search engines. Designers should expect that it will prove useful
to be able to share a URI across applications, even if that utility
is not initially evident.
The scope of a URI is global; the resource identified by a URI
does not depend on the context in which the URI appears (see also
the section about URIs in
other roles). Of course, what an agent does with a URI may
vary. The TAG finding "URIs, Addressability, and the use of HTTP GET and
POST" discusses additional benefits and considerations
of URI addressability.
Principle: URI assignment
A resource owner SHOULD assign a URI to each
resource that others will expect to refer to.
This principle dates back at least as far as Douglas Engelbart's
seminal work on open hypertext systems; see section Every Object Addressable in [Eng90].
In section 2.1, "URI Comparisons", I understand the meaning of the paragraph
which begins "Applications may apply rules ...". It means that if your
application makes assumptions about URI equivalences based on details not
covered in the specification, then it's your responsibility if any problems
develop from that. What I don't understand is the term "authority component" in this sentence:
For example, for "http" URIs, the authority component is case-insensitive.
The TAG agreed with the Editor's change to include parenthetical.
issue karr1: What does "authority component" mean?
clarification decided 2003-12-21
The statement "one might reasonably create URIs that ..." in the following passage may be inappropriate, as the preference for viewing a resource in Italian or Spanish should be communicated as meta information within the context, for which mechanisms such as CC/PP are being developed. To countenance the use of non-unique URIs for such a purpose is unwise.
The TAG believes that it is useful to indicate that there are
two resources (one Spanish and one Italian) but to add to the
example some discussion of content negotiation.
issue diwg2: Don't communicate language info in URIs (in example)
error decided 2004-02-25
The AWWW says that one may conclude that agents or representations are
each referring to the same resource if they are using identical URIs.
But that's problematic; it suggests that the relation between
resources and URIs is in some sense timeless and static. Once a URI
has been coined to identify a given resource, it can only ever
identify precisely that resource; else, we have to embrace the
willy-nilly change problem.
The TAG believes the reviewer's question is addressed by
section 3.6.2 of the document.
issue clark3: Willy-Nilly Resource Change
clarification decided 2004-02-26
When determining the uniqueness of a URI, is the fragment identifier
considered part of the identifying URI? If there is an argument list, does
the ? and what follows constitute part of the unique URI?
issue laskey2: What determines URI uniqueness?
clarification raised 2004-03-01
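As a sketch of what is at stake in this question, the snippet below splits a URI into its syntactic components with Python's standard `urllib.parse`. The URI is hypothetical (the "#today" fragment is invented here), and the snippet shows only how the string decomposes; it does not settle which parts the architecture document means by "the URI".

```python
from urllib.parse import urldefrag, urlsplit

# Hypothetical URI; the "#today" fragment is invented for this sketch.
uri = "http://weather.example.com/Mexico?city=Oaxaca#today"

parts = urlsplit(uri)
print(parts.path)      # path component
print(parts.query)     # everything between "?" and "#"
print(parts.fragment)  # everything after "#"

# urldefrag separates the fragment from the rest of the URI.
without_fragment, fragment = urldefrag(uri)

# Compared as character strings, the three forms are all different.
distinct_strings = {uri, without_fragment, "http://weather.example.com/Mexico"}
```

Under plain character-by-character comparison, a difference in either the query or the fragment makes two strings different identifiers.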
...For example, the parties responsible for weather.example.com should not
use both "http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource; agents
will not detect the equivalence relationship by following specifications.
and
... Agents should not assume, for example, that
"http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca"
identify the same resource, since none of the specifications involved
states that the path part of an "http" URI is case-insensitive.
While correct, I felt this was potentially a little confusing. The first
example did not seem well chosen to reflect the point I think is being
made. Suggest:
...For example, the parties responsible for weather.example.com should not
use both "http://weather.example.com/Oaxaca" and
"http://weather.example.com/Mexico?city=Oaxaca" to refer to the same
resource; agents will not detect the equivalence relationship by following
specifications.
Hmmm, maybe there's a third point to be made here, namely that the party
responsible for some domain should avoid using different URIs with small,
easily overlooked differences?
issue klyne6: Clarification about point on agents detecting equivalence relationships
clarification raised 2004-03-05
[[URI producers should be conservative about the number of
different URIs they produce for the same resource. For example, the
parties responsible for weather.example.com should not use both
"http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource;
agents will not detect the equivalence relationship by following
specifications. On the other hand, there may be good reasons for
creating similar-looking URIs. For instance, one might reasonably create
URIs that begin with "http://www.example.com/tempo" and
"http://www.example.com/tiempo" to provide access to resources by users
who speak Italian and Spanish.]]
Why does the first sentence refer to "URI producers" that "produce" URIs
rather than "resource owners" that "create" them (which would be more
consistent with earlier text)? I also note that the words "assign",
"create", and "produce" (and possibly others) are all used for what
seems to be the same idea.
Also, the rest of this illustration seems to have a funny interaction
with the URI opacity principle in Section 2.5 (especially the discussion
there about the travel example), since the Section 2.1 text above seems
to suggest there is value in being able to convey information to an
accessing "agent" (a human in this case) via the form of the URI itself
(i.e., if URIs are to be totally opaque to the "agent", why would there
be value in using one language over another?). Of course, this may be
just another problem in allowing "agent" to refer to people. However,
the problem seems somewhat more acute if the result of dereferencing
URIs in different languages is the retrieval of the report in the
corresponding languages because, while this kind of makes sense, it also
invites determining the language of the report from the language of the URI.
issue manola12: URI producers or owners? Relationship to opacity principle? Evidence of confusion about "agent" including "people"?
clarification raised 2004-03-10
2nd para. The first sentence establishes
that character-by-character inequality doesn't mean that the resource
referred to is different. But the subsequent sentences say basically the
opposite (that this is the most straightforward way to find resource
equality). Break into two paragraphs, or otherwise improve wording
to confuse the reader less.
Overtaken by events.
issue i18nwg8: Sentences seem contradictory
error decided 2004-03-18
3rd para. The casing example for weather.example.com/Oaxaca is a
bit obscure. Perhaps spell out the fact that case sensitivity matters
to some systems?
issue i18nwg9: Case example unclear.
clarification raised 2004-03-18
"For instance, one might reasonably create URIs that begin with
"http://www.example.com/tempo" and "http://www.example.com/tiempo" to
provide access to resources by users who speak Italian and
Spanish."
It is nice to see an i18n-related example. However, there are all
kinds of issues with this. This is not necessarily a good way to
organize information in different languages on a server, in
particular if the information is highly parallel. It may be
better to find another example, for example with two English
words. Also, 'tempo' is an English word with a different
meaning. Perhaps German "Wetter" is better?
issue i18nwg10: Don't recommend organizing information by language.
clarification raised 2004-03-18
4th para.
"Likewise, URI consumers should ensure URI
consistency. For instance, when transcribing a URI, agents should not
gratuitously escape characters. The term "character" refers to URI
characters as defined in section 2 of [URI]".
The definition of
'character' in the first sentence is not clarified by section 2 of
the URI draft, which deals with details such as percent escaping of
characters. Section 1 of the URI draft *points to* a definition of
'character'.
This is an area where the presence of IRI would be welcome.
It might be more useful to describe what "gratuitous" means in
this context (there is currently no definition; we *think* it
means "don't escape characters unless it breaks usability", i.e.
I would expect to see %20 instead of space, because space breaks
the URI semantically).
issue i18nwg11: Mention IRIs?
clarification raised 2004-03-18
Web architecture allows resource owners to assign more than one
URI to a resource.
Constraint: URI uniqueness
Web architecture does not constrain a Web
resource to be identified by a single URI.
Thus, URIs that are not identical (character for character) do
not necessarily refer to different resources. The most
straightforward way of establishing that two parties are referring
to the same Web resource is to compare, as character strings, the
URIs they are using. URI equivalence is discussed in section 6 of
[URI].
Good practice: URI aliases
Resource owners should not create arbitrarily
different URIs for the same resource.
URI producers should be conservative about the number of
different URIs they produce for the same resource. For example, the
parties responsible for weather.example.com should not use both
"http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource;
agents will not detect the equivalence relationship by following
specifications. On the other hand, there may be good reasons for
creating similar-looking URIs. For instance, one might reasonably
create URIs that begin with "http://www.example.com/tempo" and
"http://www.example.com/tiempo" to provide access to resources by
users who speak Italian and Spanish.
Likewise, URI consumers should ensure URI consistency. For
instance, when transcribing a URI, agents should not gratuitously
escape characters. The term "character" refers to URI characters as
defined in section 2 of [URI].
Good practice: Consistent URI usage
If a URI has been assigned to a resource,
agents SHOULD refer to the resource using the same URI, character
for character.
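A short sketch of why gratuitous escaping matters, using Python's standard `urllib.parse`. The second URI below is a hypothetical transcription that percent-encodes the letter "o" although nothing requires it; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

plain = "http://weather.example.com/oaxaca"
# Hypothetical transcription that escapes the letter "o" for no reason.
gratuitous = "http://weather.example.com/%6Faxaca"

# Decoding recovers the same characters...
same_after_decoding = unquote(gratuitous) == plain
# ...but the two URIs no longer match character for character.
same_as_strings = gratuitous == plain

# A space is not allowed in a URI, so escaping it is not gratuitous.
escaped_path = quote("/oaxaca weather")
print(same_after_decoding, same_as_strings, escaped_path)
```

An agent that relies on string comparison would treat `plain` and `gratuitous` as different identifiers, which is the false negative the good practice is meant to avoid.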
Applications may apply rules beyond basic string comparison that
are licensed by specifications to reduce the risk of false
negatives and positives. For example, for "http" URIs, the
authority component is case-insensitive. Agents that reach
conclusions based on comparisons that are not licensed by relevant
specifications take responsibility for any problems that result.
Agents should not assume, for example, that
"http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" identify the same resource,
since none of the specifications involved states that the path part
of an "http" URI is case-insensitive.
See section 6 of [URI] for more
information about comparing URIs and reducing the risk of false
negatives and positives. See the section on future directions for
approaches other than string comparison that may allow different
parties to assert that
two URIs identify the same resource.
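The comparison rules above can be sketched as follows, assuming Python's standard `urllib.parse`. The sketch lowercases only the scheme and the host part of the authority component, which the generic URI syntax treats as case-insensitive, and leaves the path untouched. The function names are hypothetical, and the code is an illustration rather than a complete implementation of the comparison rules in [URI].

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_http_uri(uri):
    """Lowercase only the parts this sketch treats as case-insensitive."""
    parts = urlsplit(uri)
    # Keep any userinfo untouched; lowercase only the host (port digits are unaffected).
    userinfo, at, hostport = parts.netloc.rpartition("@")
    authority = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), authority, parts.path, parts.query, parts.fragment)
    )


def equivalent(uri_a, uri_b):
    """Exact string match first, then the licensed normalization."""
    return uri_a == uri_b or normalize_http_uri(uri_a) == normalize_http_uri(uri_b)


print(equivalent("HTTP://Weather.Example.COM/Oaxaca",
                 "http://weather.example.com/Oaxaca"))
print(equivalent("http://weather.example.com/Oaxaca",
                 "http://weather.example.com/oaxaca"))
```

The first pair differs only in components that are case-insensitive, so the sketch reports a match; the second pair differs in the path, where no specification licenses ignoring case, so it does not.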
Following the lessons of the "deep linking" debacle, it might be good to
say explicitly what rights "URI ownership" does or does not confer. This
is somewhat addressed later, but it might be good to say something in this
section.
The Editor will include a forward link from 2.2 to 3.6.3.
issue booth2: What rights does "URI ownership" confer?
clarification decided 2004-01-06
The reviewer raised a number of points about URI ownership
and authority in sections 3.4 para 1 and para 2.
issue stickler7: Section 3.4, para 2: URI ownership questions
error raised 2004-02-03
Given all these problems I don't see how the architectural principles of
the World Wide Web can be so dependent on resource ownership. Many of the
uses of ``resource owner'' in the document do not make sense at all and
need to be removed from the document.
The term "Resource owner" has been replaced with "URI owner".
issue pps1: Ownership and authority
error decided 2004-02-12
Section 2.2, bulleted list, first item. It would be useful, I think,
if this were expounded at greater length. It is not necessarily clear
to all readers (it is, for example, not entirely clear to me) how the
hierarchical delegation here postulated follows from the wording of
the specifications defining the HTTP and mailto schemes.
Overtaken by events.
issue msm11: WD-webarch-20031209, Section 2.2, bulleted list, first item: Delegation of authority in hierarchical URIs
error decided 2004-03-04
Whatever the techniques used, except for the checksum case, the
agent has a unique relationship with the URI, called URI ownership.
Here is what I can find on what's an "agent", prior to this
passage:
Within each of these systems, agents (people and software)
...strate typical behavior of Web agents - people or software (on
behalf of a person, entity, or process) acting on this information
space. Software agents include servers, proxies, spiders, browsers, and
multimedia players.
So, an agent is a person or a program. Thus, every http uri has,
supposedly, one, and only one, person or program that is its owner.
However, institutional ownership seems possible, as is joint ownership.
issue parsia14: Various types of ownership
clarification raised 2004-03-05
The social implications of URI ownership are not discussed here.
However, the success or failure of these different approaches depends
on the extent to which there is consensus in the Internet community on
abiding by the defining specifications.
First you say that the social implications of URI ownership are *not*
discussed here, then go on to discuss some social implications. Don't
do that.
I don't believe the second statement of that quote, at least on many
interpretations, and I've objected to its use in various technical
arguments, some with TAG members. If this passage is to be a stick to
beat me with in technical debate in W3C working groups, then I
strenuously object to it, especially without substantial explication
and clarification. So, I make the strong comment that I want this line
struck. I object to it.
issue parsia15: Social implications of URI ownership.
error raised 2004-03-05
[[The requirement for URIs to be unambiguous demands that
different agents do not assign the same URI...]]
Now we have *agents* assigning URIs rather than, e.g., resource or URI
owners. It's not clear that this is consistent with prior discussion.
issue manola13: Can agents assign URIs? Or should this be "use"?
clarification raised 2004-03-10
[[The concept of URI ownership is especially visible in the case of the
HTTP protocol, which enables the URI owner to serve authoritative
representations of a resource.]]
This text is pertinent to the point raised earlier about resource vs.
URI ownership, and might be expanded on a bit to clarify that
relationship. In particular, when dealing with URIs that have
retrievable representations, it is straightforward to demonstrate
ownership; non-owners can't determine what is returned when
dereferencing such URIs, while owners can.
issue manola14: Clarify relationship between resource / URI ownership
clarification raised 2004-03-10
The requirement for URIs to be unambiguous demands that different agents do not
assign the same URI to different resources. URI scheme specifications assure this using a
variety of techniques, including:
- Hierarchical delegation of authority. This approach,
exemplified by the "http" and "mailto" schemes, allows the
assignment of a part of URI space to one party, reassignment of a
piece of that space to another, and so forth.
- Random numbers. The generation of a fairly large random number,
used in the "uuid" scheme, reduces the risk of ambiguity to a
calculated small risk.
- Checksums. The generation of a URI as a checksum based on a
data object has similar properties to the random number approach.
This is the approach taken by the "md5" scheme.
- Combination of approaches. The "mid" and "cid" schemes combine
some of the above approaches.
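Two of the techniques in this list, random numbers and checksums, can be sketched with Python's standard library. The exact URI syntax produced here ("uuid:" and "md5:" followed by a hexadecimal value) is an assumption modeled on the scheme names in the list, not the syntax of any registered scheme, and the function names are hypothetical.

```python
import hashlib
import uuid


def random_uri():
    """Mint a URI from a large random number (a version 4 UUID)."""
    return "uuid:" + str(uuid.uuid4())


def checksum_uri(data):
    """Derive a URI from the MD5 checksum of a data object (bytes)."""
    return "md5:" + hashlib.md5(data).hexdigest()


print(random_uri())
print(checksum_uri(b"weather report for Oaxaca"))
```

Two calls to `random_uri` collide only with very small probability, while `checksum_uri` gives the same URI to anyone who holds the same bytes. In the checksum case, therefore, no single party stands in a unique relationship to the URI.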
The approach taken for the "http" URI scheme follows the pattern
whereby the Internet community delegates authority, via the IANA
URI scheme registry [IANASchemes] and the DNS, over a set of URIs with
a common prefix to one particular owner. One consequence of this
approach is the Web's heavy reliance on the central DNS
registry.
Whatever the techniques used, except for the checksum case, the
agent has a unique relationship with the URI, called URI
ownership. The phrase "authority responsible for a URI"
is synonymous with "URI owner" in this document.
The social implications of URI ownership are not discussed here.
However, the success or failure of these different approaches
depends on the extent to which there is consensus in the Internet
community on abiding by the defining specifications. The concept of
URI ownership is especially visible in the case of the HTTP
protocol, which enables the URI owner to serve authoritative
representations of a resource. In this case, the HTTP origin
server (defined in [RFC2616])
is the agent acting on behalf of the URI owner.
AWWW abjures URI ambiguity; but in trying to think carefully about this,
I've realized that it's important to distinguish two kinds of URI ambiguity:
diachronic and synchronic. The AWWW only addresses the former kind, and I
think it should address the latter kind, too.
I'd like to see some language in the AWWW about avoiding synchronic
ambiguity by avoiding the "URI overloading" mistake with content
negotiation.
issue clark2: What kinds of ambiguity are there?
clarification raised 2004-02-26
The architecture document needs to do a better job of explaining what a
resource is in this context. (See email for more info)
Decided at Ottawa f2f. Added 2.5.2 Representation reuse.
issue schema3: [Section 2.3] Clarity required on nature of "resource"
error decided 2004-03-04
Section 2.3 says that the ambiguous *use* of URIs is to be avoided
(though, I'll point out, that the Good Practice is ambiguous between
ambiguous URIs and ambiguous *use* of URIs).
Of course, certain ambiguity doesn't matter, e.g., replicating Quine, I
might use a URI to refer to me, the human being, and someone else to
refer to the collection of undetached people parts. As long as all our
uses *align* in (all) our interactions, we're fine, ambiguous
assignment or not.
Sorry for the quick digression into philosophy of languages, but,
really, at this time of night, I feel a little justified in turn around
:)
Overtaken by events.
issue parsia12: Ambiguous use of URIs v. URI Ambiguity?
error decided 2004-03-05
Hierarchical delegation of authority. This approach, exemplified by
the "http" and "mailto" schemes, allows the assignment of a part of URI
space to one party, reassignment of a piece of that space to another,
and so forth.
First use of 'URI space', which is undefined. I see 'information
space', 'uniform address space', and, of course, 'namespace'. As far as
I can tell, only 'namespace' has a definition (and it's not in this
doc, which is fine). Perhaps this is only editorial. A URI space seems
clear (a set of URIs? why not say that then?), but I did spend some
time wondering if it was the same as an information space or address
space. *Are* you using unambiguous phrases here? Are they aliases? Is
there a problem with either defining terms or using only one where
there's only one concept? Some principles of the web apply well to
technical prose.
issue parsia13: Use of term "URI Space"
clarification raised 2004-03-05
URI ambiguity should not be confused with ambiguity in natural language.
I'm not sure what this sentence is trying to say (what is meant here by
"confused with"). From what follows, I think the intent is to say
something like "justified by", in which case I think something like:
URIs should not be permitted the ambiguity that occurs in natural language.
[...existing text...]
This flexibility is not available to URIs, which should be defined to refer
to a single concept.
[Later,] I ran across this from TimBL in one of the TAG IRC logs, which seems to
capture the point more effectively.
Suggested text for 2.6: Whereas human communication tolerates such
ambiguity, machine processing does not. Strictly, the above URI
identifies the information resource, some hypertext document. RDF
applications which use it for describing properties of that page are in
order; those who use its URL to directly assert properties of the whale are
using it inconsistently.
issue klyne8: Unclear point about ambiguity in natural language; is the point
about machine processing?
clarification raised 2004-03-05
[[URI ambiguity should not be confused with ambiguity in
natural language. The English statement "'http://www.example.com/moby'
identifies 'Moby Dick'" is ambiguous because one could understand the
phrase "Moby Dick" to refer to distinct resources: a particular printing
of this work, or the work itself in an abstract sense, or the fictional
white whale, or a particular copy of the book on the shelves of a
library (via the Web interface of the library's online catalog), or the
record in the library's electronic catalog which contains the metadata
about the work, or the Gutenberg project's online version]]
This example illustrates an ambiguous natural language statement, but
it's not clear that it doesn't also illustrate an ambiguous URI, since
the text doesn't say anything about how example.org, or other parties
citing http://www.example.com/moby, actually interpret it.
issue manola15: Does example *also* illustrate ambiguous URI usage?
clarification raised 2004-03-10
URI ambiguity. This may imply or suggest that natural language
differences in the representation of a resource are considered
bad. There should be examples of both good and bad ambiguity (or in
WebArch terminology, different but consistent representations of the
same resource as opposed to the use of a single URI for different
resources), with language negotiation being a good example and wholly
different resources being a bad example.
issue i18nwg14: Show examples of good and bad ambiguity
clarification raised 2004-03-18
Just as a shared vocabulary has tangible value, the ambiguous
use of terms imposes a cost in communication. URI
ambiguity refers to the use of the same URI to refer to
more than one distinct resource.
Good practice: URI ambiguity
Avoid URI ambiguity.
URI ambiguity should not be confused with ambiguity in natural
language. The English statement "'http://www.example.com/moby'
identifies 'Moby Dick'" is ambiguous because one could understand
the phrase "Moby Dick" to refer to distinct resources: a particular
printing of this work, or the work itself in an abstract sense, or
the fictional white whale, or a particular copy of the book on the
shelves of a library (via the Web interface of the library's online
catalog), or the record in the library's electronic catalog which
contains the metadata about the work, or the Gutenberg project's online version.
[[In Web architecture, URIs identify resources. Outside
the bounds of Web architecture specifications, URIs can be useful for
other purposes, for example, as database keys...]]
It seems to me this paragraph mixes a few things. Just because a URI is
used as a database key doesn't necessarily mean it's being used for a
different purpose. If a URI is used as a key in a relational table that
associates metadata with the Web resources identified by those keys, and
does so correctly (i.e., distinguishes between metadata about Nadia and
metadata about her mailbox), it seems as if this is the *same* use of
the URI (to identify a Web resource), even though it may also be used in
the database to identify a distinct row in the table. Moreover, the
database might exhibit URI ambiguity in the same way the Web might,
e.g., by mixing metadata about both Nadia and her mailbox in the same
row. At the same time, the use of "mailto:nadia@example.com" as an
identifier for Nadia rather than her mailbox seems just as likely to
occur in a Web context as in this database one (people seem to want to
do it in RDF, for example; or is this not the part of the Web you're
talking about?).
issue manola16: Paragraph on other uses of URIs is confusing
clarification raised 2004-03-10
Good practice: URI opacity: This says
"Agents making use of URIs MUST NOT attempt to infer properties of the
referenced resource except as licensed by relevant specifications."
Earlier, the document defines 'agent' as both humans and machines.
This good practice is not too difficult to follow for agents
(although this seems to disallow, e.g., Google from considering pieces
of a URI in its algorithms, such as the 'weather' and
'oaxaca' in 'http://weather.example.com/oaxaca'; we're not sure
disallowing this is intended or makes sense).
However, this practice is *impossible* to follow for humans: It's
just completely impossible to look at http://weather.example.com/oaxaca
and NOT infer that this may be about 'weather' or 'oaxaca'.
The WebArch document itself is using this connection all the time.
This is important in connection with IRIs.
Overtaken by events.
issue i18nwg16: Good practice on URI opacity impossible to follow for humans.
error decided 2004-03-18
In Web architecture, URIs identify resources. Outside the bounds
of Web architecture specifications, URIs can be useful for other
purposes, for example, as database keys. For instance, the
organizers of a conference might use "mailto:nadia@example.com" to
refer to Nadia. While this usage is not licensed by Web
architecture specifications, in the context of the conference, all
parties may agree to that local policy and understand one another.
Certain properties of URIs, such as their potential for uniqueness,
make them appealing as general-purpose identifiers. In the Web
architecture, "mailto:nadia@example.com" identifies an Internet
mailbox; that is what is licensed by the "mailto" URI scheme
specification. The fact that the URI serves other purposes in
non-Web contexts does not lead to URI ambiguity. URI ambiguity
arises when a URI is used to identify two different Web
resources.
Authors of specifications SHOULD NOT introduce a new URI scheme when
an existing scheme provides the desired properties of identifiers and
their relation to resources.
The inverse (converse?) is also true - you should reuse a scheme and
protocol when they do have the desired properties. It might be a good
idea to reference RFC3205 in this regard.
Overtaken by events.
issue rosenberg3: Reuse appropriate URI schemes (and protocols)
proposal decided 2004-04-21
In the URI "http://weather.example.com/", the "http" that
appears before the colon (":") names a URI scheme. Each URI scheme
has a normative specification that explains how identifiers are
assigned within that scheme. The URI syntax is thus a federated and
extensible naming mechanism wherein each scheme's specification may
further restrict the syntax and semantics of identifiers within
that scheme.
Examples of URIs from various schemes include:
- mailto:joe@example.org
- ftp://example.org/aDirectory/aFile
- news:comp.infosystems.www
- tel:+1-816-555-1212
- ldap://ldap.example.org/c=GB?objectClass?one
- urn:oasis:names:tc:entity:xmlns:xml:catalog
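As a rough illustration of the point that the scheme is simply the name before the first colon, the following sketch extracts the scheme from each of the example URIs above. It uses Python's standard `urllib.parse.urlsplit`; the helper name `scheme_of` is hypothetical and not part of any specification.

```python
from urllib.parse import urlsplit

# Example URIs copied from the list above.
EXAMPLES = [
    "mailto:joe@example.org",
    "ftp://example.org/aDirectory/aFile",
    "news:comp.infosystems.www",
    "tel:+1-816-555-1212",
    "ldap://ldap.example.org/c=GB?objectClass?one",
    "urn:oasis:names:tc:entity:xmlns:xml:catalog",
]


def scheme_of(uri: str) -> str:
    """Return the scheme name, i.e. the text before the first colon."""
    return urlsplit(uri).scheme


schemes = [scheme_of(u) for u in EXAMPLES]
for uri, scheme in zip(EXAMPLES, schemes):
    print(f"{scheme:8} <- {uri}")
```

Everything after the colon is left uninterpreted here, since its syntax and semantics are for each scheme's own specification to define.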
While the Web architecture allows the definition of new schemes,
introducing a new scheme is costly. Many aspects of URI processing
are scheme-dependent, and a significant amount of deployed software
already processes URIs of well-known schemes. Introducing a new URI
scheme requires the development and deployment not only of client
software to handle the scheme, but also of ancillary agents such as
gateways, proxies, and caches. See [RFC2718] for other considerations and costs
related to URI scheme design.
Because of these costs, if a URI scheme exists that meets the
needs of an application, designers should use it rather than invent
one.
Good practice: New URI schemes
Authors of specifications SHOULD NOT introduce
a new URI scheme when an existing scheme provides the desired
properties of identifiers and their relation to resources.
Consider our travel
scenario: should the authority providing information about the
weather in Oaxaca register a new URI scheme "weather" for the
identification of resources related to the weather? They might then
publish URIs such as "weather://travel.example.com/oaxaca". When a
software agent dereferences such a URI, if what really happens is
that HTTP GET is invoked to retrieve a representation of the
resource, then an "http" URI would have sufficed.
If the motivation behind registering a new scheme is to allow a
software agent to launch a particular application when retrieving a
representation, such dispatching can be accomplished at lower
expense via Internet Media Types. When designing a new data format,
the appropriate mechanism to promote its deployment on the Web is
the Internet Media Type.
Note that even if an agent cannot process representation data in
an unknown format, it can at least retrieve it. The data may
contain enough information to allow a user or user agent to make
some use of it. When an agent does not handle a new URI scheme, it
cannot retrieve a representation.
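A minimal sketch of dispatching on Internet Media Type rather than on URI scheme might look like the following. The handler functions, the `HANDLERS` table, and the media type "application/x-weather" are all hypothetical, chosen only to show that an unknown format can still be retrieved and kept.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], str]


def render_hypertext(data: bytes) -> str:
    return f"rendered {len(data)} octets as hypertext"


def display_image(data: bytes) -> str:
    return f"displayed {len(data)} octets as an image"


def save_for_later(data: bytes) -> str:
    # Fallback: the agent cannot process the format, but it did retrieve it.
    return f"saved {len(data)} octets for later use"


# Hypothetical dispatch table keyed by Internet Media Type.
HANDLERS: Dict[str, Handler] = {
    "text/html": render_hypertext,
    "image/png": display_image,
}


def dispatch(media_type: str, data: bytes) -> str:
    """Pick a handler from the media type; never from the URI scheme."""
    handler = HANDLERS.get(media_type.strip().lower(), save_for_later)
    return handler(data)


known = dispatch("text/html", b"<p>Oaxaca</p>")
unknown = dispatch("application/x-weather", b"\x00\x01\x02")
```

In this sketch, supporting a new data format means adding one table entry; the retrieval path over "http" is untouched.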
The Internet Assigned Numbers Authority
(IANA) maintains a registry [IANASchemes] of mappings
between URI scheme names and scheme specifications. For instance,
the IANA registry indicates that the "http" scheme is defined in
[RFC2616]. The process for
registering a new URI scheme is defined in [RFC2717].
The use of unregistered URI schemes is discouraged for a number
of reasons:
- There is no generally accepted way to locate the scheme
specification.
- Someone else may be using the scheme for other purposes.
- One should not expect that general-purpose software will do
anything useful with URIs of this scheme; the network effect is
lost.
Note: Some URI scheme specifications (such as
the "ftp" URI scheme specification) use the term "designate" where
the current document uses "identify."
TAG issue siteData-36 is about expropriation of naming
authority.
- the 2nd sentence of the 2nd paragraph in section 2.5 says "For
robustness, Web architecture promotes independence between an identifier
and the identified resource.". Should it not say "... the identified
resource and its representations."?
The reviewer retracted the issue.
issue baker1: Independence between identifier and resource, or representations?
clarification decided 2004-03-05
It is tempting to guess the nature of a resource by inspection of a
URI that identifies it. However, the Web is designed so that agents
communicate resource state through representations, not identifiers. In
general, one cannot determine the Internet Media Type of
representations of a resource by inspecting a URI for that resource.
For example, the ".html" at the end of "http://example.com/page.html"
provides no guarantee that representations of the identified resource
will be served with the Internet Media Type "text/html". The HTTP
protocol does not constrain the Internet Media Type based on the path
component of the URI; the server is free to return a representation in
PNG or any other data format for that URI.
First sentence talks about inferring the *nature* of a *resource* by
URI inspection (i.e., inferring that <http://ex.org/#BijanThePerson>
rdf:type Person. from the URI alone). But the third sentence through
the rest of the paragraph talks about inferring the Mimetype of the
*representation* of the (state of) the resource. If you mean to
discourage both practices, some serious reworking is in order.
issue parsia17: Do you mean resource or representation?
clarification raised 2004-03-05
Resource state may evolve over time. Requiring resource owners to
change URIs to reflect resource state would lead to a significant
number of broken links. For robustness, Web architecture promotes
independence between an identifier and the identified resource.
I just wonder how this is different from:
Resources may come and go over time. Requiring resource owners to
abandon URIs to reflect resource non-existence would lead to a
significant number of broken links. For robustness, Web architecture
promotes independence between an identifier and the identified
resource.
Of course, you might say that abandoning URIs isn't what's required,
but rather maintaining legacy state. But then you've either changed the
resource (to something "representing" the nonexistent resource), or
you return representations reflecting the state of a nonexistent
resource. Of which there isn't any.
(Note that I'm not talking about imaginary entities, but ones who have
ceased to exist.)
The logic of avoiding broken links suggests that temporal URL ambiguity
might be useful for Web robustness (which might not be the same as
correctness).
issue parsia18: Temporal URL ambiguity useful for Web robustness?
clarification raised 2004-03-05
Good practice: URI opacity.
Agents making use of URIs MUST NOT attempt to infer properties of the
referenced resource except as licensed by relevant specifications.
This says nothing about not inferring properties of the retrieved
representations.
issue parsia19: Ok to infer properties of retrieved representations?
clarification raised 2004-03-05
[[URI producers should be conservative about the number of
different URIs they produce for the same resource. For example, the
parties responsible for weather.example.com should not use both
"http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource;
agents will not detect the equivalence relationship by following
specifications. On the other hand, there may be good reasons for
creating similar-looking URIs. For instance, one might reasonably create
URIs that begin with "http://www.example.com/tempo" and
"http://www.example.com/tiempo" to provide access to resources by users
who speak Italian and Spanish.]]
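The claim that agents cannot detect the Oaxaca/oaxaca equivalence by following specifications can be sketched as follows. The `normalize` helper is hypothetical and deliberately simplified: it lowercases only the scheme and the authority (assuming the authority carries no userinfo) and leaves the path untouched, because generic URI syntax treats the path as case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(uri: str) -> str:
    """Lowercase the case-insensitive parts (scheme, host); keep the path."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: assumes no userinfo
        parts.path,            # path case is significant
        parts.query,
        parts.fragment,
    ))


capitalized = normalize("http://weather.example.com/Oaxaca")
lowercase = normalize("http://weather.example.com/oaxaca")
host_variant = normalize("HTTP://Weather.Example.COM/oaxaca")

paths_equivalent = capitalized == lowercase    # False
hosts_equivalent = host_variant == lowercase   # True
```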
Why does the first sentence refer to "URI producers" that "produce" URIs
rather than "resource owners" that "create" them (which would be more
consistent with earlier text). I also note that words "assign",
"create", and "produce" (and possibly others) are all used for what
seems to be the same idea.
Also, the rest of this illustration seems to have a funny interaction
with the URI opacity principle in Section 2.5 (especially the discussion
there about the travel example), since the Section 2.1 text above seems
to suggest there is value in being able to convey information to an
accessing "agent" (a human in this case) via the form of the URI itself
(i.e., if URIs are to be totally opaque to the "agent", why would there
be value in using one language over another?). Of course, this may be
just another problem in allowing "agent" to refer to people. However,
the problem seems somewhat more acute if the result of dereferencing
URIs in different languages is the retrieval of the report in the
corresponding languages because, while this kind of makes sense, it also
invites determining the language of the report from the language of the URI.
issue manola12: URI producers or owners? Relationship to opacity principle?
Evidence of confusion about "agent" including "people"?
clarification raised 2004-03-10
[[It is tempting to guess the nature of a resource by
inspection of a URI that identifies it. However, the Web is designed so
that agents communicate resource state through representations, not
identifiers.]]
This is another place where including people in the definition of
"agents" seems to create a possible difficulty. If agents include
people, then people quite frequently communicate information about the
nature of a resource by inspection of URIs, and it's very helpful. For
example, "http://weather.example.com/oaxaca" certainly suggests that the
resource it identifies has something to do with the weather in oaxaca
(as is noted further on), and that's very useful information (e.g., when
people pass those URIs around). That's certainly information about "the
nature of a resource", and Internet Media Types aren't the only things
relevant to people. This all, of course, reads much better if "agents"
are restricted to software. Pursuing this point in the subsequent text:
[[Agents making use of URIs MUST NOT attempt to infer properties of the
referenced resource except as licensed by relevant specifications.]]
This is good practice for software "agents". For people "agents", given
the "must not", how do you propose to stop them?
Further to this point, the text goes on:
[[The example URI used in the travel scenario
("http://weather.example.com/oaxaca") suggests that the identified
resource has something to do with the weather in Oaxaca. A site
reporting the weather in Oaxaca could just as easily be identified by
the URI "http://vjc.example.com/315". And the URI
"http://weather.example.com/vancouver" might identify the resource "my
photo album."]]
This is certainly true. But while it's good practice for software to
treat URIs opaquely, it seems to me that given the discussion in Section
2.1, which seems to license creating "descriptive" URIs in different
languages to enable people speaking those languages to more easily
access a resource (and which reflects the use of text in URIs as a means
for conveying information to people), you might want to suggest that,
given this "dual purpose" of URIs, it's *not* good practice to use the
URI "http://weather.example.com/vancouver" to identify the resource "my
photo album", even though one could, and it would be irrelevant to software.
Overtaken by events.
issue manola17: "Agent" that includes "people" source of confusion
error decided 2004-03-10
It is tempting to guess the nature of a resource by inspection
of a URI that identifies it. However, the Web is designed so that
agents communicate resource state through representations, not
identifiers. In general, one cannot determine the Internet Media
Type of representations of a resource by inspecting a URI for that
resource. For example, the ".html" at the end of
"http://example.com/page.html" provides no guarantee that
representations of the identified resource will be served with the
Internet Media Type "text/html". The HTTP protocol does not
constrain the Internet Media Type based on the path component of
the URI; the server is free to return a representation in PNG or
any other data format for that URI.
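The difference between guessing from the URI and reading the representation metadata can be sketched as follows. The response header value is invented for illustration, and `media_type_from_header` is a hypothetical helper built on the standard `email.message.Message` parser.

```python
import mimetypes
from email.message import Message


def media_type_from_header(content_type_value: str) -> str:
    """Return the media type named by a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_type()


uri = "http://example.com/page.html"

# A guess based only on the ".html" suffix; nothing licenses relying on it.
guessed, _ = mimetypes.guess_type(uri)

# Hypothetical header actually sent by the server for that URI.
authoritative = media_type_from_header("image/png")

print("guess from URI:   ", guessed)
print("from Content-Type:", authoritative)
```

An agent following the text above would act on the second value and ignore the first.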
Resource state may evolve over time. Requiring resource owners
to change URIs to reflect resource state would lead to a
significant number of broken links. For robustness, Web
architecture promotes independence between an identifier and the
identified resource.
Good practice: URI opacity
Agents making use of URIs MUST NOT attempt to
infer properties of the referenced resource except as licensed by
relevant specifications.
The example URI used in the travel scenario
("http://weather.example.com/oaxaca") suggests that the identified
resource has something to do with the weather in Oaxaca. A site
reporting the weather in Oaxaca could just as easily be identified
by the URI "http://vjc.example.com/315". And the URI
"http://weather.example.com/vancouver" might identify the resource
"my photo album."
On the other hand, the URI "mailto:joe@example.com" indicates
that the URI refers to a mailbox. The "mailto" URI scheme
specification authorizes agents to infer that URIs of this form
identify Internet mailboxes.
In some cases, relevant technical specifications license URI
assignment authorities to publish assignment policies. For more
information about URI opacity, see TAG issue metaDataInURI-31.
While I suspect that the older language for describing these semantics
had its own problems, I would be happier either with (1) its return or
(2) some further amplification or clarification of the existing
language.
The answer to the reviewer's question is "yes to 1."
The TAG believes that no change to the document is
required.
issue clark1a: Fragment Identifier Semantics
clarification decided 2004-02-26
See writeup from KC.
issue clark1b: Conflicting secondary resources
clarification raised 2004-02-26
Story
When navigating within the XHTML data that Nadia receives as a
representation of the resource identified by
"http://weather.example.com/oaxaca", Nadia finds that the URI
"http://weather.example.com/oaxaca#tom" refers to information about
tomorrow's weather in Oaxaca. This URI includes the fragment
identifier "tom" (the string after the "#").
The fragment identifier of a URI allows indirect identification
of a secondary resource by reference to a primary resource and
additional information. The
secondary resource may be some portion or subset of the primary
resource, some view on representations of the primary resource, or
some other resource. The interpretation of fragment identifiers is
discussed in the section on media types and fragment identifier semantics.
See TAG issues abstractComponentRefs-37 and DerivedResources-43.
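For illustration, the URI from the story can be split into the part identifying the primary resource and the fragment identifier with the standard `urllib.parse.urldefrag` function. This is only a syntactic split; what "tom" means is determined by the media type of the retrieved representation, not by this code.

```python
from urllib.parse import urldefrag

uri = "http://weather.example.com/oaxaca#tom"

# urldefrag returns the URI without its fragment, plus the fragment itself.
primary, fragment = urldefrag(uri)

print("primary resource URI:", primary)
print("fragment identifier: ", fragment)
```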
There remain open questions regarding identifiers on the Web.
The following sections identify a few areas of future work in the
Web community.
The integration of internationalized identifiers (i.e., composed
of characters beyond those allowed by [URI]) into the Web architecture is an important
and open issue. See TAG issue IRIEverywhere-27 for discussion about work going
on in this area.
Proposal: Knowing two URIs identify the same resource does not, however,
mean they are interchangeable. For example, Oaxaca might have
several government-run weather stations, and the measurements taken
from each of these might be available from both
weather.example.org and weather.example.com.
The first might call a particular station
http://weather.example.org/stations/oaxaca#ws17a
while the second calls it
http://weather.example.com/rdfdump?region=oaxaca&station=ws17a
These two URIs would both identify the same resource, a certain
collection of weather measuring equipment. They are owl:sameAs
each other. But an attempt to dereference them might well produce
different content produced by different organizations (probably
based originally on the same government-supplied data), so a user
agent which substituted one for the other would be serving its
user poorly.
Overtaken by events.
issue hawke7: 2.7.2. Assertion that Two URIs Identify the Same Resource
proposal decided 2004-02-13
Emerging Semantic Web technologies, including the "Web Ontology
Language (OWL)" [OWL10], define RDF [RDF10] properties such as sameAs
to assert that two URIs identify the same resource or
functionalProperty to imply it.
I concur with the XML Schema WG's comment that the document is too
focused on browser-based interactions rather than on the more general
problem of automata interaction. I understand the TAG's reluctance to
tackle the Web-vs-Web-services issue, but I think it's important for
AWWW to at least give the impression - if not outright say - that there
exist solutions to the automata integration problem within the
constraints/guidelines/principles of Web architecture. Some other
examples in section 3 would help there.
Overtaken by events.
issue baker2: More info on non-browser Web
request decided 2004-03-05
Note: The Web Architecture does not require a formal definition of
the commonly used phrase "on the Web." Informally, a resource is "on
the Web" when it has a URI and an agent can use the URI to retrieve a
representation of it using network protocols (given appropriate access
privileges, network connectivity, etc.).
Given that Web Arch doesn't require it, I would recommend not including
even an informal definition. Especially as it seems wrong, e.g., such
that tel:+1-816-555-1212 and any URN identified resources aren't on the
web (though they are possible subjects and objects (and I guess
predicates; URNs certainly) of RDF assertions). Is there a relation
between being "on the Web" and being, er, part of the "information
space" that is the web?
So, I think this note is strike worthy.
Overtaken by events.
issue parsia20: Drop definition of "on the Web"
proposal decided 2004-03-05
Communication between agents over a network about resources
involves URIs, messages, and data.
Story
Nadia follows a hypertext link labeled "satellite image"
expecting to retrieve a satellite photo of the Oaxaca region. The
link to the satellite image is an XHTML link encoded as
<a href="http://example.com/satimage/oaxaca">satellite image</a>.
Nadia's browser analyzes the URI and determines that its scheme is "http". The browser
configuration determines how it locates the identified information,
which might be via a cache of prior retrieval actions, by
contacting an intermediary (such as a proxy server), or by direct
access to the server identified by the URI. In this example, the
browser opens a network connection to port 80 on the server at
"example.com" and sends a "GET" message as specified by the HTTP
protocol, requesting a representation of the resource identified by
"/satimage/oaxaca".
The server sends a response message to the browser, once again
according to the HTTP protocol. The message consists of several
headers and a JPEG image. The browser reads the headers, learns
from the "Content-Type" field that the Internet Media Type of the
representation is "image/jpeg", reads the sequence of octets that
comprises the representation data, and renders the image.
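The exchange in the story can be sketched without touching the network. The following builds the GET message a browser might send and parses a canned response; the response bytes are invented for illustration, and `build_get_request` and `parse_response` are hypothetical helpers, far simpler than a real HTTP implementation.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> bytes:
    """Build a minimal HTTP/1.1 GET message for an http URI."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a response into status line, header dict, and body octets."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Invented response: a few headers followed by the first octets of a JPEG.
CANNED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0"
)

request = build_get_request("http://example.com/satimage/oaxaca")
status_line, headers, body = parse_response(CANNED_RESPONSE)
media_type = headers["content-type"]
```

The browser in the story would choose its renderer from `media_type`, then hand it the octets in `body`.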
This section describes the architectural principles and
constraints regarding interactions between agents, including such
topics as network protocols and interaction styles, along with
interactions between the Web as a system and the people that make
use of it. The fact that the Web is a highly distributed system
affects architectural constraints and assumptions about
interactions.
Note: The Web Architecture does not require a
formal definition of the commonly used phrase "on the Web."
Informally, a resource is "on the Web" when it has a URI and an
agent can use the URI to retrieve a representation of it using
network protocols (given appropriate access privileges, network
connectivity, etc.). See the related TAG issue httpRange-14.
Does a URI *reference* or identify a resource? Is there even a
difference? I'm unsure here, but the choice of words might cause
confusion.
The TAG believes that the document is sufficiently clear
regarding reference/identify as is.
issue kopecky2: 3.1 Reference or Identify?
clarification decided 2004-02-23
Although many URI schemes are named after protocols, this does not imply
that use of such a URI will result in access to the resource via the named
protocol. Even when an agent uses a URI to retrieve a representation, that
access might be through gateways, proxies, caches, and name resolution
services that are independent of the protocol associated with the scheme name.
As phrased, I find this to be at odds with the text that follows, cf.
numbered items 4/5/6. Suggest replace:
use of such a URI will result
with
use of such a URI will necessarily result
The TAG agreed with the proposal to add "necessarily."
issue klyne11: Change "will result" to "will necessarily result"
clarification decided 2004-03-05
Agents may use a URI to access the referenced resource; this is
called dereferencing the URI. Access
may take many forms, including retrieving a representation of
resource state (for instance, by using HTTP GET or HEAD), modifying
the state of the resource (for instance, by using HTTP POST or
PUT), and deleting the resource (for instance, by using HTTP
DELETE).
There may be more than one way to access a resource for a given
URI; application context determines which access mechanism an agent
uses. For instance, a browser might use HTTP GET to retrieve a
representation of a resource, whereas a link checker might use HTTP
HEAD on the same URI simply to establish whether a representation
is available. Some URI schemes set expectations about available
access mechanisms; others (such as the URN scheme [RFC 2141]) do not. Section 1.2.2
of [URI] discusses the separation
of identification and interaction in more detail. For more
information about relationships between multiple access mechanisms
and URI addressability, see the TAG finding "URIs, Addressability, and the use of HTTP GET and
POST".
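As a small sketch of application context choosing the access mechanism, the following prepares two requests for the same URI using the standard `urllib.request.Request` class. Nothing is sent over the network; the variable names are illustrative only.

```python
from urllib.request import Request

URI = "http://weather.example.com/oaxaca"

# A browser wants the representation itself.
browser_request = Request(URI, method="GET")

# A link checker only needs to know whether a representation is available.
link_checker_request = Request(URI, method="HEAD")

for label, req in (("browser", browser_request),
                   ("link checker", link_checker_request)):
    print(f"{label}: {req.get_method()} {req.full_url}")
```

The identifier is the same in both cases; only the interaction differs.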
Although many URI schemes
are named after protocols, this does not imply that use of such a
URI will result in access to the resource via the named protocol.
Even when an agent uses a URI to retrieve a representation, that
access might be through gateways, proxies, caches, and name
resolution services that are independent of the protocol associated
with the scheme name.
Dereferencing a URI generally involves a succession of steps as
described in multiple independent specifications and implemented by
the agent. The following example illustrates the series of
specifications that are involved when a user instructs a user agent
to follow a hypertext link
that is part of an SVG document. In this example, the URI is
"http://weather.example.com/oaxaca" and the application context
calls for the user agent to retrieve and render a representation of
the identified resource.
- Since the URI is part of a hypertext link in an SVG document,
the first relevant specification is the SVG 1.1 Recommendation [SVG11]. Section 17.1
of this specification imports the link semantics defined in XLink
1.0 [XLink10]: "The remote
resource (the destination for the link) is defined by a URI
specified by the XLink href attribute on the 'a' element." The SVG
specification goes on to state that interpretation of an 'a' element
involves retrieving a representation of a resource, identified by the
href attribute in the XLink namespace: "By activating these links (by
clicking with the mouse, through keyboard input, voice commands, etc.),
users may visit these resources."
- The XLink 1.0 [XLink10] specification, which defines the href
attribute in section 5.4, states that "The value of the href attribute
must be a URI reference as defined in [IETF RFC 2396], or must result in
a URI reference after the escaping procedure described below is
applied."
- The URI specification [URI]
states that "Each URI begins with a scheme name that refers to a
specification for assigning identifiers within that scheme." The
URI scheme name in this example is "http".
- [IANASchemes] states
that the "http" scheme is defined by the HTTP/1.1 specification
(RFC 2616 [RFC2616], section
3.2.2).
- In this SVG context, the agent constructs an HTTP GET request
(per section 9.3 of [RFC2616])
to retrieve the representation.
- Section 6 of [RFC2616]
defines how the server constructs a corresponding response message,
including the 'Content-Type' field.
- Section 1.4 of [RFC2616]
states "HTTP communication usually takes place over TCP/IP
connections." This example does not address that step in the
process, or other steps such as Domain Name System
(DNS) resolution.
- The agent interprets the returned representation according to
the data format specification that corresponds to the
representation's Internet Media Type (the value of the HTTP
'Content-Type') in the relevant IANA registry [MEDIATYPEREG].
On the other hand, it is considered an error if the semantics of the
fragment identifiers used in two representations of a secondary resource
are inconsistent.
This seems a rather odd statement to make (specifically: "it is considered
an error ..."), because there is no specific way to determine if the
would-be erroneous condition actually arises. Suggest: drop this
paragraph; the intent is clear enough from the following good practice point.
The TAG agrees with the reviewer's point, but has decided to
keep the text and clarify it. At their 13 May 2004 ftf meeting,
the TAG resolved:
To remove "During a retrieval action" from the 10 May
2004 Editor's Draft.
Delete from "Note..." to end of paragraph.
Overtaken by events.
issue klyne12: Proposal to drop paragraph on inconsistent frag ids
error decided 2004-03-05
The document lists SOAP beside HTTP, FTP, NNTP and SMTP, but the IETF sees
SOAP as a different thing from the other protocols: all the others are
transported on TCP, while SOAP needs some more "things" between TCP and
SOAP. Normally, SOAP is transported on HTTP, SMTP or BEEP according to
various specifications. This might be confusing for the reader if it is
not clarified.
issue falstrom2: SOAP as a different thing than the other protocols
clarification raised 2004-04-21
The Web's protocols (including HTTP, FTP, SOAP, NNTP, and SMTP)
are based on the exchange of messages. A message may include representation
data as well as metadata about the resource (such as the
"Alternates" and "Vary" HTTP headers), the representation, and the
message (such as the "Transfer-encoding" HTTP header). A message
may even include metadata about the message metadata (for
message-integrity checks, for instance).
Two important classes of message are those that request a
representation of a resource, and those that return the result of
such a request. Such a response message (for example, a response to
an HTTP GET) includes a representation of the state
of the resource. A representation is an octet sequence that
consists logically of two parts:
- Representation data, electronic data about resource state, expressed
in one or more formats used separately or in combination, and
- Representation metadata. One important piece of metadata is the
Internet Media Type, discussed below.
Agents use representations to modify as well as retrieve
resource state. Note that even though the response to an HTTP POST
request may contain the above types of data, the response to an
HTTP POST request is not necessarily a representation of the state
of the resource identified in the POST request.
"Good practice: Fragment identifier consistency:
A resource owner who creates a URI with a fragment identifier and who uses
content negotiation to serve multiple representations of the identified
resource SHOULD NOT serve representations with inconsistent fragment
identifier semantics."
If the term "consistent" is here used in a technical sense, please explain what
it means and how inconsistencies are to be detected. If it is used in a
non-technical sense, please explain what it means.
We note that if fragment identifiers must be usable in more than one MIME
type, the result will be that the only fragment identifiers effectively allowed
will be bare names (or other fragment identifier syntaxes incapable of knowing
about or exploiting any of the structure of the data); it seems undesirable to
impoverish the URI identifier space in this way.
In general, content negotiation (like server-side browser sniffing) does not
seem to us to be an obviously and universally good thing: it leads to
unpredictable context-dependent results in ways that are actively hostile to
some machine-driven applications, and it interacts in this pernicious way with
fragment identifiers. On the other hand, if content negotiation is indeed
important to make things work, perhaps some advice on whether newly invented
schemes should support the equivalent of content negotiation is in order.
This is not a viable best practice recommendation, except as a bandaid, as it
tightly couples URIs to representations, and constrains representation
evolvability in untenable ways. This appears to highlight a weakness in the Web
architecture that should be explicitly addressed.
See msm13.
issue schema4: [3.3 Good practice: Fragment Identifier Consistency]
clarification decided 2004-03-04
The Internet Media Type [RFC2046] of a representation determines which
data format specification(s) provide the authoritative
interpretation of the representation data (including fragment identifier syntax
and semantics, if any). The IANA registry [MEDIATYPEREG] maps media
types to data formats.
See the TAG finding "Internet Media Type registration, consistency of
use" for more information about media type
registration.
This sentence seems misleading, as if one can infer something
about the nature of a secondary resource by interpreting a
URI reference with a fragment identifier.
One cannot infer the nature of any URI-denoted resource based
either on the URI *or* on any representation obtained by
dereferencing that URI, either directly or, for URI references
with fragment identifiers, by first dereferencing the base URI
and interpreting the fragment in terms of the MIME type of
the returned representation.
This last sentence could either be removed or clarified/reworked.
issue stickler8: Section 3.3.1, last para, last sentence: Nature of secondary resource not known through URI
clarification raised 2004-02-03
Question: are the methods PUT, POST or DELETE meaningful for
URI references with fragment identifiers, in terms of interacting
with the state of the secondary resources denoted? If not, then
it seems there is a good principle that one should use URIs
without fragment identifiers whenever possible to maximise
the utility of those URIs.
Overtaken by events.
issue stickler9: Good practice note on URIs without fragids?
proposal decided 2004-02-03
[3.3.1] says:
"Per [URI], in order to know the authoritative interpretation
of a fragment identifier, one must dereference the URI containing the
fragment identifier. The Internet Media Type of the retrieved
representation specifies the authoritative interpretation of the fragment
identifier. Thus, in the case of Dirk and Nadia, the authoritative
interpretation depends on the SVG specification, not the XHTML
specification (i.e., the context where the URI appears)."
But this seems to contradict the referenced URI specification, which says:
"The semantics of a fragment identifier are defined by the set of
representations that might result from a retrieval action on the primary
resource. The fragment's format and resolution is therefore dependent on
the media type [RFC2046] of the retrieved representation, even though such
a retrieval is only performed if the URI is dereferenced."
The latter says clearly you need not dereference. On the contrary, you
must know the range of representations that you might get _if_ you tried
to dereference.
Decided just prior to the Ottawa f2f. Incorporating CL's edits of 3.3.1
has fixed this issue (by making the webarch document the same as 2396bis).
issue schema5: [3.3.1] Inconsistency with RFC2396bis about frag id meaning?
error decided 2004-03-04
[3.3.1] says
"Given a URI "U#F", and a representation retrieved by dereferencing
URI "U", the (secondary) resource identified by "U#F" is determined by
interpreting "F" according to the specification associated with the
Internet Media Type of the representation."
What if the scheme is not HTTP and media types are not used (e.g. because the
URI uses the file: scheme or for some other reason)? Do fragment identifiers
work only with media-typed representations? We hope not.
issue schema6: [3.3.1] Do fragment identifiers work only with media-typed representations?
clarification raised 2004-03-04
This reader wonders at this point whether there are any constraints on
the interpretation which the definer of a media type can place on
fragment identifiers for the media type. Can one, consistent with Web
architecture (if not necessarily with good design) define a media type
(let us say application/sortes) where the meaning of a fragment
identifier is identified by taking a checksum of the octet string
returned, the conventional numerological value of the string used as
the fragment identifier, multiplying the one by the other, and using
the product to look up a passage in a copy of Vergil, with the
stipulation that the meaning of the fragment identifier is "the
meaning of the passage found by this method"?
issue msm12: WD-webarch-20031209, Section 3.3.1, para 1: Are there constraints on the interpretation of fragment identifiers?
error raised 2004-03-04
[[Note that one can use a URI with a fragment identifier
even if one does not have a representation available for interpreting
the fragment identifier (one can compare two such URIs, for example).
Parties that draw conclusions about the interpretation of a fragment
identifier without retrieving a representation do so at their own risk;
such interpretations are not authoritative.]]
This is a place where some qualifying context about the nature of the
Web to which this architecture applies would have been helpful. For
example, suppose I have a collection of RDF or OWL statements having as
subjects the URI "http://www.example.com/images/nadia#hat", and the
RDF/OWL statements assert that the subject is of class "Hat" in some
ontology, that it's blue, and so on. On one hand, it seems as if one
could reasonably draw conclusions about the interpretation of this
fragment identifier (or rather the whole URI including it) *from the
RDF/OWL* without dereferencing the URI (using the URI to retrieve a
representation, whose media type specifies the authoritative
interpretation), assuming that the RDF/OWL itself is from a sufficiently
"authoritative" (in some sense) representation somewhere. Saying "such
interpretations are not authoritative" without any further qualification
or discussion, while it makes perfect sense given the way the Web works
now, doesn't seem to take such additional usage (which, after all, is
described in W3C Recommendations) into account.
The TAG intends to clarify the text to indicate that parties
who draw conclusions from syntactic analysis of URIs alone do so
at their own risk.
issue manola19: Please provide qualifying context about the nature of the Web
clarification decided 2004-03-10
Story
In one of his XHTML pages, Dirk links to an image that Nadia has
published on the Web. He creates a hypertext link with
<a href="http://www.example.com/images/nadia#hat">Nadia's hat</a>.
Nadia serves an SVG representation of the
image (with Internet Media Type "image/svg+xml"), so the
authoritative interpretation of the fragment identifier "hat"
depends on the SVG specification.
Per [URI], in order to know the
authoritative interpretation of a fragment identifier, one must
dereference the URI containing the fragment identifier. The
Internet Media Type of the retrieved representation specifies the
authoritative interpretation of the fragment identifier. Thus, in
the case of Dirk and Nadia, the authoritative interpretation
depends on the SVG specification, not the XHTML specification
(i.e., the context where the URI appears).
Given a URI "U#F", and a representation retrieved by
dereferencing URI "U", the (secondary) resource identified by "U#F" is
determined by interpreting "F" according to the specification
associated with the Internet Media Type of the representation.
Interpretation of the fragment identifier during a retrieval
action is performed solely by the agent; the fragment identifier is
not passed to other systems during the process of retrieval. This
means that some intermediaries in the Web architecture (such as
proxies) have no interaction with fragment identifiers and that
redirection (in HTTP [RFC2616],
for example) does not account for them.
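A minimal sketch of this behaviour, assuming a hypothetical agent that builds its own HTTP request line: the fragment is split off before retrieval and kept on the client side, so the server and any intermediaries never see it. The function names are illustrative only.

```python
from urllib.parse import urldefrag, urlsplit


def split_for_retrieval(uri_reference):
    """Return (URI to dereference, fragment kept by the agent)."""
    uri, fragment = urldefrag(uri_reference)
    return uri, fragment


def request_line(uri_reference):
    """Build the HTTP request line; the fragment is deliberately left out."""
    uri, _fragment = urldefrag(uri_reference)
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1"
```

For Dirk's link, the agent would dereference "http://www.example.com/images/nadia" and hold "hat" back until a representation arrives.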
Note that one can use a URI with a fragment identifier even if
one does not have a representation available for interpreting the
fragment identifier (one can compare two such URIs, for example).
Parties that draw conclusions about the interpretation of a
fragment identifier without retrieving a representation do so at
their own risk; such interpretations are not authoritative.
Section 3.3.2, para 3 ("On the other hand ...") says "it is considered
an error if the semantics of the fragment identifiers used in two
representations of a secondary resource are inconsistent." What does
"inconsistent" mean here? How do the responsible parties determine
whether a given plan of using fragment identifiers is or is not
compliant with this rule?
Suppose that an internet media type (application/my-magic-mediatype)
is defined with the basic rule that it is represented by servers as an
XML data stream rooted in a particular namespace (e.g. one for
purchase orders), and that its fragment identifiers are syntactically
identical to those of the application/xml media type, but denote not
individual XML elements or attributes but instead whatever real-world
objects are represented by those elements or attributes (a customer,
an invoice, a payment obligation, ...), if any, or else have no
denotation.
Suppose further that a resource owner serves the same octet sequence
as two different media types (e.g. application/xml and
application/my-magic-mediatype). Is the resource owner (a) obeying
the principle enunciated here, given that the denotations of the
fragment identifier in the two cases stand in a predictable and
plausible relation to each other? or (b) violating this principle,
given that in the two cases the fragment identifier identifies objects
of radically different classes (XML elements on the one hand, people
and other non-XML entities on the other)?
The TAG resolved to make the following changes to the document:
- Include three examples about coneg as proposed by TBL.
- State clearly that it's an error for representation providers to provide representations with inconsistent frag id semantics.
- Talk about consistency as being in the eye of the
representation provider (not forgetting that users also have
expectations). Thus, the answer to the reviewer's last question
is: the notion of consistency is in the eye of the representation
provider.
issue msm13: WD-webarch-20031209, Section 3.3.2, para 3: Consistency of fragment identifiers
error decided 2004-03-04
Story
Dirk informs Nadia that he would also like her to make her
images available in formats other than SVG. For the same resource,
Nadia makes available a PNG image as well. Dirk's user agent and
Nadia's server negotiate so that the user agent retrieves a
suitable representation. Which specification specifies the
authoritative interpretation of the "hat" fragment identifier, the
PNG specification or the SVG specification?
For a given resource, an agent may have the choice between
representation data in more than one data format (through HTTP
content negotiation, for example). Since different data formats may
define different fragment identifier semantics, it is important to
note that by design, the secondary resource identified by a URI
with a fragment identifier is expected to be the same across all
representations. Thus, if a fragment has defined semantics in any
one representation, the fragment is identified for all of them,
even though a particular data format may not be able to represent
it.
Suppose, for example, that the authority responsible for
"http://weather.example.com/oaxaca/map#zicatela" provides
representations of the resource identified by
http://weather.example.com/oaxaca/map using three image formats:
SVG, PNG, and JPEG/JFIF. The SVG specification defines semantics
for fragment identifiers while the other specifications do not. It
is not considered an error that only one of the data formats
specifies semantics for the fragment identifier. Because the Web is
a distributed system in which formats and agents are deployed in a
non-uniform manner, the architecture allows this sort of
discrepancy. This design allows authors to take advantage of new
data formats while still ensuring reasonable backward-compatibility
for users whose agents do not yet implement them.
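The map example can be sketched as a dispatch on media type. This is an illustration under stated assumptions, not the behaviour of any particular agent: the interpreter table, the function names, and the sample SVG are invented here, and the SVG handling is reduced to a lookup of an element by its "id" attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: an SVG map with one identified region.
SVG_MAP = (
    b'<svg xmlns="http://www.w3.org/2000/svg">'
    b'<g id="zicatela"><circle r="5"/></g>'
    b"</svg>"
)


def svg_fragment(data, fragment):
    """Simplified SVG rule: find the element whose id equals the fragment."""
    root = ET.fromstring(data)
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


# Only formats whose specification defines fragment semantics are listed.
FRAGMENT_INTERPRETERS = {"image/svg+xml": svg_fragment}


def interpret_fragment(media_type, data, fragment):
    """Interpret a fragment according to the representation's media type.

    Returns None when the format defines no fragment semantics; that is
    not treated as an error.
    """
    interpreter = FRAGMENT_INTERPRETERS.get(media_type)
    if interpreter is None:
        return None
    return interpreter(data, fragment)
```

With the SVG representation the fragment "zicatela" resolves to an element; with the PNG or JPEG representation the agent simply has no fragment to act on.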
On the other hand, it is considered an error if the semantics of
the fragment identifiers used in two representations of a secondary
resource are inconsistent.
Good practice: Fragment identifier consistency
A resource owner who creates a URI with a
fragment identifier and who uses content negotiation to serve
multiple representations of the identified resource SHOULD NOT
serve representations with inconsistent fragment identifier
semantics.
Inconsistent fragment identifier semantics are one potential
source of URI ambiguity.
See related TAG issues httpRange-14 and RDFinXHTML-35.
The reviewer raised a number of points about URI ownership
and authority in sections 3.4 para 1 and para 2.
issue stickler7: Section 3.4, para 2: URI ownership questions
error raised 2004-02-03
Given all these problems I don't see how the architectural principles of
the World Wide Web can be so dependent on resource ownership. Many of the
uses of ``resource owner'' in the document do not make sense at all and
need to be removed from the document.
The term "Resource owner" has been replaced with "URI owner".
issue pps1: Ownership and authority
error decided 2004-02-12
Successful communication between two parties using a piece of
information relies on shared understanding of the meaning of the
information.
I'll spare you the critical analysis of the opening platitude of a
section of a document. It's not clear to me, however, that they are, in
fact, useful.
Overtaken by events.
issue parsia21: Drop sentence on successful communication
proposal decided 2004-03-06
Arbitrary numbers of independent parties can identify and
communicate about a Web resource. To give these parties the confidence
that they are all talking about the same thing when they refer to "the
resource identified by the following URI ..." the design choice for the
Web is, in general, that the owner of a resource assigns the
authoritative interpretation of representations of the resource.
So, this is "in general", which suggests that "in specific" this might
not be the case. For example, when the owner of the resource, uh, *gets
it wrong*. One example is "Inconsistencies between Metadata and
Representation Data".
So, let's generalize. What if the owner of the resource gets the
*information* encoded in the message wrong? Is that authoritative? What
would that mean? Suppose I retrieve a representation of my purchase
order, does the resource owner have an authoritative interpretation of the
*meaning of the order*, interpreting my "5 very cheap things, please"
as "5000 hugely expensive things, you bastard!!!"?
There is a sensible thing buried in here, I think. I think it's quite
right to be judicious in ignoring narrow, well understood and somewhat
verifiable representation metadata. One example (if there were a media
type for OWL-DL and OWL-Full as well as RDF) would be interpreting a
retrieved ontology as OWL-DL vs. just as RDF. Different inferences are
licensed, and there are times where one might want to publish the
ontology for RDF interpretation only.
Of course, really, it would be best if the format provided a way to
specify this.
issue parsia22: What does "in general" mean? Would the case be different "in specific"?
clarification raised 2004-03-06
Successful communication between two parties using a piece of information
relies on shared understanding of the meaning of the information. Arbitrary
numbers of independent parties can identify and communicate about a Web
resource. To give these parties the confidence that they are all talking
about the same thing when they refer to "the resource identified by the
following URI ..." the design choice for the Web is, in general, that the
owner of a resource assigns the authoritative interpretation of
representations of the resource.
I recall that TimBL and Pat Hayes had a lengthy debate about
something rather like this. See Thread with some indication of
consensus around this mail and this email.
I am not sure that the above text really captures the subtlety of this
discussion. As Pat Hayes noted:
>Note though that other non-RDF systems may and do use URIs. So the
>principle can must be a general one of web architecture.
Names are global in scope. OK, though (in the other branch of the
discussion) I don't think this is going to be feasible, myself, if
taken strictly. Still, I agree, it's not a bad place to start, as long
as we understand that we will eventually have to replace it with
something more sophisticated.
issue klyne13: Text on communication between two parties misses mark about global names
clarification raised 2004-03-05
First para: [[To give these parties the confidence that
they are all talking about the same thing when they refer to "the
resource identified by the following URI ..." the design choice for the
Web is, in general, that the owner of a resource assigns the
authoritative interpretation of representations of the resource.]]
The text "owner of a resource" links to Section 2.2 titled "URI
Ownership". So why say "owner of a resource" rather than "owner of a
URI"? Also, Section 3.3 just got through telling us that if the URI
contains a fragment identifier, then the Internet Media Type of the
retrieved representation specifies the authoritative interpretation of
the fragment identifier. I realize that in one case it's the
authoritative interpretation *of the fragment* and in the other it's the
authoritative interpretation *of representations of the resource*, but
the use of "authoritative interpretation" in both places (particularly
when they're so close together) seems potentially confusing.
issue manola21: Owner of resource v. owner of URI
clarification raised 2004-03-10
Successful communication between two parties using a piece of
information relies on shared understanding of the meaning of the
information. Arbitrary numbers of independent parties can identify
and communicate about a Web resource. To give these parties the
confidence that they are all talking about the same thing when they
refer to "the resource identified by the following URI ..." the
design choice for the Web is, in general, that the owner of a resource assigns
the authoritative interpretation of representations of the
resource.
In our travel scenario, the
authority responsible for "weather.example.com" has license to
create representations of this resource. Which representation(s)
Nadia receives depends on a number of factors, including:
- Whether the authority responsible for "weather.example.com"
responds to requests at all;
- Whether the authority responsible for "weather.example.com"
makes available one or more representations for the resource
identified by "http://weather.example.com/oaxaca";
- Whether Nadia has access privileges to such representations
(see the section on linking and
access control);
- If the authority responsible for "weather.example.com" has
provided more than one representation (in different formats such as
HTML, PNG, or RDF, or in different languages such as English and
Spanish), the resulting representation may depend on negotiation
between the user agent and server that occurs as part of the HTTP
transaction.
- When Nadia made the request. Since the weather in Oaxaca
changes, Nadia should expect that representations will change over
time.
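The negotiation factor in the list above can be sketched as a server-side choice driven by the HTTP 'Accept' header. This is a simplified illustration, not the full algorithm of [RFC2616]: it ignores media type parameters other than "q", and the function names are assumptions made for this example.

```python
def parse_accept(header):
    """Parse an Accept header value into a list of (media range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def match_quality(media_type, prefs):
    """Most specific matching range wins: exact, then type/*, then */*."""
    main_type = media_type.split("/")[0]
    for pattern in (media_type, main_type + "/*", "*/*"):
        for media_range, q in prefs:
            if media_range == pattern:
                return q
    return 0.0


def choose_representation(accept_header, available):
    """Pick the available media type the user agent prefers, or None."""
    prefs = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:
        q = match_quality(media_type, prefs)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

If the server offers only PNG and JPEG and the user agent sends "image/svg+xml, image/png;q=0.8, */*;q=0.1", this sketch selects the PNG representation.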
See TAG issues contentTypeOverride-24 and rdfURIMeaning-39.
Should this be reworded as "User agents MUST NOT silently ignore
authoritative metadata."? If so, is it still worth mentioning? (ie, is
there any point in saying "do what the protocol says to do"?)
issue dhm6: Use of "server" in "...authoritative server metadata..."
clarification raised 2004-02-20
[3.4.1] Says that user agents must not silently ignore server metadata.
Metadata covers a lot of ground: what is its scope? May a user agent ignore a
server-specified DTD or Schema and choose to apply a local variant
(e.g. because the user so specifies in a local configuration file or a
launch-time option)? Why not?
issue schema7: [3.4.1] What is scope of metadata?
clarification raised 2004-03-04
If the sender is not a trusted authority, it would be foolish for the recipient
to rely on the principle of sender-makes-right. A well written production
server runs an unacceptable risk if it accepts at face value everything
an untrusted client tells it. Must it inform the client each time it follows
its own instructions by ignoring client information?
Overtaken by events.
issue schema8: [3.4.1] Authority and trust
error decided 2004-03-04
(We also note in passing that focusing on the interactions between
"user-agents" and "servers" is fundamentally limiting in the sense mentioned
in our opening comment. Are not peer-to-peer interactions covered by this
architecture?)
Overtaken by events.
issue schema9: [3.4.1] Are peer-to-peer interactions covered?
error decided 2004-03-04
[[User agents should detect such inconsistencies but
should not resolve them without involving the user.]]
Now the term is "user agent" rather than "agents". Is there some
particular reason for distinguishing between these terms?
issue manola22: "Agent" or "user agent" meant?
clarification raised 2004-03-10
We believe that charset handling, the way it is currently
specified in various specs (i.e. outer information has priority over
inner information), is basically okay, with the exception
of the (irrelevant in practice) iso-8859-1 default given in the
HTTP spec, and the us-ascii default for text/foo+xml, which makes
text/foo+xml rather useless. It might be good to reach some consensus
about this, and document it.
Overtaken by events.
issue i18nwg19: text/foo+xml considered useless?
proposal decided 2004-03-18
"Furthermore, server managers can help reduce the
risk of error through careful assignment of representation metadata
(especially that which applies across representations). The section
on media types for XML presents an example of reducing the risk of
error by providing no metadata about character encoding when serving
XML."
This seems to pick out a somewhat arbitrary detail, without
stating the much more important underlying principles, such
as:
- Always make sure you know what the character encoding of a
document or message is.
- Make sure that it's easy for server managers and authors to configure
and test metadata on the server, to make sure it's
correct.
- No arbitrary defaults for specs
- No out-of-the-box with arbitrary settings
The description of the example also is too general, because there
are ways to implement/operate a server that make it much
easier and more appropriate to put the 'charset' into the header than into
the body, e.g. when producing content in a pipeline from a
database.
issue i18nwg20: text/foo+xml considered useless?
proposal raised 2004-03-18
Inconsistencies between the data format of representation data
and assigned representation metadata do occur. Examples that have
been observed in practice include:
- The actual character encoding of a representation is
inconsistent with the charset parameter in the representation
metadata.
- The namespace of the root element of XML representation data is
inconsistent with the value of the 'Content-Type' field in HTTP
headers.
User agents should detect such inconsistencies but should not
resolve them without involving the user.
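One way a user agent might detect the first kind of inconsistency is to try decoding the data with the declared charset and report a failure instead of quietly guessing another encoding. The sketch below is an assumption-laden illustration (the function name is invented here). Note that a successful decode does not prove the label is right, since some encodings accept any byte sequence.

```python
def charset_problem(declared_charset, data):
    """Return a message describing the inconsistency, or None if none is found."""
    try:
        data.decode(declared_charset)
    except LookupError:
        # The declared charset name is not known to this agent.
        return f"unknown charset {declared_charset!r}"
    except UnicodeDecodeError as exc:
        return f"data is not valid {declared_charset} at byte offset {exc.start}"
    return None
```

The returned message is meant to be shown to the user, or logged discreetly, rather than used to pick a different encoding silently.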
Principle: Authoritative server metadata
User agents MUST NOT silently ignore
authoritative server metadata.
Thus, for example, if the parties responsible for
"weather.example.com" mistakenly label the satellite photo of
Oaxaca as "image/gif" instead of "image/jpeg", and if Nadia's
browser detects a problem, Nadia's browser must not silently ignore
the problem and render the JPEG image. Nadia's browser can notify
Nadia of the problem or notify Nadia and take corrective action. Of
course, user agent designers should not ignore usability issues
when handling this type of error; notification may be discreet, and
handling may be tuned to meet the user's preferences. See the TAG
finding "Client handling of MIME headers" for more
in-depth discussion and examples.
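The satellite photo case can be sketched as follows, assuming a hypothetical user agent that compares the declared media type against well-known leading byte signatures of a few image formats. The function names are illustrative. The sketch only reports the mismatch; it does not decide on the user's behalf how to render the image.

```python
# Leading byte signatures of three common image formats.
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_image_type(data):
    """Guess the image format from the leading bytes, or None if unknown."""
    for magic, media_type in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return media_type
    return None


def check_metadata(declared_type, data):
    """Return a notice for the user if the label and the data disagree."""
    sniffed = sniff_image_type(data)
    if sniffed is not None and sniffed != declared_type:
        return (
            f"Server labeled this image as {declared_type}, "
            f"but the data looks like {sniffed}."
        )
    return None
```

A browser built along these lines could surface the notice discreetly and let the user's preferences determine whether corrective action follows.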
Furthermore, server managers can help reduce the risk of error
through careful assignment of representation metadata (especially
that which applies across representations). The section on media types for XML
presents an example of reducing the risk of error by providing no
metadata about character encoding when serving XML.
[3.5] says that an interaction is safe if the agent does not incur any
obligation beyond the interaction. This seems too broad; the TAG has been
advised of other scenarios. For example, if each access to a resource needs to
be authenticated at the application (not https) level, but no ongoing
obligation is established, this rule suggests that the retrieval is safe. Is
that really true? We wouldn't want the access cached, except perhaps by an
application-specific cache that knew our authorization rules. Consider also
the case where the provider of the resource needs to log the access. The issue
is an important one, and the summary given here comes close to being an
oversimplification.
issue schema10: [3.5] Breadth of "safe" interactions
error raised 2004-03-04
[[Nadia's retrieval of weather information (an example of
a read-only query or lookup) qualifies as a "safe" interaction; a safe
interaction is one where the agent does not incur any obligation beyond
the interaction. An agent may incur an obligation through other means
(such as by signing a contract). If an agent does not have an obligation
before a safe interaction, it does not have that obligation afterwards]]
Here, "agent" is used in a sense where it might well be a person
("signing a contract"). Can software agents "incur obligations" in the
sense used here?
issue manola23: Can software agents incur obligations? ("agent" or "user agent")
clarification raised 2004-03-10
[[Other Web interactions resemble orders more than queries.]]
Is this "orders" in the sense of "placing an order", "that's an order,
soldier", or both?
issue manola24: What meaning(s) of "order" is meant?
clarification raised 2004-03-10
Story
Nadia decides to book a vacation to Oaxaca at
"booking.example.com." She enters data into a series of online
forms and is ultimately asked for credit card information to
purchase the airline tickets. She provides this information in
another form. When she presses the "Purchase" button, her browser
opens another network connection to the server at
"booking.example.com" and sends a message composed of form data
using the POST method. Note that this is not a safe interaction; Nadia
wishes to change the state of the system by exchanging money for
airline tickets.
The server reads the POST request, and after performing the
booking transaction returns a message to Nadia's browser that
contains a representation of the results of Nadia's request. The
representation data is in XHTML so that it can be saved or printed
out for Nadia's records. Note that neither the data transmitted
with the POST nor the data received in the response necessarily
correspond to any resource named by a URI.
Nadia's retrieval of weather information (an example of a
read-only query or lookup) qualifies as a "safe" interaction; a safe
interaction is one where the agent does not incur any
obligation beyond the interaction. An agent may incur an obligation
through other means (such as by signing a contract). If an agent
does not have an obligation before a safe interaction, it does not
have that obligation afterwards.
Other Web interactions resemble orders more than queries. These
unsafe interactions may
cause a change to the state of a resource and the user may be held
responsible for the consequences of these interactions. Unsafe
interactions include subscribing to a newsletter, posting to a
list, or modifying a database.
Safe interactions are important because these are interactions
where users can browse with confidence and where agents (including
search engines and browsers that pre-cache data for the user) can
follow links safely. Users (or agents acting on their behalf) do
not commit themselves to anything by querying a resource or
following a link.
Principle: Safe retrieval
Agents do not incur obligations by retrieving
a representation.
For instance, it is incorrect to publish a link that, when
followed, subscribes a user to a mailing list. Remember that search
engines may follow such links.
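A minimal sketch of the principle, assuming a hypothetical mailing-list handler: requests using the methods that [RFC2616] section 9.1.1 treats as safe (GET and HEAD) never change the subscriber list, so a crawler following a link cannot subscribe anyone. The handler name, status choices, and message strings are assumptions made for this example.

```python
# Methods that section 9.1.1 of RFC 2616 treats as safe.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def handle_subscription(method, address, subscribers):
    """Return (status code, message); only POST may change the list."""
    if method in SAFE_METHODS:
        # Retrieval only: describe how to subscribe, change nothing.
        return 200, "To subscribe, submit the form with POST."
    if method == "POST":
        subscribers.add(address)
        return 200, f"Subscribed {address}."
    return 405, "Method not allowed."
```

Under this design a search engine that issues GET for the subscription URI receives a description of the form and incurs no obligation for anyone.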
For more information about safe and unsafe operations using HTTP
GET and POST, and handling security concerns around the use of HTTP
GET, see the TAG finding "URIs, Addressability, and the use of HTTP GET and
POST".
Para 3 seems to contradict the last statement of para 1. In para 1
it is said that POST requests and responses cannot be referenced
by URIs, yet para 3 describes a means to do just that.
It seems that what is meant to be said in para 1 is that, per the
default behavior of POST, the request and response are not normally
assigned distinct URIs by which they can be later referenced. ???
The Editor will review 3.5.1 and propose a revision to the TAG that more clearly distinguishes the two topics of bookmarking results of POST and paper trails (both safe and unsafe contexts).
issue stickler6: Section 3.5.1: POST requests and URIs
error decided 2004-02-03
[3.5.1] Says:
"There are mechanisms in HTTP, not widely deployed, to remedy this
situation. HTTP servers can assign a URI to the results of a POST
transaction using the "Content-Location" header (described in section
14.14 of [RFC2616]), and allow authorized parties to retrieve a record of
the transaction thereafter via this URI (the value of URI persistence is
apparent in this case). User agents can provide an interface for managing
transactions where the user agent has incurred an obligation on behalf of
the user."
Yes, but is this saying specifically that Content-Location SHOULD be used?
If so, say so. If not, then make clearer what's intended.
issue schema11: [3.5.1] Best practice that content-location SHOULD be used?
clarification raised 2004-03-04
Story
Nadia pays for her airline tickets online (through a POST
interaction as described above). She receives a Web page with
confirmation information and wishes to bookmark it so that she can
refer to it when she calculates her expenses. Although Nadia can
print out the results, or save them to a file, she cannot bookmark
the results. In fact, neither the POST request, which expresses her
commitment to pay, nor the airline company's response, which
expresses its acknowledgment and its own commitment, can be
referenced by URIs.
It is a breakdown of the Web architecture if agents cannot use
URIs to reconstruct a "paper trail" of transactions, i.e., to refer
to receipts and other evidence of accepting an obligation. Indeed,
each electronic mail message includes a unique message identifier,
one reason why email is so useful for managing accountability
(since, for example, email can be copied to public archives). On
the other hand, HTTP servers and deployed user agents do not
generally keep records of POST transactions, making it difficult
for all parties to reconstruct a series of transactions.
There are mechanisms in HTTP, not widely deployed, to remedy
this situation. HTTP servers can assign a URI to the results of a
POST transaction using the "Content-Location" header (described in
section 14.14 of [RFC2616]),
and allow authorized parties to retrieve a record of the
transaction thereafter via this URI (the value of URI persistence is
apparent in this case). User agents can provide an interface for
managing transactions where the user agent has incurred an
obligation on behalf of the user.
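The mechanism described above might be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: `handle_post`, `handle_get`, the `/receipts/` path layout, and the in-memory `receipts` store are hypothetical, and the authorization check that a real server would need is omitted.

```python
# Hypothetical sketch: give the result of each POST transaction its own URI.
import itertools

_receipt_ids = itertools.count(1)   # illustrative id source
receipts = {}                       # receipt URI path -> stored confirmation


def handle_post(order):
    """Process a payment POST and return (status, headers, body)."""
    path = f"/receipts/{next(_receipt_ids)}"
    body = f"Confirmation for {order}"
    receipts[path] = body
    # Content-Location tells the client which URI identifies this result.
    headers = {"Content-Location": path, "Content-Type": "text/plain"}
    return 200, headers, body


def handle_get(path):
    """Let a party (assumed already authorized) retrieve the record later."""
    if path in receipts:
        return 200, {"Content-Type": "text/plain"}, receipts[path]
    return 404, {}, ""
```

A user agent that records the Content-Location value could later let Nadia bookmark or revisit her confirmation.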
How can "...they both conclude that the resource is unreliable"
since (a) they cannot determine from either the URI or any
representation what resource the URI actually denotes, and
(b) the behavior of a given server providing access to
representations of a resource is all that can be unreliable.
The resource itself is (typically) not part of the system.
A better example of "unreliability" might be a service which
frequently returns 404 responses rather than useful representations
or one which often returns representations which do not
accurately reflect the state of the weather in Oaxaca, or
one which sometimes returns XHTML but other times returns
plain text. Yet in such cases, it is the service resolving
the URI to representations that is unreliable or inconsistent,
not the resource itself.
The Editor will s/unreliable/unpredictable.
issue stickler5: Section 3.6, para 1: Fix "resource is unreliable"
error decided 2004-02-03
Story
Since Nadia finds the Oaxaca weather site useful, she emails a
review to her friend Dirk recommending that he check out
'http://weather.example.com/oaxaca'. Dirk clicks on the link in the
email he receives and is surprised to see his browser display a
page about auto insurance. Dirk confirms the URI with Nadia, and
they both conclude that the resource is unreliable. Although the
managers of Oaxaca have chosen the Web as a communication medium,
they have lost two customers due to ineffective resource
management.
The usefulness of a resource depends on good management by its
owner. As is the case with many human interactions, confident
interactions with a resource depend on stability and
predictability. The value of a URI increases with the
predictability of interactions using that URI. Avoiding unnecessary
URI aliases is one aspect
of proper resource management.
Good practice: Consistent
representation
Publishers of a URI SHOULD provide
representations of the identified resource consistently and
predictably.
This section discusses important aspects of representation
management.
Owners of URIs should be free to decide whether any representations
are made available, and should *NOT* feel obligated to provide
representations if they themselves have no need to do so. URIs
without representations may simply be less valuable/useful
than those with representations. But it shouldn't be considered
bad practice to not provide any representations.
I recommend that this particular "good practice" be removed,
even though language should remain which reflects that URIs
with accessible representations are usually more useful than
those without.
The TAG intended to indicate that people SHOULD provide
representations; the community is poorer where representations are
not available. "SHOULD" allows URI owners to make a choice.
issue stickler4: Section 3.6.1 Proposed removal of good practice note
request decided 2004-02-03
[3.6.1]
Good practice: Available representation
Publishers of a URI SHOULD provide representations of the identified
resource.
We are concerned that this appears to privilege dereferenceable URIs over other
sectors of URI space; in particular to denigrate all uses of URIs as pure
(non-dereferenceable) identifiers, such as namespaces, QT functions, SOAP
extensions, SAX properties, etc. etc. There are often pragmatic reasons for
declining to make URIs dereferenceable (unwelcome load on servers, for example,
or identifiers that are intended purely for software systems and that humans
will never see or need to dereference to obtain useful information). It seems
to us that at least a coherent story should be told about how this
pure-identification use fits into the overall Web architecture.
Decided at the Ottawa f2f. Added good practice about unnecessary
network access.
issue schema12: [3.6.1]
[3.6.1] Good practice: Available representation. Too preferential to dereferenceable URIs
error decided 2004-03-04
The authority responsible for a resource may supply zero or more
representations of a resource. The authority is also responsible
for accepting or rejecting requests to modify a resource, for
example, by configuring a server to accept or reject HTTP PUT data
based on Internet Media Type, validity constraints, or other
constraints.
Good practice: Available representation
Publishers of a URI SHOULD provide
representations of the identified resource.
Scenarios appear to be based on "static" URIs, i.e., "persistent" URIs (see section 3.6.2). Suggest discussing "dynamically generated" URIs, particularly situations where dynamic URIs are bookmarked or forwarded by a user.
issue diwg1: Add scenario(s) with dynamically generated URI
proposal raised 2004-02-25
There are strong social expectations that once a URI identifies a
particular resource, it should continue indefinitely to refer to that
resource; this is called URI persistence. URI persistence is a matter of
policy and commitment on the part of authorities servicing URIs. The choice
of a particular URI scheme provides no guarantee that those URIs will be
persistent or that they will not be persistent.
The terminology "authorities servicing URIs" seems to be not consistent
with that used elsewhere; e.g. "authority responsible for a resource" at
the start of section 3.6.1., and "URI producers" in section 2.1.
As I draft this, I think there's maybe a deeper omission here: a lack of
separation between the owner or authority responsible for a resource, and
the authority for a particular part of URI space that may be used to
identify a resource. (cf. also my previous comment above.) If not
clarified, I think this could be a source of continuing miscommunication.
The TAG believes that the following minor changes to the
document are sufficient to address the reviewer's concern.
In 2.2, change to "(for example, to a server manager or someone who has been delegated part of the URI space on a given Web server)".
s/authorities servicing URIs/URI owners
issue klyne15: Lack of separation between owner of a resource
and authority for a part of URI space used to identify a resource?
error decided 2004-03-05
I made a note to myself at the end of this section:
"Maybe add a comment about metadata consistency and problems that may occur
if a resource is not persistent"
but now I am not sure what it is I meant by this.
I think I may have been thinking about a case where RDF is used to describe
some resource, but the resource whose representation is served at a given
URI is allowed to change over time. Then, any RDF that uses said URI to
describe the resource at some point in time becomes completely incorrect if
the URI is assigned to a different resource. Is it worth trying to make a
point that the value of RDF descriptions depends to a considerable extent
on the stability/persistence of the URIs used?
Decided at Ottawa f2f. (No action.)
issue klyne17: Worth pointing out value of RDF descriptions depends
on URI persistence?
proposal decided 2004-03-05
There are strong social expectations that once a URI identifies
a particular resource, it should continue indefinitely to refer to
that resource; this is called URI persistence. URI
persistence is a matter of policy and commitment on the part of
authorities servicing URIs. The choice of a particular URI scheme
provides no guarantee that those URIs will be persistent or that
they will not be persistent.
Since representations are used to communicate resource state,
persistence is directly affected by how well representations are
served. Service breakdowns include:
- Inconsistent representations served. Note the difference
between a resource owner changing representations predictably in
light of the nature of the resource (the changing weather of
Oaxaca) and the owner changing representations arbitrarily.
- Improper use of content negotiation, such as serving two images
as equivalent through HTTP content negotiation, where one image
represents a square and the other a circle.
HTTP [RFC2616] has been
designed to help manage URIs. For example, HTTP redirection (using
the 3xx response codes) permits servers to tell an agent that
further action needs to be taken by the agent in order to fulfill
the request (for example, the resource has been assigned a new
URI). In addition, content negotiation also promotes consistency,
as a site manager is not required to define new URIs when adding
support for a new format specification. Protocols that do not
support content negotiation (such as FTP) require a new identifier
when a new data format is introduced.
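The two HTTP mechanisms mentioned above can be sketched roughly as follows. Everything named here is an assumption for illustration: the `MOVED` and `REPRESENTATIONS` tables, the paths, and the `respond` function. The Accept parsing is deliberately simplified: it handles exact media types, q-values, and `*/*`, but not partial wildcards such as `text/*`.

```python
# Hypothetical sketch of two HTTP mechanisms for URI management.
MOVED = {"/oaxaca-old": "/oaxaca"}            # illustrative old -> new URI table
REPRESENTATIONS = {                           # one URI, several formats
    "/oaxaca": {"text/html": "<p>Sunny</p>", "text/plain": "Sunny"},
}


def parse_accept(header):
    """Return [(media_type, q)] from a simplified Accept header."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((parts[0], q))
    return prefs


def respond(path, accept="*/*"):
    """Return (status, headers, body) for a GET on path."""
    if path in MOVED:
        # 301 tells the agent the resource has been assigned a new URI.
        return 301, {"Location": MOVED[path]}, ""
    available = REPRESENTATIONS.get(path)
    if available is None:
        return 404, {}, ""
    # Try the client's preferences from highest to lowest q-value.
    for media_type, q in sorted(parse_accept(accept), key=lambda p: -p[1]):
        if q == 0:
            continue
        if media_type == "*/*":
            media_type = next(iter(available))
        if media_type in available:
            return 200, {"Content-Type": media_type}, available[media_type]
    return 406, {}, ""
```

Adding a new format in this sketch means adding an entry under the existing `/oaxaca` key; no new URI is defined.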
For more discussion about URI persistence, see [Cool].
It might help clarify the point made in this section if
some examples of mistaken attempts to restrict the use of URIs were
given, rather than just the building security analogy. Also, it's not
clear whether or not the principle described here (and the further
discussion in the "Deep Linking" finding) deals with all possible
situations of this sort. For example, it certainly used to be the case
(and may be the case now) that US Defense Department documents could not
only have a security classification, but their *titles* might also have
a security classification (that is, the *existence* of the document was
classified). A classified document with an unclassified title could be
referenced in the usual way, but a reader without the necessary
clearance would be unable to access the referenced document (this would
correspond to the situations already described). On the other hand,
classifying the title of the document would prevent the reader from even
seeing the reference without the necessary clearance. How would you
suggest handling this situation (admittedly, opaqueness of URIs would help!)?
issue manola27: Provide examples of mistaken attempts to restrict URI usage
proposal raised 2004-03-10
It is reasonable to limit access to a resource (for commercial
or security reasons, for example), but it is unreasonable to
prohibit others from merely identifying the resource.
As an analogy: The owners of a building might have a policy that
the public may only enter the building via the main front door, and
only during business hours. People who work in the building and who
make deliveries to it might use other doors as appropriate. Such a
policy would be enforced by a combination of security personnel and
mechanical devices such as locks and pass-cards. One would not
enforce this policy by hiding some of the building entrances, nor
by requesting legislation requiring the use of the front door and
forbidding anyone to reveal the fact that there are other doors to
the building.
Story
Nadia and Dirk both subscribe to the "weather.example.com"
newsletter. Nadia wishes to point out an article of particular
interest to Dirk, using a URI. The authority responsible for
"weather.example.com" can offer newsletter subscribers such as
Nadia and Dirk the benefits of URIs (such as bookmarking and
linking) and still limit access to the newsletter to authorized
parties.
The Web provides several mechanisms to control access to
resources; these mechanisms do not rely on hiding or suppressing
URIs for those resources. For more information, see the TAG finding
"'Deep Linking' in the World Wide Web".
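A minimal sketch of the newsletter story, assuming a hypothetical subscriber list and article path (none of these names come from this document): the URI is freely shareable, and the server decides per request whether to return a representation.

```python
# Hypothetical sketch: the URI is public knowledge; access is what gets checked.
SUBSCRIBERS = {"nadia", "dirk"}                      # illustrative subscriber list
ARTICLES = {"/newsletter/2004/03": "Oaxaca rainfall report"}


def get_article(path, user=None):
    """Return (status, body); the same URI works for bookmarking and linking."""
    if path not in ARTICLES:
        return 404, ""
    if user is None:
        return 401, ""          # credentials required; the URI still identifies the article
    if user not in SUBSCRIBERS:
        return 403, ""          # authenticated but not authorized
    return 200, ARTICLES[path]
```

Nadia can send the URI to anyone; only subscribers receive the article when they dereference it.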
There remain open questions regarding Web interactions. The TAG
expects future versions of this document to address in more detail
the relationship between the architecture described herein,
... voice-over-ip (including RTSP [RFC2326]).
RTSP does not qualify as voice over IP by most people's definitions. It's
generally called "streaming media", and if you want to reference a VoIP
protocol, try SIP (RFC 3261).
issue rosenberg4: Use SIP for voice-over-ip, RTSP for streaming media
clarification raised 2004-04-21
There remain open questions regarding Web interactions. The TAG
expects future versions of this document to address in more detail
the relationship between the architecture described herein, Web Services,
the Semantic
Web, peer-to-peer systems (including Freenet, MLdonkey, and NNTP [RFC977]), instant messaging systems (including [XMPP]), and voice-over-ip (including
RTSP [RFC2326]).
First para says "before inventing a new data format, designers should
carefully consider re-using one that is already available" but the whole
doc doesn't seem to say why all XML formats shouldn't be
application/xml.
issue kopecky3: 4 application/xml
clarification raised 2004-02-23
A data format (including XHTML, CSS, PNG, XLink, RDF/XML, and
SMIL animation) specifies the interpretation of representation data.
The first data format used on the Web was HTML. Since then, data
formats have grown in number. The Web architecture does not
constrain which data formats content providers can use. This
flexibility is important because there is constant evolution in
applications, resulting in new data formats and refinements of
existing formats. Although the Web architecture allows for the
deployment of new data formats, the creation and deployment of new
formats (and agents able to handle them) is expensive. Thus, before
inventing a new data format, designers should carefully consider
re-using one that is already available.
For a data format to be usefully interoperable between two
parties, the parties must have a shared understanding of its syntax
and semantics. This is not to imply that a sender of data
can count on constraining its treatment by a receiver; simply that
making good use of a data format requires knowledge of its
designers' intentions. Below we describe some characteristics of a
data format that make it easier to integrate into the Web architecture.
This document does not address generally beneficial characteristics
of a specification such as readability, simplicity, attention to
programmer goals, attention to user needs, accessibility, and
internationalization. The section on architectural specifications includes references
to additional format specification guidelines.
A textual data format is one in which the data is specified as a
sequence of characters. HTML, Internet e-mail, and all XML-based formats are textual.
In modern textual data formats, the characters are usually taken
from the Unicode repertoire [UNICODE].
Binary data formats are those in which portions of the data are
encoded for direct use by computer processors, for example
thirty-two bit little-endian two's-complement and sixty-four bit
IEEE double-precision floating-point. The portions of data so
represented include numeric values, pointers, and compressed data
of all sorts.
In principle, all data can be represented using textual
formats.
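To make the distinction concrete, the following sketch encodes the same two values both ways using Python's standard `struct` module. The variable names and sample values are illustrative only, and the space-separated textual layout is an assumption made for this example, not a format defined anywhere in this document.

```python
import struct

count = -2
ratio = 1.5

# Binary: "<i" is a 32-bit little-endian two's-complement integer,
# "<d" is a 64-bit little-endian IEEE double-precision float.
binary = struct.pack("<i", count) + struct.pack("<d", ratio)

# Textual: the same values as a sequence of characters, encoded as UTF-8.
textual = f"{count} {ratio}".encode("utf-8")

# Both forms round-trip to the original values.
decoded_binary = (
    struct.unpack("<i", binary[:4])[0],
    struct.unpack("<d", binary[4:])[0],
)
a, b = textual.decode("utf-8").split()
decoded_text = (int(a), float(b))
```

The binary form can be consumed only by an agent that knows the exact layout, whereas the textual form can also be read in a text editor.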
The trade-offs between binary and textual data formats are
complex and application-dependent. Binary formats can be
substantially more compact, particularly for complex pointer-rich
data structures. Also, they can be consumed more rapidly by agents
in those cases where they can be loaded into memory and used with
little or no conversion.
Textual formats are usually more portable and interoperable.
Textual formats also have the considerable advantage that they can
be directly read and understood by human beings. This can simplify
the tasks of creating and maintaining software, and allow the
direct intervention of humans in the processing chain without
recourse to tools more complex than the ubiquitous text editor.
Finally, it simplifies the necessary human task of learning about
new data formats (the "view source" effect).
It is important to emphasize that intuition as to such matters
as data size and processing speed is not a reliable guide in data
format design; quantitative studies are essential to a correct
understanding of the trade-offs. Therefore, data format
specification authors should make a considered choice between
binary and textual format design.
Note: Text (i.e., a sequence of characters from
a repertoire) is distinct from serving data with a media type
beginning with "text/". Although XML-based formats are textual,
many such formats are not primarily comprised of phrases in natural
language. See the section on media types for XML for issues that arise when
"text/" is used in conjunction with an XML-based format.
See TAG issue binaryXML-30.
[4.2] In general, the section on versioning unduly and in too many ways
oversimplifies a complex, subtle, and as yet poorly understood problem.
For example, 4.2.3 says:
"Language designers SHOULD provide mechanisms that allow any party to
create extensions that do not interfere with conformance to the original
specification."
This oversimplifies a very tough tradeoff. When you allow such extensions, you
promote reuse of the base language for new purposes, and that seems good. You
also provide for a proliferation of potentially non-interoperable versions
depending on various extensions, as well as ensuring that some data will be
accepted by processors when it is in fact not conforming to a later or extended
definition of the language, but is simply erroneous and ought (if the processor
were only omniscient) to be rejected as such with a useful diagnostic.
That's bad.
Pursuing the principle enunciated here, one might conclude that maybe XML
should have let anyone who wanted to define new syntactic constructs such as
structured attributes? They didn't, and interoperability is helped rather than
hurt by such strictness.
There is a strong tension between versioning and extensibility and silent
error handling, once you get away from human mediated interactions and
interactions that do not involve mission- or life-critical applications.
For computer-to-computer mission-critical applications, "fallback behaviour"
is semantically equivalent to "silently handling errors" and the Web
architecture document is thus self-contradictory.
In addition, versioning and extensibility are not solely a property of data
representations, but of protocols as well.
The TAG agrees with the reviewer that the text does not
communicate why extensibility may not be appropriate in some
cases. Furthermore, the TAG has resolved to delete the phrase "falling back to default behavior".
issue schema13: [4.2] Overly simplifies a complex problem
error decided 2004-03-04
Extensibility and versioning are strategies to help manage the
natural evolution of information on the Web and technologies used
to represent that information.
For more information about versioning strategies and agent
behavior in the face of unrecognized extensions, see TAG issue XMLVersioning-41 and "Web Architecture: Extensible
Languages" [EXTLANG].
Section 4.2.2, Story. The text says that defining a new optional
"lang" attribute on a "film" element does not affect the conformance
of any existing data or software. This isn't quite true: it changes
some invalid data (data with the undefined attribute "lang") into
valid data, and some non-conforming software (software which
erroneously accepts that invalid content) into conforming software.
The text is correct if, but only if, the universe of discourse is
restricted to valid data.
The TAG agrees with the reviewer regarding the general case,
but it doesn't apply to this specific instance. The TAG does not
feel changes to the document are required.
issue msm14: WD-webarch-20031209, Section 4.2.2, Story: Allowing extra attributes does change the conformance of existing data
error decided 2004-03-04
Story
Nadia and Dirk are designing an XML data format to encode data
about the film industry. They provide for extensibility by using
XML namespaces and creating a schema that allows the inclusion, in
certain places, of elements from any namespace. When they revise
their format, Nadia proposes a new optional "lang" attribute on the
"film" element. Dirk feels that such a change requires them to
assign a new namespace name, which might require changes to
deployed software. Nadia explains to Dirk that their choice of
extensibility strategy in conjunction with their namespace policy
allows certain changes that do not affect conformance of existing
content and software, and thus no change to the namespace
identifier is required. They chose this policy to help them meet
their goals of reducing the cost of change.
Dirk and Nadia have chosen a particular namespace change policy
that allows them to avoid changing the namespace name whenever they
make changes that do not affect conformance of deployed content and
software. They might have chosen a different policy, for example
that any new element or attribute has to belong to a namespace
other than the original one. Whatever the chosen policy, it should
set clear expectations for users of the format.
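The effect of Nadia and Dirk's policy can be illustrated with a small sketch. The namespace name `http://film.example.org/ns`, the sample documents, and the `read_title` consumer are all hypothetical; the story does not specify them.

```python
import xml.etree.ElementTree as ET

FILM_NS = "http://film.example.org/ns"   # hypothetical namespace name

OLD_DOC = f'<film xmlns="{FILM_NS}"><title>Oaxaca</title></film>'
NEW_DOC = f'<film xmlns="{FILM_NS}" lang="es"><title>Oaxaca</title></film>'


def read_title(xml_text):
    """A consumer written before "lang" existed: it looks only at what it knows."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{FILM_NS}}}film":
        raise ValueError("not a film document")
    return root.findtext(f"{{{FILM_NS}}}title")
```

Because the namespace name is unchanged, this older consumer reads both documents the same way. Had the policy required a new namespace name, the tag comparison would fail for the revised document and the deployed software would need to change.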
Good practice: Namespace policy
Format designers SHOULD document change
policies for XML namespaces.
As an example of a change policy designed to reflect the
variable stability of a namespace, consider the W3C namespace
policy for documents on the W3C Recommendation track. The
policy sets expectations that the Working Group responsible for the
namespace may modify it in any way until a certain point in the
process ("Candidate Recommendation") at which point W3C constrains
the set of possible changes to the namespace in order to promote
stable implementations.
Note that since namespace names are URIs, the party (if any)
responsible for a namespace URI has the authority to decide the
namespace change policy.
[4.2.3] The discussion of mustIgnore & mustUnderstand should clarify the
difference between marking the distinction in the document instance, in a
schema, or in prose documentation. SOAP does it with an attribute in the
instance. Schema content models do it in the schema. Other systems provide
rules in the specifications. These have different tradeoffs.
issue schema14: [4.2.3] Must * rules in instance v. documentation
clarification raised 2004-03-04
[[As part of defining an extensibility mechanism, a
specification should set expectations about agent behavior in the face
of unrecognized extensions.]]
The following good practice then says
[[Language designers SHOULD specify agent behavior in the face of
unrecognized extensions.]]
It's not clear that a specification "setting expectations about"
agent behavior is the same as it "specifying" it. Why the difference
in wording?
Delete "As part of defining ..." sentence.
issue manola30: Difference between "setting expectations" and "specifying"?
clarification decided 2004-03-10
Designers can facilitate the transition process by making
careful choices about extensibility during the design of a language
or protocol specification.
Good practice: Extensibility
mechanisms
Language designers SHOULD provide mechanisms
that allow any party to create extensions that do not interfere
with conformance to the original specification.
Application needs determine the most appropriate extension
strategy for a specification. For example, applications designed to
operate in closed environments may allow specification authors to
define a versioning strategy that would be impractical at the scale
of the Web. As part of defining an extensibility mechanism, a
specification should set expectations about agent behavior in the
face of unrecognized extensions.
Good practice: Unknown extensions
Language designers SHOULD specify agent
behavior in the face of unrecognized extensions.
Two strategies have emerged as being particularly useful:
- "Must ignore": The agent ignores any content it does not
recognize.
- "Must understand": The agent treats unrecognized markup as an
error condition.
A powerful design approach is for the language to allow either
form of extension, but to distinguish explicitly between them in
the syntax.
Additional strategies include prompting the user for more input,
automatically retrieving data from available links, and falling
back to default behavior. More complex strategies are also
possible, including mixing strategies. For instance, a language can
include mechanisms for overriding standard behavior. Thus, a data
format can specify "must ignore" semantics but also allow people to
create extensions that override that semantics in light of
application needs (for instance, with "must understand" semantics
for a particular extension).
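The combination of the two strategies might look like the following sketch. It assumes a hypothetical `mustUnderstand` attribute in a made-up namespace `http://ext.example.org/ns` and an agent that recognizes only `title` and `year`. It is loosely inspired by marking the distinction in the document instance and does not reproduce the processing model of any particular specification.

```python
import xml.etree.ElementTree as ET

EXT_NS = "http://ext.example.org/ns"          # hypothetical extension-marker namespace
MUST_UNDERSTAND = f"{{{EXT_NS}}}mustUnderstand"
KNOWN = {"title", "year"}                     # element names this agent recognizes


def process(xml_text):
    """Return the recognized child elements as a dict.

    Unrecognized children are ignored unless they are flagged with
    mustUnderstand="true", in which case processing fails.
    """
    result = {}
    for child in ET.fromstring(xml_text):
        if child.tag in KNOWN:
            result[child.tag] = child.text
        elif child.get(MUST_UNDERSTAND) == "true":
            raise ValueError(f"unrecognized mandatory extension: {child.tag}")
        # otherwise: "must ignore", skip silently
    return result
```

The default here is "must ignore"; an extension author opts in to "must understand" semantics for a single element by setting the flag.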
Extensibility is not free. Providing hooks for extensibility is
one of many requirements to be factored into the costs of language
design. Experience suggests that the long term benefits of
extensibility generally outweigh the costs.
[4.2.4] Says:
"In principle, a SOAP message can contain a JPEG image that
contains an RDF comment which refers to a vocabulary of terms for
describing the image."
This is untrue: SOAP is XML, JPEG is not. MTOM may do something to extend SOAP
to make this true, but as it stands the statement is false. Perhaps "... can
contain an SVG image that contains ..." is what you meant to write.
The TAG believes the 10 May 2004 draft addresses the
reviewer's concerns.
issue schema15: [4.2.4] SOAP message cannot include JPEG
error decided 2004-03-04
RDF allows well-defined mixing of vocabularies, and allows text and XML to
be used as a data type values within a statement having clearly defined
semantics.
I couldn't figure precisely what this was trying to say.
Change third bullet to: "The semantics of combining RDF documents with multiple vocabularies is well-defined."
Delete "and allows text and XML to be used as a data type values within a statement having clearly defined semantics."
issue klyne19: Unclear statement about mixing RDF vocabularies
clarification decided 2004-03-05
Note however, that for general XML there is no semantic model that defines
the interactions within XML documents with elements and/or attributes from
a variety of namespaces. Each application must define how namespaces
interact and what effect the namespace of an element has on the element's
ancestors, siblings, and descendants.
I think that there may be an important point to be made here about the
relationship of the "Semantic Web" with what I might call the "Hypertext
Web" upon which it is built, that the "Semantic Web" provides a
well-defined way to combine statements that draw upon an arbitrary number
of different namespaces. (I regard this as one of the more important
contributions of the Semantic Web.)
Maybe this is what the subject of my previous comment was trying to say?
The TAG resolved (SW abstaining):
Create a new section 4.6 about use of media types to build new applications that make use of the information space. Explain that you create new "applications" such as the semantic web through new media type definitions.
"Above we describe a global hypertext application which has been defined using specific Media types. The creation of other media types allows new applications to be built in the same space, using the same information space infrastructure. The semantic web is one such application, which is not described in detail in this version of this document. We expect future versions of the Arch doc to describe the semantic web in more detail."
Remove sem web from discussion in 3.7
Respond to reviewer that we expect future versions of the arch doc to go into more detail about the relationships among various systems.
Overtaken by events.
issue klyne20: Say something about relationship between
Hypertext Web and Semantic Web?
proposal decided 2004-03-05
Third bullet [[ * RDF allows well-defined mixing of
vocabularies, and allows text and XML to be used as a data type values
within a statement having clearly defined semantics.]]
"...allows text and XML to be used as data type values..." (delete the "a")?
Within the same statement? What does "having clearly defined semantics"
modify? Should this be "...within statements having clearly defined
semantics"?
The TAG believes this issue has been addressed by virtue of
deletion of the text in question.
issue manola31: Questions about RDF, text, XML mixing
clarification decided 2004-03-10
Many modern data format specifications include mechanisms for
composition. For example:
- It is possible to embed text comments in some image formats,
such as JPEG/JFIF. Although these comments are embedded in the
containing data, they have little or no effect on the display of
the image.
- There are container formats such as SOAP which fully expect to
be composed from multiple namespaces but which provide an overall
semantic relationship of message envelope and payload.
- RDF allows well-defined mixing of vocabularies, and allows text
and XML to be used as a data type values within a statement having
clearly defined semantics.
These relationships can be mixed and nested arbitrarily. In
principle, a SOAP message can contain a JPEG image that contains an
RDF comment which refers to a vocabulary of terms for describing
the image.
Note however, that for general XML there is no semantic model
that defines the interactions within XML documents with elements
and/or attributes from a variety of namespaces. Each application
must define how namespaces interact and what effect the namespace
of an element has on the element's ancestors, siblings, and
descendants.
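As an illustration of the last point, the sketch below shows an application supplying its own rule for a document that mixes two namespaces. The film namespace name and the rule itself ("film children are data, XHTML children are notes") are assumptions invented for this example; XML itself assigns no such meaning.

```python
import xml.etree.ElementTree as ET

FILM_NS = "http://film.example.org/ns"        # hypothetical namespace name
XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC = (
    f'<film xmlns="{FILM_NS}" xmlns:h="{XHTML_NS}">'
    "<title>Oaxaca</title>"
    "<h:p>A review in XHTML.</h:p>"
    "</film>"
)


def namespace_of(element):
    """Return the namespace name from ElementTree's "{ns}local" tag form."""
    return element.tag[1:].split("}", 1)[0] if element.tag.startswith("{") else ""


def summarize(xml_text):
    """Application-defined rule: film children are data, XHTML children are notes."""
    data, notes = [], []
    for child in ET.fromstring(xml_text):
        if namespace_of(child) == FILM_NS:
            data.append(child.text)
        elif namespace_of(child) == XHTML_NS:
            notes.append(child.text)
    return data, notes
```

A different application could legitimately treat the same XHTML child as part of the film data; nothing in the XML syntax settles the question.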
See TAG issues mixedUIXMLNamespace-33, xmlFunctions-34, and RDFinXHTML-35.
This is often harder than the AWWW lets on, and sometimes it's simply not
possible at all. I think the language should be modulated to reflect that
reality; see proposed text.
issue clark6: Separating Presentation From Content
clarification raised 2004-02-26
Excerpt: "The reader is left with the impression that 'obviously,
the legal requirement justifies the actual practice. It constitutes
an overriding concern.' This is not necessarily the case. Please don't reinforce misconceptions. Particularly if they have consequences that reduce the universality of access to web mediated transactions."
The TAG believes that the 10 May 2004 draft addresses the
reviewer's comment.
issue gilman1: 'legal requirement' as justification for 'particular presentation' misses 'leading Web to highest' mark
error decided 2004-03-02
Note that when content, presentation, and interaction are separated by
design, agents need to recombine them. There is a recombination spectrum,
with "client does all" at one end and "server does all" at the other. There
are advantages to each: recombination on the server allows the server to
send out generally smaller amounts of data that can be tailored to specific
devices (such as mobile phones). However, such data will not be readily
reusable by other clients and may not allow client-side agents to perform
useful tasks unanticipated by the author. When a client does the work of
recombination, content is likely to be more reusable by a broader audience
and more robust. However, such data may be of greater size and may require
more computation by the client.
I think there are also some scalability concerns that might be mentioned
here; e.g. an application is, in general, more likely to operate at
Internet scale if as much processing as possible is performed by user
agents (often, clients) rather than centralized processing agents (often,
servers).
Decided at Ottawa f2f.
issue klyne21: Add statement about scalability concerns
proposal decided 2004-03-05
The Web is a heterogeneous environment where a wide variety of
agents provide access to content to users with a wide variety of
capabilities. It is good practice for authors to create content
that can reach the widest possible audience, including users with
graphical desktop computers, hand-held devices and cell phones,
users with disabilities who may require speech synthesizers, and
devices not yet imagined. Furthermore, authors cannot predict in
some cases how an agent will display or process their content.
Experience shows that allowing authors to separate content,
presentation, and interaction concerns promotes reuse and
device-independence (see [DIPRINCIPLES]); this follows from the
principle of orthogonal specifications.
Good practice: Separation of content, presentation, interaction
Language designers SHOULD design formats that
allow authors to separate content from presentation and interaction
concerns.
Note that when content, presentation, and interaction are
separated by design, agents need to recombine them. There is a
recombination spectrum, with "client does all" at one end and
"server does all" at the other. There are advantages to each:
recombination on the server allows the server to send out generally
smaller amounts of data that can be tailored to specific devices
(such as mobile phones). However, such data will not be readily
reusable by other clients and may not allow client-side agents to
perform useful tasks unanticipated by the author. When a client
does the work of recombination, content is likely to be more
reusable by a broader audience and more robust. However, such data
may be of greater size and may require more computation by the
client.
Of course, it may not always be desirable to reach the widest
possible audience. Application context may require a very specific
display (for a legally-binding transaction, for example). Also,
digital signature technology, access control, and other technologies are
appropriate for controlling access to content.
Some data formats are designed to describe presentation
(including SVG and XSL Formatting Objects). Data formats such as
these demonstrate that one can only separate content from
presentation (or interaction) so far; at some point it becomes
necessary to talk about presentation. Per the principle of
orthogonal specifications, these data formats should only
address presentation issues.
See the TAG issues formattingProperties-19 and contentPresentation-26.
The first good practice says, in my paraphrase, that (1) good
representation types allow users to make links to other resources and to
parts of representation-states of resources. The second good practice
says, again in my paraphrase, that (2) good representation types allow
users to make "Web-wide" links rather than merely "internal document"
links.
Aren't these redundant?
The TAG believes that the statements say different things and both are justified.
issue clark4a: Hypertext Good Practice Redundancies
clarification decided 2004-02-26
Surely the AWWW also wants to say that for those kinds of web application
or scenario -- Service Oriented Architecture and Semantic Web being the two
obvious examples -- where hypertext is not the "expected user interface
paradigm", by virtue of the fact that there really isn't a UI per se, one
still wants to prefer representation types which allow users to make
hypertext links between resources. REST and SOAP and RDF and WSDL and a
lot of other fun stuff works precisely because -- even in the absence of
any human-facing UI -- what's happening is that messages are being passed
around between machines, some of which contain assertions about resources,
and they are messages which contain hypertext links to other resources.
The real problem here is that there is no real formalization of "hypertext
link" in the AWWW. If it means A-HREF links simpliciter, then my point
about SOA and Semantic Web exceptions to this practice is unmotivated and
null. But if, as seems likely from Section 4.5.2. Links in XML,
"hypertext links" encompasses any link mechanism (that is, XLink and
friends) whereby HTTP URIs identify resources with which agents may
interact with the resources-states thereof, then something like my point is
needed.
issue clark4b: "Expected UI Paradigm"?
clarification raised 2004-02-26
Language designers SHOULD incorporate hypertext links into a data format if
hypertext is the expected user interface paradigm.
I found this statement a bit puzzling: many data formats have nothing to
do with a user interface; the preceding text says "What agents do with a
hypertext link is not constrained by Web architecture and may depend on
application context". So what is this trying to say?
issue klyne22: Clarify what is meant by context having influence
on use of hyperlinks
clarification raised 2004-03-05
A defining characteristic of the Web is that it allows embedded
references to other Web resources via URIs. The simplicity of
creating links using absolute URIs (<a
href="http://www.example.com/foo">) and relative URI references
(<a href="foo"> and <a href="foo#anchor">) is partly (perhaps
largely) responsible for the birth of the hypertext Web as we know
it today.
When one resource (representation) refers to another resource
with a URI, this constitutes a link between the two resources.
Additional metadata may also form part of the link (see [XLink10], for example).
Good practice: Link mechanisms
Language designers SHOULD provide mechanisms
for identifying links to other resources and to portions of
representation data (via fragment identifiers).
Good practice: Web linking
Language designers SHOULD provide mechanisms
that allow Web-wide linking, not just internal document
linking.
Good practice: Generic URIs
Language designers SHOULD allow authors to use
URIs without constraining them to a limited set of URI schemes.
What agents do with a hypertext link is not constrained by Web
architecture and may depend on application context. Users of the
hypertext links expect to be able to navigate links among
representations. Data formats that do not allow authors to create
hypertext links lead to the creation of "terminal nodes" on the
Web.
Good practice: Hypertext links
Language designers SHOULD incorporate hypertext
links into a data format if hypertext is the expected user
interface paradigm.
I found the text of this section less clear than was offered in an
email from TimBL:
It is important to distinguish between the string which identifies
something and the BNF for a string in a document which
is used to specify the first string. The first is an identifier.
The second has been called a "reference". A reference
can use a relative form.
issue klyne23: Clarify section (see TBL text on identifier v. reference?)
clarification raised 2004-03-05
Links are commonly expressed using URI references (defined in section 4.2
of [URI]), which may be combined
with a base URI to yield a usable URI. Section 5.1 of [URI] explains different mechanisms for
establishing a base URI for a resource and establishes a precedence
among the various mechanisms. For instance, the base URI may be a
URI for the resource, or specified in a representation (see the
base elements provided by HTML and XML, and the HTTP
'Content-Location' header). See also the section on links in XML.
Agents resolve a URI reference before using the resulting URI to
interact with another agent. URI references help in content
management by allowing authors to design a representation locally,
i.e., without concern for which global identifier may later be used
to refer to the associated resource.
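
As a minimal sketch of the resolution step described above, the
following Python fragment combines relative URI references with a base
URI using the standard library's urllib.parse.urljoin. The base URI and
the "../bar" reference are assumptions chosen for illustration; "foo"
and "foo#anchor" echo the link examples given earlier in this section.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used here only for illustration.
base = "http://www.example.com/reports/index.html"

# Each relative reference is combined with the base URI to yield a
# usable URI, following the generic resolution rules.
resolved = {ref: urljoin(base, ref) for ref in ("foo", "foo#anchor", "../bar")}

for ref, uri in resolved.items():
    print(f"{ref!r} -> {uri}")
```

The same references would yield different URIs under a different base,
which is what lets authors design a representation locally.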
...While it is directed at Internet applications with specific reference
to protocols, the discussion is generally applicable to Web scenarios as well.
I am uneasy with this phrasing, as it seems to suggest the Web is somehow
apart from the Internet. Suggest:
While it is directed at Internet applications with specific reference
to protocols, the discussion is also applicable to Web application formats.
issue klyne24: Is Web apart from Internet?
clarification raised 2004-03-05
Many data formats are XML-based, that is to say they
conform to the syntax rules defined in the XML specification [XML10]. This section discusses
issues that are specific to such formats. Anyone seeking guidance
in this area is urged to consult the "Guidelines For the Use of XML
in IETF Protocols" [IETFXML],
which contains a thorough discussion of the considerations that
govern whether or not XML ought to be used, as well as specific
guidelines on how it ought to be used. While it is directed at
Internet applications with specific reference to protocols, the
discussion is generally applicable to Web scenarios as well.
The discussion here should be seen as ancillary to the content
of [IETFXML]. Refer also to
"XML Accessibility Guidelines" [XAG] for help designing XML formats that lower
barriers to Web accessibility for people with disabilities.
Another reference with discussion relating to this topic of choosing to use
XML can be found in RFC3117, section 5.1.
Decided at Ottawa f2f. (No action.) The behavior suggested by RFC3117 was
actually reversed when the actual spec went through standardization so we aren’t
going to reference it.
issue klyne25: Add reference to RFC3117, section 5.1?
proposal decided 2004-03-05
XML defines textual data formats that are naturally suited to
describing data objects which are hierarchical and processed in a
chosen sequence. It is widely, but not universally, applicable for
data format specifications; an audio or video format, for example,
is unlikely to be well suited to expression in XML. Design
constraints that would suggest the use of XML include:
- Requirement for a hierarchical structure.
- The data's usefulness should outlive the tools currently used
to process it (though obviously XML can be used for short-term
needs as well).
- Ability to support internationalization in a self-describing
way that makes confusion over coding options unlikely.
- Early detection of encoding errors with no requirement to "work
around" such errors.
- A high proportion of human-readable textual content.
- Potential composition of the data format with other XML-encoded
formats.
In section 4.5.2, I'm uncomfortable with the recommendation to use
XLink when using XML, except perhaps when authoring documents which are
intended for human consumption. I believe that RDF/XML provides
superior linking capabilities for XML than does XLink, and IMO
preference should be given to it. Alternately, listing both as options
would be adequate.
issue baker4: 4.5.2: Preference for RDF linking over XLink linking
error raised 2004-03-05
Sophisticated linking mechanisms have been invented for XML
formats. XPointer allows links to address content that does not
have an explicit, named anchor. XLink is an appropriate
specification for representing links in hypertext XML applications. XLink allows links to
have multiple ends and to be expressed either inline or in "link
bases" stored external to any or all of the resources identified by
the links it contains.
Designers of XML-based formats should consider using XLink and,
for defining fragment identifier syntax, using the XPointer
framework and XPointer element() Schemes.
See TAG issue xlinkScope-23.
"Namespaces in XML provides a mechanism for establishing a
globally unique name that can be understood in any context." What does
it mean to understand a name? Should this say that the globally unique
name can unambiguously identify the intended meaning of the
element/attribute?
issue kopecky4: 4.5.3 use of "understand"
clarification raised 2004-02-23
[4.5.3] States:
"Namespaces in XML" [XMLNS] provides a mechanism for establishing a globally
unique name that can be understood in any context.
This is a false statement and should not be continued to be repeated.
Delete "that can be understood in any context" (Already
done in 10 May 2004 draft.)
Modify remaining sentence to say: "Namespaces in XML use URIs in order to obtain the properties of a global namespace."
Include a reference to 2.2 URI ownership.
issue schema17: [4.5.3] Statement about XMLNS and unique names false
error decided 2004-03-04
[4.5.3] Says:
"The type attribute from W3C XML Schema is an example of a global
attribute."
This should indicate type in the Schema Instance namespace, preferably with a
suitable link to our spec. Perhaps
"The type attribute from W3C XML Schema namespace is an example of a global
attribute."
There are also type attributes in the schema document vocabulary, e.g. on
<xsd:element>, and those are not global. Furthermore, we see above in 4.5.6
that a prefix is used to indicate xs:ID as a type. So, why not use xsi:type
for this one:
"The xsi:type attribute, provided by W3C XML Schema for use in XML
instance documents, is an example of a global attribute."
Include a clearer reference to the XML Schema specification.
issue schema18: [4.5.3] Clarification on "type" in XML Schema
clarification decided 2004-03-04
[4.5.3] Says:
"Attributes are always scoped by the element on which they appear.
An attribute that is "global," that is, one that might meaningfully
appear on different elements, including elements in other namespaces,
should be explicitly placed in a namespace. Local attributes, ones
associated with only a particular element, need not be included in a
namespace since their meaning will always be clear from the context
provided by that element."
This appears to mix the notion of element instance and what DTD-oriented minds
would call 'element type'. Perhaps this should read
"An attribute that is "global," that is, one that might meaningfully appear
on elements of any type, including elements in other namespaces, should be
explicitly placed in a namespace. Local attributes, ones associated with
only a particular element type, need not be included in a namespace since
their meaning will always be clear from the context provided by that
element."
The TAG agreed with the reviewer.
issue schema19: [4.5.3] Element type/instance confusion
clarification decided 2004-03-04
Story
The authority responsible for "weather.example.com" realizes
that it can provide more interesting representations by creating
instances that consist of elements defined in different XML-based formats, such as
XHTML, SVG, and MathML.
How do the application designers ensure that there are no naming
conflicts when they combine elements from different formats (for
example, suppose that the "p" element is defined in two or more XML
formats)? "Namespaces in XML" [XMLNS] provides a mechanism for establishing a
globally unique name that can be understood in any context.
Language specification designers that declare namespaces thus
provide a global context for instances of the data format.
Establishing this global context allows those instances (and
portions thereof) to be re-used and combined in novel ways not yet
imagined. Failure to provide a namespace makes such re-use more
difficult, perhaps impractical in some cases.
Good practice: Namespace adoption
Language designers who create new XML
vocabularies SHOULD place all element names and global attribute
names in a namespace.
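
To illustrate how namespaces keep two "p" elements apart, here is a
small Python sketch using xml.etree.ElementTree from the standard
library. The document, the "report" root element, and the second
vocabulary's "p" element are hypothetical; only the XHTML namespace URI
is a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance mixing two vocabularies that both define "p".
doc = """\
<report xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:w="http://weather.example.com/2003/format">
  <h:p>Sunny in Oaxaca.</h:p>
  <w:p>1013</w:p>
</report>"""

root = ET.fromstring(doc)

# ElementTree reports each name as "{namespace URI}local-name", so the
# two "p" elements are distinct names despite sharing a local name.
names = [child.tag for child in root]
print(names)
```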
Attributes are always scoped by the element on which they
appear. An attribute that is "global," that is, one that might
meaningfully appear on different elements, including elements in
other namespaces, should be explicitly placed in a namespace. Local
attributes, ones associated with only a particular element, need
not be included in a namespace since their meaning will always be
clear from the context provided by that element.
The type attribute from W3C XML Schema is an example of a global
attribute. It can be used by authors of any vocabulary to make an
assertion about the type of the element on which it appears. The
type attribute occurs in the W3C XML Schema namespace and must
always be fully qualified. The frame attribute on an HTML table is
an example of a local attribute. There is no value in placing that
attribute in a namespace since the attribute is unlikely to be
useful on an element other than an HTML table.
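
The difference between a global and a local attribute can be seen in
how a namespace-aware parser reports attribute names. The sketch below
assumes the xsi:type attribute from the XML Schema instance namespace
as the global attribute; the "data" and "reading" elements and the
unqualified "table" element are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance: xsi:type is namespace-qualified (global),
# while frame is unqualified (local to the table element).
doc = f"""\
<data xmlns:xsi="{XSI}" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <reading xsi:type="xs:decimal">21.5</reading>
  <table frame="box"/>
</data>"""

root = ET.fromstring(doc)
reading, table = root[0], root[1]

global_attrs = list(reading.attrib)  # qualified with the namespace URI
local_attrs = list(table.attrib)     # plain local name, no namespace
print(global_attrs, local_attrs)
```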
Applications that rely on DTD processing must impose additional
constraints on the use of namespaces. DTDs perform validation based
on the lexical form of the element and attribute names in the
document. This makes prefixes syntactically significant in ways
that are not anticipated by [XMLNS].
However, the term "definitive" is missing. Was this intentional? Based on
a quick skimming of the issue, it looks like the TAG is in agreement that
the namespace document should directly or indirectly provide *definitive*
material about the namespace, but I'm not sure.
The TAG agrees but for consistency prefers the term
"authoritative".
issue booth3: 4.5.4: NS document as definitive source of info on namespace
clarification decided 2004-02-20
Story
Nadia receives representation data from "weather.example.com"
in an unfamiliar data format. She knows enough about XML to
recognize which XML namespace the elements belong to. Since the
namespace is identified by the URI
"http://weather.example.com/2003/format", she asks her browser to
retrieve a representation of the namespace via that URI. Nadia is
requesting the namespace document.
Nadia gets back some useful data that allows her to learn more
about the data format. Nadia's browser may also be able to perform
some operations automatically (i.e., unattended by a human
overseer) given data that has been optimized for software agents.
For example, her browser might, on Nadia's behalf, download
additional agents to process and render the format.
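
The first step in the story, recognizing which namespace the elements
belong to, might look like the following Python sketch. The "forecast"
and "high" element names are invented for illustration, and the sketch
stops short of retrieving the namespace document.

```python
import xml.etree.ElementTree as ET

# Hypothetical representation in the unfamiliar weather format.
doc = (
    '<forecast xmlns="http://weather.example.com/2003/format">'
    "<high>31</high></forecast>"
)
root = ET.fromstring(doc)

# ElementTree tags look like "{namespace URI}local-name"; the part in
# braces is the URI an agent could dereference to request the
# namespace document.
if root.tag.startswith("{"):
    namespace_uri = root.tag[1:].partition("}")[0]
else:
    namespace_uri = None

print(namespace_uri)
```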
There are many reasons to provide information about a namespace.
A person might want to:
- understand its purpose,
- learn how to use the markup vocabulary in the namespace,
- find out who controls it,
- request authority to access schemas or collateral material
about it, or
- report a bug or situation that could be considered an error in
some collateral material.
A processor might want to:
- retrieve a schema, for validation,
- retrieve a style sheet, for presentation, or
- retrieve ontologies, for making inferences.
In general, there is no established best practice for creating a
namespace document. Application expectations will influence what
data format or formats are used to create a namespace document.
Application expectations will also influence whether relevant
information appears in the namespace document itself or is
referenced from it.
Good practice: Namespace documents
Resource owners who publish an XML namespace
name SHOULD make available material intended for people to read and
material optimized for software agents in order to meet the needs
of those who will use the namespace vocabulary.
For example, the following formats are used to
create namespace documents: [OWL10], [RDDL],
[XMLSCHEMA], and [XHTML11]. Each of these formats
meets different requirements described above for satisfying the
needs of an agent that wants more information about the namespace.
Note, however, issues related to fragment identifiers and multiple representations
if content negotiation is used with namespace documents.
See TAG issues namespaceDocument-8 and abstractComponentRefs-37.
Below the Good Practice: QName Mapping - the section (or some
other) should probably say more on the interaction of QName Mapping,
fragment identifiers in XML (4.5.8) commonly used for this mapping and
namespace documents (4.5.4)
Overtaken by events.
issue kopecky5: 4.5.5 More info on qnames, fragids, ns docs
request decided 2004-02-23
Section 3 of "Namespaces in XML" [XMLNS] provides a syntactic construct known as a
QName for the compact expression of qualified names in XML
documents. A qualified name is a pair consisting of a URI, which
names a namespace, and a local name placed within that namespace.
"Namespaces in XML" provides for the use of QNames as names for XML
elements and attributes.
Other specifications, starting with [XSLT10], have employed the idea of using QNames in
contexts other than element and attribute names, for example in
attribute values and in element content. However, general XML
processors cannot recognize QNames as such when they are used in
attribute values and in element content; they are indistinguishable
from URIs. Experience has also revealed other limitations to
QNames, such as losing namespace bindings after XML
canonicalization.
Good practice: QNames Indistinguishable from URIs
Specifications that use QNames to represent
URI/local-name pairs SHOULD NOT allow both forms in attribute
values or element content where they would be indistinguishable
from URIs.
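
As a rough illustration of why a QName in content cannot be told apart
from a URI, the Python sketch below hands a QName-like value and two
URIs to a generic URI parser (urllib.parse.urlsplit). The sample values
are assumptions made up for this example.

```python
from urllib.parse import urlsplit

# "xs:string" is intended as a QName; the other two are URIs.
values = ["xs:string", "mailto:nadia@example.com", "urn:example:foo"]

# A generic parser has no way to know the first value is a QName: it
# reads the prefix as a URI scheme, exactly as for the real URIs.
schemes = [urlsplit(value).scheme for value in values]
print(schemes)
```

Only the specification of the containing format can say which reading
is intended, which is why allowing both forms in one place is
discouraged.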
For more information, see the TAG finding "Using QNames as Identifiers in
Content".
Because QNames are compact, some specifications have adopted the
same syntax as a means of identifying Web resources. Though
convenient as a shorthand notation, this usage has a cost. There is
no single, accepted way to convert a QName into a URI or
vice-versa. Although QNames are convenient, they do not replace the
URI as the identification mechanism of the Web. The use of QNames
to identify Web resources without providing a mapping to URIs is
inconsistent with Web architecture.
Good practice: QName Mapping
Language designers who use QNames as
identifiers of Web resources MUST provide a mapping to URIs.
For examples of QName-to-URI mappings, see [RDF10]. See also TAG issues rdfmsQnameUriMapping-6, qnameAsId-18, and abstractComponentRefs-37.
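
One possible mapping, sketched here in Python, is to concatenate the
namespace URI and the local name, which is the approach RDF takes. The
function name qname_to_uri and the prefix table are hypothetical; other
formats may define different mappings.

```python
def qname_to_uri(qname, bindings):
    """Map a prefix:local QName to a URI by concatenating the
    namespace URI bound to the prefix with the local name."""
    prefix, _, local = qname.partition(":")
    return bindings[prefix] + local


# Prefix bindings that would normally come from the in-scope
# namespace declarations of the document being processed.
bindings = {"dc": "http://purl.org/dc/elements/1.1/"}

uri = qname_to_uri("dc:title", bindings)
print(uri)
```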
This section lacks a conclusion, any kind of statement on what
should/should not be used. Or words that at the moment there is no
conclusion.
The TAG believes this is an open issue in Web architecture,
which the Editor should highlight more in the text.
Overtaken by events.
issue kopecky6: 4.5.6 What's the conclusion?
request decided 2004-02-23
[4.5.6] Fails to carefully highlight the particular flavours of "ID" in play, and
that they are NOT the same thing. For example, consider the following three
statements:
"Does the section element have the ID "foo"?"
(This needs to be something like
"Does the section element have what the XML Recommendation refers to as
the ID "foo"? ")
"Processing the document with a W3C XML Schema might reveal an element
declaration that identifies the name attribute as an xs:ID."
(This one is probably OK.)
"In practice, processing the document with another schema language, such
as RELAX NG [RELAXNG], might reveal the attributes of type ID."
(What is a "type ID" here? If it's RELAX using the schema data types,
then isn't it xs:ID in this case?)
In practice, applications may have independent means of specifying IDness as
provided for and specified in XPointer.
XPointer carefully discusses these options.
The TAG believes that the 10 May 2004 draft addresses the
reviewer's concern, with the following changes:
change fourth bullet in 10 May 2004 draft to read "In
practice, applications may have independent means of locating
identifiers inside a document such as provided for and specified
in the XPointer specification."
Include a reference to section 3.2.
issue schema20: [4.5.6] Flavors of ID not discussed
error decided 2004-03-04
Consider the following fragment of XML: <section name="foo">. Does
the section element have the ID "foo"? One cannot answer this
question by examining the element and its attributes alone. In XML,
the quality of "being an ID" is associated with the type of the
attribute, not its name.
Finding the IDs in a document requires additional processing.
- Processing the document with a processor that recognizes DTD
attribute list declarations (in the external or internal subset)
might reveal a declaration that identifies the name attribute as an
ID. Note: This processing is not necessarily part
of validation. A non-validating, DTD-aware processor can perform ID
assignment.
- Processing the document with a W3C XML Schema might reveal an
element declaration that identifies the name attribute as an
xs:ID.
- In practice, processing the document with another schema
language, such as RELAX NG [RELAXNG], might reveal the attributes of type ID.
Many modern specifications begin processing XML at the Infoset [INFOSET] level and do not specify
normatively how an Infoset is constructed. For those
specifications, any process that establishes the ID type in the
Infoset (and Post Schema Validation Infoset
(PSVI) defined in [XMLSCHEMA]) may usefully identify the attributes
of type ID.
To further complicate matters, DTDs establish the ID type in the
Infoset whereas W3C XML Schema produces a PSVI but does not modify
the original Infoset. This leaves open the possibility that a
processor might only look in the Infoset and consequently would
fail to recognize schema-assigned IDs.
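
The point that "being an ID" depends on type information rather than
on the attribute's name can be sketched with Python's xml.dom.minidom.
In this example no DTD or schema is involved; the application itself
marks the name attribute as an ID, which is only one of several
possible ways of establishing ID-ness.

```python
from xml.dom.minidom import parseString

doc = parseString('<doc><section name="foo"/></doc>')
section = doc.getElementsByTagName("section")[0]

# With no type information, nothing in the document is known to be an
# ID, so the lookup finds no element.
before = doc.getElementById("foo")

# The application declares that, on this element, "name" is an ID.
section.setIdAttribute("name")
after = doc.getElementById("foo")

print(before, after is section)
```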
See TAG issue xmlIDSemantics-32.
These Internet Media Types create two problems: First, for data identified
as "text/*", Web intermediaries are allowed to "transcode", i.e., convert
one character encoding to another. Transcoding may make the
self-description false or may cause the document to be not well-formed.
The statement "Web intermediaries are allowed to "transcode ..."
seemed to me to be rather broadly applied here. Is there a
specification that asserts this in general? If not, I think the
comment should be constrained to something like "in some Web
applications, intermediaries are allowed to transcode."
issue klyne26: Transcoding allowing by some or all intermediaries?
clarification raised 2004-03-05
RFC 3023 defines the Internet Media Types "application/xml" and
"text/xml", and describes a convention whereby XML-based data
formats use Internet Media Types with a "+xml" suffix, for example
"image/svg+xml".
These Internet Media Types create two problems: First, for data
identified as "text/*", Web intermediaries are allowed to
"transcode", i.e., convert one character encoding to another.
Transcoding may make the self-description false or may cause the
document to be not well-formed.
Good practice: XML and "text/*"
In general, server managers SHOULD NOT assign
Internet Media Types beginning with "text/" to XML
representations.
Second, representations whose Internet Media Types begin with
"text/" are required, unless the charset parameter is specified, to
be considered to be encoded in US-ASCII. Since the syntax of XML is
designed to make documents self-describing, it is good practice to
omit the charset parameter, and since XML is very often not encoded
in US-ASCII, the use of "text/" Internet Media Types effectively
precludes this good practice.
Good practice: XML and character encodings
In general, server managers SHOULD NOT specify
the character encoding for XML data in protocol headers since the
data is self-describing.
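
A short Python sketch of what "self-describing" means in practice: the
parser reads the encoding declaration from the bytes themselves,
whereas treating the same bytes as US-ASCII (the default for "text/*"
without a charset parameter) fails. The sample document and its city
name are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML entity encoded in ISO-8859-1, with an encoding
# declaration that says so.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><city>M\u00e9rida</city>'
data = text.encode("iso-8859-1")

# Given raw bytes, the XML parser honors the encoding declaration.
city = ET.fromstring(data).text

# Interpreting the same bytes as US-ASCII cannot succeed.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(city, ascii_ok)
```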
The section on media types and fragment identifier semantics
discusses the interpretation of fragment identifiers. Designers of
an XML-based data format specification should define the semantics
of fragment identifiers in that format. The XPointer Framework
[XPTRFR] provides an interoperable starting point.
When the media type assigned to representation data is
"application/xml", there are no semantics defined for fragment
identifiers, and authors should not make use of fragment
identifiers in such data. The same is true if the assigned media
type has the suffix "+xml" (defined in "XML Media Types" [RFC3023]), and the data format
specification does not specify fragment identifier semantics. In
short, just knowing that content is XML does not provide
information about fragment identifier semantics.
Many people assume that the fragment identifier #abc, when
referring to XML data, identifies the element in the document with
the ID "abc". However, there is no normative support for this
assumption.
See TAG issue fragmentInXML-28.