W3C TAG
Dan Connolly, Jul 2002
DRAFT $Revision: 1.30 $ of $Date: 2002/07/22 22:54:02 $ by $Author: connolly $
cf tag issue 14
@@hmm... not sure if this is really a FAQ or more of a socratic dialog...
That's a tricky question; let's start by getting clear on what you mean by each of those in isolation, before we talk about their relationship.
By URI, I bet you mean the sort of thing you put in your HTML files, right? Note that this is subtly different from the meaning of the term URI in RFC2396, the most widely ratified specification of the term.
It's a string of characters starting with scheme:, e.g.
http://www.w3.org/
ftp://ftp.w3.org/
irc://irc.openprojects.net/rdfig
urn:oasis:names:tc:SAML:1.0:assertion
tel:+1-913-555-1212
See RFC2396 for details.
acap, cid, data, etc. IANA keeps a list of registered URI schemes.
Is myscheme:blort a URI?
Well, myscheme:blort meets the syntactic constraints of RFC2396, so yes, it's a URI. But myscheme isn't registered, so you don't have license to use that URI in any Internet protocols; there aren't any valid uses of it. You can't expect anybody to know what you mean by it, and you aren't guaranteed that somebody else isn't already using it for something else. But we're getting ahead of ourselves, into the relationship of URIs to resources...
If you want to register a new scheme, follow the IETF process, especially the guidelines in RFCNNNN@@.
Is ../foo a URI?
No. It is a URI reference, though. And in the context of a base URI (i.e. the address where you got the document in which you found ../foo; say http://aDomain/aPath/xyz) it's well-defined what URI ../foo corresponds to: http://aDomain/foo. See section @@ of RFC2396 for details.
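If you'd like to see that resolution worked mechanically, here is a minimal sketch using Python's standard urllib.parse.urljoin. One assumption to flag: urljoin follows the later RFC 3986 algorithm rather than RFC2396 itself; for this simple case the two should agree.

```python
from urllib.parse import urljoin

# Base URI: the address where the document containing the reference was found.
base = "http://aDomain/aPath/xyz"
# The relative URI reference found in that document.
reference = "../foo"

# Resolve the reference against the base to get an absolute URI.
absolute = urljoin(base, reference)
print(absolute)
```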
foo/bar w.r.t. mailto:somebody@somedomain?
Now you're getting pedantic. I said it was well-defined; I didn't say it was intuitive. How often do you get documents from that base URI? That's not really a frequently asked question.
The answer is mailto:foo/bar, by @@recent interpretation of RFC2396. Implementation experience is not completely consistent.
[@@weasel words about how it's useful to eventually get this nailed down, i.e. developing a URI test suite, but the bugs aren't costing us much in the meantime.]
No.
@@more?
Is http://example/aPath/myDoc.html#section2 a URI?
No, per RFC2396, URIs don't include # characters. URI references do.
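To make the split concrete, here is a small sketch that separates a URI reference into its URI part and its fragment identifier, using urldefrag from Python's standard library. The variable names are just for illustration.

```python
from urllib.parse import urldefrag

# A URI reference: a URI followed by "#" and a fragment identifier.
reference = "http://example/aPath/myDoc.html#section2"

# urldefrag returns the part before "#" and the fragment after it.
uri, fragment = urldefrag(reference)
print(uri)       # the URI proper, with no "#" in it
print(fragment)  # the fragment identifier
```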
Just about anything.
From RFC2396:
A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. ...
Yes. RFC2396 continues...
... Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.
Unfortunately, the term is pretty ambiguous in this context.
We often think of files as documents. You can save the file several times, and you still think of it as the same document. When the U.S. Constitution is amended, we think of it as the same document.
For the purpose of this discussion, let's please reserve the term document to mean a sequence of bytes paired with an Internet Media Type [What's an internet media type? Well, it's much like a URI scheme... never mind; let's not go into that just now, OK?]. That is, the contents of a myDoc.html file on your disk at any one time are a document, but if you revise it and save it, you've got a new document. If you want to talk about the mutable file, let's use resource for that.
This use of document corresponds to the term entity body in Hypertext Transfer Protocol -- HTTP/1.1, RFC2616, and in the MIME specs, e.g. RFC2045.
[more motivation for using document this way: XML specs. Note that per Infoset REC, the base URI of an XML document is intrinsic to that document; so if you take your file and copy it somewhere else in the Web, you get a new document.]
Perhaps we need another term for resources that are often called documents...
Each work is a resource which is closely related to one or more documents; e.g. an essay, a book, a computer program, the U.S. Constitution, etc.
"An abstract information thing of value, typically intellectual property" -- timbl's doc schema.
see also: Conceptual Works in the OpenCyc ontology.
Ok; now that we have our background terms straight...
First, each valid use of a URI reference unambiguously identifies one resource. That is: if you are using a registered URI scheme and following all the other relevant protocol specifications, it is unambiguous what resource you are referring to. This goes for all URI references, not just URIs.
No; as we mentioned above, if you use an unregistered URI scheme, you don't have any guarantees that somebody else isn't using the same URI to mean something else in the same protocol message.
[@@elaborate with more of an example?]
This very typical example is worth some elaboration...
You ask your browser (or other user agent) to visit http://example.org/aPath/myDoc; that's called the request URI in the HTTP specification. Your user agent looks up example.org in DNS and gets an IP address back, makes a TCP connection, and sends a request. If all goes well, you get an HTTP 200 (OK) response back, containing a document: the media type is indicated by the Content-Type header field, and the byte sequence is in the body of the HTTP response.
Provided the HTTP transaction is a valid use of the request URI, it's clear what resource it identifies. This transaction represents this resource as the document in the response. To make (valid) use of a URI in order to represent the resource it identifies by a document is to dereference the URI.
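Here is a rough sketch of the last step of that transaction: pulling the document (media type plus byte sequence) out of a 200 response. The Document class, the document_from_response helper, and the sample response bytes are all made up for illustration; a real user agent would also handle transfer encodings, redirects, and so on.

```python
from dataclasses import dataclass
from email.parser import BytesParser


@dataclass(frozen=True)
class Document:
    """A document in this FAQ's sense: a media type paired with bytes."""
    media_type: str
    body: bytes


def document_from_response(raw: bytes) -> Document:
    """Extract the document carried by a raw HTTP 200 response (hypothetical helper)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, _, header_block = head.partition(b"\r\n")
    status = int(status_line.split()[1].decode("ascii"))
    if status != 200:
        raise ValueError(f"no document in a {status} response")
    # HTTP header fields share the RFC 822 syntax the email parser understands.
    headers = BytesParser().parsebytes(header_block + b"\r\n\r\n", headersonly=True)
    return Document(headers.get_content_type(), body)


# Made-up response bytes standing in for what the server sent back.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"\r\n"
       b"<html>hello</html>")
doc = document_from_response(raw)
print(doc.media_type, doc.body)
```

Because Document compares by value, revising the bytes (or the media type) yields a different document, which matches the way the term is reserved above.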
@@idea: We might try introducing formal notation at this point...
@prefix : <#>.
@prefix mediaTypeText: <...@@somewhere in IANA land...>.
@prefix HTTP: <...@@some specification of HTTP...>.
:req23 a HTTP:Transaction.
<http://example.org/aPath/myDoc> :req23 (mediaTypeText:html "<html ... ").
end idea.
This merits elaboration as well...
Suppose your user agent dereferenced http://example.org/aPath/myDoc and got a reply, dated 15:41, including some document. You browse around a bit this way, and not much later you follow another link to that same address. If your user agent is clever (and quality HTTP user agents are...) it will include an If-Modified-Since: ... 15:41 header field in this second request for myDoc, since it already has a reply in cache. Suppose the server knows that the resource hasn't changed (using local filesystem metadata); then the server will reply 304 Not Modified. This second transaction represents the resource identified by http://example.org/aPath/myDoc by the same document as the first.
If you like, you can look at the two transactions as one use of that identifier.
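The server's side of those two transactions can be sketched as a toy function. Everything here is an assumption for illustration: the respond helper, the 12:00 modification time, and the body bytes; only the 15:41 reply time and the 200/304 outcomes come from the story above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str] = None) -> Tuple[int, bytes]:
    """Toy origin server: return (status code, response body)."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            # Unchanged since the cached reply: no new document is sent.
            return 304, b""
    return 200, body


# Assumed times: the file last changed at 12:00; the first reply is dated 15:41.
modified = datetime(2002, 7, 22, 12, 0, tzinfo=timezone.utc)
first_reply_date = datetime(2002, 7, 22, 15, 41, tzinfo=timezone.utc)
body = b"<html>myDoc</html>"

first = respond(modified, body)
second = respond(modified, body,
                 if_modified_since=format_datetime(first_reply_date, usegmt=True))
print(first[0], second[0])
```

On the second request the user agent reuses the document it cached from the first, which is why the two transactions can be read as one use of the identifier.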
Likewise, most FTP RETR transactions are valid uses of URIs that represent resources with documents.
[@@LIST almost represents a resource as a document; often, a proxy makes up an HTML document that represents the directory resource, and we trust that this is valid.]
Sometimes.
In many cases, the relationship between URIs, resources, and documents corresponds to the relationship between filenames, files, and file contents: each filename identifies one file. One file might be known by several filenames (think of shortcuts/symlinks/aliases). The file can have different contents at different times, but if you make a copy of the file's contents, you'll get just one thing at any time.
FTP and HTTP are designed to exploit this analogy, to make it easy to export filesystems into the Web.
But take care not to overgeneralize.
No; for example there are at least 2 files, one called w3c_home.gif and one called w3c_home.png, used in dereferencing http://www.w3.org/Icons/w3c_home. Format negotiation is a technique that allows for graceful evolution of data formats (aka Internet media types) in the Web. HTTP has specific support for it; see section @@ of the HTTP specification for details.
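A very rough sketch of how a server might pick between those two files based on the Accept header field follows. The choose_variant function and its tie-breaking are invented for illustration; this is not how the W3C server is actually configured, and real negotiation has more rules than this.

```python
# The two files named above, keyed by their Internet media types.
VARIANTS = {"image/png": "w3c_home.png", "image/gif": "w3c_home.gif"}


def choose_variant(accept, variants=VARIANTS):
    """Return the file for the most preferred acceptable media type, or None."""
    prefs = []
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((q, media_type))
    # Highest q first; the sort is stable, so ties keep header order.
    prefs.sort(key=lambda pref: pref[0], reverse=True)
    for q, media_type in prefs:
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in variants:
            return variants[media_type]
        if media_type == "*/*":
            return next(iter(variants.values()))
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "image/"
            for candidate, filename in variants.items():
                if candidate.startswith(prefix):
                    return filename
    return None


print(choose_variant("image/png;q=0.9, image/gif;q=0.5"))
print(choose_variant("image/gif"))
```

Either way, the request URI is the same; only the document used to represent the resource differs.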
file:/etc/hosts is ambiguous, isn't it?
Each valid use of file:/etc/hosts is unambiguous. It's valid to use file: URIs within one machine. And it may be valid to use file: URIs to refer to well-known files such as /etc/hosts, if you're confident all the readers are using Unix systems.
But if you write
See <a href="file:/home/user12/niftyStuff.html">my new nifty stuff</a>.
in an HTML document and publish that document on the public Internet, and somebody using a different machine reads it, they won't be able to dereference it; that's not a valid use of that URI. Their user agent may not detect the failure; it may find a file under that pathname and display it; but since this use of that URI is invalid, the document they see is likely irrelevant to what you meant.
No; in each use, it refers to "the Yahoo personalized content for the reader", whoever the reader is.
@@hmm... think more about this one; plusses and minuses...
../myFile is ambiguous, isn't it?
Again, each valid use of ../myFile is unambiguous; use of ../myFile with a base URI of http://example/dirA/this/stuff may refer to a different resource than use of ../myFile with a base URI of http://example/dirB/that/stuff; in the first case, its absolute form is http://example/dirA/myFile, but in the second case, its absolute form is http://example/dirB/myFile. Clearly these need not identify the same resource.
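The two absolute forms can be checked with a short sketch, again assuming Python's urljoin as a stand-in for the resolution algorithm:

```python
from urllib.parse import urljoin

reference = "../myFile"

# The same relative reference, used in two different base-URI contexts.
from_a = urljoin("http://example/dirA/this/stuff", reference)
from_b = urljoin("http://example/dirB/that/stuff", reference)

print(from_a)
print(from_b)
print(from_a == from_b)
```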
Sometimes.
The relationship between URIs, resources, and documents is like the relationship between C pointers, memory cells, and values. The analogy with C++ or Java is even stronger: object references, objects, and values returned from method calls; in particular, the toString() or writeObject() method from Serializable.
But again, take care not to overgeneralize: while every Java object exports an equals() method, most resources do not.
[@@something about scale/scope: same object reference might point to different objects in different runs of a program; one run of a program is like one use of a URI; ideally, the whole web is one use, i.e. one run of a program.]
Not necessarily. If you consider every real number a resource, then clearly we can't give every real number its own URI without collisions: URIs are finite strings over a finite alphabet, so there are only denumerably many of them, while the real numbers are not denumerable. (@@cite some explanation of Cantor's argument or whatever for elaboration)
[@@see also: RDF bNodes stuff.]
Does http://WWW.EXAMPLE/ identify the same resource as http://www.example/?
Yes, the HTTP specification [@@section] says that in any valid use of http://WWW.EXAMPLE/ and http://www.example/, they identify the same resource.
But don't count on consumers realizing that. Be consistent about how you write them if you expect consumers to realize you mean the same resource. [@@elaborate? talk about canonical forms?]
Does http://example/BIGWORD identify the same resource as http://example/bigword?
Not necessarily.
@@more
@@usage note: don't be silly enough to depend on case as the only distinction between your URIs. It's too fragile.
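As a sketch of the two case questions above: a consumer that wants to compare http URIs might lowercase the scheme and host but leave the path alone. The normalize_case helper is hypothetical, and it is only a partial normalization; it ignores default ports, percent-encoding, and so on.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lowercase the scheme and host of a URI; keep the path's case as written."""
    parts = urlsplit(uri)
    # Note: this lowercases the whole authority, including any userinfo,
    # which is a simplification that is fine for plain host names.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))


# Host case does not matter...
print(normalize_case("http://WWW.EXAMPLE/") == normalize_case("http://www.example/"))
# ...but path case may, so the two stay distinct after normalization.
print(normalize_case("http://example/BIGWORD") == normalize_case("http://example/bigword"))
```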
While each use of a URI reference is unambiguous, different uses of the same absolute URI reference may identify different resources. These situations are often either obscure or costly or both.
Typically, if I browse the web and I visit http://zoo.example/animals/tigerBob and I read a document about Bob, a tiger at the zoo, I like to think that I can make a link to that address in a document I publish, and when readers follow links from my documents to that address, they'll get a document about that same Bob. This is where the Web derives much of its value: using short strings to share resources. I like to consider my visit to that address and my reader's visit to be in the same naming context. I don't mind if somebody revises tigerBob and my reader gets a slightly different document; but if they get a document about a tropical storm or something else totally irrelevant to the document I found when I was browsing, the Web hasn't been of much use.
Recall the discussion of HTTP 304 Not Modified transactions; the HTTP protocol shows that the two transactions are causally related: the server knew about the first transaction when servicing the second. This is reasonably clear evidence [@@can we get rid of these weasel-words? Need to think about expires etc.] that the two transactions are one use of the URI http://example.org/aPath/myDoc. Though nothing in the HTTP header fields says so, my link to tigerBob was causally related to the response I got from zoo.example; the Referer header field in my reader's request makes it clear that his/her request is causally related to these events as well. It's easiest if we can just look at the whole Web as one use of all the URIs. [@@getting sloppy now.]
If http://acme.example/pricelist was used to get a pricelist about ACME software's prices, then ACME software sold the domain to ACME pets, and it was then used to get a list of pet prices, then that same URI refers to different resources in those two uses.
The use of this address by ACME pets interferes with anyone who wants to continue to use it to refer to ACME software's prices: historians, court archivists, news archive services, etc.
This is clearly sub-optimal.
@@elaborate: W3C gets a new member; it's a new W3C, in some uses
@@elaborate: conneg across media types, where the fragments don't refer to anything compatible
Yes and No.
@@
From http://www.ietf.org/rfc/rfc2616.txt (top of page 9): An HTTP URI denotes a "network data object or service [which] may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways." There's some weasel room in their, but not to include cars.
-- sandro in RDFIG. @@
TimBL's axiom about doc:work: { [] log:uri [string:startsWith "http:"] ; log:racine ?x} => { ?x a doc:Work}.
TimBL's conjecture: doc:Work ont:disjointFrom commonKnowledge:Car.
and...
I have two things with identity, so need two URIs. http://www.markbaker.ca/index.html identifies my HTML "web page"
-- MarkB
What happens when you set conneg up to serve text/html and application/rdf+xml from the same address? What does /myfavoritemovie#title identify then? An element, or a movie?
sbp 19 Jul 2002 20:56:49 +0100
@@did Roy give pointers?
pointing to elements vs. pointing to things.
Things versus their names, use/mention, map/territory. In the abstract/intro somewhere?
tel: URIs can have the same sort of validity failures as file: URIs... see Dierken 22Jul to www-tag. In 22Jun, RF says news: URIs do that too. hmm...
ISBN: copy of book vs. intellectual content of book
manifestation vs work
DesignIssues/Generic
use patents/copyrights as an example?
persistence guarantees: URN NIDs and domain names.
how many members in W3C? time.
why separate schemes for mid:stuff@domain and news:stuff@domain ? need new URI scheme for iCalendar items?
why do we use absolute naming in Web Architecture? surely relative naming is less constraining, no?