OverloadedUri

From W3C Wiki


An "Overloaded URI" is a URI that people have given more than one meaning.

For example, think of the URI for your home page. If someone puts that URI into a web browser, they'll see your web page. But suppose now that your friend wants to write an RDF document about you. Since RDF nodes generally refer to subjects, and subjects are identified by URIs, your friend may well use your home page's URI to talk about you, the person, and not just your home page.

If your friend does this, your friend is "Overloading the URI" for your home page. That is, they've attached an additional meaning ("this address represents you, the person") to the old one ("this address is the location of your home page").

There is controversy about whether this is good or bad. Some arguments and examples appear below.


Examples

W3C Consortium Example

Consider the following triple:


<http://www.w3.org/Consortium> x:members 412.


This would likely mean, "The W3C consortium has members, 412 of them."

But this overloads the URI of the web page http://www.w3.org/Consortium .

Suppose we want to talk about the page itself?


<http://www.w3.org/Consortium> y:lastModified "Fri, 24 Oct 2003 16:41:28 GMT".


Now it appears that the W3C consortium, previously an organization with members, is actually a web page.

This is an ambiguity that computers could have a tough time resolving.
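
The clash can be made concrete with a minimal Python sketch. The property-to-class table is an assumption made for illustration (no schema for x:members or y:lastModified is given here); it stands in for RDFS-style domain declarations.

```python
# Hypothetical domain declarations: the class of thing each property describes.
DOMAINS = {
    "x:members": "Organization",
    "y:lastModified": "WebPage",
}

# The two triples from the example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://www.w3.org/Consortium", "x:members", 412),
    ("http://www.w3.org/Consortium", "y:lastModified",
     "Fri, 24 Oct 2003 16:41:28 GMT"),
]


def inferred_types(triples, domains):
    """Map each subject URI to the set of classes implied by its properties."""
    types = {}
    for subject, predicate, _obj in triples:
        if predicate in domains:
            types.setdefault(subject, set()).add(domains[predicate])
    return types


types = inferred_types(TRIPLES, DOMAINS)
# A subject that ends up with more than one class is a candidate overloaded URI.
overloaded = sorted(s for s, classes in types.items() if len(classes) > 1)
print(overloaded)
```

Under these assumed domains, the one subject is inferred to be both an organization and a web page, which is exactly the confusion described above.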

Melville Example

   <http://www.melville.org/hmmoby.htm> dc:author "Herman Melville".


"Melville wrote Moby Dick."


   <http://www.melville.org/hmmoby.htm> dc:author "Jim Madden".


"Jim Madden is the author of this web page about Moby Dick."

Common Arguments

Subjects Should be Browsable

Pro:

 "OverloadedUris are useful because you can type RDF subjects into a web browser, and see what the subject is all about in a web page."

That is, looking through RDF documents, it can be easy to lose track of what things are. The argument is that by overloading web page URIs, we can easily see what is what, just by looking at the web page.

Con:

 It is good to be able to just type the subject URIs into a web browser.
 However, instead of overloading URIs, which confuses web pages with what the pages talk about, you can use a "DualUseUri."
 That is, have a URI that your web server redirects to another web page. For example, you'd have a URI representing you, the person. Maybe it's "http://my.server.net/person/yourname/". If someone went to it with a web browser, it would redirect to your actual home page, "http://my.server.net/webpages/yourname/".
 That way, if you want to talk about your web page itself, there is no conflict, since your web page has a different URI than you, the person.
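
A minimal sketch of the redirect rule behind a DualUseUri, in Python. The path layout comes from the example above; the function name and the choice of status code 302 are assumptions, since the text only says the server "redirects".

```python
# Path layout from the example: /person/<name>/ names the person,
# /webpages/<name>/ is the actual home page.
PERSON_PREFIX = "/person/"
PAGE_PREFIX = "/webpages/"


def respond(path):
    """Return (status, location) for a request path.

    Person URIs redirect to the matching home page; page URIs are served
    directly. The 302 status code is an assumption for illustration.
    """
    if path.startswith(PERSON_PREFIX):
        name = path[len(PERSON_PREFIX):].strip("/")
        if name:
            return 302, f"{PAGE_PREFIX}{name}/"
    elif path.startswith(PAGE_PREFIX):
        return 200, path
    return 404, None


print(respond("/person/yourname/"))
print(respond("/webpages/yourname/"))
```

The person and the page keep separate URIs, so a statement about one is never mistaken for a statement about the other, yet both can still be typed into a browser.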

?

(I have difficulty following the arguments here, so I'll just leave the text as it is.)

Pro:

  "OverloadedUri is very useful.  It's nice to use already-known URIs 
  in RDF to name the associated concepts."
  Response: To re-use the name with some different meaning
  is another anti-pattern, BadNameReuse.   
  There are other ways of achieving
  the same goal, like using
  foaf:homepage.

(My questions are: Is the assertion that "It's useful to use URIs that people already know?" Is the response that, "If you reuse the name, the URI means less?" I would just reassert the general complaint about overloaded URIs: it's still the case that you have ambiguity between talking about the page and talking about the thing the page denotes. I don't understand how talking about foaf:homepage is a response to "It's useful to use URIs that people already know.")

Overloaded URIs: Distinguished by Context

Pro:

 "OverloadedUri causes no harm. When you see y:lastModified, you know that you're talking about web pages. When you see x:member, you know you're talking about groups of people. The machinery can figure out what subject URIs represent, by the context."

(I have difficulty understanding the Con response, so I'll just paste it here verbatim:)

  What if you have a web page about web pages?   More precisely,
  whatever your ontology for talking about how browsers use URIs, 
  you'll need to use URIs both in the direct mode and the indirect
  mode; if the indirectly identified thing is of the same class as
  the directly identified thing, type inference can't help you
  figure out which is which.
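
One possible reading of this objection, as a Python sketch. Everything in it is an assumption made for illustration: the "direct"/"indirect" labels, the class names, the domain table, and the example.org URI are not from any real vocabulary.

```python
# Each URI has a "direct" referent (the page it retrieves) and an "indirect"
# referent (the thing that page is about), each with a class.
REFERENTS = {
    "http://www.w3.org/Consortium": {
        "direct": "WebPage",
        "indirect": "Organization",
    },
    # Hypothetical page whose topic is itself a web page.
    "http://example.org/page-about-a-page": {
        "direct": "WebPage",
        "indirect": "WebPage",
    },
}

# Hypothetical domain declarations for the properties used in the examples.
DOMAINS = {"x:members": "Organization", "y:lastModified": "WebPage"}


def resolve_mode(uri, predicate):
    """Return "direct" or "indirect" if the predicate's domain picks out
    exactly one referent of the URI, else None (ambiguous or no match)."""
    wanted = DOMAINS[predicate]
    modes = [mode for mode, cls in REFERENTS[uri].items() if cls == wanted]
    return modes[0] if len(modes) == 1 else None


print(resolve_mode("http://www.w3.org/Consortium", "y:lastModified"))
print(resolve_mode("http://example.org/page-about-a-page", "y:lastModified"))
```

Context works for the consortium URI because its two referents fall in different classes. For the page about a page, both referents are web pages, so the property's domain cannot say which one was meant.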

You can Overload when Context helps you Distinguish

Pro:

  This don't-overload mantra is too simple. There is a better, more nuanced, rule, which all naturally evolving languages have adopted: overload only when the context enables a clear differentiation of the intended meaning. In practice, this means it's OK when the overloaded meanings are almost impossible to confuse with one another in almost any context, yet are systematically related (by a coercion function). So the "bad" example involving authorship (of book or web page about the book) is indeed bad, because there are contexts of use which provide no way to disambiguate.  But overloading mailboxes and people is actually quite stable and works quite well, because it is *obvious* that you email to an email address, not to a human being, and it is *obvious* that email boxes aren't, say, mammals or have children, in all contexts; and email addresses/people is close enough to being 1:1 that the marginal cases don't matter most of the time. So it works.
  This is important because the simplistic mantra doesn't work. It leads to concept-name-bloat, because as soon as you have the slightest possibility of conceptual distinctions being made (people as agents vs. people as patients vs. people as mammals vs....) then you suddenly need to split names into syntactic categories, and the old names no longer work properly. You thought that a person was a person, end of story: but your doctor and your lawyer and the IRS all make finer conceptual distinctions which aren't even aligned with each other, so these person-URIs start getting made fractal by other people's distinctions.
  (above text pasted by SandroHawke but written by PatHayes.  After mulling on it for some weeks, Sandro is becoming convinced.)

Page Notes

This page was originally WhenBrowsableAndUnambiguousCollide. It was rewritten here in terms of being an anti-pattern. LionKimbro rewrote the page for his own understanding, and neutralized it (to a degree) because there was still controversy, not WikiConsensus.

Discussion

I've rewritten the page. I found the original a bit hard to follow as an outsider. I've rewritten it, capturing what I believe are the main arguments being made. I hope I'm not misrepresenting anybody. I think the new descriptions are clearer for the outsider to the arguments.

-- LionKimbro DateTime(2004-06-08T12:19:14)

I still have difficulty with "You can Overload when Context helps you Distinguish."

The prescription seems to be:

  • Overload when context makes it possible to differentiate meaning.

The argument seems to be:

  • If you can differentiate by context, then everything's okay.
  • The bad examples given (the Melville example) are places where it's hard to differentiate meaning.
  • But in places where it's easy to differentiate meaning, we don't have a problem. You only send e-mail to e-mail addresses, not to human beings. And e-mail addresses almost always go to a single human being, so you can overload e-mail addresses to refer to the person who receives the e-mail.

Arguments against not overloading URIs:

  • If you don't overload URIs, then you suddenly have to name lots and lots of things. You end up with lots and lots of names. If you can fold the extra name into a single URI, and then differentiate contexts, then you have a much easier time of things.
  • For example, your doctor, lawyer, and the IRS all mean very different things when talking about your person. Should they all identify you differently? If they do, the person-URIs multiply.

I wonder:

  • What is a CoercionFunction? Is that something that takes a URL plus a context, and assigns it a new, unique, identifier? (That is, an identifier that accounts for URL+context?) Where "Coercion Function" came from: "In practice, this means it's OK when the overloaded meanings are almost impossible to confuse with one another in almost any context, yet are systematically related (by a coercion function)."
  • Does our modern RDF software make it easy to differentiate meaning by context? If it's theoretically possible, but the software does not yet make it practically possible, then that seems like something worth noting.
  • Is SandroHawke, transcribing for PatHayes, becoming convinced of the need for OverloadedUris, or is Sandro becoming convinced that we should not overload URIs?

-- LionKimbro DateTime(2004-06-08T12:19:14)

I myself am coming to believe that URIs should not be overloaded.

It seems right and proper to me that the IRS, the doctor, etc., etc., should all use separate URIs to talk about different aspects of me. And there's nothing saying that they can't link to my homepage, or to my FOAF file.

And by DualUseUri, they would come to very different, and context-appropriate, pages when they looked up the subject. The doctor would be referred to an HTML rendition of my medical chart. The IRS would be referred to their internal notes on my doubtless spotless record. Or whatever else would be appropriate, given the situation.

-- LionKimbro DateTime(2004-06-08T12:19:14)

qmacro said that we should identify people like so:


<foaf:Person>
  <foaf:mbox rdf:resource="mailto:so-and-so@such-and-such.com"/>
  ...
</foaf:Person>


That is, we don't have to make a URI for the person. We can just side-step the issue entirely, and instead talk about the person that is identified by such-and-such...

This invokes the InverseFunctionalProperty. For FOAF, they are listed on Foaf:UniqueIdentifiers.

-- LionKimbro DateTime(2004-06-09T17:37:58)
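
A minimal sketch of how an InverseFunctionalProperty lets software identify a person without a person-URI: two anonymous descriptions that share a foaf:mbox value are taken to describe the same person and are merged. The function name, the dict representation, and the placeholder names and addresses are assumptions for illustration, not part of FOAF or any RDF library.

```python
# Each description is an anonymous node: a dict of property -> value.
# All values below are placeholders, not real data.
descriptions = [
    {"foaf:mbox": "mailto:so-and-so@such-and-such.com",
     "foaf:name": "So-and-so"},
    {"foaf:mbox": "mailto:so-and-so@such-and-such.com",
     "foaf:homepage": "http://my.server.net/webpages/yourname/"},
    {"foaf:mbox": "mailto:someone-else@such-and-such.com",
     "foaf:name": "Someone Else"},
]


def merge_by_ifp(nodes, key="foaf:mbox"):
    """Merge nodes that share a value for an inverse functional property.

    Assumes every node carries the key; later nodes overwrite earlier
    values for any property they have in common.
    """
    merged = {}
    for node in nodes:
        merged.setdefault(node[key], {}).update(node)
    return list(merged.values())


people = merge_by_ifp(descriptions)
print(len(people))
```

The first two descriptions collapse into one person with both a name and a home page, and at no point was a URI minted for the person.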