HR14aCompromise

From W3C Wiki

See also TagIssue57Home

This proposal (see TagIssue57Responses) is designed to make everyone equally happy and everyone equally unhappy.

Work in progress - still working out the details.

JAR's work on this proposal is suspended on the basis of an assessment that a prohibition on sameAs will be rejected out of hand by the "no longer implies" proposal signers.

Synopsis

There is a weakly specified relationship between what a URI refers to and the generic resource at that URI. This lets URIs be used in different ways in different situations. The weak constraint on the relationship is strong enough to get predictable behavior when one wants to be clear, by talking about the "landing page", while leaving the URI itself free for other uses. When one is not clear, human judgment may be required to interpret the URI.

This is really an approach to creating proposals for interoperable use of hashless http: URIs, not a proposal in itself. It provides constraints without particular guidance. See the bottom of the page for thoughts on how to develop.

Generic resources

Assume anything like TimBL's or JAR's or Pat's theory of what is accessed using a URI. If accepting this premise holds you up please let me know and I can tighten it up (at least in the RDF case). To avoid the AWWW baggage I'm not calling them "information resources" but it may end up that that's a better term. Maybe "time-varying information" or "variable information".

For each retrieval-enabled hashless http: URI (REHU) assume there is a generic resource, written functionally at the meta-level GR(U) where U is a REHU, whose representations/instances/encodings are the representations retrieved using that URI. I believe this is a safe assumption. There is no assumption that GR(U) is identified either by U or by any other URI.

The resource to landing page map

We introduce a new URI w:landingPage whose meaning is mostly uninterpreted (left up to human judgment) except for the following:

Call an RDF interpretation (see RDF Semantics) "conforming" (to this proposal) if it satisfies the following constraints:

  1. For each REHU U that is interpreted, w:landingPage relates the interpretation of U to GR(U) (which can probably be made rigorous, let me know if this catches you up)
  2. w:landingPage as a property is functional, i.e. every resource identified in this way has a unique associated landing page (possibly but not necessarily itself). If you know what the URI refers to you can recover GR(U).

The interpretation can be further constrained by statements in the graph as per the usual RDF (or OWL) semantics. To the extent it is unconstrained, human judgment may be required for correct interpretation.

If w:landingPage were not functional, there would be no way to relate resources to particular landing pages. E.g. if Chicago had two landing pages you would have to use a string-valued property like contentUri to refer to the landing pages; you could not relate a particular landing page unambiguously to Chicago.
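As a rough illustration only, the two constraints can be checked against a toy finite interpretation. In this Python sketch the names is_conforming, interp, gr and landing_page_ext are hypothetical and not part of the proposal, GR(U) is modeled as a plain lookup table, and domain elements are plain strings.

```python
def is_conforming(interp, gr, landing_page_ext):
    """Check a toy interpretation against the two constraints.

    interp: dict mapping each interpreted REHU (a string) to a domain element.
    gr: dict mapping each REHU to its generic resource GR(U).
    landing_page_ext: set of (x, y) pairs, the extension of w:landingPage.
    """
    # Constraint 1: w:landingPage relates the interpretation of U to GR(U).
    for uri, referent in interp.items():
        if (referent, gr[uri]) not in landing_page_ext:
            return False
    # Constraint 2: the property is functional (at most one y for each x).
    seen = {}
    for x, y in landing_page_ext:
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical example: one URI refers to a city, the other to its own page.
gr = {
    "http://example.org/chicago": "page-1",
    "http://example.org/doc": "page-2",
}
interp = {
    "http://example.org/chicago": "Chicago",
    "http://example.org/doc": "page-2",
}
ext = {("Chicago", "page-1"), ("page-2", "page-2")}
print(is_conforming(interp, gr, ext))
```

Adding a second pair such as ("Chicago", "page-2") to ext makes the check fail, which is the Chicago-with-two-landing-pages situation described above.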

We may want an additional constraint on interpretations that RDF URI references that are provably equivalent according to the HTTP and URI specs must be interpreted the same, but this is a detail.

Proposal 25 suggests a Document: link header expressing the same relationship. The Content-Location: header has also been suggested as a way to express this, but JAR opposes this, since if GET U yields Content-Location: Q then GR(Q) is not necessarily equal to GR(U), and what we need is a way to refer to GR(U), not GR(Q).

Although this is very similar to 'describedby' in the 'no longer implies' proposal, 'describedby' has the wrong name and semantics for the relation since not every landing page would describe anything at all, much less specifically what its URI would be known to refer to. It's hard to imagine what the landing page of a non-describing GR would be, other than itself. This situation is very different from 303, where one can generally assume you get a useable description.

The choice of URI is up for grabs. w:landingPage is just a placeholder and the namespace w: is TBD. It is inappropriate in that in common use the landing page for X is usually distinct from X, whereas here a GR can be its own landing page. Other names I considered: genericResource, doppelganger.

HTTP consistency

The HTTP protocol talks about the representations of the identified resource. These are the 200 responses to GET requests. The idea of this proposal is that the representations of the landing page are the same as the representations of the identified resource, even when they are different resources. The identified resource has no representations that are not representations of the landing page. It is through this gimmick that consistent identity with HTTP is maintained.

If w:landingPage were not functional, the identified resource could have representations coming from one URI that were not representations coming from a different URI, and trying to reflect identities back down from the RDF level to the HTTP level would result in incorrect answers to the question "is Z a representation of the identified resource".

HTTP sometimes talks about other properties of the identified resource, such as where it is located. These properties also have to be consistent if a URI is to identify "uniformly" i.e. to have the same identities and nonidentities at both the HTTP level and the RDF level. See the HTTP consistency use case.

Discovery

This relates to the question "the landing page may describe zero, one, or many things, so which of the many things described, or other things that happen to be at hand such as the landing page itself, is meant?". This is already underspecified by the proposal, and the point is to make the mapping a matter of interpretation influenced by representation provider choice and evolving community ideas. But it is desirable to posit a consistent mapping from landing page to thing so that there is a feeling that some rule is being followed permitting discovery of identity from retrieved content (the discovery principle).

Unfortunately under most proposals the same generic resource might have to serve as the landing page for two or more resources, depending on, say, which URI it came from. That is, to implement distinct URIs for the various things described (e.g. the statement '<U> describedby <V>.' describes both <U> and <V>), set up several URIs as HTTP level aliases for the same GR, and distinguish their referents based on what that GR says differentially using the various URIs.

It may be necessary to say that the URI is the only other thing the "identification" mapping is supposed to depend on. But expressing this as a constraint on RDF interpretations doesn't work, since by the time we talk about the property, the URIs have been lost and all we have is domain elements. We would have to preserve the URI itself as a property of the landing page (removing the possibility of aliases between resources), which would be very ugly.

We will consider this disambiguation to be in general a lost cause, but see below.

The following may be the tightest constraint we can get away with:

  1. For an interpretation to be conforming, whenever x and y are related by the w:landingPage property, either x=y or x is related to y via wdrs:describedby.

with the observation that nothing requires wdrs:describedby to be functional.
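A minimal sketch of this constraint over a toy finite interpretation, in Python. The function name and the sample data are hypothetical; both properties are modeled simply as sets of pairs of domain elements.

```python
def satisfies_discovery_constraint(landing_page_ext, described_by_ext):
    """True if every w:landingPage pair (x, y) has x == y
    or is also a wdrs:describedby pair."""
    return all(
        x == y or (x, y) in described_by_ext
        for x, y in landing_page_ext
    )


# Hypothetical data: Chicago is described by two pages (describedby is
# not functional), but only one of them is its landing page.
landing = {("Chicago", "page-1"), ("page-2", "page-2")}
described = {("Chicago", "page-1"), ("Chicago", "page-9")}
print(satisfies_discovery_constraint(landing, described))
```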

Reference to content use case

This is the case where you're writing a message or document and you want to refer to GR(U) for some REHU U.

Alternatives:

  1. Use the URI and hope that whoever's interpreting your graph also assumes <U> w:landingPage <U>
    1. This will be fairly obvious if the representations at U contain this statement
    2. It should also be clear (to human judgment) if the representations don't seem to describe anything in particular
    3. It should be clear if the representations are evidently self-describing
    4. It should be clear if the representations are equivalent to serializations of RDF graphs and the graphs don't contain <U>
    5. A Link: header could provide this information
  2. Use the URI and explicitly say <U> w:landingPage <U> in the same message or document, then hope that others will not use the URI differently
    1. This is a good bet if you're the one controlling representations at the URI and can insert a declaration
    2. Pretty good bet if there is no other reasonable interpretation (see below)
  3. Use a local hash URI or tag: URI and define it, e.g. <#doc> where <#doc> w:landingPage <U>
  4. Use blank node notation similarly [w:landingPage <U>]

Reference to something described use case

This is the case where you want to refer to something described by the representations retrieved from U.

If the representation might describe any of several different things this gets a bit dodgy, but let's assume what's described is clear. There are two common modes of description: either the representations use the URI to refer to something (i.e. they describe <U> by incorporating statements whose subject is written <U>), or there is some obvious primary topic.

  1. Use the URI and hope the intent will be clear
    1. It will be clear if representations contain an explicit declaration such as the below
    2. It will be fairly obvious if representations contain lots of statements whose subject is <U>
    3. Maybe it will be clear in cases like Amazon page URIs referring to books (depending on who you talk to)
    4. The representation provider could help make it clear using a Link: header per below
  2. Use the URI and explicitly say <U> wdrs:describedby [w:landingPage <U>] (with local hash URI or tag: URI as alternatives to blank node), or similar with foaf:primaryTopic (but be careful if multiple things seem to be described, or if primary topic could be more than one thing)
  3. If that's annoyingly long let's define a new property that's the composition of the two (this property could be used in Link: too)
  4. Use the URI and explicitly say <U> wdrs:describedby <V> where V is covered by these considerations
  5. Use the URI and explicitly say <U> wdrs:describedby <V> where <V> w:landingPage <V> seems likely based on its being on the right-hand side of a wdrs:describedby statement

It would be nice if <V> w:landingPage <V> were a consequence of <U> wdrs:describedby <V> as in the "no longer implies" change proposal, but I'm afraid it doesn't logically follow; it's only highly likely. We could try to retroactively shoehorn this into the semantics of wdrs:describedby, or (better) we could define a new property that entails it.

Reference to self-describing content

  1. Use the URI and hope it will be clear (since you will probably get the right answer under either of the two plausible interpretations, although it gets dodgy with, say, Moby Dick)
  2. Use the URI and explicitly say either, or ideally both, of <U> wdrs:describedby <U> and <U> w:landingPage <U>.

Consequences for linked data

<U> rdf:type :Earthquake now becomes plausible even when U is a REHU.

To implement a functional w:landingPage one must be prepared to map from resources (however they are modeled) back to a unique landing page. This is usually pragmatically easy since you know the URI (you are using it to refer to the resource) and you can recover GR(U) from the URI. But if a resource is named by two REHUs, it will not be possible to decide which landing page to pick, since the URI has been lost by that point.

Since the relation is functional, it becomes incorrect to say (or imply) <U> owl:sameAs <V> if U and V are both REHUs and GR(U) ≠ GR(V). This requirement would be met by a UNA (unique name assumption) but is slightly weaker.
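For illustration, a checker for this restriction might look like the following Python sketch. The function name and the example URIs are hypothetical, and GR(U) is again modeled as a lookup table in which only REHUs appear as keys.

```python
def conflicting_same_as(same_as_pairs, gr):
    """Return the owl:sameAs pairs that conflict with a functional
    w:landingPage.

    same_as_pairs: iterable of (U, V) URI pairs asserted or implied
        to be owl:sameAs.
    gr: dict mapping each REHU to an identifier for GR(U); URIs absent
        from the dict are treated as non-REHUs and are never flagged.
    """
    return [
        (u, v)
        for u, v in same_as_pairs
        if u in gr and v in gr and gr[u] != gr[v]
    ]


gr = {
    "http://example.org/a": "page-1",
    "http://example.org/b": "page-2",
    "http://example.org/a-alias": "page-1",  # HTTP-level alias of /a
}
pairs = [
    ("http://example.org/a", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/a-alias"),
    ("http://example.org/a", "http://example.org/thing#it"),  # hash URI
]
print(conflicting_same_as(pairs, gr))
```

Only the first pair is flagged: the alias shares a generic resource with /a, and the hash URI is not a REHU, which is why the requirement is weaker than a full UNA.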

If e.g. <U> rdf:type :Earthquake and <V> rdf:type :Earthquake and <U> owl:sameAs <V>, then either the content has to be equivalent, or members of class :Earthquake need to be interpreted not exactly as earthquakes but as "views" or "profiles" on earthquakes, or earthquakes "according to" some authority. People engage in this kind of substitution (metonymy) all the time without getting confused, and in the case where machine inference is desired it is possible as an option to be clear and avoid such confusions.

One way to get aliasing (sameAs) might be to use a 302 or 307 redirect.

This limitation on owl:sameAs is the price you are asked to accept, in exchange for the benefit of being able to use 200 instead of 303. Of course if you wanted to you could still write owl:sameAs where it is incorrect in order to achieve some end; but it would be wrong, i.e. nonconforming. Maybe this is OK since most uses of owl:sameAs in RDF are already nonconforming to the OWL spec.

Comparison with "no longer implies"

NLI doesn't require describedby to be functional, i.e. it allows multiple landing pages for the same thing. So a blank node that's a describedby target can't be used to designate a particular landing page. NLI just says that the target of a describedby is an information resource. The implication is that you can use the IR's URI to refer to the IR, but this conclusion, while sort of reasonable, doesn't logically follow (and seems to be contradicted by the existence of IRs that are not their own landing pages, such as the Flickr example, or Crossref DOIs as they used to be served, with 302).

The present proposal improves on NLI:

  • by allowing determination of a particular landing page for a URI (as opposed to an ambiguous reference)
  • by giving a way to reliably refer to documents that don't describe anything (which obviously can't be done using describedby).

[w:landingPage <U>] is similar to [w:contentUri "U"] from some of the other proposals.

Possible mitigation

We could standardize a new equivalence relation (property) that is similar to owl:sameAs but considers equivalent things that are the same other than that they're described on different web pages. Then those currently using owl:sameAs could just switch to the new property, inference engines could be retooled to effectively alias the two (which ought to be easy), and we're done. In fact many uses of sameAs are logically incorrect and switching from sameAs to a new predicate could help protect data from more aggressive inference engines.
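The switch could be as mechanical as the following Python sketch, which rewrites triples held as plain tuples. The predicate name w:looselySameAs is a made-up placeholder (no such property has been proposed or standardized), and relax_same_as is likewise a hypothetical helper.

```python
OWL_SAME_AS = "owl:sameAs"
# Hypothetical placeholder for the weaker equivalence property; not a
# real or proposed URI.
LOOSE_SAME = "w:looselySameAs"


def relax_same_as(triples):
    """Replace every owl:sameAs predicate with the weaker placeholder,
    leaving all other triples unchanged."""
    return [
        (s, LOOSE_SAME if p == OWL_SAME_AS else p, o)
        for s, p, o in triples
    ]


data = [
    ("<U>", "owl:sameAs", "<V>"),
    ("<U>", "rdf:type", ":Earthquake"),
]
print(relax_same_as(data))
```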

How to turn this into a useful agreement

Some URIs could identify compatibly with HR14a, via either an opt-in or an opt-out scheme. This would be the case if a resource is its own landing page:

<U> w:landingPage <U>.

Otherwise, there are (at least) three general tacks one could take. A single URI is associated with up to three resources: a landing page, what the landing page is about (call that X), and what the URI identifies (<U>). What's needed is a consistent interpretation, in tandem, of <U>, and the property being used to assert something about the interpretation of the URI, e.g.

<U> :bestFriend [foaf:name "Patrick Winston"].

First note that any interpretation has to assign a unique landing page to <U>. That is, <U> cannot have two different landing pages. The representations (sensu HTTP specification) of <U> are the representations (sensu "information resource" encoding) of the landing page. Given that, here are some possibilities (probably not an exhaustive list):

  • <U> is the document, and :bestFriend is interpreted as relating <U> to Patrick Winston by composing <U>'s relation to X with X's relation to Patrick Winston. (This would actually be HR14a in disguise.)
  • <U> is X, and :bestFriend relates X to Patrick Winston.
  • <U> is a "chimera" i.e. a synthetic entity that has some properties of the landing page and some properties, e.g. :bestFriend, of X.

Any of these interpretations would be consistent with this proposal. A particular realization of the proposal might or might not care to encourage one of the three interpretations over the others, e.g. by standardizing on the use of some ontology or set of annotations whose documentation would favor one of the interpretations.

See also