TagIssue57Proposal26

From W3C Wiki

See also TagIssue57Home

The proposal is to explain the "parallel properties" pattern for use by people who want to use the REHU + linked data pattern (as opposed to hash URIs). (REHU = retrieval-enabled hashless URIs, or "200 URIs")

See also Sandro's blog post

The semantics of a REHU are, or should be in general, specified by the protocol or document format in which it is used. The specification should say how use of a REHU in that spec relates to use of the REHU in its "native" context (usually a protocol such as HTTP).

However, RDF is a special case because it does not specify the semantics of any URI, for purposes other than formal logical inference. So we suggest the following:

When interpreting a REHU in RDF, take it to refer to whatever it might refer to in its native contexts, if possible; usually, however, the native context says little about this. In the HTTP case we should take the referent to be some entity closely connected to the URI's automated use within the protocol, e.g. a file, script, etc.

To make the linked data pattern work we put all the action in the properties. Suppose we wish to say that Alice, who is associated with the REHU :alice, has friend Carol. We use a parallel property of 'indirect friendship', defined on what :alice refers to (identifies; a "node"), that holds when Alice (or whatever the subject of the node is) has as a friend the subject of what :carol refers to. If v:friend refers to the indirect friendship property, we write

:alice v:friend :carol.

to say that Carol is Alice's friend.
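
As a rough illustration only, the indirect friendship property can be sketched in Python. The tables SUBJECT_OF and FRIEND and the function indirect_friend are hypothetical names invented for this sketch; the proposal does not say how a node's subject would actually be determined.

```python
# Hypothetical data: which person each node (named by its REHU) is about.
SUBJECT_OF = {
    "http://example/alice": "Alice",
    "http://example/carol": "Carol",
}

# Hypothetical direct friendship facts between people: (person, friend).
FRIEND = {("Alice", "Carol")}


def indirect_friend(node_x: str, node_y: str) -> bool:
    """v:friend read as indirect friendship: relate two nodes when the
    subject of the first has the subject of the second as a friend."""
    x, y = SUBJECT_OF.get(node_x), SUBJECT_OF.get(node_y)
    return (x, y) in FRIEND


# The triple ":alice v:friend :carol." under this reading:
print(indirect_friend("http://example/alice", "http://example/carol"))
```

The point of the sketch is that the relation is stated between the two nodes, while the friendship fact itself lives between their subjects.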

It's best if v:friend is documented to work like this, but even if the documentation makes it seem to refer to the friendship property (perhaps named by URI zz:friend, although perhaps unnamed), it may be possible to interpret v:friend to refer to the indirect friendship property, rather than the friendship property.

(SVG here)

RDF interpretations design space exploration

To make the presentation easier to read, here are some abbreviations:

:alice    = http://example/alice
:carol    = http://example/carol
h:dolores = http://example/employees#dolores
v:friend  = http://example/vocabulary#friend

Now suppose someone has seen (or will see) the following HTTP interaction

Request:

GET http://example/alice HTTP/1.1

Response:

HTTP/1.1 200 OK
Content-Type: text/turtle

@prefix  [etc]
:alice v:birthYear 1963.

Consider the following graph:

G1) :alice v:friend  :carol.
G2) :alice dc:creator :bob.
G3) h:dolores  v:friend  :carol.

Also suppose that in the Dublin Core sense of "creator" people don't have creators.

In RDF we have a choice of interpretations, and they're all pretty uninformative from the point of view of entailment; but they differ in their functional consequences (what "applications" do), e.g. in response to observed HTTP exchanges, or in generating them. Here are some mappings of the URIs individually (an interpretation in the RDF sense would be the entire mapping, therefore some combination of these individual mappings).

Some definitions introduced locally in order to streamline the description of the choices:

  • a "node" is the kind of thing an http: URI might identify at the protocol level (question: what about other protocols?)
  • the "subject" of a node is the thing associated with the node, what it's mainly "about" - with the understanding that HTTP URIs "identify" nodes, REHUs refer to nodes, and only nodes have subjects

So here are mapping choices to be made in selecting an interpretation:

A1)  IS(:alice)  is a person  (similarly for :carol)
A2)  IS(:alice)  is a node
B1)  IS(h:dolores)    is a person
B2)  IS(h:dolores)    is a node (a.k.o. document fragment)
S1)  IS(v:friend) = lambda x y. x's friend is y
S2)  IS(v:friend) = lambda x y. x's subject's friend is y's subject
S3)  IS(v:friend) = lambda x y.
        IF {x is a node}
        THEN {whether x's subject's friend is y's subject}
        ELSE {whether x's friend is y}
S4)  IS(v:friend) = lambda x y.
        {x's friend is y} OR {x's subject's friend is y's subject}
S5)  IS(v:friend) = lambda x y.
        IF {x is something that could have a friend}
        THEN {whether x's friend is y}
        ELSE {whether x's subject's friend is y's subject}
S6)  ... other possibilities ...
S7)  IS(v:friend) = lambda x y.
        x's subject-or-self's friend is y's subject-or-self  [coercion]
D1)  IS(dc:creator) = lambda x y.
        the creator of x, according to Dublin Core definition, is y
D2)  IS(dc:creator) = lambda x y.
        the creator of a retrieval of x, according to the Dublin Core
          definition, is y's subject
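
To make a few of the v:friend choices concrete, here is a minimal Python sketch of S1, S2, S3, S4 and S7 as predicates. The classes Person and Node, the FRIEND_OF facts, and the helper names are assumptions made for illustration. In particular, treating mixed person/node arguments as false in S1 to S3 is a modelling decision of this sketch, not something the descriptions above settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Node:
    uri: str
    subject: Person  # only nodes have subjects


# Hypothetical friendship facts between people: (person, friend).
FRIEND_OF = {("Alice", "Carol"), ("Dolores", "Carol")}


def direct(x, y) -> bool:
    """Ordinary friendship; in this sketch it holds only between persons."""
    return (isinstance(x, Person) and isinstance(y, Person)
            and (x.name, y.name) in FRIEND_OF)


def s1(x, y) -> bool:
    return direct(x, y)


def s2(x, y) -> bool:
    return (isinstance(x, Node) and isinstance(y, Node)
            and direct(x.subject, y.subject))


def s3(x, y) -> bool:
    if isinstance(x, Node):
        return isinstance(y, Node) and direct(x.subject, y.subject)
    return direct(x, y)


def s4(x, y) -> bool:
    return s1(x, y) or s2(x, y)


def subject_or_self(t):
    return t.subject if isinstance(t, Node) else t


def s7(x, y) -> bool:
    # Coercion: a node stands in for its subject, anything else for itself.
    return direct(subject_or_self(x), subject_or_self(y))


# Sample denotations under A2 (REHUs denote nodes) and B1 (h:dolores is a person).
alice = Node("http://example/alice", Person("Alice"))
carol = Node("http://example/carol", Person("Carol"))
dolores = Person("Dolores")

for name, pred in [("S1", s1), ("S2", s2), ("S3", s3), ("S4", s4), ("S7", s7)]:
    print(name, "G1:", pred(alice, carol), "G3:", pred(dolores, carol))
```

Under these assumptions S1 rejects G1 because both arguments are nodes, while S7 accepts both G1 and G3 because it coerces each argument to a person before applying the direct relation.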

You can't choose these mappings independently; a given graph only makes sense when they are combined with one another (this is both true, and as specified by RDF model theory). So we have to look at the overall interpretations, the combinations of mapping choices.

Status quo (HR14a):

  A2 + B1 + S1  --> G1) is false
  D1 and D2 are both dubious.

NLI-like interpretation:

  A1 + B1 + S1  --> G2) is false regardless of D1/D2 choice

Parallel properties: this requires A2, but the rest are up for grabs...

  A1 is false, A2 is forced.
  If B2, then S2 is OK.
  If B1, then we need {any of S3, S4, S5}.
  D1 is dubious, D2 is OK.
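
The dependence between the A and S choices can be checked mechanically for G1. The following Python sketch is illustrative only: the classes, the single assumed friendship fact, and the function names denote and g1_holds are all invented here, and only the A1/A2 and S1/S2 choices are modelled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Node:
    uri: str
    subject: Person


ALICE, CAROL = Person("Alice"), Person("Carol")
FRIEND_OF = {(ALICE, CAROL)}  # assumed fact: Carol is Alice's friend


def denote(a_choice: str, person: Person, uri: str):
    """A1 maps the REHU to the person; A2 maps it to a node about that person."""
    return person if a_choice == "A1" else Node(uri, person)


def s1(x, y) -> bool:
    # Direct friendship between whatever the URIs denote.
    return (x, y) in FRIEND_OF


def s2(x, y) -> bool:
    # Indirect friendship; things without a subject (persons) never satisfy it.
    return (getattr(x, "subject", None), getattr(y, "subject", None)) in FRIEND_OF


def g1_holds(a_choice: str, s_choice: str) -> bool:
    """Truth of G1 (:alice v:friend :carol) under one combination of choices."""
    x = denote(a_choice, ALICE, "http://example/alice")
    y = denote(a_choice, CAROL, "http://example/carol")
    return {"S1": s1, "S2": s2}[s_choice](x, y)


for a, s in product(("A1", "A2"), ("S1", "S2")):
    print(a, "+", s, "-> G1 is", g1_holds(a, s))
```

In this sketch A2 + S1 makes G1 false, as in the status quo entry, and A2 + S2 makes it true, which is the parallel properties reading.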

Annoyance: I'm waving my hands re D1/D2 choice.

Annoyance: the "subject" of a node would have to be explained (given a node, how do we know what its subject is?). This is not a fatal flaw, just confusing.

The term "node" is intentionally vague. It is similar to (not necessarily the same as) "hypertext node", "HTTP resource", or "information resource".

JAR might propose that documents aren't nodes... requiring a different kind of parallel property dealing with associated documents as opposed to associated things-that-the-representations-are-about.

Possible points

  • to avoid confusion please restrict use of the word 'identify' to the way it's used in the RFCs (not how used in RDF)
  • for hashless URIs, 'identification' is as connoted by the URI scheme (as documented by its registration)
  • hash URIs have semantics that depend on the media type
  • when hashless URIs both 'refer' (or 'denote', as in RDF) and 'identify' (as in protocols), it is a social good for them to refer to what they identify
  • there is a legacy exception for http:/HTTP/GET/303 (they don't refer in RDF to what they identify in HTTP); tolerated but discouraged
  • encourage one of the following two patterns for linked data:
    • hash URIs (clearer, better for inference, OWL)
      • for large namespaces, encourage some common convention for use of # (TBD, JAR suggests #_)
      • encourage people to think of this as a postfix indirection operator like * in C
    • for things like OGP, build the indirection into the relation itself; two sub-patterns:
      • 1) parallel direct+indirect properties (direct A R B + indirect A' R' B', where R' is a composition indirect o R o indirect-inverse)
      • 2) like 1 but only give a name to the indirect property
  • tolerate some amount of sloppiness (since people are being sloppy anyhow; it's unavoidable)
  • better PR for all of this
  • for when 303 performance matters, use Sandro's "publish the redirect rules" idea
  • clarify that in RDF at least it is buyer (receiver) beware: this is sender "best practice" but not normative on either sender or receiver, as the relation of RDF to 'identification' is not part of the definition of RDF. The associated resource is "what the retrieved representation is about"
  • validators are a good thing. The author(s) of this proposal advocate development and deployment of RDF validators, including "overeager" ones that make good-taste assumptions not warranted by the specs.