TagIssue57Proposal27

From W3C Wiki


See also TagIssue57Home, /Earlier

For setup / background please read /Background. Also it might be appropriate to read the URIs in Data primer (forthcoming) before delving into this document.

This is an attempt to work out the detailed RDF semantics for "shorthand" and "parallel" properties.

Much of the complexity here may be specific to RDF, which has compositional semantics, i.e. whether a relationship is true of its arguments depends on the meanings of the arguments, not the way the statement of the relationship is written (the URIs themselves). In formats like JSON-LD, the story ought to be simpler, since its relationships hold between the URIs themselves, not their meanings.

We'll start small, by assuming RDF and simple object properties with undisputed objects. If we can't handle that case (at least) then we're doomed. Once this case makes sense, we can confirm that data properties work the same way, generalize to other notations, and find something 'architectural' to say. Please don't complain about how ungeneral this presentation is; it does not aim to be completely general.

Claim: We'll never agree on what the disputed URIs (the GET/200 ones) identify.

Claim: It's OK if interpretations differ; what matters for interoperability is entailment, not interpretation.

Problem: How, in this situation, can groups of developers with different interpretations of a vocabulary agree on entailment without agreeing on interpretation?

Solution: Posit that there are functions (called 'projection' functions), each of which translates from whatever one has (what's identified by the URI according to whatever the local interpretation happens to be) into the interpretation preferred by a particular group/application. Build properties for daily use by composing the intuitive relations (those that are easily described, have shared semantics, and work as is with undisputed URIs) with these projection functions.

Solution for unaffiliated parties: Make no assumptions about what the URI identifies, and always use a projection function to get predictability.

Consequence: What a contested URI identifies has to be mappable to all the interpretations needed by all groups. Therefore no equivalences can be assumed between disputed URIs other than those that all groups can agree on.

In any case what we have is a product construction in the category theory sense. What matters overall is the commutative diagram, not the choice of interpretation.

Scenario

Consider the following three interpretations of the same vocabulary {U, P, Q, R, D, T} (those are all URIs). The left column shows what's written in RDF; the other three columns show the interpretation of those constructs according to the three interpretations. We're using single letters to make the presentation easier to read; these stand for URIs such as http://example/omf.

Key:

  • Interpretation 1 = TimBL's / httpRange-14
  • Interpretation 2 = Ian Davis's / what is described
  • Interpretation 3 = the neutral interpretation, i.e. the one HT/JT/JAR discussed in June, used by someone who just wants interoperation and doesn't care what the interpretation is.
  • OMF = Our Mutual Friend (novel created by Dickens)
  • LP = a landing page for Our Mutual Friend (created by Tomlinson)
  • ptopic = function from landing page to primary topic (subject of, what's described)
  • lpage = function from document to designated landing page
  • U = 'disputed' URI e.g. http://example/omf with GET U => 200 landing page
  • P = URI interpreted as the "has creator" or "created by" property by all groups (perhaps http://purl.org/dc/terms/creator)
  • Q = 'parallel property' relating <U> to Tomlinson
  • R = 'parallel property' relating <U> to Dickens
  • D is interpreted by everyone to be Charles Dickens (perhaps a hash URI)
  • T is interpreted by everyone to be Claire Tomlinson
  • (*) = statement is false
Syntax        | Interpretation 1    | Interpretation 2       | Interpretation 3
--------------+---------------------+------------------------+----------------------------------------------
U             | LP                  | OMF                    | a pair <LP,OMF> with projections proj1, proj2
P             | creator             | creator                | creator
<U> <P> <D>.  | LP is by Dickens(*) | OMF is by Dickens      | ? do not write
<U> <P> <T>.  | LP is by Tomlinson  | OMF is by Tomlinson(*) | ? do not write
Q             | creator             | creator ∘ lpage        | creator ∘ proj1
<U> <Q> <T>.  | LP is by Tomlinson  | LP is by Tomlinson     | LP is by Tomlinson
R             | creator ∘ ptopic    | creator                | creator ∘ proj2
<U> <R> <D>.  | OMF is by Dickens   | OMF is by Dickens      | OMF is by Dickens

Following are diagrams illustrating the interpretations. The common ground for all of them is as follows:

  1. 'GET U' yields a landing page whose 'primary topic' is Our Mutual Friend.
  2. What U identifies bears some relationship (call it Q) to C. Tomlinson, and some relationship (call it R) to C. Dickens, i.e.
   <U> <Q> <T>.
   <U> <R> <D>.

Now one group of users/programmers/agents thinks that U identifies the landing page. For them, the property Q would be the same as creator, and R would be the composition of creator and primary topic:

Our second group takes U to identify what the landing page says it (U) does. Assume it says (entails) that U identifies Our Mutual Friend. For them, R would be the same as creator, and Q would be the composition of creator with a relationship 'lpage' which maps documents to their landing pages in such a way that Our Mutual Friend is mapped to a landing page that you GET using U:

A third group just wants to get along with everyone and doesn't want to make any assumptions about what's identified. All they know is that there are relationships proj1 and proj2 that get you from what's identified (whatever it may be) to the landing page and the document (what the landing page describes), respectively:

N.b. the phrase 'parallel properties' doesn't apply to the cases discussed above, because by making T and D undisputed, we've reduced this to the data property case. When the object URIs themselves are disputed, there is a second choice of whether the object reference is itself to be indirect. When both subject and object are indirect references, we can say the property is a "parallel" property.
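
The rows of the scenario table can be checked mechanically. What follows is only a toy sketch, not part of the proposal: the Python names (`creator`, `interpretations`, `holds`, and so on) are invented for illustration, the creator relation is a two-element set, and interpretation 3 is modelled as a Python tuple whose components play the roles of proj1 and proj2.

```python
# Toy model of the scenario table. All Python names are illustrative.
LP, OMF, DICKENS, TOMLINSON = "LP", "OMF", "Dickens", "Tomlinson"

# The undisputed "creator" relation, as a set of (subject, object) pairs.
creator = {(LP, TOMLINSON), (OMF, DICKENS)}

ptopic = {LP: OMF}   # landing page -> primary topic
lpage = {OMF: LP}    # document -> designated landing page


def identity(x):
    return x


# For each interpretation: what U denotes, and the function that is
# composed with creator to obtain Q and R respectively.
interpretations = {
    "I1": {"U": LP, "Q_via": identity, "R_via": ptopic.get},
    "I2": {"U": OMF, "Q_via": lpage.get, "R_via": identity},
    "I3": {"U": (LP, OMF), "Q_via": lambda p: p[0], "R_via": lambda p: p[1]},
}


def holds(interp, via, obj):
    """True if <U> <prop> <obj> holds, where prop = creator ∘ via."""
    return (interp[via](interp["U"]), obj) in creator


def direct_holds(interp, obj):
    """True if <U> <P> <obj> holds, with P read directly as creator."""
    return (interp["U"], obj) in creator


def truth_table():
    """Truth values of three Q/R statements under each interpretation."""
    return {
        name: {
            "U Q T": holds(i, "Q_via", TOMLINSON),
            "U R D": holds(i, "R_via", DICKENS),
            "U Q D": holds(i, "Q_via", DICKENS),
        }
        for name, i in interpretations.items()
    }


if __name__ == "__main__":
    for name, row in truth_table().items():
        print(name, row)
```

In this sketch the Q and R statements get the same truth values under all three interpretations, while `direct_holds` for `<U> <P> <D>` differs between I1 and I2, which is the disagreement the projection functions are meant to route around.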

Good citizenship: How to document Q and R

The property Q bears some relationship (call it M) to the property P, and R some other relationship (call it N) to P, i.e.

   <Q> <M> <P>.
   <R> <N> <P>.

We're suggesting that (once proper URIs are chosen for M and N) people preparing vocabulary specifications that are intended to be used with hashless URIs should declare properties (such as Q and R) in just this way. They could get away with providing the information about Q and R in prose documentation, but by expressing the property/property relationships formally, the relationships can be detected and exploited by processors such as Tabulator.

Using terminology found elsewhere (primer in preparation), Q is an "immediate" property (it is about the landing page), and R is a "shorthand" property (it is about what the landing page is about). P is a "direct" property since it doesn't involve any indirection (projection).

The two property/property relations (declarations) that we discussed in June were the operators _ ∘ proj1 and _ ∘ proj2, i.e.

 <Q> <M> <P>.

implies IEXT(IS(Q)) = IEXT(IS(P)) ∘ proj1

 <R> <N> <P>.

implies IEXT(IS(R)) = IEXT(IS(P)) ∘ proj2. (See RDF model theory or cheat sheet for IEXT and IS.)
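
Read pointwise, and assuming the composition applies the projection first and that proj1 and proj2 are total functions on the domain, the two equations can be unpacked roughly as follows (a restatement for readability, not an additional constraint):

```latex
\begin{align*}
\mathrm{IEXT}(\mathrm{IS}(Q)) &= \{\, (x, y) : (\mathrm{proj}_1(x),\, y) \in \mathrm{IEXT}(\mathrm{IS}(P)) \,\} \\
\mathrm{IEXT}(\mathrm{IS}(R)) &= \{\, (x, y) : (\mathrm{proj}_2(x),\, y) \in \mathrm{IEXT}(\mathrm{IS}(P)) \,\}
\end{align*}
```

For example, under interpretation 3 the URI U denotes the pair <LP,OMF>, proj1 of that pair is LP, and LP is by Tomlinson, so the pair is Q-related to Tomlinson.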

The interpretations of M and N are, as indicated above, different for different groups. But there is no reason not to have standard URIs for them, and indeed it would be useful to give them URIs.

If Q and R are adequately documented without appeal to the property P, one might even write one of

   <Q> <M> [].   or
   <R> <N> [].

which avoids the annoyance of having to give names to properties that might never be used. The latter, for example, would be the same as saying "R is a shorthand property".

TBD: These constraints aren't expressible in OWL DL. If M and N are written at all they will need to be annotation properties. Assess whether DL property chains, which only give inclusion, not equality, are a good alternative to M- and N-statements.

Where the object of a property might also be written using a hash URI, the story gets more complicated. In principle we might need, in addition to M and N, two corresponding declarations that say how the object is to be treated.

Proposed processing model

From

   <U> <Q> ?t.
   U is a hashless http: URI 
   <Q> <M> ?p.

deduce

   <U> <F*> _:1.       #implies (asserts) that such a _:1 F*-related to <U> exists
   _:1 ?p ?t.

i.e. there is a landing page associated with <U> (typically due to 200, but conceivably described by 303?).

F* (one of the projection functions) is functional so create only a single _:1 for each such URI U.

(Semantically, F* is just M applied to the identity function, i.e. <F*> <M> owl:sameAs.)

Similarly, from

   <U> <R> ?d.
   U is a hashless http: URI
   <R> <N> ?p.

deduce

   <U> <G*> _:2.       #implies (asserts) that such a _:2 G*-related to <U> exists
   _:2 ?p ?d.

i.e. there is something primary-topic-associated with <U> ("face value" regardless of 200/303?).

G* is also functional, so _:2 can be unique for each URI U.
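
The two rules above can be sketched as a single forward-chaining pass over a set of triples. This is a minimal illustration under several assumptions: triples are plain Python tuples of strings, the strings "M", "N", "F*" and "G*" stand in for URIs that have not been chosen yet, blank nodes are labels such as `_:b1`, and "hashless http: URI" is approximated by a simple string test. The function names are invented for this sketch.

```python
from itertools import count

F_STAR, G_STAR = "F*", "G*"   # stand-ins for the projection properties
M, N = "M", "N"               # stand-ins for the declaration properties


def is_hashless_http(uri):
    """Approximate the syntactic precondition with a string test."""
    return uri.startswith("http://") and "#" not in uri


def apply_rules(triples):
    """Return the input triples plus those deduced by the two rules."""
    triples = set(triples)
    # Each declaration <prop> <M|N> ?p becomes (prop, projection, ?p).
    decls = [(s, F_STAR if p == M else G_STAR, o)
             for s, p, o in sorted(triples) if p in (M, N)]
    blank = {}        # (U, projection) -> blank node label
    fresh = count(1)
    deduced = set()
    for s, p, o in sorted(triples):
        if not is_hashless_http(s):
            continue
        for prop, proj, base in decls:
            if p != prop:
                continue
            # The projection is functional: one blank node per (U, projection).
            if (s, proj) not in blank:
                blank[(s, proj)] = f"_:b{next(fresh)}"
            node = blank[(s, proj)]
            deduced.add((s, proj, node))
            deduced.add((node, base, o))
    return triples | deduced


if __name__ == "__main__":
    U = "http://example/omf"
    data = {(U, "Q", "T"), (U, "R", "D"), ("Q", M, "P"), ("R", N, "P")}
    for triple in sorted(apply_rules(data)):
        print(triple)
```

A single pass is enough here because the deduced triples have blank-node subjects, which never satisfy the hashless-URI test.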

TBD: Do we really need the syntactic precondition 'U is a hashless http: URI'? Syntactic conditions on inference rules wreak havoc with entailment (compositional semantics, referential transparency).

TBD: Work through the DOI+303 case (see below regarding interoperability with 303).

TBD: Consider a rule

  <U> <R> ?t.
  U is a hash URI
  ?t <M> ?p.

permits

  <U> ?p ?t.

??

Analyzing interoperability of shorthand properties with hash and 303 URIs

Work in progress: Filling in the boxes below, adopting the terminology used below:

URI class relation type
Compensating Non-compensating
Disputed X
Self-describing ("HR14a opt-in")
#
303

✓ = interoperability achieved, X = not achievable (we would recommend that one does not write such a statement).

"Compensating" means the property is declared to factor through F* or G*, while "non-compensating" means it is not. E.g. "shorthand" properties such as R are compensating, while P is non-compensating.

For maximum interoperability the goal is for all the blue boxes (1st, 3rd, 5th on the bottom row) in the following figure to be the same as one another, and for all the green boxes (2nd, 4th, 6th) to be the same as one another. This seems impossible.

Each box contains a vector of five things. Each arrow is a mapping. In each case the mapping maps the things in the tail box to the corresponding things in the head box.

There is only a single representation Rep involved in the story. Rep is (the serialization of) an RDF graph that (by supposition) tells anyone who tries to understand it on its own (i.e. without looking at the Web) that

  • the URIs U200-omf, U#omf, and U303 all identify Our Mutual Friend, and that
  • the URIs U200-lp and U#lp both identify (a generic resource whose instances include) Rep.

Rep is delivered in a 200 response to GET of U200-omf, U200-lp, and U. For GET of U303, one receives 303 Location: V, and then GET of V yields 200 Rep. (perhaps V = U.)

I1, I2, I3 are three interpretations. F*j = the function F* corresponding to interpretation Ij, etc.

This shows the issues of idempotency and come-from, and their duality with one another, pretty clearly, I think.

For interoperability we seek F*j(Ij(x)) = I1(x), G*j(Ij(x)) = I2(x) for all j, x. Idempotency is required when a mapping threatens to fail to preserve an identity, and there is a come-from problem when a mapping fails to preserve a distinction. ('identity' meaning equation or identification of two things.)

(?) We can probably deal with idempotency of F*; we would need to make sure that we never have a landing page for X, where X is a landing page for Y not equal to X. Then, for things that aren't landing pages (the representations don't say what the URI identifies, or there aren't any representations), F* is the identity. Similarly, G* could be identity on things that don't have (assigned) landing pages, although testing for this would be quite a trick in open-world semantics.
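
The idempotency condition on F* can be illustrated with a closed-world toy. The sketch assumes that landing-page assignments are given as a finite Python dict (here called `lpage`, following the scenario) and that F* is the identity on anything without an assigned landing page; the helper names are invented, and none of this addresses the open-world difficulty mentioned above.

```python
def make_f_star(lpage):
    """F*: map x to its landing page, or to x itself if it has none."""
    return lambda x: lpage.get(x, x)


def violations(lpage):
    """Pairs (x, y) where y is the landing page of x, but y in turn has
    a landing page other than itself. Any such pair breaks idempotency."""
    return [(x, y) for x, y in lpage.items() if lpage.get(y, y) != y]


def is_idempotent(f, domain):
    """Check f(f(x)) == f(x) for every x in a finite domain."""
    return all(f(f(x)) == f(x) for x in domain)


if __name__ == "__main__":
    good = {"OMF": "LP"}                # LP has no further landing page
    bad = {"OMF": "LP", "LP": "LP2"}    # a landing page of a landing page
    print(is_idempotent(make_f_star(good), ["OMF", "LP"]), violations(good))
    print(is_idempotent(make_f_star(bad), ["OMF", "LP", "LP2"]), violations(bad))
```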

The come-from problem is harder to deal with and for now it looks like a fatal flaw. One solution might be to convince everyone to always use I3 in preference to I1 and I2. The other is just to never combine hash URIs with compensating (shorthand) properties.

See also /Suggestion - a late-arriving suggestion.