TagIssue57Proposal27

Scenario

Consider the following three interpretations of the same vocabulary {U, P, Q, R, D, T} (those are all URIs). The left column shows what's written in RDF; the other three columns show the interpretation of those constructs according to the three interpretations. We're using single letters to make the presentation easier to read; these stand for URIs such as http://example/omf.

Key:

Interpretation 1 = TimBL's / httpRange-14
Interpretation 2 = Ian Davis's / what is described
Interpretation 3 = the neutral interpretation HT/JT/JAR discussed in June, used by someone who just wants interoperation and doesn't care what the interpretation is.
OMF = Our Mutual Friend (novel created by Dickens)
LP = a landing page for Our Mutual Friend (created by Tomlinson)
ptopic = function from landing page to primary topic (subject of, what's described)
lpage = function from document to designated landing page
U = 'disputed' URI e.g. http://example/omf with GET U => 200 landing page
P = URI interpreted as the "has creator" or "created by" property by all groups (perhaps http://purl.org/dc/terms/creator)
Q = 'parallel property' relating <U> to Tomlinson
R = 'parallel property' relating <U> to Dickens
D is interpreted by everyone to be Charles Dickens (perhaps a hash URI)
T is interpreted by everyone to be Claire Tomlinson
(*) = statement is false

Syntax	Interpretation 1	Interpretation 2	Interpretation 3
U	LP	OMF	a pair <LP,OMF>
			with projections proj1, proj2
P	creator	creator	creator
<U> <P> <D>.	LP is by Dickens(*)	OMF is by Dickens	? do not write
<U> <P> <T>.	LP is by Tomlinson	OMF is by Tomlinson(*)	? do not write
Q	creator	creator ∘ lpage	creator ∘ proj1
<U> <Q> <T>.	LP is by Tomlinson	LP is by Tomlinson	LP is by Tomlinson
R	creator ∘ ptopic	creator	creator ∘ proj2
<U> <R> <D>.	OMF is by Dickens	OMF is by Dickens	OMF is by Dickens
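
The table can be checked mechanically with a small toy model. The following is only a sketch: the Python names (CREATOR, PTOPIC, LPAGE, holds) are assumptions made for illustration, and the single letters stand in for the URIs as in the table.

```python
# Toy model of the scenario table. All Python names here are illustrative
# assumptions, not part of the proposal.
LP, OMF = "LP", "OMF"
DICKENS, TOMLINSON = "Dickens", "Tomlinson"

# The undisputed 'creator' relation: who created which thing.
CREATOR = {(LP, TOMLINSON), (OMF, DICKENS)}
PTOPIC = {LP: OMF}   # landing page -> primary topic
LPAGE = {OMF: LP}    # document -> designated landing page


def creator(x, person):
    return (x, person) in CREATOR


# For each interpretation: what U denotes, and how P, Q, R are read.
# P is None under interpretation 3 ("do not write").
INTERPRETATIONS = {
    1: {"U": LP,
        "P": creator,
        "Q": creator,
        "R": lambda x, who: creator(PTOPIC[x], who)},
    2: {"U": OMF,
        "P": creator,
        "Q": lambda x, who: creator(LPAGE[x], who),
        "R": creator},
    3: {"U": (LP, OMF),
        "P": None,
        "Q": lambda x, who: creator(x[0], who),   # creator ∘ proj1
        "R": lambda x, who: creator(x[1], who)},  # creator ∘ proj2
}


def holds(interp, prop, obj):
    """Truth value of <U> <prop> <obj> under an interpretation, or None."""
    reading = INTERPRETATIONS[interp]
    relation = reading[prop]
    if relation is None:
        return None
    return relation(reading["U"], obj)


for i in (1, 2, 3):
    print(i, holds(i, "P", DICKENS), holds(i, "Q", TOMLINSON), holds(i, "R", DICKENS))
```

In this model the Q and R statements come out true under all three interpretations, while the P statement about Dickens is false under interpretation 1, true under interpretation 2, and left undefined under interpretation 3.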

Following are diagrams illustrating the interpretations. The common ground for all of them is as follows:

'GET U' yields a landing page whose 'primary topic' is Our Mutual Friend.
What U identifies bears some relationship (call it Q) to C. Tomlinson, and some relationship (call it R) to C. Dickens, i.e.

   <U> <Q> <T>.
   <U> <R> <D>.

Now one group of users/programmers/agents thinks that U identifies the landing page. For them, the property Q would be the same as creator, and R would be the composition of creator and primary topic:

Our second group takes U to identify what the landing page says it (U) does. Assume it says (entails) that U identifies Our Mutual Friend. For them, R would be the same as creator, and Q would be the composition of creator with a relationship 'lpage' which maps documents to their landing pages in such a way that Our Mutual Friend is mapped to a landing page that you GET using U:

A third group just wants to get along with everyone and doesn't want to make any assumptions about what's identified. All they know is that there are relationships proj1 and proj2 that get you from what's identified (whatever it may be) to the landing page and the document, respectively:

N.b. the phrase 'parallel properties' doesn't apply to the cases discussed above, because by making T and D undisputed, we've reduced this to the data property case. When the object URIs themselves are disputed, there is a second choice of whether the object reference is itself to be indirect. When both subject and object are indirect references, we can say the property is a "parallel" property.

Good citizenship: How to document Q and R

The property Q bears some relationship (call it M) to the property P, and R some other relationship (call it N) to P, i.e.

    <Q> <M> <P>.
   <R> <N> <P>.

We're suggesting that (once proper URIs are chosen for M and N) people preparing vocabulary specifications that are intended to be used with hashless URIs should declare properties (such as Q and R) in just this way. They could get away with providing the information about Q and R in prose documentation, but by expressing the property/property relationships formally, the relationships can be detected and exploited by processors such as Tabulator.

Using terminology found elsewhere (primer in preparation), Q is an "immediate" property (it is about the landing page), and R is a "shorthand" property (it is about what the landing page is about). P is a "direct" property since it doesn't involve any indirection (projection).

The two property/property relations (declarations) that we discussed in June were the operators _ ∘ proj1 and _ ∘ proj2, i.e.

 <Q> <M> <P>.

implies IEXT(IS(Q)) = IEXT(IS(P)) ∘ proj1

 <R> <N> <P>.

implies IEXT(IS(R)) = IEXT(IS(P)) ∘ proj2. (See RDF model theory or cheat sheet for IEXT and IS.)
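
As a rough illustration of these two equations, property extensions can be modelled as sets of pairs and the projections as dictionaries. This is a sketch under the neutral interpretation only; the names compose, PAIR, PROJ1, PROJ2 and the EXT_ variables are hypothetical, and the toy extension of P is assumed, not taken from any real vocabulary.

```python
def compose(ext_p, proj):
    """Return the relation ext_p ∘ proj: x relates to z when proj(x) relates to z."""
    return {(x, z) for x, y in proj.items() for (s, z) in ext_p if s == y}


# Under the neutral interpretation, U denotes a pair <LP, OMF>.
PAIR = ("LP", "OMF")
PROJ1 = {PAIR: "LP"}    # proj1: first component (the landing page)
PROJ2 = {PAIR: "OMF"}   # proj2: second component (the novel)

# Assumed toy extension of the direct property P (creator).
EXT_P = {("LP", "Tomlinson"), ("OMF", "Dickens")}

EXT_Q = compose(EXT_P, PROJ1)   # IEXT(IS(Q)) = IEXT(IS(P)) ∘ proj1
EXT_R = compose(EXT_P, PROJ2)   # IEXT(IS(R)) = IEXT(IS(P)) ∘ proj2
```

With these assumptions, EXT_Q relates the pair to Tomlinson and EXT_R relates it to Dickens, matching the last column of the scenario table.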

The interpretations of M and N are, as indicated above, different for different groups. But there is no reason not to have standard URIs for them, and indeed it would be useful to give them URIs.

If Q and R are adequately documented without appeal to the property P, one might even write one of

   <Q> <M> [].   or
   <R> <N> [].

which avoids the annoyance of having to give names to properties that might never be used. The latter, for example, would be the same as saying "R is a shorthand property".

TBD: These constraints aren't expressible in OWL DL. If M and N are written at all they will need to be annotation properties. Assess whether DL property chains, which only give inclusion, not equality, are a good alternative to M- and N-statements.

Where the object of a property might also be written using a hash URI, the story gets more complicated. In principle we might need, in addition to M and N, two corresponding declarations that say how the object is to be treated.

Proposed processing model

From

   <U> <Q> ?t.
   U is a hashless http: URI 
   <Q> <M> ?p.

deduce

   <U> <F*> _:1.       #implies (asserts) that such a _:1 F*-related to <U> exists
   _:1 ?p ?t.

i.e. there is a landing page associated with <U> (typically due to 200, but conceivably described by 303?).

F* (one of the projection functions) is functional so create only a single _:1 for each such URI U.

(Semantically, F* is just M applied to the identity function, i.e. <F*> <M> owl:sameAs.)

Similarly, from

   <U> <R> ?d.
   U is a hashless http: URI
   <R> <N> ?p.

deduce

   <U> <G*> _:2.       #implies (asserts) that such a _:2 G*-related to <U> exists
   _:2 ?p ?d.

i.e. there is something primary-topic-associated with <U> ("face value" regardless of 200/303?).

G* is also functional, so _:2 can be unique for each URI U.
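
The two rules above might be sketched as a single forward-chaining pass over a set of triples. This is a minimal illustration, not a definitive implementation: triples are plain Python tuples, the strings "M", "N", "F*" and "G*" stand in for URIs that have not yet been chosen, and the helper names (is_hashless_http, expand) and blank node labels are assumptions.

```python
from urllib.parse import urlparse

M, N = "M", "N"                 # placeholder names for the declaration properties
F_STAR, G_STAR = "F*", "G*"     # placeholder names for the projection properties


def is_hashless_http(uri):
    """Syntactic precondition: an http: URI with no '#' in it."""
    return urlparse(uri).scheme == "http" and "#" not in uri


def expand(triples):
    """Return the triples deduced by the two rules (input is not modified)."""
    triples = set(triples)
    # Map each declared property to its (projection, direct property) pairs.
    declarations = {}
    for s, p, o in triples:
        if p == M:
            declarations.setdefault(s, []).append((F_STAR, o))
        elif p == N:
            declarations.setdefault(s, []).append((G_STAR, o))

    bnodes = {}      # one blank node per (URI, projection), since F*/G* are functional
    derived = set()
    for s, p, o in sorted(triples):
        if not is_hashless_http(s):
            continue
        for projection, direct in declarations.get(p, []):
            node = bnodes.setdefault((s, projection), f"_:b{len(bnodes) + 1}")
            derived.add((s, projection, node))
            derived.add((node, direct, o))
    return derived


U = "http://example/omf"
graph = {
    (U, "Q", "T"),
    (U, "R", "D"),
    ("Q", M, "P"),
    ("R", N, "P"),
    ("http://example/doc#omf", "R", "D"),   # hash URI: precondition fails, no deduction
}
deduced = expand(graph)
```

Keying the blank nodes on the URI and the projection is one way to honour the requirement that only a single node be created per URI for each of F* and G*.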

TBD: Do we really need the syntactic precondition 'U is a hashless http: URI'? Syntactic conditions on inference rules wreak havoc with entailment (compositional semantics, referential transparency).

TBD: Work through the DOI+303 case (see below regarding interoperability with 303).

TBD: Consider a rule

  <U> <R> ?t.
  U is a hash URI
  <R> <N> ?p.

permits

  <U> ?p ?t.

??

Analyzing interoperability of shorthand properties with hash and 303 URIs

Work in progress: filling in the boxes below, using the terminology defined after the table:

URI class	relation type
	Compensating	Non-compensating
Disputed	✓	X
Self-describing ("HR14a opt-in")		✓
#		✓
303		✓

✓ = interoperability achieved, X = not achievable (we would recommend that one does not write such a statement).

"Compensating" means the property is declared to factor through F* or G*, while "non-compensating" means it is not. E.g. "shorthand" properties such as R are compensating, while P is non-compensating.

For maximum interoperability the goal is for all the blue boxes (1st, 3rd, 5th on the bottom row) in the following figure to be the same as one another, and for all the green boxes (2nd, 4th, 6th) to be the same as one another. This seems impossible.

Each box contains a vector of five things. Each arrow is a mapping. In each case the mapping maps the things in the tail box to the corresponding things in the head box.

There is only a single representation Rep involved in the story. Rep is (the serialization of) an RDF graph that (by supposition) tells anyone who tries to understand it on its own (i.e. without looking at the Web) that

the URIs U200-omf, U#omf, and U303 all identify Our Mutual Friend, and that
the URIs U200-lp and U#lp both identify (a generic resource whose instances include) Rep.

Rep is delivered in a 200 response to GET of U200-omf, U200-lp, and U. For GET of U303, one receives 303 Location: V, and then GET of V yields 200 Rep (perhaps V = U).

I1, I2, I3 are three interpretations. F*j = the function F* corresponding to interpretation Ij, etc.

This shows the issues of idempotency and come-from, and their duality with one another, pretty clearly, I think.

For interoperability we seek F*j(Ij(x)) = I1(x), G*j(Ij(x)) = I2(x) for all j, x. Idempotency is required when a mapping threatens to fail to preserve an identity, and there is a come-from problem when a mapping fails to preserve a distinction. ('identity' meaning equation or identification of two things.)

(?) We can probably deal with idempotency of F*; we would need to make sure that we never have a landing page for X, where X is a landing page for Y not equal to X. Then, for things that aren't landing pages (the representations don't say what the URI identifies, or there aren't any representations), F* is the identity. Similarly, G* could be identity on things that don't have (assigned) landing pages, although testing for this would be quite a trick in open-world semantics.
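
The idempotency condition on F* can be illustrated with a finite, closed-world toy check. This is only a sketch: the dictionary landing_page_of and the function names are assumptions, and a finite check like this sidesteps the open-world difficulty noted above.

```python
def f_star(landing_page_of, x):
    """F*: map x to its landing page, or to itself if it has none."""
    return landing_page_of.get(x, x)


def is_idempotent(landing_page_of, domain):
    """True when F*(F*(x)) == F*(x) for every x in the (finite) domain."""
    return all(
        f_star(landing_page_of, f_star(landing_page_of, x))
        == f_star(landing_page_of, x)
        for x in domain
    )


things = {"OMF", "LP", "LP2", "Dickens"}
ok = is_idempotent({"OMF": "LP"}, things)                 # LP has no landing page
bad = is_idempotent({"OMF": "LP", "LP": "LP2"}, things)   # LP has its own landing page
```

The second case fails exactly because a landing page has a distinct landing page of its own, which is the situation the paragraph above says must be excluded.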

The come-from problem is harder to deal with and for now it looks like a fatal flaw. One solution might be to convince everyone to always use I3 in preference to I1 and I2. The other is just to never combine hash URIs with compensating (shorthand) properties.

See also /Suggestion - a late-arriving suggestion.