TagIssue57Proposal27/Background-with-comments

NOTE: This is a copy of page TagIssue57Proposal27/Background that was created by David Booth to allow other people's comments to be inserted without messing up the original page. Please feel free to add your own comments, following this commenting convention (but without the use of bold font). -- David Booth 03:46, 2 October 2012 (UTC)

JAR September 2012, content moved away from TagIssue57Proposal27/Earlier

Goals of this exposition:

Not RDF or URI specific

I'm not sure what you mean, but I think the scope of this inquiry should be limited to RDF and URIs/IRIs, given that: (a) nobody outside the RDF / Semantic Web world cares about this problem; and (b) attempting to generalize the problem beyond this domain makes it harder to solve. -- David Booth 03:46, 2 October 2012 (UTC)

Start with first principles, not historical accident
Do not assume the webarch party line or world view
Explain why the pain seems to be peculiar to RDF

See also Roy Fielding email here and here.

Underspecification of meaning begets extension.

Yes! This is a feature, not a bug. This allows an identifier to be used for a wider range of purposes. The looseness of the SKOS ontology is a good example of this. -- David Booth 03:46, 2 October 2012 (UTC)

Format and protocol standards (whether defined by a specification or by some kind of application) that don't specify some aspect of meaning, or underspecify it, invite extension by users of the standard.

E.g. the application-level semantics of an XML document is not given by the XML specification itself and is therefore in each case an extension of XML. Even when namespaces are used, the semantics of element and attribute names is not specified by XML per se. This is obviously implicit in the design of XML.
Neither RFC 2616 nor the RDF specifications say what hashless http: URIs "identify" or mean on their own, and different parties have filled this vacuum by adopting various extensions (REST, generic resources, linked data "take at face value", "what I want it to mean", etc.).

Sometimes extensions are mutually incompatible.

Yes! This is a direct consequence of the above feature. This is not a "problem" that must be solved, it is a fact of life that we must learn to live with. Independent authors sometimes write RDF datasets that are incompatible with each other, even if they started from the same URI definitions. This is analogous to what can happen when two composers independently harmonize the same melody, one adding a tenor line and the other adding a bass line -- *without* communicating with each other. Although each composer's harmonization may sound great in isolation, the tenor and bass parts may be completely incompatible when combined. Both composers shared the same concept of the song's identity to a point -- the part that was defined by the melody -- but their notions of the song's identity differed beyond that, and that's okay. A totalitarian attempt to force a melody (or URI or RDF graph) to have a unique, predetermined interpretation would have the positive effect of making merging easier, but the negative effect of prohibiting other interpretations (or applications) that wished to use that melody/URI/graph in other ways. Precision is obviously beneficial sometimes, but looseness can also be beneficial, as it allows that melody/URI/graph to be used in a wider variety of contexts or applications than otherwise would be possible. Again, the SKOS ontology is a great example of this. -- David Booth 03:46, 2 October 2012 (UTC)

Different extensions of the same kind of construct (document / message / command / phrase / identifier) can lead to incompatible meanings, i.e. to consequences (under extension 2) that are not just unintended or unanticipated (according to extension 1), but actually incorrect or in conflict with correct behavior (under extension 1).

E.g. an RDF statement might be appropriate or "true" under extension 1 (say, the "generic resource" extension) but inappropriate or "false" under extension 2 (say, the "take at face value" extension). An application acting on an inappropriate statement will do the wrong thing. See here for example.
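
The conflict can be sketched in Python. This is only an illustration under assumed, simplified readings that this page does not itself define: the "generic resource" extension is taken to mean that a hashless URI names the document served at it, and the "take at face value" extension is taken to mean that it names whatever that document describes. The URI, property names, and facts are all hypothetical.

```python
# Hypothetical statement: (subject URI, property, value).
STATEMENT = ("http://example.org/alice", "ex:lastModified", "2012-09-01")

# Hypothetical facts about the two things the URI might be taken to name.
FACTS = {
    "document": {"ex:lastModified": "2012-09-01"},  # the page served at the URI
    "described thing": {"ex:birthYear": "1980"},    # the person the page describes
}

# Assumed, simplified readings: each extension picks a different referent.
REFERENT_UNDER = {
    "generic resource": "document",
    "take at face value": "described thing",
}

def holds(statement, extension):
    """Return True if the statement matches the facts about the referent
    that the given extension assigns to the subject URI."""
    _subject, prop, value = statement
    referent = REFERENT_UNDER[extension]
    return FACTS[referent].get(prop) == value

if __name__ == "__main__":
    for extension in REFERENT_UNDER:
        print(extension, "->", holds(STATEMENT, extension))
```

The same triple comes out appropriate under one assumed reading and inappropriate under the other, although nothing in the triple itself says which reading its author had in mind.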

Isolated extensions are unproblematic.

Incompatible extensions aren't a problem as long as the use of one extension (generation and interpretation) is isolated from the others, i.e. there is no "mixing" of constructs that assume different extensions (which would make it impossible to tell which was intended), and no use of applications that assume one extension with content created assuming another. One can do what one likes in the privacy of one's own sandbox or community, as long as one's content and applications stay there.

However, RDF, by design, invites mixing, e.g. in triple stores, and 'interoperation', e.g. applying general purpose applications to content discovered on the Web.

Yes, but just because RDF was designed to invite mixing, that does not mean that all RDF datasets are automatically compatible. -- David Booth 16:24, 3 October 2012 (UTC)

Interoperability requires keeping track of which extension is intended.

Those who care about interoperability (mixing of content that relies on different extensions) will want to mark, modify, or otherwise distinguish constructs so that different meanings are expressed using different or differently marked constructs.

One can certainly mint a new URI with a more precise definition if one wishes to further constrain the URI's resource identity in a particular way. But as long as different RDF authors are permitted to independently use the same URI in writing different RDF datasets, then the issue of compatibility of those datasets can still arise, because the datasets may subtly constrain the permissible interpretations of the URI in incompatible ways, without the authors realizing it. -- David Booth 03:46, 2 October 2012 (UTC)

There are various ways to record which extension is intended.

The simplest would be to mint a different URI, of course. -- David Booth 03:46, 2 October 2012 (UTC)

The marking or modifying could be done at any of various contextual levels, e.g.

keep track of it out of band (metadata)
use distinct media types
put an extension indicator in the document (e.g. version indicator, namespace declaration)
make the distinction in the immediate context of each occurrence of the identifier
use different identifiers to express different meanings

In RDF, by design, statements get "mixed" with one another and their provenance is often lost. Thus recording the extension out of band, by media type, or by a version indicator is not an acceptable solution, since none of these markers travels with an individual statement. Getting agreement to use different URIs when purposes differ seems to be a lost cause. By elimination, this leaves immediate context (sentence) as the only potentially fruitful direction.
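
A minimal Python sketch of how provenance gets lost, assuming a triple store that merges graphs by plain set union (a simplification). The URI, property names, and the two sources are hypothetical.

```python
URI = "http://example.org/alice"  # hypothetical

# Two hypothetical sources, each written with a different extension in mind.
source_a = {(URI, "ex:lastModified", "2012-09-01")}  # URI read as a document
source_b = {(URI, "ex:birthYear", "1980")}           # URI read as a person

def merge(*graphs):
    """Merge graphs the way a plain triple store might: set union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def properties_of(graph, subject):
    """Collect every property asserted of the subject in the graph."""
    return {p for (s, p, _o) in graph if s == subject}

merged = merge(source_a, source_b)

if __name__ == "__main__":
    print(sorted(properties_of(merged, URI)))
```

After the merge both properties hang off the same subject, and a three-element triple has no slot that records which source, media type, or version it came from.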

Yes, getting agreement to use different URIs when purposes differ *is* a lost cause, because purposes almost always differ. But no, immediate context (sentence) is *not* the only potentially fruitful direction. The other possibility is to accept the fact that ambiguity of reference is inescapable and learn to live with it: constraining ambiguity as desired, and disambiguating after the fact as needed. -- David Booth 13:42, 3 October 2012 (UTC)

Proposal 27 is to make distinctions via the immediate context.

So here is the key hypothesis. What really matters, it is supposed, to parties preferring particular extensions to identifier meaning is the meaning of constructs containing their choice of identifier, not that the identifier "identifies" anything in particular. The identifier meaning extension was created as a means to an end, and the end is (a) to have an enclosing construct that has the desired larger meaning, and (b) for the identifier in question to occur in that enclosing construct.

So as long as two constructs are generated and interpreted compatibly by sending and receiving parties, it doesn't matter what anyone thinks the identifiers "identify."

In RDF, the "enclosing construct" is a statement, and distinctions between extensions (according to the proposal) will be drawn by choice of property.
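
A rough Python sketch of drawing the distinction by choice of property. The property names and the registry below are hypothetical illustrations and are not taken from the proposal; the point is only that the marker sits inside each statement and so survives mixing.

```python
URI = "http://example.org/alice"  # hypothetical

# Hypothetical registry: the reading of the subject that each property assumes.
READING_OF_SUBJECT = {
    "ex:pageLastModified": "document",
    "ex:describedBirthYear": "described thing",
}

def reading_for(triple):
    """Return the reading of the subject implied by the triple's property,
    or "unspecified" if the property is not in the registry."""
    _subject, prop, _value = triple
    return READING_OF_SUBJECT.get(prop, "unspecified")

# Statements from different sources, already mixed into one set.
mixed = {
    (URI, "ex:pageLastModified", "2012-09-01"),
    (URI, "ex:describedBirthYear", "1980"),
    (URI, "ex:comment", "no declared reading"),
}

if __name__ == "__main__":
    for triple in sorted(mixed):
        print(triple[1], "->", reading_for(triple))
```

Because the reading is carried by the property, a consumer can interpret each statement compatibly with its author without knowing where the statement came from or settling what the URI "identifies".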