TagIssue57Proposal27/Background

From W3C Wiki

JAR, September 2012: content moved here from TagIssue57Proposal27/Earlier

Goals of this exposition:

  • Not RDF or URI specific
  • Start with first principles, not historical accident
  • Do not assume the webarch party line or world view
  • Explain why the pain seems to be peculiar to RDF

See also Roy Fielding's emails here and here.

Underspecification of meaning begets extension.

Format and protocol standards (whether defined by a specification or by some kind of application) that leave some aspect of meaning unspecified or underspecified invite users of the standard to extend them.

  • E.g. the application-level semantics of an XML document is not given by the XML specification itself, and is therefore in each case an extension of XML. Even when namespaces are used, the semantics of element and attribute names is not specified by XML per se. This is implicit in the design of XML.
  • Neither RFC 2616 nor the RDF specifications say what hashless http: URIs "identify" or mean on their own, and different parties have filled this vacuum by adopting various extensions (REST, generic resources, linked data "take at face value", "what I want it to mean", etc.).

Sometimes extensions are mutually incompatible.

Different extensions of the same kind of construct (document / message / command / phrase / identifier) can lead to incompatible meanings, i.e. to consequences (under extension 2) that are not just unintended or unanticipated (according to extension 1), but actually incorrect or in conflict with correct behavior (under extension 1).

  • E.g. an RDF statement might be appropriate or "true" under extension 1 (say, the "generic resource" extension) but inappropriate or "false" under extension 2 (say, the "take at face value" extension). An application acting on an inappropriate statement will do the wrong thing. See here for an example.
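To make the conflict concrete, here is a toy Python sketch. The URI http://example.org/alice, the two interpretation functions, and the miniature "world" of facts are all hypothetical. They are chosen only to show how a single statement can come out true under one extension and false under another. Real extensions are not this crisp.

```python
# Hypothetical world of facts, keyed by (referent, property). Illustrative only.
WORLD = {
    ("page:alice-homepage", "creator"): "Bob",  # Bob wrote the web page
    ("person:alice", "creator"): None,          # no "creator" for the person here
}

URI = "http://example.org/alice"  # hypothetical hashless http: URI


def generic_resource(uri: str) -> str:
    """Extension 1 (assumed): the URI refers to the document it serves."""
    return {URI: "page:alice-homepage"}[uri]


def face_value(uri: str) -> str:
    """Extension 2 (assumed): the URI refers to what that document is about."""
    return {URI: "person:alice"}[uri]


def holds(statement: tuple, interpret) -> bool:
    """A statement holds if the world agrees under the given interpretation."""
    subject_uri, prop, value = statement
    return WORLD.get((interpret(subject_uri), prop)) == value


stmt = (URI, "creator", "Bob")
print(holds(stmt, generic_resource))  # True
print(holds(stmt, face_value))        # False
```

An application that trusts `stmt` while assuming the second extension would act on something that, by its own lights, is false.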

Isolated extensions are unproblematic.

Incompatible extensions aren't a problem as long as the use of each extension (generation and interpretation) is isolated from the others. That is, there is no "mixing" of constructs that assume different extensions (so that you can't tell which was intended), and no use of applications that assume one extension on content created assuming another. One can do what one likes in the privacy of one's own sandbox or community, as long as one's content and applications stay there.

  • However, RDF, by design, invites mixing, e.g. in triple stores, and 'interoperation', e.g. applying general purpose applications to content discovered on the Web.

Interoperability requires keeping track of which extension is intended.

Those who care about interoperability (mixing of content that relies on different extensions) will want to mark, modify, or otherwise distinguish constructs so that different meanings are expressed using different or differently marked constructs.

There are various ways to record which extension is intended.

The marking or modifying could be done at any of various contextual levels, e.g.

  • keep track of it out of band (metadata)
  • use distinct media types
  • put an extension indicator in the document (e.g. version indicator, namespace declaration)
  • make the distinction in the immediate context of each occurrence of the identifier
  • use different identifiers to express different meanings
  • In RDF, by design, statements get "mixed" with one another and their provenance is often lost. Thus out-of-band metadata, media types, and version indicators are not acceptable solutions. Getting agreement to use different URIs when purposes differ seems to be a lost cause. By elimination, this leaves the immediate context (the sentence) as the only potentially fruitful direction.
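The following toy Python sketch shows why out-of-band markings do not survive mixing. It assumes two hypothetical sources that record their intended extension only as metadata about the source. The merge is simplified to a set union of triples because there are no blank nodes here. After merging, nothing in the store says which extension each triple assumed.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


ALICE = "http://example.org/alice"  # hypothetical hashless http: URI

# Two hypothetical sources. The intended extension is recorded only out of
# band, as metadata about each source, never inside the triples themselves.
sources = {
    "http://example.org/source-a": {
        "extension": "generic-resource",  # ALICE names the web page
        "triples": {Triple(ALICE, "dc:creator", "Bob")},
    },
    "http://example.org/source-b": {
        "extension": "face-value",  # ALICE names the person
        "triples": {Triple(ALICE, "foaf:name", "Alice")},
    },
}


def merge(srcs: dict) -> set[Triple]:
    """Toy triple store: union of every source's triples; metadata is dropped."""
    store: set[Triple] = set()
    for src in srcs.values():
        store |= src["triples"]
    return store


store = merge(sources)
print(len(store))  # 2
# Both statements now share one subject, but the store no longer records
# which extension either of them assumed.
print({t.subject for t in store} == {ALICE})  # True
```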

Proposal 27 is to make distinctions via the immediate context.

Here, then, is the key hypothesis. What really matters to parties who prefer a particular extension of identifier meaning, it is supposed, is the meaning of the constructs that contain their chosen identifier, not whether the identifier "identifies" anything in particular. The identifier meaning extension was created as a means to an end, and the end is (a) to have an enclosing construct that has the desired larger meaning, and (b) for the identifier in question to occur in that enclosing construct.

So as long as the enclosing constructs are generated and interpreted compatibly by the sending and receiving parties, it doesn't matter what anyone thinks the identifiers "identify."

  • In RDF, the "enclosing construct" is a statement, and distinctions between extensions (according to the proposal) will be drawn by choice of property.
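Here is a minimal Python sketch of drawing the distinction by choice of property. The properties `ex:docCreator` and `ex:topicName` and their readings are hypothetical. They are not a vocabulary defined by Proposal 27. Each property's definition fixes how its subject URI is to be read, so the intended extension can be recovered from the statement alone, even after statements from different sources are mixed.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


ALICE = "http://example.org/alice"  # hypothetical hashless http: URI

# Hypothetical properties, not from any real vocabulary. Each property's
# definition fixes how its subject URI is to be read, so the choice of
# extension travels with every statement that uses the property.
PROPERTY_READING = {
    "ex:docCreator": "the document served at the subject URI",
    "ex:topicName": "the thing that document is about",
}

source_a = {Triple(ALICE, "ex:docCreator", "Bob")}
source_b = {Triple(ALICE, "ex:topicName", "Alice")}
store = source_a | source_b  # mixing, as in a triple store


def reading_of(t: Triple) -> str:
    """Recover the intended reading from the statement alone."""
    return PROPERTY_READING[t.predicate]


for t in sorted(store):
    print(f"{t.predicate} {t.obj!r}: subject read as {reading_of(t)}")
```

Merging does not erase the distinction, because the predicate carries it rather than metadata about where each triple came from.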