Requirements for Any Theory of "Information Resource"

Jonathan Rees
16 February 2011
http://w3.org/2001/tag/awwsw/2011/axioms-2011-02.html
Superseded by http://w3.org/2001/tag/awwsw/2011/axioms-20110217.html

Abstract

This document describes the author's understanding of what must be true of "information resources" (see AWWW) in order for them to fulfill the intent of web architecture. By focusing on requirements, expressed as logical axioms, we postpone or avoid the question "what is an information resource?", which inevitably leads to philosophical wheel-spinning and vexing paradoxes.

The main challenges are making "information resource" independent of the HTTP protocol, while saying what it means for one to be "on the web"; and understanding the meaning of metadata given the potentially arbitrary variability among the HTTP 200 responses to GETs of a single target "resource".

It is a purpose of this document to help explain "information resource" and the httpRange-14 rule, not to defend either. It is hoped that this treatment will be useful in future discussion of whether the rule needs modifying, and if so how.

Status of this document

This document presents the author's views, which have been developed with help from the informal AWWSW task force of the W3C TAG. It is presented as a first step toward consensus, initially within AWWSW and then in larger forums. Comments for the purpose of repairing deficiencies and building consensus are welcome.

This document is likely to be revised. When citing please look for the latest revision.

Introduction

TBD

As is appropriate for requirements, the axioms leave many details unspecified, which carries a calculated risk of misinterpretation. Readers are challenged to find creative ways to misinterpret the axioms so that missing axioms can be added.

The plan is to transcribe all the axioms into OWL-DL, so I can check for consistency and adequacy, but first I want to lay them out and see whether they make sense to reviewers.

Each axiom is accompanied by commentary. The commentary is meant to provide motivation and intent, but should be considered 'informative', not part of the axiom set.

Web-independent axioms

Our first challenge is to abstract "information resources" (roughly speaking, things that can be put "on the web") away from the Web.

There is a class of 'information resources'.

This is a troublesome one, so to be careful, please interpret the term as constrained by the following axioms, not according to your impulse. In particular, ignore the AWWW definition: while it may be consistent with what is said here, it is not sensible enough for that consistency to be clear.

We need this term in order to make sense of the intent behind AWWW and the httpRange-14 resolution, although I don't promise that an interpretation compatible with these writings exists.

To say 'class' is not to say 'natural category' or 'ontologically coherent class'. The most natural model of the axioms might well make this the Scylla that Alan Ruttenberg has been menacing us with. [citation needed]

For now just bear in mind: An information resource is whatever it has to be in order to explain how Web metadata seems to work now.

There is a class 'simple information resource' (or 'simple IR') that is a subclass of 'information resource'.

This is what you get when you GET, were the simple IR to be on the Web.

Introducing this term instead of 'representation' so that we can explain metadata; see below.

It is supposed to be consistent to assume that this class coincides with TimBL's class 'fixed resource'. Historically, you could take 'simple IR' to be the class of hypertext nodes of the early Web, before conneg and dynamic content came along and anyone worried about the resource/representation distinction.

If you are FRBR-inclined you might want to interpret 'simple IR' as a subclass of FRBR 'Manifestation'.

A 'simple IR' has designated parts, including its 'content' (a string, perhaps empty) and its 'media type' (a string, perhaps missing).

This axiom does not rule out other parts as well, such as content-language or expiration date, but that would depend on how you interpret these axioms.

Even given an enumeration of parts, a simple IR's identity is not determined - two simple IRs can have all the same parts yet have distinct origins (provenance). Compare FRBR 'Manifestation'. No need to go into detail here.

I'd permit 'octet sequence' as a particular kind of 'string'.
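
As a minimal sketch of this axiom (informative only, not part of the axiom set), a simple IR can be modeled in Python as a record carrying the two designated parts. The class name SimpleIR, the provenance field, and the parts() helper are assumptions made for illustration. Identity-based equality stands in for the point that a simple IR's identity is not determined by its parts.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based equality: same parts != same simple IR
class SimpleIR:
    content: bytes                      # a string (here an octet sequence), perhaps empty
    media_type: Optional[str] = None    # a string, perhaps missing
    provenance: Optional[str] = None    # hypothetical field; the axioms do not define it

    def parts(self) -> Tuple[bytes, Optional[str]]:
        """Return the designated parts named by the axiom."""
        return (self.content, self.media_type)


a = SimpleIR(b"chat", "text/plain", provenance="written by an English speaker")
b = SimpleIR(b"chat", "text/plain", provenance="written by a French speaker")

same_parts = a.parts() == b.parts()   # all designated parts agree
same_ir = a == b                      # yet the two objects remain distinct simple IRs
```

Other parts (content-language, expiration date) could be added as further optional fields without affecting the comparison above.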

There is a relation 'has reading' between 'information resources' and 'simple IRs'.

Not functional, because readings can vary by media type, language, session, time, whim, etc.

Possible interpretation: a serial publication, where the January issue (encoded as a particular octet sequence) is a reading of the serial in January, and the February issue is a reading of the serial in February.

Other example: an "abstract document" (or "generic resource"), with readings ("fixed resources") in particular languages.

Compare HTTP Content-location: header.

TBD: Whether this holds may depend on circumstances. For now you can imagine the relationship holds if any such circumstances exist.

'has reading' is reflexive on simple IRs, and total on IRs.

That is, a simple IR is its own and only reading, and every IR has at least one reading.
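
In first-order notation, this axiom, together with the subclass and relation declarations given earlier, might be written roughly as follows. The predicate names are illustrative choices, not taken from any published vocabulary.

```latex
% Predicate names (IR, SimpleIR, hasReading) are illustrative assumptions.
\begin{align*}
&\forall s\; \bigl(\mathrm{SimpleIR}(s) \rightarrow \mathrm{IR}(s)\bigr)
  && \text{(subclass)} \\
&\forall r\,\forall s\; \bigl(\mathrm{hasReading}(r,s) \rightarrow \mathrm{IR}(r) \land \mathrm{SimpleIR}(s)\bigr)
  && \text{(domain and range)} \\
&\forall s\; \bigl(\mathrm{SimpleIR}(s) \rightarrow \mathrm{hasReading}(s,s)\bigr)
  && \text{(reflexive on simple IRs)} \\
&\forall r\; \bigl(\mathrm{IR}(r) \rightarrow \exists s\; \mathrm{hasReading}(r,s)\bigr)
  && \text{(total on IRs)}
\end{align*}
```

Note that the "and only" in the commentary would correspond to a further formula, $\forall s\,\forall t\,(\mathrm{SimpleIR}(s) \land \mathrm{hasReading}(s,t) \rightarrow t = s)$, which reflexivity alone does not provide.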

There exists a simple IR that has a content-type.

There exists an IR that has at least two distinct readings.

What I mean is not simultaneously, as in the English/French example, but I don't have any way to express that in the axioms yet.

Connecting to other vocabularies

The goal here is to force consistency with (and explanation of) deployed use of dereferenceable URIs in RDF statements. To do this we need to tie these axioms into other ontologies.

Information resources are not in the domain of [...].

TimBL at least has argued that information resources, whatever they may be, do not have momentum or position, and are not mathematical. Forgetting ontological concerns, functionally the purpose of saying this is to help rule out misinterpretation. Nonsense invites misinterpretation and threatens interoperability.

TBD: Scavenge a list of these predicates from deployed ontologies.

On the other hand, even if this axiom is dropped, it ought to be pretty hard to interpret IRs to be anything other than appropriate metadata subjects, because of the metadata property and other axioms (below).

'simple IR' is a subclass of 'foaf:Document'

This seems both desirable and safe, given how foaf:Document is both defined and used.

This does not rule out the possibility that other 'information resources' are also foaf:Documents.

'simple IR' is in the domain of 'foaf:sha1', which is functional on it.

Presumably computed from the simple IR's content. This makes sense because that content is fixed - not varying by time or any other circumstantial variable.

Agnostic on how content-encoding figures into this.
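
A minimal sketch of why the property is functional: a hash computed from fixed content can have only one value. The function name sha1_of_content is hypothetical, and the sketch assumes the digest is taken over the raw content octets, which sidesteps the content-encoding question left open above.

```python
import hashlib


def sha1_of_content(content: bytes) -> str:
    """Hex SHA-1 digest of a simple IR's content (an octet sequence)."""
    return hashlib.sha1(content).hexdigest()


empty_digest = sha1_of_content(b"")
# Fixed content always yields the same digest, which is what makes the property functional.
stable = sha1_of_content(b"chat") == sha1_of_content(b"chat")
```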

Metadata properties

This set of three axioms creates a pattern that can be repeated for a large set of 'metadata properties' from FOAF, Dublin Core, Web Linking (RFC 5988), and elsewhere, substituting each property in turn for 'dc:creator'.

I can't define 'metadata property', nor would it be appropriate to do so in a set of requirements; the best I can do is that they are properties that 'make good sense' as providing information about simple IRs. In a future version of this report I may include a list of properties from the above ontologies that I would consider metadata properties. The more of these there are, the harder it becomes to misinterpret the axioms.

At least one simple IR is in the domain of 'dc:creator'.

Although this is weak, it helps prevent unreasonable misinterpretations of the intent.

Let R be an IR, S a simple IR, and W a member of the range of dc:creator. Then {R dc:creator W} and {R has reading S} imply {S dc:creator W}.

That is to say that a metadata property 'spreads' from an IR to all of its readings.

This is a strong statement, as it precludes using dc:creator on, say, a serial publication where different issues are created by different agents. On the other hand, without this axiom, metadata is not informative of readings and thus is neither falsifiable nor predictive.

Let R be an IR and W be a member of the range of dc:creator. If {S dc:creator W} holds for every simple IR S such that {R has reading S}, then {R dc:creator W}.

This is the converse of the previous axiom, and it says that invariant properties of readings must also hold for the IR - the IR must 'fess up' to things that all of its readings do. This rules out pathologies where, e.g., T is a dc:creator of all of a document's translations, but not of the IR itself. The practical benefit is that it lets you 'gamble' on hypotheses about an IR formed by investigating a number of its readings. You're not guaranteed to be right, but you may be willing to act on the hypothesis.
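
The two directions can be sketched over toy data as follows. All names and data here are hypothetical, and the sketch reads the converse axiom as quantifying over all readings of the IR, following the commentary ("things that all of its readings do").

```python
from typing import Dict, Set

# Hypothetical toy data: IR name -> names of its readings (simple IRs).
has_reading: Dict[str, Set[str]] = {
    "report": {"report.en.html", "report.fr.html"},
}
# Hypothetical dc:creator facts: subject -> set of creators.
creator: Dict[str, Set[str]] = {
    "report": {"alice"},
    "report.en.html": {"bob"},
    "report.fr.html": {"bob"},
}


def spread_to_readings(creator, has_reading):
    """{R dc:creator W} and {R has reading S} imply {S dc:creator W}."""
    out = {k: set(v) for k, v in creator.items()}
    for ir, readings in has_reading.items():
        for s in readings:
            out.setdefault(s, set()).update(creator.get(ir, set()))
    return out


def lift_to_ir(creator, has_reading):
    """If W is a dc:creator of every reading of R, then {R dc:creator W}."""
    out = {k: set(v) for k, v in creator.items()}
    for ir, readings in has_reading.items():
        if readings:  # totality of 'has reading' promises at least one reading
            shared = set.intersection(*(creator.get(s, set()) for s in readings))
            out.setdefault(ir, set()).update(shared)
    return out


after_spread = spread_to_readings(creator, has_reading)
after_lift = lift_to_ir(creator, has_reading)
```

In this toy data, spreading adds "alice" to both readings, and lifting adds "bob" to the IR because "bob" is a creator of every reading.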

Web-relating axioms

Not every IR needs to be 'on the web', but it should be possible to put many of them 'on'.

We don't require that every dereferenceable URI be related to an IR, but we do need to say what the special relationship is, when it exists.

(def) A simple IR is 'authorized for' a URI in certain circumstances iff it would be a correct result when dereferencing (sensu RFC 3986) the URI in those circumstances.

"Correct" means technically correct per consensus specification (e.g. RFCs) and recognized authorities (e.g. DNS). This would be hard to formalize, and I hope we won't have reason to.

'Authorized for' is similar to RFC 3986 'dereferences to' (inverse) but (a) we want to rule out the case of unauthorized dereference (system got hacked, etc.), and (b) no actual dereferencing act has to happen in order for this to hold (e.g. data: URIs?).

We do need to account for the circumstances of authorization. Not sure how to make formal. A particular simple IR might be authorized for a URI in a secure session with one user, but not in another. Or if the domain has multiple A records, the simple IR might be authorized when the request is processed by one server, but not by the other.

I'm pulling a fast one here since 'dereferences to' is defined in RFC 3986 to yield a 'representation' but here we need a 'simple IR'. The parts (content, media type) of the simple IR I have in mind are those of the representation. The provenance (or whatever) should be determined not arbitrarily, but in a way determined by the authorization trail. Perhaps this should be axiomatized.

(def) An 'information resource' is 'bound to' a URI iff every simple IR that is 'authorized for' the URI 'is a reading of' the information resource.

This formalizes "on the web".

TBD: Need to think about 'circumstances'.

Please check; this axiom may be too weak. Could it be that an authority, who means to bind the IR to the URI, just feels insecure and doesn't want to be held responsible for saying that a simple IR is a reading of the IR? And that they would be happy to authorize a reading after being convinced that it was indeed a reading?

To make 'bound to' inverse functional (probably impossible and perhaps undesirable), we would have to say which IR is bound, either as a least upper bound of the readings, or as determined by the authorization process. This is tough - this ought to be a matter of fact, independent of anything the URI owner says.

One case where 'bound to' might not be inverse functional is data: URIs, for the same reason that simple IRs are not the same as representations. For example, data:,chat could be independently generated by an English speaker and a French speaker, and intended to refer to different simple IRs depending on context. In this case the provenance of the simple IR would be the provenance of the URI itself. Or would these dereferencings be unauthorized? Hard to say.
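
Setting 'circumstances' aside, the definition of 'bound to' amounts to a subset test, sketched below over toy data. The URI, the IR names, and the two dictionaries are all invented for illustration.

```python
from typing import Dict, Set

# Hypothetical toy data; all names are invented for illustration.
# URI -> simple IRs that would be correct results of dereferencing it.
authorized_for: Dict[str, Set[str]] = {
    "http://example.org/serial": {"issue-jan", "issue-feb"},
}
# IR -> its readings.
has_reading: Dict[str, Set[str]] = {
    "serial": {"issue-jan", "issue-feb", "issue-mar"},
    "january-only": {"issue-jan"},
}


def is_bound_to(ir: str, uri: str) -> bool:
    """True iff every simple IR authorized for `uri` is a reading of `ir`."""
    return authorized_for.get(uri, set()) <= has_reading.get(ir, set())


serial_bound = is_bound_to("serial", "http://example.org/serial")
january_bound = is_bound_to("january-only", "http://example.org/serial")
```

Under this reading, any IR whose readings include all the authorized simple IRs is bound to the URI, which is one way to see why 'bound to' is not inverse functional as defined. A URI for which nothing is authorized would have every IR bound to it.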

There exists an 'information resource' that is bound to the URI "http://google.com/".

For any set {S1, S2, ...} of 'simple IRs' there exists an IR that has S1, S2, ..., as readings, and no others.

That is, a server operator can throw whatever he/she likes at us, and we will still be able to treat their URIs as being bound.

Still thinking about exactly whether/why this is needed. My intuition tells me that it is.

(def) An interpretation of an RDF graph 'respects IR bindings' if, for each dereferenceable URI occurring in the graph (outside of literals), the URI is interpreted to be an IR bound to that URI.

This is explicitly not an assertion that all interpretations 'respect IR bindings'. However it does let us express the first clause of the httpRange-14 rule (or rather what was intended by it) as "kindly use interpretations that respect IR bindings".

If this definition makes you formally queasy (that's you, Pat) try the following alternative: A satisfying interpretation 'respects IR bindings' if it is also satisfying for the graph formed by merging (a) the given graph, (b) a set of 'binding statements', one per dereferenceable URI occurring in the graph, and (c) an appropriate RDF axiom set derived from the above axioms. The 'binding statement' for a URI uuu is defined here to be the statement <uuu> :boundTo "uuu"^^xsd:anyURI.

('Satisfying' is relative to your choice of logic, of course.)

This would be simpler if we could define which IR is implicated for each URI, but this may be both impossible (the URI owner certainly won't be helping us out) and unnecessary (the IR is already constrained as far as observable behavior goes, what else do you want?).
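
The 'binding statements' of the alternative formulation could be generated along the following lines. This is a sketch under stated assumptions: triples are plain tuples of strings, literals are written with surrounding double quotes, and the test for dereferenceability is supplied by the caller because the axioms do not define one. The function name binding_statements and the example data are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # terms as strings; literals are written with surrounding quotes


def binding_statements(graph: Iterable[Triple],
                       is_dereferenceable: Callable[[str], bool]) -> List[str]:
    """One ':boundTo' statement per dereferenceable URI occurring in the graph."""
    uris = sorted({term
                   for triple in graph
                   for term in triple
                   if not term.startswith('"')      # skip literals
                   and is_dereferenceable(term)})
    return ['<%s> :boundTo "%s"^^xsd:anyURI .' % (u, u) for u in uris]


# Hypothetical example: pretend only this URI yields a 200 response when dereferenced.
known_200 = {"http://example.org/doc"}
graph = [
    ("http://example.org/doc", "http://example.org/vocab#creator", '"Alice"'),
    ("http://example.org/doc", "http://example.org/vocab#seeAlso", '"http://example.org/doc"'),
]
statements = binding_statements(graph, lambda term: term in known_200)
```

The URI appearing inside the literal in the second triple is skipped, matching the "outside of literals" clause of the definition.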

Test

TBD: transcribe to OWL, and derive an inconsistency from an example (toucan, flickr, jamendo, etc)