AwwswAnalysis

From W3C Wiki

Analysis around HTTP interaction semantics

Text mostly by JAR; exceptions are noted. Tim's rules have been modified significantly; his original version is at AwwswTimsRules. See also the N3 rules in AwwswDboothsRules.

We start with the question "what can be inferred from an HTTP interaction". By HTTP we mean HTTP/1.1 as defined by RFC 2616.

We are working in an RDF context so we take the question "what can be inferred" to be the same as "what RDF triples (statements) can be inferred". This is not a big leap since RDF is in principle expressive enough to encode anything that can be inferred.

By focusing on "what can be inferred" we might be able to avoid getting bogged down in questions of what a resource or information resource is. Instead we look at what we would like to be able to say about such things.

Recording the facts of the interaction

First of all, there are some concrete observations about the HTTP interaction that can be expressed in RDF, perhaps using an ontology such as that used by Tabulator (http://www.w3.org/2007/ont/http, http://www.w3.org/2007/ont/httph). These would be the premises from which inferences might be made. Some terms that one might find in an ontology around such observations:

  • http:Message (class) - things that have headers and optional content, etc.
  • http:RequestMessage (subclass of Message)
  • http:ResponseMessage (subclass of Message)
  • rfc2616:Entity (not a subclass of Message) - from RFC 2616: "The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body"
  • rfc2616:Representation (subclass of Entity) - "An entity included with a response that is subject to content negotiation"
  • rfc2616:Resource - "A network data object or service... Resources may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways."
  • rfc2616:get-response (domain rfc2616:Resource, range http:Response) - a request for a representation for the given resource resulted in this response
  • rfc2616:representation (domain rfc2616:Resource, range rfc2616:Representation) - the given resource is available in the given representation
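As a rough illustration of what "recording the facts" might look like, here is a minimal Python sketch that turns one observed GET exchange into subject/predicate/object triples using the terms above. The function name, the blank-node label "_:resp1", and the use of plain strings for prefixed names are assumptions made for the example; the hh: prefix for header fields follows the rules later on this page. It is not Tabulator's actual output.

```python
# Sketch only: record one observed GET exchange as (subject, predicate, object)
# triples. Prefixed names are kept as plain strings; nothing here is inferred.

def record_get(uri, status, headers, response_id="_:resp1"):
    """Return triples describing one GET response observed for `uri`.

    `response_id` is a hypothetical blank-node label for the response message.
    """
    resource = f"<{uri}>"
    triples = [
        (resource, "rfc2616:get-response", response_id),
        (response_id, "rdf:type", "http:ResponseMessage"),
        (response_id, "http:status", str(status)),
    ]
    for name, value in headers.items():
        # One triple per header field, e.g. hh:content-type, hh:location.
        triples.append((response_id, "hh:" + name.lower(), value))
    return triples
```

Calling record_get("http://example.org/doc", 200, {"Content-Type": "text/html"}) yields the premises from which the rules in the following sections could draw conclusions.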

David Booth would prefer to relate the URI (written with "..."), not the resource (written with <...>), to the GET response. This would be cleaner than what we have here, but I'll leave Tim's version in for now.

We do not rule out a separate, perhaps incompatible, set of classes and predicates corresponding to notions in AWWW.

Tabulator says the http:Response (not the rfc2616:Representation) "is access of" the resource, or "resource access response." This does not agree with RFC 2616, in which a representation is an entity, not a response. An entity is part of a response, but not all of it.

For reference, from RFC2616:

entity-header  = Allow             ; Section 14.7   = allowed http methods
	       | Content-Encoding  ; Section 14.11
	       | Content-Language  ; Section 14.12
	       | Content-Length    ; Section 14.13
	       | Content-Location  ; Section 14.14
	       | Content-MD5       ; Section 14.15
	       | Content-Range     ; Section 14.16
	       | Content-Type      ; Section 14.17
	       | Expires           ; Section 14.21
	       | Last-Modified     ; Section 14.29
	       | extension-header

Any other headers would not belong to the entity (which might or might not be a representation).
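To make the distinction concrete, the following Python sketch separates the headers of a response into those that belong to the entity and those that do not, using only the ten field names listed in the grammar above. The function name is hypothetical, and the sketch deliberately ignores the extension-header alternative, so it is a simplification rather than a full reading of RFC 2616.

```python
# Sketch only: the ten named entity-header fields from the RFC 2616 grammar.
# The extension-header alternative is NOT handled here (a simplification).
ENTITY_HEADERS = {
    "allow", "content-encoding", "content-language", "content-length",
    "content-location", "content-md5", "content-range", "content-type",
    "expires", "last-modified",
}

def split_headers(headers):
    """Split a header dict into (entity_headers, other_headers)."""
    entity, other = {}, {}
    for name, value in headers.items():
        # Header field names are case-insensitive, so compare in lower case.
        target = entity if name.lower() in ENTITY_HEADERS else other
        target[name] = value
    return entity, other
```

Under this reading, Content-Type would be recorded as a property of the entity, while fields such as Server or ETag would be recorded against the response message.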

Inferences from the status-line and response-headers

httpRange-14 resolution

AWWW effectively extends RFC2616 from the case where the URI "identifies" either a rfc2616:Resource or nothing to the case where the URI "denotes" anything at all, maybe a rfc2616:Resource (roughly speaking an "information resource" in AWWW) but maybe not. The purpose of httpRange-14 seems to be to request that any given server produce a rfc2616:Representation only when, according to information from the naming authority coming over an independent channel, the URI "denotes" something that can meaningfully have one, i.e. an "information resource" (rfc2616:Resource).

httpRange-14 reads as follows:

  • If an "http" resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource;
  • If an "http" resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource;
  • If an "http" resource responds to a GET request with a 4xx (error) response, then the nature of the resource is unknown.

(We find the phrase "resource responds to a GET request" incorrect, and would prefer "If an HTTP server responds to a GET request". See ErrataHttpRange14.)
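Read as a decision table, the three clauses of the resolution could be sketched in Python as below. The function name and the returned strings are assumptions made for illustration; the resolution itself says nothing about status codes outside the three cases, which the sketch reflects by returning None.

```python
# Sketch only: the three httpRange-14 clauses as a lookup on the GET status code.

def httprange14_conclusion(status):
    """Return what httpRange-14 lets us say about the resource, or None."""
    if 200 <= status <= 299:
        return "information resource"
    if status == 303:
        return "could be any resource"
    if 400 <= status <= 499:
        return "nature unknown"
    # Other codes (e.g. 301, 302, 5xx) are not covered by the resolution.
    return None
```

Note that only the 2xx clause licenses a positive conclusion; the other two clauses amount to saying that nothing follows about the kind of resource.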

(Tim) Draft format for rules .. does this work? Ontology to be refined!

(Tim as modified by JAR) Architectural rule:

{ ?x  rfc2616:get-response ?y.  ?y http:status "200" } => { ?x a awww:InformationResource }.


  • JAR: a rfc2616:representation cannot have a status, so I change http:representation to rfc2616:get-response.
  • JAR: it appears that link:Document is the same as AWWW's "information resource", so I will use the latter.

(Tim) Ontological assumptions that allow us to infer a problem are of course open but include:


   link:Document owl:disjointWith foaf:Person, doap:Project, ical:Event. # etc etc


Redirection

(Tim) Architectural rule:

The 301 (Moved Permanently) and 302 (Found) redirections point to a different URI for the resource, so conclusions about the second resource can, to a certain extent, be carried over to the first resource.

{ ?x  rfc2616:get-response ?y.  ?y http:status "301"; hh:location ?z.  ?z a awww:InformationResource }
  => { ?x a awww:InformationResource }.

{ ?x  rfc2616:get-response ?y.  ?y http:status "302"; hh:location ?z.  ?z a awww:InformationResource }
  => { ?x a awww:InformationResource }.

There may, however, be other classes for which this does not work. Take the class of information resources that do not change with time, for example: a URI x can 302 to a URI y denoting a time-invariant IR, but that doesn't imply that x denotes a time-invariant IR.

(JAR has reworded the previous sentence.)
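For readers who want to see the 200 rule and the two redirection rules applied mechanically, here is a small forward-chaining sketch in Python. It works over triples held as plain string tuples, writes the N3 keyword "a" as rdf:type, and assumes each response node carries a single http:status and at most one hh:location; the function name is hypothetical. It is an illustration of the rules as written on this page, not a general N3 reasoner.

```python
# Sketch only: apply the 200 rule and the 301/302 redirection rules until no
# new awww:InformationResource typings appear. Triples are (s, p, o) strings.
IR = "awww:InformationResource"

def infer_information_resources(triples):
    """Return the input triples plus every IR typing the rules derive."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        status = {s: o for s, p, o in facts if p == "http:status"}
        location = {s: o for s, p, o in facts if p == "hh:location"}
        typed_ir = {s for s, p, o in facts if p == "rdf:type" and o == IR}
        for x, p, y in list(facts):
            if p != "rfc2616:get-response":
                continue
            code = status.get(y)
            ok_200 = code == "200"
            ok_redirect = code in ("301", "302") and location.get(y) in typed_ir
            new = (x, "rdf:type", IR)
            if (ok_200 or ok_redirect) and new not in facts:
                facts.add(new)
                changed = True
    return facts
```

Given a 301 from one URI to a second URI that answers 200, the sketch types both as information resources; a 303 to the same target derives nothing, matching the rules above.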

Caching

Caching information gives certain promises about other responses that might be received in the future. This is cumbersome to express in RDF and will be left as an exercise.

Inferences from the representation

Does the representation (entity) tell you anything at all about the resource? Not clear. RFC 2616 is silent on this except to imply that the resource (which is only visible through its representations) is "data" and that the representation represents this data.

The representation "conveys" the resource

I would think that some clients might want to use the representations to figure out what the URI denotes. A reasonable deduction from a representation might be that the URI denotes some abstract idealization of the "message" carried by the representation, one that may vary by language, format, resolution, etc. but always "says the same thing" as the sample representation. For example, if the representation says "Moscow is the capital of Russia", one might want to conclude that the URI denotes a document that says that Moscow is the capital of Russia. Note that this is different from believing the representation.

I (JAR) believe that this is the intent of RFC 2616.

Question: Suppose I get a representation R for a resource X, and then set up a web server that delivers that representation R for a given URI U. Can we infer that U denotes X? (I.e. is the resource identified with what is conveyed by R, or is it identified with the network source that delivers R? See the "Change" section below.)

Believing the content

If the representation's content is RDF (or RDFa), and if the agent is willing to "believe" (i.e. assert) that RDF, then in a sense one can infer those RDF statements from the representation. This would of course be subject to the client's trust policy and to temporal limitations on the viability of the content.
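One way to picture the two conditions is the Python sketch below, which accepts the triples parsed from a representation only when the source URI is on a trusted list and the content has not expired. The function name, the idea of a simple set of trusted URIs, and the use of a single expiry time (for instance taken from the Expires header) are all assumptions made for illustration; a real trust policy could be far richer.

```python
# Sketch only: "believe" parsed triples subject to a trust list and an expiry.
from datetime import datetime, timezone

def believed_triples(source_uri, triples, trusted_sources, expires=None, now=None):
    """Return the triples the agent is willing to assert, possibly none."""
    if now is None:
        now = datetime.now(timezone.utc)
    if source_uri not in trusted_sources:
        return set()          # trust policy: unknown sources are not believed
    if expires is not None and now >= expires:
        return set()          # temporal limit: the content is no longer viable
    return set(triples)
```

The sketch treats belief as all-or-nothing per representation; an agent could equally well filter individual statements.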

Gleaning the reference of fragment identifiers

By looking at the received representation and understanding the specification for its MIME type, one may be able to determine the denotation of URIs that consist of the resource's URI plus a fragment identifier defined in the representation. [example]
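As one hedged illustration for the text/html case, the Python sketch below collects the id attributes found in an HTML representation and pairs each resulting fragment URI with the element that carries the id. It assumes the common reading that, for HTML, a fragment identifier refers to the element with the matching id; other media types (RDF/XML, for instance) define fragments differently and would need their own treatment. The class and function names are hypothetical.

```python
# Sketch only: map fragment URIs to the HTML elements whose id defines them.
from html.parser import HTMLParser

class _IdCollector(HTMLParser):
    """Collect (id value, tag name) pairs from start tags."""
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append((value, tag))

def fragment_uris(base_uri, html_text):
    """Return {fragment URI: tag name} for every id in the representation."""
    collector = _IdCollector()
    collector.feed(html_text)
    collector.close()
    return {f"{base_uri}#{frag}": tag for frag, tag in collector.ids}
```

For a representation of http://example.org/doc containing an h2 element with id "intro", the sketch reports that http://example.org/doc#intro refers to that h2 element.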

Gleaning link structure

(Tim) Architectural rule:

Here ?z link:mentions ?x is true if, for example, a representation of ?z parses to an RDF graph which includes a node whose URI is ?x. Other languages could have similar definitions.

{ ?x  rfc2616:get-response ?y.  ?y http:status "301". ?z link:mentions ?x}
  => { ?z link:warning "Document uses a URI which has moved" }.
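The same rule can be sketched in Python over triples held as plain string tuples. The function name is hypothetical, and the sketch assumes each response node has a single http:status; it simply emits one link:warning triple for every document that mentions a URI whose GET response was a 301.

```python
# Sketch only: warn about documents that mention a URI which answered 301.
WARNING = "Document uses a URI which has moved"

def moved_uri_warnings(triples):
    """Return the set of link:warning triples derived by the rule above."""
    facts = set(triples)
    status = {s: o for s, p, o in facts if p == "http:status"}
    moved = {
        x for x, p, y in facts
        if p == "rfc2616:get-response" and status.get(y) == "301"
    }
    return {
        (z, "link:warning", WARNING)
        for z, p, x in facts
        if p == "link:mentions" and x in moved
    }
```

A link checker could run such a rule over the URIs mentioned in a document and report the warnings to its author.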


Change

Possible attitudes - nonexclusive - toward variability in GET responses (representations) over time (e.g. http://news.google.com/):

  • We form RDF that is specific to the moment (limited by declarations in the caching headers), and do not worry about anything we infer being true beyond that moment. For example, other things being equal, we would not put anything we infer from an HTTP interaction into a durable knowledge base.
  • The reference to the resource in the GET request is indexical. Different times, different resources.
  • The rfc2616:representation relationship is time-varying. A single resource just has different representations at different times. (This is the position taken by the REST philosophy and seems consistent with David Booth's definition of information resource as a "network source of representations". In this case "representation" is similar to "state".)
  • We infer nothing from HTTP interactions, unless we have specific reason to think we can.

The notion of a resource being an abstract message with more or less faithful representations is ontologically incompatible with the REST notion of a resource as a container for states, although how this dissonance plays out in practice is not clear.

Things that one would like to know that can't be inferred from the HTTP interaction

Metadata

One wants to know author, publisher, etc. even in cases where this information is not carried in the representations.

Stability

One wants to know how the representation stream is likely to change in the future - e.g. will it start to carry different messages at some point, or only change in minor ways such as formatting, or is it completely stable.

Versioning

One wants to be able to discover other resources from which this one derives through lineage and merging chains. One wants to be able to obtain URIs for stable resources that have the representations that the given resource has at the moment, but might not have later.

See Also