Towards Formal HTTP Semantics - AWWSW Report to the TAG

Jonathan Rees, David Booth, Michael Hausenblas
4 December 2009

A report of the AWWSW Task Force.

Abstract

Discussions of Web architecture and the semantics of communication on the Web get mired in confusion over definitions and how different conceptual models of Web resources compare. By creating a modest formal semantics we hope to be able to put such discussions on a rational footing and promote transparency by surfacing as many assumptions as possible. This should in turn help point to a way forward for work on a variety of architectural and curatorial questions.

We are speaking here not about the details of Web protocols, but about what these protocols say about resources "identified" by URIs, and how what is transmitted relates to these resources.

Application areas for a formal semantics of HTTP include, but are not limited to, validators (such as Vapor) and debug tools, generic data browsers (for example Tabulator), server access log analysis (see e.g., Linked Data Access Analysis), persistent caches, documenting properties of resources such as stability, and handling the provenance of web artifacts (in particular for scientific applications).

We assume the reader is familiar with RFC2616, RFC3986, the AWWW I, @@TODO what else?

Introduction

Although the Web has expanded well beyond its roots in hypertext-oriented interactions, such interactions are still at the core of much that goes on, and describing these interactions accurately remains an open question with relevance to protocol design, metadata semantics, testing, accountability, authorization, provenance, policy, and a variety of other related questions. It is therefore of interest to consider formalizing such interactions.

As HTTP remains the canonical embodiment of Web interactions, we focus on the semantics of HTTP exchanges, although having done so we'll be able to talk about other situations as well. HTTP is specified by RFC 2616 but is currently under revision ("HTTPbis"). We take the October 26, 2009 draft of the semantics section of HTTPbis as our point of reference, as it resolves a number of sticky issues in RFC 2616 that we would otherwise be forced to address. If important differences emerge as HTTPbis proceeds, we will track them.

While the target resource seems to be the center of attention in HTTP, RFC 2616 says so little about it that formalizing what it does say is essentially trivial. The semantics given here, which is derived from HTTPbis, consists of little more than a single "correspondence" relationship that holds between a "resource" and a "content entity". The relationship varies over time, i.e. a content entity that corresponds to a resource at one time may not correspond to it at another time. It does not take much imagination to rewrite this as a three-place logical relation corresponds_to(E,R,t).

Drilling down on this:

resource is not defined in HTTPbis, but we take it to be adequately accounted for by RFC 3986 "whatever might be identified by a URI". This differs from the RDF sense "anything in the universe of discourse", but the difference is probably inconsequential, so we will not pay attention to it.
content entity is the "entity" syntactic category defined in HTTPbis: a set of entity-headers coupled with an entity-body. Although this is clearly similar to "representation" in AWWW (see below), we avoid using "representation" both because HTTPbis uses "representation" differently from AWWW, and because the properties of representations have been called into question - can they be stored (as opposed to just transmitted)? Should they ever be named? With "content entities" the answers are yes and yes.
corresponds to as we use it comes from the definition of the 200 status code: "an entity corresponding to the requested resource is sent in the response". "Corresponds to" is not defined by the RFC or by HTTPbis and therefore in effect takes whatever meaning it might have [Harry, can you help here?] from whatever practices have evolved around what little the RFC does say about this relationship. (For 2616 to have said anything more would have constituted restrictions on how servers are allowed to respond based on some theory of resources, which in the absence of enforcement or social pressure would have been fruitless.) (Thanks to Pat Hayes for liberation.)

It is worth enumerating some of the interesting questions that we will not address here:

The mechanics of the protocol - how agents exchange HTTP messages.
The syntax of HTTP messages (see HTTP Vocabulary in RDF).
Error conditions such as 4xx and 5xx responses.
How servers and clients discover general information about resources (by "nose following" or reading "description resources"), thereby forming an idea of their "identity".
Provenance and trust - who said what and whether it should be believed.

The problem of time in RDF

Time matters in HTTP, both because the spec expends considerable energy developing a theory of caching (validity through time), and because questions of change, versioning, and currency are central to some important applications.

In order to render the three-place correspondence relation in RDF, we require some treatment of time. RDF itself is completely neutral on this question. There are two approaches, and we'll use both:

Spacetime worm

We can postulate the existence of particular things - we might call them "correspondences" - such that corresponds_to(E,R,t) if and only if there exists a correspondence that is "of" the content entity E "to" the resource R "at" time t. A correspondence is "of" only a single content entity ("of" is functional) and "to" a single resource, but it can endure through time, i.e. it can be "at" multiple times. See Wikipedia on four-dimensionalism, or compare "qualities" in BFO [citation needed].
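As a rough illustration of the spacetime-worm reading, the following Python sketch treats each correspondence as a record with a closed lifetime interval and derives corresponds_to(E,R,t) from it. The class name, field names, numeric times, and the example URI are assumptions made for illustration only; they are not part of the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One 'worm': a content entity corresponding to a resource over [start, end]."""
    of_content_entity: str  # hypothetical identifier for the content entity E
    to_resource: str        # hypothetical identifier (e.g. a URI) for the resource R
    start: float            # earliest time at which the correspondence holds
    end: float              # latest time at which the correspondence holds

    def held_at(self, t: float) -> bool:
        # The correspondence is "at" every time in its lifetime.
        return self.start <= t <= self.end


def corresponds_to(worms, entity, resource, t):
    """corresponds_to(E,R,t) iff some correspondence is of E, to R, and at t."""
    return any(
        w.of_content_entity == entity and w.to_resource == resource and w.held_at(t)
        for w in worms
    )


worms = [Correspondence("E1", "http://example.org/doc", 10.0, 20.0)]
```

Because each record names exactly one content entity and one resource, "of" and "to" are functional by construction, while a single record covers many times.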

Different URI interpretations for different times

Alternatively, we can just drop the time variable entirely and write the RDF equivalent of corresponds_to(E,R). If we want to talk about change through time, we simply reinterpret the URIs in our RDF graph(s), i.e. we interpret the URI for the corresponds_to() relation as being corresponds_to(_,_,t1) at one time, corresponds_to(_,_,t2) at another, and so on. (To be clear, the graph could include a statement "time has-value DDDDD" communicating that time has a fixed value in any interpretation.) This breaks down if we want to talk about states of affairs at different times that are inconsistent with one another, and there is a real danger that the client of an RDF graph may make mistakes about which time the graph was meant to be "about". In some RDF applications time is not an issue, and time-oblivious representations of this kind may be OK.

HTTP exchanges

The primary purpose of an HTTP GET exchange is to enable the communication and deduction of correspondence relationships between resources and entities. Although we don't propose to formalize the exchanges themselves, let's review what happens in an exchange process:

A client wants to learn a correspondence between a resource and an entity.
The client selects a name (a URI) for the resource. The URI need not be an http: URI.
The client composes a GET request, chooses a server (not necessarily one named in the URI), and sends the request to the server.
The client hopes, or expects, that whoever handles the request will recognize the URI as a name for the resource that the client wishes to learn about. This usually goes without saying, but misunderstandings are always a lurking threat.
The server interprets the target-URI in the request as naming some resource, preferably the one the client is asking about.
The server attempts to locate or synthesize an entity that corresponds to the resource.
An appropriate response (200, 304, 307, etc.) is sent, from which the client may be able to infer a correspondence, and/or other information.

Ontology

In what follows, the "target resource" is what the HTTPbis draft calls the "resource identified by the request-target [URI]" - that is, the resource that the request and response messages are "talking about". We assume that the server and client are talking about the same resource (or if they aren't, that it doesn't matter to either).

Namespaces:

ht:	<http://www.w3.org/2001/tag/awwsw/http#>
rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
owl:	<http://www.w3.org/2002/07/owl#>
other:	...

Content entities

ht:ContentEntity: Entity-headers + entity-body; see above. Although for simplicity we assume these use the syntax specified by HTTP, the HTTP protocol is not assumed. For modeling purposes, content entities transmitted by other means could be mapped into this syntax, or else additional members could be admitted to this class, with HttpContentEntity a subclass. TBD. A detailed account might use HTTP Vocabulary in RDF.

Correspondences

To render the framework in RDF, we take the spacetime-worm approach and posit the existence of entities called "correspondences" (see above). The exact nature of correspondences is not important; they are defined by their formal properties.

ht:Correspondence (class): A member of the ht:Correspondence class is the correspondence of a particular content entity to a particular resource over some continuous interval of time. At any given time many correspondences may hold for a given resource (i.e. many content entities can simultaneously correspond to a resource), and many ht:Correspondences may hold for a given ht:ContentEntity (i.e. a single content entity may correspond to many resources).
ht:ofContentEntity (property): Relates a ht:Correspondence to the ht:ContentEntity that does the corresponding. Functional. Domain: ht:Correspondence. Range: ht:ContentEntity.
ht:toResource (property): Relates a ht:Correspondence to the resource to which the ht:Correspondence's ht:ContentEntity corresponds. Functional. Domain: ht:Correspondence.
ht:heldAt (property): Relates a ht:Correspondence to a time at which the correspondence holds. This relation might be inferred, for example, from the Date: or Last-Modified: header of a 200 response to a GET.
ht:holdsUntil (property): Relates a ht:Correspondence to a time just before which the correspondence holds. The end time of the ht:Correspondence is no earlier than this time. This relation might be inferred, for example, from the Expires: header of a 200 response to a GET.

[David: I think the ont would be clearer if the n-ary relations are made more explicit, perhaps with a diagram.]

To assist reasoning we impose a continuity requirement on correspondences: The lifetime of each one must form a continuous interval. The continuity condition enables the following inference rule:

if C ht:heldAt t1, and C ht:holdsUntil t3, and t1 < t2 < t3, then
C ht:heldAt t2, and C ht:holdsUntil t2.

If a content entity corresponded to a resource at t1 and t3 but not t2 (in between them), there would exist two ht:Correspondences, one whose lifetime contained t1 (but not t2) and the other whose lifetime contained t3 (but not t2).
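The continuity rule can be sketched as a small Python check over the recorded observations for one correspondence C. This is only an illustration: the function name and the use of plain numbers for times are assumptions, and the upper bound is read as either a later ht:heldAt observation or an ht:holdsUntil bound.

```python
def infer_held_at(held_at_times, holds_until_times, t2):
    """Return True if continuity lets us conclude that C held at time t2.

    held_at_times:     times t recorded as 'C ht:heldAt t'
    holds_until_times: times t recorded as 'C ht:holdsUntil t'
    """
    if t2 in held_at_times:
        return True  # directly observed, nothing to infer
    # Some observation strictly before t2 at which C certainly held.
    earlier = any(t1 < t2 for t1 in held_at_times)
    # Some bound strictly after t2 up to which C is known to hold.
    later = any(t3 > t2 for t3 in held_at_times) or any(
        t3 > t2 for t3 in holds_until_times
    )
    # A continuous lifetime containing both bounds must also contain t2.
    return earlier and later
```

With a Date-derived observation at time 10 and an Expires-derived bound at time 30, the check succeeds for 20 and fails for 40, where nothing brackets the query time.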

Applications that needn't keep track of what is true beyond a single point in time might use:

ht:correspondsTo (property): True when a content entity corresponds to a resource. This is a time-sensitive relation. Domain: ht:ContentEntity.

Coincidences

We need a way to account for 300 Multiple Choices, 307 Temporary Redirect, and the Content-Location: header. All of these authorize the deduction of corresponds_to(E,R,t) from corresponds_to(E,S,t) where S is the "second resource" as named in the HTTP Location: or Content-Location: header. It's hard to find a good name for the relation between R and S, so for now we'll call it "coincidence". Thus the premise that is encoded in a 307 response or a Content-Location: header that enables the deduction of corresponds_to(E,R,t) is written coincides_with(S,R,t), that is:

If coincides_with(S,R,t) and corresponds_to(E,S,t),
then corresponds_to(E,R,t).

A "coincidence" does not imply delegation from one resource to another; it is a server that delegates to another server (chosen by the client!). But knowledge of the coincidence is what justifies the server in sending the 307 response; it is the server's way of expressing the particular coincidence.

Another way to understand 307 redirects is through time-varying RDF interpretations: the two URIs would be interpreted to be the same resource during the period of coincidence, and as different resources at other times. This approach does not lend itself to reasoning about change in correspondences over time.

Like correspondences, coincidences are time-bound (as expressed by the cache parameters supplied by the server). This means that a correspondence to the target-resource R only holds in the intersection of the lifetime of the S-R coincidence and the lifetime of the correspondence of E to S.
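The coincidence rule, together with the lifetime intersection just described, might be sketched in Python as follows. The tuple layouts, function name, and numeric times are illustrative assumptions rather than anything defined by HTTP or by the ontology.

```python
def derive_correspondences(correspondences, coincidences):
    """Apply: coincides_with(S,R,t) and corresponds_to(E,S,t) => corresponds_to(E,R,t).

    correspondences: list of (entity, resource, start, end)
    coincidences:    list of (second_resource, target_resource, start, end)
    Returns derived (entity, target_resource, start, end) tuples whose lifetime
    is the intersection of the two input lifetimes.
    """
    derived = []
    for second, target, co_start, co_end in coincidences:
        for entity, resource, start, end in correspondences:
            if resource != second:
                continue  # this correspondence is not to the second resource
            lo, hi = max(start, co_start), min(end, co_end)
            if lo <= hi:  # lifetimes overlap, so the derived one is non-empty
                derived.append((entity, target, lo, hi))
    return derived


# E corresponds to S during [0, 10]; S coincides with R during [5, 20].
result = derive_correspondences([("E", "S", 0, 10)], [("S", "R", 5, 20)])
```

In this example the derived correspondence of E to R holds only during [5, 10], the overlap of the two lifetimes.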

ht:Coincidence (class): As described above, and analogous to Correspondence.
ht:ofTargetResource (property): The resource "receiving" correspondences.
ht:ofSecondResource (property): The resource "donating" correspondences. If the coincidence holds at time t, and C is a correspondence of E to the second resource holding at time t, then there is a correspondence of E to the target resource that holds at time t.
ht:heldAt, ht:holdsUntil (properties): These properties also apply to coincidences.
ht:coincidesWith (property): Time-sensitive.

What this semantics is careful not to say

RFC 2616 says much less than AWWW and some other treatments do.

Entities do not necessarily "come from" resources. Maybe they do, but we don't know that. The server may be the one synthesizing the content entity based on what it knows about the resource, without any physical connection with the resource itself. (Example: Tim's "Platonic" resources, such as Persuasion.) (See also various writings of Pat Hayes relating denotation to access, e.g. this one.)

Content entities are not necessarily "determined by" resources. The bits therein may derive from the resource, the server, some other information source, or some combination. For example, a server may know how to do an automatic translation between languages or formats, and consult an external dictionary to do so; or it may decorate content with advertising that is unrelated to the resource.

A resource is not "located at" a single server. Different servers can simultaneously provide correspondences for the same resource. Moreover, the correspondences offered for the same resource at the same time (even for the same HTTP request) might be different. For example, one server might give English content entities, while another might give French ones.

Resources are not determined extensionally. That is, A and B can have the same correspondences and coincidences, and still be distinct.

Correspondence doesn't mean anything in particular. If servers and clients are to get any meaning from correspondences, they'll have to possess theories beyond what's given here that allow them to draw conclusions from them. That is, it might be the case that some E does not correspond to R, but to determine that you would need to have a theory of "correspondence" for whatever kind of resource was in question.

Falling trees make sounds even if no one hears them. Resources that are unknown to servers, correspondences that are never expressed in HTTP exchanges, content entities that are never transmitted in messages are all possible here, and might even find practical applications in reasoning about systems.

Pathologies of theory aren't ruled out. Pathological cases such as a "top" resource (one having a correspondence to every ContentEntity at all times) are not ruled out, although it sure would be nice to be able to do so.

In order to rule out any of these situations, you have to make assumptions beyond those that RFC 2616 makes. One way this might be done would be by defining particular classes of resources that behave in an orderly way.

Interpreting HTTP exchanges

The goal here is a partial interpretation of HTTP exchanges into the above, rendering the interpretation in RDF.

A GET request might be interpreted as a request for information about the resource, and rendered as a SPARQL query; indeed the information needed by a client could in theory be provided by a SPARQL endpoint instead of by HTTP directly ("HTTP over SPARQL").

Response interpretation for GET/200 exchanges

A 200 response to a GET could be rendered as

  [a ht:ContentEntity; 
   ht:correspondsTo <target-URI>;
   {properties of content entity ...}]

in a time-oblivious application, or as

  [a ht:Correspondence;
   ht:ofContentEntity [a ht:ContentEntity; 
                       {properties of content entity ...}];
   ht:toResource <target-URI>;
   {lifetime information}]

in a time-aware application.

"Properties of content entity" could be simply its syntactic properties as described using something like HTTP Vocabulary in RDF.

"Lifetime information" would be as follows:

Date: DDD	C ht:heldAt "conversion of DDD".
Last-Modified: DDD	C ht:heldAt "conversion of DDD".
Expires: DDD	C ht:holdsUntil "conversion of DDD".
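The header table above could be applied mechanically along the following lines. This is a hedged sketch: the function name, the triple layout, and the example header values are assumptions, dates are parsed with the Python standard library's email.utils.parsedate_to_datetime, and no clock-skew correction is attempted.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Header name (lower-cased) -> lifetime property, following the table above.
HEADER_TO_PROPERTY = {
    "date": "ht:heldAt",
    "last-modified": "ht:heldAt",
    "expires": "ht:holdsUntil",
}


def lifetime_statements(headers, correspondence="C"):
    """Turn the date headers of a 200 response into (subject, property, time) triples."""
    statements = []
    for name, value in headers.items():
        prop = HEADER_TO_PROPERTY.get(name.lower())
        if prop is None:
            continue  # header carries no lifetime information
        statements.append((correspondence, prop, parsedate_to_datetime(value)))
    return statements


# Hypothetical response headers, for illustration only.
example_headers = {
    "Date": "Fri, 04 Dec 2009 12:00:00 GMT",
    "Expires": "Sat, 05 Dec 2009 12:00:00 GMT",
    "Content-Type": "text/html",
}
statements = lifetime_statements(example_headers)
```

Here "conversion of DDD" is taken to be a timezone-aware datetime; an application that serializes to RDF would presumably emit an xsd:dateTime literal instead.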

Some care needs to be taken in the interpretation of dates. If server times are to be given any meaning other than as ordering events on that particular server, preparations for clock skew need to be made, such as corrections to server-reported time to produce a time you think is adequately correct.

Response interpretation rules for GET/30x exchanges (including 301, 304, 307)

RFC 2616 seems to suggest that 301 means that a new URI has been given to the resource. It is not clear just how far this can be taken - perhaps there are two resources that are indistinguishable now, but might have been distinguishable in the past. Some caution should be exercised. We might record a 301 interaction with an owl:sameAs assertion.

A 307 response can be interpreted to be an assertion of a "coincidence" (above). The cache parameters of the 307 response provide the lifetime of the coincidence.

A 304 response is very similar to a 200 response, except that the content entity is one returned in a previous 200 response, not the one carried by the 304 response.

Rules for PUT, DELETE, etc.

A PUT/200 exchange tells you that the content entity in the request corresponds to the resource - or at least it did at the time of the exchange. PUT/201 tells you that the resource did not exist before the exchange.

A DELETE/200 exchange tells you that the resource does not exist after the exchange.

Resource existence is not modeled here, but it could be.

Applications of the ontology

Although there is very little of substance in the above, it still provides some structure for analyzing and comparing other theories of web interactions.

REST

Fielding offers us REST as a software architecture. Not coincidentally (as REST was developed in part to aid in the revision of the HTTP RFC), REST maps directly into our theory: the "temporally varying membership function" that he uses to model "resource" is, at a given time t, membership in the set of content entities E for which corresponds_to(E,R,t).

Although his formal model implies that the membership function is all there is to a resource, it is clear that he also attaches additional semantics to a resource, which means that, contrary to the formal model, there could be two resources with the same temporally varying membership function that are in fact not the same resource. It is not at all clear whether in his theory there is anything that is not a resource, but this is an ontological question that one would not expect to be addressed by a software architecture.
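To make the mapping concrete, the membership function can be read off a finite set of corresponds_to facts. The Python sketch below is illustrative only: the fact layout (entity, resource, time) and the names "A", "B", "E1", "E2" are assumptions.

```python
def membership(facts, resource, t):
    """The set of content entities E such that corresponds_to(E, resource, t)."""
    return {e for (e, r, when) in facts if r == resource and when == t}


# Two differently named resources with identical correspondences at every
# recorded time.
facts = {
    ("E1", "A", 1), ("E1", "B", 1),
    ("E2", "A", 2), ("E2", "B", 2),
}
same_function = all(
    membership(facts, "A", t) == membership(facts, "B", t) for t in (1, 2)
)
```

Nothing in the facts forces "A" and "B" to denote the same resource even though same_function is true, which is the gap between the formal model and the additional semantics noted above.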

AWWW

"Representation" in AWWW, defined as "data that encodes information about resource state", is similar to "content entity" as used here, but makes a stronger ontological commitment; it says that there are resources that have state, and that a representation encodes such state. HTTP makes no such commitment; it only says that a content entity "corresponds" to the resource (at some time), without elaborating.

To treat the ontology implied by AWWW pedantically, one would have to define a class of AWWW-representations (perhaps: E is an AWWW-representation if E is a content entity and there exists a resource R such that E encodes some state of R) and a class of AWWW-representation-enabled-resources (perhaps: those that have state that can be encoded). It is not clear how one would decide whether a resource has state (numbers don't, but people do?) or even whether the definition holds up to scrutiny (Persuasion by Jane Austen has representations, but what/where is its state?).

Genont (TimBL)

There are two documents, the "design issue" memo and an RDF file. (Both of these change from time to time, so the comments below may become stale.)

We can model static aspects of this account as a "generalizes" relationship on resources:

R generalizes S iff for all t, corresponds_to(E,S,t) implies corresponds_to(E,R,t)

Suppose that R is example 2 from the memo: The Bible, King James Version, and S is example 3: The Bible, KJV, in English. R generalizes S because every content entity corresponding to S also corresponds to R (if it is a KJV in English, it's a KJV).
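Over a finite set of recorded facts, the "generalizes" relation can be checked directly. The sketch below is an illustration under stated assumptions: facts are (entity, resource, time) tuples, "for all t" is approximated by the times that actually appear in the facts, and the identifiers are placeholders rather than names from the memo.

```python
def generalizes(facts, r, s):
    """R generalizes S iff every recorded (E, S, t) has a matching (E, R, t)."""
    return all((e, r, t) in facts for (e, res, t) in facts if res == s)


# "E1" stands for an English KJV text; "E2" for some other rendition that
# corresponds to R only (both hypothetical).
facts = {
    ("E1", "S", 1), ("E1", "R", 1),
    ("E2", "R", 1),
}
```

In this example generalizes(facts, "R", "S") is true, while generalizes(facts, "S", "R") is false because E2 corresponds to R but not to S.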

This idea does not necessarily carry over to the relationship between a time-varying document and a particular time-invariant document than which it is supposed to be more generic. Suppose that R is The Wall Street Journal (which varies with time) and S is a particular issue of it. Now I have not really told you very clearly what R is supposed to be. If R really generalizes S then we would expect R to generalize every other issue S' as well, since R has no special relationship to S. However, if I told you that a given URI named the Wall Street Journal, then I bet you'd be disappointed if a web cache gave you an issue of the WSJ that was several years out of date. R defined casually as "The Wall Street Journal" is more likely to mean something to which only content entities that are current at time t correspond at time t.

Conversely, if I told you on Tuesday that S was Tuesday's issue of the WSJ, and I cached a content entity E corresponding to it, I would be confused by any statement (from you or anyone else) on Thursday that E did not correspond to S. E should not stop corresponding to S just because time has passed.

Instead we suggest modeling the time-specific / time-varying relationship not as genericity but as versioning:

S is a snapshot of R at time t0 iff
for any t,
corresponds_to(E,S,t) implies
corresponds_to(E,R,t0)

This better captures the relationship between, say, http://www.w3.org/TR/2003/PR-rdf-mt-20031215/ and http://www.w3.org/TR/rdf-mt/: the former is a snapshot of the latter. We can say what it means for something to be a snapshot (time invariant):

S is time invariant iff
for any t1 and t2,
corresponds_to(E,S,t1) implies
corresponds_to(E,S,t2)

"Fixed resource" might be defined

R is a fixed resource iff R is time invariant and
there is a unique E such that
corresponds_to(E,R,t) [for some/all t]
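The three definitions above can be tested against a finite model. The Python sketch below makes several assumptions: facts are (entity, resource, time) tuples, quantification over time is restricted to an explicit list of times, and "fixed resource" is read in the "for all t" sense. All names are illustrative.

```python
def is_snapshot(facts, s, r, t0):
    """S is a snapshot of R at t0: corresponds_to(E,S,t) implies corresponds_to(E,R,t0)."""
    return all((e, r, t0) in facts for (e, res, _t) in facts if res == s)


def is_time_invariant(facts, s, times):
    """Every entity that ever corresponds to S corresponds to it at all given times."""
    entities = {e for (e, res, _t) in facts if res == s}
    return all((e, s, t) in facts for e in entities for t in times)


def is_fixed(facts, r, times):
    """Exactly one entity ever corresponds to R, and it does so at all given times."""
    entities = {e for (e, res, _t) in facts if res == r}
    return len(entities) == 1 and is_time_invariant(facts, r, times)


times = [1, 2, 3]
# S always has entity E1; R has a different entity at each time.
facts = {("E1", "S", t) for t in times} | {
    ("E1", "R", 1), ("E2", "R", 2), ("E3", "R", 3),
}
```

Here S is a snapshot of R at time 1 but not at time 2, S is time invariant and fixed, and R is neither.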

httpRange-14

The httpRange-14 advice does two things. First, it interprets the http: scheme registration in RFC 2616 on a point that is mooted by the October 26 HTTPbis draft, namely whether it is OK to use http: URIs to "identify" arbitrary things. We do not need to be concerned about this. Second, it asks the community to use the HTTP protocol in a particular way: servers and clients are asked to treat 200 responses as meaning that the target resource is an "information resource".

The purpose is to help prevent miscommunication. Consider the following scenario. A server wants to use a URI that it has minted to refer to Alice. A client issues a GET request, not knowing what the resource is. The server says that some content entity E corresponds to Alice.

How does the client get confused? Suppose that E says (carries the information) that Alice lives next door to Bill, but that Alice doesn't know this. The server would say that because E carries information about Alice, E corresponds to Alice. But it is common practice to take content entities as speaking for the resources to which they correspond. (See Lampson et al.'s authorization calculus for an explanation of "speaks for".) The client may therefore conclude that Alice says Alice lives next door to Bill. But Alice wouldn't say this, since Alice does not know this to be true.

The httpRange-14 rule seeks to prevent this kind of confusion by asking the community to restrict use of "corresponds to" statements to "information resources". In the above, Alice would not be an information resource, so the server would not have said she corresponded to anything.

But the rule does not achieve its purpose when the resource is an "information resource". For example, it is tempting to use an existing URI from an established naming system or catalog, such as Crossref or Amazon, as the name of the thing named in the naming system or described in the catalog, which in these cases might be a journal article or book for sale. But when we do this - and it's not ruled out by the httpRange-14 rule - an HTTP interaction tells you that the content entity carried in a 200, which clearly does not always speak for the resource, corresponds to the resource. So the client may form a very different idea of what resource is being discussed.

(This is not a squatting issue as the publisher itself may be the one using the URI in this way.)

The Alice example suggests a different rule, namely that the use of "corresponds to" (i.e. 200) should be limited to cases where the content entity speaks for the resource. That is, "corresponds to" should be taken to be a subproperty of "speaks for". (Equivalently: define "information resource" as a resource with the property that if a content entity E corresponds to it then E speaks for it.)

"Speaks for" may not be strong enough, as an article's abstract "speaks for" the article but isn't really an adequate substitute or spokesperson for it - its omissions may be taken as significant ("the resource does not say ..."). The connection between the two has to be deeper, such as "speaks adequately for, given communication channel constraints".

Some may object to information-like entities in an active role such as "speaking", but the usage is common (what did the sign say?), and theoretically justified (e.g. Lampson).

A different approach would be to instruct clients not to draw conclusions about what a resource says from what a corresponding content entity says. Certainly such conclusions are not licensed by HTTP. But without the ability to draw evidence about a resource from its corresponding content entities, it becomes difficult for observers to say much of anything about the resource, in the absence of external evidence (such as independent communication of the properties of the resources). While "a content entity corresponding to R mentioned the Louvre" is certainly better justified than "R mentioned the Louvre", it has much higher cognitive overhead, and it is unclear whether the practice would be either desirable or successful.

First-party metadata

We can interpret 303 responses as saying that the target resource is describedBy the second resource. The relationship is already defined by POWDER.

Like all redirects, 303 responses are subject to cache control, so the second resource might only be "about" the target resource for a limited period of time.

(Fuller discussion of 303, LRDD, Link: header, .well-known, describedBy, POWDER, is-about, tdb:, duri:, is interesting but out of scope...)

The 'data' URI scheme

As an example relating the framework to another protocol, consider data: URIs. Whatever resource a data: URI names, it's got to have the property that a content entity with the indicated entity-body and content-type corresponds to the resource at all times. Moreover, we know that any content entity corresponding to it will have the same entity-body and content-type. Of course, the words "identify", "representation", "resource", "correspond", and so on do not occur in RFC 2397, so we can't say which resources are meant; but given the way that everyone (especially browsers) has understood the specification, it seems pretty clear that whatever they are, they have to satisfy these constraints.
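The constraint can be made concrete by reading the content-type and entity-body straight out of a data: URI. The following Python sketch follows the RFC 2397 syntax using only the standard library; the function name is an assumption, and error handling is deliberately minimal.

```python
import base64
from urllib.parse import unquote_to_bytes


def content_entity_from_data_uri(uri):
    """Return (content_type, entity_body) for a data: URI."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "data" or "," not in rest:
        raise ValueError("not a data: URI")
    mediatype, _, data = rest.partition(",")
    body = unquote_to_bytes(data)  # undo percent-encoding
    if mediatype.endswith(";base64"):
        mediatype = mediatype[: -len(";base64")]
        body = base64.b64decode(body)
    # RFC 2397 default when the media type is omitted.
    return (mediatype or "text/plain;charset=US-ASCII", body)


entity = content_entity_from_data_uri("data:text/plain;base64,SGVsbG8=")
```

Since the result depends only on the URI text, the same content entity is obtained at every time, matching the "at all times" property described above.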

There's a valuable idea in REST that is in danger of being lost: that a resource has "semantics" and that the purpose of it all is hyperlinking. The resource has to be defined by a set of reasonable expectations about the resource that are communicated by some side channel such as the pcdata content in an anchor element. Otherwise why would you bookmark it, or send it to your friend, or use its URI to talk about it?

[More TBD]

Appendix: HTTP response status code analysis

We examined all the HTTP response status codes. Other than those listed below, we ascertained that none of them communicate anything about the resource itself; they all reflect phenomena related to the protocol and the server.

Reference: HTTPbis semantics

200 OK	correspondence
201 Created	lifetime
203 Non-Authoritative Information	(not sure, think about this)
204 No Content	(needs some thought)
206 Partial Content	(how to represent?)
300 Multiple Choices	yes
301 Moved Permanently	two URIs for same resource
302 Found	same as 307
303 See Other	description
304 Not Modified	correspondence
307 Temporary Redirect	time-bounded coincidence
409 Conflict	(interesting. needs some thought)

Acknowledgments

Pat Hayes provided the key insight that permitted this project to move forward.

Alan Ruttenberg, Noah Mendelsohn, Stuart Williams, Harry Halpin, Henry Thompson, Tim Berners-Lee, [who else?] all contributed to this project.

Fodder

Memento: Time Travel for the Web (Library of Congress) talks about the time dimension of the resource/representation world.

Semantical statements about resources and correspondences have nothing to do with whether or how the entities are named, so the role of URIs is immaterial.

References

RFC 2616, RFC 2616 bis (HTTPbis), HTTP Vocabulary in RDF, Webarch (AWWW), RFC 3986.

Leonard Richardson, Sam Ruby. RESTful Web Services (see Appendix for status codes)