The TAG Member's Guide to ISSUE-57 Discussion

Jonathan Rees, 27 March 2012

This territory is like a pinball machine - hitting any part of the problem always leads quickly to other parts, in a very confusing way. It is important that we have a clear view of the structure of the problem, which is complex. Here are some points I would like us to be in agreement on - let's go through them carefully.

The proximal goal is to close TAG ISSUE-57 in some reasonable way.

I assume you have read the issue description and know what has been asked of us.

The community has asked us for help, the problem is in the TAG's court, and the TAG has to either solve the problem or get the heck out of the way so someone else is empowered to solve the problem.

In doing so we may decide to hold the resolution of the issue hostage to new work. For example, we might decide that we can't address ISSUE-57 unless we answer some broader questions about Web architecture or linked data architecture. We should be explicit in justifying any decision that might delay resolution of the issue.

We have reasons to try to get out of the way sooner rather than later: lack of time among TAG members, lack of interest among TAG members, perhaps lack of consensus in the TAG.

On the other hand, it is not obvious that anyone else would pick up the ball if the TAG dropped it. In any case I think "dropping the ball" has to be accompanied by at least some minimal statement on the matter.

The statement of ISSUE-57 assumes basic understanding and acceptance of linked data architecture.

ISSUE-57 as stated is not asking for a reexamination of the way linked data abstraction works for hash URIs or 303 URIs. It is asking for incremental change.
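As background for readers less familiar with the status quo, here is a minimal sketch of where a linked data client conventionally looks for a description of what a hash URI or a 303 URI refers to, following the httpRange-14 resolution. It assumes the caller has already done the GET (of the URI with any fragment removed) and passes in the status code and Location header; the function name and the returned labels are hypothetical, and real clients handle many more cases.

```python
from urllib.parse import urldefrag


def where_to_look(uri, status=None, location=None):
    """Hypothetical helper: where would a client look for a description of
    the thing `uri` refers to? `status` and `location` are assumed to come
    from a GET of `uri` with its fragment removed; nothing is fetched here."""
    doc_uri, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the document at the fragment-less URI is expected to
        # describe the thing the full URI refers to.
        return ("hash", doc_uri)
    if status == 303 and location:
        # 303 See Other: the redirect target is expected to describe it.
        return ("303", location)
    if status is not None and 200 <= status < 300:
        # Hashless URI answered with 2xx: the case httpRange-14(a) speaks to,
        # and the one ISSUE-57 asks about.
        return ("200", doc_uri)
    return ("unknown", None)


print(where_to_look("http://example.org/doc#alice"))
print(where_to_look("http://example.org/id/alice", 303, "http://example.org/doc/alice"))
```

Only the third branch is in question; the first two are the conventions ISSUE-57 takes as given.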

We can accept the way linked data works, with all of its faults (which we are not being asked to address), and proceed to address the problem on its own terms - as a mechanical question about performance and deployment, not semantics. Or we can decide to reexamine the fundamentals, and question the entire approach.

We could take time to review how linked data actually works in practice, or how it might work in an ideal world. We could talk about what meaning, reference, and "identification" are, where they come from, how they are coordinated, context sensitivity, the ideal of a global namespace, how RDF works, and so on. This would take quite a bit of time.

So far we have assumed this is a problem of Web architecture, not of the design of RDF.

The justification is that (1) "identification" is a central idea in Web architecture, (2) URI reference in RDF should be harmonious with "identification" in webarch, (3) straightening out "identification" issues is the TAG's job.

We could take time to debate this. I think this point is not obvious and adequate discussion could take hours. We may have to decide based on preference or taste, rather than reason.

If ISSUE-57 is about RDF, and not Web architecture, then we should move its consideration out of the TAG. It would be really nice to be able to say goodbye to ISSUE-57 by putting it on someone else's plate. Maybe the TAG, by putting itself in a position of authority, is only being an impediment to progress on issues the linked data community wants to take care of itself.

We should be clear on what the social contract around any agreement would be.

We are trying to set expectations among those writing RDF and those reading RDF (and any declarative language that uses URIs in a similar way). These expectations will never be universally met, so this is a prescriptive exercise, not an empirical or descriptive one. The question is what someone is "empowered" to do when expectations are not met, i.e. whether we add our weight (whatever that might be) to requests to meet the expectation. If we (or W3C, if through review it agrees with us) say something is "good practice" or "in spec", we are in effect saying that it is socially OK (with the TAG or W3C) for someone to ask a party at variance to change their behavior.

In any case it's obvious that RDF you find in the wild won't necessarily conform, no matter what advice is given. Those writing RDF have plausible deniability: "I never agreed to that, so why are you complaining to me?"

The present situation is especially tricky because there is no place to put anything like a "version indicator" that would signal an intent to conform to new advice (just as there was none for our previous advice). A version indicator would give additional "ammunition" to those who notice variance and want to exert social pressure, since it is a statement of an intent to conform.

Nevertheless it would be useful to give a name to any kind of recommendation or advice (supposing someone issues one). Then people can replace "what you're doing is bad practice" with "what you're doing doesn't conform to recommendation ZWX-9931" which at least will be objectively true. A name is also useful so that statements of intended conformance can be placed out of band.
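To make the out-of-band idea concrete: assuming a recommendation were given a URI, a publisher could state intended conformance in a separate RDF statement, for instance with the Dublin Core property dcterms:conformsTo. The sketch below only builds such a statement as an N-Triples line; the dataset URI and the URI standing in for "ZWX-9931" are hypothetical placeholders.

```python
# Real Dublin Core term; the subject and object URIs used below are hypothetical.
DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"


def conformance_statement(subject_uri, recommendation_uri):
    """Return one N-Triples line saying subject_uri conforms to a named recommendation."""
    return f"<{subject_uri}> <{DCT_CONFORMS_TO}> <{recommendation_uri}> ."


# Hypothetical URIs for a published dataset and for "recommendation ZWX-9931".
print(conformance_statement("http://example.org/dataset",
                            "http://example.org/rec/ZWX-9931"))
```

Because the statement is ordinary RDF, it can live anywhere (a catalog, a VoID file, a separate document) rather than inside the data it talks about.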

Proposals addressing ISSUE-57 are of several different kinds.

I'll talk generically here, referring both to change proposals submitted in response to the call and to other proposals that have been discussed.

Those that do not address ISSUE-57.
Those that are not about hashless http: URIs.

For example, proposals to create a new URI scheme, or proposals arguing that using hashless http: URIs in RDF is not a good idea.

We need to consider these ideas, but my guess is that there will not be TAG agreement to pursue any of them.

We probably ought to prepare a thoughtful response to any change proposal(s) along these lines (either a sunk cost argument, or a reinforcement of the linked data principles, or a hung jury).

Proposals that are compatible with httpRange-14(a).

Although these approaches (MGET, 209, .well-known, "punning", etc.) have been discussed over the years, I don't think any such change proposal was submitted. I found this interesting.

Proposals that give new guidance in the retrieval (200) case.

This is the case I want to look at more closely since it has received the most attention.

Before we proceed further we have to agree to retract or modify httpRange-14(a).

The phrasing of httpRange-14(a) has caused untold suffering. People hyperfocus on the "what is an information resource and why would it matter" question.

Can we agree that this is the wrong question to ask? The right question is, is the retrieved representation content of the referenced resource, or a description of it? (Or maybe both, or neither.)

I use the word "instance" instead of "content", but the idea is the same. It's not that the resource is the content; rather, there is some similarity, generic/specific, or document-representation relationship between the two, which is very different from a 'describes' relationship.

The TAG needs to make this change in public, prominently, or people will just keep bickering pointlessly about it. But how we change it interacts with our attitude toward the 200-based solutions to ISSUE-57, so we need to look at that next.

How a URI is interpreted constrains how it is to be used.

Any design for the use of 200 (or, more abstractly, "retrieval-enabled URIs") needs to tell agents how to interpret a GET response: as ostension or as description (or both, or neither, or unspecified).

Those generating RDF are then implicitly advised to write whatever will be interpreted as they intend, according to the agreed design. If a particular hashless http: URI doesn't express what they want to say, they write something else: a different URI, a blank node (see the "hasInstanceUri" predicate discussed below), or whatever.

ISSUE-57 is often couched as a problem for those publishing linked data, but publishing is of little help if what is written can't be understood.

So the job of any ISSUE-57 design that involves 200 responses is to actionably answer the question: for a given hashless URI and HTTP 200 response (to a GET of that URI), is the response (by agreement) related to the identified resource as

content (ostension)
description
both
neither
the answer is not to be specified by the agreement.

I would like for those submitting change proposals to address this clearly. That I didn't ask this explicitly in the first place was probably a mistake.

It is easy to write proposals that do not answer this question in actionable form, that is, in a form that can be applied without case-by-case human judgment. For example, one could answer that the response is a description if it looks like a description, or if it contains RDF, or if it talks about what the URI refers to, and that it is mere content in other cases. These criteria are not actionable, and therefore do not help much in preventing miscommunication (i.e. in promoting communication).

httpRange-14(a) as stated did not do the trick, as the Flickr example shows, but its stronger variant, that what you get is always content, is actionable.

An example of an actionable criterion would be: if the response's media type is application/rdf+xml, then it is description, otherwise it is content. Whether this is a good rule or not is another question, but at least it is actionable.
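To show what "actionable" means in practice, here is a minimal sketch of the five possible answers and of two rules from the text: the stronger variant of httpRange-14(a) (every response is content) and the media-type rule just given. The names Relation, always_content, and rdfxml_is_description are hypothetical, and the sketch takes no position on whether either rule is a good one.

```python
from enum import Enum


class Relation(Enum):
    """How a 200 response to GET of a hashless URI relates to the referent."""
    CONTENT = "content (ostension)"
    DESCRIPTION = "description"
    BOTH = "both"
    NEITHER = "neither"
    UNSPECIFIED = "not specified by the agreement"


def always_content(media_type):
    """Stronger variant of httpRange-14(a): every 200 response is content.
    The media type is ignored."""
    return Relation.CONTENT


def rdfxml_is_description(media_type):
    """Example rule from the text: application/rdf+xml means description,
    anything else means content."""
    # Drop parameters such as "; charset=utf-8" and normalize case first.
    base = media_type.split(";", 1)[0].strip().lower()
    if base == "application/rdf+xml":
        return Relation.DESCRIPTION
    return Relation.CONTENT


for rule in (always_content, rdfxml_is_description):
    print(rule.__name__, rule("application/rdf+xml").value)
```

Both rules look only at the response itself, so any agent can apply them without case-by-case judgment. A criterion such as "description if it looks like a description" has no comparable mechanical form, which is the sense in which it is not actionable.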

The foundations of httpRange-14(a) have always been shaky.

The question of whether the httpRange-14(a) rule - or rather the stronger proposition that a retrieved representation is content/instance of the identified resource - is or was justified, may never be settled. It appears to be either valid or unproblematic for many URIs, and it is useful when writing RDF. It may be what some of the designers of the Web and Web architecture intended. It is clearly what the Resource Description Framework originally intended. And it may be what the 2005 TAG wanted. But whether it logically follows from RFC 2616 or 3986, or even from AWWW, is not clear, and probably never will be. It is shocking, given the centrality of "identification" in the architecture, that none of the canonical specifications tell you anything about what the identified resource is or what its properties are. Advancing the idea through a consensus process (for some class of hashless URIs) might gain it broader acceptance, but then it becomes a new thing to be opted into.

Roy Fielding is the major theorist of "representation", but I have had a hard time understanding his attitude. I've tried to push him into taking a stand on whether he means for "representation" to be a term of art, or just an ordinary language word, and this could bear on whether there is something special about things that are identified by hashless http: URIs, according to HTTPbis. If "representation" is by fiat (as opposed to by ordinary language), as in one obscure interpretation of HTTPbis, then there could actually be something to this "information resource" business, since only special kinds of things would have representations by fiat, but this is very much a long shot. I await his response.

One suggestion for resolving the issue.

In the absence of an actionable consensus rule, or in the presence of nonconformance that can't be made to go away, there ought to be a way for someone who writes RDF to make their meaning clear. This is the idea behind the 'hasInstanceUri' property that lets you say that you're using the URI in the representations-are-content sense (opt-in); it can also be used to talk about Web content without using a URI referentially (in <>) at all.

There could be a parallel predicate saying that a URI or term refers to what is described (opt-in the other way).
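A sketch of how the two opt-in predicates might look in use, generated as N-Triples from Python. The namespace http://example.org/ns#, the full predicate URI for hasInstanceUri, the name describedByUri for the parallel predicate, and the page URI are all hypothetical placeholders; dcterms:title and xsd:anyURI are real vocabulary terms. The page URI appears only as a typed literal, so it is mentioned rather than used referentially.

```python
# Hypothetical namespace and predicate names, for illustration only.
EX = "http://example.org/ns#"
DCT = "http://purl.org/dc/terms/"
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"


def uri_literal(uri):
    """Write a URI as a typed literal: mentioned, not used to refer."""
    return f'"{uri}"^^<{XSD_ANYURI}>'


def triple(s, p, o):
    """Format one N-Triples statement."""
    return f"{s} {p} {o} ."


page = "http://example.org/page"

# Opting in to the content reading: _:doc is the document whose instances
# (retrieved representations) come from GET of the page URI.
content_sense = [
    triple("_:doc", f"<{EX}hasInstanceUri>", uri_literal(page)),
    triple("_:doc", f"<{DCT}title>", '"Example page"'),
]

# Opting in the other way: _:thing is whatever the responses from the
# page URI describe (hypothetical predicate).
description_sense = [
    triple("_:thing", f"<{EX}describedByUri>", uri_literal(page)),
]

print("\n".join(content_sense + description_sense))
```

Either way, a reader of the RDF need not decide how the hashless URI itself should be interpreted, because the writer has said which reading is meant.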

My two cents: If the TAG agrees with the way I have laid out the problem, I think it should just say so and let some other group hammer out the details. If it's just about a rule classifying 200 responses as description or content to the taste of the linked data community, then the problem no longer seems like "architecture" and the TAG may not need to have much of a role. I am open to other suggestions, but this approach has the advantage of expediency (for us at least).

Appendix: Let's not get stuck in any of the same old tangles.

Avoid the word "resource" as it raises unanswerable questions. TBL and JAR prefer "thing" because in RDF (as in logic) a term can refer to anything, and whether something exists is unrelated to whether or how we talk about it (for example, with URIs). If we use the word "resource" we'll just have to argue, probably fruitlessly, about what it means.

Don't argue about whether X "is a representation of" Y. There is just no way to answer that question. It's pretty safe to talk about "retrieved representations", and we can repeat RFC 2616's line about GET U [nominally] yielding a representation of what U identifies, as long as we don't read any meaning into it that we don't find in 2616. TimBL has proposed "content" and JAR has proposed "instance" for the more specific relationship between a document, image, etc. (in the abstract) and the bits you get when you do a GET.

Just give up on "information resource". Repeated attempts to clarify this have failed, and the audience has not been forgiving. To the extent we need something in its place we might consider "generic resource" or a new term.

Be very careful about "X identifies Y". "Reference" is better understood - it has a good philosophical pedigree, it is clearly a social construction, it is clearly context dependent, and it relates to what a speaker intends and what a listener understands. There is not as good a theory of "identification" in Web architecture. "Identification" in Web architecture seems to be related to the ideal of bringing about a global namespace. If "identification" is seen as invariant across context, it seems to assume that global consistency has been achieved, which is not evident.

Appendix: Remedial RDF for TAG members.

Part of what is confusing about RDF is that the same notation has been put to several different purposes. It has been used as a resource description format, as a so-called "knowledge representation" language, and as a data format.

RDF originally evolved out of PICS (which itself evolved into POWDER), and it was meant to be used to describe Web resources, i.e. for metadata - thus the name Resource Description Framework. Early RDF statements had URIs as subjects, and they said things about the content retrieved from those URIs.

RDF as it came to be specified is a form of first-order logic, and therefore a language for expressing propositions ("representing knowledge"). Natural languages are also for expressing propositions (among other things), so the two bear many similarities. For example, parts of RDF statements refer to things, just as phrases in natural language statements refer to things. RDF can be precise or imprecise, true or false (or neither), understood or not understood, just as natural language propositions can be. There is inference in RDF just as there is inference in natural language. In both cases meaning is a social construction and therefore depends on context (including time). We might attempt to talk objectively about whether some phrase refers to some thing, but such claims have to be understood as statements about how some population of communicators behaves.

The use of URIs as logical symbols in RDF was meant to align RDF with the "identification" idea in the contemporaneous URI specification. But not everything in an "RDF universe" has to have an understood URI, just as not everything in the world has a devoted term or proper name.
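To illustrate the first-order reading, and why something can be talked about without having a URI: roughly, a triple behaves like an atomic formula p(s, o), a graph like the conjunction of its triples, and a blank node like an existentially quantified variable. The sketch below renders a small graph that way; the function is a hypothetical illustration, it uses prefixed names instead of full URIs for readability, and it ignores literals, datatypes, and the finer points of the RDF model theory.

```python
def graph_to_formula(triples):
    """Render RDF triples as a first-order formula (rough reading).

    Each (s, p, o) becomes the atom p(s, o); the graph is the conjunction
    of its atoms; blank nodes (terms starting with "_:") become
    existentially quantified variables. Literals are not handled.
    """
    def term(t):
        # Strip the "_:" prefix so a blank node reads as a variable name.
        return t[2:] if t.startswith("_:") else t

    blanks = []
    for s, _, o in triples:
        for t in (s, o):
            if t.startswith("_:") and term(t) not in blanks:
                blanks.append(term(t))

    body = " ∧ ".join(f"{term(p)}({term(s)}, {term(o)})" for s, p, o in triples)
    prefix = "".join(f"∃{b}. " for b in blanks)
    return prefix + body


# "Paris is the capital of France, and someone was born in Paris."
graph = [("ex:Paris", "ex:capitalOf", "ex:France"),
         ("_:x", "ex:bornIn", "ex:Paris")]
print(graph_to_formula(graph))
# prints: ∃x. ex:capitalOf(ex:Paris, ex:France) ∧ ex:bornIn(x, ex:Paris)
```

The person born in Paris has no URI in this graph, yet the graph still says something true or false about them.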

Sometimes people use the word "concept" in talking about RDF. Not all phrases in natural language refer to "concepts" - some refer to cars, or earthquakes, or relationships - and similarly, not all phrases in RDF refer to "concepts". In fact they rarely do - it is much more common to talk about Paris than about the concept of Paris (whatever that might be). We have ideas about what the properties of Paris are; the properties of concepts are much more mysterious. Try to get the word "concept" out of your mind. Also try not to confuse expression (such as a statement or URI) with what is expressed (such as a proposition or the referent of a phrase).

There is another view, that RDF is just another data format that can be used as an alternative to, say, XML. It dispenses with the "knowledge representation" pretension and "reference" to arbitrary things. There is just data and it is linked. This view is popular because very few people understand or care about language and logic well enough to be able to get any value out of the knowledge representation formulation.

The data format view does not make semantics go away; it just makes it harder to figure out.