AWWSW brainstorm on possible topics for group to take up

This page gives a list of potential problems and topics for discussion by the AWWSW group. Everyone at the first telecon was wary of scope creep, so it was felt that a survey of potential tarpits should be conducted so that we are better able to maintain our guard. Our actual work will be restricted to some subset of these items.

HTTP semantics

Clarify web architecture around what is (or should be) implied by HTTP responses.

What can an agent infer from a 200 response?
What can an agent infer from a 303 response?
What can we infer from particular response headers?
Many other questions of this kind.

It is desirable to capture the answers to these questions formally, and in particular as RDF statements. To this end we will need an ontology (possible starting point: http://www.w3.org/2006/gen/ont) for expressing such statements.
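As a rough illustration of what "capturing a response as RDF statements" might look like, here is a minimal sketch that records one observed response as N-Triples lines. The namespace `http://example.org/http-semantics#` and the property names `requestURI` and `statusCode` are hypothetical placeholders, not terms from http://www.w3.org/2006/gen/ont or any agreed ontology.

```python
# Hypothetical vocabulary: this namespace and its property names are
# placeholders for whatever ontology the group eventually settles on.
EX = "http://example.org/http-semantics#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def describe_response(uri: str, status_code: int) -> list[str]:
    """Return N-Triples lines stating that a GET on `uri` gave `status_code`."""
    response = "_:response"  # blank node standing for the observed response
    return [
        f"{response} <{EX}requestURI> <{uri}> .",
        f'{response} <{EX}statusCode> "{status_code}"^^<{XSD_INTEGER}> .',
    ]


if __name__ == "__main__":
    for line in describe_response("http://example.org/doc", 200):
        print(line)
```

The sketch only records what was observed; what an agent may conclude from such statements is exactly the open question above.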

Clarify the nature of this clarification activity. Is it

1. descriptive (intended to interpret RFC 2616 but not build on it),
2. prescriptive (intended to clarify what is "good practice," a la AWWW), or
3. prescriptive (intended to establish new but compatible practices - the specification of a named protocol "layer" on top of HTTP)

This idea of a layered protocol (which I think Noah and Jonathan both subscribe to) is a bit subtle. The httpRange-14 resolution is an example of a very small but definite protocol layer that is completely compatible with HTTP. It says that an agent can infer, from a 200 response, something particular about the nature of the resource. Without the resolution, the inference is not possible, so it is not a descriptive interpretation of RFC 2616. In practice one speaks of servers adhering to RFC 2616 that do, or do not, also adhere to the httpRange-14 resolution.
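To make the "small protocol layer" concrete, here is a minimal sketch of the inference as a client might apply it. The 2xx, 303 and 4xx cases follow the wording of the httpRange-14 resolution as I understand it. The function and enum names are made up for illustration, and treating every other status code as "unknown" is an assumption of this sketch, not something the resolution says.

```python
from enum import Enum


class Inference(Enum):
    INFORMATION_RESOURCE = "the URI identifies an information resource"
    ANY_RESOURCE = "the URI could identify any resource"
    UNKNOWN = "the nature of the resource is unknown"


def infer_resource_nature(status_code: int) -> Inference:
    """Map the status code of a GET response to an httpRange-14 inference."""
    if 200 <= status_code <= 299:
        return Inference.INFORMATION_RESOURCE
    if status_code == 303:
        return Inference.ANY_RESOURCE
    # 4xx is "unknown" per the resolution; lumping all remaining codes
    # in with it is an assumption made for this sketch.
    return Inference.UNKNOWN


if __name__ == "__main__":
    for code in (200, 303, 404):
        print(code, infer_resource_nature(code).value)
```

A client that talks only RFC 2616 would have no such function at all; that absence is the sense in which the resolution is a layer on top of HTTP rather than a reading of it.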

As I see it, the difference between 2 and 3 above is not in what we say, but in how it is presented: as an unnamed set of "shoulds" (e.g. a server should avoid 200 for non-IRs) and "cans" (a client can infer IR from 200) vs. a new communication game that we would like to enjoy. For case 3 the new game should have a new name, to make it easier to distinguish cases where it's being played from those where it's not.

The changes we're talking about may be a combination of fully compatible changes (like new response codes), which would relate the new protocol to HTTP/1.1 in a manner similar to the relation of HTTP/1.1 to HTTP/1.0, and possibly incompatible changes, such as specifying cases in which the class of 200 responses generated by a server for a URI could not possibly be representations of a single resource and would therefore be incorrect (or not "best practice"). (Supposing we wanted to do this, and it's not obvious that we do. This is just a for-instance.)

dbooth comments: I think #3.  We need to define architectural conventions for the Semantic Web that build upon the old-fashioned Web architecture, preferably in a fully compatible way: the Semantic Web needs to understand the old-fashioned Web, but not vice versa.

Resources and representations

Clarify the relationship between a resource and its representations (HTTP responses).

For a resource, what is a satisfactory representation? Can it be anything? If one representation is a photo, another perhaps shouldn't be a cartoon, but a lossy photo might be acceptable.

Are a resource's representations sufficient to figure out what the resource is - do they define the resource? What do 200 responses tell you about the resource, if anything?

Is http://news.google.com/ an information resource? If so, then its representations are representations of what?

Is there a difference between an information resource and its essence? Between its essence and its representations? What is the ontological type of an essence, what is its identity, and what are the operations on it?

 dbooth comments: I think the current WebArch definition of "information resource" is just plain wrong and needs to be fixed.  I think fixing that is an important part of what needs to be done.  The defining characteristic of an information resource is that it has the potential to give 200 responses, period.  Whether those responses convey the state of the resource, and whether the resource's essential characteristics can be conveyed in a message, are secondary matters.

How does the Content-Location header relate to representations?

Giving teeth to "web architecture"

How can you write a program (validator) to determine whether a web site is not following "web architecture"?

 dbooth comments: This will not be possible in general, because some aspects of the architecture involve real world intent.
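A partial validator might still be feasible if the intent is supplied from outside. The sketch below assumes the site owner hands the checker a list of URIs declared to be non-information resources; it then flags two mechanical problems in a set of observed responses. The `Observation` type, the function name and the two rules chosen are illustrative assumptions, not an agreed definition of conformance.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One observed GET response (hypothetical record type for this sketch)."""
    uri: str
    status: int
    headers: dict = field(default_factory=dict)


def find_problems(observations, declared_non_information_resources):
    """Return human-readable problems found in the observed responses."""
    problems = []
    for obs in observations:
        # Header names are case-insensitive, so compare in lower case.
        header_names = {name.lower() for name in obs.headers}
        if obs.status == 303 and "location" not in header_names:
            problems.append(f"{obs.uri}: 303 response without a Location header")
        if obs.status == 200 and obs.uri in declared_non_information_resources:
            problems.append(
                f"{obs.uri}: 200 response for a declared non-information resource"
            )
    return problems


if __name__ == "__main__":
    observed = [
        Observation("http://example.org/people/alice", 200),
        Observation("http://example.org/people/bob", 303),
    ]
    for problem in find_problems(observed, {"http://example.org/people/alice"}):
        print(problem)
```

Everything that depends on intent (which URIs name non-information resources) enters as input, which is the part a program cannot work out for itself.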

Just using HTTP URIs

Clarify the argument that you can do everything with HTTP URIs.

There's a TAG issue and finding saying "just use HTTP". So in scope for this group would be explaining and embellishing how to use HTTP. This may help in the struggle to explain why LSID and other schemes are unnecessary.

 dbooth comments: Not sure what more we should say about this.

Metadata

There's a need to be able to obtain metadata about a data source (similar to "getMetadata" in the LSID protocol). Maybe write this up and liaise with the HTTP WG.

 dbooth comments: Can you give an example of a use case for this?

E.g. How do you know how many representations there are (or will be) for a resource? Should there be a way?

 dbooth comments: Again, how about a use case?

Other issues

The specification of 303 See Other is not necessarily precise enough for the semantic web use case: It would be nice if we could at least expect RDF, and maybe specific kinds of information.

 dbooth comments: I agree with Stuart's comment that the content type should not be tied to the response code.
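One way to read this comment: the client can already state a preference for RDF through ordinary content negotiation, independently of which response code comes back. A minimal sketch using Python's standard `urllib.request` follows; the helper name and the default media type `application/rdf+xml` are choices made for this illustration, and no request is actually sent.

```python
import urllib.request


def build_rdf_request(uri: str, media_type: str = "application/rdf+xml"):
    """Build (but do not send) a GET request that asks for an RDF media type."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


if __name__ == "__main__":
    request = build_rdf_request("http://example.org/thing")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

Whether the server honors the preference, and what it returns after any redirect, remains up to the server.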

Location independence: What happens when a resource moves and the community wants to do something about it (issue a "third-party redirect")?

 dbooth comments: I think this issue is a bit overblown.  I think it is far more important for the WebArch to make the chain of authority clear, from a URI to its declaration, than to try to prescribe what should be done if it breaks down.  We might say something, but it shouldn't be very much.

Another issue is what to say about time and RDF. This keeps coming up. (Timbl: "architecture doesn't have time; new model of time is out of scope; but HTTP has its own notion of time.")

What's the web analog for doing citation? (I.e. how to cite articles in published literature in such a way that we can tell when two RDF documents are citing the same article. Problems: common names, stability, third-party metadata.)

 dbooth comments: This seems to be more important to HCLS than I previously realized.  I think there are a variety of ways it can be done, and I'm not sure we are ready to endorse only one over others.  However, I think it would be good to explore the problem a bit.

Possible work products

Any output of this group is intended to be fed back to the TAG or other groups in order to inform or guide further action.

HTTP semantics ontology
List of problems that need to be solved, missing functionality, possible implementations
List of things needing better exposition
Set of best practices, to be folded into web architecture as a TAG finding
FAQ on web architecture and/or HTTP semantics

 dbooth comments: I think an HTTP semantics ontology should be our starting point, so that we can be clear in our discussions.

This page was initially prepared in order to satisfy http://www.w3.org/2007/11/13-awwsw-minutes.html#action02 .