Joint meeting of W3C HCLSIG and TAG -- 5 Nov 2007

Introduction to HCLS and topics of interest

JR: (Jonathan Rees) We are interested in HTTP Semantics, URI Resolution, meaning how do you get a definition etc. from a URI. By the way, the URI scheme need not be http. Is resolution deterministic?

SW: Are you interested in "information resources", "non-information resources", what's the difference, what are the definitions? etc.

JR: Yes, interested.

<DanC_lap> (aha... on the hcls list today... looking...)

<eneumann> http://sw.neurocommons.org/2007/uri-note/

We are discussing http://sw.neurocommons.org/2007/uri-note/

<DanC_lap> ah... "sustainable" is an interesting keyword

<DanC_lap> W3C Semantic Web Health Care and Life Sciences Interest Group

JR: Introducing HCLS, Science Commons:

TBL: More than FOAF, DBpedia?

JR: Not sure. Was true earlier. We want to figure out how to make the Semantic Web work.

EN: (Eric Neumann) There's a lot of life sciences data already out there on the Web, a lot of standard terminology, but only limited use of Semantic Web and URI. We're trying in this note to get people to realize what's possible, and then to use it. A lot of groups are waiting to see what we come up with.

<DanC_lap> ("insurance policy" another acknowledgement of the social side. good.)

JR: I want to do this now in part to give people an insurance policy that all of this stuff will really work and scale as massive deployment happens. Need a path starting from where we are. Before HCLS there was I3C and LSID.

<DanC_lap> (whoa... groundhog day... the 1st IETF WG on URIs was called IIIR I'm pretty sure, with the III expanding much like the I3C)

JR: LSIDs have caught on mainly in smaller communities. The LSID work is important because it solves, or at least claims to solve, 5 problems in this space. So, that sets the bar in terms of people's expectations for what we need to do in a more purely http-based approach. We have a database of 300 million triples.

AR: (Alan Ruttenberg) that's jointly hosted by HCLS and Science Commons.

TBL: How are the conversions done? One-time?

JR: No, it's scripted. Intended to be sustainable and repeatable. Some of this was demonstrated at the Banff Web Conference in May. Example: this plasmid was used in the research that's behind this research article. There are actually several facts in there, and we want them all to be available for query.

<eneumann> http://geneontology.org/

JR: We want RDF to be in published papers, as an alternative or supporting means of representing information.

AR: The goal is eventually that all scientific knowledge is available at your fingertips. Semantic Web is the only promising technology for doing it. It's a crazy, ambitious goal.

TBL: Is it linked data?

AR: I don't conceptualize this as data, but rather as statements: things that are true, things that have been tried. My goal is for it to work in the tabulator, indeed sometimes better than tabulator itself can handle it now.

TBL: It's not linked data?

AR: Heading there, but some short term issues. For example, our 303's don't currently redirect to RDF.

TBL: SPARQL?

AR: Yes, in fact focussing mainly on SPARQL. Our focus differs from tabulator's to some degree in our focus on scientific statements and the rigor with which they're made. Somewhat the difference between browsing and sending out an agent to, e.g. choose a medication for me. Much of the criticism or analysis you've seen from me is motivated by the need for a greater level of accuracy than is required in browsing scenarios. So accuracy of statements is very important.

JR: We recognize that there is a range of use cases for RDF. The spectrum runs from quick n dirty, which needs to get out quickly and may not be completely clean, to supplementary material on published articles, which needs to be very accurate. Similarly, the different consuming applications have a range of requirements for accuracy. We've focussed a lot on where we see Semantic Web currently as being weakest.

<Zakim> ericP, you wanted to mention abstraction for lsids

<ericP> (resolution?) abstractions:

<ericP> protocol abstraction for LSIDs

<ericP> protocol injections for fallback resolution for b0rken links

EP: (Eric Prud'hommeaux) We have a couple of motivational abstractions in this paper. One is about protocols, discussing things like resolution. We also have protocol injections for fallback on how to resolve things where links are broken.

<alanr> "b0rken" intentional?

DC: You've told us a lot that I mostly agree with; are there questions?

DO: I'm really interested in questions of versioning.

AR: It's been a sore point. Everyone's wrestling with it. I could tell you what Open Biomedical Ontologies (OBO) is doing.

DO: We've been doing a 3 part draft on versioning in the TAG. Not done, but major pieces have been subject to repeated review.

AR: It would be interesting to sit down with you and go through use cases.

DO: Yes. The first document is a terminology document, the second focusses on forwards compatibility and what you have to do from the start to enable forwards compatibility.

AR: One of the first goals is to be able to say something about stability. E.g., I'd like to be able to say things like "this will stay the same forever"

DO: The same?

AR: That's what you'd get to define. You'd be able to say what you mean by "the same". For example, part of the LSID contract is that the data portion stays the same forever. I'd like to be able to say that in RDF.

SW: The data portion?

<timbl> http://www.w3.org/2006/gen/ont#FixedResource

JR: That which is identified stays the same.

AR: The metadata can change.

<DanC_lap> (I wrote an N3 specification of this "same bits forever" thing he's talking about...)

<timbl> fixed resource

<timbl> Type: Class

<timbl> Comment: A resource whose representation type and content will not change under any circumstances

AR: I'd like to be able to say things like "This is a demo. These URIs are only guaranteed stable for 6 months."

<DanC_lap> "these are stable for the next 6 months, but after that, we're not promising anything" <- interesting use case

<dorchard> Draft TAG versioning terminology: http://www.w3.org/2001/tag/doc/versioning

SW: Some of this is way above HTTP?

AR: You are saying things about resources accessed through HTTP.

<dorchard> Draft TAG versioning on forwards compatibility: http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies

SW: Do you need to find all this from HTTP headers?

AR: You need to find it all, but how is TBD. The Architecture Group may need to figure that out.

DO: The third part of the draft TAG finding is about evolution of XML-based languages. The 2nd related topic I wanted to ask about, and I know David Booth has looked at this, is about when and why to use http-scheme URIs. Henry Thompson and I have been working on that together. Your document says "use http URIs", but I didn't see a link to our draft. Curious as to why?

AR: I want to use http, but honestly am somewhat nervous about some of the issues.

Several: We'd like to hear about that.

AR: An example of an issue is what to do when an http link "breaks". We think there's a community interest in restoring links even when the provider goes away. We've been working on standard rewrite rules for URIs to do this. We're getting concerns from LSID folks who say that the location independence they provide makes this less of a problem. We need metadata in lots of cases, but we're not sure you want 303 in all cases?
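Illustration (not from the meeting): the "standard rewrite rules for URIs" that AR mentions could, under one reading, amount to a prefix substitution table consulted before dereferencing. The sketch below assumes that reading. The rule format, the function name `rewrite`, and both example hosts are hypothetical and are not taken from the HCLS note.

```python
# Hypothetical rewrite table: (old URI prefix, replacement prefix).
# Both hosts are made-up examples, not real services.
REWRITE_RULES = [
    ("http://defunct-provider.example/records/",
     "http://mirror.example/archive/records/"),
]


def rewrite(uri, rules=REWRITE_RULES):
    """Return the URI to dereference after applying the first matching prefix rule."""
    for old_prefix, new_prefix in rules:
        if uri.startswith(old_prefix):
            # Keep everything after the matched prefix unchanged.
            return new_prefix + uri[len(old_prefix):]
    # No rule matched: dereference the URI exactly as published.
    return uri
```

The published URI stays the name in this sketch; only the place a client looks for a representation changes, which is one way to approximate the location independence the LSID community asks for.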

TBL: Do you mean 303 to something that would then go to 200?

<eneumann> Need to first be clear what LS means by 'metadata'

AR: I thought you didn't want everything to do 303, and that some people were advocating "#"

<jar> something like LSID that has both GET and GETMETA

<DanC_lap> (the 'doppleganger' is the magic of Web Architecture, to me)

AR: Third issue is what we call the "doppelganger" issue, which has to do with the dual role of the URI in "denoting" something and providing for access to it.

<dorchard> Henry and I are working on the URNsRegistries, and so any feedback the document and the rationale embodied in the document is very welcome.

EN: Every group needs to be clear on what they mean by metadata. Typically in life sciences, metadata is that which is not explicitly part of "that data record". Annotations are often viewed as metadata. That's to some degree historical. There tends to be a notion that there's a flexible part, that changes, and it sits next to a core part that needs to be understood from the start to be immutable.

AR: You are 1) denoting a thing and 2) getting information about it.

TBL: Thing? You can't publish "Eric" the person.

AR: I guess we mean the minter of the URI.

<Zakim> eneumann, you wanted to illustrate life sci metadata

<Zakim> Noah, you wanted to ask about tradeoffs in making all this machine readable

<jar> oops. not an irc pro. sorry zakim

<timbl> Sometime an advantage of doing this is for agents to be able to help more .... to what extent did you expect that to happen (?)

<timbl> -- Noah

NM: Curious about the cost/benefit of formalizing things like "my URI is going to get stale in 6 months because it was only for a demo". Will people write automated agents that will act on this?

AR: Well, the LSID community finds the statement of immutability important.

NM: That's immutability.

<DanC_lap> (it's also called 'policy', I suspect; which should remind us that W3C just issued a WS-policy REC)

AR: And we've heard Cool URIs Don't Change. Having a way to say "I broke that rule" seems useful.

TBL: We in the consortium and the IETF talk a lot about protocols. When you talk about contracts, I think you mean something similar.

<jar> protocol = rules governing the way you talk (in some situation)

AR: I think so. There's a difference of emphasis. Historically, protocols have been somewhat rigid and unextendable. You can't add a verb to HTTP.

Several: Sure you can.

TBL: It's perfectly reasonable to use ontologies to set out protocols, typically at a higher level than HTTP.

JR: There's a very close connection between definitions and protocols.

<DanC_lap> (indeed, RDF ontologies in the web are easier to update in a distributed way than centralized specs like HTTP, of course.)

JR: The protocol tells you what to do.

<ht> http://www.ietf.org/internet-drafts/draft-duerst-iri-bis-01.txt

TBL: And the outcome. Bathroom protocol is "always use the smallest toilet paper roll, and you'll never run out of paper!"

AR: You should have something at the end of a URI.

NM: It's good practice.

<DanC_lap> ack

<Zakim> DanC_lap, you wanted to ask about this "community interest" in recovering from 404; sounds like eminent domain and to take issue with "all cases", given what Jonathan said earlier

<timbl> ______________________

<jar> semantic web protocols end with understanding. outcome is getting the right drug prescribed.

DC: First thing: you said you didn't have a title, but I like the word sustainable that you have in your working title. I like your reference to "insurance policy", which is a use case, which is good. The 404 stuff sounds like eminent domain. Tell me about qualities of curation.

JR: We're still working on that.

DC: The Web itself scales from informal connections between a few people, up to national libraries.

AR: Mainly we're aiming at shared stuff.

DC: Be careful, a lot of important stuff is done, you know... You're asking for guidance...you seem to be on the leading edge yourselves.

JR: Who said we needed guidance?

DC: He did (pointing to AR).

JR: We asked for help, not guidance :-).

<Zakim> timbl, you wanted to point out that the concept of 'contract' sounds like what I mean by 'protocol'

TBL: I think we've gone a round of agreeing on a lot of stuff and philosophy. Maybe it's time to get more specific. How can we help?

AR: Say we wanted to reimplement LSIDs, with the constraint that data portions need to be unchanging while metadata changes.
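Illustration (not from the meeting): the constraint AR states, data fixed forever while metadata may change, can be shown with a toy in-memory store. This is only a sketch of the contract. The class `FixedDataStore` and its method names are hypothetical, and it models neither the LSID resolution protocol nor any HTTP mapping.

```python
import hashlib


class FixedDataStore:
    """Toy store: data bytes are write-once per identifier; metadata may be replaced."""

    def __init__(self):
        self._data = {}
        self._metadata = {}

    def put_data(self, identifier, data):
        # Re-storing identical bytes is allowed; changing them is refused.
        if identifier in self._data and self._data[identifier] != data:
            raise ValueError("data for %s is already fixed" % identifier)
        self._data[identifier] = data

    def put_metadata(self, identifier, metadata):
        # Metadata carries no stability promise, so it is simply overwritten.
        self._metadata[identifier] = dict(metadata)

    def get_data(self, identifier):
        return self._data[identifier]

    def get_metadata(self, identifier):
        return dict(self._metadata.get(identifier, {}))

    def digest(self, identifier):
        """SHA-256 of the data bytes, usable by a client to check 'same bits'."""
        return hashlib.sha256(self._data[identifier]).hexdigest()
```

A client that records `digest(identifier)` at first retrieval can later verify that the data portion has not changed, independently of any metadata updates made in between.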

JR: We could come up with our own solutions without the TAG. We'll be doing the work in any case. We want to figure out how to have the most influence, and doing this with W3C seems to promote a lot of good network effects.

<DanC_lap> (conversation last summer... was fun enough for a blog item...)

<DanC_lap> (... http://dig.csail.mit.edu/breadcrumbs/node/178 )

JR: The TAG's resolution is helpful because it gives a point of reference. So, having formal TAG resolutions is helpful in getting attention for things like this. So, if we could do 3 or 4 more such things, that would be useful.

<Zakim> DanC_lap, you wanted to offer TAG review of drafts, use cases, and test cases

DC: You folks already have a W3C interest group. You can publish Notes. The TAG can comment on those.

EP: You can give us thumbs up or thumbs down?

DC: Yes, if we choose to review them.

AR: I think Jonathan and I seemed interested in the notion of a separate Semantic Web focus group that might work more directly with you all. We can both learn from you, and bring issues that might add perspective on the Semantic Web as a whole. We'd like to see a more focussed effort by the TAG on Semantic Web.

TBL: Having other people write things, do an end of day thumbs up/thumbs down seems the wrong model. Ongoing collaboration seems better. The group seems reasonable to me. A danger is failing to focus the group sufficiently narrowly. Risk is that the group tries to do everything. One way to do it would be to take the existing draft; it would bound the scope.

AR: It's already quite broad in scope.

SW: Yes, that covers a lot of the hard problems.

AR: There are other possible concrete goals, such as reproducing what LSID wanted using http.

SW: I'm attracted to trying to pick up something small. We also need to figure out what next steps will be.

DC: Just realized that this is in some sense the same thing as Tim's call to do a group to formalize HTTP. I think you should set up whatever's convenient. I >really< wish I had the time to be very active in it.

TBL: This morning Dan reminded us of older work he did on formalizing HTTP in Larch.

<Stuart> http://www.w3.org/2000/Talks/www9-larch/all.htm

EN: Regarding Information Resource vs Non-Information Resource: there is already an infrastructure out there for naming genes, and people use it. They will all use a standard name. They pretty much agree on being able to reference the real actual (non IR) gene. We could in theory, if we had a good model, map their existing work to URIs.

<DanC_lap> (I think the outcome of the protocol we're talking about here is that people happily use each other's names.)

<DanC_lap> (a little like currency)

<DanC_lap> (so we're designing an economy.)

<jar> list of major trouble spots with URI note is here: http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations

EN: Also, the chemistry folks are proposing URIs for their compounds using info: URIs. The suffixes are International Chemical Identifier strings, which have everything you need to specify the structure.

Someone: It's like data:

AR: Not data: because it's not data.

TBL: Yes, but similar in that the identifier conveys that which is identified.

<DanC_lap> (hmm... data: sounds right in that it's like immediate mode addressing in assembler, but... hmm...)

NM: You said it has things like weight. What if they mis-estimated the weight. Does the identifier change?

EN: The InChI changes.

TBL: Suppose I were to get inchi.org....

<DanC_lap> (this sounds like one of the few reasonable uses of info: or tag: or the like.)

JR: They could have done that, but chose not to.

EN: The discussion is still ongoing. The International Union of Pure and Applied Chemistry would be involved.

<eneumann> http://www.iupac.org/inchi/

<timbl> http://bio2rdf.org/geneid:15275

AR: Interesting to ask why they've focussed on info:

TBL: One of the things you could do, is grab a domain and compete.

<eneumann> http://www.chemspider.com/

JR: People don't want to compete. The id's are for the benefit of the community, not the publisher, but the community isn't organized enough to maintain the domain. Technically this is all very easy, but getting people to trust that you'll hold a domain forever is harder. Rightly or wrongly, info: is generating that trust.

DC: You're trusting that nobody squats on info:

DC: Let's drill on this. Let's say you tried to force someone to zap the list of scheme names. IETF would appoint someone else.

<Zakim> ericP, you wanted to get some hard work out of the way

EP: Thought I'd ask what would be the hardest thing we could try to do in the room. One would be to mesh the terms in the document with TAG's terminology.

SW: TAG hasn't read the document.

TBL: I read half the document, and the only one I objected to was "locator"

JR: I just need a word for a URI that has scheme http or https and is known not to have a #.

NM: Is it the base for one with a #?

JR: No. I could change it to fragmentless, http URI...
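Illustration (not from the meeting): the category JR is trying to name, a URI whose scheme is http or https and which has no #, can be tested mechanically. The function name below is hypothetical; the check uses only the standard library.

```python
from urllib.parse import urlsplit


def is_fragmentless_http_uri(uri):
    """True if the URI's scheme is http or https and it contains no '#'."""
    scheme = urlsplit(uri).scheme  # urlsplit returns the scheme in lower case
    # Testing for the '#' character itself also rejects an empty fragment ("...#").
    return scheme in ("http", "https") and "#" not in uri
```

Under this test http://sw.neurocommons.org/2007/uri-note/ qualifies, while http://www.w3.org/2006/gen/ont#FixedResource does not, and neither does any info: or urn: URI.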

TBL: The history is that "locator" and URL are criticised as being fragile against relocation of the resource.

JR: I struggle with Name vs URI.

TBL: Do you have a way of pointing to metadata?

<eneumann> s/ICI/InChI

JR: There's a concern that # truncation and 303 are both heuristics in that the server you're talking to may or may not be following best practices. We'd like to have a conversation about that. If you could know whether what you're talking to does best practices, that would help.

SW: Can you illustrate non-determinism?

JR: I get a 303. I follow it. That's unpredictable. No defined protocol, except with prior knowledge of the server. RDF may not refer to what I was asking about.
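Illustration (not from the meeting): the sketch below shows what a client can and cannot conclude from the response to a GET on an http URI, following the usual summary of the TAG's httpRange-14 resolution (2xx: the URI identifies an information resource; 303: it could identify anything; 4xx: nothing is learned). Linking that resolution to this exchange is an assumption here, and the function name and return labels are hypothetical. The code takes a status code and a headers dict rather than making a request, and it assumes the dict uses the exact key "Location".

```python
def interpret_response(status, headers):
    """Return (conclusion, follow_up_uri) for the response to a GET on an http URI."""
    if 200 <= status < 300:
        # A representation came back, so the URI identifies an information resource.
        return ("information-resource", None)
    if status == 303:
        # See Other: the URI may identify anything at all. The Location target
        # may or may not describe it; the status code alone does not say.
        return ("see-other", headers.get("Location"))
    if 400 <= status < 500:
        # Nothing can be concluded about what the URI identifies.
        return ("unknown", None)
    return ("unhandled", None)
```

The 303 branch is where JR's complaint lands: the client gets somewhere to look next, but nothing in the response promises that RDF is there or that it is about the thing originally asked for.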

DC: That's like asking about not obeying the law.

<ericP> Noah: We've been taking status code and gradually clarifying

<ericP> ... also introducing new status codes to solve specific problems

<ericP> ... if we'd done that for this problem, i think we wouldn't be having this conversation

<Zakim> Noah, you wanted to talk about best practice signalling

NM: I think this is aggravated by the fact that we've done so much in the style of clarifying use of already-deployed HTTP status codes and headers, vs. defining new ones which any reasonable person would use only in the exact way we suggest.

JR: Yes, that's the crux. People have other reasons for using some of this.

DB: (David Booth) I'm intrigued at subgroup idea. Would like to know who and how.

TBL: I'm in. But realistically time may be an issue. We were talking about protocols. I'm concerned socially and technically about a notion that we'd have a flag saying "I'm one of the subset of deployers who really implement the architecture".
... The only servers sending 303's in response to GET, almost without exceptions, are doing it for semantic web reasons.

JR: There are other examples.

<Zakim> DanC_lap, you wanted to say if, when we push on 303, we find we don't have consent of the governed, we'll have to do over and to nominate EricP to make a mailing list

DC: If when we push on 303, we don't have consent of governed, we have to redo

TBL: FOAF has a protocol. I don't see why there shouldn't be analogous protocols for Life Sciences. We need to make sure you get for free what you can from Web Arch, and what you define is consistent.

AR: The things we see in the Web Architecture aren't solid enough on the Web Architecture side. 303 says "See other", it doesn't say RDF should be there or what that RDF should be about. We need something you can rely on. We need a name for what you'd like to rely on.

SW: I was thinking "this is the Web, this is a fact of life".

JR: But you can give names to layered protocols like GRDDL.

<Zakim> Noah, you wanted to talk about rfc 2616

<DanC_lap> (the charter of the new HTTP WG is conservative; it's not supposed to do new features, but just to clarify.)

NM: If 303 really means "use this only for Semantic Web to return RDF about the redirecting resource", then RFC 2616 should say that. If 303 really means a looser "see also", then we should acknowledge that getting a 303 only leaves open the possibility that you'll get the RDF you need there. That said, I also think it's crucial that the Web is a latebound, discoverable mechanism. The way you find what's out there is to interact with it. Trying to get early warning about what you'll get is just asking for hints (not always a bad thing, but I don't want to lose the ability of any HTTP resource to return any media type it likes...modulo accept headers, etc.)

<ericP> some amount of effort in (re)defining 303 for our special case

<ericP> teaching your server to respond with a 303

<ericP> if 303s conflate 200s and 401s, can we live with that

EP: There's some effort needed to teach servers to send 303's at the right times.

DC: Taken a long time.

<DanC_lap> oh... it's purl.org servers

EP: There's the issue of having decided to use new code vs. 303. I haven't heard any discussion of whether we needed status code or not. Seems to conflate 200 and 401s.

<DanC_lap> (I have a worry about OpenID's use of http 200... sorta like the hotel internet login problem.)

NM: (mumbles) I've always wondered whether it should have been a header.

EP: Yes.

TBL: What if you ask for all of wikipedia, which is too big? I tell you you're not allowed to have it all.

EP: How do I tell you all that?

TBL: RDF.

<Zakim> ericP, you wanted to ask if it's harder to invent a new status code than to re-interpret 303

JR: Is the only action item for me to set up a meeting with Tim and David?

DC: I think we need a mailing list. Need a "name the baby" contest.

<alanr> awwsw

<timbl> public-awwsw

<Stuart> skw@hp.com

<Rhys> rhys@volantis.com

<scribe> ACTION: Alan Ruttenberg to set up biweekly teleconferences on HTTP semantics to be held alternate Tues mornings starting 13 Nov.

<trackbot-ng> Sorry, couldn't find user - Alan

<ericP> jar@creativecommons.org

<ericP> alanruttenberg@gmail.com

<ericP> skw@hp.com

<ericP> rhys@volantis.com

ADJOURNED
