TPAC 2007: Session 6: URI-Based Extensibility: Benefits, Deviations, Lessons-Learned

Transcript

David Orchard [introducing the Panel participants]: Tim [Berners-Lee], Dan [Connolly], Chris Wilson, Ian Hickson.

David Orchard: Apparently I don't have any audio or video today; that's OK.

So what I am going to do is... so our panel today is about URI-based extensibility and some of the challenges associated with it.

You have a wonderful, lovely technical description in the Technical Plenary program you've got. It starts off by saying "Using URIs, as opposed to plain strings, either directly or as a means of scoping (for example, namespaces or profiles), grounds extensions in URI space." So there are a bunch of different ways we can come up with URIs for things. We can use URIs directly; we can use namespaces, such as an xmlns declaration; we can use the profile attribute in HTML; and we can also have custom algorithms for creating URIs from names, like XML Schema component designators and WSDL 2.0 component identifiers: ways of taking names and creating URIs from them.

The next sentence is "this enables decentralized extensibility, and it enables follow-your-nose style discovery of information about extensions". So when somebody's browser gets a document and comes across one of these names, or a URI, in it, you can derive a URI from the name, and then your software can go somewhere and find some information. This is what the TAG has been working on for quite some time; we call it the self-describing Web. We have spent a fair amount of time on it, including on particular formats for enabling this, such as GRDDL and things like that. So this is how you dereference a URI and get a representation.

The next sentence is "over time the W3C TAG has come across several examples of extensions that are not grounded in the URI space, and the difficulties that it causes". Our recent interests are the possible deprecation of extensibility attributes in HTML 5. So what's going on here? Is HTML 5 going to be deprecating the profile attribute, so that we won't actually be able to follow our nose on these things? What exactly is going on here? There's a proposal to add a form of namespaces to HTML 5: Sam Ruby wrote a blog entry called "HTML 5 and distributed extensibility" where he roughly proposes using xmlns in HTML, with a colon as a separator, etc. The TAG thinks trying to come up with ways to do decentralized extensibility is a good thing, so we sent off a note saying "hey, this looks really quite interesting".

And then there's the use of unqualified, unscoped class attribute strings as semantic tags in microformat definitions. So if we go to the XFN description, it says "XFN enables Web authors to indicate their relationships to the people in their blogrolls simply by adding a rel attribute to their <a href> tags."

And then there's an example like <a href... rel="friend met">, so somebody just plunks a name inside an attribute, and you know they've got some value there, but we don't really know whether that name gets a URI or not.
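A minimal sketch of what a consumer of such markup actually receives, using Python's standard html.parser: it collects the rel tokens of each <a> element. The link target http://jeff.example.org is a hypothetical placeholder; friend and met are XFN values. Note that the output is just bare strings, with nothing tying either token to a URI.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collects (href, rel tokens) for every <a> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        values = dict(attrs)
        # Valueless attributes come through as None, so fall back to ""
        rel = values.get("rel") or ""
        href = values.get("href") or ""
        # rel is a whitespace-separated set of tokens; no URI is attached to any token
        self.links.append((href, rel.split()))


parser = RelCollector()
parser.feed('<a href="http://jeff.example.org" rel="friend met">Jeff</a>')
print(parser.links)  # [('http://jeff.example.org', ['friend', 'met'])]
```

To learn what "friend" means, the consumer has to already know the XFN vocabulary; the markup itself offers nowhere to follow its nose to.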

Now, the microformats community says that you should use the profile attribute there to give the meaning of "friend" and "met"; we need something. But we found that there is not widespread use of the profile attribute with microformats, as this very example shows: it makes no mention that you should use the profile attribute. Tantek Celik, who was supposed to be here and wanted to be here but unfortunately could not make it at the last minute, wrote an entry in the microformats wiki called "misconceptions" where he says that the names actually are grounded in URI space, and we have already had some dialog about the real-world aspects of that versus the specification style.

A little bit more on URI-based names: some of the real and perceived limitations of URIs in XML namespaces sometimes provoke other solutions. One good example of this is CURIEs. The example shows where you declare something like a wiki namespace, and then when you refer to something you write [wiki: ], and that parses strangely sometimes and OK other times. So the limitations of XML namespaces in particular have caused another solution for how to create URIs for these names.
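A minimal sketch of CURIE expansion, assuming a single declared prefix: the part after the colon is appended to the URI that the prefix maps to. The mapping of wiki to http://en.wikipedia.org/wiki/ is a hypothetical declaration chosen for illustration. The square brackets mark a "safe CURIE", which is how the CURIE syntax keeps a compact name from being confused with a full URI whose scheme happens to look like a prefix; that ambiguity is one reason the syntax can parse strangely.

```python
def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:reference (or a bracketed safe CURIE) into a full URI."""
    # A safe CURIE is wrapped in [ ] so it cannot be mistaken for a URI
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"not a CURIE with a declared prefix: {curie!r}")
    # Expansion is plain concatenation of the mapped URI and the reference
    return prefixes[prefix] + reference


# Hypothetical prefix declaration, standing in for the "wiki" namespace above
PREFIXES = {"wiki": "http://en.wikipedia.org/wiki/"}

print(expand_curie("[wiki:Uniform_Resource_Identifier]", PREFIXES))
# http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
```

A string such as mailto:someone with no bracket and no declared mailto prefix raises ValueError here; a processor without that check would have to guess whether it was looking at a URI or a CURIE.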

So we are seeing a number of new kinds of things emerge here, and we may not get these in URI space.

So that explains our technical description, and we have a panel here, a wonderful panel that is going to talk a little bit about that.

Our format is that I am going to introduce each panelist (most of you know them anyway), and then... we tried to do something a little different here: each of them, or most of them, has posed a question, and each of the other panel members will answer it. So we asked them to provide their own question, and we thought it would be kind of fun to have a stance to start off with.

So our panelists are... everybody knows Tim Berners-Lee; Dan Connolly, whose interesting role here is obviously as co-chair of the HTML Working Group, and who has played a lot of roles in all aspects of the Web: URIs, HTTP, XML, etc.; Ian Hickson, editor of the HTML 5 specification; and Chris Wilson, co-chair of the HTML Working Group and platform architect on the Microsoft IE team.

So what we are going to do is start this.

Ian Hickson is first: he is going to ask his question, and the others on the panel will have a chance to answer. Then Dan, who also came up with questions, is going to go next, and then we'll see where we go from there, and try to leave the last 10 minutes or so for audience questions.

So, Ian take it away.

Ian Hickson: So my question is this. Several technologies have tried using URIs as a basis for extensibility: XML Namespaces, FOAF, the profile attribute that you mentioned, RDF and so forth. Despite their quite mixed success in the standards world and with computer scientists, they seem to always completely fail when the larger Web community, or the real world, if that's what we want to call it, is exposed to them. For example, looking at about a billion pages in 2005, I found that the HTML 4 profile attribute was used on about 1.4 million pages... which sounds like a lot, until you realize that dc.title in the <meta> element, which should always be accompanied by the profile attribute, was used about 19 million times. So profile is used 1.4 million times in total, while dc.title, just one of any number of values that require the profile attribute, is used 19 million times. The extension mechanism is used more than ten times as often as the extension declaration it requires.
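The ratio behind that last sentence, worked out from the two figures quoted above. This is only a rough comparison: one figure counts pages and the other counts uses of a single value.

```python
profile_pages = 1_400_000   # pages using the HTML 4 profile attribute, as quoted above
dc_title_uses = 19_000_000  # uses of dc.title in <meta>, as quoted above

ratio = dc_title_uses / profile_pages
print(f"{ratio:.1f}")  # 13.6
```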

So my question, now that I have, you know, stepped onto the soapbox, is: what would an extension mechanism that actually works, as opposed to all those that supposedly work, look like?

Dan Connolly: So, I think... I still think that the existing mechanisms can be workable [microphone problem]. I have not given up on the existing mechanisms, namespaces and the profile attribute. Look at the statistical argument: Internet domain names are a little bit like real estate in cyberspace, and URIs are, similarly, like a little plot of it. If you look at the size of my backyard compared to the size of the planet, you know, it's zero to several significant digits, so clearly I'm insignificant and you should take away my backyard too... No.

[Applause]

Chris Wilson: Well, I'm not sure I want to follow that one. But I think the problem from my perspective is that there's not been a huge benefit from actually using the profile attribute. It may be the right thing to declare your namespace, but if it still works anyway, who cares? I think we had a similar issue with... IE has actually, for quite a long time, implemented something similar to Sam Ruby's proposal for HTML extensibility, allowing some (oh, let's call it a bastardized version of) namespaces inside HTML syntax. The downside of this, the massively nasty thing in my opinion about the way it works, is that it really doesn't care about the URI. You were supposed to declare one, but it is not really significant for any reason in the system; it does not have to be anything sensible... frankly, as long as it is anything but an empty string, it really does not matter in the way it is used in IE today. So I think the point is really... as long as you are unambiguous... URIs can be a good way to disambiguate, but if you are not using them for that reason, then I am not sure they're necessary.

Tim Berners-Lee: We have heard a lot about that community, as if there's just one community, the HTML community. There are in fact lots of other communities. There are some big ones: the community of people who read badly nonconforming [...] pages is huge, and the community of people who make them is smaller but still very, very large. There are lots of other communities as well: for example, some in other languages, the communities that are just behind firewalls, the banking community. There are all kinds of people who have different languages.

HTML is in the unique position of being the most widely deployed language out there, and it has this huge community. Now, for HTML, it makes sense to say "well, we just have this huge sort of meritocracy/democracy thing to bring all those minds together to produce HTML". And when we need some things that are of interest to that community, like a microformat saying who wrote this anyway and where this event is taking place, those things are of interest to the entire planet, so we all have the chance to agree on them, as if dc.title had been agreed at the United Nations level, or an equivalent of that. And we all do it with some process, not necessarily as bureaucratic as the United Nations, but, you know, that's how it scales for that sort of thing.

However, when you look at smaller languages: if you happen to be just a member of a smaller community, say the banking community, and you want to add something to HTML, something like a check, then you won't get the large HTML community to do it.

So the reason for namespaces being in there is to allow this sort of extensibility, so that the smaller community can go in there and put its little URI up; people may never know who it is, so why would they ask anyway? Namespaces weren't actually requirements for HTML, they weren't requirements for XML; namespaces were requirements for RDF. RDF works in such a way that when you get a shipment of RDF across the network, all those statements are independent, and there's a very, very strong "ignore what you don't understand" rule. Every statement has its meaning defined by a URI. Every single statement.

You can look those up independently. So namespaces are not only something RDF finds useful; RDF absolutely thrives on them. Namespaces are what RDF is made of. It is not surprising, then, that RDF asked for namespaces in XML in order to be able to do that.
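To make "namespaces are what RDF is made of" concrete, here is a minimal Python sketch that reads a tiny, hypothetical RDF/XML snippet with the standard library's xml.etree.ElementTree and turns each property element into a full URI by concatenating its namespace name and local name, which is how RDF/XML forms property URIs. It handles only rdf:Description with rdf:about and literal values; a real RDF/XML parser covers much more of the syntax. The subject http://example.org/doc is a placeholder; the dc namespace is the Dublin Core elements namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: one subject with two Dublin Core properties
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
    <dc:creator>Someone</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def clark_to_uri(tag: str) -> str:
    # ElementTree reports qualified names as "{namespace}local";
    # RDF/XML forms the property URI by concatenating the two parts
    ns, _, local = tag[1:].partition("}")
    return ns + local


def triples(xml_text: str):
    """Yield (subject, property URI, literal) for a toy subset of RDF/XML."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            if not prop.tag.startswith("{"):
                continue  # an unqualified element has no URI, so it carries no statement here
            yield subject, clark_to_uri(prop.tag), prop.text


for t in triples(RDF_XML):
    print(t)
```

Each printed triple names its property with a complete URI, such as http://purl.org/dc/elements/1.1/title, so each statement can be looked up on its own, independently of the others.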

So the... I'm not sure. Now, you could argue that XHTML should not use namespaces: that it should be a single language, it should not really be extensible, and if you want to put something else in it, like SVG, you don't use the namespace for SVG, you just invent canvas, you just invent a new SVG. That will scale for the HTML requirement, but if you feel you have a requirement to interoperate with the smaller communities, then you have to provide the hook; you have to allow people to put a hook in.

Now, the difference between HTML and RDF is that when an RDF processor finds a URI, it looks it up, pulls in an ontology, and gets a lot of information that allows it to figure out how to display it, how to manipulate it and how to process it, without having seen the ontology before. These URIs are actually used, seriously, and if one breaks, then the RDF application breaks: if people spell the namespace wrong, things won't work. So there is immediate feedback, which you don't get when you say "look, will you please, just for the sake of form, if you want to, put this URI at the top of things". So there are very good reasons to put this hook into RDF, and I am wondering whether we should draw a line at some point, where we accept the responsibility to put this hook in for other, smaller communities. Does that line actually exist, in the long term?

David Orchard: So we've kind of heard at least a fair amount of support for keeping the use of namespaces etc. I think I would like to return to Ian's question at some point, which is "Is there anything that's needed for a simpler mechanism that might be actually used for something like microformats. Maybe there is just no need."

Dan Connolly: So, support for the profile attribute in Dreamweaver, please.

[ APPLAUSE ]

David Orchard: Dan, you had a question, that you cautiously wrote up beforehand, so we'll ask you to ask your question and then each of the panelists will answer.

Dan Connolly: Consider C programmers, I don't know how many are here, and the Emacs Lisp community, maybe a smaller community, certainly a smaller community: Emacs has a global namespace for function names, and the convention for avoiding collisions is basically to prefix names with your initials or something like that. In the next generation of software development languages, we've got Java, Python, Prolog, Ruby, etc., and they pretty much all have explicit package namespace mechanisms. Java's package names are rooted in DNS, so that you write org.w3.something for the package names of the Java bindings for the W3C DOM API, but in Python you just say "import minidom" or something like that. So is that a bug or a feature?
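A minimal sketch of the DNS-rooted convention Dan describes: reverse the labels of a domain you control to get a package prefix, so that w3.org becomes org.w3. Only the hyphen-to-underscore rule of Java's naming convention is handled here; other edge cases (labels starting with a digit, reserved words) are ignored. The function name and the company-a.example / company-b.example domains are hypothetical, used to mirror the acquisition case in Dan's answer.

```python
def java_package_prefix(domain: str) -> str:
    """Turn an Internet domain into a Java-style package prefix, e.g. w3.org -> org.w3."""
    labels = domain.lower().split(".")
    # Hyphens are not legal in Java identifiers; the convention maps them to underscores
    labels = [label.replace("-", "_") for label in labels]
    # Reverse so that the most general part (the top-level domain) comes first
    return ".".join(reversed(labels))


print(java_package_prefix("w3.org"))                  # org.w3
print(java_package_prefix("my-company.example.com"))  # com.example.my_company

# Acquisition case: company B's code moves under company A's domain, so every
# reference in company C's code to the old prefix has to be rewritten
old = java_package_prefix("company-b.example")
new = java_package_prefix("company-a.example")
line = "import example.company_b.widgets.Widget;"
print(line.replace(old + ".", new + "."))  # import example.company_a.widgets.Widget;
```

The prefix buys global uniqueness without a central registry, at the cost of tying names to whoever holds the domain.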

So my answer: some of each. It's a bug when company A buys company B and all of company C's code has to change to update the references from B to A. But I guess there are a couple of workarounds in the refactoring tools, so the people in company C can use Eclipse or whatever and do a global replace. It's a feature to have this DNS-based stuff when the community outgrows the "everybody plays nice" models of Perl's CPAN and the Python Cheese Shop. So the DNS model actually supports a marketplace of mutually distrustful parties, which scales much larger.

Chris Wilson: So I think that it's a feature with a bug in it. The feature, obviously, is disambiguation in a global namespace: it's easy to create a new component or library or whatever and have it live in the global space, clearly and unambiguously. The bug is... maybe a buggy design decision, which is that you can't decide not to do that; you can't say "I am the implementor of this component, I don't have to tell you where I'm coming from". It's kind of interesting, because @@ takes a very different approach, which is: I'll just make up a random number, and that random number will be unique to me. So if you know what my random number is, then you can try to replace me; if you don't, then it's irrelevant, and your random number doesn't really change.
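The transcript elides which system Chris means (@@). As a stand-in, not necessarily the system he had in mind, here is a Python sketch of the random-number approach using version 4 UUIDs from the standard uuid module: anyone can mint a name with no registry and negligible collision risk. The second function, with its hypothetical base URI http://example.org/id/, puts the same kind of random name behind http://, along the lines of the linked data Tim mentions below.

```python
import uuid


def random_urn() -> str:
    # uuid4() draws 122 random bits, so independent parties can mint names
    # without coordinating; .urn gives the "urn:uuid:..." form
    return uuid.uuid4().urn


def random_http_uri(base: str = "http://example.org/id/") -> str:
    # Hypothetical base URI: the same kind of random name, but placed behind
    # http:// so that it could, in principle, be dereferenced for information
    return base + str(uuid.uuid4())


a, b = random_urn(), random_urn()
print(a)       # e.g. urn:uuid:... (different on every run)
print(a != b)  # True, with overwhelming probability
```

The name itself says nothing about what it identifies; only the HTTP form offers a place to go and ask.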

Ian Hickson: I think just using domain names would be an interesting alternative to using URIs; I think it would be a good idea, actually, because the names are short, easy and manageable, and, like Chris said, you can actually... not in the Java case, but we could take that mechanism and say that in some cases all we need is the first part.

Chris Wilson: I thought for a second you were going to suggest that we use random numbers for everything.

[LAUGHTER]

Ian Hickson: That would also work.

Tim Berners-Lee: There are a lot of systems that work with random numbers. In fact, I just came across a whole bunch of linked data where the URIs are all random numbers with http:// in front of them. So they are random numbers exposed over HTTP.

I like the fact that in Web Architecture you can follow your nose; one of the... there's an invariant that if I give you a URI, I shouldn't have to give you the context. If I give you this URI, you should be able to figure out what it means, and the browser should figure out what to display to you or what the ... If I give you this URI, and I am this bank, and you follow it and get your bank statement, then there should not be any room for discussion about how much you owe me or I owe you. So you go through this "following your nose" thing, and every now and again you will have to look something up on the Web.

So at the moment... RDF, as I said, benefits from these things NOT being random numbers. It would be nice to start off with a little core and then develop very sophisticated applications by pulling in stuff.

I like Dan's analogy with the programming languages, it is interesting.

Looking at some... a lot of the problems that we have all got [...] are to do with package management, management of installs, without picking on any particular operating system. But some of them do it quite nicely, for example Debian, which carefully maps out the relationships between things you might want to install, and allows you to have two different versions of something because you have two things, one of which wants version 1 and one of which wants the other. I think it would be an incredibly powerful tool if it used URIs: if, instead of having to register (you have to go to Debian to register a package name), it was just opened up [...]. Similarly, I'd like to be able to say "from http://something, import something", so I know exactly what it is I'm importing. And we can use metadata, you know, add the package metadata to it. But otherwise we don't have that... we end up with search paths, for example, and search paths have been, in a way, a big source of bugs. I've spent time in my life figuring out why some piece of Java does not run, or some piece of Python does not run, because I have not set up the search path. You have to give someone a pointer to something, but also give them tons of context: "oh, by the way, your search path should look like that", "when I say /usr/sbin, you should have the following things in /usr/sbin", "by the way, you should have version 6 of that particular dynamically linked library". You know, there's a huge amount of extra context that you need. Yes, I can see a use for those random numbers: there would be a competition over who would be the great index, "I'll give you a random number and you give me back some sanity and some information about it"; there would be a huge competition to be the first to provide this sanity, to be the one who can look up those random numbers.

Dan Connolly: First, Debian's great, I love Debian, I use Debian, but again, that's the "everybody plays nice" model: there's a bunch of mutually trusting Debian developers who agree on all the package names, and when there is an integration bug, that community works it out. They have this elaborate voting setup, fights, and it's a whole political world by itself. But it's ultimately a trusted community. In contrast, you have the operating systems that consumers use, Windows and Mac OS X, where you have mutually distrustful parties deploying stuff. That's the hard part, I think. What I am trying to say is that what goes on with the registries is difficult; that's the first point.

The other part is the search path problem. The search path problem is dealing with the lack of consensus in the world, and that's always going to cost. You are trying to have several releases of stuff: you have a private build, you've got one that's network-wide, and stuff like that, so you are trying to have several versions of reality on your machine at the same time. When the software gets mature, everybody agrees that version 7 is good enough for whatever, and it stops evolving, then these problems diminish considerably. That's when you have consensus: you can publish the whole thing on the Web, just cache the software on your machine, and you would be okay. But we don't reach global consensus on our software all that often.

David Orchard: OK. We have one more set of questions; Tim is going to ask... I think you prepared two questions. You can choose either or both, it's up to you.

Tim Berners-Lee: We have maybe covered some of these already... Okay. So I think the first one, for the record, was "should we really just be doing namespaces in RDF?", because it's clear here that they are very important there: should we draw a line somewhere between HTML and RDF, because there should really be namespaces in RDF? We've discussed that already.

The other question: the microformats folks feel... they don't put a profile at the top to indicate which microformat profile is in use. My point of view is that of someone writing a client, which is supposed to understand the data out there on the Web, wanders around web pages, and is supposedly able to suck off data in the latest cool style. This means basically that I have to tune in to the latest microformats. So does that mean I have to get after the servers tonight that presumably tell me... does this mean I have to get an RSS feed of Tantek's announcements of what is now decreed a microformat? And whenever Tantek says there is a new microformat, does that mean I have to sit back down at my JavaScript, or is there another way? That, to me... we don't have Tantek online... [David: he is on IRC] ... I say hi to Tantek on IRC...

David Orchard: I wonder if we can get his question captured accurately and through the panel here; then, if Tantek has the chance to respond online, somebody could read that back, Tim. That would be kind of fun. Actually, Tim, do you have an answer to your own question?

Tim Berners-Lee: The question is "do I have to add new code every time Tantek adds a new microformat?", and the answer is "yes, I do". Or do I have to ignore the microformats data, or do I just wait and hope that there will be enough data that people put in with... a URI which gives me a clue where to pick up the metadata, which would allow me to follow my nose, for it to be interesting, and that people will realize after a while that generic browsers won't see data that does not have the profile on it?

Ian Hickson: The short answer is, like you said, yes, like we do on the mobile Web. The reason behind that, though, is that the use cases considered by the microformats community assume that whatever you are going to do with the data is specific to that particular type of data. For example, with hCard, you're not going to be using a generic process for it; you're going to be using it specifically to put an address card into an address book or whatever. So you need special knowledge of that microformat anyway, and there's no value in processing it automatically. I'm not saying that it's a valid argument; I'm saying that it's probably where it came from.

David Orchard: Attempting to channel Tantek, are you?

Ian Hickson: I'm sure he'll be able to speak for himself :)

Dan Connolly: My problem with this is that the microformats community has claimed eminent domain on class="vevent" and the like, I suppose. When I wanted to use those names for something else, they said "No, you know, in the global scheme of things it's just too much of a pain in the ... and too late to do that. So we'll take it away from you". So I think it's cost-effective for people to put profile URIs at the top.

Ian Hickson: Just call it "DanC vevent".

Chris Wilson: So part of my answer is certainly agreeing completely with you. Yes, when new microformats come up you will have to add code, and yes, you will have to subscribe to the RSS feeds of new known microformats. But the reason is that microformats are really an interesting graft of new, known semantics onto an already existing extensibility mechanism. Really, the question "is HTML class a viable extensibility mechanism, and should we never have done HTML class?" would probably be a better one there. The hard part, of course, as Dan said, is that there's a kind of eminent domain in microformats. Tantek and I have had discussions about whether we should have prefixed them all, or put something in to separate them out, like CSS vendor extensibility, which I think I am fairly unqualified to be a proponent of right now, but which would still probably be a good idea.

Dan Connolly: Actually, the HTML5 spec, the current draft I believe, says that the class values and relationship values live in a registry or a wiki. So "rel" with a central registry, or wiki if you like, would be a solution if the community says "yeah, I don't really want to manage my own relationship values, I just want a big list in the sky". That's a stable position. We have that big list in the sky for MIME types; we don't like it, but we get by. So think about that when you think about HTML5 and relationship names and class names and stuff.

Chris Wilson: I would say that, with microformats in particular, the whole point is that it's a well-known semantic, right: a well-known schema, a well-known vocabulary. If it isn't, if new ones pop up and are not being used, or there are no processors looking for them, then it really becomes pointless, and it's just another class value.

Dan Connolly: Right, well, that's a community value that is reflected in their wiki. All I'm saying is that currently the HTML5 spec has a link from the specification to this wiki, which allows you to do the follow-your-nose thing. So if we want to say that microformats.org is the registry for rel values or something, that's the global consensus, that's stable; it's a marketplace that makes sense to me.

Chris Wilson: Probably. The hard part of it is that what you would really be saying is "that wiki is the repository of class values", and that means they can take over, via this eminent domain... They could take over class values that people are already using in the wild for CSS stylesheets or things like that.

Ian Hickson: That is why class in the current HTML 5 spec doesn't have one of these, because we actually had that at some point, and several people said "no, no, you are stepping on my toes".

David Orchard: So you have done this experiment: not the thought experiment, but the community experiment.

Ian Hickson: There's still a wiki page out there that claims to be that, because I have not removed it.

David Orchard: Did Tantek get an answer into IRC by any chance? Could one person come up to the microphone and read it?

[trying to get audio communication with Tantek Celik... failure...]


Dan Connolly: I am happy to read for Tantek, but I can't see the answer, I'm sorry.

Molly Holzschlag: Okay. So, speaking on behalf of Tantek, this is Molly here. Tantek says... the first point of his answer was to "decide to support a snapshot, a static set of microformats, as we are doing now". And the second part of his answer says "only support content with explicit profile declarations, follow your nose on those URIs, and only look for those microformats defined in those URIs." That is what he has to say.
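A minimal sketch of Tantek's second option: read the profile URIs a page declares on <head> (HTML 4 allows a whitespace-separated list) and enable only the vocabularies those URIs name. The KNOWN_PROFILES table is a hypothetical client-side lookup; the two URIs in it are assumed to be the XFN and hCard profile URIs, and a real follow-your-nose client would dereference unknown profile URIs rather than silently skipping them.

```python
from html.parser import HTMLParser

# Hypothetical table: profile URIs this client already knows how to process
KNOWN_PROFILES = {
    "http://gmpg.org/xfn/11": "xfn",
    "http://microformats.org/profile/hcard": "hcard",
}


class ProfileReader(HTMLParser):
    """Records the URIs listed in <head profile="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.profiles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # The profile attribute may hold several whitespace-separated URIs
            self.profiles.extend(value.split())


def enabled_vocabularies(html: str) -> set[str]:
    """Return only the vocabularies the page explicitly declares."""
    reader = ProfileReader()
    reader.feed(html)
    return {KNOWN_PROFILES[p] for p in reader.profiles if p in KNOWN_PROFILES}


page = '<html><head profile="http://gmpg.org/xfn/11"></head><body></body></html>'
print(enabled_vocabularies(page))  # {'xfn'}
```

Under this policy, the rel="friend met" link from earlier would be processed only on pages that declare the XFN profile.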

David Orchard: Interesting. Okay... I think we have three to six or seven minutes left.

So let's open it up to the floor to questions from the audience, and remember to say your name and your affiliation if you have not done so already.

Sandro Hawke, W3C team: I am the staff contact for the Rule Interchange Format Working Group. We started about two years ago, and our charter included a line, "We need to come up with an extensible framework for the rule languages", because the rule engine market is very fragmented and there are a lot of different rule languages; you can't come up with one that covers everything. So we've been working on that, and I've tried to throw that together, and the plan we have for RIF looks like it applies to all XML-based languages as well. So it's a solution for forward compatibility. I'm not sure it works; it's not implemented yet. But if people are interested in joining that discussion, I'll paste the URI of our current editor's draft on IRC when I get back to my seat, or you can come talk to me.

David Orchard: Do any of the panel members have any comments? Have you looked at what they've been doing?

Dan Connolly: I have had a very good conversation with the RIF Working Group about extensibility and stuff, and Sandro has sent me a pointer; I haven't read it yet, but I hope to.

David Booth, HP: With regard to microformats, I find it OK in some sense to say there are well-known semantics for them, but then there might be other ones, personal ones that somebody makes up and nobody cares about, and then you have this gap in between: how do you get from a new microformat that nobody cares about initially to one that has well-established semantics? If you don't have something like URI extensibility, you've got that gap. How do you bridge it?

Chris Wilson: I think the first answer is: a personal microformat is nothing more than a class. It only has applicability in your document or your set of documents, because you have put it there.

To grow that applicability, it's probably true that the best way to do that in a medium range extensibility system is to use URIs or some disambiguating factor, whereas microformats today really disambiguate by just saying "I know what an hcard is" or whatever, and there's only one... there's a fairly limited set of those things.

David Orchard: The one thing that I will point out is that they have had conflicts in the microformats work already, in particular when one of the attributes clashed with title. I think the multimedia Semantic Web folks did a really interesting report on that, and they ended up with track:title and album:title: they had introduced their own form of namespaces because title had already been taken. So they ended up introducing namespaces in there, which was kind of "oh yeah, that's obviously the problem: you have name clashes".

Phillip Hallam-Baker, Verisign: I'm reminded of David Wheeler's maxim that "every problem in computer science can be solved by adding another layer of indirection", and he also went on to say that you create a problem by doing that...

Chris Wilson: That is job security for you.

Phillip Hallam-Baker: Yeah, well. Now, it seems to me that a little piece of mechanism missing here is the ability to make assertions of the form "this identifier means the same thing as this identifier". And this is one of the criticisms that gets made about ontologies in the Semantic Web: it is a very static view. You map out a vocabulary of shared terms, and that is what we are going to use. Actually, what happens in the real world is that ontologies are malleable and evolve over time, particularly in my field of computer security. When we have an emergent threat, when a virus is about to appear in the wild, everybody sees it and names it, and then afterwards we agree on the common name we are going to use for it. I think it's that malleability that we've got to get into, and the process of convergence on a common identifier, which must be treated as a first-class object.
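One way to sketch the convergence Phillip describes, assuming that "same as" assertions arrive one at a time: a union-find structure merges identifiers into equivalence classes, so independently minted names end up sharing one representative. OWL does provide owl:sameAs for stating such equivalences, but this code uses no RDF library; the class name and the vendor-a.example, vendor-b.example and common.example URIs are hypothetical.

```python
class IdentifierEquivalence:
    """Union-find over identifiers: each same-as assertion merges two classes."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, ident: str) -> str:
        """Return the representative of ident's equivalence class."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every identifier on the path straight at the root
        while ident != root:
            next_ident = self.parent[ident]
            self.parent[ident] = root
            ident = next_ident
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record that a and b name the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


eq = IdentifierEquivalence()
# Two vendors name the same threat independently, then a common name is agreed
eq.same_as("http://vendor-a.example/threat/123", "http://vendor-b.example/names/XWorm")
eq.same_as("http://vendor-b.example/names/XWorm", "http://common.example/threat/X")
print(eq.find("http://vendor-a.example/threat/123") == eq.find("http://common.example/threat/X"))  # True
```

The representative is whichever identifier happened to be merged first. Choosing the agreed common name as the canonical one would need an extra rule on top of this.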

Dan Connolly: Yeah, I just got a pointer to a conference next year that I hope to go to, something about semantic web and mixing in pragmatics, which is a lot of that.

David Orchard: Do any other panelists want to answer that question? Okay... We have one last question before the break.

Phil Archer, Family Online Safety Institute: Talking about semantics and pragmatics: the GRDDL Recommendation references an HTML profile, RDFa is going to Working Draft, and my Working Group, the POWDER Working Group, will be referencing an HTML profile to set up "rel=powder"... Given the state of disagreement on the panel, what, practically, should I put in our Rec-track document now, so that "rel=powder" is defined?

Ian Hickson: You can add "powder" to the WHAT Working Group wiki, where we define "rel" values.

Phil Archer: And what's to stop someone else from coming next week and changing it?

Ian Hickson: Me. I go and check those on a regular basis. But if someone else comes up with a "rel=powder" that turns out to be more successful than yours, then they win.

Dan Connolly: Right, so if you want your own piece of real estate, you know, you're in W3C you can have it and you can use GRDDL.

David Orchard: Tim? Last answer?

Tim Berners-Lee: Yes, use GRDDL.

David Orchard: Thank you all panelists...