Personal RDF

Thomas Lörstch
thomas@stray.net

The main focus of this text is the issue of "named graphs". We need a fourth element in RDF to contextualize triples, and it needs to be a full citizen of RDF 2.0. The second section contains a few remarks and proposals, without going into detail, about blank nodes, reification, a 303-solution, the inclusion of a base vocabulary into RDF and the importance of templates, howtos and other modeling aids.

1. Named graphs, contextualized graphs, personal graphs

RDF 1.0 always seemed overly positivistic to me. Ten years ago, when I first stumbled upon the semantic web, I was active in grassroots politics and looking for ways to connect the archives and publications of different communities. These communities excelled at coining different terms for what was essentially the same thing, yet they were very specific about the exact interpretation of each term, reflecting their very own understanding of the subtleties of the underlying topic. Words can be powerful, and so people fight about words, and meanings, and slight differences in interpretation. The idea that all knowledge could be connected in one big graph through some shared vocabularies seemed very inadequate under these circumstances, maybe fitting some e-commerce vision but certainly not the advancement of communication and understanding in the social space.

Consequently I turned to Topic Maps, which from the get-go included two crucial features:

- scope, a notion which RDF lacked completely, and
- reification, not like in RDF but done in a decent way

(They also included 303 done right, but that's another story.) This was perfect for modeling topics and how they were connected in the views of different people and groups, how those views were connected, and so on. Reification of a reification of a scope of ... Topic Maps are a very powerful tool indeed.

But after a while I returned to RDF, partly because of the money - there was no free and scalable Topic Maps engine - but also because I found that Topic Maps are overly complex for mass data, e.g. for filling the archives after a common and interconnected ontology had been established. I figured that it would be better to mimic Topic Maps in RDF for modeling the terminologies and use plain RDF for instance data. I started to think of Topic Maps as the T-Box and RDF as the A-Box of a powerful and scalable knowledge tool.

Still, it was harder than I thought to model Topic Maps in RDF. Can't you model anything with triples? Yes you can, but it's not necessarily a pleasant and efficient experience. Reification, for example, isn't, and scope isn't either. RDF is efficient, streamlined and elegant only as long as you (can afford to) stay within the confines of a common world view and a fairly one-dimensional application model. Put another way: RDF is fine for describing everything, but it lacks one degree of freedom to put that description into perspective. Put yet another way: while RDF is based on the open world assumption, it does not take into account that there may be more than one world.

The web is increasingly becoming the read-write medium that it was intended to be in the first place. This means it is no longer only the place to publish finished products. It increasingly becomes the place where ideas start to grow, where discussions form consensus, where many-to-many processes evolve. If semantic technologies want to be useful in this fluid and rapidly changing process they have to offer more than a shared logic and shared vocabularies: they need to provide means for personalizing the semantics, for contextualizing and outlining discourses, for making stages and places and conditions explicit. We're not talking about universal truths so much anymore. At least we start to agree that there isn't much universal truth that's relevant and true to everybody.
The rest has to be evaluated and negotiated and re-negotiated and re... These are the contexts, and they need named graphs.

The web 2.0 communities are a reflection of this process, but it will get much more diversified. The communities will become both more numerous and smaller, they will leave the central servers and connect in a decentralized way, they will become 'personal'. RDF has to facilitate this process. This is where the really interesting stuff is happening. This is where people articulate themselves, where they meet and organize. This is where communication is happening. We do not have to bring linked data online to bootstrap the semantic web (although it's a useful thing to do). The web didn't grow that way, and the semantic web won't either. We have to go to where the needs are. And the need isn't to immerse oneself in a sea of data but rather to make it personal data, "my data", "my peer group's data". To facilitate this personalization of the shared network, the blending of individual graphs and the giant world wide graph - without losing either of them in the process - a mechanism to contextualize graphs is essential.

Observing the discussions about named graphs, I was worried that this approach would again not reach far enough. There was talk about adding provenance information as if that was the only use of a graph name. Or trust. Or something else. We will fail to build a scalable solution if we do not make the fourth element a full citizen of RDF. It must be possible for a triple to be part of different contexts. It must be possible for these contexts to be characterized by arbitrarily complex constructs. Applications may very well be streamlined to only support one kind of graph in order to be more efficient and scalable. Tractability may require constraints on how complex the description of a context is allowed to become. But as a general rule the name of a graph may be a graph in itself.
Then we have a truly universal language to describe the semantics of what's going on on the web and what we do there.

2. Some more stuff

2.1 Blank nodes as containers

Blank nodes are a grouping mechanism. They should be rethought as such and put in relation to containers. That would make them less arcane.

2.2 Reification

Reification as defined by RDF 1.0 is best suited for the special case of citation, for speaking in quotation marks. Other uses should be deprecated, since named graphs can take care of them in a semantically much sounder way.

2.3 Getting the 303 right

The 303-solution to the question of whether an IRI denotes a subject of discourse or only a proxy for such a subject has serious drawbacks. Most of all: the distinction made is not visible to the user. Which browser shows the status code of a response? Where in the source code of an HTML page can I see what I'll get? How do I encode what the user should expect? Is there any formal way to describe which IRI points to a resource and which points to a proxy of a resource? Is there a way to tell my server administrator that doesn't have the semantics of a post-it note? But this is something that RDF could do very well (possibly embedded into a web page as RDFa). Therefore I propose to add to RDF a property "subjectIdentity" with two possible values, "subjectLocator" and "subjectIndicator" - or something along these lines... In case the subjectIdentity of a resource is not specified and a decision has to be made, the default would be "subjectLocator", since that is what common sense seems to expect from an IRI.

2.4 A base vocabulary

Probably nobody wants to get into this discussion, but still... Dublin Core tremendously pushed the idea of the usefulness of metadata because the element set was there and it was immediately intuitive to use. But it was there early, maybe did good marketing too (I don't know), and at the time (the mid-nineties) seemed like a no-brainer to use.
On the other hand, how many vocabularies today define a person? SIOC user, FOAF friend, DC author ... just to name a few. I think it would be good to have a vocabulary of about 10 to 15 terms formalizing the very common, very general, very often used concepts. That vocabulary should be reasonably organized and define mappings to established and popular vocabularies and ontologies like WordNet, Wikipedia, FOAF, SIOC, DC etc. This would not only give users a hand in adding basic semantics without having to hunt down and choose a specific vocabulary first, but also facilitate linkage between different vocabularies.

Some candidates would be:

- individual
- organization
- time
- location
- quantity
- account
- address

That's only 7, so there's still room for improvement...

2.5 Examples, HowTo's, recipes, templates

The way RDF works doesn't end with the triple. It often requires some involved modeling and elaborate constructs to convey exactly the intended meaning. Seemingly negligible details can have a huge and undesired impact on the semantics of statements. The techniques underlying the semantic web are not common sense, and logic is far from trivial - but it often looks as if it were, which can make it especially dangerous. Most of the people building the web are not computer scientists but learned by doing, gleaning from other people's code - HTML, PHP, CSS etc. Just as a little semantics can take you a long way, a few examples and templates can cover a lot of use cases. Ideally a tutorial adds enough background to enable the user to understand and correctly modify and extend them. The Primer is probably the right place for this.
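
As one example of the kind section 2.5 asks for, the core requirement of section 1 can be sketched in a few lines of Python. This is only an illustration under assumptions, not a proposed API or syntax: the class ContextStore, all "ex:" names and the choice of a frozenset of triples as a graph name are made up for the example. It shows one triple that is part of several contexts, one of which is named by a small graph rather than by a simple identifier.

```python
from collections import defaultdict

# A triple is a (subject, predicate, object) tuple; plain strings stand in
# for IRIs. Every name here is hypothetical and chosen for illustration only.


class ContextStore:
    """Toy quad store: the same triple may belong to several contexts."""

    def __init__(self):
        # context name -> set of triples asserted in that context
        self._graphs = defaultdict(set)

    def add(self, triple, context):
        """Assert a triple in a context; the context name must be hashable."""
        self._graphs[context].add(triple)

    def contexts_of(self, triple):
        """Return the set of all context names whose graph contains the triple."""
        return {name for name, triples in self._graphs.items() if triple in triples}

    def triples_in(self, context):
        """Return a copy of the triples asserted in one context."""
        return set(self._graphs.get(context, set()))


store = ContextStore()
claim = ("ex:termX", "ex:means", "ex:conceptY")

# The same triple, asserted in two contexts named by simple identifiers.
store.add(claim, "ex:groupA-view")
store.add(claim, "ex:groupB-view")

# A context whose name is itself a graph: a frozenset of triples is hashable,
# so it can serve as the fourth element directly.
described_context = frozenset({
    ("ex:groupC", "ex:statedIn", "ex:someDiscussion"),
    ("ex:someDiscussion", "ex:status", "ex:underNegotiation"),
})
store.add(claim, described_context)

for name in store.contexts_of(claim):
    print(name)
```

The sketch sidesteps the hard questions, such as when two graph-valued names count as the same context and how complex such a name may become before tractability suffers; here that is simply left to Python's set equality.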