BnodeSkolemization

From W3C Wiki

Bnode Skolemization

This page is for collecting use cases and requirements as a prelude to drafting a Bnode Skolemization Specification ("bnode skol spec"). It arose from an email thread on the semantic-web@w3.org list. The bnode skol spec would define a standard (but voluntary) process for skolemizing bnodes (blank nodes), i.e., replacing them with URIs, which RDF users could use to eliminate bnodes from RDF graphs. The main motivation for skolemization is to make the syntactic "names" of bnodes stable across queries and other RDF graph operations.
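As a rough illustration of the basic process, the Python sketch below replaces each distinct bnode in a set of triples with a freshly minted URI. It assumes N-Triples-style string terms (`_:b0` for a bnode, `<...>` for a URI, `"..."` for a literal), and the prefix `SKOLEM_PREFIX` is hypothetical, not something any spec has fixed. Random UUIDs make collisions with URIs minted elsewhere vanishingly unlikely, though not provably impossible.

```python
import uuid

# Hypothetical prefix for illustration; the bnode skol spec would define the real one.
SKOLEM_PREFIX = "http://example.org/.well-known/skolem/"


def is_bnode(term):
    # N-Triples-style terms: "_:x" is a bnode, "<...>" a URI, a quoted string a literal.
    return term.startswith("_:")


def skolemize(graph):
    """Replace every bnode in a set of (s, p, o) triples with a fresh URI.

    Returns the skolemized graph and the bnode -> URI mapping that was used.
    Each distinct bnode label gets exactly one URI within the graph.
    """
    mapping = {}

    def replace(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            # A random UUID makes clashes with other graphs vanishingly unlikely.
            mapping[term] = "<" + SKOLEM_PREFIX + uuid.uuid4().hex + ">"
        return mapping[term]

    skolemized = {(replace(s), replace(p), replace(o)) for (s, p, o) in graph}
    return skolemized, mapping


g = {
    ("_:b0", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
    ("_:b0", "<http://xmlns.com/foaf/0.1/knows>", "_:b1"),
}
sk, m = skolemize(g)
```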

Suggestion: For each use case, requirement or idea that is listed, specify who is willing to champion it.

Use cases

  1. Alice performs a SPARQL query and the results contain bnodes. She later wishes to perform another, follow-up query that refers to specific bnodes that appeared in the results of her first query. [This needs a bit more explanation and reference added.] Champion: ????.
  2. Bob has a body of RDF data which he plans to publish as 'linked data'. After reading Bizer, Cyganiak & Heath, he realizes that his data cannot truly be said to be 'linked', because it contains bnodes. He wishes to get rid of the bnodes without changing the essential content of his data. Champion: Pat Hayes
  3. Charles periodically publishes an updated version of his RDF dataset. Typically each version contains additional triples, but sometimes a few triples are deleted. The original data contains bnodes, but Charles publishes his RDF in skolemized form. To avoid spurious differences from version to version, Charles wishes to ensure that the same skolemized URIs are used from version to version as much as possible. Champion: David Booth
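One way Charles might keep skolem URIs stable from version to version, sketched in Python below, is to derive each URI deterministically rather than at random. This assumes, as a simplification, that his own pipeline gives each bnode the same internal label in every version; `stable_skolem_uri`, `SKOLEM_PREFIX` and the dataset URI are hypothetical. Python's `uuid.uuid5` hashes a namespace and a name with SHA-1, so the same dataset URI and label always yield the same URI, while different datasets get different ones.

```python
import uuid

# Hypothetical prefix for illustration only.
SKOLEM_PREFIX = "http://example.org/.well-known/skolem/"


def stable_skolem_uri(dataset_uri, bnode):
    """Deterministically derive a skolem URI from a dataset URI and a bnode label."""
    name = dataset_uri + "#" + bnode[2:]  # drop the "_:" prefix
    return "<" + SKOLEM_PREFIX + uuid.uuid5(uuid.NAMESPACE_URL, name).hex + ">"


def skolemize_version(graph, dataset_uri):
    """Skolemize one published version; bnodes are N-Triples-style '_:label' strings."""
    def replace(term):
        return stable_skolem_uri(dataset_uri, term) if term.startswith("_:") else term

    return {(replace(s), replace(p), replace(o)) for (s, p, o) in graph}


DATASET = "http://charles.example/dataset"  # hypothetical dataset URI
TITLE = "<http://purl.org/dc/terms/title>"

v1 = {("_:item1", TITLE, '"First"'), ("_:item2", TITLE, '"Second"')}
# Version 2 keeps item1, deletes item2 and adds item3.
v2 = {("_:item1", TITLE, '"First"'), ("_:item3", TITLE, '"Third"')}

sk1 = skolemize_version(v1, DATASET)
sk2 = skolemize_version(v2, DATASET)
# The triple about _:item1 is identical in both skolemized versions.
unchanged = sk1 & sk2
```

Deriving URIs from stable internal labels sidesteps the graph-isomorphism question, at the cost of relying on labels that the RDF specifications themselves do not guarantee.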

Requirements

  1. Round tripping. It must be possible to skolemize the bnodes in an RDF graph and then de-skolemize them back to bnodes, resulting in an equivalent RDF graph, ideally with the same bnodes as the original graph, i.e., the same blank node identifiers. Champion: David Booth
    David talks about restoring the 'same bnode'. This does not make sense. Bnodes do not have a stable identity. Think of a graph as a drawing on a piece of paper. A bnode is just a mark made on the paper. Skolemizing it erases the mark and writes in a name instead, in the very same place. If you then erase the name and make a mark again, it does not make sense to require that this mark be the very same mark as the first one. The identity of the actual marks is irrelevant, anyway: all that matters is that there is a mark there, at that position in the graph. -Pat Hayes
    Okay, I have now added clarification to indicate that this is about restoring the same syntactic blank node identifiers. While it is true that the identity of the marks is not relevant to the *semantics* of the graph, this is about retaining *syntactic* stability (while retaining correct semantics, of course). And yes, it is true the RDF specifications do not give bnodes a stable identity. That's why this requirement is included here. -- David Booth 00:57, 19 June 2011 (UTC)
  2. Syntactically distinguishable. A URI that was skolemized per the bnode skol spec must be distinguishable from non-skolemized URIs by a simple syntactic pattern. Rationale: This is to support round tripping and anything else that relies on recognizing skolemized URIs. Champion: David Booth
  3. Uniqueness. The process for generating bnode URIs must never generate the same URI for two different bnodes, whether they occur in the same graph or in different graphs. This constraint must apply globally over the entire Web for all time. Champion: Pat Hayes
  4. Stability. Given a graph g2 that is known to be a modification of another graph g1 (i.e., some triples added, some deleted), the user should have the option of skolemizing g2 such that the differences from the skolemized g1 are minimized. In the simplest case where g2=g1, the skolemized g2 should be the same as the skolemized g1. (This feature should probably be optional, since it gets into the issue of graph isomorphism.) Champion: David Booth
  5. Simplicity (of RDF Semantics and implementations that deal with RDF data). The ability to work with a Set of Triples as a Set, such that one can do simple set operations (union, intersection, difference etc.) and such that the RDF Semantics can be simplified accordingly. Champion: Nathan Rixham [Comment: the RDF semantics already assumes that graphs are sets of triples. - Pat Hayes]
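Requirements 1 and 2 can be illustrated together with a small Python sketch: skolem URIs are recognized by a regular expression over a hypothetical prefix, and `deskolemize` turns them back into bnodes. If the caller kept the bnode-to-URI mapping produced during skolemization, the original labels come back unchanged (the syntactic round tripping David describes); without it, fresh labels are minted and the result is only an equivalent graph. The prefix, the 32-hex-digit URI shape and both function names are assumptions, not part of any spec.

```python
import re
import uuid

# Hypothetical prefix for illustration only.
SKOLEM_PREFIX = "http://example.org/.well-known/skolem/"
# Requirement 2: skolem URIs match a simple syntactic pattern.
SKOLEM_PATTERN = re.compile("<" + re.escape(SKOLEM_PREFIX) + "[0-9a-f]{32}>")


def is_skolem(term):
    return SKOLEM_PATTERN.fullmatch(term) is not None


def skolemize(graph):
    """Replace '_:label' bnodes with fresh skolem URIs; also return the mapping."""
    mapping = {}

    def replace(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = "<" + SKOLEM_PREFIX + uuid.uuid4().hex + ">"
            return mapping[term]
        return term

    return {tuple(replace(t) for t in triple) for triple in graph}, mapping


def deskolemize(graph, mapping=None):
    """Turn skolem URIs back into bnodes.

    With the mapping from skolemize(), the original labels are restored.
    Without it, fresh labels _:g0, _:g1, ... are minted, which still gives an
    equivalent graph (assumes the input contains no other bnodes).
    """
    inverse = {uri: label for label, uri in (mapping or {}).items()}
    fresh = {}

    def replace(term):
        if not is_skolem(term):
            return term
        if term in inverse:
            return inverse[term]
        if term not in fresh:
            fresh[term] = "_:g" + str(len(fresh))
        return fresh[term]

    return {tuple(replace(t) for t in triple) for triple in graph}


g = {
    ("_:b0", "<http://xmlns.com/foaf/0.1/knows>", "_:b1"),
    ("_:b1", "<http://xmlns.com/foaf/0.1/name>", '"Bob"'),
}
sk, m = skolemize(g)
restored = deskolemize(sk, m)  # same triples, same bnode labels as g
```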

Ideas

List ideas/proposals below, perhaps as links to archived email messages.

  • (add ideas/proposals here)
  • Idea. Skolemization is restricted to a special category of 'graphs', to wit, named stable g-boxes. The skolem URI should incorporate the graph name, so that later processes can discover that the entity being named by the skolem URI is a thing described by that graph. This is analogous to the English conversational construct of using a phrase like "That person you were talking about last Tuesday" to refer to an earlier un-named entity. -Pat Hayes
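Pat's idea could be sketched roughly as follows in Python: the graph (g-box) name is percent-encoded into one path segment of each skolem URI, so a later process can recover which graph described the entity. The URI layout, `SKOLEM_PREFIX`, the g-box name and the function names are all hypothetical.

```python
import uuid
from typing import Optional
from urllib.parse import quote, unquote

# Hypothetical prefix for illustration only.
SKOLEM_PREFIX = "http://example.org/.well-known/skolem/"


def mint(graph_name):
    """Mint a fresh skolem URI that embeds the name of the g-box it came from."""
    # quote(..., safe="") also encodes '/' and ':', so the name fits in one path segment.
    return "<" + SKOLEM_PREFIX + quote(graph_name, safe="") + "/" + uuid.uuid4().hex + ">"


def graph_of(term) -> Optional[str]:
    """Return the g-box name embedded in a skolem URI, or None for any other term."""
    body = term.strip("<>")
    if not body.startswith(SKOLEM_PREFIX):
        return None
    encoded, _, _ = body[len(SKOLEM_PREFIX):].partition("/")
    return unquote(encoded)


def skolemize_named(graph, graph_name):
    """Skolemize a named g-box; bnodes are N-Triples-style '_:label' strings."""
    mapping = {}

    def replace(term):
        if not term.startswith("_:"):
            return term
        if term not in mapping:
            mapping[term] = mint(graph_name)
        return mapping[term]

    return {(replace(s), replace(p), replace(o)) for (s, p, o) in graph}


GBOX = "http://pat.example/gbox/last-tuesday"  # hypothetical g-box name
sk = skolemize_named({("_:p", "<http://xmlns.com/foaf/0.1/name>", '"Someone"')}, GBOX)
```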

Background Materials to Consider