Difference between revisions of "User:Rcygania2/B-Scopes"

From RDF Working Group Wiki
Jump to: navigation, search
(Someone being pedantic)
m (Definitions)
Line 40: Line 40:
 
'''''Note''': When declaratively describing the structure of an RDF graph, it is often convenient to use the concept of a “'''fresh blank node'''”. This is an arbitrary blank node that has not yet been used within a given b-scope. This implies that an implementation will either keep track of all the identifiers that are already in use within the scope, or alternatively will have some sort of sequence generator that can dispense a new identifier that is guaranteed to have not been dispensed before.''
 
'''''Note''': When declaratively describing the structure of an RDF graph, it is often convenient to use the concept of a “'''fresh blank node'''”. This is an arbitrary blank node that has not yet been used within a given b-scope. This implies that an implementation will either keep track of all the identifiers that are already in use within the scope, or alternatively will have some sort of sequence generator that can dispense a new identifier that is guaranteed to have not been dispensed before.''
  
An RDF graph can be '''copied into a b-scope''' by systematically replacing all the graph's blank nodes with fresh ones in the target scope. The original and the copy are thus guaranteed not to share any blank nodes.
+
An RDF graph can be '''copied into a b-scope''' by systematically replacing all the graph's blank nodes with fresh ones in the target scope. The original and the copy are thus guaranteed not to share any blank nodes. If the source and target scopes are different, and the blank node identifiers do not occur elsewhere in either scope, this can be achieved by simply using the same blank node identifiers in both graphs.
  
 
The '''merge''' of two RDF graphs is the result of copying both graphs into a target b-scope.
 
The '''merge''' of two RDF graphs is the result of copying both graphs into a target b-scope.

Revision as of 16:50, 14 November 2012

B-Scopes

This is a proposal to modify the design of blank nodes in RDF. In the tradition of Sandro's g-* terminology, I will use a placeholder term for the key concept: b-scope.

Requirements

  • Consistency with all resolutions the WG has made so far
  • No changes to other specs beyond Concepts and Semantics necessary
  • No changes to implementations necessary

Motivation

  • The distinction between blank nodes and blank node identifiers is vague and confusing.
  • Blank node identifiers have clear scope, but the specs don't always make that as clear as they should.
  • In practice, blank nodes also have clear scope (they cannot “move around” arbitrarily), but the specs don't acknowledge that, leading to confusion.
  • As we standardize models and languages for working with multiple graphs, this disconnect becomes a larger problem.
  • An example of this disconnect: There is a widely held misconception in the RDF community that graphs cannot share blank nodes. This stems from the fact in practice they rarely do (except in SPARQL stores with a union default graph) and almost never need to, and the specs neither rule it out nor explicitly allow it.
  • The RDF core specs are rather abstract and can sort of get away without talking about the scope of blank nodes, but more concrete specs built on this foundation need to address the issue, and do so in awkward and sometimes incompatible ways; cf. treatment of blank nodes in SPARQL query results, and the different assumptions regarding graphs sharing blank nodes in SPARQL Update and R2RML.
  • The notion of a “fresh blank node”, often used when describing algorithms and mappings that generate RDF graphs, is hard to explain in terms of a single universal arbitrary set of blank nodes.

Origins of the design

  • Pat's “RDF surfaces”
  • Richard's “blank node sequences”
  • Ted's mantra that snapshotting yields new blank nodes
  • ISSUE-107 discussions

The design can be seen as an attempt to take some ideas from Pat's “RDF surfaces” proposal (the notion that blank nodes “live” on a particular surface and therefore have scope, and the notion that graphs can be “copied” from one surface onto another), and fitting them into the WG's existing framework of RDF datasets and g-boxes, while ignoring the other ideas of the proposal (different kinds of surfaces, bundling surfaces into codices, etc.).

Definitions

A b-scope is a scope for blank node identifiers.

A blank node is a pair consisting of a blank node identifier and a b-scope. The blank node identifier uniquely identifies the blank node within the b-scope. If the same blank node identifier is used in two different b-scopes, then we have two different blank nodes. Two blank nodes are equal if their blank node identifiers are equal and they are in the same b-scope.

Note: B-scopes do not need to be explicitly modelled or managed in most implementations. They are theoretical constructs that allow us to talk more formally about “system boundaries” and what happens when data containing blank nodes crosses such a system boundary.

Note: Only blank nodes are bound to a b-scope in this proposal. RDF triples, RDF graphs, and RDF datasets are not. This means RDF graphs and RDF datasets can contain blank nodes from multiple b-scopes, and multiple graphs or datasets can share the same blank node.

Note: When declaratively describing the structure of an RDF graph, it is often convenient to use the concept of a “fresh blank node”. This is an arbitrary blank node that has not yet been used within a given b-scope. This implies that an implementation will either keep track of all the identifiers that are already in use within the scope, or alternatively will have some sort of sequence generator that can dispense a new identifier that is guaranteed to have not been dispensed before.

An RDF graph can be copied into a b-scope by systematically replacing all the graph's blank nodes with fresh ones in the target scope. The original and the copy are thus guaranteed not to share any blank nodes. If the source and target scopes are different, and the blank node identifiers do not occur elsewhere in either scope, this can be achieved by simply using the same blank node identifiers in both graphs.

The merge of two RDF graphs is the result of copying both graphs into a target b-scope.

Note: The set union of two graphs maintains the meaning of the graphs only if the graphs don't share blank nodes. This motivates the “graph merge” operation.

Note: A graph and any of its copies are isomorphic, and are equivalent under any entailment regime.

B-scopes in practice

Blank nodes in RDF documents: A document in a concrete RDF syntax always forms its own separate and self-contained b-scope. For example, taking a snapshot of a g-box, and serializing it in Turtle, creates fresh blank nodes in a new b-scope that is unique to the Turtle g-text. Also, parsing an RDF document implies that there is some target b-scope, and yields fresh blank nodes in the target scope.

Blank nodes in SPARQL and in graph stores: Other specifications that use RDF may place stronger constraints on the management of b-scopes. For example, SPARQL Update is most easily explained by saying that the entire graph store forms a single self-contained b-scope, as blank nodes can be shared between g-boxes in the store, but (for the time being) not between graph stores.

Blank nodes in implementations: Where specifications don't constrain the use of b-scopes, implementations are free to define their own rules. For example, a large RDF processing system may maintain only a single b-scope, and any incoming data that contains blank nodes will first be “adopted” by copying its graphs into that b-scope. Or it may treat each graph/dataset data structure as a separate b-scope, meaning that re-allocation of blank node identifiers may be needed when two such data structures need to be combined into one.

If all blank nodes within a system or within a data structure are guaranteed to be in a single scope, then the scope doesn't need to be explicitly tracked for each blank node, and therefore the blank node identifiers can be treated as being the blank nodes. This is, in fact, what most if not all implementations do today.