From RDF Working Group Wiki

This page defines the notion of blank node scope and related concepts.


In RDF Semantics, a blank node is interpreted as an existential variable: it indicates the existence of a resource rather than denoting a particular resource, as IRIs and literals do. The same blank node can appear in infinitely many RDF graphs and, depending on which graph is considered, may indicate the existence of different things. For instance, given a blank node b, the graph comprising the following triples:

(b, <p>, <x>)
(b, <p>, <y>)

indicates the existence of a single resource related to both <x> and <y>. Taken separately, however, the two triples, or rather the two singleton graphs, indicate the existence of a thing related to <x> and the existence of a thing related to <y>; these may be two distinct resources, even though both graphs use the same blank node. The existential quantification is thus scoped to the graph being interpreted. When processing a set of triples, an application must therefore determine which subsets are to be considered as forming one graph, so that the scope of the existential quantification is known.
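To make the scoping concrete, the following Python sketch checks whether a graph holds in a small "world", modelled simply as a set of ground triples. This is a deliberate simplification of an RDF interpretation, and `BNode`, `holds_in`, `world` and the resources `r1`, `r2` are hypothetical names chosen for illustration. A graph holds if some single assignment of resources to its blank nodes turns every triple into a fact; the assignment is chosen per graph, which is exactly where the scope of the quantification lies.

```python
from itertools import product


class BNode:
    """A blank node. Identity is object identity, so two BNode objects are
    distinct even if they share a label (a simplification for this sketch)."""

    def __init__(self, label: str) -> None:
        self.label = label

    def __repr__(self) -> str:
        return f"_:{self.label}"


def holds_in(graph, world):
    """Return True if some assignment of resources to the blank nodes of
    `graph` turns every triple into a fact of `world`: the existential
    reading of blank nodes, scoped to `graph` alone."""
    bnodes = list({term for triple in graph for term in triple if isinstance(term, BNode)})
    resources = {term for fact in world for term in fact}
    for values in product(resources, repeat=len(bnodes)):
        assignment = dict(zip(bnodes, values))
        # Blank nodes are replaced by their assigned resource; IRIs stay as-is.
        if all(tuple(assignment.get(t, t) for t in triple) in world for triple in graph):
            return True
    return False


# Hypothetical world: r1 is related to <x>, r2 to <y>, nothing to both.
world = {("r1", "<p>", "<x>"), ("r2", "<p>", "<y>")}
b = BNode("b")
t1 = (b, "<p>", "<x>")
t2 = (b, "<p>", "<y>")

print(holds_in({t1, t2}, world))                      # one graph: False
print(holds_in({t1}, world), holds_in({t2}, world))   # two singleton graphs: True True
```

The same blank node b appears in all three graphs, yet the two-triple graph is false in this world while each singleton graph is true, because each singleton graph may pick a different resource for b.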

Considering the example above, in some cases the two triples are found in a single file, say in RDF/XML or Turtle, and the blank node is then normally assumed to indicate the existence of one thing. If the two triples come from different sources, the things they indicate are usually assumed to be distinct. However, in other cases the scope is not well determined, and the application has to choose how to group the triples it processes. For instance, a single TriG file may not correspond to a single blank node scope, because TriG can serialise several well-delimited graphs in the same file. A set of distinct N-Triples files may serialise a single RDF graph that is too big to be handled conveniently in one file. Or a data stream may split triples across several packets.
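The following Python sketch illustrates the N-Triples case under stated assumptions: it parses only a tiny subset of N-Triples (one triple per line, IRIs and blank nodes, no literals), and the function names `parse_lines` and `blank_nodes` as well as the scope labels are hypothetical. Each blank node is represented as a pair (scope, label), where the scope is supplied by the application, so whether two occurrences of `_:b` denote the same blank node depends entirely on that choice.

```python
from __future__ import annotations


def parse_lines(lines: list[str], scope: str) -> set[tuple]:
    """Parse a tiny subset of N-Triples: one triple per line, IRIs and blank
    nodes only (no literals). Each blank node label is paired with `scope`,
    so `_:b` read in two different scopes yields two different blank nodes."""
    triples = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {line!r}")
        terms = line[:-1].split()
        if len(terms) != 3:
            raise ValueError(f"expected three terms: {line!r}")
        # A blank node becomes the pair (scope, label); IRIs are kept as strings.
        triples.add(tuple((scope, t[2:]) if t.startswith("_:") else t for t in terms))
    return triples


def blank_nodes(graph: set[tuple]) -> set[tuple]:
    """Collect the (scope, label) pairs that occur in `graph`."""
    return {t for triple in graph for t in triple if isinstance(t, tuple)}


# Two hypothetical N-Triples files that both use the label _:b.
file_a = ["_:b <p> <x> ."]
file_b = ["_:b <p> <y> ."]

# The application decides the files are separate scopes: two blank nodes.
separate = parse_lines(file_a, "file_a") | parse_lines(file_b, "file_b")
# The application decides the files split one graph: a single blank node.
together = parse_lines(file_a + file_b, "whole_graph")

print(len(blank_nodes(separate)), len(blank_nodes(together)))  # 2 1
```

Nothing in the two files themselves distinguishes the outcomes; only the scope passed in by the application does.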

Scope and concrete graphs

For this reason, we introduce the notion of scope explicitly. A scope groups together a set of triples to indicate that any blank node appearing in those triples is assumed to express the existence of the same thing throughout those triples (in some sense, it defines a region in which each blank node has one meaning, to borrow PatH's words). Triples can be given a scope to form a "concrete graph", formally defined as a pair (s, g), where s is a scope and g is an RDF graph. A concrete graph can be seen as a "copy" of an RDF graph, which can exist in a file, in memory, on paper, etc. Consequently, the same RDF graph can be attached to multiple scopes, yielding distinct concrete graphs.
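The pair (s, g) can be sketched directly as a Python value type. This is only an illustration: the class name `ConcreteGraph`, the use of plain strings as scope identifiers, and the modelling of blank nodes as `_:`-prefixed strings are all assumptions of the sketch, since how scopes are represented is left to the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteGraph:
    """A concrete graph: a pair (s, g) of a scope s and an RDF graph g.
    Scopes are plain identifiers here; how they are chosen is left to the
    application."""
    scope: str
    graph: frozenset  # a frozenset of (subject, predicate, object) triples


# One RDF graph; blank nodes are modelled as "_:"-prefixed strings.
g = frozenset({("_:b", "<p>", "<x>"), ("_:b", "<p>", "<y>")})

# The same graph attached to two hypothetical scopes: a file and an in-memory copy.
in_file = ConcreteGraph("file:///data.ttl", g)
in_memory = ConcreteGraph("memory:session-1", g)

print(in_file.graph == in_memory.graph)  # True: same RDF graph
print(in_file == in_memory)              # False: different concrete graphs
```

Because equality compares both components, two copies of one RDF graph are distinct concrete graphs as soon as their scopes differ.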

This specification does not say how the scope is provided, and it is up to the application to determine how it groups triples. However, several conventions exist. An RDF/XML file clearly indicates a beginning and an end between which triples are defined; those triples are normally assumed to form one scope. The same is usually assumed for Turtle, although Turtle files can be concatenated to form new Turtle files, so the graph under consideration could in fact span several files. In N-Triples, there is no indication of where the triples start and end. A stream of RDF data could transmit triples in multiple packets, each comprising a set of RDF triples. In these cases, the application has to rely on external knowledge to decide what the scope of the blank nodes is. In serialisation formats for RDF datasets, it can be assumed that the scope is given by the named graphs. However, this depends in part on the semantics of datasets, which is not normatively defined.
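The conventions above can be read as grouping policies that an application applies to the chunks of triples it receives. The Python sketch below is one hypothetical way to express them; the `Chunk` type, the `assign_scopes` function and the policy names `per_source`, `per_named_graph` and `single` are inventions for illustration and do not correspond to any standard API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A batch of triples as received by the application (hypothetical model)."""
    source: str             # file name, URL or stream identifier
    graph_name: str | None  # named graph, for dataset formats such as TriG
    triples: frozenset


def assign_scopes(chunks: list[Chunk], policy: str) -> dict:
    """Group chunks into scopes according to one of three conventions.
    Returns a mapping from a scope key to the set of triples in that scope."""
    scopes: dict = {}
    for chunk in chunks:
        if policy == "per_source":         # e.g. one RDF/XML or Turtle file per scope
            key = chunk.source
        elif policy == "per_named_graph":  # e.g. each named graph of a TriG file
            key = (chunk.source, chunk.graph_name)
        elif policy == "single":           # e.g. N-Triples files known to form one graph
            key = "single"
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        scopes.setdefault(key, set()).update(chunk.triples)
    return scopes


chunks = [
    Chunk("part1.nt", None, frozenset({("_:b", "<p>", "<x>")})),
    Chunk("part2.nt", None, frozenset({("_:b", "<p>", "<y>")})),
]
print(len(assign_scopes(chunks, "per_source")))  # 2
print(len(assign_scopes(chunks, "single")))      # 1
```

The policy is external knowledge in the sense described above: the same chunks yield one or two scopes depending only on which convention the application adopts.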

What to do with scope?

An application that determines that two sets of triples are in the same scope should treat them as a single graph, namely the union of the two sets, in which a blank node shared by both sets keeps one meaning. An application that determines that two sets of triples are in different scopes should either merge the two graphs into one graph with its own distinct scope (renaming blank nodes apart so that the two graphs share none), or keep the two graphs separate, for example by assigning them to different named graphs in a dataset.
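The three options can be sketched in Python as follows, assuming blank nodes are modelled as `_:`-prefixed strings; the function names `union` and `merge`, the generated labels `_:m0`, `_:m1`, ..., and the graph names `<g1>` and `<g2>` are hypothetical. `union` keeps shared blank nodes as they are, whereas `merge` first gives each graph's blank nodes fresh labels that occur nowhere else, so that no blank node is shared.

```python
import itertools


def is_bnode(term: str) -> bool:
    # Blank nodes are modelled as "_:"-prefixed strings in this sketch.
    return term.startswith("_:")


def union(g1: set, g2: set) -> set:
    """Same scope: a blank node shared by g1 and g2 keeps one meaning."""
    return g1 | g2


def merge(g1: set, g2: set) -> set:
    """Different scopes: rename the blank nodes of each graph apart, so that
    no blank node is shared, then take the union."""
    used = {t for g in (g1, g2) for triple in g for t in triple if is_bnode(t)}
    counter = itertools.count()

    def fresh_label() -> str:
        # Generate a label that occurs in neither input nor earlier output.
        while True:
            label = f"_:m{next(counter)}"
            if label not in used:
                used.add(label)
                return label

    def rename_apart(g: set) -> set:
        mapping = {}  # one mapping per graph: same label within g -> same new label

        def rename(t: str) -> str:
            if not is_bnode(t):
                return t
            if t not in mapping:
                mapping[t] = fresh_label()
            return mapping[t]

        return {tuple(rename(t) for t in triple) for triple in g}

    return rename_apart(g1) | rename_apart(g2)


def blank_nodes(g: set) -> set:
    return {t for triple in g for t in triple if is_bnode(t)}


g1 = {("_:b", "<p>", "<x>")}
g2 = {("_:b", "<p>", "<y>")}

print(len(blank_nodes(union(g1, g2))))  # 1: same scope, one thing
print(len(blank_nodes(merge(g1, g2))))  # 2: different scopes, renamed apart

# Keeping the graphs separate instead: a dataset keyed by hypothetical graph names.
dataset = {"<g1>": g1, "<g2>": g2}
```

Checking fresh labels against every label already in use avoids accidental capture when an input graph happens to contain a label such as `_:m0`.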