User:Azimmerm/Blank-node-scope-again

From RDF Working Group Wiki

The key to my proposal is that, at any given point in time, there are a number of RDF triples that have a concrete realisation in the world (in files, in digital memory, on sheets of paper, on surfaces). We start by asserting that:

  1. RDF graphs are sets of triples;
  2. there is a distinguished subset of all triples that comprises the "concrete triples";
  3. each concrete triple exists in a single scope (equivalently, there is a partition of the concrete triples in which each set of the partition delimits a scope);
  4. it is possible to distinguish the bnodes in different scopes because they are physically placed differently, so we can assume that the set of bnodes in a scope is disjoint from the set of bnodes in any other scope (this may not be strictly necessary, but it is probably simpler this way);
  5. some bnodes are recognisable by an identifier (which can be an in-memory structure or a handwritten symbol) that is used at multiple places: it is assumed that the same identifier always indicates the same bnode within a scope, and always indicates different bnodes when placed in different scopes. This means that two physically distant occurrences of a symbol may refer to the same bnode.
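To make assertions 4 and 5 concrete, here is a minimal Python sketch. It assumes a toy input format that is not part of the proposal: a scope is given as rows of three strings, and a term starting with `_:` is a bnode identifier. `BNode`, `load_scope` and `bnodes` are hypothetical names. Bnodes compare by object identity, which stands in for their distinct physical placement, and each call to `load_scope` plays the role of one scope.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node. eq=False: two BNode objects are equal only if they are the same object."""
    hint: str  # the identifier it was read from, kept only for display


def load_scope(rows):
    """Read one scope (toy format, an assumption): terms starting with '_:' are bnode identifiers.

    The identifier table is local to this call, so the same identifier denotes the
    same bnode within the scope and a different bnode in any other scope.
    """
    labels = {}

    def term(t):
        if t.startswith("_:"):
            if t not in labels:
                labels[t] = BNode(t)
            return labels[t]
        return t

    return frozenset((term(s), term(p), term(o)) for s, p, o in rows)


def bnodes(graph):
    """Return the set of bnodes occurring in a graph."""
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


rows = [("_:x", "ex:knows", "_:y"), ("_:x", "ex:name", '"Alice"')]
scope_a = load_scope(rows)
scope_b = load_scope(rows)  # same text, read as a separate scope

print(len(bnodes(scope_a)))               # 2: both triples share the bnode for _:x
print(bnodes(scope_a) & bnodes(scope_b))  # set(): no bnode is shared across scopes
```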

Formally, we have the set of RDF triples T, a subset CT of concrete triples and a partition P(CT) of CT such that, for each pair of sets G1, G2 in the partition, $G_1 \neq G_2 \Rightarrow \mathrm{bnodes}(G_1) \cap \mathrm{bnodes}(G_2) = \emptyset$. The RDF graphs that form the partition are called the scopes of the concrete triples, and any subset of a scope is called a scoped graph.
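The condition on P(CT) and the notion of scoped graph can be checked mechanically. This hypothetical Python sketch represents a scope as a frozenset of triples and a bnode as an object compared by identity; `is_valid_partition` and `is_scoped_graph` are illustrative names only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node compared by identity (eq=False)."""
    hint: str


def bnodes(graph):
    """Return the set of bnodes occurring in a graph."""
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


def is_valid_partition(scopes):
    """Check that distinct scopes share neither triples nor bnodes."""
    for g1, g2 in combinations(scopes, 2):
        if g1 & g2 or bnodes(g1) & bnodes(g2):
            return False
    return True


def is_scoped_graph(graph, scopes):
    """A scoped graph is any subset of a single scope."""
    return any(graph <= s for s in scopes)


x, y = BNode("x"), BNode("y")
g1 = frozenset({(x, "ex:p", "ex:a")})
g2 = frozenset({(y, "ex:p", "ex:a")})
g3 = frozenset({(x, "ex:q", "ex:b")})  # reuses the bnode x of g1

print(is_valid_partition([g1, g2]))        # True
print(is_valid_partition([g1, g3]))        # False: x occurs in two scopes
print(is_scoped_graph(g1 | g2, [g1, g2]))  # False: the union spans two scopes
```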

For each scope s, there is a labelling function l(s) that assigns identifiers to the bnodes appearing in s, such that l(s)(b1) = l(s)(b2) iff b1 = b2. Obviously, labelling functions of different scopes operate on disjoint sets of bnodes, so the same identifier on bnodes in different scopes does not indicate the same bnode.
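A labelling function l(s) can be sketched as a dictionary from the bnodes of one scope to identifiers. The identifier scheme (`b0`, `b1`, and so on) and the function names are assumptions made for illustration; the point is that each l(s) is injective on its own scope, while different scopes may reuse the same identifier for different bnodes.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node compared by identity (eq=False)."""


def labelling(scope):
    """Build a labelling l(s) for one scope: one identifier 'b<n>' per bnode (hypothetical scheme)."""
    labels = {}
    for triple in scope:
        for t in triple:
            if isinstance(t, BNode) and t not in labels:
                labels[t] = f"b{len(labels)}"
    return labels


def is_injective(labels):
    """l(s)(b1) = l(s)(b2) iff b1 = b2: no identifier is given to two bnodes."""
    return len(set(labels.values())) == len(labels)


s1 = frozenset({(BNode(), "ex:p", "ex:a")})
s2 = frozenset({(BNode(), "ex:p", "ex:a")})
l1, l2 = labelling(s1), labelling(s2)

print(list(l1.values()), list(l2.values()))  # ['b0'] ['b0']: the same identifier
print(set(l1).isdisjoint(l2))                # True: but on different bnodes
```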

An application, or a person, must be able to decide whether two occurrences of an identifier belong to the same scope, so that they can unify them if desired. This is conveniently handled by the conventions on bnode identifiers that each representation format (serialisation syntax, in-memory model, drawing convention, etc.) provides.

These conventions allow agents to recognise the identity of bnodes, so that they know, for instance, what the union of two graphs is. The problem is that the union of two graphs does not necessarily correspond to a scope. In fact, with the formalisation above, the union of scoped graphs taken from different scopes never corresponds to a scope. This is because the concrete triples and their partition describe a fixed state, whereas unions of arbitrary sets of graphs exist only at the abstract level. But we need a notion that actually builds a new graph and makes it "real" (i.e., makes the new graph a set of concrete triples).

A state comprises a set of concrete triples and a partition of that set into scopes. In a given state, the merge of two scoped graphs in different scopes does not exist (yet). But there always exists a merge of a set of scoped graphs that is made of non-concrete triples. So it suffices to say that a "concrete merge" (as opposed to a merge of an arbitrary abstract set of triples) of a set of scoped graphs {G1, ..., Gn} is a set of non-concrete triples that is equivalent to the union of the scoped graphs. The merges of all possible sets of scoped graphs are thus formally defined, but these merges do not exist concretely.
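One way to sketch a concrete merge is to take the union of the scoped graphs and replace every bnode with a fresh one. This reads "equivalent" as "isomorphic", which is only one possible reading. The sketch also treats triples as values, so a triple without bnodes is the same triple as its concrete counterpart; it therefore only guarantees that the merge reuses no concrete bnode. `concrete_merge` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node compared by identity (eq=False)."""


def bnodes(graph):
    """Return the set of bnodes occurring in a graph."""
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


def concrete_merge(graphs):
    """Sketch of a concrete merge of scoped graphs.

    Assumption: 'equivalent' is read as 'isomorphic'. The union is copied with every
    bnode replaced by a fresh one; triples without bnodes are unchanged (value semantics).
    """
    union = frozenset().union(*graphs)
    fresh = {b: BNode() for b in bnodes(union)}  # bijection from old to new bnodes
    return frozenset(tuple(fresh.get(t, t) for t in triple) for triple in union)


a, b = BNode(), BNode()
g1 = frozenset({(a, "ex:p", "ex:x")})  # scoped graph from one scope
g2 = frozenset({(b, "ex:p", "ex:y")})  # scoped graph from another scope
m = concrete_merge([g1, g2])

print(len(m))                                 # 2
print(bnodes(m).isdisjoint(bnodes(g1 | g2)))  # True: no concrete bnode is reused
```

Because bnodes of different scopes are already disjoint, the plain union needs no renaming to avoid clashes; the renaming here only serves to make the result non-concrete.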

This leads us to introduce operations that make transitions from states to states:

A copy of a scoped graph G1 in state S1 produces a state S2 such that all scopes of S1 are in S2, plus a single additional scope G2 that is equivalent to G1 and contains only triples outside the concrete triples of S1.
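The copy transition can be sketched with a state modelled as a tuple of scopes (frozensets of triples). The representation, the name `copy_scoped_graph`, and the use of fresh bnodes to obtain an equivalent (here: isomorphic) G2 are assumptions of this sketch; since triples are treated as values, only the bnodes of G2 are guaranteed to be new.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node compared by identity (eq=False)."""


def fresh_copy(graph):
    """Return an isomorphic copy of graph with every bnode replaced by a new one."""
    renaming = {}

    def rename(t):
        if isinstance(t, BNode):
            if t not in renaming:
                renaming[t] = BNode()
            return renaming[t]
        return t  # IRIs and literals are kept as they are

    return frozenset(tuple(rename(t) for t in triple) for triple in graph)


def copy_scoped_graph(state, g1):
    """Transition S1 -> S2: keep every scope of S1 and add one new scope G2 equivalent to G1.

    Assumptions: a state is a tuple of scopes, and 'equivalent' is realised as an
    isomorphic copy with fresh bnodes.
    """
    if not any(g1 <= s for s in state):
        raise ValueError("G1 must be a scoped graph of the state")
    return state + (fresh_copy(g1),)


b = BNode()
scope = frozenset({(b, "ex:p", "ex:o"), ("ex:s", "ex:q", b)})
s1 = (scope,)
s2 = copy_scoped_graph(s1, scope)

print(len(s2))         # 2
print(s2[0] is scope)  # True: the scopes of S1 are kept
```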

A "realisation" of a (concrete) merge of a set of scoped graphs in S1 produces a state S2 which contains the scopes of S1 plus a single scope that is a "concrete merge" of the set.
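The realisation transition can be sketched in Python, assuming a state is a tuple of scopes (frozensets of triples), bnodes compare by identity, and a concrete merge is the union of the graphs with every bnode renamed to a fresh one. `realise_merge` is a hypothetical name, and, as triples are treated as values, only the bnodes of the new scope are guaranteed to be non-concrete.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=True)
class BNode:
    """A blank node compared by identity (eq=False)."""


def realise_merge(state, graphs):
    """Transition S1 -> S2: keep the scopes of S1 and add one scope that is a concrete merge of graphs.

    Assumptions: a state is a tuple of scopes, and the concrete merge is the union of
    the graphs with every bnode renamed to a fresh one.
    """
    for g in graphs:
        if not any(g <= s for s in state):
            raise ValueError("every graph must be a scoped graph of the state")
    renaming = {}

    def rename(t):
        if isinstance(t, BNode):
            if t not in renaming:
                renaming[t] = BNode()
            return renaming[t]
        return t

    union = frozenset().union(*graphs)
    merged = frozenset(tuple(rename(t) for t in triple) for triple in union)
    return state + (merged,)


a, b = BNode(), BNode()
s1 = (frozenset({(a, "ex:p", "ex:x")}), frozenset({(b, "ex:p", "ex:y")}))
s2 = realise_merge(s1, list(s1))

print(len(s2), len(s2[-1]))  # 3 2
```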

We could also define delete, update, etc., but it may not be necessary.