From W3C Wiki

(Trying to restate DiffAndPatch better.... Probably not done yet.)

In designing distributed/concurrent systems, it can be useful to model each state changes(each delta) as an object and the state of the system as the sum of the changes (deltas) since its creation.

This corresponds roughly to:

  • In physics, viewing motion as a series of tiny changes in position
  (delta x) and position as the sum of all such changes (relative to
  a starting position).
  • The state of a database being reconstructable by re-running the
  transaction log.
  • A CVS file's state being recontructed by starting with the original
  version and applying all appropriate diffs/patches.

This view has some advantages:

  • Deltas can sometimes be reordered, composed, and decomposed. CVS
  automerging shows this off nicely, when most of the changes can be merged,
  but one or two "chunks" need to be handled manually.
  • State can be synchronized in multiple systems by transmitting just
  the deltas.
  • Monotonic reasoning can be applied to normal (non-monotonic) datasets
  by reasoning about the set of deltas.

For RDF this is all made fairly simple by the fact that the only changes possible are adding and deleting triples, but it is also made nearly impossible (in two different ways!) by the fact that some of the terms in triples (the bNodes) are by definition impossible to name.

The first problem with bNodes is that they turn the process of calculating a delta from n-log-n (sort both graphs, then compare) into being exponential (NP-Complete, kinda; it actually has its own class called GI after graph isomorphism) with respect to the number of bNodes. (the "graph isomorphism problem")

The second problem with bNodes is that in certain graphs there is no way to specify certain deltas.

Graph-1:  [ a Old, Red, Large ].  [ a Red, Large ].
Graph-2:  [ a Old, Red, Large ].  [ a Red, Small ]. 

There is no delta from Graph-1 to Graph-2. (This is obvious if you take out "Old", but I'm pretty sure the RDF model theory makes it true even with "Old." Intuitively we want to identify the second individual as the one which is not said to be old, but ... there's probably no way to say that.)

Actually, that's wrong! (I'm correcting myself, months later.)

Note that Graph-1 is semantically equivalent to [ a Old, Red, Large ]. so the delta to reach Graph-2 is the addition of the second part there (two triples). The key here is to think in terms of the actual semantic content of the graph, not the syntax. If you just want a delta of the syntax, that's a different problem.

This probably completely removes the second problem. Any time you knew the bNodes were distinct, you could use that distinctness to specify which one to change. Or if you know there are two, and you want one to change but don't care which, that's okay too. The delta is like for some x such that p1(x,a) & p2(x,b), retract p1(x,a), assert p1(x,c). That only entitles making the change once. If p1/p2 is exactly what's know about several things, this change will be made to one of them. (You might also want a for all but that's a different problem.


See also BeesAndAnts

For Example...: One could use the RSS feed with diffs of the RecentChanges page of this wiki to replicate the Wiki.

Harmony Project home page (unison folks moving to tree-structured data)