DiffAndPatch

From W3C Wiki

Content MOSTLY moved to DeltaView.

It could perhaps use a few bits from here, still.


There is a duality in any dynamic system: there is the current version, and there is the historical series of changes leading up to the current version.

Analogies

In Unix systems administration, this is "diff" and "patch".

In Navigation, this is "Dead Reconning".

In physics this is Delta and Sigma, or the derivative with respect to time and integral over time.

In databases, the collection of diffs is the Transaction Log.

Issue

Sometimes the starting point is zero/empty, and you can omit it. Sometimes you need to be clear about it.

Idea

Merging simultanous diffs which do not conflict (a fuzzy notion?) like CVS can be a great win. Optimisitic concurrency control.

RDF

Computational Challenges

Finding the diff between two RDF graphs is computationally hard.

sandro implemented it in blindfold via XSB (LinkMe), and it seems fast but it's technically still NP in the number of bNodes. (I think...) With no bnodes, it's linear after sorting.

Jeremy Carroll wrote a paper on RDF diff for ISCS 2002. (LinkMe). Is the code in Jena?

Which-bNode Challenge

In fact, given the presence of bNodes, it is sometimes impossible.

Example:

   diff { _:x color red ; size large } to { _:y color red }

This is like trying to update a record in a relational database when the record has no key. Options are:

  • fail to create a diff
  • have the diff provide as much context as possible, then
  say it applies to all situations which match the context.
  • or have it apply to one such situation.

Best guess:

   forall ?n
   where { ?n color red ; ?n size large }
   delete { ?n size large }


and issue a warning there might be more that one such triple.... But..... there might ALWAYS be more than one such triple. Even if cardinality constraints say they mean the same thing. Hrm.

Another Example:

   diff { [ a Red, Large ].  [ a Dirty, Large ]. }
   to   { [ a Red, Large ].  [ a Dirty, Small ]. } 


In English: There is something which is red and large, and there is also something which is dirty and large (but becomes dirty and small).

Suggestion:

   forall ?thing
   where { ?thing a Dirty, Large }
   delete { ?thing a Large }
   insert { ?thing a Small }   


On the given input, knowing nothing else, this would do the right thing.

On other inputs, like

    { [ a Dirty, Red, Large ].  [ a Dirty, Large ]. } 

it would also change the facts about the first thing. So issue it with a warning, I guess? Tell people to name things (with at least sufficient description) if they are going to want to change what is said about them!

In general: avoid bNodes in RDF  :-) IdentifyEverything.