== The Datapatch Vocabulary

This document defines an RDF vocabulary for expressing the exact differences between two RDF graphs. This set of differences is collected into a "patch", which might also be called a "diff" or "delta".

The datapatch vocabulary is designed to be used as part of an RDF dataset (a collection of named graphs and a default graph), so a patch must be serialized in a dataset syntax such as TriG, N-Quads, or JSON-LD. Stored patches likewise need something like a quad store (e.g. most SPARQL stores), not just a triple store.

== SimplePatch

The class datapatch:SimplePatch, with the properties datapatch:delete and datapatch:insert, can be used when the differences do not involve blank nodes. Simple patches are very easy to understand and implement, and their execution time scales linearly with the size of the patch.

= Example 1

Given graph g1a:

    @prefix eg:
    eg:s eg:p 1,2,3.

and graph g1b:

    @prefix eg:
    eg:s eg:p 1,3,4,5.

The simplest patch from g1a to g1b would be (in TriG):

    @prefix :
    @prefix eg:
    { [ a :SimplePatch;
        :delete <#d>;
        :insert <#i>;
      ] }
    <#d> { eg:s eg:p 2 }
    <#i> { eg:s eg:p 4,5 }

The relative IRIs <#d> and <#i> serve simply to connect this patch with the graphs containing the triples to be deleted and added. In the current RDF 1.1 drafts, this connection must be made by IRIs, not blank nodes. For the purposes of datapatch, blank nodes would make more sense, especially given confusion over the active base IRI during an HTTP PATCH operation.

== VarPatch

The class datapatch:VarPatch extends datapatch:SimplePatch so that all graph changes can be expressed, even ones including blank nodes. By using variables, it also allows some differences to be expressed more tersely. VarPatch patches are somewhat harder to understand and implement, and their worst-case execution time scales exponentially with the size of the patch.

= Example 2

Given graph g2a:

    @prefix eg:
    [ eg:who eg:Alice; eg:where eg:Boston; eg:when "2013-04-20" ].
    [ eg:who eg:Bob; eg:where eg:Paris; eg:when "2013-04-20" ].

and graph g2b:

    [ eg:who eg:Alice; eg:where eg:Boston; eg:when "2013-04-21" ].  # note the date change
    [ eg:who eg:Bob; eg:where eg:Paris; eg:when "2013-04-20" ].

The simplest patch from g2a to g2b would be (in TriG):

    @prefix :
    @prefix eg:
    { [ a :VarPatch;
        :where <#w>;
        :delete <#d>;
        :insert <#i>;
      ] }
    <#w> { _:x eg:who eg:Alice }.
    <#d> { _:x eg:when "2013-04-20" }.
    <#i> { _:x eg:when "2013-04-21" }.

In this case, the blank node referred to in the TriG file as "_:x" is treated as a variable. Execution proceeds by finding all the bindings of this variable to blank nodes in g2a such that (under those bindings) the where and delete graphs are subgraphs of g2a. Once all such bindings are found, for each one the "delete" subgraph is removed and the "insert" subgraph is added. (These semantics match those of SPARQL 1.1 Update, whose specification explains them in more detail and provides test cases.)

VarPatch also provides for named variables, which can bind to any RDF term, not just to blank nodes. These are specified using the datapatch:varPrefix property; IRIs in the where, delete, and insert graphs which start with this variable prefix string are treated as variables. If not specified, the variable prefix string defaults to "http://www.w3.org/ns/var#".

= Example 3

The following patch converts any graph to the empty graph:

    @prefix :
    @prefix var:
    @prefix eg:
    { [ a :VarPatch; :delete <#d> ] }
    <#d> { var:x var:y var:z }

== MultiPatch

@@@ MultiPatch is a sequence of patches. This isn't needed with SimplePatches, since they can always be combined into another SimplePatch, but VarPatches can't be. See this in example 2b, if we want to also change Bob's location.

== Patchable Resources

One application of the datapatch vocabulary is to allow an HTTP client to modify data on a server (to which it has write access) without sending the entire resource state representation (as would be done with HTTP PUT).
If the server can perform datapatch operations, it MAY use the datapatch classes in advertising this to clients, using some "accepts" predicate. This could be done inside the body of the resource, if it has an RDF representation, or in the metadata available via the HTTP Link header. For example:

    <> ldp:acceptsViaPOST datapatch:SimplePatch, datapatch:VarPatch;
       ldp:acceptsViaPATCH datapatch:SimplePatch, datapatch:VarPatch.

In this example, the server is saying it implements both SimplePatch and VarPatch, and that they can be sent using either HTTP POST or HTTP PATCH. This example is not meant to suggest that servers SHOULD allow patches to be sent via POST; it just illustrates how it could be done.

This mechanism is general-purpose enough to allow new kinds of patches to be easily invented and deployed without breaking existing systems.

== The "From" and "To" Context

For some applications it is useful to attach the "from" and "to" states of the patch to particular versions of particular resources. To support this, we define datapatch:from and datapatch:to to link to these versions, and datapatch:location, datapatch:lastModified, and datapatch:etag to identify them.

= Example

@@@ ... sending a PATCH to a resource and saying which version it's coming from, so patches can be merged or rejected (for situations where if-match and if-unmodified-since headers can't be used? or why?)

= Example

@@@ output of a tool that compares page1 and page2
@@@ output of a tool that compares page1 yesterday and today
@@@ posting: create page2 as a copy of page, plus this patch (??)

== Merging Patches

@@@ explain this much more carefully

When applying a patch, if the :from etag or lastModified does not match, a processor MAY still apply the patch (rather than just rejecting it) if the result would be the same as applying the patch immediately after its :from version.
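
As a rough illustration of the rule above for the SimplePatch case, the following Python sketch models graphs as sets of (subject, predicate, object) tuples. The function names apply_simple_patch and applies_cleanly are hypothetical, and the check shown is only one possible reading of "the result would be the same", namely that the patch still makes exactly the triple-level changes it was written to make. The vocabulary itself does not define this test.

```python
# Minimal sketch, not part of the datapatch vocabulary.
# Graphs are modelled as Python sets of (subject, predicate, object) tuples.

def apply_simple_patch(graph, delete, insert):
    """Return a new graph with the delete triples removed and the insert triples added."""
    return (graph - delete) | insert


def applies_cleanly(current, delete, insert):
    """Assumed reading of the merge rule: the patch still makes the same changes
    if every triple to delete is still present and no triple to insert is
    already there."""
    return delete <= current and insert.isdisjoint(current)


# Data taken from Example 1 (g1a patched into g1b).
S, P = "eg:s", "eg:p"
g1a = {(S, P, 1), (S, P, 2), (S, P, 3)}
delete = {(S, P, 2)}
insert = {(S, P, 4), (S, P, 5)}

g1b = apply_simple_patch(g1a, delete, insert)

# Another client added an unrelated triple after the :from version was fetched.
current = g1a | {(S, "eg:q", 9)}
mergeable = applies_cleanly(current, delete, insert)

# Another client already removed the triple this patch wants to delete.
stale = g1a - {(S, P, 2)}
conflicting = applies_cleanly(stale, delete, insert)
```

Under this reading, the first concurrent change leaves the patch mergeable and the second does not. A processor that keeps the :from version around could instead compare the intervening changes with the patch directly.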
For example @@@ two patches that can be merged

In contrast @@@ two patches that cannot be merged

This situation is similar to what we see in change control systems, where changes to different parts of a file can occur independently.

This rule is necessary to allow for resources which change frequently, perhaps faster than the round-trip time to a client. In that case, the client could never PATCH the resource (let alone PUT), because by the time it has obtained one version and sent its patch, that version is always out of date. Such fast-changing resources make perfect sense if they represent a pool where a lot of client data appears and many different clients are changing their state. In such cases, the patches are likely to be mergeable.

== Extensions to datapatch

As an example of how the datapatch vocabulary might be sensibly extended, imagine wanting to use Perl regular expressions to modify literal values in place. We could do this by saying that for any binding of the variable var:x, we also attempt a regex match, and if it succeeds, we replace the matched text with the supplied replacement text:

    eg:RegexPatch   # extends VarPatch
      ...
      :matchVar var:x;
      :match "[a-z_]*";
      :replace "";

Alternatively, a more general solution might be to allow executable patches. That could be done like this:

    js:RunPatch   # extends VarPatch adding
      onMatch: " ... javascript code ... "

The code is run in an environment with the terms "db" and "binding" defined. binding is an object with a key for each variable suffix or blank node label, whose value is the term it is bound to. db implements the following interface:

    match(s, p, o, callback(err, s, p, o))
    insert(s, p, o, cb(err, wasPresent))
    ... etc ...

A truthy exit value means to deliver more bindings; a falsy one means no more bindings for this patch run.
================

TODO:

Add: AtomicPatch for atomicity (which alas can't be done with std SPARQL)

Switch: not from and to but fromLocation, fromLastModified, etc, because it'll be too confusing if the from/to objects have IRIs. Put this explanation in a NOTE. Don't call it "Context" since that conflicts with the idea of a context diff.

Add: multiple syntaxes, json-ld, xml-ld, with pre-defined @context

Add: multi-syntax examples
    Syntax for this example: [ ] Turtle [ ] JSON-LD [ ] XML-LD

Fix: note that multipatch isn't really ever needed; just make sure there is exactly one binding.

Problem: if there are a ton of solution bindings, isn't that a pain?

Add: VarPatch example of setting a property, where we first delete any/all values for that property. Has to be done as a MultiPatch; Should be done as an AtomicPatch, I'd think. Request Atomic, and if it's ignored, it's ignored.

Add: Conformance section, with Patch Creator, Patch Processor, and Patch Document.

Add: test suite instructions (wiki test suite?)

Consider: do this in a CG? bring back to WG when/if ready