This document defines a precise way to express changes in data. It is specified in terms of RDF, so that it can be applied to any kind of data for which there is an RDF view, including SQL databases and spreadsheets. It has been designed specifically for use in updating web resources (using the http PATCH method), but it is likely to have wider applicability.
# Introduction
This section is non-normative
Data is often provided on the web by making a data file available at some URL using some well-known format. In some cases, more sophisticated systems may allow authorized http clients to update that data in place. That update could be done using the http PUT operation, which allows clients to request the contents be entirely replaced, or with application-specific usage of http POST. Since RFC-5789 (PATCH Method for HTTP), another potential option is to do a partial modification with PATCH. HTTP PATCH requires a patch format be defined which is suitable for the resource being patched. For example RFC-6902 (JavaScript Object Notation (JSON) Patch) defines the "application/json-patch+json" media type for patching resources with "application/json" representations. With this specificaton, we take a broader approach, recognizing that in many cases the underlying data — not any particular representation or serialization of it — is the topic of interest. As a general abstraction of what "data" might be, we use the Resource Description Format (RDF), which is both widely supported in software and general enough, as an abstraction, to work as a view for SQL Relational data [R2RML], spreadsheet data [London Workshop], etc. We therefore view each resource to be patched as a graph-state resource; that is: a web resource whose "state" (in terms of REST) is an RDF Graph. A client doing a PATCH is requesting the server change the resource state from being one particular set of RDF triples (one RDF Graph) to being a different set of triples (a different RDF Graph). Any data change is thus a request to remove zero or more triples and add zero or more triples. If the underlying data store is not an RDF graph store, this change might actually manifest in some other way, such as with one cell in a SQL table being updated or a set of redis key values being modified. Those details will depend on the particular data storage being used; this specification remains independent of them by treating everything as RDF. ## Goals Our primary goal is to allow an http client to ask a server to modify the state of a graph-state resource, updating it from a specific state known to both client and server. If the patch succeeds, the resulting resource state will also be known to client and server, so that a sequence of patches can be made, each one starting from the ending-state of the previous one. It is a secondary goal to allow concurrency to be handled gracefully, without unrecoverable errors. That is, if client 1 is making a sequence of patch requests and client 2 also makes a patch request, the operations should be handled in such a way that clients can recover and no data is altered unpredictable or corrupted It is not a goal to allow clients to usefully modify resources without knowing their starting state. While it is conceivable that a patch mechanism could be designed so that clients could issue "blind" PATCH requests with good results, it is beyond the scope of this version of datapatch. ## Extensibility This document specifies one type of patch, called a BasicPatch, and recommends it be used in such a way that other types of patches could also be used whenever both client and server support them.

There is currently no specification for how a client can determine which patch classes a server supports. The HTTP mechanism only works on content-type, so it doesn't apply when all patches are expressed as RDF datasets. Options include using a profile argument on the content-type, or using a predicate like { ?resource :acceptsPATCH ?patchClass }

.

Some possible ideas for extensions:

* Allow wildcard deletes, so that in some cases triples can be deleted without enumerating them * Allows some code (JavaScript or JVM) to be included which programmatically adjusts the patch, so that arbitrary change can be made server-side. ## Blank Nodes and Performance The RDF model uses (subject, property, value) triples, where the subject and any non-literal values are optionally identified by an IRI. If an IRI is not supplied, the terms are called "blank nodes". In supplying data, decisions must be made about when to assign IRIs instead of leaving nodes blank. With patch, this decision has particular significance since patching a resource with blank nodes is a form of the graph isomorphism problem, which is generally accepted as being intractable. Fortunately, the problem can usually be managed in practice; details are covered in Performance Considerations. ## Syntax In datapatch, patches themselves are expressed in RDF. In order to conveniently talk about RDF inside RDF, datapatch uses RDF datasets, which go beyond a single set of triples (a single graph) to also include additional "named" graphs. A downside to this approach is that conventional RDF syntaxes like RDF/XML and Turtle cannot conventiently be used to express patches. Instead, newer dataset-aware syntaxes must be used, such a JSON-LD and TriG. SPARQL works with datasets as well, so SPARQL can be used to store and query over patches.

There is no good syntax to use in this document. The SPARQL graph pattern syntax and TriG are almost the same, but they differ in trivial ways, making them incompatible. Perhaps the next draft of TriG will fix this. For now, we'll assume it does, and use a subset of SPARQL as a declarative language, as if it were TriG.

## Relationship to SPARQL Update

TODO: Explain how users should navigate the overlap in functionality between datapatch and SPARQL 1.1 Update.

# Use Cases and Examples The following subsections explain some of situations in which datapatch might be useful and which have motivated its creations. They are each illustrated with examples based on the following scenario:
Imagine a group of scientists is replicating the classic Stanford marshmallow experiment, where a child is given a marshmallow and offered a second one if they can refrain from eating the first one 15 minutes. In this example, the scientists are looking for cultural variations, so they have recruited collaborators from around the world. Each of the collaborators is given access to the complete dataset and asked to add their own observations each time they run the experiment. The collaborators are granted machine-level access so they can create their own local-language/local-culture experimenter user interface, and they are given free reign to include their own additional data (beyond the common fields such as age of the child, and how long they refrained from eating the first marshmallow).
## Adding Data The simplest use case for datapatch is to add to some unified collection of data (a database, in the most general sense) using HTTP. Without PATCH, this would require either (1) downloading the contents and then uploading a modified version using PUT, or (2) defining an application-specific usage of POST. Using PATCH, TriG, and datapatch:BasicPatch, it could look like this:
Client sends:

PATCH /observations HTTP/1.1
Host: www.example.org
Content-Type: text/trig
Transfer-Encoding: Chunked

PREFIX :      <http://www.w3.org/ns/ldp#>
PREFIX xs:    <http://www.w3.org/2001/XMLSchema#>
PREFIX mme:   <http://www.example.org/2013/world-marshmallow-vocab#>
PREFIX univ:  <http://local-universite.example.org/psychlab-vocab#>

 [ a :BasicPatch;
   :where  <#w1>;
   :delete <#d1>;
   :insert <#i1> ]

 <#w1> { }
 <#d1> { }
 <#i1> {
    [ a mme:ExperimentRun;
      mme:observer univ:DrSmith, univ:DrJones;
      mme:countryCode "UK";
      mme:childAge 7;
      mme:secondsLasted 35
    ]
 }

Server responds with a 2xx, indicated success:

HTTP/1.1 204 No Content
Content-Location: /observations
ETag: "e0023aa4f"

## Removing Data ## Updating ## Keeping a Record of Changes ## Mirroring Data ## Sharing Application State # Definition of BasicPatch @@@ rename to DataPatch? A _patch_ is a relation involving two sets of RDF triples (that is, two RDF Graphs). Specifically, a patch __from__ RDF Graph G1 __to__ RDF Graph G2 embodies the differences between G1 and G2 such that a system storing G1 can algorithmically construct G2. A _BasicPatch_ is a kind patch of which embodies these differences by simply enumerating the triples to change. If one is construcing G2 from a copy of G1, the BasicPatch says which triples to delete and which to insert. If one is constructing G2 from G1 (without modification), the BasicPatch can be understood as saying which triples to leave out and which extra ones to include. There is one complication: blank nodes. Because they lack identifiers (IRIs), when they occur in triples, those triples cannot simply be enumerated for deletion or insertion. Instead, BasicPatch uses another blank node as a placeholder and provides additional constraints showing which blank node it is standing-in for. Servers are then required to figure out which blank node in G1 is meant by each placeholder in the patch. (See Performance Considerations for guidance on how to do this.) Formally speaking, a BasicPatch is expressed using the elements of a dataset matching this SPARQL pattern:

    ?patchId a :BasicPatch;
             :where ?w;
             :delete ?d;
             :insert ?i.

    GRAPH ?w {  ... the "W" graph ...  }
    GRAPH ?d {  ... the "D" graph ...  }
    GRAPH ?i {  ... the "I" graph ...  }
To define the meaning of this pattern, we first define these functions: * Given two RDF graphs X and Y, union(X, Y) is the RDF Graph which contains exactly those triples which are in X or Y * Given two RDF graphs X and Y, minus(X, Y) is the RDF Graph which contains exactly those triples which are in X but not in Y * Given a function M and a graph X, mapNodes(M, X) is the Graph which contains exactly the triples (M(S), P, M(O)) such that (S, P, O) is a triple in X. In the conditions below, "W" is the graph paired with ?w in the dataset, as suggested in the pattern above. Similarly for D/?d and I/?i. If the dataset pattern above is matched in a dataset, that dataset is true only if: * ?patchId is a "patch" from some graph G1 to some graph G2. * M1 is a mapping such that mapNodes(M1, union(W,D)) is a subgraph of G1 * M2 is a mapping to fresh blank nodes from each the blank nodes in I which is not in W or D * M is the union of M1 and M2 * G2 = union(mapNodes(M, I), minus(G1, mapNodes(M, D))) # HTTP PATCH using BasicPatch ## Describing vs Sending
(optional -- not really needed for LDP) Imagine the situation where we have a MultiPatch, which includes subsidiary patches, in a list. How would you know which was the "root", the one you were supposed to start with. When I POST or PATCH with an RDF graph, how do you look at it to find the root or command, in general? Or are you just supposed to query it for everything that looks runnable? But in the multipatch case, you'd get something wrong.
## Patching Datasets
just a sketch
  ?patch :onNamed ?NamedGraph
          lets you patch a dataset; you send this PATCH and including this
	  triple to say it applies to a specific named graph
 
  If you don't say this, its just a patch of the default graph.
## Literal Matching In RDF (using the Turtle notation), these literals are different literals with the same value: * "01"^^xs:int * "1"^^xs:int * "1"^^xs:decimal * "1.0"^^xs:decimal
Similarly, the language tagged strings "chat"@en-US and "chat"@en-us are different literals with the same value. -- or are they the same literal??? To-be-determined by RDF WG.
In practice, RDF systems may compare literals using either lexical comparison or value comparison. RDF systems also vary in that some retain the lexical details of the literal, while others retain only the value. In order to provide deterministic patch behavior: * When communicating with each other, clients and servers MUST always use the same lexical details (the lexical form and the datatype IRI) for each literal. * Servers MUST act as if they use lexical comparison, not value comparison, during matching. For example: 1. start woth empty-graph resource 2. client inserts { :a :b "1"^^xs:int } 3. client inserts { :a :b "1"^^xs:short } server has two triples 4. client inserts { :a :b "01"^^xs:short } server has three triples 5. client deletes { :a :b "1"^^xs:int } 6. client deletes { :a :b "1"^^xs:short } 7. client deletes { :a :b "001"^^xs:short } fails 8. client deletes { :a :b "01"^^xs:short } server has zero triples ### Using Native Literals
This section is non-normative
The above restrictions do not prohibit clients or servers from using efficient native representations, but it does requires that that they do so with a consistent one-to-one mapping, so that the lexical details of the literal can always be reconstructed. For example, JavaScript stores all numbers as in IEEE floating point double. A possible 1-to-1 mapping in that environment:
RDF Datatype JavaScript Type Constraints
xs:integer Number integer, in canonical form, in range -9007199254740992 .. 9007199254740992)
xs:double Number in canonical form, non-integer, or integer outside of in range -9007199254740992 .. 9007199254740992)
## Performance Considerations @@@ the section's a bit rough stuff. In the worst case, applying a patch takes exponential time, since calculating the blank node alignments is the graph isomorphism problem which has complexity NP. Certain structures of patches involving a complex graph using only blank nodes (see test case blank-tree-50) cannot be applied by any known computing technology. Fortunately, there are several steps the clients and servers can take to avoid this problem. * Use consistent blank node labels. If the client writes the patch using the same blank node labels as the server used in its original content, and the server tries the same-name isomorphism first, the patch can be applied in linear time. * Order the where and delete triples to fast matching. If the client puts the triples in the patch in the right order, and the server matches them in the order provided, the patch can be applied in linear time. Efficient algorithms for finding this 'right order' are not known at this time. * Avoid data models that have large all-blank sub-graphs. In the worst case, servers should set a reasonable resource limits on processing patches and be prepared to reject them. ### Stability of Blank Node Labels ### Ordering of Triples ### Skolemizing ## Concurrency @@@ this section is just rough notes....
SHOULD BE:   respect if-match, use if-match.

@@@ Basically, respect if-match.  But *I THINK* the server is allowed to cheat a little, and do a patch even if if-match is wrong, as long as the result would have been the same doing two patches in a different order.

Or not....   Actually as a client, what I want is to be getting a feed of those other people's changes.

If a server gets two or more patches with the same if-match etag it SHOULD merge them (instead of rejecting all but one) as long as the result would be the same AND it has some way to tell the client someone else also made some changes....???

Or I just don't if-match -- I could just patch blindly.

TEST CASE:
   two changes with the same if-match.

what happens???

In practice, it's probably okay to ADD blindly and even DEL blindly, but set/edit/alter  (DEL+ADD) .
# HTTP GET-ing BasicPatch
@@@ just raw notes.

Basically, the only way to gracefully handle multiple clients doing
patches is if they can also GET patches.

That's pretty easy to do, I think:
Servers MAY allow clients to get patches, so that clients can maintain an accurate copy of the resource representation when it is being changed by other clients or by server logic. One approach for this: the server provides this information either as part of the resource representation or via the Link header:

  ?resourceURL ldp:nextBasicPatch ?nextPatchURL
This tells clients a GET of ?nextPatchURL can be used to obtain a patch from the version which has that triple in it to some later version. The later version need not be the patch actually sent by a client; the server MAY merge all patches from the starting version until the time the GET is actually handled. If there have been no changes since this version, GET on ?nextPatchURL MUST return a 404. In contrast:

  ?resourceURL ldp:nextBasicPatchLongpoll ?nextPatchURL
has the same semantics, and the same behavior, except that if no changes have occured, a GET on ?nextPatchURL MUST not return a 404 or a 2xx. The server MUST keep the connection open, not returning any result, until there is a change or the server resource limits for this client are exceeded (in which case a 503 Service Unavailable MUST be returned.) Clients SHOULD anticipate receiving 504 Gateway Timeout as well, due to intervening proxies. # Test Cases @@@ use TriG to define dptest:LoopTest sequences of patches from a zero-triple resource back to a zero-triple graph.