TurtlePatch

From Semantic Web Standards

Status: Proposed by Sandro Hawke, December 2011. Discussed as one option by the Linked Data Platform Working Group, but the group decided not to standardize in this space yet. Largely rewritten January 2014.

Implementations: none known at this time. Report any to sandro@w3.org.

Motivation

Motivation for RDF Patching

A Web server is offering some RDF data, the graph G1, at http://example.org/g1.

1. G1 is very big, and the server wants to allow authorized clients to modify it without re-transmitting the whole thing using an HTTP PUT.

2. Different clients want to update different parts of G1 at the same time. With PUT they would have an If-Match ETag conflict and have to retry. If G1 changes faster than some client's round-trip time, that client will never be able to update it. With PATCH, in some applications, the client can omit If-Match, doing "blind" patching, and still have acceptable results. (Arguably G1 should be split into different resources to address this kind of problem.)

3. The server wants to offer a stream of patches to other systems (clients or slave servers), so they can efficiently maintain an up-to-date copy of G1.

Motivation for TurtlePatch Specifically

TurtlePatch is a subset of SPARQL 1.1 UPDATE which is easier to parse, easier to process, and guaranteed to be fast to process. As such, TurtlePatch is better than SPARQL 1.1 UPDATE in two situations:

1. If patch receivers need to be simple code, not a full SPARQL 1.1 UPDATE implementation. For example, if the receiver is running in a browser or mobile app, or is a small add-on to an existing application.

2. If receivers need to be able to apply all patches in linear or near-linear time with respect to the size of the patch. (Using full SPARQL UPDATE, a poorly constructed patch can easily require exponential time to process. For certain unlikely graphs, it may not be possible to construct a SPARQL UPDATE patch that can be processed under practical resource constraints. Such graphs simply cannot be patched with TurtlePatch, since they involve blank nodes, but there is no risk of a slow patch process.)

In these situations, it may be better to use text/turtlepatch instead of application/sparql-update for the PATCH language.

Example

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
 PREFIX s: <http://www.w3.org/2000/01/rdf-schema#>
 DELETE DATA {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@w3.org>.
 }
 INSERT DATA {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@hushmail.com>.
   <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my general description of myself.\n\nI try to keep data here up to date and it should be considered authoritative.".
 }

Details

TurtlePatch is defined as the syntactic and semantic subset of the SPARQL 1.1 UPDATE language with the following characteristics:

  • The only allowed SPARQL syntactic constructs are BASE, PREFIX, INSERT DATA, and DELETE DATA
  • The expression between the curly braces is only Turtle (no GRAPH keyword)
  • String literals MUST NOT contain unescaped newline characters
  • The document must be of the form:
    • zero or one BASE directives, followed by
    • zero or more PREFIX directives, followed by
    • zero or one DELETE DATA directive, followed by
    • zero or one INSERT DATA directive
  • Each of the directives, and the closing braces, must be on a line by itself with no extra whitespace.

This syntax is designed so that a TurtlePatch document can be (1) correctly processed by any standard SPARQL 1.1 UPDATE system, and (2) turned into two Turtle documents, enumerating the triples to be deleted and added, with very simple code.

Note that following the semantics of SPARQL 1.1 UPDATE, if any triples in the DELETE DATA clause are not matched, they are ignored.
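
As a rough illustration of these semantics, the sketch below applies a patch to an in-memory graph. It assumes the graph and both halves of the patch have already been parsed into Python sets of (subject, predicate, object) string tuples; the name apply_patch and that representation are hypothetical, not part of the proposal.

```python
from typing import Set, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
# A real implementation would likely use an RDF library's term types instead.
Triple = Tuple[str, str, str]


def apply_patch(graph: Set[Triple], deletes: Set[Triple], inserts: Set[Triple]) -> None:
    """Apply a patch in place: remove `deletes` first, then add `inserts`."""
    for triple in deletes:
        # discard() does nothing when the triple is absent, mirroring the
        # rule that unmatched DELETE DATA triples are ignored.
        graph.discard(triple)
    for triple in inserts:
        graph.add(triple)


card = "http://www.w3.org/People/Berners-Lee/card#i"
mbox = "http://xmlns.com/foaf/0.1/mbox"
graph = {(card, mbox, "mailto:timbl@w3.org")}
apply_patch(
    graph,
    deletes={(card, mbox, "mailto:timbl@w3.org"), (card, mbox, "mailto:not-in-graph@example.org")},
    inserts={(card, mbox, "mailto:timbl@hushmail.com")},
)
```

Because the loops touch only the triples named in the patch, the work done is proportional to the size of the patch rather than the size of the graph.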

As an RDF Syntax

In order to support patching RDF graphs which contain blank nodes, servers that offer patching via TurtlePatch SHOULD offer Skolemized views of resources. Specifically, they SHOULD answer GET requests where the Accept header is text/turtlepatch with a TurtlePatch document patching the empty graph to a Skolemized view of the current state. The Skolemization MUST be done such that future PATCH operations can use these genid IRIs to refer to the blank nodes in the graph.

For example:

GET http://example.org/alice
Accept: text/turtle
 
   PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
   PREFIX : <http://example.org/>
   :me foaf:knows [ foaf:name "Bob" ].


GET http://example.org/alice
Accept: text/turtlepatch
 
   PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
   PREFIX : <http://example.org/>
   PREFIX genid: <http://example.org/.well-known/genid/d7432/>
   INSERT DATA {
   :me foaf:knows genid:517.
   genid:517 foaf:name "Bob".
   }

At some point after doing the second GET, the client can use genid:517 (that is, http://example.org/.well-known/genid/d7432/517) in a PATCH operation on http://example.org/alice.
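
A client-side sketch of such a PATCH, using only the Python standard library, might look like the following. The replacement name "Robert" and the helper build_patch_request are made up for illustration; the request is only constructed, not sent.

```python
import urllib.request

# Hypothetical patch: rename the Skolemized node genid:517 from "Bob" to "Robert".
# Directives and closing braces each sit on their own line, as the format requires.
PATCH_BODY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX genid: <http://example.org/.well-known/genid/d7432/>
DELETE DATA {
genid:517 foaf:name "Bob".
}
INSERT DATA {
genid:517 foaf:name "Robert".
}
"""


def build_patch_request(url: str, body: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PATCH request carrying a TurtlePatch body."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "text/turtlepatch"},
    )


request = build_patch_request("http://example.org/alice", PATCH_BODY)
# urllib.request.urlopen(request) would send it; that step is left out here
# because example.org does not actually serve this resource.
```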

Wildcards

It would be nice to have some way to delete triples without completely repeating them. A sort of DELETE <s> <p> * operation. Maybe we can just allow DELETE with only one variable? Maybe we can make an easy-to-parse subset of SPARQL for this.

Advice on Parsing

Is there any easy way to get a Turtle serializer to use \n instead of newlines? Let's gather a list of flags for different systems.
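
Until such a list exists, one fallback is to serialize string literals by hand. The sketch below is a hypothetical helper (turtle_string_literal is not taken from any library) that writes a Python string as a single-line, double-quoted Turtle literal using Turtle's standard backslash escapes.

```python
# Characters that must be escaped so the literal stays on one line
# and remains a valid double-quoted Turtle string.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def turtle_string_literal(value: str) -> str:
    """Return `value` as a double-quoted Turtle string with no raw newlines."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


literal = turtle_string_literal("This is my general description of myself.\n\nI try to keep data here up to date.")
```

The result contains the two-character sequence \n wherever the input had a newline, so it satisfies the rule that string literals must not contain unescaped newline characters.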

You can parse the patch into your prefix declarations and two Turtle strings with a (Python) regexp like this:

import re

# Note: this pattern expects both a DELETE DATA and an INSERT DATA block.
p = re.compile(r"""
(?P<prefix_declaration>(\s*((?:BASE|PREFIX).*|\#.*|)\n)*)  # BASE/PREFIX block, with comments
\s*DELETE\s+DATA\s+\{\s*\n                                 # "delete" header
(?P<delete_pattern>(\s*[^}].*\n)*)                         # "delete" pattern
\s*\}\s*\n\s*INSERT\s+DATA\s+\{\s*\n                       # "insert" header
(?P<insert_pattern>(\s*[^}].*\n)*)                         # "insert" pattern
\s*\}(\s|\n|\s*\#.*)*                                      # footer, with comments
""", re.MULTILINE | re.VERBOSE)

Run the regexp above against the patch text (for example with p.match), then feed match.group("prefix_declaration") + match.group("delete_pattern") to a Turtle parser to get the triples to delete, and match.group("prefix_declaration") + match.group("insert_pattern") to get the triples to insert.
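
Because every directive and closing brace sits on its own line, a line-by-line splitter is a possible alternative to a regexp, and it copes with a missing DELETE DATA or INSERT DATA block. The following is only a sketch: split_turtlepatch is a hypothetical name, it tolerates leading and trailing whitespace around directives, and it does not check the ordering or "zero or one" rules from the Details section.

```python
from typing import Dict, List, Optional, Tuple


def split_turtlepatch(text: str) -> Tuple[str, str]:
    """Return (delete_turtle, insert_turtle) for a TurtlePatch document."""
    header: List[str] = []  # BASE and PREFIX lines, shared by both outputs
    blocks: Dict[str, List[str]] = {"DELETE DATA {": [], "INSERT DATA {": []}
    current: Optional[List[str]] = None  # body of the block being read, if any

    for line in text.splitlines():
        stripped = line.strip()
        if current is not None:
            if stripped == "}":
                current = None  # closing brace ends the block
            else:
                current.append(line)
        elif stripped in blocks:
            current = blocks[stripped]
        elif stripped.startswith(("BASE", "PREFIX")):
            header.append(line)
        elif stripped and not stripped.startswith("#"):
            raise ValueError(f"unexpected line outside a block: {line!r}")
    if current is not None:
        raise ValueError("missing closing brace")

    def as_turtle(body: List[str]) -> str:
        # Turtle 1.1 accepts SPARQL-style BASE/PREFIX lines, so they are reused as-is.
        return "".join(item + "\n" for item in header + body)

    return as_turtle(blocks["DELETE DATA {"]), as_turtle(blocks["INSERT DATA {"])
```

Each returned string can be handed to a Turtle parser as before. The check for a lone "}" is safe only because string literals may not contain unescaped newlines, so no line inside a literal can be mistaken for a closing brace.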