TurtlePatch

From Semantic Web Standards
Revision as of 10:41, 2 November 2012 by Sandro (Talk | contribs)


This is a sketch of a spec.

Motivation

We want to update the state of RDF Graph-State Resources using HTTP PATCH, without requiring a full SPARQL implementation. With full SPARQL, we could send an application/sparql-update document, but we want the server to be very easy to implement. Hence a simpler format: text/turtle-patch.

Example

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 DELETE DATA {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@w3.org>
 }
 INSERT DATA {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@hushmail.com>
 }
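
As a rough illustration of how a client might send such a patch, here is a minimal Python sketch using only the standard library. The function name build_patch_request and the target URL are assumptions made for illustration, and text/turtle-patch is the media type proposed on this page, not a registered one. The patch body uses full IRIs so that it does not depend on the PREFIX syntax.

```python
import urllib.request

def build_patch_request(url: str, patch_text: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PATCH request carrying a patch document."""
    return urllib.request.Request(
        url,
        data=patch_text.encode("utf-8"),
        # Media type proposed in this sketch; not a registered type.
        headers={"Content-Type": "text/turtle-patch"},
        method="PATCH",
    )

# Hypothetical patch body, equivalent to the example above but with full IRIs.
EXAMPLE_PATCH = """\
DELETE DATA {
  <http://www.w3.org/People/Berners-Lee/card#i> <http://xmlns.com/foaf/0.1/mbox> <mailto:timbl@w3.org>
}
INSERT DATA {
  <http://www.w3.org/People/Berners-Lee/card#i> <http://xmlns.com/foaf/0.1/mbox> <mailto:timbl@hushmail.com>
}
"""

# Assumed target: the document that holds the graph being patched.
request = build_patch_request("http://www.w3.org/People/Berners-Lee/card", EXAMPLE_PATCH)
```

Passing the request to urllib.request.urlopen would actually send it; whether a given server accepts it is outside the scope of this sketch.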

Goals

A sub-language of SPARQL Update.

Trivial to parse, if you have a Turtle parser.

Operations are simple to code, and never computationally "hard", because the syntax limits us to easy patches. In particular, no WHERE clauses, with their need for doing JOINs.
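
To make the "simple to code" goal concrete, here is a minimal sketch that assumes the graph and both halves of the patch are already parsed into Python sets of (subject, predicate, object) string tuples. The name apply_patch and this representation are illustrative assumptions, not part of the proposal.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def apply_patch(graph: Set[Triple], deletes: Set[Triple], inserts: Set[Triple]) -> None:
    """Apply a patch in place: remove `deletes` first, then add `inserts`.

    No pattern matching or joins are involved, only set membership
    operations, so the work grows with the size of the patch.
    """
    graph.difference_update(deletes)  # DELETE DATA: absent triples are ignored
    graph.update(inserts)             # INSERT DATA: existing triples are kept once

TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"
MBOX = "http://xmlns.com/foaf/0.1/mbox"

graph = {(TIMBL, MBOX, "mailto:timbl@w3.org")}
apply_patch(
    graph,
    deletes={(TIMBL, MBOX, "mailto:timbl@w3.org")},
    inserts={(TIMBL, MBOX, "mailto:timbl@hushmail.com")},
)
```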

Discussion

Blank Nodes

Do we allow blank nodes in DELETE? We can't allow a blank node label to be used in two or more places, because that would require a join, but we could allow blank node labels which are used exactly once in the patch. (What are the semantics of that in SPARQL? Does it act like a wild card, or just match another blank node? Can it delete more than one triple?)
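
If the "used exactly once" restriction were adopted, checking it would be cheap. The sketch below assumes triples are tuples of strings in which only blank nodes start with "_:" (literals would carry their quotes); the function name repeated_blank_labels is hypothetical. It only enforces the syntactic restriction and says nothing about what a single-use blank node should mean.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

def repeated_blank_labels(triples: Iterable[Triple]) -> List[str]:
    """Return the blank node labels that occur more than once in the patch."""
    counts = Counter(
        term
        for triple in triples
        for term in triple
        if term.startswith("_:")  # assumed marker for blank node terms
    )
    return sorted(label for label, n in counts.items() if n > 1)
```

A server following this rule could reject any patch for which the returned list is non-empty.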

Variables?

Maybe it's okay to have variables and a WHERE clause. As long as the predicates involved in the graph pattern only occur once for each subject/object pair, operation time will remain linear in the number of variables, rather than exponential.

Musings about Parsing

Can we actually parse out the Turtle subsections with a Regexp? Discussion on parsing Python's triple-quoted strings using regexps suggests that while it might be possible, it's not worth it. Note all the Turtle quoting and escaping mechanisms that might come into play.

That makes it start to seem like we'd need a modified Turtle parser to do this.

An alternative is to use a subset of Turtle without a literal line feed (0x0A) inside string literals, requiring that the escape sequence \n be used instead. Maybe Turtle serializers should have a flag for that; it might already be the default for many of them. If they can't do that, is the process of converting line feeds inside strings to \n simpler than parsing full Turtle? It's pretty bad.
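
On the serializer side, the restriction is easy to satisfy when string literals are written out. A minimal sketch, with the hypothetical helper name escape_turtle_string, using only the standard Turtle string escapes:

```python
def escape_turtle_string(value: str) -> str:
    """Quote `value` as a single-line, double-quoted Turtle string literal."""
    replacements = {
        "\\": "\\\\",  # backslash must be escaped before it is used in other escapes
        '"': '\\"',    # the quote character that delimits the literal
        "\n": "\\n",   # line feed (0x0A), the character the subset forbids
        "\r": "\\r",   # carriage return, escaped as well to stay on one line
    }
    return '"' + "".join(replacements.get(ch, ch) for ch in value) + '"'
```

This does not help with Turtle that has already been serialized with literal line feeds; fixing that up still requires knowing where string literals start and end.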

Anyway, with this restriction, it would be okay to parse the patch with a regexp. Something like this:

import re

p = re.compile(r"""
(?P<prefix_declaration>(\s*(PREFIX.*|\#.*|)\n)*)  # prefix block, with comments
\s*DELETE\s+DATA\s+\{\s*\n                        # "delete" header
(?P<delete_pattern>(\s*[^}].*\n)*)                # "delete" pattern
\s*\}\s*\n\s*INSERT\s+DATA\s+\{\s*\n              # "insert" header
(?P<insert_pattern>(\s*[^}].*\n)*)                # "insert" pattern
\s*\}(\s|\n|\s*\#.*)*                             # footer, with comments
""", re.MULTILINE | re.VERBOSE)

This assumes a rather line-oriented grammar, where the first character on the line basically tells you where we are:

# it's a comment
P it's a prefix line
D it's the DELETE DATA line
I it's the INSERT DATA line
} it's the end of either the INSERT DATA or DELETE DATA pattern

That seems reasonable.
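
The same line-oriented idea can be coded without a regexp at all. The following sketch dispatches on the first non-blank character of each line, as in the list above. The name split_patch is hypothetical, and the code assumes leading whitespace is insignificant and that no string literal contains a literal line feed.

```python
from typing import Dict, List, Optional

def split_patch(text: str) -> Dict[str, List[str]]:
    """Split a patch into prefix, delete and insert lines by first character."""
    sections: Dict[str, List[str]] = {"prefix": [], "delete": [], "insert": []}
    current: Optional[str] = None  # name of the block we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if not line or line.startswith("#"):
                continue                         # blank line or comment
            if line.startswith("P"):
                sections["prefix"].append(line)  # a PREFIX line
            elif line.startswith("D"):
                current = "delete"               # the DELETE DATA line
            elif line.startswith("I"):
                current = "insert"               # the INSERT DATA line
            else:
                raise ValueError(f"unexpected line: {raw!r}")
        elif line.startswith("}"):
            current = None                       # end of the current block
        else:
            sections[current].append(line)       # Turtle content, left unparsed
    if current is not None:
        raise ValueError(f"unterminated {current} block")
    return sections
```

Lines inside a block, including any comments, are passed through untouched for the Turtle parser to handle.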

So you run the regexp above, then probably run a findall on the PREFIX section to find pairs of prefix names and prefixes, then put those into Turtle syntax for the Turtle parser.
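
The findall step might look something like the sketch below. The names PREFIX_LINE and prefixes_to_turtle are hypothetical, and the pattern is deliberately lenient about whether a colon follows the prefix name, since that detail of the syntax is not settled here.

```python
import re

# One PREFIX line: a prefix name (possibly empty), an optional colon, an <IRI>.
PREFIX_LINE = re.compile(r"^\s*PREFIX\s+([^\s:<]*):?\s*<([^>]*)>", re.MULTILINE)

def prefixes_to_turtle(prefix_block: str) -> str:
    """Rewrite the PREFIX lines of a patch as Turtle @prefix directives."""
    pairs = PREFIX_LINE.findall(prefix_block)  # list of (name, iri) tuples
    return "".join(f"@prefix {name}: <{iri}> .\n" for name, iri in pairs)
```

The returned text could then be prepended to the delete pattern and to the insert pattern before each is handed to the Turtle parser. Comment lines in the prefix block are skipped because they do not start with PREFIX.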