How to diff RDF
The following list is based in part on a discussion on the W3C Semantic Web mailing list.
Implementations
RDFLib
In RDFLib 3, the Python library for RDF, there is a module (rdflib.compare) with tools for diffing graphs. It uses an algorithm by Sean B. Palmer, e.g. for comparing bnodes. See the documentation (docstrings) in the module for usage examples:
<https://github.com/RDFLib/rdflib/blob/master/rdflib/compare.py>
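It is meant for programmatic use, but since the diffs are returned as graphs, you can serialize them using the API, e.g.:
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

# Load the two graphs to compare (file names are placeholders)
g1 = Graph().parse("first.ttl", format="turtle")
g2 = Graph().parse("second.ttl", format="turtle")

# Canonicalize blank nodes, then split triples into shared and unique sets
iso1 = to_isomorphic(g1)
iso2 = to_isomorphic(g2)
in_both, in_first, in_second = graph_diff(iso1, iso2)

print(in_both.serialize(format="n3"))
print(in_first.serialize(format="n3"))
print(in_second.serialize(format="n3"))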
By Daniel Krech (eikeon) and others
Note that with rdflib, it is possible to de-skolemize blank nodes: https://github.com/RDFLib/rdflib/issues/1404
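For graphs without blank nodes, the three results of graph_diff conceptually reduce to plain set operations on triples; blank nodes are what make the canonicalization step (to_isomorphic) necessary first. The stdlib-only sketch below illustrates only that ground case. The helper name ground_diff and the example IRIs are hypothetical, and this is not rdflib's implementation.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object), each term as an N-Triples string.
Triple = Tuple[str, str, str]


def ground_diff(
    first: Set[Triple], second: Set[Triple]
) -> Tuple[Set[Triple], Set[Triple], Set[Triple]]:
    """Split two ground (blank-node-free) graphs into shared and unique triples.

    Hypothetical helper: mirrors the (in_both, in_first, in_second) shape of
    rdflib's graph_diff, but only works when no blank nodes are present.
    """
    for s, _, o in first | second:
        if s.startswith("_:") or o.startswith("_:"):
            raise ValueError("blank nodes need canonicalization first")
    in_both = first & second
    in_first = first - second
    in_second = second - first
    return in_both, in_first, in_second


EX = "http://example.org/"
g1 = {
    (f"<{EX}alice>", f"<{EX}knows>", f"<{EX}bob>"),
    (f"<{EX}alice>", f"<{EX}name>", '"Alice"'),
}
g2 = {
    (f"<{EX}alice>", f"<{EX}knows>", f"<{EX}bob>"),
    (f"<{EX}alice>", f"<{EX}name>", '"Alice Smith"'),
}

both, only_first, only_second = ground_diff(g1, g2)
for label, triples in (("both", both), ("first", only_first), ("second", only_second)):
    for s, p, o in sorted(triples):
        print(label, s, p, o, ".")
```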
Jena API
Jena rdfcompare - A command-line tool written in Java which loads two RDF files into Jena RDF models and uses an API call to check whether the models are isomorphic.
- The Good: Seems to do a good job at correctly telling whether two graphs are isomorphic. Can compare two files in different RDF formats.
- The Bad: Doesn't give any analysis of the difference between the files (like you'd expect from UNIX diff). The differences can be retrieved with:
Jena rdfdiff - A command line tool that compares two graphs.
Jena is open source and grew out of work in the HP Labs Semantic Web Programme.
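What an isomorphism check such as rdfcompare's decides can be illustrated with a deliberately naive Python sketch (it does not reflect Jena's algorithm): two graphs are isomorphic if some one-to-one renaming of the blank nodes of one graph turns its triple set into the other's. Trying every permutation is exponential in the number of blank nodes, so this is only usable for tiny graphs. All function names here are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def blank_nodes(graph: Set[Triple]) -> List[str]:
    """Collect blank node labels (terms starting with '_:') in a stable order."""
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def rename(graph: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Apply a blank-node relabelling; non-blank terms are left unchanged."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def naive_isomorphic(g1: Set[Triple], g2: Set[Triple]) -> bool:
    """Brute-force check: does some bijection of blank nodes make g1 equal g2?"""
    if len(g1) != len(g2):
        return False
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2):
        return False
    # Try every one-to-one mapping from g1's blank nodes onto g2's.
    for image in permutations(b2):
        if rename(g1, dict(zip(b1, image))) == g2:
            return True
    return False


NAME = "<http://example.org/name>"
KNOWS = "<http://example.org/knows>"
a = {("_:x", NAME, '"Alice"'), ("_:x", KNOWS, "_:y")}
b = {("_:n1", NAME, '"Alice"'), ("_:n1", KNOWS, "_:n2")}  # same shape as a
c = {("_:n1", NAME, '"Alice"'), ("_:n2", KNOWS, "_:n1")}  # edge reversed

print(naive_isomorphic(a, b))  # True
print(naive_isomorphic(a, c))  # False
```

Note that a yes/no answer like this says nothing about *where* two graphs differ, which is exactly the limitation described for rdfcompare.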
JSON-LD
With jsonld.js (https://github.com/digitalbazaar/jsonld.js), a normalize API call is available that will convert a JSON-LD document to normalized N-Quads using the RDF Graph Normalization Algorithm (http://json-ld.org/spec/latest/rdf-graph-normalization/). The result can be diffed using a standard text-based diffing tool. This algorithm is also implemented in Python (https://github.com/digitalbazaar/pyld) and PHP (https://github.com/digitalbazaar/php-json-ld) and may be available in Java (https://github.com/jsonld-java/jsonld-java).
Note that these tools can also convert N-Triples or N-Quads into JSON-LD, which can then be converted to normalized N-Quads. It is also important to note that the RDF Graph Normalization Algorithm will canonically name all blank nodes.
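Once both documents have been normalized, comparing them is an ordinary line-based text diff. A minimal sketch with Python's standard difflib, assuming the two canonical N-Quads strings have already been produced (e.g. by one of the libraries above); the literal strings below are made-up stand-ins, not actual normalizer output.

```python
import difflib

# Made-up stand-ins for the normalized N-Quads of two documents:
# sorted lines with deterministic blank node labels.
canonical_a = """\
_:c14n0 <http://schema.org/name> "Alice" .
_:c14n0 <http://schema.org/url> <http://example.org/alice> .
"""
canonical_b = """\
_:c14n0 <http://schema.org/name> "Alice" .
_:c14n0 <http://schema.org/url> <http://example.org/~alice> .
"""

diff = list(
    difflib.unified_diff(
        canonical_a.splitlines(keepends=True),
        canonical_b.splitlines(keepends=True),
        fromfile="a.nq",
        tofile="b.nq",
    )
)
print("".join(diff), end="")
```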
rdfdiff
bash-3.2$ rdfdiff
Usage: rdfdiff [OPTIONS] <from URI> <to URI>
Raptor RDF diff utility 1.4.20
Copyright 2000-2009 David Beckett. Copyright 2000-2005 University of Bristol
Find differences between two RDF files.

OPTIONS:
  -h, --help                         Print this help, then exit
  -b, --brief                        Report only whether files differ
  -u BASE-URI, --base-uri BASE-URI   Set the base URI for the files
  -f FORMAT, --from-format FORMAT    Format of <from URI> (default is rdfxml)
  -t FORMAT, --to-format FORMAT      Format of <to URI> (default is rdfxml)
TopBraid Composer (Paid version only)
Use the "Compare With" menu. TBC provides a GUI as well as an integrated SPARQL query interface.
http://www.topquadrant.com/products/TB_Composer.html
The result is itself an RDF file in N3. To find added triples:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>
SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:AddedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ;
     rdf:object ?o .
}
Similarly, for deleted triples:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>
SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:DeletedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ;
     rdf:object ?o .
}
Using CONSTRUCT we can easily turn the diff into an RDF graph.
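The two SELECT queries above can be mimicked over an in-memory set of triples, which makes the shape of the TopBraid diff result explicit: each change is a node typed diff:AddedTripleDiff or diff:DeletedTripleDiff that carries the changed triple via rdf:subject, rdf:predicate and rdf:object. This is a hypothetical stdlib-only sketch, not TopBraid's API; the sample diff data is made up, and it assumes each diff node has exactly one value for each of those three properties.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DIFF = "http://topbraid.org/diff#"


def changed_triples(diff_graph: Set[Triple], kind: str) -> List[Triple]:
    """Return (s, p, o) for every node typed diff:<kind>, like the SELECTs above."""
    nodes = sorted(
        s for s, p, o in diff_graph if p == RDF + "type" and o == DIFF + kind
    )
    found = []
    for node in nodes:
        # Assumes one value per property; a repeated property keeps an arbitrary one.
        values = {p: o for s, p, o in diff_graph if s == node}
        found.append(
            (values[RDF + "subject"], values[RDF + "predicate"], values[RDF + "object"])
        )
    return found


diff_graph = {
    ("_:d1", RDF + "type", DIFF + "AddedTripleDiff"),
    ("_:d1", RDF + "subject", "http://example.org/alice"),
    ("_:d1", RDF + "predicate", "http://example.org/name"),
    ("_:d1", RDF + "object", '"Alice Smith"'),
    ("_:d2", RDF + "type", DIFF + "DeletedTripleDiff"),
    ("_:d2", RDF + "subject", "http://example.org/alice"),
    ("_:d2", RDF + "predicate", "http://example.org/name"),
    ("_:d2", RDF + "object", '"Alice"'),
}

print(changed_triples(diff_graph, "AddedTripleDiff"))
print(changed_triples(diff_graph, "DeletedTripleDiff"))
```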
TBC is commercial software from TopQuadrant Inc., with a free version that has limited functionality.
Further reading: SPIN Diff: Rule-based Comparison of RDF Models
RDF-Trine
http://search.cpan.org/~gwilliams/RDF-Trine-0.130/
Perl-based, by Gregory Todd Williams of RPI
Serialize graphs using the Canonical N-Triples serializer and then use a standard diff utility.
- Toby Inkster says: I wrote the Canonical N-Triples serializer for RDF-Trine. While the method above will tell you if a difference exists between two graphs, it won't be very useful for telling you what the differences are. This is because adding a single bnode-containing triple to a graph can potentially cause all the blank nodes in the graph to be relabelled.
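Toby Inkster's point can be reproduced with a deliberately simplistic canonicalization, which is neither the Canonical N-Triples algorithm nor RDF-Trine's serializer: sort the triples with blank node labels masked out, then number the blank nodes in order of first appearance. Adding one blank-node triple that sorts early shifts every later label, so a line diff reports changes on lines whose content did not really change. All names here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def naive_canonical_ntriples(graph: Set[Triple]) -> List[str]:
    """Sort triples with blank nodes masked, then relabel blanks as _:b0, _:b1, ..."""

    def masked(triple: Triple) -> Tuple[str, ...]:
        return tuple("_:" if is_blank(t) else t for t in triple)

    # Ties between triples differing only in blank labels fall back to the
    # original labels, one reason this is not a real canonicalization.
    ordered = sorted(graph, key=lambda t: (masked(t), t))
    labels: Dict[str, str] = {}
    lines = []
    for triple in ordered:
        terms = []
        for term in triple:
            if is_blank(term):
                # First appearance in sorted order decides the new label.
                labels.setdefault(term, f"_:b{len(labels)}")
                term = labels[term]
            terms.append(term)
        lines.append(" ".join(terms) + " .")
    return lines


NAME = "<http://example.org/name>"
before = {("_:x", NAME, '"Alice"'), ("_:y", NAME, '"Bob"')}
after = before | {("_:z", NAME, '"Aaron"')}  # one new blank-node triple

old = naive_canonical_ntriples(before)
new = naive_canonical_ntriples(after)
print(sorted(set(old) & set(new)))  # [] : no line survives unchanged
```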
rdf-utils
Usage:
java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf
Introduction http://lists.w3.org/Archives/Public/semantic-web/2005Dec/0176.html
Download http://sourceforge.net/projects/knobot/files/rdf-utils/
By Reto Bachmann-Gmür and others
Prompt
https://protegewiki.stanford.edu/wiki/PROMPT
Author: Natasha Noy of Stanford, with contributions from Michel Klein, Sandhya Kunnatur, Abhita Chugh, and Sean Falconer.
It has been replaced by Protege4OWLDiff
Protege4OWLDiff
https://protegewiki.stanford.edu/wiki/Protege4OWLDiff
Some Related Links
- Canonical N-Triples http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
- Delta: an ontology for the distribution of differences between RDF graphs http://www.w3.org/DesignIssues/Diff
- There is also ongoing work on this topic in the new W3C RDF Dataset Canonicalization and Hash (RCH) Working Group https://w3c.github.io/rch-wg-charter/
Concerns
How to determine sameness of blank nodes?