How to diff RDF
From Semantic Web Standards
The following list is in part based on a discussion on the W3C Semantic Web mailing list.
Implementations
SemDiff Web Service
An online service maintained by Li Ding, RPI http://onto.rpi.edu/sw4j/diff.html
TopBraid Composer
Available under the "Compare with" menu. TBC provides a GUI as well as an integrated SPARQL query interface.
http://www.topquadrant.com/products/TB_Composer.html
The result of a comparison is itself an RDF file, serialized in N3. To find the added triples, query it with:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>
SELECT ?s ?p ?o
WHERE {
[] rdf:type diff:AddedTripleDiff ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o .
}
Similarly, for deleted triples (with the rdf: and diff: prefixes declared as for the added triples):
SELECT ?s ?p ?o
WHERE {
[] rdf:type diff:DeletedTripleDiff ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o .
}
Replacing SELECT ?s ?p ?o with CONSTRUCT { ?s ?p ?o } turns either result into an RDF graph.
TBC is commercial software from TopQuadrant Inc.; a free version with limited functionality is also available.
Further reading: SPIN Diff: Rule-based Comparison of RDF Models
RDF-Trine
http://search.cpan.org/~gwilliams/RDF-Trine-0.130/
Perl-based, by Gregory Todd Williams of RPI
Serialize graphs using the Canonical N-Triples serializer and then use a standard diff utility.
- Toby Inkster says: I wrote the Canonical N-Triples serializer for RDF-Trine. While the method above will tell you if a difference exists between two graphs, it won't be very useful for telling you what the differences are. This is because adding a single bnode-containing triple to a graph can potentially cause all the blank nodes in the graph to be relabelled.
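The serialize-then-diff approach, and the blank node caveat raised above, can be illustrated with a small standard-library sketch. It is not RDF-Trine's Canonical N-Triples serializer: it only sorts the lines of an N-Triples document and makes no attempt to canonicalize blank node labels. The function names and the sample data are hypothetical.

```python
import difflib


def canonical_lines(ntriples):
    """Return the distinct, non-empty lines of an N-Triples document, sorted."""
    return sorted({line.strip() for line in ntriples.splitlines() if line.strip()})


def line_diff(old, new):
    """Unified diff (no context lines) of two N-Triples documents after sorting."""
    return list(difflib.unified_diff(
        canonical_lines(old), canonical_lines(new),
        fromfile="old", tofile="new", lineterm="", n=0))


def changed_lines(old, new):
    """Only the '+' and '-' lines of the diff, without the file headers."""
    return [line for line in line_diff(old, new)
            if line.startswith(("+", "-"))
            and not line.startswith(("+++ ", "--- "))]


# Hypothetical sample data: the same graph, with one blank node relabelled.
OLD = """
<http://example.org/a> <http://example.org/name> "Alice" .
_:b1 <http://example.org/knows> <http://example.org/a> .
"""
RELABELLED = OLD.replace("_:b1", "_:b2")

for line in changed_lines(OLD, RELABELLED):
    print(line)
```

The two graphs are equivalent as RDF, yet the line diff reports one removed and one added triple, because a textual diff treats blank node labels as significant.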
rdf-utils
Usage:
java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf
Introduction http://lists.w3.org/Archives/Public/semantic-web/2005Dec/0176.html
Download http://sourceforge.net/projects/knobot/files/rdf-utils/
By Reto Bachmann-Gmür and others
rdfdiff
bash-3.2$ rdfdiff
Usage: rdfdiff [OPTIONS] <from URI> <to URI>
Raptor RDF diff utility 1.4.20
Copyright 2000-2009 David Beckett. Copyright 2000-2005 University of Bristol
Find differences between two RDF files.
OPTIONS:
  -h, --help                        Print this help, then exit
  -b, --brief                       Report only whether files differ
  -u BASE-URI, --base-uri BASE-URI  Set the base URI for the files
  -f FORMAT, --from-format FORMAT   Format of <from URI> (default is rdfxml)
  -t FORMAT, --to-format FORMAT     Format of <to URI> (default is rdfxml)
Prompt
http://protege.cim3.net/cgi-bin/wiki.pl?Prompt
http://protege.cim3.net/cgi-bin/wiki.pl?UserManagementInPromptDiff
Author: Natasha Noy of Stanford, with contributions from Michel Klein, Sandhya Kunnatur, Abhita Chugh, and Sean Falconer.
Jena API
Jena rdfcompare - A command-line tool written in Java which loads two RDF files into Jena RDF models and uses an API call to check if the models are isomorphic.
- The Good: Seems to do a good job at correctly telling whether two graphs are isomorphic. Can compare two files in different RDF formats.
- The Bad: Doesn't give any analysis of the difference between the files (like you'd expect from UNIX diff).
Jena is open source and grew out of work with the HP Labs Semantic Web Programme.
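To make concrete what "isomorphic" means for graphs with blank nodes, here is a minimal brute-force sketch. It is not Jena's algorithm, only an illustration of the definition: two graphs are isomorphic if some one-to-one renaming of blank nodes turns one into the other. Triples are assumed to be tuples of strings in N-Triples term syntax (blank nodes start with `_:`), and all names are hypothetical. The search is factorial in the number of blank nodes, so it is only suitable for very small graphs.

```python
from itertools import permutations


def is_bnode(term):
    """Blank nodes are written with a leading '_:' in N-Triples syntax."""
    return term.startswith("_:")


def bnodes(graph):
    """Sorted list of the blank node labels used anywhere in the graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def isomorphic(g1, g2):
    """True if some bijection between blank nodes maps g1 exactly onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    # Try every possible pairing of g1's blank nodes with g2's blank nodes.
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


# Hypothetical sample graphs.
A = {("_:x", "<http://example.org/p>", "_:y"),
     ("_:y", "<http://example.org/q>", '"1"')}
B = {("_:b", "<http://example.org/p>", "_:a"),
     ("_:a", "<http://example.org/q>", '"1"')}
C = {("_:b", "<http://example.org/p>", "_:a"),
     ("_:a", "<http://example.org/q>", '"2"')}

print(isomorphic(A, B))
print(isomorphic(A, C))
```

Like rdfcompare, this only answers yes or no; it says nothing about which triples differ.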
GUO Graph Diff
GUO Graph Diff is a prototype script for performing "diffs" on RDF graphs. The output of the diff is itself RDF, expressed using GUO, the Graph Update Ontology. The graph diffs it produces are intended to be used as patches against RDF graphs.
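The idea of a diff that doubles as a patch can be sketched with plain sets of triples. This sketch does not use the GUO vocabulary or the GUO Graph Diff script. The `delete` and `insert` keys, the function names, and the sample data are hypothetical, and blank nodes are ignored (every term is treated as a fixed identifier).

```python
def diff(old, new):
    """Describe the change from old to new as triples to delete and to insert."""
    old, new = set(old), set(new)
    return {"delete": old - new, "insert": new - old}


def apply_patch(graph, patch):
    """Return a new graph with the patch's deletions and insertions applied."""
    return (set(graph) - patch["delete"]) | patch["insert"]


# Hypothetical sample data: one literal value changes between versions.
V1 = {("<http://example.org/a>", "<http://example.org/name>", '"Alice"'),
      ("<http://example.org/a>", "<http://example.org/age>", '"30"')}
V2 = {("<http://example.org/a>", "<http://example.org/name>", '"Alice"'),
      ("<http://example.org/a>", "<http://example.org/age>", '"31"')}

patch = diff(V1, V2)
print(apply_patch(V1, patch) == V2)
```

Applying the patch produced by `diff(V1, V2)` to `V1` reproduces `V2`, which is the round-trip property a patch format needs.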
RDFLib
RDFLib 3, the Python library for RDF, includes a module (rdflib.compare) with tools for diffing graphs (using an algorithm by Sean B. Palmer for, e.g., comparing bnodes). See the documentation (docstrings) in the module for some usage examples:
<https://github.com/RDFLib/rdflib/blob/master/rdflib/compare.py>
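Usage is programmatic, but because the diffs are returned as graphs, you can serialize them through the API, e.g.:
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff
# Load the two graphs to compare (the file names are placeholders)
g1 = Graph().parse("test1.rdf")
g2 = Graph().parse("test2.rdf")
# graph_diff returns three graphs: shared triples, then those only in each input
in_both, in_first, in_second = graph_diff(to_isomorphic(g1), to_isomorphic(g2))
print(in_both.serialize(format="n3"))
print(in_first.serialize(format="n3"))
print(in_second.serialize(format="n3"))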
By Daniel Krech (eikeon) and others
Some Related Papers
- Canonical N-Triples http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
- Delta: an ontology for the distribution of differences between RDF graphs http://www.w3.org/DesignIssues/Diff
Concerns
How to guess sameness of blank nodes?
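One possible heuristic, offered only as a sketch and not taken from any of the tools listed above, is to pair blank nodes whose surrounding triples look identical once all blank node labels are masked. Triples are assumed to be tuples of strings in N-Triples term syntax, and every name below is hypothetical.

```python
from collections import defaultdict


def is_bnode(term):
    """Blank nodes are written with a leading '_:' in N-Triples syntax."""
    return term.startswith("_:")


def signature(node, graph):
    """The triples mentioning node, with node marked and other bnodes masked."""
    def mask(term):
        if term == node:
            return "_:self"
        return "_:other" if is_bnode(term) else term
    return frozenset(tuple(mask(t) for t in triple)
                     for triple in graph if node in triple)


def guess_matches(g1, g2):
    """Pair blank nodes whose signature is unique and identical in both graphs."""
    by_signature = defaultdict(lambda: ([], []))
    for side, graph in enumerate((g1, g2)):
        for node in {t for triple in graph for t in triple if is_bnode(t)}:
            by_signature[signature(node, graph)][side].append(node)
    return {left[0]: right[0]
            for left, right in by_signature.values()
            if len(left) == 1 and len(right) == 1}


# Hypothetical sample data: "_:b" gains an extra property in the second graph.
G1 = {("_:x", "<http://example.org/name>", '"Alice"'),
      ("_:y", "<http://example.org/name>", '"Bob"')}
G2 = {("_:a", "<http://example.org/name>", '"Bob"'),
      ("_:b", "<http://example.org/name>", '"Alice"'),
      ("_:b", "<http://example.org/age>", '"30"')}

print(guess_matches(G1, G2))
```

Here only `_:y` and `_:a` are paired. `_:x` and `_:b` probably denote the same resource, but the added triple changes the signature, which shows why guessing blank node sameness across edited graphs remains an open concern.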
