How to diff RDF

From Semantic Web Standards

The following list is based in part on a discussion on the W3C Semantic Web mailing list.

Implementations

RDFLib

RDFLib 3, the Python library for RDF, includes a module (rdflib.compare) with tools for diffing graphs (using an algorithm by Sean B. Palmer for, e.g., comparing bnodes). See the documentation (docstrings) in the module for some usage examples:

<https://github.com/RDFLib/rdflib/blob/master/rdflib/compare.py>

Usage is programmatic, but because the diffs are returned as graphs, you can serialize them using the API, e.g.:

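  from rdflib import Graph
  from rdflib.compare import to_isomorphic, graph_diff

  # Load the two graphs to compare (the file names are placeholders)
  g1 = Graph().parse("first.ttl")
  g2 = Graph().parse("second.ttl")

  # graph_diff returns three graphs: shared triples, only in first, only in second
  in_both, in_first, in_second = graph_diff(to_isomorphic(g1), to_isomorphic(g2))
  print(in_both.serialize(format="n3"))
  print(in_first.serialize(format="n3"))
  print(in_second.serialize(format="n3"))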

By Daniel Krech (eikeon) and others

Note that with rdflib, it is possible to de-skolemize blank nodes: https://github.com/RDFLib/rdflib/issues/1404

Jena API

Jena rdfcompare - A command line tool written in Java that loads two RDF files into Jena RDF models and uses an API call to check whether the models are isomorphic.

  • The Good: Seems to do a good job at correctly telling whether two graphs are isomorphic. Can compare two files in different RDF formats.
  • The Bad: Doesn't give any analysis of the differences between the files (as you'd expect from UNIX diff). The differences can be retrieved with Jena rdfdiff, described next.

Jena rdfdiff - A command line tool that compares two graphs.

Jena is open source and grew out of work at the HP Labs Semantic Web Programme.

JSON-LD

With jsonld.js (https://github.com/digitalbazaar/jsonld.js), a normalize API call is available that will convert a JSON-LD document to normalized N-Quads using the RDF Graph Normalization Algorithm (http://json-ld.org/spec/latest/rdf-graph-normalization/). The result can be diffed using a standard text-based diffing tool. This algorithm is also implemented in Python (https://github.com/digitalbazaar/pyld) and PHP (https://github.com/digitalbazaar/php-json-ld) and may be available in Java (https://github.com/jsonld-java/jsonld-java).

Note that these tools can also convert N-Triples or N-Quads into JSON-LD, which can then be converted to normalized N-Quads. It is also important to note that the RDF Graph Normalization Algorithm will canonically name all blank nodes.
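Once both graphs are available as normalized N-Quads, the text-based diff step can be done with any line-oriented tool. The following is a minimal sketch using Python's standard difflib module. The two N-Quads strings are made-up sample data standing in for the output of a normalize call, and the function name diff_nquads and the blank node label are illustrative only.

```python
import difflib


def diff_nquads(old_text, new_text):
    """Return unified-diff lines between two normalized N-Quads documents."""
    # Sorting makes the comparison independent of statement order.
    old_lines = sorted(l for l in old_text.splitlines() if l.strip())
    new_lines = sorted(l for l in new_text.splitlines() if l.strip())
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="old", tofile="new",
                                     lineterm="", n=0))


# Hypothetical sample data (not produced by a real normalize call).
OLD = """\
<http://example.org/a> <http://example.org/name> "Alice" .
_:c14n0 <http://example.org/knows> <http://example.org/a> .
"""
NEW = """\
<http://example.org/a> <http://example.org/name> "Alice" .
<http://example.org/a> <http://example.org/age> "30" .
_:c14n0 <http://example.org/knows> <http://example.org/a> .
"""

if __name__ == "__main__":
    for line in diff_nquads(OLD, NEW):
        print(line)
```

Lines prefixed with "+" or "-" in the result are statements present in only one of the two documents.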

rdfdiff

bash-3.2$ rdfdiff
Usage: rdfdiff [OPTIONS] <from URI> <to URI>
Raptor RDF diff utility 1.4.20
Copyright 2000-2009 David Beckett. Copyright 2000-2005 University of Bristol
Find differences between two RDF files. 

OPTIONS:
 -h, --help                        Print this help, then exit
 -b, --brief                       Report only whether files differ
 -u BASE-URI, --base-uri BASE-URI  Set the base URI for the files
 -f FORMAT, --from-format FORMAT   Format of <from URI> (default is rdfxml)
 -t FORMAT, --to-format FORMAT     Format of <to URI> (default is rdfxml)
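To call the utility from a script, the options listed above can be assembled into an argument list. This is a minimal sketch, assuming rdfdiff is installed and on the PATH. The helper names, the file names, and the format name "turtle" are assumptions for illustration; the meaning of the exit status is not covered by the usage text above, so it is returned as-is without interpretation.

```python
import subprocess


def build_rdfdiff_command(from_uri, to_uri, from_format=None,
                          to_format=None, base_uri=None, brief=False):
    """Assemble an rdfdiff argument list from the options in the usage text."""
    cmd = ["rdfdiff"]
    if brief:
        cmd.append("--brief")
    if base_uri is not None:
        cmd += ["--base-uri", base_uri]
    if from_format is not None:
        cmd += ["--from-format", from_format]
    if to_format is not None:
        cmd += ["--to-format", to_format]
    return cmd + [from_uri, to_uri]


def run_rdfdiff(*args, **kwargs):
    """Run rdfdiff and return (exit status, standard output)."""
    result = subprocess.run(build_rdfdiff_command(*args, **kwargs),
                            capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(build_rdfdiff_command("old.ttl", "new.rdf", from_format="turtle"))
```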

TopBraid Composer (Paid version only)

Available under the "compare with" menu. TBC provides a GUI as well as an integrated SPARQL query interface.

http://www.topquadrant.com/products/TB_Composer.html

The result is itself an RDF file in N3. To find added triples:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>

SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:AddedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ; 
     rdf:object ?o .
}

Similarly, for deleted triples:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>

SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:DeletedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ;
     rdf:object ?o .
}

Using CONSTRUCT we can easily turn the diff into an RDF graph.

TBC is commercial software from TopQuadrant Inc., with a free version that has limited functionality.

Further reading: SPIN Diff: Rule-based Comparison of RDF Models

RDF-Trine

http://search.cpan.org/~gwilliams/RDF-Trine-0.130/

Perl-based, by Gregory Todd Williams of RPI

Serialize graphs using the Canonical N-Triples serializer and then use a standard diff utility.

  • Toby Inkster says: I wrote the Canonical N-Triples serializer for RDF-Trine. While the method above will tell you if a difference exists between two graphs, it won't be very useful for telling you what the differences are. This is because adding a single bnode-containing triple to a graph can potentially cause all the blank nodes in the graph to be relabelled.
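The caveat above can be illustrated with a small sketch. It compares two N-Triples documents line by line and reports ground triples (no blank nodes) separately from triples that mention a blank node, since only the former are unaffected by relabelling. The sample data and the labels _:g1 and _:g2 are hand-written to mimic a relabelling; they are not actual RDF-Trine output, and the blank node check is a crude token test that assumes well-formed N-Triples.

```python
def split_triples(ntriples_text):
    """Split N-Triples lines into (ground, with_bnodes) sets."""
    ground, with_bnodes = set(), set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = line.split()
        # Crude check: subject or object token starting with "_:".
        if terms[0].startswith("_:") or terms[2].startswith("_:"):
            with_bnodes.add(line)
        else:
            ground.add(line)
    return ground, with_bnodes


def line_diff(old_text, new_text):
    """Set-based line diff, reported separately for ground and bnode triples."""
    old_ground, old_bnode = split_triples(old_text)
    new_ground, new_bnode = split_triples(new_text)
    return {
        "ground_removed": old_ground - new_ground,
        "ground_added": new_ground - old_ground,
        "bnode_removed": old_bnode - new_bnode,
        "bnode_added": new_bnode - old_bnode,
    }


# Hypothetical data: NEW adds one blank node ("Aaron") and, as a side
# effect, the existing blank node ("Bob") is relabelled from _:g1 to _:g2.
OLD = """\
<http://example.org/a> <http://example.org/name> "Alice" .
<http://example.org/a> <http://example.org/knows> _:g1 .
_:g1 <http://example.org/name> "Bob" .
"""
NEW = """\
<http://example.org/a> <http://example.org/name> "Alice" .
<http://example.org/a> <http://example.org/knows> _:g1 .
_:g1 <http://example.org/name> "Aaron" .
<http://example.org/a> <http://example.org/knows> _:g2 .
_:g2 <http://example.org/name> "Bob" .
"""

if __name__ == "__main__":
    for kind, lines in line_diff(OLD, NEW).items():
        print(kind, len(lines))
```

In this sample only two triples were really added, yet the line comparison reports one removed and three added blank node triples, while the ground triples show no change.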

rdf-utils

Usage:

java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf

Introduction: http://lists.w3.org/Archives/Public/semantic-web/2005Dec/0176.html

Download: http://sourceforge.net/projects/knobot/files/rdf-utils/

By Reto Bachmann-Gmür and others

Prompt

https://protegewiki.stanford.edu/wiki/PROMPT

Author: Natasha Noy of Stanford, with contributions from Michel Klein, Sandhya Kunnatur, Abhita Chugh, and Sean Falconer.

PROMPT has been replaced by Protege4OWLDiff.

Protege4OWLDiff

https://protegewiki.stanford.edu/wiki/Protege4OWLDiff

Some Related Links

Concerns

How to determine sameness of blank nodes?