How to diff RDF

From Semantic Web Standards
Revision as of 06:21, 11 February 2014 by Dlongley (Talk | contribs)

The following list is based in part on a discussion on the W3C Semantic Web mailing list.

Implementations

SemDiff Web Service

An online service maintained by Li Ding (RPI): http://onto.rpi.edu/sw4j/diff.html

TopBraid Composer

Available under the "compare with" menu. TBC provides a GUI as well as an integrated SPARQL query interface.

http://www.topquadrant.com/products/TB_Composer.html

The result of a comparison is itself an RDF file, serialized in N3. To find the added triples, query it with SPARQL:


PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX diff: <http://topbraid.org/diff#>

SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:AddedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ; 
     rdf:object ?o .
}

Similarly, for deleted triples (with the same prefix declarations):

SELECT ?s ?p ?o
WHERE {
  [] rdf:type diff:DeletedTripleDiff ;
     rdf:subject ?s ;
     rdf:predicate ?p ; 
     rdf:object ?o .
}

Replacing SELECT with CONSTRUCT turns the diff into an RDF graph of the added (or deleted) triples.

TBC is commercial software from TopQuadrant Inc.; a free version with limited functionality is available.

Further reading: SPIN Diff: Rule-based Comparison of RDF Models

RDF-Trine

http://search.cpan.org/~gwilliams/RDF-Trine-0.130/

Perl-based, by Gregory Todd Williams of RPI

Serialize graphs using the Canonical N-Triples serializer and then use a standard diff utility.

  • Toby Inkster says: I wrote the Canonical N-Triples serializer for RDF-Trine. While the method above will tell you if a difference exists between two graphs, it won't be very useful for telling you what the differences are. This is because adding a single bnode-containing triple to a graph can potentially cause all the blank nodes in the graph to be relabelled.
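The limitation described above can be seen in a small sketch. This is not RDF-Trine's serializer: it assumes both graphs are already available as N-Triples text with one statement per line, and it compares the lines as sets, which is roughly what sorting plus a text diff achieves. The function names and example.org URIs are made up for illustration.

```python
def triple_lines(ntriples_text):
    """Return the set of statement lines in an N-Triples document."""
    lines = (line.strip() for line in ntriples_text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def line_diff(old_text, new_text):
    """Return (added, removed) statement lines, each sorted for stable output."""
    old, new = triple_lines(old_text), triple_lines(new_text)
    return sorted(new - old), sorted(old - new)


# Ground triples only: the line diff matches the real change.
OLD = """\
<http://example.org/a> <http://example.org/p> <http://example.org/b> .
<http://example.org/a> <http://example.org/q> "1" .
"""
NEW = """\
<http://example.org/a> <http://example.org/p> <http://example.org/b> .
<http://example.org/a> <http://example.org/q> "2" .
"""
added, removed = line_diff(OLD, NEW)
print("added:", added)
print("removed:", removed)

# The same one-triple graph, but the blank node has a different label in
# each file: the line diff reports a change although the graphs are equivalent.
OLD_BNODE = '_:b1 <http://example.org/q> "1" .\n'
NEW_BNODE = '_:b2 <http://example.org/q> "1" .\n'
spurious_added, spurious_removed = line_diff(OLD_BNODE, NEW_BNODE)
print("spurious:", spurious_added, spurious_removed)
```

With ground triples the output is exactly the changed statement. As soon as blank node labels shift, every line mentioning a relabelled node shows up as both added and removed.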

rdf-utils

Usage:

java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf

Introduction: http://lists.w3.org/Archives/Public/semantic-web/2005Dec/0176.html

Download: http://sourceforge.net/projects/knobot/files/rdf-utils/

By Reto Bachmann-Gmür and others

rdfdiff

bash-3.2$ rdfdiff
Usage: rdfdiff [OPTIONS] <from URI> <to URI>
Raptor RDF diff utility 1.4.20
Copyright 2000-2009 David Beckett. Copyright 2000-2005 University of Bristol
Find differences between two RDF files.  

OPTIONS:
 -h, --help                        Print this help, then exit
 -b, --brief                       Report only whether files differ
 -u BASE-URI, --base-uri BASE-URI    Set the base URI for the files
 -f FORMAT, --from-format FORMAT   Format of <from URI> (default is rdfxml)
 -t FORMAT, --to-format FORMAT     Format of <to URI> (default is rdfxml)

Prompt

http://protege.cim3.net/cgi-bin/wiki.pl?Prompt

http://protege.cim3.net/cgi-bin/wiki.pl?UserManagementInPromptDiff

Author: Natasha Noy of Stanford, with contributions from Michel Klein, Sandhya Kunnatur, Abhita Chugh, and Sean Falconer.

Jena API

Jena rdfcompare - A command-line tool written in Java that loads two RDF files into Jena RDF models and uses an API call to check whether the models are isomorphic.

  • The Good: Seems to do a good job at correctly telling whether two graphs are isomorphic. Can compare two files in different RDF formats.
  • The Bad: Doesn't give any analysis of the difference between the files (like you'd expect from UNIX diff).
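To make the notion of isomorphism concrete, here is a minimal sketch of what such a check decides. It is not Jena's implementation. It assumes graphs are plain Python sets of (subject, predicate, object) string triples in which blank nodes are the terms starting with "_:", and it simply tries every bijection between the blank node labels of the two graphs. That is factorial in the number of blank nodes, so it is only usable on toy inputs. All names here are made up for illustration.

```python
from itertools import permutations


def is_bnode(term):
    """Blank nodes are represented as strings starting with '_:' (an assumption)."""
    return term.startswith("_:")


def bnodes(graph):
    """Return the sorted blank node labels used anywhere in the graph."""
    return sorted({term for triple in graph for term in triple if is_bnode(term)})


def isomorphic(g1, g2):
    """True if some bijection of blank node labels maps g1 exactly onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        # Rename the blank nodes of g1; all other terms are left untouched.
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


G1 = {
    ("_:a", "ex:name", '"Alice"'),
    ("_:a", "ex:knows", "_:b"),
    ("_:b", "ex:name", '"Bob"'),
}
# Same structure as G1 with different blank node labels.
G2 = {
    ("_:x", "ex:name", '"Bob"'),
    ("_:y", "ex:knows", "_:x"),
    ("_:y", "ex:name", '"Alice"'),
}
# Differs from G2 in one literal.
G3 = {
    ("_:x", "ex:name", '"Carol"'),
    ("_:y", "ex:knows", "_:x"),
    ("_:y", "ex:name", '"Alice"'),
}

print(isomorphic(G1, G2))
print(isomorphic(G1, G3))
```

Like rdfcompare, this only answers yes or no; it says nothing about which triples differ when the answer is no.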

Jena is open source and has grown out of work with the HP Labs Semantic Web Programme.

GUO Graph Diff

GUO Graph Diff is a prototype script for performing "diffs" on RDF graphs. The output of the diff is RDF expressed in GUO, the Graph Update Ontology. The graph diffs it produces are intended to be used as PATCHes against RDF graphs.

http://webr3.org/diff/

RDFLib

RDFLib 3, the Python library for RDF, includes a module (rdflib.compare) with tools for diffing graphs (using an algorithm by Sean B. Palmer for, e.g., comparing bnodes). See the docstrings in the module for usage examples:

  <https://github.com/RDFLib/rdflib/blob/master/rdflib/compare.py>

Usage is programmatic, but because the diffs are returned as graphs, you can serialize them using the API, e.g.:

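  from rdflib import Graph
  from rdflib.compare import to_isomorphic, graph_diff

  # Load the two graphs to compare (the file names are placeholders).
  g1 = Graph().parse("first.ttl", format="turtle")
  g2 = Graph().parse("second.ttl", format="turtle")
  # As in the module documentation: graph_diff returns the triples found in
  # both graphs, only in the first, and only in the second.
  in_both, in_first, in_second = graph_diff(to_isomorphic(g1), to_isomorphic(g2))

  print(in_both.serialize(format="n3"))
  print(in_first.serialize(format="n3"))
  print(in_second.serialize(format="n3"))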

By Daniel Krech (eikeon) and others

JSON-LD

With jsonld.js (https://github.com/digitalbazaar/jsonld.js), a normalize API call is available that will convert a JSON-LD document to normalized N-Quads using the RDF Graph Normalization Algorithm (http://json-ld.org/spec/latest/rdf-graph-normalization/). The result can be diffed using a standard text-based diffing tool. This algorithm is also implemented in Python (https://github.com/digitalbazaar/pyld) and PHP (https://github.com/digitalbazaar/php-json-ld) and may be available in Java (https://github.com/jsonld-java/jsonld-java).

Note that these tools can also convert N-Triples or N-Quads into JSON-LD, which can then be converted to normalized N-Quads. It is also important to note that the RDF Graph Normalization Algorithm will canonically name all blank nodes.

Some Related Papers

Concerns

How to guess sameness of blank nodes?