RE: Comparing versions of SKOS terminologies

Hi Joachim.

Our Lexaurus terminology management solution lets you obtain the
differences between any two versions of a SKOS vocabulary (e.g. with four
versions, differences are available between 1 and 2, 1 and 3, 1 and 4, 2 and
3, etc.) and also the individual lifecycle of any concept in it.

We define two types of 'difference output':

Delta    - differences
History  - lifecycle changes

The main difference is that if a concept is added and then deleted between
two versions (e.g. 1 and 3), it will not appear in a 'delta' but will appear
as an 'add' followed by a 'delete' in the 'history' output for these two
versions.
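
To illustrate the distinction, here is a minimal sketch in Python. It is not
Lexaurus code: the function names delta() and history() and the modelling of
each version as a plain set of concept identifiers are assumptions made only
for this example.

```python
# Hypothetical illustration only: this is not the Lexaurus implementation.
# Each version is modelled as a plain set of concept identifiers.

def delta(versions, start, end):
    """Net differences: only the two end-point versions are compared."""
    old, new = versions[start], versions[end]
    return {"added": sorted(new - old), "deleted": sorted(old - new)}


def history(versions, start, end):
    """Lifecycle changes: every intermediate step is recorded in order."""
    events = []
    for n in range(start, end):
        old, new = versions[n], versions[n + 1]
        events += [(n + 1, "add", c) for c in sorted(new - old)]
        events += [(n + 1, "delete", c) for c in sorted(old - new)]
    return events


versions = {
    1: {"ex:A", "ex:B"},
    2: {"ex:A", "ex:B", "ex:C"},  # ex:C is added in version 2 ...
    3: {"ex:A", "ex:B"},          # ... and deleted again in version 3
}

print(delta(versions, 1, 3))    # ex:C does not show up here
print(history(versions, 1, 3))  # ex:C shows up as an add, then a delete
```

Comparing versions 1 and 3, the delta is empty, while the history lists the
add and the delete of ex:C.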

Cheers

Rob


-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu] 
Sent: 27 August 2013 18:34
To: 'public-esw-thes@w3.org'
Subject: Comparing versions of SKOS terminologies

When a new version of, say, a thesaurus is published, users are interested in
"What's new?" and "What has changed?". I'm currently racking my brain about
this. Has anyone solved the seemingly simple problem of comparing two
versions of a SKOS file, and the obviously not-so-simple one of formatting
the output in a way that is intelligible?

When it comes to diffing RDF files, there are some solutions listed at
http://www.w3.org/2001/sw/wiki/How_to_diff_RDF. The simplest way I found
was using rdf.sh (https://github.com/seebi/rdf.sh), which simply
system-diffs sorted .nt files produced by rapper. (You need to filter out
blank nodes here, but this shouldn't be much of a problem with SKOS files.)
Using git diff as a diff tool, this gives me a stat of something like "7443
insertions(+), 6937 deletions(-)" (on the two most recent versions of STW
Thesaurus for Economics).
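
The same idea can be sketched in a few lines of Python, assuming both
versions are already serialized as well-formed N-Triples with one triple per
line (e.g. by rapper). This is not what rdf.sh does internally; the function
names and the example.org URIs are made up for illustration.

```python
# Sketch of a line-based triple diff over N-Triples text.
# Assumes well-formed input with exactly one triple per line.

def triple_lines(ntriples_text):
    """Return the set of triple lines, skipping comments and blank-node triples."""
    kept = set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, _predicate, rest = line.split(None, 2)
        # Blank node labels are not stable across files, so drop such triples.
        if subject.startswith("_:") or rest.startswith("_:"):
            continue
        kept.add(line)
    return kept


def diff_ntriples(old_text, new_text):
    """Return (insertions, deletions) as sorted lists of triple lines."""
    old, new = triple_lines(old_text), triple_lines(new_text)
    return sorted(new - old), sorted(old - new)


SKOS = "http://www.w3.org/2004/02/skos/core#"
OLD = f"""<http://example.org/c1> <{SKOS}prefLabel> "Money"@en .
<http://example.org/c1> <{SKOS}broader> <http://example.org/c0> .
_:b1 <http://example.org/p> "ignored" .
"""
NEW = f"""<http://example.org/c1> <{SKOS}prefLabel> "Currency"@en .
<http://example.org/c1> <{SKOS}broader> <http://example.org/c0> .
<http://example.org/c1> <{SKOS}altLabel> "Money"@en .
"""

insertions, deletions = diff_ntriples(OLD, NEW)
print(f"{len(insertions)} insertions(+), {len(deletions)} deletions(-)")
```

Note that a single changed label already counts as one deletion plus one
insertion, which is part of why the raw numbers get so large.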

Obviously, this triple-level diff doesn't help users much. A possible
course of action could be:

1) Group changes for each concept.
2) Recognize insertion and deletion of concepts as a whole (presumably the
most important changes).
3) Recognize certain types of changes (e.g., altered prefLabel, added
altLabel, changed relations).
4) Enrich the concept URIs with the preferred label (in a given language).
5) Arrange everything nicely on an RDFa overview page (additions/deletions
of concepts, perhaps some of the more important types of changes, statistics
such as the number of changed/unchanged concepts, etc.)
6) Provide change record (RDFa) pages per concept, which can be linked from
a concept page.
7) Optionally, if the terminology includes meta-structures such as a term
classification, add aggregated information about the most intensively
changed subject areas to the overview page.
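
Steps 1) to 4) could be sketched roughly as follows in Python. This is only
a sketch under simplifying assumptions: triples are plain (subject,
predicate, object) tuples, language-tagged literals are modelled as
(text, lang) tuples, and the summarize() function and the example.org data
are hypothetical. An altered prefLabel simply shows up as a removed plus an
added prefLabel.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concepts(triples):
    """All subjects typed as skos:Concept."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == SKOS + "Concept"}


def pref_labels(triples, lang):
    """Map concept URI -> preferred label text in the given language."""
    return {s: o[0] for s, p, o in triples
            if p == SKOS + "prefLabel" and isinstance(o, tuple) and o[1] == lang}


def summarize(old, new, lang="en"):
    """Group changes per concept, keyed by (URI, preferred label)."""
    old, new = set(old), set(new)
    old_c, new_c = concepts(old), concepts(new)
    # Step 4: labels from the new version win over labels from the old one.
    labels = {**pref_labels(old, lang), **pref_labels(new, lang)}
    report = defaultdict(list)
    # Step 2: insertion and deletion of concepts as a whole.
    for c in new_c - old_c:
        report[c].append("concept added")
    for c in old_c - new_c:
        report[c].append("concept deleted")
    # Steps 1 and 3: per-concept changes, classified by SKOS property name.
    kept = old_c & new_c
    for sign, changed in (("added", new - old), ("removed", old - new)):
        for s, p, o in changed:
            if s in kept and p.startswith(SKOS):
                report[s].append(f"{sign} {p[len(SKOS):]}")
    return {(c, labels.get(c, c)): sorted(ch) for c, ch in report.items()}


A, B, C = (f"http://example.org/{x}" for x in "abc")
old = [(A, RDF_TYPE, SKOS + "Concept"), (A, SKOS + "prefLabel", ("Money", "en")),
       (B, RDF_TYPE, SKOS + "Concept"), (B, SKOS + "prefLabel", ("Barter", "en"))]
new = [(A, RDF_TYPE, SKOS + "Concept"), (A, SKOS + "prefLabel", ("Currency", "en")),
       (A, SKOS + "altLabel", ("Money", "en")),
       (C, RDF_TYPE, SKOS + "Concept"), (C, SKOS + "prefLabel", ("Bitcoin", "en"))]

for (uri, label), changes in sorted(summarize(old, new).items()):
    print(f"{label} <{uri}>: {', '.join(changes)}")
```

The resulting per-concept dictionary could then feed the overview and
per-concept pages of steps 5) and 6).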

Thoughts? Has somebody done something similar already?

Cheers, Joachim

--
Joachim Neubert

ZBW - German National Library of Economics
Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg

Received on Friday, 30 August 2013 08:12:39 UTC