
Use Case Identification And Deduplication Of Library Records



Identification and Deduplication of Library Records


Guenter Muehlberger

Background and Current Practice

Libraries have recorded their books in electronic catalogues. For several reasons, records describing the same book are often similar but not identical. Moreover, in many cases (such as in Austria or the Eastern European countries) the number of historical books that are already fully recorded (e.g. in MARC21) is rather low. Often only scanned images of index cards or short-title catalogues exist. At present, improving this situation means matching different library records semi-automatically or even manually, which requires considerable effort.


The objective is to find an automated matching algorithm for library records, so that ultimately only ONE record exists for every single intellectual item. These matching algorithms need reference data. Currently, access to such reference data (e.g. the WorldCat API) is limited or commercially exploited. If many libraries offered their records as linked data, more reference data would be available, which would make it much easier to match and identify library records.

Target Audience

Libraries and service providers for libraries. They could more easily identify and obtain the best metadata for a (historic) book. End users, who would no longer have to deal with dozens of similar descriptions of the same book, but with one single record (with links to the local catalogues).

Use Case Scenario

1. Users search library catalogues or the web for books. They will not receive dozens of similar descriptions of the same book, but a single record with links to the several copies.

2. A network of libraries wants to unify its records by merging them into one. The process is more or less the same as above; the final result is the best record for a book.
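The second scenario can be illustrated with a minimal sketch. It assumes the duplicates have already been identified, and it assumes that the "best" record is simply the one with the most filled-in fields; neither the field names (`source`, `title`, `author`, `year`) nor this selection rule come from the use case itself, and the `merge_records` helper is hypothetical.

```python
def merge_records(duplicates):
    """Collapse records already judged to describe the same book into one."""

    # Assumption: completeness (number of non-empty descriptive fields)
    # is used as a stand-in for record quality.
    def completeness(record):
        return sum(1 for key, value in record.items() if key != "source" and value)

    best = max(duplicates, key=completeness)
    merged = {key: value for key, value in best.items() if key != "source"}

    # Fill any field the best record lacks from the other records.
    for record in duplicates:
        for key, value in record.items():
            if key != "source" and value and not merged.get(key):
                merged[key] = value

    # Keep a link back to every local catalogue that holds a copy.
    merged["holdings"] = sorted({record["source"] for record in duplicates})
    return merged


# Hypothetical sample data from three local catalogues.
records = [
    {"source": "catalogue-a", "title": "Faust", "author": "Goethe", "year": ""},
    {"source": "catalogue-b", "title": "Faust. Eine Tragödie",
     "author": "Goethe, Johann Wolfgang von", "year": "1808"},
    {"source": "catalogue-c", "title": "Faust", "author": "", "year": "1808"},
]
merged = merge_records(records)
```

The merged record keeps the most complete description and lists all three catalogues under `holdings`, which corresponds to the "single record with links to the several copies" of the first scenario.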

Application of linked data for the given use case

There are many matching algorithms around. We in Innsbruck have a prototype running, developed in a master's thesis, which is tuned to work with the scenario above.
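To give an idea of what such a matching step can look like, here is a minimal sketch based on normalised string similarity. It is not the Innsbruck prototype, whose design is not described here; the field names, the 0.85 threshold and the rule that differing years exclude a match are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]", " ", without_accents.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity of two strings in [0, 1] after normalisation."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_book(rec_a, rec_b, threshold=0.85):
    """Decide whether two records probably describe the same book."""
    # Assumption: if both records give a year and the years differ,
    # they are treated as different editions.
    if rec_a.get("year") and rec_b.get("year") and rec_a["year"] != rec_b["year"]:
        return False
    title_score = similarity(rec_a["title"], rec_b["title"])
    author_score = similarity(rec_a["author"], rec_b["author"])
    # Assumption: title and author are weighted equally.
    return (title_score + author_score) / 2 >= threshold


# Hypothetical sample records.
local = {"title": "Die Leiden des jungen Werthers",
         "author": "Goethe, Johann Wolfgang von", "year": "1774"}
reference = {"title": "Die Leiden des jungen Werther.",
             "author": "Goethe, Johann Wolfgang von", "year": "1774"}
later_edition = dict(reference, year="1787")
other_work = {"title": "Faust", "author": "Goethe, Johann Wolfgang von", "year": "1774"}
```

With these records, `is_same_book(local, reference)` is true, while the later edition and the other work are rejected. The more reference records are openly available as linked data, the more candidates such a comparison can be run against.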

Existing Work (optional)

See above.

Related Vocabularies (optional)

Problems and Limitations

The scenario is rather easy to achieve, but it nevertheless has many benefits for the community.

Related Use Cases and Unanticipated Uses (optional)

Library Linked Data Dimensions / Topics

References (optional)