Use Case Migrating Library Legacy Data



Migrating Library Legacy Data


Gordon Dunsire

Background and Current Practice

Libraries hold hundreds of millions of legacy records containing machine-readable bibliographic metadata. The metadata are generally of high quality, controlled by the application of national and international rules for the structure, derivation, and form of content. In the case of traditional "controlled entry points", "headings", "authorities", etc., the content usually consists of controlled terms taken from a relatively small number of vocabularies. The coverage of these vocabularies is confined to the names of persons, families, and organizations, and to captions or notations for subject topics; their application is constrained to traditional methods of bibliographic resource discovery. For example, a controlled term will be used for the name of an organization which creates a work, but not if the same organization publishes a manifestation of the work.

Libraries wish to convert legacy metadata to RDF triples for several reasons: to take advantage of systems and services which may emerge in the Semantic Web environment, to encourage increased usage of the metadata and the corresponding resources, and to contribute to the general sharing of metadata and the common good.

Current practice is nearly all experimental. There is a growing, but still very small, number of libraries engaged in such activity, with little formal coordination between them. The environment is volatile, as more RDF vocabularies and element sets are released as open linked data.


Goals

Linked data is intrinsic to this use case. The goals are therefore:

1. To represent library legacy data as linked data, retaining as much data and utility as possible.

2. To support the use of legacy (and current) linked data in both traditional and innovative ways.

Target Audience

Library cataloguers, systems librarians, ontologists, developers.

Use Case Scenario

Linked data is intrinsic to this use case. The scenario is in two related parts:

1. A library chooses the RDF classes, properties, and vocabularies which match its legacy metadata. It selects broader semantics for those data that are not fully open for reuse because of licensing restrictions. The library makes separate decisions for metadata available in different legacy formats and with distinct provenance, and treats them (and their referents) as separate collections. The library recasts all or some of the metadata records in each collection as RDF triples, using its own namespace for the identification of class individuals. The library creates linking data for some object values, but retains and preserves others as literals in cases where the linked-to data may be volatile. In some cases, the library creates both literal and linked versions of a triple, to encourage consumption by a wide range of potential applications.

2. The library wishes to encourage maximum reuse of its legacy metadata. It wants to preserve the context of the record while reducing it to triples, and it wants to know if there is a corresponding set or sets of properties which have maximum utility and ease of consumption by Semantic Web applications.
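The first part of the scenario can be illustrated with a minimal sketch. It assumes a legacy record already parsed into a Python dictionary, a hypothetical library namespace (`http://example.org/library/`), and a hypothetical lookup table from controlled headings to authority URIs; none of these names come from an actual library system. The literal version of each creator uses the Dublin Core `dc:creator` property and the linked version uses `dct:creator`, which is one possible choice among the vocabularies listed later on this page.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/library/"  # hypothetical namespace owned by the library


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    is_literal: bool


# Hypothetical mapping from controlled headings to authority URIs.
AUTHORITY_URIS = {
    "Austen, Jane, 1775-1817": "http://example.org/authority/austen-jane",
}


def record_to_triples(record, collection):
    """Recast one legacy record as triples within a named collection."""
    # The subject URI is minted in the library's own namespace, with the
    # collection name keeping records of different provenance apart.
    subject = f"{BASE}{collection}/{record['id']}"
    triples = [Triple(subject, DC + "title", record["title"], True)]
    for name in record.get("creators", []):
        # Always keep the heading as a literal, so nothing is lost.
        triples.append(Triple(subject, DC + "creator", name, True))
        # Add a linked version only when an authority URI is known.
        uri = AUTHORITY_URIS.get(name)
        if uri is not None:
            triples.append(Triple(subject, DCT + "creator", uri, False))
    return triples


def to_ntriples(triple):
    """Serialize one triple as a line of N-Triples."""
    if triple.is_literal:
        escaped = (
            triple.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


record = {
    "id": "b1",
    "title": "Emma",
    "creators": ["Austen, Jane, 1775-1817", "Unknown, A."],
}
for t in record_to_triples(record, "marc"):
    print(to_ntriples(t))
```

Emitting the literal unconditionally and the link only where a URI is known reflects the scenario's point that some values are best preserved as literals, while a parallel linked triple widens the range of applications that can consume the data.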

Application of linked data for the given use case

The Concise Bounded Description proposal [1] offers a framework for specifying metadata packages for optimal use by Semantic Web software agents.
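As a rough illustration of what such a package contains, the sketch below computes a simplified Concise Bounded Description over an in-memory list of triples. It covers only the first two steps of the proposal (all statements about the starting node, then, recursively, all statements about any blank node reached as an object) and omits the handling of reified statements. The term encoding is an assumption made for the example: plain strings for URIs, a `_:` prefix for blank nodes, and double-quoted strings for literals.

```python
def concise_bounded_description(graph, start):
    """Return a simplified CBD of `start` as a set of (s, p, o) triples.

    Reification (the third step of the CBD proposal) is not handled.
    """
    # Index the graph by subject for direct lookup.
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((s, p, o))

    description = set()
    visited = set()
    pending = [start]
    while pending:
        node = pending.pop()
        if node in visited:
            continue  # guards against cycles between blank nodes
        visited.add(node)
        for triple in by_subject.get(node, []):
            description.add(triple)
            obj = triple[2]
            # Follow blank nodes only; named resources are described
            # by their own CBD, so the description stays bounded.
            if obj.startswith("_:"):
                pending.append(obj)
    return description


graph = [
    ("ex:book1", "dct:title", '"Emma"'),
    ("ex:book1", "dct:publisher", "_:b1"),
    ("_:b1", "rdfs:label", '"John Murray"'),
    ("ex:book1", "dct:creator", "ex:austen"),
    ("ex:austen", "rdfs:label", '"Austen, Jane"'),
]
for triple in sorted(concise_bounded_description(graph, "ex:book1")):
    print(triple)
```

In this example the label of the blank-node publisher is included because it cannot be requested separately, while the label of `ex:austen` is left out because that resource has its own identifier and can be dereferenced on its own.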

Existing Work (optional)

Related Vocabularies (optional)

  • Dublin Core (dc)
  • Dublin Core terms (dct)
  • RDF schema (rdfs)
  • SKOS (skos)
  • Bibliographic ontology (bibo)
  • Time ontology in OWL (owlt)
  • ISBD (isbd)

Problems and Limitations

The metadata are not necessarily consistent over time. Cataloguing rules, formats, and controlled vocabularies have evolved in stages of "punctuated equilibrium", and records are not always retroconverted to current standards. There is a significant degree of duplication as a result of "copy-cataloguing", record supply and distribution services, and lack of coordination between libraries. Several international identifier systems are in use, none of which provides comprehensive coverage.

There are no stable community-wide mappings for common legacy metadata formats to RDF classes and properties.

Appropriate vocabularies and element sets are not completely available as linked data. Some are under construction; others are simply missing. There is very little semantic mapping between library-oriented namespaces. In some cases, there are multiple namespaces in various stages of development declaring very similar properties; in others, the properties have much broader semantics than the legacy metadata.

There are issues about the stability and availability of dereferencing services, triple stores, data maintenance, and synchronisation.

There has been little discussion, and therefore little consensus, on the kinds of triples that should be included in an RDF data dump or returned by dereferencing services for applications using library linked data.

Related Use Cases and Unanticipated Uses (optional)

This use case is related to every other use case in the Bibliographic data cluster.

It is also related to the Archives cluster, the Authority data cluster, and the Collections cluster.

Library Linked Data Dimensions / Topics

References (optional)

[1] CBD - Concise Bounded Description