Use Case Digital Preservation

From Library Linked Data
Revision as of 21:49, 10 September 2011 by Aisaac

Name

Digital preservation

Owner

Emmanuelle Bermes

Background and Current Practice

Preserving digital objects over the long term is a challenging activity that is not limited to storage and backup: it involves complex strategies aimed at providing a trusted environment where digital objects can evolve along with changes in technology, hardware and software environments. To manage these changes, strategies such as emulation and migration have to be carried out. This requires collecting, storing and managing all the information needed to preserve a digital object throughout its lifecycle. This is usually done by collecting preservation metadata, i.e. metadata about digital objects, their formats, the events they have undergone throughout their lifecycle, etc.

Goal

  • (1) Planning and realization of digital preservation actions related to a set of digital objects.
  • (2) Linked data provides a global environment for describing the objects and their significant properties, and helps avoid duplication of effort, for instance when describing data formats.

Use Case Scenario

A librarian needs to undertake a preservation action on a subset of his collection which is subject to obsolescence. For instance, he wants to transform all TIFF files in his data registry into JPEG2000 files.

First, he needs to identify all digital objects containing TIFF files in his repository: to do so, all the files must be described with technical metadata, in particular format information. Then, he identifies the tool best suited to perform the transformation and runs a test on a subset of the digital collection. The test is assessed to check that the significant properties present in the source TIFF files were not lost in the process (this means that these properties have been identified beforehand). Finally, he carries out the migration, records the result in a new set of (event) metadata and creates a new version of his digital objects.

If the previous version is not deleted, a link is created between the two. The two versions are made available to users, using distinct identifiers.
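The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, identifiers and the "hypothetical-converter" tool are invented for the example, and the conversion itself is simulated by copying the significant properties unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DigitalObject:
    identifier: str
    file_format: str                    # technical metadata: format name
    significant_properties: dict        # e.g. image width and height
    derived_from: Optional[str] = None  # link to the previous version, if any


@dataclass
class MigrationEvent:
    source_id: str
    target_id: str
    tool: str
    outcome: str
    timestamp: str


def select_by_format(repository, file_format):
    """Step 1: find every object whose technical metadata names the format."""
    return [obj for obj in repository if obj.file_format == file_format]


def lost_properties(source, target):
    """Step 2: list significant properties that are missing or changed."""
    return sorted(
        name
        for name, value in source.significant_properties.items()
        if target.significant_properties.get(name) != value
    )


def migrate(source, target_format, tool):
    """Step 3: create a new version and the event metadata describing it.

    The conversion is simulated: properties are copied as they are.
    """
    target = DigitalObject(
        identifier=f"{source.identifier}/v2",   # distinct identifier
        file_format=target_format,
        significant_properties=dict(source.significant_properties),
        derived_from=source.identifier,         # link between the versions
    )
    outcome = "success" if not lost_properties(source, target) else "failure"
    event = MigrationEvent(
        source_id=source.identifier,
        target_id=target.identifier,
        tool=tool,
        outcome=outcome,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return target, event


# Hypothetical repository content.
repository = [
    DigitalObject("obj-1", "TIFF", {"width": 4000, "height": 3000}),
    DigitalObject("obj-2", "PDF", {"pages": 12}),
    DigitalObject("obj-3", "TIFF", {"width": 1024, "height": 768}),
]

new_versions = []
events = []
for source in select_by_format(repository, "TIFF"):
    target, event = migrate(source, "JPEG2000", tool="hypothetical-converter")
    if event.outcome == "success":
        new_versions.append(target)
        events.append(event)
```

Keeping the event record separate from the object record mirrors the distinction the scenario makes between object metadata, event metadata and version links.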

Application of linked data for the given use case

A lot of metadata is needed to realize this use case:

  • metadata about the object
  • metadata about file formats and associated tools
  • metadata about events and agents involved in the events
  • metadata about versions of objects.

Moreover, all this metadata needs to be accessed, searched and retrieved globally, and there are many links and relationships between the resources. Linked data makes it possible to describe these different kinds of resources in a standard way and to create a global information graph encompassing all the information needed to perform complex queries and actions.

Semantic Web standards provide a rich framework for describing complex information and running queries (RDF+SPARQL).
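To make the idea of querying a graph concrete, here is a minimal sketch that uses only the Python standard library. It is not an RDF store or a SPARQL engine: triples are plain tuples, the `ex:` and `fmt:` prefixes and every identifier are hypothetical, and the query is a hand-written join that approximates what a SPARQL basic graph pattern would do.

```python
# Triples are (subject, predicate, object) tuples; all names are hypothetical.
TRIPLES = {
    # Local, institution-specific metadata about objects and files.
    ("ex:obj-1", "ex:hasFile", "ex:file-1"),
    ("ex:file-1", "ex:hasFormat", "fmt:tiff"),
    ("ex:obj-2", "ex:hasFile", "ex:file-2"),
    ("ex:file-2", "ex:hasFormat", "fmt:pdf"),
    ("ex:obj-3", "ex:hasFile", "ex:file-3"),
    ("ex:file-3", "ex:hasFormat", "fmt:tiff"),
    # Format descriptions that could come from a shared registry.
    ("fmt:tiff", "ex:label", "TIFF"),
    ("fmt:pdf", "ex:label", "PDF"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def objects_with_format_label(triples, label):
    """Find objects that have a file in the format carrying the given label.

    Roughly comparable to the SPARQL pattern:
        ?obj ex:hasFile ?file . ?file ex:hasFormat ?fmt . ?fmt ex:label "TIFF"
    """
    formats = {s for s, _, _ in match(triples, p="ex:label", o=label)}
    files = {
        s for fmt in formats
        for s, _, _ in match(triples, p="ex:hasFormat", o=fmt)
    }
    objects = {
        s for f in files
        for s, _, _ in match(triples, p="ex:hasFile", o=f)
    }
    return sorted(objects)


tiff_objects = objects_with_format_label(TRIPLES, "TIFF")
```

In this sketch the local object descriptions and the format descriptions sit in the same graph, so a single query can follow links from an object to its file to a format described elsewhere.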

The most technical and/or least institution-specific pieces of metadata, such as descriptions of formats, tools or events, can be pooled and shared across the library community globally, so as to avoid duplication of effort between institutions that are undertaking similar actions.

Existing Work (optional)

Digital preservation repository systems such as Fedora Commons [1] and SPAR (Scalable Preservation and Archiving System, developed at the National Library of France) [2] use RDF as their standard for storing preservation metadata.

The P2 Registry project [3] aims to provide a registry for the description of formats, based on PRONOM [4], as Linked Data.

The Library of Congress has started to provide preservation vocabularies as Linked Data, notably to describe preservation events [5] and preservation level roles [6].

The California Digital Library is using principles of Linked Data and the REST architecture in the implementation of its digital repository micro-services known collectively as Merritt. Linked Data is used for integration/coordination of functionally distinct curation services.

Related Vocabularies (optional)

Representation of preservation metadata:

  • preservation vocabularies from Library of Congress
  • OAI-ORE (to describe the structure of complex data objects)
  • DOAP (to describe software agents)
  • PRONOM (contains information about file formats, compression techniques and encoding types)
  • PREMIS

Problems and Limitations

General problems:

  • The main problem is the maturity and scalability of Linked Data technologies, since digital preservation is confronted with high volumes of metadata and has strong persistence requirements.

Problems with existing approaches and vocabularies:

  • We still lack vocabularies to describe much of the preservation metadata that is needed.

Related Use Cases and Unanticipated Uses (optional)

  • Technical information made available as linked data could be used for purposes other than digital preservation.

Library Linked Data Dimensions / Topics

Topics:

  • Conceptual models and KOS > Types of library data other than bibliographic and authority
  • Management of data and distribution > linked data management, hosting, and preservation


Dimensions:

  • Systems -> Library systems -> digital preservation repositories*
  • Information assets -> digital objects*

*These items are not in the initial list; we suggest adding them.


References (optional)