Use Case Authority Data Enrichment

From Library Linked Data
Revision as of 20:21, 17 October 2010 by Jschneid4 (Talk | contribs)


Name

The Wiki page URL should be of the form "Use_Case_Name", where Name is a short name by which we can refer to the use case in discussions. The Wiki page URL can act as a URI identifier for the use case.

Authority Data Enrichment

Owner

The person responsible for maintaining the correctness/completeness of this use case. Most obviously, this would be the creator.

Kai Eckert, Alexander Haffner

Background and Current Practice

Where this use case takes place in a specific domain, and so requires some prior information to understand, this section is used to describe that domain. As far as possible, please put explanation of the domain in here, to keep the scenario as short as possible. If this scenario is best illustrated by showing how applying technology could replace current existing practice, then this section can be used to describe the current practice. Often, the key to why a use case is important also lies in what problem would occur if it was not achieved, or what problem means it is hard to achieve.

Authority control is the practice of creating and maintaining authority data for bibliographic entities. Currently, the typical entities described by authority data are persons, families, corporate bodies, works and subjects. The primary institutions holding authority data are libraries, archives and museums. Authority control enables catalogers to disambiguate resources with similar or identical characteristics and to collocate resources that logically belong together. Stakeholders in the field of authority control have already moved towards a centralized organization of authority data. The intention is to consolidate the existing authority data from all over the world through exclusive and reliable agents. At the moment, consolidation is mostly done by copy-and-merge approaches. It is usually realized only at the national level, but the first solutions for consolidation at the international level have emerged (e.g. the Virtual International Authority File, VIAF [1]).

Goal

Two short statements stating (1) what is achieved in the scenario without reference to linked data, and (2) how we use linked data technology to achieve this goal.

  • (1) The enrichment of already existing entities of authority data with additional information (attributes and relationships).
  • (2) Linked data could enable the reuse of external data sets by linking instead of copying & merging.

Use Case Scenario

The use case scenario itself, described as a story in which actors interact with systems. This section should focus on the user needs in this scenario. Do not mention technical aspects and/or the use of linked data.

A librarian intends to enrich a particular authority data record in his institutional data set (e.g. a record for a person) with additional information. As a first step, the librarian has to search for equivalent authoritative entities in the data sets of one or more remote data providers. Next, the librarian has to identify an equivalent entity in the search results. If the identification is successful, both representations have to be aligned, and the librarian has to decide which characteristics and/or relationships are important for enriching the local representation.

Application of linked data for the given use case

This section describes how linked data technology could be used to support the use case above. Try to focus on linked data on an abstract level, without mentioning concrete applications and/or vocabularies. Hint: Nothing library domain specific.


Linked data technology makes it possible to search for and find equivalent entities in a data provider's database by specific resource characteristics and relationships (via a SPARQL endpoint). Linked data vocabularies can be used to align different representations of the same real-world resource type. Linked data also allows remote data to be used in one's own applications. This can be realized in two ways:

  • 1. Reusing the vocabularies of the remote data in the local application to describe and integrate the remote data.
  • 2. Aligning the remote vocabularies with the locally used ones to “understand” and transparently integrate the remote data.
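
The second way, aligning a remote vocabulary with the local one, can be illustrated with a minimal sketch. It assumes that statements are held as plain (subject, property, value) tuples and that the alignment is a simple property-to-property table. All URIs under example.org and the names ALIGNMENT and integrate are hypothetical and chosen for illustration only.

```python
# Hypothetical vocabulary namespaces; nothing here refers to a real data provider.
REMOTE = "http://remote.example.org/vocab/"
LOCAL = "http://local.example.org/vocab/"

# Assumed alignment table: remote property -> equivalent local property.
ALIGNMENT = {
    REMOTE + "fullName": LOCAL + "name",
    REMOTE + "born": LOCAL + "dateOfBirth",
}


def integrate(remote_triples, alignment):
    """Translate remote triples into the local vocabulary.

    Triples whose property has no alignment are returned separately,
    so that a librarian can review them instead of losing them silently.
    """
    translated, unaligned = [], []
    for subject, prop, value in remote_triples:
        if prop in alignment:
            translated.append((subject, alignment[prop], value))
        else:
            unaligned.append((subject, prop, value))
    return translated, unaligned


remote_triples = [
    ("http://remote.example.org/person/42", REMOTE + "fullName", "Jane Doe"),
    ("http://remote.example.org/person/42", REMOTE + "born", "1901"),
    ("http://remote.example.org/person/42", REMOTE + "shoeSize", "38"),
]
translated, unaligned = integrate(remote_triples, ALIGNMENT)
```

In this sketch the two aligned statements come back expressed with local properties, while the statement with the unaligned property is kept apart. The first way (reusing the remote vocabulary directly) would need no table at all, at the cost of the local application having to know the remote properties.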

Existing Work (optional)

This section is used to refer to existing technologies or approaches which achieve the use case. Hint: Specific approaches in the library domain.

By matching and linking name authority files for persons from all over the world to the VIAF (Virtual International Authority File), the project shows that the concept of VIAF can also be realized with huge amounts of data. At the same time, the project develops the functionality necessary for VIAF, thus delivering the basis for its further development. Currently, VIAF uses OAI-PMH [2] to harvest data records (from authority files for persons and associated title records) from a number of internationally established libraries and analyses the person entities with respect to their equivalence.

Related Vocabularies (optional)

Here you can list and clarify the use of vocabularies (element sets and value vocabularies) which can be helpful and applied within this context.

Representation of authority data:

  • Friend of a friend (FOAF) [3]: for persons, corporate bodies
  • RDA Group 2 Elements [4]: for persons, corporate bodies and families
  • RDA Group 1 Elements [5]: for works
  • SKOS [6]: for subject headings
  • ...


Representation of equivalence relationships:

  • owl:sameAs: for persons, corporate bodies, families, works
  • skos:exactMatch: for SKOS concepts
  • skos:closeMatch: for SKOS concepts
  • ...
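
How equivalence links support enrichment by linking instead of copying can be shown with a small sketch. It assumes an in-memory list of (subject, property, object) tuples in place of a real triple store. The person URIs, the short property names and the functions equivalents and enriched_view are hypothetical. Only the owl:sameAs URI is taken from the OWL vocabulary.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def equivalents(start, triples):
    """Return every resource reachable from `start` over owl:sameAs links.

    owl:sameAs is symmetric and transitive, so links are followed in
    both directions until no new resource turns up.
    """
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for subject, prop, obj in triples:
            if prop != OWL_SAME_AS:
                continue
            if subject == current:
                neighbour = obj
            elif obj == current:
                neighbour = subject
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def enriched_view(local_uri, triples):
    """Collect the statements about `local_uri` and all its equivalents."""
    same = equivalents(local_uri, triples)
    return sorted(
        (prop, obj)
        for subject, prop, obj in triples
        if subject in same and prop != OWL_SAME_AS
    )


# Hypothetical data: one local record and two remote descriptions.
LOCAL_PERSON = "http://local.example.org/person/1"
REMOTE_A = "http://a.example.org/person/42"
REMOTE_B = "http://b.example.org/agent/x7"
triples = [
    (LOCAL_PERSON, "name", "Jane Doe"),
    (LOCAL_PERSON, OWL_SAME_AS, REMOTE_A),
    (REMOTE_B, OWL_SAME_AS, REMOTE_A),
    (REMOTE_A, "dateOfBirth", "1901"),
    (REMOTE_B, "placeOfBirth", "Mannheim"),
    ("http://a.example.org/person/43", "name", "John Roe"),
]
view = enriched_view(LOCAL_PERSON, triples)
```

The local record itself stores only the link. The date and place of birth stay with the remote providers and are gathered when the view is built, and the unrelated person is left out.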

Problems and Limitations

This section lists reasons why this scenario is or may be difficult to achieve, including pre-requisites which may not be met, technological obstacles etc. Please explicitly list here the technical challenges made apparent by this use case. This will aid in creating a roadmap to overcome those challenges.

General problems:

  • The identification of equivalent representations can be difficult.


General linked data problems:

  • If data is taken over into the local repository, the provenance of the aligned data is important but may be difficult to implement. The use of meta-metadata is necessary.
  • Reliability of data sources.
  • Inconsistencies between different remote data sources.
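
As a rough illustration of the provenance and inconsistency problems listed above, the following sketch keeps the source next to every statement that is taken over and reports properties for which the sources disagree. The class SourcedStatement, the function find_conflicts and the sample data are hypothetical. A real implementation would also have to decide which properties may legitimately carry several values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedStatement:
    """A statement plus minimal provenance (the 'meta-metadata')."""
    subject: str
    prop: str
    value: str
    source: str  # identifier of the data provider the value came from


def find_conflicts(statements):
    """Return (subject, property) pairs for which sources give different values."""
    by_key = defaultdict(set)
    for st in statements:
        by_key[(st.subject, st.prop)].add((st.value, st.source))
    return {
        key: sorted(pairs)
        for key, pairs in by_key.items()
        if len({value for value, _ in pairs}) > 1
    }


# Hypothetical statements about one person from two providers.
statements = [
    SourcedStatement("person/1", "dateOfBirth", "1901", "providerA"),
    SourcedStatement("person/1", "dateOfBirth", "1910", "providerB"),
    SourcedStatement("person/1", "name", "Jane Doe", "providerA"),
    SourcedStatement("person/1", "name", "Jane Doe", "providerB"),
]
conflicts = find_conflicts(statements)
```

Because each value keeps its source, a librarian can see which provider supplied which date of birth and decide whom to trust. The matching names are not reported.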


Problems with existing approaches and vocabularies:

  • Usage of different unaligned vocabularies.
  • Current practice involves using SKOS semantic relations (skos:semanticRelation) to align instance data that is not a skos:Concept. Accepted approaches to linking instances of skos:Concept to real-world resources are still missing. [7]

Related Use Cases and Unanticipated Uses (optional)

The scenario above describes a particular case of using linked data. However, by allowing this scenario to take place, the likely solution allows for other use cases. This section captures unanticipated uses of the same system apparent in the use case scenario.

  • Use of centralized authority data to individualize non-individualized authority data.
  • If we cannot find an equivalent entity, we should mark the local entity as unique.
  • The alignment of authority data enables the user to search for additional associated entities (for example, books by the same author) in other libraries. Such an extended search could easily be provided if all data were exposed as linked data.
  • An external user intends to enrich a particular resource.
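
The extended search for associated entities could look roughly like the following sketch. It builds a SPARQL query for works by any of an author's aligned URIs and the GET request URL for an endpoint. The endpoint and person URIs are hypothetical, and the use of dcterms:creator to connect works and authors is an assumption about the remote data, not something this use case prescribes. The VALUES keyword requires SPARQL 1.1.

```python
from urllib.parse import urlencode

# Assumption: the remote library links works to authors with dcterms:creator.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"


def works_by_query(author_uris):
    """Build a SPARQL query for works created by any of the given URIs.

    The URIs are inserted as-is; a real application should validate them.
    """
    values = " ".join(f"<{uri}>" for uri in author_uris)
    return (
        "SELECT DISTINCT ?work WHERE { "
        f"VALUES ?author {{ {values} }} "
        f"?work <{DCTERMS_CREATOR}> ?author . "
        "}"
    )


def request_url(endpoint, query):
    """Return the GET URL for a SPARQL endpoint (query passed as 'query')."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical aligned identifiers of the same author and a hypothetical endpoint.
author_uris = [
    "http://local.example.org/person/1",
    "http://remote.example.org/person/42",
]
query = works_by_query(author_uris)
url = request_url("http://library.example.org/sparql", query)
```

Passing all aligned URIs of the author means that the search still works when the other library knows the author only under the remote identifier.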

Library Linked Data Dimensions / Topics

The dimensions and topics are used to organize the use cases. At the same time, they might help you to identify additional aspects currently not covered. If appropriate topics and/or dimensions are missing, please specify them here and annotate them by a “*”

Topics:

  • Conceptual Models and KOS -> Knowledge representation issues / Describing library and museum authorities and KOS resources as Linked Data
  • Semantic Web Environmental Issues -> Linking across datasets -> Alignment (cross-linking) of vocabularies
  • Applying SemWeb Technology to Library Data -> Legacy data
  • Management of data and distribution -> linked data management, hosting, and preservation


Dimensions:

  • Users needs -> Identify
  • Users needs -> Browse
  • Systems -> Library systems -> Authority data
  • Systems -> library and non-library system connections
  • Information assets -> Registries

* These items are not in the initial list; we suggest adding them.


References (optional)

This section is used to refer to cited literature and quoted websites.