Cluster BibData

From Library Linked Data
Authors: Gordon Dunsire

Bibliographic Data Cluster

Background

The current unit of metadata processing in library and related communities (archive, museum, publisher, bookseller) is the bibliographic record, a set of data elements describing the content and characteristics of an information object manufactured for human consumption. The element set is primarily intended to support the identification and retrieval of the object by a human agent. End-user services are usually based on one or more collections of information objects managed at institutional level. Collection scope and user needs vary between institutions and consortia.

Consortia with common bibliographic metadata requirements have developed and established standard element sets for over a century. Consortia exist at city, regional, national, and international levels, and may be based on a common sector, such as tertiary education, or subject focus, as well as geo-political boundaries. This has resulted in a wide variety of standard record formats and methods for determining element values. Consortia whose collections have significant overlap in content, where multiple copies of an information object are dispersed across different collections, have also developed standard methods and services for sharing records. Most services offer duplication of records to meet local requirements rather than using a single shared master record; this may occur, for example, when the service cannot add metadata about local copies to the master record, or when a library is using local attributes or local adaptations of standard vocabularies.

The digital environment has increased the range of user needs and expectations beyond the scope of the collections of most consortia. End-user services are now required to operate at web-scale and incorporate metadata from sources well beyond the traditional bibliographic record.

Topic in the Context of Linked Data

Use cases identify the linked data environment as an opportunity to overcome some of the barriers to the development of services:

  • Remove cultural bias in metadata format and content.
  • Replace aging standards and models that are not conducive to web-scale use.
  • Recommend a basic level of functionality and basic data requirements for metadata created by national bibliographic agencies.
  • Improve ability to re-use licensed metadata by shifting focus away from the metadata collection and record.
  • Improve matching and de-duplication of metadata.
  • Improve metadata aggregation from all sources.
  • Extend coverage to linked data from other communities.
  • Add functionality to services.

Scenarios (Case Studies)

This cluster is based on the following submitted use cases and scenarios:

[1] Use Case AGRIS

  • Cataloguer normalizes the element semantics of incoming records to local standard element sets.
  • Indexer uses incoming records to search a web index for related resources and produces a set of relevant related keywords based on standard vocabularies and authority descriptions.

[2] Use Case Bibliographic Network

  • User sees all expressions, manifestations and items related to a work that they find interesting.

[3] Use Case Community Information Service

  • Curator of a service selects data from input sources to determine what books, articles, photographs, videos, etc. were published recently which would be of interest to the community.

[4] Use Case Data BNF

  • End-users find resources on a given topic or context more easily.

[5] Use Case Identification And Deduplication Of Library Records

  • Users search library catalogues or the web for books, and receive a single record with links to several copies, rather than dozens of similar descriptions of the same book.
  • A network of libraries unifies its records by matching them into one.

[6] Use Case Linked Data and legacy library applications

  • Users, including those of legacy applications, are able to benefit from the addition of new (linked data) applications.

[7] Use Case Open Library Data

  • Users who encounter references to books on the Internet, in a variety of environments, are able to link to a source of access to the book, e.g. a full text version available at the Internet Archive.

[8] Use Case Pode

  • End-user can browse an author's products grouped by the abstractions of work and expression, with extra relevant information added from other data sources, and browse the collection by Dewey Decimal Classification categories.

[9] Use Case Polymath Virtual Library

  • Users can search and retrieve biographical data from digital texts, works, and other sources of information of or about a specific author, and add comments, highlight controversial data, complement the information, and navigate to other authors from the same period, etc.

[10] Use Case Regional Catalog

  • Users can search all German libraries at once, receive information about possible hits and which libraries hold them, and information about the nearest of those libraries.

[11] Use Case Talis Prism 3

  • Users can search for books and other resources by simple keyword or targeted index, refine their results, and expand their search by browsing to subjects or authors.

[12] Use Case Migrating Library Legacy Data

  • Library outputs legacy metadata in collections and sets which have optimized utility for user applications.

Scenarios (Extracted Use Cases)

These scenarios are extracted from use cases in this cluster and then generalized. Two general types of agent are involved:

  • The "processor" agent consumes, amends, and generates metadata, and may be human or machine.
  • The "end-user" agent consumes metadata, and is human.

Extracted scenarios:

  • Processor normalizes the element semantics of ingested records to a standard element set. [1], [12]
  • Processor merges duplicate records for the same resource into a single master record. [5]
  • Processor identifies web resources related to a bibliographic record and tags them with terms taken from a set of standard vocabularies. [1]
  • Processor identifies recently-published bibliographic resources for dissemination in a current awareness service. [3]
  • End-user searches metadata for all resources in a consortium using a single, integrated interface, and identifies all available copies of a resource, including the nearest to a specified location. [2], [4], [10]
  • End-user refines results of a search, and expands it to include related resources from external collections at web-scale. [6], [8], [9], [11]
  • End-user is presented with a single record for a resource, with links to records of copies, instead of multiple, slightly-varying bibliographic records. [5]
  • End-user obtains access to an online full-text version of a resource via a link from the bibliographic record for the resource. [7]
  • End-user can annotate bibliographic records retrieved by a search. [9]
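
The merge and single-record scenarios above ([5]) can be illustrated with a minimal sketch. It assumes records are plain dictionaries, and it uses a deliberately crude match key (ISBN if present, otherwise normalized title and author). The field names, the sample records, and the matching rule are all hypothetical; real de-duplication services use far richer matching.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude match key: prefer ISBN, else normalized title + author."""
    isbn = record.get("isbn", "").replace("-", "").strip()
    if isbn:
        return ("isbn", isbn)
    title = " ".join(record.get("title", "").lower().split())
    author = " ".join(record.get("author", "").lower().split())
    return ("title-author", title, author)


def merge_duplicates(records):
    """Group records by match key and return one master record per group.

    Each master keeps the descriptive fields of the first record seen and
    lists every holding library under "copies".
    """
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    masters = []
    for members in groups.values():
        first = members[0]
        masters.append({
            "title": first.get("title", ""),
            "author": first.get("author", ""),
            "copies": sorted(m["library"] for m in members),
        })
    return masters


# Hypothetical sample data: two slightly varying records for one book,
# plus one record without an ISBN.
records = [
    {"isbn": "978-0-00-000000-2", "title": "Example Title",
     "author": "A. Author", "library": "Library A"},
    {"isbn": "9780000000002", "title": "Example title ",
     "author": "Author, A.", "library": "Library B"},
    {"isbn": "", "title": "Another Work",
     "author": "B. Writer", "library": "Library A"},
]

masters = merge_duplicates(records)
for master in masters:
    print(master["title"], "->", master["copies"])
```

The two ISBN-matched records collapse into one master with links to both holding libraries, which is the end-user view described in the scenario. Matching on shared identifiers rather than on record text is one reason the use cases point to a lack of URIs as a barrier.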

Vocabularies and Technologies

Most of the vocabularies identified in the Vocabularies page are relevant to linked bibliographic data.

Linked data technologies relevant to bibliographic data include:

  • Frameworks for publishing URIs and RDF triples [1]
  • SPARQL endpoints [1], [7], [9]
  • Content Management Systems that support linked data [3]
  • Processes for FRBRisation and standardisation of bibliographic records [4], [7]
  • Triple stores [10]
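
To make the terms "RDF triple" and "triple store" concrete, the following is a minimal sketch of an in-memory set of triples with wildcard pattern matching, which is the basic operation a SPARQL endpoint exposes in a much more general form. The `http://example.org/` identifiers and the sample data are hypothetical; the Dublin Core Terms and FOAF property URIs are real. For brevity the sketch does not distinguish literals from URIs, which a real triple store would.

```python
# Hypothetical namespace for the sample resources.
EX = "http://example.org/"
# Real vocabulary namespaces.
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Each triple is (subject, predicate, object).
triples = {
    (EX + "work/1", DCT + "title", "Example Work"),
    (EX + "work/1", DCT + "creator", EX + "person/1"),
    (EX + "work/2", DCT + "creator", EX + "person/1"),
    (EX + "person/1", FOAF + "name", "A. Author"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Find every resource whose creator is person/1.
works_by_author = [
    subject for subject, _, _ in match(triples, p=DCT + "creator", o=EX + "person/1")
]
print(works_by_author)
```

Because the author is identified by a URI rather than a text string, both works are found through the same link, without any matching on name forms.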

Problems and Limitations

Missing Vocabularies

  • Vocabularies and element sets in widespread use in library legacy metadata are not available in RDF. [12]

Data incompatibilities or gaps

  • Lack of URIs for relevant metadata components. [1]
  • Lack of critical mass of linked data for legacy records. [5], [10]

Community guidance/organization issues

  • Lack of open licenses for re-use of linked data. [3]
  • Lack of stable community-wide mappings for common legacy metadata formats to RDF classes and properties. [12]
  • Stability and availability of dereferencing services, triple stores, data maintenance, and synchronisation. [12]
  • No consensus on the components of an optimal reusable package of RDF metadata for applications using library linked data. [12]
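
The mapping issue noted above can be illustrated with a minimal sketch of a crosswalk from legacy field tags to RDF properties, with output as N-Triples lines. The crosswalk table here is purely hypothetical and is not a community-agreed mapping; it pairs two MARC 21 tags (245, title statement; 100, main entry personal name) with Dublin Core Terms properties for illustration only. The record structure, the subject URI, and the local tag 999 are also assumptions.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical crosswalk: legacy field tag -> RDF property URI.
FIELD_TO_PROPERTY = {
    "245": DCT + "title",
    "100": DCT + "creator",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal.

    Minimal on purpose: control characters such as newlines are not handled.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(subject_uri, record):
    """Convert a {tag: value} record to N-Triples lines.

    Returns (lines, unmapped), where unmapped lists the tags that have no
    entry in the crosswalk and were therefore left out of the output.
    """
    lines, unmapped = [], []
    for tag, value in record.items():
        prop = FIELD_TO_PROPERTY.get(tag)
        if prop is None:
            unmapped.append(tag)
            continue
        lines.append(f'<{subject_uri}> <{prop}> "{escape_literal(value)}" .')
    return lines, unmapped


# Hypothetical legacy record with one locally defined field.
record = {"245": "Example Title", "100": "Author, A.", "999": "local note"}
lines, unmapped = record_to_ntriples("http://example.org/record/1", record)

for line in lines:
    print(line)
print("unmapped tags:", unmapped)
```

Reporting unmapped tags rather than silently dropping them shows where a crosswalk is incomplete. Where different libraries apply different crosswalks to the same legacy format, the resulting triples are not directly comparable, which is why stable shared mappings matter.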

Technology availability/questions

  • Lack of open licenses to support APIs, data standards and client software in the linked data environment. [3]