Use Case Open Library Data

From Library Linked Data
Revision as of 20:24, 17 October 2010 by Jschneid4 (Talk | contribs)



Name

Open Library Data - Make OL data as reusable for other web applications as possible.

Owner

Karen Coyle (Feedback in red by Kai)


Background and Current Practice

Where this use case takes place in a specific domain, and so requires some prior information to understand, this section is used to describe that domain. As far as possible, please put explanation of the domain in here, to keep the scenario as short as possible. If this scenario is best illustrated by showing how applying technology could replace current existing practice, then this section can be used to describe the current practice. Often, the key to why a use case is important also lies in what problem would occur if it was not achieved, or what problem means it is hard to achieve.

The Open Library is a large bibliographic database (approx. 25 million items) with metadata for books. Over one million ebooks are represented in the database, linked from the bibliographic data. These include open access texts as well as protected texts in DAISY format with access limited to certified sight-impaired users. The sources of the metadata are: library bibliographic records, Amazon.com bibliographic data, publisher ONIX data, and direct input by OL users. OL does not store its data in a standard library format (e.g. MARC) but instead uses a series of templates that it has developed. The data in the templates is structured as key-value pairs. Some of these are organized into entities that correspond more or less to FRBR/FRAD entities, in particular Work and Author (personal authors only). There is also an "edition" entity that resembles a Manifestation-based bibliographic record similar to the data carried in a MARC record. This latter has elements that FRBR would separate out into Work or Expression if the full FRBR model were being used.

The Open Library data does not follow library standards. Personal name forms are not retained from the library data, so these no longer link to library authoritative forms. Properties are not "subfielded" in conformance with MARC data (e.g. titles and subtitles are treated as a single string). LC Subject Headings have been separated into facets: topic, place, dates. (Genres are treated as topics.) Direct interaction of OL bibliographic records with library data is therefore difficult. On the other hand, OL bibliographic data interacts well with Wikipedia entries, and is more user-friendly than most library data. (Moved from Problems Section)

Goal

Two short statements stating (1) what is achieved in the scenario without reference to linked data, and (2) how we use linked data technology to achieve this goal.

To export OL data in RDF so that it can be incorporated easily into Linked Data activities.

(1) The reusability of the OL data should be maximized. We want to provide the OL data in a way that supports at least the following goals:

  • Make references on arbitrary websites linkable to OL records.
  • Allow the user of such a website to see whether a full-text version is available via OL.
  • Allow the user to link to the OL data as easily as possible, e.g. to link to a specific manifestation while immediately providing information about other items or manifestations of the same work that are available as full text.

(2) Linked data technology can be used to reference specific manifestations in the OL data easily. Bibliographic data can be exposed by dereferencing URIs or through a SPARQL endpoint to enhance the citation on the website. The third goal requires an appropriate data model, which fortunately is already available in OL. By using a suitable vocabulary (such as FRBR), this information can be provided in a standards-conformant way.

META: I think the reason for all the confusion might be that the term "goal" is actually misleading. There is a higher goal, which is the reason why we create all these use cases, and in the end it is something like "We want to expose data as linked data". But what we had in mind when we created the template is a goal that illustrates an actor's need. The actor (maybe we should clearly state the actor in a new section) is the one who uses the system or the data and thus benefits from linked data. The actor is needed to get this "shift in the viewpoint" from the developer's perspective (or "producer") to the user's perspective (or "consumer"). And this is, at least in software development, the actual idea of use cases: describe the things that the user can and will do with an application in order to extract actual requirements and finally create a better application.

Note that this does not mean that the data producer cannot be the actor or user. In the case of data curation or improvement, the producer "uses" the system and the linked data technology to improve its own data. Thus the producer becomes a consumer... confusing, heh? ;-)

Use Case Scenario

The use case scenario itself, described as a story in which actors interact with systems. This section should focus on the user needs in this scenario. Do not mention technical aspects and/or the use of linked data.

1. Users encounter references to books on the Internet in a variety of environments, but these references often do not link to a source of access to the book. For each book in the Open Library with a linked ebook, any citation or reference could indicate to the user when a full text version is available at the Internet Archive.

2. There is a need to share the edition-to-work relationships that have been developed in OL with databases that have only editions. For example, a user searching in a library database may find a single version of a text; that particular Manifestation may not have an available digital version. Using the edition-to-work relationships in OL, it would be possible to determine whether there is another manifestation of the same work that does have an available digital copy.

Nothing to add here. And I still think that this section provides the valuable insight of this use case.

Application of linked data for the given use case

This section describes how linked data technology could be used to support the use case above. Try to focus on linked data on an abstract level, without mentioning concrete applications and/or vocabularies. Hint: Nothing library domain specific.

Creation of a quantity of linked open data for bibliographic items, from a variety of sources and usable in web applications.

Assigning URIs to the OL metadata records, as well as to the referenced full-text resources, is what makes linking possible in the first place. RDF and SPARQL provide at least two ways to publish usable information about these resources: through a SPARQL endpoint, or by dereferencing the URIs and returning the relevant information in RDF.

Existing approaches to bringing FRBR-based vocabularies to the web of data as a common data model make it possible to expose the valuable information about works, manifestations and items that is already represented in OL in a standardized and reusable form.
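The scenario can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the predicate URIs under `http://example.org/vocab#` are hypothetical placeholders and not terms from any published vocabulary, the second edition and its full-text URI are invented, and the triples are held in a plain Python set rather than in a real RDF store. The work and first edition URIs are derived from the examples listed under Existing Work.

```python
# Minimal sketch: edition-to-work links held as (subject, predicate, object) triples.
# The predicates, the second edition and the full-text URI are hypothetical.

EDITION_OF = "http://example.org/vocab#editionOf"        # hypothetical predicate
HAS_FULL_TEXT = "http://example.org/vocab#hasFullText"   # hypothetical predicate

WORK = "http://openlibrary.org/works/OL6037025W"
EDITION_A = "http://openlibrary.org/books/OL6807502M"
EDITION_B = "http://example.org/books/edition-b"         # invented edition
FULL_TEXT_B = "http://example.org/fulltext/edition-b"    # invented full text

TRIPLES = {
    (EDITION_A, EDITION_OF, WORK),
    (EDITION_B, EDITION_OF, WORK),
    (EDITION_B, HAS_FULL_TEXT, FULL_TEXT_B),
}


def objects(triples, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """Return all subjects that point to obj via predicate."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def full_text_alternatives(triples, edition):
    """Map editions of the same work(s) to their full-text URIs.

    The starting edition is included if it has a full text itself;
    editions without any full text are left out of the result.
    """
    result = {}
    for work in objects(triples, edition, EDITION_OF):
        for candidate in subjects(triples, EDITION_OF, work):
            texts = objects(triples, candidate, HAS_FULL_TEXT)
            if texts:
                result[candidate] = sorted(texts)
    return result


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))
```

Calling `full_text_alternatives(TRIPLES, EDITION_A)` starts from an edition without a digital copy and returns the other edition of the same work together with its full-text URI. A real deployment would presumably answer the same question with a SPARQL query or by following dereferenced URIs.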


Existing Work (optional)

This section is used to refer to existing technologies or approaches which achieve the use case. Hint: Specific approaches in the library domain.

OL already exposes data as linked data. Nevertheless, it is not easily usable in every case because of the problems mentioned. Examples of exposed data are:

Author in UI: http://openlibrary.org/authors/OL22022A/Barbara_Cartland

Author RDF: http://openlibrary.org/authors/OL22022A.rdf

Work in UI: http://openlibrary.org/works/OL6037025W/Code

Work in RDF: http://openlibrary.org/works/OL6037025W.rdf

Edition in UI: http://openlibrary.org/books/OL6807502M/Code

Edition in RDF: http://openlibrary.org/books/OL6807502M.rdf
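In the examples above, each RDF URL looks like the UI URL with the trailing title segment dropped and `.rdf` appended to the identifier. The following Python sketch encodes that observation. It assumes the pattern holds for all author, work and edition records, which this page does not state, and the function name `rdf_url` is made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefixes seen in the examples above; other record types are not handled.
KNOWN_TYPES = {"authors", "works", "books"}


def rdf_url(ui_url):
    """Derive the RDF URL from a UI URL, following the pattern in the examples.

    Assumes the path has the form /<type>/<identifier>[/<title-segment>].
    """
    parts = urlsplit(ui_url)
    segments = [seg for seg in parts.path.split("/") if seg]
    if len(segments) < 2 or segments[0] not in KNOWN_TYPES:
        raise ValueError(f"not a recognized Open Library record URL: {ui_url}")
    record_type, identifier = segments[0], segments[1]
    # Accept URLs that already point at the RDF form.
    if identifier.endswith(".rdf"):
        identifier = identifier[: -len(".rdf")]
    path = f"/{record_type}/{identifier}.rdf"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For instance, passing the author UI URL above to `rdf_url` yields the author RDF URL listed directly below it.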


Related Vocabularies (optional)

Here you can list and clarify the use of vocabularies (element sets and value vocabularies) which can be helpful and applied within this context.

  • foaf
  • frbr
  • rdvocab
  • dcterms

Problems and Limitations

This section lists reasons why this scenario is or may be difficult to achieve, including pre-requisites which may not be met, technological obstacles etc. Please explicitly list here the technical challenges made apparent by this use case. This will aid in creating a roadmap to overcome those challenges.

The paragraph on how the OL data departs from library standards was moved to Background and Current Practice, but that's arguable (or not, matter of taste ;-))

The OL data cannot easily be mapped to library standards. This is a particular problem for personal names, as they are not linked to authoritative data. These links have to be provided additionally, and their content has to be kept consistent with the personal names, since users require the direct presentation of personal names for convenience. In general, the challenge is to maintain the easy-to-use format AND to integrate the data with other library data, so that OL can make use of that data and the OL data becomes usable for other libraries.

Related Use Cases and Unanticipated Uses (optional)

The scenario above describes a particular case of using linked data. However, by allowing this scenario to take place, the likely solution allows for other use cases. This section captures unanticipated uses of the same system apparent in the use case scenario.

Added by Antoine following this thread

Library Linked Data Dimensions / Topics

The dimensions and topics are used to organize the use cases. At the same time, they might help you to identify additional aspects currently not covered. If appropriate topics and/or dimensions are missing, please specify them here and annotate them by a “*”.



*these items are not in the initial list, suggestion for adding them

References (optional)

This section is used to refer to cited literature and quoted websites.