Use Case NDNP

Name

NDNP (National Digital Newspaper Program - Harvesting Digital Objects From The Web)

Owner

Ed Summers

Background and Current Practice

The National Digital Newspaper Program (NDNP) is a partnership between the National Endowment for the Humanities (NEH), the Library of Congress (LC), and state projects to provide enhanced access to United States newspapers published between 1836 and 1922. NEH awards support state projects to select and digitize historically significant titles that are aggregated and permanently maintained by LC.

Chronicling America is a web application that allows users to search and view the 2.5 million (and growing) digitized pages as well as consult a national newspaper directory of bibliographic and holdings information for 140,000 newspapers, to identify newspaper titles in all types of formats. The directory was compiled through an earlier NEH initiative, the United States Newspaper Program.

Goals

URIS: To give each newspaper title, issue and page a unique URL to enable citation.
RELATE (new): To contextualize newspaper content by associating it with other content on the web.
PUBLISH: To allow digital objects (titles, issues and pages) and their associated bitstreams (pdf, jp2, ocr/xml) to be meaningfully harvested out of the web application, so that the data can be repurposed and preserved elsewhere.
API: To provide an API for third parties to use the content in their own environments without needing to harvest the actual content.

Use Case Scenario

A researcher, institution or other party wants to harvest the newspaper content out of the Chronicling America web application to perform their own analysis of the textual content. The user is able to get metadata for newspaper titles, issues, pages, and bitstreams associated with each page (pdf, jp2, ocr/xml). The user should also be able to look for new material on a routine basis.

Target Audience

Librarians, Archivists, Curators
Genealogists
Historians
Computer Programmers

Application of linked data for the given use case

Linked Data Design Issues and Cool URIs for the Semantic Web provided the foundation for the design of the Chronicling America identifier space. We wanted to enable interested parties to extract data out of the web application for use in their own environments. Specifically, we designed our web application to mint URLs for each and every newspaper title, issue and page. For example:

The Arizona champion - http://chroniclingamerica.loc.gov/lccn/sn82016246#title
November 3rd, 1883 Issue - http://chroniclingamerica.loc.gov/lccn/sn82016246/1883-11-03/ed-1#issue
Page 1 - http://chroniclingamerica.loc.gov/lccn/sn82016246/1883-11-03/ed-1/seq-1#page
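
The pattern visible in the three example URLs above can be sketched as a few helper functions. This is only an illustration inferred from those examples: the function names and the `edition` and `sequence` parameters are hypothetical, and the application may mint URLs that this sketch does not cover.

```python
from datetime import date

# Base URL taken from the examples above.
BASE = "http://chroniclingamerica.loc.gov"


def title_uri(lccn: str) -> str:
    """URI for a newspaper title, keyed by its LCCN."""
    return f"{BASE}/lccn/{lccn}#title"


def issue_uri(lccn: str, issued: date, edition: int = 1) -> str:
    """URI for one issue: title path plus ISO date and edition number."""
    return f"{BASE}/lccn/{lccn}/{issued.isoformat()}/ed-{edition}#issue"


def page_uri(lccn: str, issued: date, sequence: int, edition: int = 1) -> str:
    """URI for one page: issue path plus the page's sequence number."""
    return (
        f"{BASE}/lccn/{lccn}/{issued.isoformat()}"
        f"/ed-{edition}/seq-{sequence}#page"
    )
```

For instance, `page_uri("sn82016246", date(1883, 11, 3), 1)` reproduces the "Page 1" URL listed above.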

Each of these URLs identifies the "real world" newspaper title, issue or page. When a user enters one in a web browser, they see an HTML view of the resource. When an agent requests application/rdf+xml from the same URL, it gets an RDF representation of the resource. Machine-readable RDF was chosen so that the resources could be described using existing vocabularies and explicitly linked together. In addition, the page objects are linked to the digital objects that they are composed of: jp2, pdf and ocr/xml files. RDFa was used to create new vocabulary terms for NDNP-specific semantics. RDF has also allowed place names and languages to be meaningfully linked to dbpedia, geonames and lingvoj. Selected pages have also been linked to Flickr resources where those pages have been uploaded there.
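
The content negotiation described here can be sketched with the Python standard library. The helper names `rdf_request` and `fetch_rdf` are hypothetical, and whether the live service still answers this `Accept` header the same way is an assumption. The fragment (`#title`, `#issue`, `#page`) is dropped before the request is built because fragments are never sent to the server.

```python
from urllib.parse import urldefrag
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"


def rdf_request(uri: str) -> Request:
    """Build a GET request that asks for the RDF/XML representation."""
    document_url = urldefrag(uri).url  # strip "#title", "#issue", "#page"
    return Request(document_url, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str, timeout: float = 30.0) -> bytes:
    """Perform the request and return the raw response body.

    Assumes the server honours the Accept header; not verified here.
    """
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from the network call lets the headers be inspected without contacting the server.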

Linked Data and the OAI-ORE vocabulary allow interested clients to harvest Chronicling America objects from the web. For example, clients that are interested in crawling the content can start with a resource map for all newspapers and follow their nose (inspecting the RDF and resolving URLs of interest) to resource maps for titles, issues and pages, and on down to their respective bitstreams (pdf, ocr xml, ocr text, jpeg2000, thumbnail jpg).
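
A "follow your nose" crawl of this kind might look like the sketch below. It rests on several assumptions: that each resource map is RDF/XML, that links to the next level appear as `ore:aggregates` elements carrying an `rdf:resource` attribute, and that the caller supplies a `fetch` function returning the document text (or `None` for bitstreams and other leaves). RDF/XML can express the same triples in other shapes, so a real harvester would more likely use a full RDF parser. The names `aggregated_uris` and `crawl` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, List, Optional

ORE = "http://www.openarchives.org/ore/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def aggregated_uris(rdf_xml: str) -> List[str]:
    """Return the targets of every ore:aggregates element in the document."""
    root = ET.fromstring(rdf_xml)
    tag = f"{{{ORE}}}aggregates"
    attr = f"{{{RDF}}}resource"
    return [el.get(attr) for el in root.iter(tag) if el.get(attr)]


def crawl(
    start: str,
    fetch: Callable[[str], Optional[str]],
    limit: int = 100,
) -> List[str]:
    """Breadth-first walk from a starting resource map.

    Visits each URI at most once and stops after `limit` visits.
    """
    seen = {start}
    queue = deque([start])
    visited: List[str] = []
    while queue and len(visited) < limit:
        uri = queue.popleft()
        visited.append(uri)
        body = fetch(uri)
        if body is None:  # a leaf such as a pdf or jp2 bitstream
            continue
        for target in aggregated_uris(body):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Passing `fetch` in as a parameter keeps the traversal independent of HTTP, so it can be exercised against an in-memory dictionary of documents before being pointed at a live server.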

Existing Work

20th Century Press Archives

Related Vocabularies

Problems and Limitations

Use of RDF/XML as a serialization for RDF has proven to be a bit of a hurdle for web developers not already familiar with semantic web technologies.
It would be useful to be able to document how various vocabularies are being used in the RDF delivered by Chronicling America. Possible ways to do this could be Dublin Core Application Profiles, or VoID.
There is some uncertainty about whether we should be using some IFLA-sanctioned version of the FRBR vocabulary, or whether using Ian Davis's vocabulary is good enough.

Library Linked Data Dimensions / Topics

Dimensions:

Users needs > Browse / explore / select
Users needs > Retrieve / find
Users needs > Identify
Users needs > Access / obtain
Users needs > Integrate / contextualize
Context > Communication > Online access
Information assets > Archival materials
Information lifecycle > interpret / analyze / synthesize: > to enrich existing entities with more data
Information lifecycle > interpret / analyze / synthesize: > to identify an entity
Information lifecycle > interpret / analyze / synthesize: > to contextualise the entities by connecting them with other entities
Information lifecycle > present / publish: > to visualize entities and their relations
Information lifecycle > present / publish: > to make new entities accessible inside an information system
Information lifecycle > present / publish: > to provide new data as LOD

Topics:

Use of Identifiers
Linking across datasets
REST patterns for Linked Data
linked data management, hosting, and preservation
Versioning, updates
Search Engine Optimization for Library Data

References

Witt, Michael. Object Reuse and Exchange (OAI-ORE). American Library Association, 2010.
ORE Specifications and User Guides

Prototypes and Applications

Chronicling America
