
Use Case Publishing 20th Century Press Archives


1 Name
2 Owner
3 Background and Current Practice
4 Goal
5 Target Audience
6 Use Case Scenario
7 Application of linked data for the given use case
8 Existing Work
9 Related Vocabularies
10 Problems and Limitations
11 Related Use Cases and Unanticipated Uses
12 References

Name

Publishing 20th Century Press Archives

Owner

Joachim Neubert

Background and Current Practice

The 20th Century Press Archives of the German National Library of Economics (ZBW) is a large collection of newspaper clippings about persons, companies, subjects and wares, extending from 1826 to 2005 and organized in thematic folders. For parts of the collection, metadata (such as the source and date of an article, or the name and location of a company) is available, though only in German. Currently, parts of the more than six million documents are accessible as digitized page images (without OCR data) through a web application. Deep links into this application are either impossible or depend heavily on ColdFusion-specific URL syntax. Harvesting of the data by external applications is not supported either.

Goal

RELATE(aggregate): folders allowing access to multiple documents, and search results accessed via a single URI, are typical aggregation cases.
RELATE(new): To provide context from metadata and link to other data relevant to the domain.
REUSE-VALUE-VOCABS: VIAF, Geonames, GND are mentioned as targets of enrichment for providing a better context.
PUBLISH: harvesting and "To support the use of a standard image and metadata viewer based on METS/MODS." put publication of data at the core.

Target Audience

Scholars and students in Economic and Contemporary History
Archivists
Journalists
The general public
Service Providers

Use Case Scenario

The user can browse and search the collections by the available metadata. Search should be supported by an autosuggest service that includes alternative names (e.g. from the German Personal Name Authority File, PND). Every item (folder, document, single page of a document) has its own persistent web address, which can be cited and linked to. Additional information is provided from other sources on the web, such as a person's Wikipedia abstract or nationality (from the authority file). For non-German users, an English version of the website is prepared with data from the web. Links to places where more information is available, such as VIAF, are offered. And of course the user can comfortably view folders and documents with their page images and some attached metadata.

Additionally, institutional users and providers of value added services (like Europeana) can harvest the data in an efficient way.

Application of linked data for the given use case

OAI-ORE provides the backbone for organizing the large and deeply nested aggregations of data. On every level of aggregations, it provides access to the aggregated resources (which may be aggregations themselves, or image files on the deepest level). Search results (e.g. company by location) are represented as dynamically built ORE aggregations. The aggregations are described by RDFa resource maps.

Metadata provided by the application database and links into the Linked Data cloud (especially DBpedia, Geonames, the German Authorities File, VIAF and Chronicling America) enrich these resource maps. RDFa facilitates building a web application for both humans and machines that follows the REST architectural principles.
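To make the aggregation structure concrete, the following is a minimal sketch of the triples a resource map might carry for one document, written out as N-Triples rather than RDFa. It is not the P20 implementation: the function name `resource_map_triples`, the resource map URI and the second page URI are assumptions made for illustration. The document URI and its first page are taken from the example resources listed under Existing Work.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, aggregated_uris):
    """Return N-Triples lines describing one ORE aggregation (hypothetical helper)."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per aggregated resource (page, document or folder).
    triples += [(aggregation_uri, ORE + "aggregates", part) for part in aggregated_uris]
    return ["<%s> <%s> <%s> ." % t for t in triples]


# Document URI and first page come from the example resources in this use case.
document = "http://zbw.eu/beta/p20/person/13476/0147"
lines = resource_map_triples(
    rem_uri="http://example.org/rem/person/13476/0147",  # assumed, not a real P20 URI
    aggregation_uri=document,
    aggregated_uris=[document + "/0001", document + "/0002"],  # second page is assumed
)
print("\n".join(lines))
```

Because an aggregated resource may itself be an aggregation, the same helper could be applied at the folder and collection levels, with each level listing the URIs of the level below.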

Personal name lookup uses SPARQL queries against a SKOS file (skos:prefLabel/altLabel) derived from PND, mediated through a web service.
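As an illustration of such a lookup, the sketch below builds a SPARQL query that matches a typed prefix against both preferred and alternative labels. The actual queries used by the web service are not documented here, so the function name `build_name_lookup_query`, the variable names and the use of the SPARQL 1.1 functions `STRSTARTS` and `LCASE` are assumptions; an endpoint limited to SPARQL 1.0 would need a `REGEX` filter instead.

```python
def build_name_lookup_query(user_input, limit=10):
    """Build a prefix-match query over skos:prefLabel and skos:altLabel (a sketch)."""
    # Collapse whitespace and lowercase the input for case-insensitive matching.
    needle = " ".join(user_input.split()).lower()
    # Escape backslashes and double quotes so the input stays inside the literal.
    needle = needle.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?person ?prefLabel WHERE {\n"
        "  ?person skos:prefLabel ?prefLabel .\n"
        "  { ?person skos:prefLabel ?label } UNION { ?person skos:altLabel ?label }\n"
        '  FILTER(STRSTARTS(LCASE(STR(?label)), "%s"))\n'
        "}\n"
        "ORDER BY ?prefLabel\n"
        "LIMIT %d" % (needle, int(limit))
    )


query = build_name_lookup_query("Neub")
print(query)
```

Returning `?prefLabel` alongside the matched `?label` lets an autosuggest list show the preferred form of a name even when the user typed an alternative one.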

Existing Work

Beta version: http://zbw.eu/beta/p20 (note that the URI has changed since the first prototype)
A voiD description file is available at http://zbw.eu/beta/p20/void.ttl
Example resources:
- http://zbw.eu/beta/p20/person (the biographical collection)
- http://zbw.eu/beta/p20/person/12 (folder about a single person) (with links to DBpedia, PND and VIAF)
- http://zbw.eu/beta/p20/person/13476/0147 (single document) (with links to Chronicling America)
- http://zbw.eu/beta/p20/person/13476/0147/0001 (the first page of this document)
- http://zbw.eu/beta/p20/company/searchresult?q=hamburg (company search result for "hamburg")
- http://zbw.eu/beta/p20/company_by_geoname/2886242 (companies located in Cologne - by geoname location)
In Chronicling America whole newspapers with their issues and pages are made available on the Semantic Web

Related Vocabularies

ore
dcterms
skos
rdaGr2 (professionOrOccupation)
exif (resolution)
p20vocab

Problems and Limitations

The RDFa pages are generated dynamically from a relational database. Since information from different levels of the aggregation hierarchy and from associated metadata tables is required to build a meaningful display for the user, performance is an issue. Besides database means such as materialized views, we try to solve this with caching strategies (which leverage standard web technologies). These technologies also have to be applied to external linked data sources in order to guarantee availability and to achieve acceptable overall performance.
The granularity of the aggregations presented to the user is also an issue to solve (e.g. the companies collection aggregates some 13,000 companies, which is far too many for display as well as for efficient harvesting).
For use in the DFG-Viewer, the aggregations are to be mapped to METS-MODS XML files in different granularities. Up to now, no general mapping methodology from ORE to METS (let alone to MODS elements) exists, so currently we generate the files directly from the database.
The order of the documents within a folder, generally following the publishing date of the articles, is crucial (especially if a set of documents comes without any metadata). Currently, the order is not expressed in RDF, but solely by convention (ascending document numbers). Sometimes it turns out that somebody messed around in a folder 20 or 50 years ago, and that the sequence has to be rearranged to meet the users' expectations. Because identifiers (including the document number) are meant to be persistent, a "renumber" command is not an option. ORE provides a solution with ore:Proxy and xyz:hasNext / xyz:hasPrevious, but this comes with all the hassles of doubly linked lists and a large implementation overhead.
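The proxy approach for document order can be sketched as follows. This is only an illustration under stated assumptions: the `xyz` namespace URI is a placeholder for whatever vocabulary would supply hasNext / hasPrevious, the proxy URI pattern and the helper name `ordering_triples` are hypothetical, and documents 0145 and 0146 are assumed neighbours of the example document 0147.

```python
ORE = "http://www.openarchives.org/ore/terms/"
XYZ = "http://example.org/xyz#"  # placeholder namespace standing in for "xyz:"


def ordering_triples(folder_uri, ordered_document_uris):
    """Return (subject, predicate, object) triples that fix a display order."""
    # Hypothetical proxy URI pattern: one proxy per document, scoped to the folder.
    proxies = [
        folder_uri + "/proxy/" + uri.rsplit("/", 1)[-1]
        for uri in ordered_document_uris
    ]
    triples = []
    for proxy, document in zip(proxies, ordered_document_uris):
        triples.append((proxy, ORE + "proxyFor", document))
        triples.append((proxy, ORE + "proxyIn", folder_uri))
    # Link neighbouring proxies in both directions (the doubly linked list).
    for current, following in zip(proxies, proxies[1:]):
        triples.append((current, XYZ + "hasNext", following))
        triples.append((following, XYZ + "hasPrevious", current))
    return triples


folder = "http://zbw.eu/beta/p20/person/13476"
# Assumed curated order: document 0147 is shown before 0146, identifiers unchanged.
order = [folder + "/0145", folder + "/0147", folder + "/0146"]
triples = ordering_triples(folder, order)
```

Rearranging a folder then only changes the order list and the derived proxy links, while the document URIs stay persistent. The overhead mentioned above is visible even in this small sketch: every move requires both link directions of all affected neighbours to be regenerated consistently.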

Related Use Cases and Unanticipated Uses

Europeana harvests and aggregates metadata from sites like the 20th Century Press Archives. The Linked Data (OAI-ORE) interface aims to facilitate this.
NDNP (Chronicling America) provides a large corpus of historic newspapers, down to the page level searchable through OCR text.
The Linked Data Service of the German National Library and VIAF authority file data provide links to holdings of national libraries around the world and to other linked data sources.

References

More details about the P20 application can be found in Joachim Neubert: The 20th Century Press Archives as Linked Data Application, Submission to Semantic Web Challenge 2010
OAI-ORE applied to classical archival finding aids is outlined in Deborah Kaplan, Anne Sauer, Eliot Wilczek: Archival description in OAI-ORE (OR 2010)
Scholarly use of OAI-ORE is described in Herbert van de Sompel: Adding eScience Assets to the Data Web (LDOW 2009)
METS wiki about METS & OAI-ORE
