The mission of the W3C Library Linked Data Incubator Group, chartered from May 2010 through August 2011, has been "to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities — focusing on Linked Data — in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future." In Linked Data, data is expressed using standards such as the Resource Description Framework (RDF), which specifies relationships between things, and Uniform Resource Identifiers (URIs, or "Web addresses"). This final report of the Incubator Group examines how Semantic Web standards and Linked Data principles can be used to make the valuable information assets that libraries create and curate (resources such as bibliographic data, authorities, and concept schemes) more visible and re-usable outside of their original library context on the wider Web.
The Incubator Group began by eliciting reports on relevant activities from parties ranging from small, independent projects to national library initiatives. These use cases provided the starting point for the work summarized in the main report: an analysis of the benefits of library Linked Data; a discussion of current issues with regard to traditional library data, existing library Linked Data initiatives, and legal rights over library data; and recommendations for next steps.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of Final Incubator Group Reports is available. See also the W3C technical reports index at http://www.w3.org/TR/.
Publication of this document by W3C as part of the W3C Incubator Activity indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.
Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have agreed to offer licenses according to the licensing requirements of the W3C Patent Policy for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.
Selected use cases and case studies from the library community and related sectors are described in this document. These were gathered and analyzed by the W3C Library Linked Data Incubator Group, based on submissions from different organizations and individuals. Cases have been grouped into eight topical clusters, which are described below. Selected use cases from each cluster have also been summarized.
As described in the group charter, one of the main activities of the W3C Library Linked Data Incubator Group was to gather use cases and case studies demonstrating successful implementation of Semantic Web technologies in libraries and related sectors. Outreach and dissemination initiatives were also gathered, along with innovative uses, which demonstrate the possibilities and benefits offered by the application of Linked Data technologies and methods in libraries.
The use cases presented in this report demonstrate the potential benefit of Linked Data technologies for the description of library resources and their context, and the value of sharing these descriptions among institutions and with the broader public. Description in this context primarily entails the creation or representation of relationships between resources. These relationships are defined by aligning similar entities or making existing relationships more explicit. New relationships are created either through machine processing (inferences, alignments, etc.) or manual efforts (tagging, cataloging). Those relationships can then be used to provide discovery through browse and search services, and to federate or aggregate materials from different sources. Aggregation is also a factor in data management. Linked Data technologies are used to improve global interoperability of library data through the re-use of metadata element sets and value vocabularies, the application of URIs for resources, and the development of publishing services such as APIs.
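As a minimal sketch of what "relationships between resources" can look like in practice, the Python fragment below holds a few statements as subject, predicate, object triples and follows them in both directions for a simple browse. The `example.org` URIs and the data are hypothetical; the Dublin Core property URIs are real, but their use here is only illustrative and is not the Incubator Group's own model.

```python
# Illustrative only: all example.org URIs are invented for this sketch.
DCT = "http://purl.org/dc/terms/"

triples = {
    ("http://example.org/book/1", DCT + "title", "Moby-Dick"),
    ("http://example.org/book/1", DCT + "creator", "http://example.org/person/melville"),
    ("http://example.org/book/2", DCT + "title", "Typee"),
    ("http://example.org/book/2", DCT + "creator", "http://example.org/person/melville"),
}


def objects(subject, predicate):
    """Return all objects stated for a subject/predicate pair."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """Return all subjects linked to an object by a predicate (a backlink)."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Browse from a person to the works that name that person as creator.
works = subjects(DCT + "creator", "http://example.org/person/melville")
titles = [objects(work, DCT + "title")[0] for work in works]
print(titles)
```

Because both books point at the same person URI, the link between them does not have to be stated separately; it falls out of the shared identifier.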
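The collected set of use cases, case studies, initiatives, and ideas is organized into eight clusters: bibliographic data, authority data, vocabulary alignment, archives, citations, digital objects, collections, and social uses.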
After collecting and reviewing the submitted cases for each cluster, group members extracted general scenarios out of these cases. The motivation behind these extracted use scenarios is to summarize and capture the main ideas from the set of original cases. The extracted use cases presented in this document cover the majority of topics and situations related to each cluster.
The following figure depicts the use case organization and the aforementioned extraction process:
This document presents the extracted use cases (section 4) and then offers short summaries for each individual case (section 5). In order to provide more detailed information to the interested reader, cluster sections and case summaries are linked to the wiki pages the Incubator Group has used to gather and curate the original information.
This cluster presents use cases related to bibliographic records. A bibliographic record can be understood as a set of data elements describing the content and characteristics of an information object produced for human consumption.
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster BibData.
Normalize the semantics of bibliographic records ensuring a standard element set.
Merge duplicate records for the same resource into a single master record. In this way the end user is presented with a single record for a resource, with links to records for the different copies, instead of a display of multiple, slightly-varying bibliographic records.
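The merging scenario above can be sketched as follows. This is a deliberately crude illustration, assuming that a normalized title plus a publication year is an adequate match key; real deduplication would need richer reference data. The record structure, field names, and URIs are hypothetical.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude match key from a normalized title and the year (an assumption)."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])


def merge_duplicates(records):
    """Group records that share a match key into one master record with copy links."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    masters = []
    for (_, year), recs in groups.items():
        masters.append({
            "title": recs[0]["title"],               # keep the first title as display form
            "year": year,
            "copies": sorted(r["id"] for r in recs),  # links to the per-copy records
        })
    return masters


# Hypothetical records from two libraries describing the same resource.
records = [
    {"id": "http://example.org/lib-a/rec/17", "title": "Moby-Dick; or, The Whale", "year": 1851},
    {"id": "http://example.org/lib-b/rec/9", "title": "Moby Dick, or the whale", "year": 1851},
    {"id": "http://example.org/lib-a/rec/42", "title": "Typee", "year": 1846},
]
masters = merge_duplicates(records)
for master in masters:
    print(master["title"], master["copies"])
```

The end user would then see one entry per master record, with the `copies` list providing the links to the individual holdings.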
Identify Web resources related to a bibliographic record and tag them with terms taken from a set of standard vocabularies.
Allow the end users to search for all resources in a consortium using a single, integrated interface, and provide all available copies of a resource presented by different criteria.
There are three different scenarios related to information aggregation:
Allow the end users to annotate bibliographic records retrieved by a search.
|Case name||Short description|
|Bibliographic Network||Linked Data techniques would allow bibliographic records to be described as an information graph using Web standards to facilitate users' discovery requirements.|
|AGRIS||The AGRIS (International Information System for the Agricultural Sciences and Technology) Linked Data strategy focuses on exploiting the semantic richness of its data by creating an open dataset in agricultural sciences.|
|Community information service||A Linked Data approach to community information services could provide the data under an open license and using open standards, which allows its re-use and lowers the barrier to participation in information curation and sharing.|
|BnF||Linked Data technologies could help Bibliothèque nationale de France (BnF) to bring together data from several sources with a scalable and interoperable data model and to improve the publication of resources in the online catalog.|
|Identification and deduplication of library records||The application of Linked Data to library records could help to develop automated matching algorithms for library records so that only one record exists for each intellectual item.|
|Linked Data and legacy library applications||The addition of Linked Data applications to the library information systems creates a challenge for system architects on how to adapt legacy systems to make use of new Linked Data applications.|
|Migrating library legacy data||Libraries wish to convert legacy metadata to RDF triples for several reasons, including taking advantage of systems and services which may emerge in the Semantic Web environment and contributing to the general sharing of metadata and the common good.|
|Open Library data||The Open Library is a large bibliographic database. Linked Data technologies can be used to easily reference specific manifestations in the Open Library database.|
|Regional catalog||The publication of Linked Data from German regional library services could help to create a German central catalog more easily.|
|Pode||The Pode use case concentrates on converting library data to RDF, converting it to FRBRized library data, and linking data to individual instances in other Linked Open Data datasets.|
|Polymath virtual library||The use of Linked Data will benefit the Polymath Virtual Library by improving the process of obtaining links from different sources and by broadening the range of these sources.|
|Talis Prism 3||Talis Prism 3 is a next-generation OPAC/search and discovery interface. Prism 3 is powered by the Talis Platform, a hosted Linked Data service which offers both SPARQL querying and powerful full text search capabilities.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster Authority data.
End users can benefit from systems that make use of authority files expressed as Linked Data. If the user wants to include metadata about an author, the title of a work, keywords, etc., the system can suggest possibilities retrieved from authority files, thesauri, or controlled vocabularies. The suggested value will be uniquely identified by its URI. In this way, systems support precise retrieval, improve usability, and mitigate issues such as record duplication.
Document repositories can improve their search functions by using authority records published as Linked Data. Authority records bring together all known forms of an entity's name under a single authorized form. Systems that make use of linked authority records can recognize different forms of a name and direct end users to all the records associated with the authorized form of the searched entity, such as its related bibliographic records. The search results are therefore more complete. Such systems could also suggest related terms as further potential search terms.
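A minimal sketch of the name-form lookup described above: every known form of a name is indexed to one authority URI, and a search on any form returns the records linked to that URI. The authority entry, the records, and all URIs are hypothetical, and the normalization is intentionally simplistic.

```python
# Hypothetical authority file: one URI per entity, with preferred and variant names.
authority = {
    "http://example.org/auth/twain": {
        "preferred": "Twain, Mark, 1835-1910",
        "variants": ["Clemens, Samuel Langhorne", "Mark Twain"],
    },
}

# Hypothetical bibliographic records that point at authority URIs.
records = [
    {"title": "Adventures of Huckleberry Finn", "creator": "http://example.org/auth/twain"},
    {"title": "Life on the Mississippi", "creator": "http://example.org/auth/twain"},
    {"title": "Walden", "creator": "http://example.org/auth/thoreau"},
]


def normalize(name):
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


# Index every known form of a name to its authority URI.
name_index = {}
for uri, entry in authority.items():
    for form in [entry["preferred"], *entry["variants"]]:
        name_index[normalize(form)] = uri


def search_by_name(name):
    """Return titles of all records linked to the entity behind any name form."""
    uri = name_index.get(normalize(name))
    if uri is None:
        return []
    return [r["title"] for r in records if r["creator"] == uri]


print(search_by_name("Clemens, Samuel Langhorne"))
print(search_by_name("mark twain"))
```

Both searches resolve to the same URI, so they return the same set of records regardless of which form of the name the user typed.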
By publishing authority data as Linked Data, each authority record is uniquely identified by its URI. Systems and initiatives, such as VIAF, can then benefit from relating records that refer to the same entities but that offer different information. As done in vocabulary alignment cases, they can create semantic links between the records. They can then publish them as clustered records with each cluster also identified by a URI. Such an approach could benefit organizations aiming to link their authority records to other cluster contributors by providing a central place to look them up. From the end-user perspective this could result in having aggregated information coming from various organizations, thus enriching the user experience.
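The clustering idea can be sketched with a small union-find routine that groups authority URIs connected by "same entity" links and then assigns each group its own cluster URI. This is an illustration under stated assumptions, not a description of how VIAF works: the links, the record URIs, and the `example.org/cluster/` URI pattern are all hypothetical.

```python
def cluster(links):
    """Group URIs connected by 'same entity' links using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), []).append(uri)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical links asserted between authority records of three libraries.
links = [
    ("http://example.org/lib-a/auth/1", "http://example.org/lib-b/auth/77"),
    ("http://example.org/lib-b/auth/77", "http://example.org/lib-c/auth/5"),
    ("http://example.org/lib-a/auth/2", "http://example.org/lib-c/auth/9"),
]
clusters = cluster(links)

# Give each cluster its own (hypothetical) URI so it can be looked up centrally.
cluster_uris = {
    f"http://example.org/cluster/{i}": members
    for i, members in enumerate(clusters, start=1)
}
for uri, members in cluster_uris.items():
    print(uri, members)
```

Note that the first two links are chained: library A and library C never link to each other directly, yet their records end up in the same cluster.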
|Case name||Short description|
|AuthorClaim||The AuthorClaim registration service aims to link scholars with the records about the works that they have written. Linked Data can be used to further generalize the basic application and encourage the re-use of the application's results.|
|Authority data enrichment||Linked Data applied to authority control data could enable the re-use of external datasets by linking instead of copying and merging.|
|FAO authority description concept scheme||More efficient management of the several multilingual forms of a concept in the FAO authority scheme is possible through the use of URIs and the assignment of relationships between concepts.|
|International registry for authors||The inconsistency problem in the authors' signatures has been around for many years. IraLIS proposes applying Linked Data technologies to approach this issue by using specific URIs for each author.|
|Linked Data service of the German National Library||Linked Data provides a suitable framework for publishing relevant data at the German National Library and linking it to other data sources of interest.|
|Virtual International Authority File (VIAF)||The VIAF Linked Data approach provides useful experience and knowledge on how to apply Linked Data principles to international authority records.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster VocAlign.
Enrichment and discovery use cases focus on collections that have applied source or target vocabularies which are used in alignment activities. These can provide:
Vocabulary enhancement and re-use either to extend other value vocabularies or as a basis of creation of new value vocabularies:
The main services that can be provided on top of vocabulary alignment data are:
|Case name||Short description|
|AGROVOC Thesaurus||The publication as Linked Data of this concept scheme (using SKOS) helps FAO to create equivalences to other agricultural vocabularies maintained by other institutions.|
|Browsing and searching in repositories with different thesauri||Linked Data could help to create a network of interlinked thesauri, thus facilitating parallel search through several repositories annotated with different thesauri.|
|Civil War data 150||By aggregating data sources related to the American Civil War and performing vocabulary alignment to a specific Civil War ontology, it becomes possible to query information about a particular place, regiment, battle, or officer over all the data sources at once.|
|Component vocabularies||Linked Data allows metadata creators to use URIs of terms from established controlled vocabularies such as the Library of Congress Subject Headings.|
|Language technology||Issues related to the application of language technology, e.g., Named Entity Recognition techniques, could be approached using Linked Data technologies and data resources.|
|Subject search||Linked Data principles could help libraries use subject heading systems more effectively for Web discovery and re-use both by humans and machines.|
|Vocabulary merging||Linked Data technologies could enable multilingual resource discovery by providing vocabulary federation through vocabulary mapping and merging techniques.|
|Bridging OWL and UML||UML class diagrams can be used to explore, re-use, and design OWL ontologies.|
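Several of the cases in the table above rely on mappings between concepts from different vocabularies to search across repositories. A minimal sketch of that idea follows, assuming two hypothetical thesauri whose concepts are aligned with `skos:exactMatch` (a real SKOS property). The concept URIs, repositories, and document identifiers are invented, and only direct mappings are followed, not chains of mappings.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical alignment triples between two thesauri.
alignments = [
    ("http://example.org/thesA/maize", SKOS_EXACT_MATCH, "http://example.org/thesB/corn"),
    ("http://example.org/thesA/soil", SKOS_EXACT_MATCH, "http://example.org/thesB/soils"),
]

# Hypothetical repositories, each indexed with its own thesaurus.
repo_a = {
    "doc-a1": {"http://example.org/thesA/maize"},
    "doc-a2": {"http://example.org/thesA/soil"},
}
repo_b = {
    "doc-b1": {"http://example.org/thesB/corn"},
}


def equivalents(concept):
    """Return the concept plus everything directly aligned to it, in either direction."""
    found = {concept}
    for s, p, o in alignments:
        if p != SKOS_EXACT_MATCH:
            continue
        if s == concept:
            found.add(o)
        if o == concept:
            found.add(s)
    return found


def search(concept, *repositories):
    """Find documents indexed with the concept or one of its aligned equivalents."""
    wanted = equivalents(concept)
    return sorted(
        doc
        for repo in repositories
        for doc, subjects in repo.items()
        if subjects & wanted
    )


print(search("http://example.org/thesB/corn", repo_a, repo_b))
```

A query phrased in one thesaurus thus also retrieves documents indexed with the other, without either repository having to re-index its holdings.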
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster Archives.
A group of archives would like to better share information about their holdings. They have separate catalogs and these catalogs do not necessarily use the same data formats. Exporting and sharing their data in Linked Data format would allow them to make connections between the collections using topics, names, place names, and other information contained in their metadata.
An archive would like to provide better discovery for its users. Traditional database methods do not allow users to follow connections that may be revealed in the descriptions of the archive's materials. Because it is hard to predict what methods a searcher will use and what information will be useful, Linked Data would allow searchers to follow the paths provided by any data points in the archival metadata.
The archive would like to gain greater visibility by linking from Web resources to its materials. It would do this by creating and exporting its metadata in Linked Data format, and by adding that data to the Linked Data cloud. This scenario is expected to facilitate the creation of semantic links between heterogeneous material from libraries, archives, and museums.
Build a network of institutions using similar metadata to describe preservation actions and to exchange expertise and collection information. Use Semantic Web technologies to facilitate and improve interoperability among heterogeneous data described using various metadata formats. Increase the use of digitally preserved materials by a wider user base.
|Case name||Short description|
|Archipel||The availability of domain-specific metadata vocabularies and models expressed as Linked Data would benefit the Flemish Archipel Project by making it possible to improve the quality of the mappings between vocabularies.|
|Digital preservation||Linked Data provides a global environment for describing digital objects and their properties with less duplication of effort among preservation metadata providers.|
|Europeana||Linked Data technology can help Europeana enhance semantic interoperability between metadata models, enrich existing metadata, improve harvesting data on links between objects, and improve object discovery and access functions.|
|LOCAH||Linked Data is a way to expand the benefits of lateral search and to help users to discover contextually related materials by creating links between widely dispersed resources.|
|Photo museum||Linked Data could provide a good solution for technical problems caused by heterogeneity of data describing photo collections.|
|Radio station archive digitization||Linked Data could add value to the digitization process for radio materials, especially by enabling cross-references to event data.|
|Recollection||The Recollection project seeks to provide a platform using Linked Data technologies that enables the community partners of National Digital Information Infrastructure and Preservation Program at the Library of Congress to share their collections and data on an ongoing basis.|
|Ontology of Cantabria's cultural heritage||The publication of data about the heritage of Cantabria (a region in the north of Spain) as Linked Data can contribute a considerable amount of quality information on Cantabria's cultural heritage to the Linked Open Data cloud.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster Citations.
Creation of an enhanced representation of publications, where the cited reference is directly accessible from the citation (e.g., position in the cited/citing document, what was cited).
Make it possible for the user to click from a citation directly to the location in the referenced publication (URI or other resolver mechanism, like OpenURL).
Determine the value of a resource (easily and automatically) by analyzing the content of citations to that work (backlinks, further qualifications like agrees/disagrees).
Find other publications that build upon the same cited resource to include them in a "Related work" section (backlinks, qualifications like "Extends", "builds upon", etc.).
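The citation scenarios above depend on two things: being able to follow a citation backwards (backlinks) and knowing how the citing work relates to the cited one. A minimal sketch follows, assuming citations are recorded as (citing work, qualification, cited work) statements. The qualification labels and all identifiers are hypothetical and are not tied to any particular citation vocabulary.

```python
from collections import Counter

# (citing work, qualification, cited work); all identifiers are hypothetical.
citations = [
    ("http://example.org/paper/B", "extends", "http://example.org/paper/A"),
    ("http://example.org/paper/C", "agreesWith", "http://example.org/paper/A"),
    ("http://example.org/paper/D", "disagreesWith", "http://example.org/paper/A"),
    ("http://example.org/paper/E", "extends", "http://example.org/paper/A"),
    ("http://example.org/paper/E", "cites", "http://example.org/paper/B"),
]


def backlinks(cited, qualification=None):
    """Return works citing `cited`, optionally restricted to one qualification."""
    return sorted(
        src
        for src, q, dst in citations
        if dst == cited and (qualification is None or q == qualification)
    )


def citation_profile(cited):
    """Count incoming citations to `cited` by qualification."""
    return Counter(q for _, q, dst in citations if dst == cited)


# Candidates for a "Related work" section: everything that extends paper A.
related_work = backlinks("http://example.org/paper/A", "extends")
profile = citation_profile("http://example.org/paper/A")
print(related_work)
print(dict(profile))
```

The profile gives a rough basis for judging how a resource has been received; how such counts should be weighted is left open here.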
|Case name||Short description|
|Citation of scientific datasets||Linked Data could provide a standard way to link scientific datasets to the publications they support.|
|Enhanced publications||The application of Linked Data could help create a collection for research material that belongs together in order to make research comparable and more transparent.|
|Mapping scholarly debate||The Bibliographica project aims to capture rich relationships between authors and works, describing the evolution of thought and scholarly debate. Linked Data technologies could help to represent these data.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster Digital Objects.
Allow the end users to define groups of resources on the Web that for some reason belong together. The relationship that exists between the resources is often left unspecified. Some of the resources in a group may not be under control of the institution that defines the groups.
Enable end users to link resources together, e.g., related descriptions, persons, or topics. For example, a poem in a digital text repository may be linked to the poet as defined in an authority file elsewhere on the Web. Fine-grained linkage could even be made at the level of individual terms in a document.
Support end user browsing through groups and resources that belong to the groups. Interlinks will allow the end user to explore the connections between resources.
Users should have the capability to re-use all or parts of a collection, with all or part of its metadata, elsewhere on the linked Web.
|Case name||Short description|
|Collecting material related to courses at the Open University||The LUCERO (Linking University Content for Education and Research Online) Project at the Open University is investigating and prototyping the use of Linked Data technologies and approaches to linking and exposing data for students and researchers.|
|Digital text repository||Linked Data could allow digital text repositories to harness large metadata and authority providers to describe their holdings, thus facilitating the tasks of discovery, selection, and semantic linkage.|
|Enhanced publications||The application of Linked Data could help create a collection of research material that belongs together and create a logical whole from the research process. This will make research more comparable and transparent.|
|Editing reports on new academic documents||Linked Data technologies could help in the development of a learning classification system that is strongly dynamic.|
|NDNP (National Digital Newspaper Program)||Chronicling America is a Web application that allows users to search and view more than 2.5 million digitized pages. Linked Data principles provided the foundation for the design of the Chronicling America identifier space.|
|NLL digitized map archive||The digital version of the Cartographic Collection of the National Library of Latvia (NLL) contains scans of historical maps. Map metadata could be represented as Linked Data and enriched with links to other datasets.|
|Publishing 20th Century press archives||The 20th Century Press Archives of the German National Library of Economics provide OAI-ORE metadata aggregations, also described as RDFa and enriched with links to other datasets.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster_Collections.
Provide metadata pertaining to a collection as a whole, in contrast to item-level description. Allow end users to access and retrieve information about collections held by different organizations in different physical and electronic locations, including collection scope, strength, access conditions, and contact details.
Enable innovative collection discovery such as identification of nearest location of a physical collection where a specific information resource is found or mobile device applications based on collection-level descriptions.
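As one possible sketch of the "nearest location" idea, the fragment below assumes that collection-level descriptions carry geographic coordinates and a list of held works, and picks the closest holding collection by great-circle (haversine) distance. The collection URIs, coordinates, holdings, and field names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical collection-level descriptions with locations and holdings.
collections = [
    {"uri": "http://example.org/collection/north", "lat": 52.52, "lon": 13.40,
     "holds": {"http://example.org/work/1"}},
    {"uri": "http://example.org/collection/west", "lat": 50.94, "lon": 6.96,
     "holds": {"http://example.org/work/1", "http://example.org/work/2"}},
    {"uri": "http://example.org/collection/south", "lat": 48.14, "lon": 11.58,
     "holds": {"http://example.org/work/2"}},
]


def nearest_collection(work, lat, lon):
    """Return the URI of the closest collection holding `work`, or None."""
    candidates = [c for c in collections if work in c["holds"]]
    if not candidates:
        return None
    closest = min(candidates, key=lambda c: haversine_km(lat, lon, c["lat"], c["lon"]))
    return closest["uri"]


print(nearest_collection("http://example.org/work/2", 50.11, 8.68))
```

A mobile application could call such a function with the device's current position; straight-line distance is of course only a stand-in for travel time or access conditions.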
Identify and classify collections of special interest to the community.
|Case name||Short description|
|AuthorClaim||The AuthorClaim registration service aims to link scholars with the records about the works that they have written. Linked Data can be used to further generalize the basic application and encourage the re-use of the application's results.|
|Collection-level description||The definition and publication of collection descriptions as Linked Data, coming from multiple and heterogeneous sources, and their linkage to other datasets could provide a rich and diverse set of services.|
|Community information service||A Linked Data approach to community information services could result in data being provided with an open license and using open standards. These permit re-use of data, which then lowers the barrier to participation in information curation and sharing.|
|Digital resources with access restrictions||Linked Data could support making access to digital resources user-centric, rather than institution-centric: determining whether the user has access to an item, given her personal subscriptions and library privileges.|
|Library address data||The Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen provides a linked dataset of library institutions that can be used for linking in different Linked Data scenarios.|
|Nearest physical collection||Linked Data technologies could help to provide a standard and more structured way to navigate through library catalogs to find specific items and their geolocation.|
The original wiki page created by the Incubator Group members, containing more details about this cluster, can be found at Cluster Social Uses.
Distributed publishing and aggregation is fundamental for the social use cases. Individuals take actions, which are then aggregated to provide larger collections (in the case of document publication) or better recommendations (in the case of behavior tracking).
In many cases, "social information" is the information of interest:
a) item-based information (circulation, catalog entries, etc.) — aggregated information about the item
b) user-based information
b-1) user-contributed information (comments, bibliographic entries, etc.)
b-2) user activity information that users create via their actions while using applications (activities such as buying a book, looking at a Web page). This includes information about the user in relation to the item.
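The aggregation of individual actions described above can be sketched as follows, assuming that libraries publish activity events as (item URI, action) pairs and that a loan count is an acceptable stand-in for popularity. The events, URIs, and action labels are hypothetical.

```python
from collections import Counter

# Hypothetical activity events published by two libraries: (item URI, action).
events_library_a = [
    ("http://example.org/item/1", "loan"),
    ("http://example.org/item/2", "loan"),
    ("http://example.org/item/1", "loan"),
]
events_library_b = [
    ("http://example.org/item/1", "loan"),
    ("http://example.org/item/3", "loan"),
    ("http://example.org/item/3", "view"),
]


def popularity(*event_sources, action="loan"):
    """Aggregate counts of one kind of action per item across all sources."""
    counts = Counter()
    for events in event_sources:
        counts.update(item for item, act in events if act == action)
    return counts


def rank(results, counts):
    """Order search results by descending popularity, then by URI for stability."""
    return sorted(results, key=lambda item: (-counts[item], item))


counts = popularity(events_library_a, events_library_b)
ranked = rank(
    ["http://example.org/item/3", "http://example.org/item/2", "http://example.org/item/1"],
    counts,
)
print(ranked)
```

Because the events refer to items by shared URIs, counts from different sources can be added together without any record matching step.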
Individual contributions to a larger whole are a major feature in the social use cases. Contributions can take a variety of forms, from "traditional" crowdsourcing, in which many individuals send contributions to a shared server, to multiperson collaboration in a shared space, or curation (such as Use Case Community Information Service) of community resources.
There is a similarity to distributed publishing/aggregation, but the perspective is different: focusing on the collaboration rather than on the publishing.
Many new uses require machine-readable data, which can be offered in a variety of formats, including Linked Data, RDFa, and microformats, and may be distributed via application programming interfaces (APIs).
|Case name||Short description|
|Community information service||A Linked Data approach to community information services could provide the data under an open license and, by using open standards, allow its re-use and lower the barrier to participation in information curation and sharing.|
|Crosslinking environment data and the library||The Federal Environment Agency, Germany (UBA) uses Linked Data to link its classification system with library data.|
|Crowdsourced catalog||Linked Data will help volunteers improve bibliographic data as well as create new bibliographic records.|
|Mendeley research networks for linking researchers and publications||The enhanced relationships provided by Linked Data will help researchers using the Mendeley bibliographic software find related research provided to the Mendeley database by others.|
|Open Library data||The Open Library is a large bibliographic database. Linked Data technologies can be used to easily reference specific manifestations in the Open Library database.|
|Ranking search results by popularity using circulation data||The publication of circulation data as Linked Data could improve the ranking of library catalog materials in search results.|
|SEO||Structured data from library catalogs is exposed so that library data is easier to find and ranks better in Web search engines.|
|Support this book button||Websites use a button which users can click to indicate interest in helping bring a book into the public domain. This distributed information is collected and centralized by Gluejar, which negotiates with publishers based on the interest in the book.|
|Social annotation||Linked Data enables users to describe and publish annotation metadata of the information they consume.|
|Peer to Peer bookswapping||Linked Data could help to investigate the creation of innovative lending services.|
|Publish data for lightweight application development||Exposing library data as Linked Data enables the development of lightweight third-party applications.|
|Social recommendations||Library online catalogs could provide users with recommendations or search rankings based on information available about item popularity and user activity.|
The International Federation of Library Associations and Institutions (IFLA) initiated a fundamental re-examination of bibliographic data to produce a framework that would provide a clear, precisely stated, and commonly shared understanding of what the bibliographic record aims to provide information about, and what one expects the record to achieve in terms of answering user needs. In addition, IFLA aimed to recommend a basic level of functionality and basic data requirements for records created by national bibliographic agencies. Linked Data techniques would allow these data and the concepts and relationships between them to be described as an information graph. Web standards can be utilized to satisfy the user's discovery needs.
Since 1975, the AGRIS (International Information System for the Agricultural Sciences and Technology) database has been aggregating and disseminating bibliographic references such as research papers, studies, and theses. The references include metadata on conferences, researchers, publishers, institutions, and subjects, and are cataloged from more than 150 participating institutions in more than 100 countries. The AGRIS Linked Data strategy focuses on two objectives: to institute AGRIS as a producer of Linked Data exploiting the semantic richness of the AGRIS data by creating an open RDF dataset in agricultural sciences; and to expose it to other Web services that can consume and link to AGRIS data.
Academic organizations of varying sizes (research groups, university departments, scholarly societies, special interest groups, etc.) have a strong interest in maintaining awareness and quality of information in their domain, and in openly publishing this information to the broader academic community and to the general public. A Linked Data approach could provide the data with an open license which allows its re-use for such purposes, and support the APIs, data standards and client software that would lower the barrier to participation in information curation and sharing.
Bibliothèque nationale de France (BnF) makes different kinds of resources available on the Web. Linked Data technologies could help the BnF to bring together data from several sources, with a scalable and interoperable data model, to improve the publication of resources in the online catalog as well as to align and link to other useful resources on the Web.
Matching algorithms in the library domain need reference data, yet current access to such reference data is limited. The application of Linked Data to library records could help to develop automated matching algorithms for library records so that ultimately only one record exists for each intellectual item. This would make it easier to identify the metadata for a resource and help with the deduplication of records.
The addition of Linked Data applications to library information systems creates a challenge for system architects on how to adapt legacy systems to make use of new Linked Data applications. The main question behind this issue is: how do libraries transition from pilot Linked Data applications to the use of Linked Data throughout the library information systems?
Libraries wish to convert legacy metadata to RDF triples for several reasons, including: taking advantage of systems and services which may emerge in the Semantic Web environment; encouraging increased usage of metadata and the corresponding resources; and contributing to the general sharing of metadata and the common good. The goal is to represent library legacy data as Linked Data, retaining as much data, utility and semantics as possible, as well as to enable its use in both traditional and innovative ways. To achieve this goal there is a need for appropriate vocabularies and element sets to be available as Linked Data as well as stable community-wide mappings for common legacy metadata formats to RDF classes and properties.
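The role of a stable field-to-property mapping can be sketched as follows. The mapping below is illustrative only: it pairs a few simplified MARC-style tags with Dublin Core terms and is not a community-agreed mapping. The record URI, the `260c` key notation, and the record layout are hypothetical. Fields without a mapping are reported rather than silently dropped, since the stated goal is to retain as much data as possible.

```python
DCT = "http://purl.org/dc/terms/"

# Illustrative mapping from simplified MARC-style tags to Dublin Core properties.
FIELD_TO_PROPERTY = {
    "245": DCT + "title",     # title statement
    "100": DCT + "creator",   # main entry, personal name
    "260c": DCT + "issued",   # date of publication (hypothetical key notation)
}


def to_triples(record_uri, legacy_record):
    """Convert a legacy record into triples; also return tags that had no mapping."""
    triples, unmapped = [], []
    for tag, values in legacy_record.items():
        prop = FIELD_TO_PROPERTY.get(tag)
        if prop is None:
            unmapped.append(tag)
            continue
        for value in values:
            triples.append((record_uri, prop, value))
    return triples, unmapped


# Hypothetical legacy record: tag -> list of values.
legacy = {
    "245": ["Moby-Dick"],
    "100": ["Melville, Herman"],
    "260c": ["1851"],
    "500": ["General note with no mapping in this sketch"],
}
triples, unmapped = to_triples("http://example.org/rec/1", legacy)
for triple in triples:
    print(triple)
print("unmapped tags:", unmapped)
```

In this sketch all values stay plain strings; replacing names and subjects with URIs from authority files or value vocabularies would be a separate step.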
The Open Library is a large bibliographic database (circa 25 million items) of metadata for books. Over one million ebooks are represented in the database, linked from the bibliographic data. The goal is to allow the user to link to the Open Library data with the least friction, e.g., to link to a specific work that then provides information about other manifestations of the same work that are available as a full text.
In Germany there is no central catalog of all holdings of German libraries. Academic libraries are organized in regional clusters, where a central catalog for all member libraries is administered by one of the members. With the application of Library Linked Data technologies to the regional central services, a German Central Catalog could be created more easily.
The purpose of the Pode project has been to use technology and external data to enrich the data in library catalog records, thereby creating a platform for developing end-user services with information and functionality that is not available in the current library Web search. This use case concentrates on converting library data to RDF, converting it to FRBRized library data, and linking data to individual instances in other linked Open Data datasets.
Polymath Virtual Library aims to bring together information, data, digital texts, and websites about Spanish, Hispano-American, Brazilian, and Portuguese polymaths from all times. The authors are the backbone of the system. The use of Linked Data will benefit the Polymath Virtual Library by making it easier to gather links from a variety of sources and to interact with an increasing number of sources. It will also increase the efficiency of the collection through semi-automatic enrichment of data, by obtaining URIs from sources available as Linked Open Data, and by offering its own data as Linked Open Data to enhance its visibility and use (mainly through aggregators such as Hispana or Europeana).
Talis Prism 3 is a next-generation OPAC/search and discovery interface. There is a need to offer a rich interface to surface the large volume of content available in libraries. Browsing by entities such as author, subject, and series is important, as is the reliable extraction of data from MARC 21 into a Linked Data model. Prism 3 is powered by the Talis Platform, a hosted Linked Data service which offers both SPARQL querying and powerful full text search capabilities.
The AuthorClaim registration service aims to link scholars with the records about the works that they have written, as recorded in a bibliographic database. The application contributes to the identification of authors. In the application scenario, document metadata records are classified by subject experts. Each expert makes a binary decision on whether a document belongs to a category or not. The resulting document collection forms an issue of a subject report. Linked Data can be used to further generalize the basic application and encourage the re-use of the application's results.
Authority control is the practice of creating and maintaining authority data for bibliographic entities. Authority control enables catalogers to disambiguate resources with similar or identical characteristics as well as collocating resources that logically belong together. Linked Data could enable the re-use of external data sets by linking instead of copying and merging.
The objective of the FAO Authority Description Concept Scheme is to provide more efficient management of the several multilingual forms of a concept through the use of URIs and the assignment of relationships between concepts. Its benefits include providing efficient system searching and exhaustive search results. It also improves access dramatically by providing consistency in the forms used to identify the different entities.
One of the main pillars of scientific information retrieval is authors' names. The inconsistency problem in author name forms has been around for many years, but its negative effect is increasing due to the great number of people publishing research studies and papers. IraLIS works to make authors aware of the need to always sign in the same way, to register name variants they have used and to create an IraLIS signature that allows a suitable and unconfused signature recognition. Linked Data technologies could help IraLIS by creating specific URIs for each author.
In Germany, authority data is collected and maintained collaboratively. This data, as well as the German National Library’s bibliographic data, is relevant to many libraries and other cultural heritage institutions. Linked Data provides a suitable framework for publishing relevant data at the German National Library and linking it to other data sources of interest.
The goal of the VIAF Project is to facilitate research across languages anywhere in the world by making authorities truly international. VIAF combines the name authority files of a number of institutions into a single name authority service. As of the fall of 2009 there are 18 personal name authority files from 15 organizations participating in VIAF. The VIAF Linked Data approach provides useful experience and knowledge on how to apply Linked Data principles to authority records.
The AGROVOC Thesaurus of the Food and Agricultural Organization of the UN in Rome (FAO) is a multi-lingual SKOS concept scheme of terminology in agriculture, forestry, fisheries, food, and related domains such as the environment. These concepts are used to tag and discover research results across multiple languages; their expression as Linked Data helps FAO create explicit equivalences between AGROVOC terms and terms in agricultural vocabularies maintained by other organizations.
In the library community, there exist different thesauri for annotating entries in library catalogs. A user should be able to browse and search in several library catalogs in parallel with the keywords from any of the thesauri used. It is important that there exist mappings between the different thesauri and categorization systems. Providing these as linked open data makes it easier to integrate further thesauri or categorization systems into the network of interlinked thesauri, thus facilitating parallel search across different catalogs.
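As a rough sketch of how such mappings could support parallel search, the example below expands a concept from one thesaurus to its equivalents in others and then searches several catalogs at once. All URIs, catalog names, and records are hypothetical. The links are treated like `skos:exactMatch`, which SKOS defines as symmetric and transitive; weaker mapping relations would not justify the same expansion.

```python
from collections import defaultdict, deque

# Hypothetical exactMatch-style links between concepts of three thesauri.
MAPPINGS = [
    ("http://example.org/thesA/water", "http://example.org/thesB/h2o"),
    ("http://example.org/thesB/h2o", "http://example.org/thesC/wasser"),
]


def build_index(mappings):
    """Store every mapping in both directions (the relation is symmetric)."""
    index = defaultdict(set)
    for a, b in mappings:
        index[a].add(b)
        index[b].add(a)
    return index


def equivalents(concept, index):
    """All concepts reachable through mapping links, including the start."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def parallel_search(concept, catalogs, index):
    """Return ids of items, in any catalog, indexed with an equivalent concept."""
    wanted = equivalents(concept, index)
    return sorted(item
                  for catalog in catalogs.values()
                  for item, subjects in catalog.items()
                  if wanted & subjects)


# Two hypothetical catalogs, each indexed with a different thesaurus.
CATALOGS = {
    "catalogA": {"book1": {"http://example.org/thesA/water"}},
    "catalogC": {"book2": {"http://example.org/thesC/wasser"},
                 "book3": {"http://example.org/thesC/feuer"}},
}
hits = parallel_search("http://example.org/thesA/water", CATALOGS,
                       build_index(MAPPINGS))
```

Adding a further thesaurus to this network only requires new mapping pairs; the search code itself does not change.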
Civil War Data 150 ("CWD150") is a collaborative project to share and connect American Civil War related data across local, state and federal institutions during the four-year sesquicentennial commemoration of the Civil War, beginning in April of 2011. By aggregating these diverse data sources and performing vocabulary alignment to an ontology specific to the American Civil War but applicable to a broader military schema, it becomes possible to query information about a particular place, regiment, battle, or officer. CWD150 will use Linked Data technology to create connections based on the strong identifiers and taxonomy of the Civil War, particularly the regiments, battles, battlefields, officers, soldiers, and sailors.
Creators of metadata use a variety of methods to encode or reference entities associated with the resource described, e.g., names, titles, subjects, or geographic names. The goal is to allow metadata to link to established vocabularies. Linked data technology may be used to achieve this, by assigning URIs to vocabulary terms. URIs are assigned to vocabulary terms in the controlled vocabularies and metadata descriptions at external systems use the URIs to reference those terms.
Language Technology is applied in areas such as machine translation, automatic summarization, (Web) search or spell checking. To estimate the usefulness of library Linked Data for language technology, it is important to concentrate first on one specific use case. This will be named entity recognition (NER) within a single language and potentially across languages. A traditional approach towards NER is the application of a gazetteer, i.e., a dictionary with information about places, people, and institutions. This approach has the drawback that it is hard to keep the gazetteer up-to-date. Another problem is the sustainable creation of gazetteers across languages. Linked data could help to solve the two problems of NER ("keeping up to date" and "bridging across languages").
Traditionally, subject heading systems are a way to standardize the names of things, typically concepts. Typical library practice is to store subject headings as text strings in bibliographic records and represent those records on the Web in HTML where they can be indexed by Web search engines. Linked Data principles would help libraries use subject heading systems more effectively for Web discovery and re-use by using Hypertext Transfer Protocol (HTTP) URIs and OWL to identify and deliver better-modeled resources for consolidated use by humans, machines, and semantic agents.
Library users expect single point-of-search in consortial resource discovery services involving multiple organizations and large-scale metadata aggregations. Users also expect to be able to search for subjects using their own language and terms in an unambiguous, contextualized manner. Linked Data technologies could provide the underlying infrastructure by semantic mapping or merging of concepts across vocabularies. The use case discusses several methods of Linked Data vocabularies federation.
The first Linked Data principle says: "Use URIs as names for things". However, in order to avoid inconsistencies and tight coupling, names should be systematically managed and be rationalized in an adaptable conceptual model that is based on use cases and managed with sensible meta-model languages. The Web Ontology Language (OWL) and the Unified Modeling Language (UML) are two such languages that could be content-negotiated in either direction to manage and represent a common domain model. This use case illustrates how UML class diagrams can be used to explore, re-use, and design OWL-based data. The UML community has developed the Ontology Definition Metamodel (ODM) to help bridge this gap.
The Flemish Archipel Project is focused on providing access and long-term archiving of digitized material from a diverse set of memory institutions. Libraries, archival institutions, the art sector (museums), and broadcasters contribute their content to a network of repositories. One of the challenges is the domain-specific metadata models that are used in each sector. The availability of such domain-specific metadata vocabularies and models expressed as Linked Data would benefit the project by allowing better formalized and tested mappings.
Preservation of digital objects in the long term is a challenging activity which is not limited to storage and backup: it involves complex strategies aimed at providing a trusted environment where digital objects can evolve along with the changes in technology and hardware and software environments. Linked data provides a global environment for describing the objects and their significant properties. This environment reduces duplication of effort when describing resources and their attributes, and fosters the creation of a global information graph encompassing all the information needed to perform complex queries and actions.
Europeana provides a service that links archives, libraries, museums, and audio-visual material across Europe. It aggregates metadata from various cultural heritage providers. It provides a unified way to search various object collections using that metadata through a Web portal or an API. It aims to increase re-use of and reference to the digitized objects it refers to. Linked Data can help Europeana enhance semantic interoperability between metadata models, enrich existing metadata, improve data objects and link harvesting, enhance search processes, and provide easy access to metadata by third parties.
The Archives Hub is a national service that provides a wealth of rich interdisciplinary information about archives held across the UK. The LOCAH Project is investigating the creation of links between the Hub and other data sources including DBPedia, BBC, LCSH, and others. User studies and log analyses indicate that Archives Hub users frequently search laterally through the descriptions. Linked Data is a way of vastly expanding the benefits of lateral search, helping users discover contextually related materials by creating links between archival collections and other sources that are often widely dispersed.
Photo collections are popular material on the Internet. Institutes that are doing long-term preservation of photographic collections need new tools and procedures for presenting the images. Often they have several variably linked databases describing the collections from different angles (physical descriptions of materials and conservation, content, agents, contracts, and intellectual property rights, etc.). Linked Data approaches seem to provide a good solution for the technical problems caused by heterogeneity of data and data sources.
Many radio stations have archives of audio programming going back many years. In many cases they are not digitized and have little or inconsistent metadata. Current practice for metadata creation and transcription is often ad-hoc, conforming in various degrees to established library methods. Linked Data would enable cross-references to other events (particularly valuable where the audio in question is a news broadcast) and federated searching both on these cross-references and generally, adding value to the digitization process as well as increasing the potential impact of the results.
The National Digital Information Infrastructure and Preservation Program at the Library of Congress is an initiative to develop a national strategy to collect, archive, and preserve the burgeoning amounts of digital content for current and future generations. These diverse collections are held in the dispersed repositories and archival systems of over 180 partner institutions where each organization collects, manages, and stores at-risk digital content according to what is most suitable for the industry or domain that it serves. Linked data technology is used in Recollection as a basic platform for librarians and curators exposing collections to the Web, and as a source of data to augment these collections. Potential users of the information can more easily discover and analyze this data in a variety of new ways as a result.
The Ontology of Cantabria's Cultural Heritage aims to bring together knowledge about the heritage of Cantabria (a region in the north of Spain) in each and every one of its aspects: from the industrial to the archaeological heritage, from the scientific and cultural heritage to ethnographic information; to each and every one of the manifestations of this rich heritage, its agents, works, events, and historical periods. These data come from heterogeneous sources such as official publications, monographs, articles, catalogs, inventories, encyclopedias, databases, Web sites, archaeological sites, or digital objects. The publication of these data as Linked Data can result in a considerable amount of quality information on Cantabria's cultural heritage being added to the Linked Open Data cloud, thus increasing its visibility and usability.
In some scientific disciplines there is a growing trend to make available supporting data alongside journal publication of research. These data can either be stored and curated by the journal, or in discipline-specific repositories where these exist. Currently, there is no de facto method for citing the data made available. Linked data can provide guidance on how to assign identifiers, how to link data, descriptions, publications, and contributors or authors through vocabularies. As a result, automated systems could be able to extract data to compute measures of credit and to allow the making of connections, particularly to the same person. Additionally, human users could be able to view descriptions of the data by navigating from publications, and then access the data.
Libraries and other repositories store scientific research papers in journals, dissertations, reports, memoranda, and book chapters. Papers consist mainly of text, tables, and illustrations. Also, a limited number of pages is available for each paper. These constraints make it difficult to include original data, video, more and larger images, etc., in the paper. Enhanced publications aim to make it possible to include not only the paper but also the underlying data, models, algorithms, illustrative images, metadata sets, or post-publication data such as comments or rankings. The application of Linked Data to this scenario could help to create a collection of research materials that belong together and form a logical whole in the research process, in order to make research comparable and more transparent.
The Bibliographica project aims eventually to capture much richer relationships between works and authors than would normally be available in library data. The eventual aim is to capture the evolution of thought and scholarly debate by reference to written works and other publications and their authors. Such evolution and debate could be represented as a directed graph and published under the principles of Linked Data.
Currently a student wishing to discover all of the material — books, DVDs, CDs, TV programmes, Podcasts, Open Educational Resources, etc. — related to a specific Open University course would have to consult a variety of data sources, each with a different system and interface, for each type of resource required. They would then need to analyze their results and integrate them manually. In a similar scenario, the same resources are needed by lecturers in creating new courses or tutorials, as well as by researchers in connecting the result of their research to existing resources. The LUCERO (Linking University Content for Education and Research Online) Project at the Open University is investigating and prototyping the use of Linked Data technologies and approaches to linking and exposing data for students and researchers.
Digital text repositories produce, store, and deliver digital resources containing textual data with varying amounts and coverage of appropriate metadata. Essentially, such repositories, unlike most libraries, actually produce new Manifestations (in FRBR terms), namely the digital editions of the texts they prepare. Ideally, digital text repositories would not have to define FRBR Work and Expression entities at all; they would be able to refer to those entities defined elsewhere. Linked Data could allow digital text repositories to harness large metadata- and authority-providers, as well as other data providers, to describe their holdings, by facilitating the tasks of discovery, selection, and semantic linkage.
Libraries and other repositories store scientific research papers in journals, dissertations, reports, memoranda, and book chapters. Papers consist mainly of text, tables, and illustrations. Also, a limited number of pages is available for each paper. These constraints make it difficult to include original data, video, more and larger images, etc., in the paper. Enhanced publications aim to make it possible to include not only the paper but also the underlying data, models, algorithms, illustrative images, metadata sets, or post-publication data such as comments or rankings. The application of Linked Data to this scenario could help to create a collection of research materials that belong together and form a logical whole in the research process, in order to make research comparable and more transparent.
Editing Reports on New Academic Documents (Ernad) is essentially a piece of software implementing a protocol called the Altai paper. The aim of the design of Ernad was to help editors of a subject category to decide whether documents fit into the subject category or not. A running Ernad program powers the NEP: New Economics Papers service of the RePEc digital library. Linked Data technologies could help to provide a more general specification that generalizes the process of a learning classification system that is heavily dynamic. They could also provide a basis for a standardized representation of the output of the services, so that it is easier to import that output into related information services.
The National Digital Newspaper Program (NDNP) is a partnership between the National Endowment for the Humanities (NEH), the Library of Congress (LC), and state projects to provide enhanced access to United States newspapers published between 1836 and 1922. Chronicling America is a Web application that allows users to search and view more than 2.5 million digitized pages as well as consult a national newspaper directory of bibliographic and holdings information for 140,000 newspapers, to identify newspaper titles in all types of formats. Linked Data principles provided the foundation for the design of the Chronicling America identifier space. The goal was to enable interested parties to extract data out of the Web application to use in their own environments. Specifically, the Web application was designed to mint URLs for each newspaper title, issue, and page.
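A hierarchical identifier space of this kind can be sketched as follows. The host name, path layout, and function names are assumptions chosen for illustration and are not necessarily the layout the application itself uses; the point is only that each page URL extends its issue URL, which in turn extends its title URL.

```python
from datetime import date

BASE = "http://example.org"  # hypothetical host, not the real service


def title_url(lccn):
    """URL for a newspaper title, keyed by its control number."""
    return "%s/lccn/%s/" % (BASE, lccn)


def issue_url(lccn, issued, edition=1):
    """URL for one issue: the title URL plus ISO date and edition number."""
    return "%s%s/ed-%d/" % (title_url(lccn), issued.isoformat(), edition)


def page_url(lccn, issued, sequence, edition=1):
    """URL for one page: the issue URL plus the page's sequence number."""
    return "%sseq-%d/" % (issue_url(lccn, issued, edition), sequence)


# Hypothetical control number and date.
example_page = page_url("sn00000000", date(1900, 1, 2), 3)
```

Because the URLs nest, a client holding a page URL can derive the issue and title URLs by trimming path segments, without consulting a lookup service.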
The digital version of the Cartographic Collection of the National Library of Latvia (NLL) contains scans of historical maps from the 16th to the 18th century. Currently maps are presented as HTML index pages pointing to PDFs of scanned maps. Index pages contain semi-structured textual information about these maps. Maps and other objects are given HTTP URIs so that they can be globally referenced. Map metadata can be made available as Linked Data by using RDFa or by using HTTP 303 redirects to point to various representations of resources and their metadata. Maps are linked to other related resources (e.g., place names or authority data about persons) which may also be expressed as Linked Data.
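The 303 redirect pattern mentioned above can be sketched as a small routing function. The `/id/`, `/doc/`, and `/data/` path prefixes and the function name are assumptions for illustration, and the `Accept` handling is deliberately simplistic (a substring test that ignores quality values), so this is a sketch rather than a complete content-negotiation implementation.

```python
def resolve(path, accept="text/html"):
    """Return (status, headers) for a request to a map identifier.

    A URI under /id/ names the map itself (a non-information resource), so
    the server answers 303 See Other and points to a document about it.
    """
    if not path.startswith("/id/"):
        return 404, {}
    local = path[len("/id/"):]
    if "application/rdf+xml" in accept:
        location = "/data/" + local + ".rdf"   # machine-readable metadata
    else:
        location = "/doc/" + local + ".html"   # human-readable index page
    return 303, {"Location": location, "Vary": "Accept"}


# A browser and an RDF client asking about the same hypothetical map.
browser = resolve("/id/map/42")
rdf_client = resolve("/id/map/42", accept="application/rdf+xml")
```

Both clients use the same identifier for the map; only the representation they are redirected to differs.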
The 20th Century Press Archives of the German National Library of Economics (ZBW) is a large collection of newspaper clippings about persons, companies, subjects, and products, extending from 1826 to 2005, organized in thematic folders. For parts of the collections, metadata (like source and date of an article or name and location of a company) is available (solely in German). OAI-ORE provides the backbone for organizing the large and deeply nested aggregations of data. On every level of aggregations, it provides access to the aggregated resources. Search results are represented as dynamically built ORE aggregations. The aggregations are described by RDFa resource maps and enriched with links to other Linked Data datasets.
The AuthorClaim registration service aims to link scholars with the records about the works that they have written, as recorded in a bibliographic database. The application contributes to the identification of authors. In the application scenario, document metadata records are classified by subject experts. Each expert makes a binary decision on whether a document belongs to a category or not. The resulting document collection forms an issue of a subject report. Linked data can be used to further generalize the basic application, and encourage the re-use of the application's results.
Collection-level description consists of metadata pertaining to a collection as a whole, in contrast to item-level description (manifestation description in terms of FRBR) which pertains to the individual members of a collection. The definition and publication of such collection descriptions as Linked Data, coming from multiple and heterogeneous sources, and its linkage to other datasets (geo, universities, commodities, etc.) could provide a rich but diverse set of services that can be customized by service providers and end users.
Academic organizations of varying sizes (research groups, university departments, scholarly societies, special interest groups, etc.) have a strong interest in maintaining awareness and quality of information in their domain, and in openly publishing this information to the broader academic community and to the general public. A Linked Data approach could be combined with an open license which allows its re-use for such purposes, and support the APIs, data standards, and client software to lower the barrier to participation in information curation and sharing.
Finding a digital copy of an item can be complex. Availability and access to items varies based on library subscription holdings, geographic rights restrictions, and national copyright laws. Individuals may have rights to multiple library collections which are sometimes supplemented by personal subscriptions and personal memberships. Linked Data technologies could support making access to digital resources user-centric, rather than institution-centric: determining whether the user has access to an item, given her personal subscriptions and library privileges.
Libraries are also resources in the Semantic Web sense of the term. They need to be identified by URIs just as books and authors are identified. The HBZ (Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen) has created a Linked Data set of library institution identifiers that can be used for linking in different Linked Data scenarios such as linkage of holdings data to bibliographic data or mobile applications based on geolocation that provide opening hours and information about the building or the library staff.
A search for bibliographic resources in a union catalog may identify a specific physical manifestation that is exemplified by multiple items held in more than one collection. It is useful to determine which of these collections is the nearest to a specified location, for convenience and to shorten the time taken to obtain access to one of the items. Current practice in recording the locations of libraries and other information service organizations varies widely. Linked Data technologies could help to provide a standard and more structured way to navigate through the catalog to find specific items and their geolocation.
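If holdings were linked to library descriptions carrying coordinates, picking the nearest collection would reduce to a distance calculation. The sketch below assumes hypothetical library URIs and approximate coordinates, and uses the haversine great-circle formula with a mean Earth radius of 6371 km; it ignores travel routes and opening hours.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_holding(user_lat, user_lon, holdings):
    """Return the holding whose library is closest to the user's location."""
    return min(holdings,
               key=lambda h: haversine_km(user_lat, user_lon, h["lat"], h["lon"]))


# Hypothetical holdings of one manifestation, with approximate coordinates.
HOLDINGS = [
    {"library": "http://example.org/library/cologne", "lat": 50.94, "lon": 6.96},
    {"library": "http://example.org/library/berlin", "lat": 52.52, "lon": 13.40},
]
closest = nearest_holding(50.73, 7.10, HOLDINGS)  # a user near Bonn
```

The calculation itself is simple; the hard part the use case points to is getting library locations recorded in a consistent, structured form in the first place.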
Academic organizations of varying sizes (research groups, university departments, scholarly societies, special interest groups, etc.) want to curate and re-publish information to promote the awareness and quality of information in their domain, and to publicize it to the broader academic community and to the general public. What is needed is a simple and easily replicable architecture to enable community data curation services of various sizes to develop and interoperate. A Linked Data approach could help in several ways, such as providing machine-readable licenses (e.g. to allow matching against various open licenses which allow different levels of re-use), and supporting appropriate APIs, data standards and client software that would lower the barriers to participation in information curation and sharing.
Linked Data technology provides the means to cross-link bibliographical information with other related data. The Federal Environment Agency of Germany (UBA) has a long tradition of knowledge organization combining library resources with many Web-based information systems that present observation data and results of analyses. Until now, the library and the data representations have been kept separate. Using Linked Data, their classification system — enhanced by a reference vocabulary which consists of a thesaurus, a gazetteer, and a chronicle — can be interlinked to library data. UBA plans to cross-link bibliographical information with related environmental observation data, and to link each with the reference vocabulary.
Linked Data technologies could help supply bibliographic data to volunteers for the purpose of improving and enhancing records or even creating new records. Such initiatives could benefit the local catalog and in addition the resulting records could be designated as Open Data and therefore be available for wider use.
In the scientific publication process, a print publication is placed in a separate context from the data and links that were in the researcher's information environment. Mendeley proposes to add context back to publications through crowd-sourced social and attention-based metadata as well as with algorithmic approaches to linking documents. These functions will make use of Linked Data and graph analysis of links (e.g. of readership, co-citation, etc.).
The Open Library is a large bibliographic database (approximately 25 million items) with metadata for books and linked full-text for over one million digitized books. The system clusters different editions of the same work, and thus makes it easy to find a digitized edition of the work if one exists among the editions. Each property and entity in the database has a URI, and APIs provide export of entities (works, editions, and authors) in RDF/XML.
Library catalogs have limited success in ranking search results. Better ranking could be obtained if libraries could use circulation data gathered from many libraries to determine which item is the most popular. If libraries would publish their circulation data as Linked Data and link it to bibliographic data, the data could be loaded into a triple store and made searchable with SPARQL.
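The ranking idea can be illustrated without a real triple store. The sketch below keeps triples in a plain Python list and counts loans per item; the namespace, property names, and data are hypothetical, and the `match` helper is only a stand-in for a triple store's pattern matching. The comment shows a SPARQL 1.1 query that would express roughly the same aggregation.

```python
from collections import Counter

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical circulation data pooled from several libraries.
TRIPLES = [
    (EX + "loan/1", EX + "item", EX + "book/a"),
    (EX + "loan/2", EX + "item", EX + "book/b"),
    (EX + "loan/3", EX + "item", EX + "book/b"),
    (EX + "loan/1", EX + "library", EX + "lib/x"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def rank_by_loans(triples):
    """Items ordered by descending loan count, ties broken by URI.

    Roughly equivalent SPARQL 1.1:
      PREFIX ex: <http://example.org/>
      SELECT ?item (COUNT(?loan) AS ?loans)
      WHERE { ?loan ex:item ?item }
      GROUP BY ?item ORDER BY DESC(?loans)
    """
    counts = Counter(o for _, _, o in match(triples, p=EX + "item"))
    return sorted(counts, key=lambda item: (-counts[item], item))


ranking = rank_by_loans(TRIPLES)
```

A catalog could then use such a ranking as one signal among others when ordering search results.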
One of the key concerns of libraries is to make their data searchable through Web search engines. Yet library catalogs have been conceived as data silos. Libraries are aware that they need to make their data more accessible on the Web, both by adopting an architecture that is compatible with web crawling by bots, and by optimizing the available content so that search engines can process it efficiently. The Web's major actors have demonstrated an interest in the use of RDFa and other structured metadata formats to improve the presentation of data in results pages or other tools like social network pages. Adding such structured data to library online catalogs could increase the visibility and accessibility of their data.
A mechanism similar to a Facebook "Like" button, the "Support this book" button would connect pages from library online catalogs and library related websites to the Gluejar catalog, where these "votes" indicate support for releasing a CC-licensed version of the book as an ebook. When a user clicks a "support" button, the system would collect metadata from the webpage of interest. RDFa/BO+DC would be a likely format for such metadata. The system would take the metadata and use it to link to a supported work or list of supported works in the Gluejar catalog. Linked Data technologies could speed up the implementation and adoption of such a system.
People often create annotations for resources they are reading, like books or research papers. In the past these annotations were recorded on paper but increasingly they are being created electronically and published online. Linked Data enables users to describe and publish annotation metadata so that it can be reused by others.
Currently, books are usually shared by people who know each other. However, there are some mechanisms for organized sharing and exchange of books by individuals outside of libraries. Sharing ebooks with unlimited distribution (such as those under CC and GNU licenses, or those in the public domain) is as easy as sharing the URI. Opportunities and options for legally sharing restricted ebooks are more limited. The exploration of the limits of existing ebook lending options could lead to improved business models that could result in an increased sharing of culture. Linked data technologies would allow the use of machine-readable CC-like licenses and would permit aggregation of data from distributed sources like social networks or individual Web pages.
Currently, data resources and APIs are rarely exposed by library services, and may require special permissions. Linked data technologies and principles can help to expose data to be used in third-party applications. Users can then provide their own applications on top of the data, which can be shared and disseminated without fear of the data disappearing or of rights issues.
Current library catalogs allow users to search library holdings by keywords or catalog field values and to sort search results by some of these fields (e.g., a title and a year). They usually do not offer ranking of search results by relevancy to the user. Library online catalogs could provide users with recommendations or search rankings based on information available about item popularity and user activity. In this scenario, Linked Data could be used to connect social data representing ranking and recommendations to the search application.
Credit is due to the following Library Linked Data Incubator Group members for their work on organizing the use case survey, clustering the use cases gathered and describing the resulting clusters: Mark van Assem, Asaf Bartov, Emmanuelle Bermès, Uldis Bojārs, Karen Coyle, Ray Denenberg, Monica Duke, Gordon Dunsire, Kai Eckert, Alexander Haffner, Antoine Isaac, Martin Malmsten, András Micsik, Tod Matola, Peter Murray, Joachim Neubert, Michael Panzer, Felix Sasaki, Jodi Schneider, Anette Seiler, Ed Summers, Bernard Vatant, William Waites, Stu Weibel, Jeff Young, and Marcia Zeng.
We also gratefully acknowledge contributions of individual cases from the community: Xavier Agenjo, Tomas Baiget, Thomas Bandholtz, Sam Coppens, Klaas Dellschaft, William Gunn, Jan Hannemann, Eric Hellman, Francisca Hernández, Patrick Hochstenbach, Anne Isomursu, Phil John, Leslie Johnston, Johannes Keizer, Thomas Krichel, Kathy MacDougall, Steven Morris, Guenter Muehlberger, Jim Pitman, David Smith, Owen Stephens, Adrian Stevenson, Jane Stevenson, Maurice Vanderfeesten, Jon Voss, Romain Wenz, and Anne-Lena Westrum.
We also gratefully acknowledge information Owen Stephens provided about the JISC Open Bibliographic Data Guide, as well as the data guide itself, which provided the basis for several cases.