Draft Benefits

From Library Linked Data

FROZEN -- EDITING CONTINUES AT Benefits

Draft

This page is a draft for the final report intended to capture the high level benefits of Library Linked Data.

Scope of "Library Linked Data"

To be written by the "Problems and Limitations" group (Karen, Tom, Jodi, Gordon, Peter)

  • define (or reference) what we mean by "library": oriented to being inclusive of archives, museums, online collections, etc.
  • characterize the types of library data that could be shared: authority data, controlled vocabularies, catalog data, metadata schemas (value vocabularies, element sets); link to the full lists of available data sets elsewhere in the report.

Benefits of Library Linked Data

General

  • Natural extension of collaborative sharing models in libraries, museums and archives
  • capitalize on collaboration between cultural heritage organizations (not just libraries), which is especially important in a time when there are dwindling resources.
  • Linked Data provides "Infinitely expandable description" <- this is a bit of a combo. Librarians would be attracted to this (I hope), but it's of interest to "Researchers" and "Organizations", as well
  • How do we get there from here?
  • Linked Data is based on a grammar for "meaningful" statements, not just on the description of a document format (like XML) or a database structure.
  • Use of URIs. URIs are the best method we currently have for citing and cross-referencing things across silos and for anchoring identifiers for things, concepts, and predicates in a regulated ownership and maintenance context -- "footnotes" for the Web.
  • Note that the main benefits of Linked Data will be "under the hood": not directly visible to end users except in the form of improved searching through following rich linkages between resources.
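
The points above about statements, URIs, and following linkages can be illustrated with a minimal sketch. It assumes a statement is modeled as a plain Python tuple (subject, predicate, object); every example.org URI and the `follow` helper are hypothetical and do not come from any real dataset or library.

```python
# Each statement is a (subject, predicate, object) triple.
# URIs identify things and predicates; all URIs here are hypothetical placeholders.
EX = "http://example.org/"

triples = {
    (EX + "book/1", EX + "vocab/author", EX + "person/42"),
    (EX + "book/1", EX + "vocab/subject", EX + "topic/poetry"),
    (EX + "person/42", EX + "vocab/birthPlace", EX + "place/7"),
}


def follow(start, statements):
    """Return every resource reachable from `start` by following links."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for subject, _, obj in statements:
            if subject == node and obj not in seen:
                seen.add(obj)
                todo.append(obj)
    return seen


reachable = follow(EX + "book/1", triples)
```

Starting from the book, the traversal reaches the author, the subject, and (through the author) the birth place, which is the kind of "under the hood" linkage a search service could exploit.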

Librarians, Archivists, Curators

  • Emerging role as data curators: http://weibel-lines.typepad.com/weibelines/2011/03/principles-of-linked-data-recast.html
  • Cross-domain resource sharing: librarians aren't biographers/genealogists, for example, and should be able to piggyback on the work of domain experts (do one thing and do it well)
  • Identifiers for library entities: works, people, places, subjects, etc
  • "Open world assumption": not everything needs to be known upfront. Say what you know, add more as it's discovered. Allow others to say what they know too. Good for both Librarians/Archivists/Curators and Researchers/Students/Patrons?
  • Associations between the bibliographic description and other "library value add": subject guides/course guides, etc.
  • "producers of data" (Antoine)

Researchers, Students, Patrons

  • integration with other segments of the web: Wikipedia, Geonames, Freebase/Google, Musicbrainz (benefits for patrons/researchers: find library resources. Benefits for institutions: raise the visibility of library resources). This becomes obvious in the OCLC report "Online Catalogs: What Users and Librarians Want" [1], where the ability to get, e.g., more subject or author information was among the most helpful changes that could be made to an online catalogue. With Linked Data, libraries don't need to curate this extra information themselves.
  • reference and citation management: resources on the web can be referenced with URLs, tools like Zotero, Mendeley, etc benefit from having metadata on the web
  • search engine optimization: search tools like Google Scholar and Facebook rely on metadata expressed and referenced in HTML.

Developers (both inside and outside institutions)

  • cross-domain technical solution, compared with MARC, EAD, etc
  • mechanics for expressing the relationships between resources
  • RDF provides a consistent data model, allowing us to actually remix domain models and instances, and to evolve vocabularies
  • web scale information retrieval
  • resources can be created/presented in the language of the searcher, without the need for an entirely different "record" (i18n)
  • managing change: JPW's blog post: Open Bibliographic Data: How Should the Ecosystem Work? http://blog.okfn.org/2010/11/29/open-bibliographic-data-how-should-the-ecosystem-work/
  • provenance at the statement level: RDF/DC folks are working on it (named graphs), "everyone can say anything about everything"
  • the web and linked data offer technical solutions for the integration of data stuffed away in "silos"
  • vocabulary evolution: mixability of vocabularies
  • statement oriented rather than record oriented
  • leveraging the scalability and ubiquity of the Web as a globally distributed publishing platform: web architecture.
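
The "statement oriented" and "provenance at the statement level" points can be sketched as follows. This is only an illustration under stated assumptions: a named graph is modeled as a fourth tuple element naming the source, and the example.org URIs, the graph names, and the `sources_by_statement` helper are hypothetical.

```python
from collections import defaultdict

# A quad is (subject, predicate, object, graph).
# The graph name records which source made the statement (hypothetical URIs).
EX = "http://example.org/"

quads = [
    (EX + "person/42", EX + "vocab/name", "Otto Mueller", EX + "graph/library"),
    (EX + "person/42", EX + "vocab/birthYear", "1874", EX + "graph/library"),
    (EX + "person/42", EX + "vocab/birthYear", "1874", EX + "graph/museum"),
]


def sources_by_statement(quads):
    """Map each (subject, predicate, object) statement to the set of graphs asserting it."""
    result = defaultdict(set)
    for subject, predicate, obj, graph in quads:
        result[(subject, predicate, obj)].add(graph)
    return dict(result)


provenance = sources_by_statement(quads)
```

Because provenance is attached to individual statements rather than to a whole record, a consumer can see that two sources agree on the birth year while only one supplies the name.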

Organizations

  • identifier focus (as opposed to all-encompassing 'records') allows the description to be tailored to the various interested communities, while still ensuring that everyone is talking about the same thing
  • Linked Data builds on the digitization work that has put resources on the web: American Memory, HathiTrust, OpenLibrary, Archival Finding Aids, Text Encoding Initiative. Create better links between the resources themselves and their machine-readable metadata.
  • ROI on metadata generation (Eric Miller's presentation: http://wikis.ala.org/annual2009/index.php/Grassroots_Programs)
  • open world as an opportunity and not a threat
  • opportunity to clarify licensing issues.
  • JISC's Open Bibliographic Data Guide http://obd.jisc.ac.uk/ has lots of practical examples of the benefits of openness. Perhaps we could reuse some of them.
  • deduplication and convergence
  • Bottom Up instead of Top Down
  • Unanticipated re-use of library data: "the best thing to do with your data/idea will be thought of by someone else" (JISC) in a machine processable way <- seems "Developers", but the actual immediate beneficiary is the library
  • Integration with cultural heritage community: museums, archives, galleries, audiovisual archives... Example: the German museum portal [2] uses the DNB linked data service to pull in information about, e.g., artists and then piggy-backs on the DBPedia links in there to pull in Wikipedia information in several languages, including images. Cf. [3], select the _info_-link at the right of the text "Otto Mueller (1874-1930)"
  • Linked Data is useful in the enterprise, where information is not completely open.
  • profit and non-profit collaborations

Rough Notes From Skype Call (March 28, 2011)

Present: Emmanuelle Bermes, Kai Eckert, Corey Harper, Richard Light, Ross Singer, Ed Summers

Benefits for Whom?

  • librarians: catalogers, administrators
  • archivists
  • curators
  • researchers, students, general public
  • developers
  • remixers
  • both closed and open (enterprise and open web)

Social Benefits

  • Natural extension of collaborative sharing models in libraries, museums and archives
  • Integration with cultural heritage community: museums, archives, galleries, audiovisual archives...
  • Integration with other segments of the web: Wikipedia, Geonames, Freebase/Google, Musicbrainz (benefits for patrons/researchers: find library resources. Benefits for institutions: raise the visibility of library resources)
  • Unanticipated re-use of library data: "the best thing to do with your data/idea will be thought of by someone else" (JISC) in a machine processable way
  • Publishing of actual library resources as well as metadata about resources: institutional repositories, HathiTrust, OpenLibrary, Archival Finding Aids, Text Encoding Initiative. Create better links between the resources themselves and their machine-readable metadata.
  • An opportunity to clarify licensing issues. JISC's Open Bibliographic Data Guide http://obd.jisc.ac.uk/ has lots of practical examples of the benefits of openness. Perhaps we could reuse some of them.
  • Bottom Up instead of Top Down
  • webscale information retrieval
  • Adds value to the individual components within specialized collections (archives, etc.): e.g. reuse the digitized image/OCR of a Seamus Heaney manuscript that was otherwise obscured in an archival finding aid. Chronicling America is another good example.

Technical Benefits

  • leveraging the scalability and ubiquity of the Web as a globally distributed publishing platform: web architecture.
  • cross-domain technical solution, not a technology scoped to a particular domain, e.g. MARC
  • the web and linked data offer technical solutions for the integration of data stuffed away in "silos"
  • identifiers for library entities: works, people, places, subjects, etc
  • mechanics for expressing the relationships between these resources: rdf triple
  • deduplication and convergence
  • vocabulary evolution: mixability of vocabularies
  • statement oriented rather than record oriented
  • provenance at the statement level: rdf/dc folks are working on it (named graphs), "everyone can say anything about everything".
  • managing change: JPW's blog post: Open Bibliographic Data: How Should the Ecosystem Work? http://blog.okfn.org/2010/11/29/open-bibliographic-data-how-should-the-ecosystem-work/
  • Resources can be created/presented in the language of the searcher, without the need for an entirely different "record" (i18n)
  • Cross domain resource sharing: librarians aren't biographers/genealogists, for example, and should be able to piggyback on the work of domain experts (do one thing and do it well)
  • Less focus on AACR's "prose", more focus on machine readable data
  • Forces the data to conform to a consistent model, forces us to actually make the models, vocabulary evolution
  • "Open world assumption": not everything needs to be known upfront. Say what you know, add more as it's discovered. Allow others to say what they know too.
  • An identifier focus (as opposed to all-encompassing 'records') allows the description to be tailored to the various interested communities, while still ensuring that everyone is talking about the same thing
  • "Infinitely expandable description"
  • Associations between the bibliographic description and other "library value add": subject guides/course guides, etc. - internal remixing
  • By design, Linked Data approaches make it easy to merge data from many different sources -- an important new requirement in an age of mashups -- without explicitly planned coordination. Cooperation without coordination.
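
The "cooperation without coordination" point can be sketched as a plain set union. This is a minimal illustration, assuming each source publishes its statements as (subject, predicate, object) tuples that share URIs; the example.org URIs, the literal values, and the `describe` helper are hypothetical.

```python
EX = "http://example.org/"

# Statements published by a library (hypothetical data).
library_data = {
    (EX + "book/1", EX + "vocab/title", "Selected Poems"),
    (EX + "book/1", EX + "vocab/author", EX + "person/42"),
}

# Statements published independently by another source about the same URIs.
biography_data = {
    (EX + "person/42", EX + "vocab/birthYear", "1900"),
    (EX + "book/1", EX + "vocab/author", EX + "person/42"),  # same statement, stated twice
}

# Merging needs no planned coordination: shared URIs line up and duplicates collapse.
merged = library_data | biography_data


def describe(resource, statements):
    """List the (predicate, object) pairs known about one resource."""
    return sorted((p, o) for s, p, o in statements if s == resource)
```

Under the open world assumption, the library never had to state the birth year; after the merge, `describe(EX + "person/42", merged)` simply includes what the other source said.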

Summary

Folks on the call basically agreed that this list is too granular, and having a list of say 10 high level benefits would be a good thing to shoot for.

Benefits from Clusters

Here is the list of benefits in raw form extracted from the Use Case Clusters

Cluster BibData

Use cases identify the linked data environment as an opportunity to overcome some of the barriers to the development of services:

  • Remove cultural bias in metadata format and content.
  • Replace aging standards and models that are not conducive to web-scale use.
  • Recommend a basic level of functionality and basic data requirements for metadata created by national bibliographic agencies.
  • Improve ability to re-use licensed metadata by shifting focus away from the metadata collection and record.
  • Improve matching and de-duplication of metadata.
  • Improve metadata aggregation from all sources.
  • Extend coverage to linked data from other communities.
  • Add functionality to services.

Cluster VocAlign

Linked data technologies provide tools to express, share and exploit semantic mappings or mergers of concepts across value vocabularies (e.g., represented using SKOS), as well as of elements (classes, properties) from metadata element sets, as defined in RDFS/OWL ontologies. On the other hand, the ease of publishing data without attending to these semantic connections in the first place raises problems: the proliferation of URIs and the management of coreference have been identified as one of the main "semantic elephants" in the room [1]. The issue of establishing connections (especially equivalence) between entities that are semantically comparable is common to all Linked Data applications. The current number of links is an order of magnitude below the number of entities published on the LOD cloud, and only one dataset (DBPedia) serves as a semantic mapping hub for almost the entire LOD cloud.
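
The coreference problem described above can be illustrated with a small sketch that groups URIs connected by equivalence links. It assumes the links (for instance, pairs asserted with a property such as owl:sameAs or skos:exactMatch) are treated as symmetric and transitive; the `cluster_equivalents` function and the example.org URIs are hypothetical, and union-find is just one possible technique.

```python
def cluster_equivalents(equivalence_links):
    """Group URIs into clusters using union-find over (uri_a, uri_b) equivalence pairs."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
            uri = parent[uri]
        return uri

    for a, b in equivalence_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Hypothetical URIs from three datasets describing the same two entities.
links = [
    ("http://example.org/a/1", "http://example.org/b/1"),
    ("http://example.org/b/1", "http://example.org/c/1"),
    ("http://example.org/a/2", "http://example.org/b/2"),
]
clusters = cluster_equivalents(links)
```

With a hub dataset in the middle (here the hypothetical `b` dataset), two datasets that never link to each other directly still end up in the same cluster.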

Cluster Archives

Linked data would provide an opportunity for archives to create links based not on the ownership of the same items but on topical relationships between materials held in different archives. Such links would allow the institution to provide additional context and detail to its users at the least effort and cost.

Linking from external materials would also raise the visibility of archives as users would discover the existence of primary source material during their research in other information sources. In particular, a connection between archives, libraries and museums, based on semantics inherent in their collections, would expand the general access to cultural heritage materials and would create new alliances for sharing materials and developing user services based on topics rather than institution type.

Linked Data and Semantic Web technologies are expected to facilitate not only end users' access to cultural collections, but also the sharing and management of data between institutions. Linked Data is seen as a way to enable richer connections between different domains, hence improving interoperability.

Cluster Digital Objects

Semantic Web technology and the Linked Data networks built with it have proven to be a flexible means to publish and link data on the Web. Especially the ability to link to content elsewhere on the Web is a key ability in the context of this use case cluster. The links can be used to integrate disparate data sources, and also to define new relationships between the data. For this cluster this results in the ability to represent not only groups of related sources within one data source, but also across data sources.

The convergence of several communities of practice towards Semantic Web technology has led to specification of existing standards in RDF and OWL, making them available and reusable for others. For example, the availability of Dublin Core in RDF allows others to use Dublin Core in their own data, which is then immediately readable to any processor that understands the Dublin Core RDF Schema. In the context of this cluster, the OAI-ORE standard for representing groups of resources - now available in RDF - is particularly relevant.
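
The Dublin Core point can be sketched as follows: a processor that knows one shared property URI can read titles from any dataset that uses it. This is a minimal sketch, assuming statements are plain (subject, predicate, object) tuples; the two datasets, their example.org/example.net URIs, and the `titles` helper are hypothetical, while the property URI is the Dublin Core Terms `title` property.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"  # Dublin Core Terms "title" property

# Two independently published datasets (hypothetical) that reuse the same property.
archive_data = [
    ("http://example.org/archive/doc/9", DCTERMS_TITLE, "Letter to the editor"),
    ("http://example.org/archive/doc/9", "http://example.org/vocab/box", "12"),
]
museum_data = [
    ("http://example.net/museum/object/3", DCTERMS_TITLE, "Landscape with trees"),
]


def titles(statements):
    """Return {resource: title}; if a resource has several titles, the last one seen wins."""
    return {s: o for s, p, o in statements if p == DCTERMS_TITLE}


all_titles = titles(archive_data + museum_data)
```

The processor needs no knowledge of either publisher's local vocabulary; statements with unknown predicates are simply skipped.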

Cluster Collections

The fine granularity of linked data can improve interoperability between heterogeneous collection-level metadata created by multiple institutions with different functional requirements. Linked data from non-library sources, such as transportation networks, can enrich user-centred information about obtaining items from within a collection. Linked data from multiple libraries can provide information about alternate access to the same item held in different collections. Linked data bridges different levels of granularity; in particular, it is easy to integrate metadata about a collection as a whole into metadata about each item in the collection, using "is part of/contains" links between item and collection.
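
The last point, integrating collection-level metadata into item-level metadata through "is part of" links, can be sketched as below. It assumes statements are (subject, predicate, object) tuples and uses the Dublin Core Terms `isPartOf` property for the link; the example.org URIs, the literal values, and the `item_view` helper are hypothetical.

```python
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"  # Dublin Core Terms "isPartOf" property
EX = "http://example.org/"

triples = {
    (EX + "item/1", IS_PART_OF, EX + "collection/A"),
    (EX + "item/1", EX + "vocab/title", "Letter, 1921"),
    (EX + "collection/A", EX + "vocab/location", "Reading Room 2"),
    (EX + "collection/A", EX + "vocab/accessPolicy", "by appointment"),
}


def item_view(item, statements):
    """Combine an item's own metadata with metadata of the collections it is part of."""
    own = {(p, o) for s, p, o in statements if s == item and p != IS_PART_OF}
    collections = {o for s, p, o in statements if s == item and p == IS_PART_OF}
    inherited = {(p, o) for s, p, o in statements if s in collections}
    return own | inherited


view = item_view(EX + "item/1", triples)
```

The location and access policy are stated once, at collection level, yet appear in the view of every item linked to that collection.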