
Cluster Social Uses


Authors: Uldis Bojārs and Jodi Schneider

Background

The social web is becoming more important. One of the functions of a library is to build community, and there is potential for building community around books and other knowledge objects online. Further, taking advantage of social information about library materials has the potential to improve services around these materials.

Meanwhile, there are other relatively new or emergent uses of and environments for library data, such as for search engines and use on mobile devices. This cluster covers both the social and the emergent / new uses of library information.

Topic in the Context of Linked Data

Linked data makes it easy to consolidate information about an object. The social web makes it easy for ordinary users to publish; linked data makes it easy to publish structured data. User activity and social network information can then be used to give recommendations to users.

Extant data sets can be mashed up to provide more information -- e.g. locations of libraries on a map.
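As a rough illustration of such a mashup, the sketch below joins two independently published data sets on a shared identifier to produce map markers. The URIs, names, coordinates and the `mash_up` helper are all hypothetical; real data sets would more likely be fetched as RDF and joined with SPARQL.

```python
# Hypothetical data: two data sets that happen to use the same library URIs.
libraries = [
    {"id": "http://example.org/library/1", "name": "Central Library"},
    {"id": "http://example.org/library/2", "name": "Harbour Branch"},
]
locations = {
    "http://example.org/library/1": (56.95, 24.11),
    "http://example.org/library/3": (53.27, -9.05),
}


def mash_up(libraries, locations):
    """Join the two data sets on the shared URI and return map markers."""
    markers = []
    for lib in libraries:
        coords = locations.get(lib["id"])
        if coords is None:
            continue  # no coordinates published for this library
        lat, lon = coords
        markers.append({"name": lib["name"], "lat": lat, "lon": lon})
    return markers


markers = mash_up(libraries, locations)
print(markers)
```

The join only works because both sources identify a library by the same URI; where they do not, a mapping step is needed first.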

Scenarios (Case Studies)

We have broken these into two categories as follows:

Social

Collaboratively curate metadata, in order to announce recently published books, articles, photographs, videos, etc. which would be of interest to a designated community. This involves selection, remixing, and republication.
Distributed individuals (possibly volunteers) improve and enhance catalog records, and create new ones.
Collect a crowdsourced database of scientific publications, including user annotations, tags and usage data. Use this information for discovering new metadata (e.g., relations). Provide an API to this data.
Make Open Library bibliographic metadata reusable by publishing as linked data. Use this information in other applications (e.g., for locating full text of items). In consequence, users who encounter references to books on the Internet, in a variety of environments, are able to link to a source of access to the book, e.g. a full text version available at the Internet Archive.
Facilitate the swapping of e-books and paper books (Uldis to amend the use case re paper books) among members of a community. Use linked data for aggregating distributed catalog holdings made by individuals.
Publish circulation data as linked data. Use the open, aggregated circulation data for ranking search results (e.g., in OPACs).
Use linked data to publish and share user annotations on works and fragments of works (e.g., paragraphs). Use this crowdsourced information in new and existing applications.
Provide users with recommendations or search rankings based on information available about item popularity and user activity. Help the exchange of user activity [and item circulation] data by expressing it in a machine-readable form (RDF).
Websites use a button which users can click to indicate interest in helping bring a book into the public domain. This distributed information is collected and centralized by Gluejar, which negotiates with publishers based on the interest.
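The circulation-based ranking scenario above could look roughly like the following sketch. It assumes each library publishes loan counts keyed by a shared item URI; the `ex:` identifiers, the counts and both function names are invented for illustration and do not describe any existing OPAC.

```python
from collections import Counter


def aggregate_loans(*sources):
    """Sum loan counts per item URI across several libraries."""
    total = Counter()
    for source in sources:
        total.update(source)  # Counter.update adds counts for a mapping
    return total


def rank_by_circulation(results, loans):
    """Order search results by descending loan count.

    sorted() is stable, so items with equal counts keep their original order.
    """
    return sorted(results, key=lambda uri: -loans.get(uri, 0))


# Hypothetical circulation data published by two libraries.
library_a = {"ex:book1": 3, "ex:book2": 10}
library_b = {"ex:book1": 9, "ex:book3": 1}

results = ["ex:book3", "ex:book2", "ex:book1", "ex:book4"]
ranked = rank_by_circulation(results, aggregate_loans(library_a, library_b))
print(ranked)
```

A production ranking would presumably combine loan counts with text relevance rather than sort on popularity alone.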

Other Emergent and New Uses

A non-library system which makes extensive use of a classification system (a thesaurus, a gazetteer and a chronicle) is interlinked with a library classification system.
Linked data is used to support mashups, with less upfront understanding of data formats required.
  • Use Case SEO (elements of this may belong in other clusters, too)
Structured data is exposed from library catalogs to ensure that library data is searchable through Web search engines.

Where does this belong?
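One possible way to expose structured data for the SEO use case is to embed a JSON-LD description in each catalog page. The sketch below assumes the schema.org `Book` type, which is our choice for illustration and is not named in the use case; the record and the `book_to_jsonld` and `script_tag` helpers are hypothetical.

```python
import json


def book_to_jsonld(record):
    """Map a simple catalog record to a schema.org Book description."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": record["title"],
        "author": {"@type": "Person", "name": record["author"]},
    }
    if record.get("isbn"):
        doc["isbn"] = record["isbn"]
    return doc


def script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    body = json.dumps(doc, ensure_ascii=False)
    # Escape "</" so a value can never close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


# Placeholder record; title and author are not real catalog data.
record = {"title": "Example Title", "author": "Example Author",
          "isbn": "9780306406157"}
doc = book_to_jsonld(record)
print(script_tag(doc))
```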

Scenarios (Extracted Use Cases)

Make a list of about 4 main things these use cases need to do.

Use as a model: Cluster_Archives#Scenarios_(Case_Studies)

These are the scenarios from the LLD XG that were incorporated to create this document, along with their goals.

Relevant technologies

  • SPARQL
  • Content Management Systems that support linked data
  • Social web systems that support linked data (e.g. semantic wikis)
  • Person-centered publishing -- e.g. lifestreaming
  • Frameworks for publishing linked data
  • Annotation frameworks & tools

Relevant vocabularies

  • FOAF
  • SIOC
  • CC licenses
  • CiTO
  • BIBO
  • BibJSON
  • Dublin Core
  • review and tagging vocabularies

Problems and Limitations

  • Libraries and the Web move at different speeds when it comes to developing applications, services, etc.
  • Privacy, for instance avoiding reidentification of individuals based on recommendations

Missing vocabularies

  • Recommendations and Activity Data
  • lack of consensus on review and tagging vocabularies
  • Representation of Activity Streams
  • Documentation of how to specify fragments of works
  • “relatedDataRecord” and “relatedPublication”
  • circulation ontology (how often and for how long a specific title was borrowed; ideally, users could also give information about the usefulness of the item)
  • standard for exchange of lightweight bibliographic metadata

Data incompatibilities or lack of full compatibility

  • Mapping between library and non-library sources of information for using authorities, etc.
  • Matching between different libraries (e.g. those that use different bibliographic identifiers)
  • Lightweight versus rich & robust data -- tension depending on the application
  • No existing standards for social usage data
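A small example of the identifier-matching problem: one library may record a book under its ISBN-10 and another under its ISBN-13. The sketch below normalizes both forms to ISBN-13 before comparing, using the standard check digit calculation (alternating weights 1 and 3). The function names are ours, and the code does not validate the check digit of its input.

```python
def to_isbn13(isbn):
    """Normalize an ISBN-10 or ISBN-13 string to 13 digits without hyphens."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError("expected an ISBN-10 or ISBN-13: %r" % isbn)
    # Drop the ISBN-10 check digit and add the 978 prefix.
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def same_book(isbn_a, isbn_b):
    """True if two ISBNs identify the same edition after normalization."""
    return to_isbn13(isbn_a) == to_isbn13(isbn_b)


print(to_isbn13("0-306-40615-2"))
print(same_book("0-306-40615-2", "978-0-306-40615-7"))
```

This only covers ISBNs; matching on other identifiers, or on records with no shared identifier at all, needs the kind of mapping work described in the list above.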

Community guidance/organization issues

  • What vocabularies to use
  • How to organize resource sharing around aggregation from multiple sources
  • How to organize crowdsourcing
    • “Personal value precedes network value” -- need incentives for individuals to contribute
    • Need sufficient density of crowdsourced information so that the benefits outweigh the costs, i.e. algorithms should yield useful results for the majority of the content.
  • Publicizing availability of data to interested developers
  • Organizational commitment is needed: can't rely only on developers' side or pet projects
  • Historically, abstracting and indexing services have been commercial entities. Hence there is a lack of metadata under free and open (or otherwise suitable) licenses, since commercial publishers, universities and conservative scholarly societies may refuse to release their metadata under an open license

Technology availability/questions

  • Risk of building production-level environments on new and still-developing technologies such as triple stores
    • Lack of mature RDF support in popular programming languages
    • Open source editions of triple stores are difficult to handle, lack support for content negotiation based on user-designed URI patterns, and may not scale well.
  • Need for quality assurance for crowdsourced catalog and authority records, and suitable technology to support quality assurance
