Use Case Ranking Search Results by Popularity using Circulation Data

From Library Linked Data
Revision as of 13:25, 29 July 2011 by Jschneid4 (Talk | contribs)


Name

Use Case Ranking Search Results by Popularity using Circulation Data

Owner

Anette Seiler

Background and Current Practice

Library catalogs are notoriously bad at ranking search results. One way to rank results is by the popularity of an item, on the premise that an item borrowed more often than others is more useful to other users and should appear higher in the ranking. Online bookstores such as Amazon offer the option of sorting search results in bestseller order.

Better results can be obtained if libraries use not only their own circulation data to determine which items are most popular, but also the data of other libraries.

Goal

Use circulation data published by libraries to rank search results according to popularity.

Use circulation data published as RDF by one's own library and by other libraries, linked to bibliographic data.

Target Audience

Library users, especially in public libraries but also in academic libraries.

Librarians concerned with collection development.

Use Case Scenario

A library user performs a search, e.g. a topic or genre search. The OPAC offers the option of ranking the results by popularity. Popularity is determined by the number of loans of a title in the user's library and in other (similar) libraries. In this way, the user finds items that other users found useful. Conversely, the same ranking could be used to find obscure items that nobody has ever read.

The information could also be used by libraries in collection development, e.g. to determine whether a particular author is popular or whether an item is never used anywhere and could be weeded.
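
The ranking step in this scenario can be sketched in a few lines of Python. This is only an illustration under assumptions: the title identifiers, the loan figures and the function name `rank_by_popularity` are invented, and it presumes that both libraries already use the same identifier for the same title.

```python
from collections import Counter

# Hypothetical loan counts per title, one mapping per library.
own_library = {"title-a": 12, "title-b": 3, "title-c": 0}
other_library = {"title-a": 30, "title-b": 1, "title-d": 7}


def rank_by_popularity(hits, *loan_tables, ascending=False):
    """Order search hits by total loans across all supplied libraries."""
    totals = Counter()
    for table in loan_tables:
        totals.update(table)  # adds the counts of each library
    # Titles without recorded loans count as 0; ties keep the search order.
    return sorted(hits, key=lambda title: totals[title], reverse=not ascending)


hits = ["title-c", "title-b", "title-a"]
most_popular_first = rank_by_popularity(hits, own_library, other_library)
least_borrowed_first = rank_by_popularity(
    hits, own_library, other_library, ascending=True
)
```

The `ascending` switch covers the second half of the scenario: the same totals, sorted the other way, surface the items that are hardly ever borrowed.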

Application of linked data for the given use case

If libraries published their circulation data as linked data, the data could be loaded into a triple store and made searchable with SPARQL. To make sense, the circulation data must be linked to bibliographic data, and different identifiers for the same resource described by bibliographic data must be linked to each other.
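
To make the idea concrete, here is a rough Python sketch that stands in for a triple store. It is not a real SPARQL setup: triples are plain tuples, the record URIs and the `ex:loanCount` predicate are made up for illustration, and only direct `owl:sameAs` links are followed. In an actual deployment the aggregation would more likely be expressed as a SPARQL query against the store.

```python
# Triples as (subject, predicate, object) tuples. The record URIs and the
# ex:loanCount predicate are hypothetical placeholders.
SAME_AS = "owl:sameAs"
LOAN_COUNT = "ex:loanCount"

library_a = [
    ("libA:rec/1", LOAN_COUNT, 12),
    ("libA:rec/2", LOAN_COUNT, 3),
]
library_b = [
    ("libB:b/77", LOAN_COUNT, 30),
    ("libB:b/77", SAME_AS, "libA:rec/1"),  # same resource, other identifier
]

store = library_a + library_b  # "loading" both datasets into one store


def match(triples, predicate):
    """Yield (subject, object) for every triple with the given predicate."""
    for s, p, o in triples:
        if p == predicate:
            yield s, o


def total_loans(triples):
    """Sum loan counts per resource, folding sameAs aliases together."""
    canonical = dict(match(triples, SAME_AS))  # alias -> canonical identifier
    totals = {}
    for subject, count in match(triples, LOAN_COUNT):
        key = canonical.get(subject, subject)
        totals[key] = totals.get(key, 0) + count
    return totals


ranking = sorted(total_loans(store).items(), key=lambda kv: kv[1], reverse=True)
```

Without the `owl:sameAs` triple, the loans recorded by the second library would be counted against a separate resource, which is why the identifier links matter as much as the counts themselves.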

Existing Work (optional)

Apart from COUNTER[1], which measures the usage of networked resources, and SUSHI[2], I don't know of any existing work. The idea for this use case came up at an LOD meeting in Cologne in August and was thought to be difficult to implement (in the German context).

Dave Pattern has done some work with circulation data, see e.g. the free, CC0-licensed usage data from the University of Huddersfield and the usagedata category on his blog.

See also

Related Vocabularies (optional)

I don't know of a circulation ontology. Linked data should record how often a specific title was borrowed and for how long. It would be even better if users could provide information about the usefulness of the item.
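
Since no such vocabulary is known, the following Python sketch only illustrates what a minimal record of "how often" and "for how long" might look like. The class name, the property names `loanCount` and `totalLoanDays`, and the `http://example.org/circ#` namespace are all invented here and do not refer to any existing ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CirculationSummary:
    title_uri: str        # identifier of the bibliographic record
    loan_count: int       # how often the title was borrowed
    total_loan_days: int  # summed duration of all loans, in days

    def to_ntriples(self, ns="http://example.org/circ#"):
        """Serialize as N-Triples lines using a made-up namespace."""
        xsd_int = "http://www.w3.org/2001/XMLSchema#integer"
        return [
            f'<{self.title_uri}> <{ns}loanCount> '
            f'"{self.loan_count}"^^<{xsd_int}> .',
            f'<{self.title_uri}> <{ns}totalLoanDays> '
            f'"{self.total_loan_days}"^^<{xsd_int}> .',
        ]


summary = CirculationSummary("http://example.org/title/1", 12, 168)
lines = summary.to_ntriples()
```

User-supplied information about usefulness is left out of the sketch, because it would need a design of its own.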

Problems and Limitations

The biggest obstacle is a political one: personal data should not be published. It should not be possible to connect a person with a specific item.

Another problem is the Matthew effect[3]: popular items appear at the top of the ranking, are therefore borrowed more often, and become even more popular.
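
One way to keep persons and items apart is to publish only per-title totals and never the individual loan events. The Python sketch below shows that reduction; the event layout and the function name `publishable_counts` are assumptions made for illustration, not a description of how any library system exports its data.

```python
from collections import Counter

# Hypothetical raw loan events as (patron_id, title_id) pairs.
loan_events = [
    ("patron-1", "title-a"),
    ("patron-2", "title-a"),
    ("patron-1", "title-b"),
]


def publishable_counts(events):
    """Reduce loan events to per-title counts, discarding patron ids."""
    return dict(Counter(title for _patron, title in events))


counts = publishable_counts(loan_events)
```

Whether aggregate counts alone are sufficient, for instance in a very small library, is a policy question that this sketch does not settle.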

Libraries have experience in exporting and sharing bibliographic data, but not circulation data. Interfaces to do so will have to be developed.

As circulation data is not stored centrally, each library system must be able to share circulation data as linked open data. (Bibliographic data can often be shared by a central institution, which lessens the burden on individual institutions.)

Identifying matching bibliographic data is a further problem, especially if circulation data from many libraries is used. Different libraries will use different identifiers for their bibliographic data (and link circulation data to their own bibliographic identifiers). To use circulation data from many libraries, it must be possible to match different bibliographic identifiers to each other.
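
Assuming that pairwise equivalence links between identifiers are available, grouping them into sets that describe the same resource is a standard union-find problem. The Python sketch below shows that step; the identifiers and the function name `cluster_identifiers` are hypothetical, and how the links are obtained in the first place is exactly the open problem described above.

```python
def cluster_identifiers(links):
    """Group identifiers connected by equivalence links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical equivalence links between the record ids of three libraries.
links = [("libA:1", "libB:77"), ("libB:77", "libC:x9"), ("libA:2", "libC:y3")]
clusters = cluster_identifiers(links)
```

Because the links are followed transitively, `libA:1` and `libC:x9` end up in the same group even though no direct link between them was published.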

Related Use Cases and Unanticipated Uses (optional)

The Regional Catalog described in another use case could be a great data foundation for this scenario.

Library Linked Data Dimensions / Topics

Dimensions:

  • Users needs
    • Browse / explore / select
  • Systems
    • Library systems
      • MARC Catalogs
    • Non library information systems
      • Online bookstores


  • Social uses
    • Social bibliography
    • User logs


  • Information assets
    • Books
    • Journal articles
    • Open Web resources
  • Information lifecycle
    • collect:
      • browse / explore / find / retrieve entities
      • to select an entity appropriate to the user’s needs

Topics:

    • Types of library data other than bibliographic and authority
      • administrative: circulation, statistics (COUNTER, SUSHI)
    • Status of library-related vocabularies in development (inventory: RDA, FR family, ISBD ...)
  • Linking across datasets
    • How much linking is there? What links to what?
    • Alignment (cross-linking) of vocabularies
    • Alignment of real-world-resource identifiers
  • Rights
    • Licenses, IP, DRM, other availability/rights/access restriction info

References