Use Case Community Information Service
Community Information Service
Jim Pitman http://www.stat.berkeley.edu/~pitman/
Background and Current Practice
Most universities and university departments are unable to extract from their library catalogs a list of all publications by their own faculty. Even if they could, they are typically not allowed to publish such a list without renegotiating license agreements with suppliers of bibliographic metadata. A typical subject-specific interest group may be able to extract subject-specific bibliographic metadata from a variety of sources. But again, there is a high barrier to overcome before the group can obtain clear rights to republish or remix such material. Essentially, the group must first acquire a legal identity capable of entering into licensing agreements before it can do so legally. It then has to find a business model capable of supporting someone whose job is to manage those agreements. This organizational overhead is unnecessary in a universe of linked data.
Goal
Make library catalog records and other publisher-generated bibliographic metadata freely available to community data curators, in a form that is easy to filter by author, affiliation, subject, and so on. This would allow large numbers of small to medium-sized academic communities to extract the data of particular interest to them with minimal technical and legal overhead, and to republish that data openly in whatever ways they find worthwhile: for example, by selecting, ranking or classifying it, and by providing simple searches and faceted displays over bibliographic collections of special interest to the community.
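Once records carry structured fields, filtering and faceting of the kind described above become simple operations. The following Python sketch is illustrative only. The record layout (`authors`, `affiliations` and `subjects` as lists) and the helper names `select` and `facet_counts` are hypothetical and do not belong to any existing service.

```python
from collections import Counter
from typing import Iterable

# Hypothetical toy records; the field names are illustrative, not a standard.
RECORDS = [
    {"title": "Paper A", "authors": ["Author One"],
     "affiliations": ["Example University"], "subjects": ["probability"]},
    {"title": "Paper B", "authors": ["Author Two"],
     "affiliations": ["Example University"],
     "subjects": ["probability", "combinatorics"]},
    {"title": "Paper C", "authors": ["Author Three"],
     "affiliations": ["Other College"], "subjects": ["statistics"]},
]


def matches(record: dict, field: str, wanted: set) -> bool:
    """True if any value of a multi-valued field is among the wanted values."""
    return bool(wanted & set(record.get(field, [])))


def select(records: Iterable[dict], **criteria: set) -> list:
    """Keep records that satisfy every criterion, e.g. subjects={"probability"}."""
    return [r for r in records
            if all(matches(r, field, wanted) for field, wanted in criteria.items())]


def facet_counts(records: Iterable[dict], field: str) -> Counter:
    """Count records per value of a field, as shown in a faceted sidebar."""
    return Counter(value for r in records for value in r.get(field, []))


if __name__ == "__main__":
    hits = select(RECORDS, affiliations={"Example University"},
                  subjects={"probability"})
    print([r["title"] for r in hits])  # ['Paper A', 'Paper B']
    print(facet_counts(RECORDS, "subjects").most_common())
```

A community site could run `facet_counts` over the currently selected records to build its facet sidebar. Each click on a facet value then adds one more keyword argument to `select`.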
How to use linked data technology to achieve this goal: provide the data under an open license that permits its reuse for such purposes, and support the APIs, data standards and client software that lower the barrier to participation in information curation and sharing.
Target Audience
Scholars as service providers: all those who edit, curate and arrange scholarly information to make it openly accessible to a wide audience. Indirectly, the general public, who may find subject-specific resources curated by scholars more informative than generic search services or Wikipedia. Computer programs, insofar as they can be used for tasks such as filtering, deduplication and selection to save the time of expert curators.
Use Case Scenario
The curator of a community information service selects data from input sources to determine which recently published books, articles, photographs, videos, ... would be of interest to the community. The input data is available to the curator in a form that makes it easy to control what is piped through to the information service.
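One way to give a curator this kind of control is to treat selection as a chain of small filter stages that the curator can reorder or swap. Below is a minimal Python sketch. It assumes that each record has a hypothetical ISO-format `issued` date and a `type` field. The names `published_since`, `keep_if` and `run_pipeline` are illustrative only.

```python
from datetime import date
from typing import Callable, Iterable, Iterator, List

# A stage takes a stream of records and yields the ones it lets through.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def published_since(cutoff: date) -> Stage:
    """Stage passing only records whose ISO 'issued' date is on/after cutoff."""
    def stage(records: Iterable[dict]) -> Iterator[dict]:
        for r in records:
            if date.fromisoformat(r["issued"]) >= cutoff:
                yield r
    return stage


def keep_if(predicate: Callable[[dict], bool]) -> Stage:
    """Stage built from any yes/no rule the curator supplies."""
    def stage(records: Iterable[dict]) -> Iterator[dict]:
        return (r for r in records if predicate(r))
    return stage


def run_pipeline(records: Iterable[dict], stages: List[Stage]) -> List[dict]:
    """Pipe records through the curator's stages, in order."""
    stream: Iterable[dict] = records
    for stage in stages:
        stream = stage(stream)
    return list(stream)


if __name__ == "__main__":
    incoming = [
        {"title": "Paper A", "issued": "2011-03-01", "type": "article"},
        {"title": "Paper B", "issued": "2009-07-15", "type": "article"},
        {"title": "Video C", "issued": "2011-05-20", "type": "video"},
    ]
    recent_articles = run_pipeline(incoming, [
        published_since(date(2011, 1, 1)),
        keep_if(lambda r: r["type"] == "article"),
    ])
    print([r["title"] for r in recent_articles])  # ['Paper A']
```

Because the stages are lazy generators, a long input feed is streamed rather than copied at every step. Only the final `list` call materializes the selection.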
Make it easy for data providers (publishers, libraries, other aggregators) to provide linked data, together with suitable APIs and client software for community data curators to use. Curators should be able to expect that bibliographic records come equipped with identifiers for all entities (editions, people, subjects, journals, publishers, ...), and that this information is easily loaded into a community-managed content management system (CMS). There it can be remixed with whatever ranking, selection or faceting the community service wishes to provide.
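When records from several providers carry shared identifiers, a curator can merge them mechanically rather than by hand. For illustration, the sketch below assumes that the only identifier available is a DOI stored under a hypothetical `doi` key. Real data would also need ISBNs, person identifiers, and fuzzy matching for records that lack identifiers. DOIs are compared case-insensitively after common resolver prefixes are stripped. `merge_by_identifier` is a hypothetical helper, not an existing API.

```python
from typing import Dict, Iterable, List

# Resolver prefixes that often wrap the same DOI in different sources.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_by_identifier(*sources: Iterable[dict]) -> List[dict]:
    """Merge records from several sources into one record per DOI.

    Earlier sources win: later records only fill in fields that are missing.
    Records without a DOI are passed through unchanged.
    """
    merged: Dict[str, dict] = {}
    unkeyed: List[dict] = []
    for source in sources:
        for record in source:
            doi = record.get("doi")
            if not doi:
                unkeyed.append(dict(record))
                continue
            target = merged.setdefault(normalize_doi(doi), {})
            for field, value in record.items():
                target.setdefault(field, value)
    return list(merged.values()) + unkeyed


if __name__ == "__main__":
    library = [{"doi": "10.1000/xyz123", "title": "Paper A"}]
    publisher = [{"doi": "https://doi.org/10.1000/XYZ123",
                  "title": "Paper A (publisher title)", "journal": "Journal J"},
                 {"title": "Preprint without DOI"}]
    for rec in merge_by_identifier(library, publisher):
        print(rec)
```

The order of the sources encodes the curator's trust. Here the library record's title is kept, and the publisher record only contributes the missing `journal` field.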
Existing Work
Most abstracting and indexing (A&I) services maintain data ingest systems for these purposes. But these are usually proprietary and not readily available to smaller agents with interests in bibliographic data curation. They mostly rely on converting raw publisher data into proprietary bibliographic formats for internal use, and on licensing data to libraries in degraded formats for use by supplicant scholars. Such services add no value to the universe of linked data; rather, they compete with it.
Some examples of software systems for the open display of community-curated bibliographic collections are BibSonomy, BibServer, BibApp and Open Scholar. All of these systems would benefit from easy API access to comprehensive linked library and publisher data. An example of a typical community website that would benefit greatly from integration with linked data is the Probability Web. See especially its lists of Books and People, and the link to the Probability Abstract Service, all of which could be recreated to both import and export linked data.
There are more advanced services in other fields, notably RePEc (laudably open, but with large amounts of data whose license status is indeterminate) and SSRN (free, but not open to reuse). Such large community services are typically built on architectures that are difficult to replicate. What is needed is a simple, easily replicable architecture that lets community data curation services of various sizes develop and interoperate. BKNpeople and VIVO are first steps in this direction at the level of identifying people and their interests. Integrating such systems with the ORCID initiative will be important. See also the BKN Project.
Related Vocabularies
BIBO (the Bibliographic Ontology), CiTO (the Citation Typing Ontology), ...
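To illustrate how these vocabularies might be combined, the Python sketch below emits N-Triples for a single article. It uses BIBO (`bibo:Article`, `bibo:doi`), Dublin Core terms (`dcterms:title`, `dcterms:creator`) and CiTO (`cito:cites`). This choice of properties is one plausible mapping, not a prescribed one. The `example.org` URIs are placeholders, and the code assumes that the caller supplies valid absolute IRIs. `record_to_ntriples` is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
CITO = "http://purl.org/spar/cito/"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record: dict) -> list:
    """Describe one journal article using BIBO, Dublin Core terms and CiTO."""
    s = iri(record["uri"])
    triples = [(s, iri(RDF_TYPE), iri(BIBO + "Article")),
               (s, iri(DCT + "title"), literal(record["title"]))]
    for creator in record.get("creators", []):
        triples.append((s, iri(DCT + "creator"), iri(creator)))
    if "doi" in record:
        triples.append((s, iri(BIBO + "doi"), literal(record["doi"])))
    for cited in record.get("cites", []):
        triples.append((s, iri(CITO + "cites"), iri(cited)))
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]


if __name__ == "__main__":
    example = {
        "uri": "http://example.org/pub/1",
        "title": 'A "quoted" title',
        "creators": ["http://example.org/person/42"],
        "doi": "10.1000/xyz123",
        "cites": ["http://example.org/pub/7"],
    }
    print("\n".join(record_to_ntriples(example)))
```

N-Triples was chosen because it is line-oriented. A community service can append the lines to a file and publish it without any RDF library.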
Problems and Limitations
Reasons why this scenario is or may be difficult to achieve:
Social, economic and legal obstacles:
- vested interests in A&I services;
- a lack of suitably licensed metadata;
- commercial publishers, universities and conservative scholarly societies refusing to release their metadata under an open license.
Technical obstacles: a lack of convergence towards a simple, widely adopted standard for exchanging bibliographic metadata that suits the community information service use case. The necessary data fields are little more than the traditional BibTeX fields, plus some conventions for handling entity identifiers and links. BibJSON is an attempt at an adequate lightweight data exchange standard. It is compatible with linked data principles and influenced by the success of BibTeX and RePEc's Academic Metadata Format. The standard is easy for typical community data service managers to understand and manage, even without advanced software tools. The biggest technical challenge is providing, adapting and maintaining good UIs that let non-technical curators manage BibJSON or similar record structures, along with the CMS over which these UIs operate. Needlebase shows promise of providing an adequate UI over a graphical datastore. It is proprietary software, but it should be configurable to import and export linked data. Systems of this kind for managing simple editorial workflows over linked data are greatly needed.
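The Python sketch below makes the "BibTeX fields plus identifier conventions" idea concrete. It converts flat BibTeX-style fields into a nested record in which authors and the journal become objects that can carry identifiers. The layout is only loosely modeled on BibJSON. The exact key names used here (`author` as a list of `name`/`id` objects, an `identifier` list, a `journal` object) are assumptions and should be checked against the BibJSON documentation. `bibtex_to_bibjson_like` is a hypothetical helper.

```python
import json
from typing import Dict, Optional


def bibtex_to_bibjson_like(entry_type: str, fields: Dict[str, str],
                           author_ids: Optional[Dict[str, str]] = None) -> dict:
    """Turn flat BibTeX-style fields into a record whose entities can carry ids.

    Authors are split naively on " and " (BibTeX's separator); braces and
    case variants are not handled. Known author identifiers are attached.
    """
    author_ids = author_ids or {}
    record: dict = {"type": entry_type}
    for key, value in fields.items():
        if key == "author":
            people = []
            for name in (n.strip() for n in value.split(" and ")):
                person = {"name": name}
                if name in author_ids:
                    person["id"] = author_ids[name]
                people.append(person)
            record["author"] = people
        elif key == "journal":
            record["journal"] = {"name": value}
        elif key == "doi":
            record["identifier"] = [{"type": "doi", "id": value}]
        else:
            record[key] = value
    return record


if __name__ == "__main__":
    rec = bibtex_to_bibjson_like(
        "article",
        {"title": "Paper A", "author": "Author One and Author Two",
         "journal": "Journal J", "year": "2011", "doi": "10.1000/xyz123"},
        author_ids={"Author One": "http://example.org/person/1"},
    )
    print(json.dumps({"records": [rec]}, indent=2))
```

The output remains plain JSON that a curator can read and hand-edit. Identifiers are simply extra keys that appear wherever they are known, so partially identified records are still valid.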
Related Use Cases and Unanticipated Uses
If simple, affordable editorial systems are developed for managing collections of bibliographic data, it is hard to anticipate which agents will emerge to provide the best services at various scales. Communities nest and overlap with each other, and they compete for the attention of their members. If communities export their enhancements as linked data, that data may in turn be consumed by larger aggregators, especially Google and other big players, in ways which should greatly improve current means of search and discovery of academic information.
References
- Academic Metadata Format http://amf.openlib.org/doc/ebisu.html
- arXiv http://arxiv.org/
- BibServer http://bibserver.berkeley.edu/cgi-bin/bibs7?source=http://www.stat.berkeley.edu/users/pitman/bibserver.bib
- BibApp http://www.bibapp.org/
- BibJSON http://www.bibkn.org/bibjson/index.html
- BibTeX http://en.wikipedia.org/wiki/BibTeX
- BibSonomy http://www.bibsonomy.org/
- BIBO http://bibliontology.com/
- BKNpeople http://people.bibkn.org/
- BKN Project http://www.bibkn.org/
- CiTO (Citation Typing Ontology), David Shotton http://dx.doi.org/10.1186/2041-1480-1-S1-S6
- Google Scholar http://scholar.google.com/
- Needlebase http://www.needlebase.com/
- Open Scholar http://scholar.harvard.edu/
- ORCID http://www.orcid.org/
- Probability Abstract Service http://pas.imstat.org/
- RePEc http://repec.org/
- SSRN http://www.ssrn.com/
- The Probability Web http://www.mathcs.carleton.edu/probweb/probweb.html
- VIVO http://www.vivoweb.org/