Cluster Citations

From Library Linked Data

Citation Cluster

Authors: Kai Eckert, Peter Murray, Ed Summers


First, it is important to consider what a citation is:

Broadly, a citation is a reference to a published or unpublished source (not always the original source). More precisely, a citation is an abbreviated alphanumeric expression (e.g. “[Newell84]”) embedded in the body of an intellectual work that denotes an entry in the bibliographic references section of the work for the purpose of acknowledging the relevance of the works of others to the topic of discussion at the spot where the citation appears. Generally the combination of both the in-body citation and the bibliographic entry constitutes what is commonly thought of as a citation (whereas bibliographic entries by themselves are not). A prime purpose of a citation is intellectual honesty; to attribute to other authors the ideas they have previously expressed, rather than give the appearance to the work's readers that the work's authors are the original wellsprings of those ideas. (Wikipedia)

Based on that definition, citation can be seen as the activity by which an author relates one piece of content to another using bibliographic information. Note that Wikipedia’s definition is based on traditionally cited materials (books, articles, etc.); however, we consider citations of resources such as datasets to be in scope for this cluster.

Topic in the Context of Linked Data

In a linked data context, a citation is usually seen as a typed link to a referenced resource. Traditionally, as the Wikipedia definition shows, a citation is a description of the referenced resource, and it may not include a URI from which a link could be created. Indeed, referenced resources often have no URI, or discovering an appropriate URI is a non-trivial task.
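To make the contrast between a description and a typed link concrete, the following minimal sketch emits a single N-Triples statement for a citation, but only when a URI for the cited resource is known. The choice of the CiTO property `http://purl.org/spar/cito/cites` as the link type is an assumption for illustration (the text does not prescribe a vocabulary), and the example.org URIs are hypothetical.

```python
from typing import Optional

# Assumption: CiTO's "cites" property is used as the link type; the text
# does not prescribe a particular vocabulary.
CITO_CITES = "http://purl.org/spar/cito/cites"


def citation_triple(citing_uri: str, cited_uri: Optional[str]) -> Optional[str]:
    """Return one N-Triples statement linking a citing work to a cited work.

    Returns None when no URI is known for the cited resource: the citation
    then remains a description and cannot become a link.
    """
    if not cited_uri:
        return None
    return f"<{citing_uri}> <{CITO_CITES}> <{cited_uri}> ."


if __name__ == "__main__":
    # Hypothetical example.org URIs, for illustration only.
    print(citation_triple("http://example.org/article/1",
                          "http://example.org/book/newell84"))
    print(citation_triple("http://example.org/article/1", None))
```

The `None` case mirrors the situation described above: many citations can be recorded only as descriptions until an appropriate URI for the cited resource has been found.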

We consider a citation to be a description of another resource, embedded in a particular context (an article, a book, etc.), that enables the resource to be identified, located, and retrieved. A complete description of the referenced resource is considered out of scope and is more appropriate for other topic groups such as the Bibliographic Data Cluster.

We see several levels of granularity (or specificity) of a citation between documents, each level building on the previous one by adding information about the connection between the citing and cited documents. We want to be able to express citations as we find them in documents (e.g. entries in a bibliography, as extracted from a scanned book) and to enrich them with more data when it becomes available (e.g. a URI for the cited resource). At a fundamental level, bringing a citation into the linked data realm means assigning it a URI that others can build on. The levels can be outlined as follows:

  • Level 1: Citation as a string, as found in the text of a publication (book, article, etc.)
  • Level 2: Structured citation, the result of automatic or manual parsing, e.g. expressed with Dublin Core or Bibliontology
  • Level 3: Citation linked with bibliographic records, i.e. with the URI of the cited resource
  • Level 4: Further citation context: what is cited, position in the cited document, position in the citing document, further qualifications (agrees, disagrees, ...)
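The levels above can be read as a data structure that is filled in progressively. The sketch below models that reading; the `Citation` class, its field names, and the rule that each level requires all previous ones are hypothetical illustrations, not a format defined by this cluster. Level 2 fields are keyed here by Dublin Core term names purely as an example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Citation:
    """Hypothetical model of a citation that grows through the four levels."""
    raw_string: str                                         # Level 1: string as found in the text
    fields: Dict[str, str] = field(default_factory=dict)    # Level 2: parsed, e.g. Dublin Core terms
    cited_uri: Optional[str] = None                         # Level 3: URI of the cited resource
    context: Dict[str, str] = field(default_factory=dict)   # Level 4: positions, qualifications

    def level(self) -> int:
        """Return the highest level reached, assuming each level requires the previous ones."""
        if not self.fields:
            return 1
        if self.cited_uri is None:
            return 2
        if not self.context:
            return 3
        return 4


if __name__ == "__main__":
    # Hypothetical bibliography entry and URI, for illustration only.
    c = Citation("Doe, J. (2010). An Example Article.")
    print(c.level())  # 1
    c.fields["dcterms:title"] = "An Example Article"
    print(c.level())  # 2
    c.cited_uri = "http://example.org/article/doe2010"
    print(c.level())  # 3
    c.context["qualification"] = "agrees"
    print(c.level())  # 4
```

Treating the levels as strictly cumulative is a simplifying choice; in practice a URI might be found for a citation that was never parsed into fields, and a real model might record each kind of information independently.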

Scenarios (Case Studies)

The following scenarios from the LLD XG were incorporated to create this document:

The following scenario was added later and also contains use cases regarding citations:

Extracted Use Cases

In this section, we list use cases in a very narrow sense that were either extracted from the scenarios mentioned above or devised in addition to them. A use case in this narrow sense is a specific action that an end user might want to perform and that involves citation data as we have defined it here.

Such use cases typically serve to extract requirements that can then be fulfilled by the underlying implementation. In turn, they provide a rationale for each requirement and explain why it is needed. To illustrate this, we have noted some of the requirements in parentheses after the use cases.

  • Create an enhanced representation of publications in which the cited reference is directly accessible from the citation (position in the cited/citing document, what was cited)
  • Make it possible for the user to click from a citation directly to the location in the referenced publication (URI or another resolver mechanism, such as OpenURL)
  • Determine the value of a resource (easily and automatically) by analyzing the content of citations to that work (backlinks; optionally, further qualifications such as agrees/disagrees)
  • Find other publications that build upon the same cited resource, to include them in my “Related work” section (backlinks; optionally, qualifications such as “extends”, “builds upon”, ...)
  • tbc...
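Two of the use cases above depend on backlinks: judging a resource by the citations it receives, and finding other works that cite the same resource. The following minimal sketch derives both from a list of (citing, cited) URI pairs, such as Level 3 citations provide. The function names are hypothetical, and the plain citation count is only a naive proxy for "value" that ignores qualifications such as agrees/disagrees.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each pair is (citing work URI, cited resource URI).
Pair = Tuple[str, str]


def build_backlinks(citations: List[Pair]) -> Dict[str, Set[str]]:
    """Map each cited resource to the set of works that cite it."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited in citations:
        backlinks[cited].add(citing)
    return dict(backlinks)


def citation_counts(citations: List[Pair]) -> Dict[str, int]:
    """Naive value signal: number of distinct works citing each resource."""
    return {cited: len(citers) for cited, citers in build_backlinks(citations).items()}


def related_work(my_work: str, citations: List[Pair]) -> List[str]:
    """Works that cite at least one resource that my_work also cites."""
    backlinks = build_backlinks(citations)
    cited_by_me = {cited for citing, cited in citations if citing == my_work}
    related: Set[str] = set()
    for cited in cited_by_me:
        related |= backlinks.get(cited, set())
    related.discard(my_work)  # a work is not its own related work
    return sorted(related)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    pairs = [("ex:A", "ex:X"), ("ex:B", "ex:X"), ("ex:C", "ex:Y"),
             ("ex:A", "ex:Y"), ("ex:D", "ex:Z")]
    print(citation_counts(pairs))       # {'ex:X': 2, 'ex:Y': 2, 'ex:Z': 1}
    print(related_work("ex:A", pairs))  # ['ex:B', 'ex:C']
```

Supporting the optional qualifications would mean storing the citation type alongside each pair (e.g. as a third element) and filtering on it before counting or grouping.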

Vocabularies and Technologies

Illustrative Example

The following example was created in the hope that it illustrates the text above. It is by no means elaborate.

Temporary Questions and Notes

  • Does the institutional context of OpenURL mean that it does not necessarily fit with global identifiers as deployed in Linked Data? Does it call for a global URL space or service that redirects to an institutional context (similar to the Shibboleth Where-Are-You-From service)?
  • Is citation different from identification?
  • What about citing things that aren’t publications, e.g. datasets?
  • Formats for citations: very closely related to the Bibliography Cluster.
  • JAR 20110304: I checked the OED and their definition of 'citation' is quite different from Wikipedia's. OED says it's an act of citing, summoning, or quoting, not just a short string. The short string might be a participant in such an act but is not the whole thing. On the other hand van Leunen's Handbook for Scholars uses the word in the short-string sense, probably due to the lack of any better word for that sense.
  • JAR 20110304: You might want to check out SWAN and the more recent Annotation Ontology, which cover this territory.
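The question about OpenURL’s institutional context can be made concrete with a short sketch. An OpenURL in the Z39.88-2004 key/encoded-value (KEV) format carries the citation metadata as query parameters appended to the base URL of a particular institution’s link resolver, so the same citation yields a different URL at each institution. The resolver base URLs and the citation metadata below are hypothetical, and only a few common journal-format keys are shown.

```python
from typing import Dict
from urllib.parse import urlencode


def openurl(resolver_base: str, metadata: Dict[str, str]) -> str:
    """Build a KEV OpenURL for a journal citation against one resolver.

    `metadata` keys are journal-format fields without the "rft." prefix,
    e.g. "jtitle", "atitle", "volume", "spage", "date".
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    params.update({f"rft.{key}": value for key, value in metadata.items()})
    return resolver_base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical citation and resolvers, for illustration only.
    citation = {"genre": "article", "atitle": "An Example Article",
                "jtitle": "Journal of Examples", "volume": "1",
                "spage": "23", "date": "2010"}
    for base in ("https://resolver.example-university.edu/openurl",
                 "https://links.example-library.org/resolve"):
        print(openurl(base, citation))
```

Because the base URL differs per institution, neither resulting URL is a good candidate for a global linked data identifier, which is what motivates the question of a redirecting service in front of institutional resolvers.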