Draft Vocabularies Datasets As Current Situation



Data availability

The success of linked library data relies on the ability of its practitioners to identify, re-use or connect to existing datasets and data models. Linked datasets and vocabularies that are essential in the library and related domains, however, have previously been unknown or unfamiliar to many. The LLD XG has thus initiated an inventory of available library-related linked data, which is presented in further detail in Section @@TODO when the doc is put in HTML@@ and has led to the observations below.

Bibliographic datasets have received less attention than value vocabularies and element sets

Many metadata element sets and value vocabularies have been released as linked data over the past couple of years, including flagship value vocabularies already used by many libraries, such as the Library of Congress Subject Headings and the Dewey Decimal Classification. It is also encouraging that reference metadata frameworks, including Dublin Core and various FRBR implementations, are now provided in a linked data-compatible form.

However, there are not yet many bibliographic datasets available in the linked data space. More linked data is also needed for other types of resources (metadata for journal articles, citation-level data, circulation information, etc.), which become relevant in an environment where all this data can be (re-)used seamlessly across contexts. Examples such as the release of the British National Bibliography show that releasing such data involves considerable work on challenges such as licensing, data modeling, handling legacy data and collaboration with (multiple) user communities. But they also point to the considerable benefits of releasing bibliographic databases as linked data. As the community's experience grows, the number of datasets released as linked data is increasing at a fast pace.

Quality of and support for available data vary greatly

The level of maturity or stability of available resources varies greatly. Many of the resources we found are the result of ongoing project work or of individual initiatives, and describe themselves as prototypes rather than mature offerings. The abundance of such efforts is a sign of activity around and interest in library linked data. This kind of rapid prototyping fits well with the agile development process that linked data supports. At the same time, the reliance on prototypes and short-term projects jeopardizes the long-term availability and support of library linked data resources.

From this perspective, we find it encouraging that more and more established institutions are committing resources to linked data projects, from the national libraries of Sweden, Hungary, Germany and France, the Library of Congress and the British Library, to the Food and Agriculture Organization of the United Nations, not to mention OCLC. These institutions can provide a stable base on which library linked data will build over time.

Linking across datasets has begun but requires further effort and coordination

Establishing connections across datasets is a core aspect of linked data technology, and a key condition for its success. A quick look at available data (see Section @@TODO) shows that many semantic links are already available across published value vocabularies, which is a great achievement for the nascent library linked data community as a whole. But more can, and should, be done to resolve the issue of data redundancy in the various authority resources that libraries and related organizations maintain. The same holds for other datasets and for the metadata element sets used to structure linked data descriptions. The main bottlenecks are the rather low level of long-term vocabulary support, the limited communication between vocabulary developers, and the lack of mature tools to lower the cost for data publishers of producing the large number of semantic links needed between datasets. However, efforts have begun to facilitate knowledge sharing among participants in this area, as well as the production and sharing of relevant links. We discuss this specific issue further in a section on "the linking issue" at the end of this report.
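To make the notion of a semantic link between value vocabularies more concrete, here is a minimal sketch of how candidate links might be proposed by comparing concept labels. It is an illustration only, not a description of any tool used by the projects mentioned in this report: the two vocabularies, their example.org URIs and the helper functions are hypothetical, and matching on normalized labels is a deliberately naive heuristic whose results would need human review. The only real identifier is the SKOS property skos:closeMatch.

```python
import unicodedata

# Real SKOS mapping property; everything else below is hypothetical example data.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def normalize(label):
    """Lowercase, strip accents and collapse whitespace so labels compare loosely."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def propose_links(vocab_a, vocab_b):
    """Return candidate (subject, predicate, object) triples for concepts
    whose normalized labels are identical in both vocabularies."""
    index = {}
    for uri, label in vocab_b.items():
        index.setdefault(normalize(label), []).append(uri)
    links = []
    for uri, label in vocab_a.items():
        for target in index.get(normalize(label), []):
            links.append((uri, SKOS_CLOSE_MATCH, target))
    return links


def to_ntriples(links):
    """Serialize the candidate links as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in links)


# Hypothetical vocabularies: concept URI -> preferred label.
vocab_a = {
    "http://example.org/vocab-a/c1": "Linked data",
    "http://example.org/vocab-a/c2": "Cataloguing",
}
vocab_b = {
    "http://example.org/vocab-b/101": "linked  DATA",
    "http://example.org/vocab-b/102": "Métadonnées",
}

links = propose_links(vocab_a, vocab_b)
print(to_ntriples(links))
```

Even this toy version hints at why mature tooling matters: label matching alone misses synonyms and translations and can produce false matches, so the cost of reviewing candidate links grows quickly with vocabulary size.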