Draft Vocabularies Datasets As Current Situation

From Library Linked Data


Revision as of 22:12, 12 August 2011


Data availability

The success of linked library data relies on the ability of its practitioners to identify, re-use, or connect to existing datasets and data models. However, linked datasets and vocabularies that are essential in the library and related domains have previously been unknown or unfamiliar to many. The LLD XG has thus initiated an inventory of available library-related linked data, which is presented in further detail in Section @@TODO@@ and has led to the observations below.

More has been done on value vocabularies and element sets than on bibliographic datasets

Many metadata element sets and value vocabularies have been released as linked data over the past couple of years, including some flagship value vocabularies already used by many libraries, such as the Library of Congress Subject Headings and the Dewey Decimal Classification. It is also encouraging to see that reference metadata frameworks are provided in a linked data-compatible form, including Dublin Core and various FRBR implementations.

However, there are not yet many bibliographic datasets available in the linked data space. Examples such as the release of the British National Bibliography show that considerable difficulties are involved (many of them discussed in this report). These difficulties have not proved a deterrent, though, and the number of datasets released as linked data keeps increasing at a fast pace.

Quality and support for available data vary greatly

The level of maturity or stability of available resources varies greatly. Many resources we found are the result of (ongoing) project work or of individual initiatives, and advertise themselves as mere prototypes. The abundance of such efforts is a sign of healthy activity in the library linked data domain. This should come as no surprise, since the whole linked data endeavor encourages a much more agile view of data than any previous paradigm. Yet the prototype status of so many resources puts at risk the long-term availability of, and support for, library linked data resources.

From this perspective, we find it encouraging that more and more established institutions are committing resources to linked data projects, from the national libraries of Sweden, Hungary, Germany, and France, the Library of Congress, and the British Library to the Food and Agriculture Organization of the United Nations, not to mention OCLC.

Linking across datasets has begun but requires further effort and coordination

Establishing connections across datasets is a core aspect of linked data technology and a key condition for its success. A quick look at available data (see Section @@TODO) shows that many semantic links are already available across published value vocabularies, which is a great achievement for the nascent library linked data community as a whole. But more can -- and should -- be done to reduce data redundancy in the various authority resources that libraries and related organizations maintain. A similar statement can be made about other datasets and about the metadata element sets used to structure linked data descriptions. The two main bottlenecks are the rather low level of long-term vocabulary support and of communication between vocabulary developers, and the lack of mature tooling that would lower the cost for data publishers of producing large numbers of semantic links across datasets. However, efforts are under way to facilitate the exchange of experience, as well as the production and sharing of relevant links. We discuss this specific issue further at the end of this report.
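
To make the notion of a "semantic link across datasets" more concrete, the following Python fragment is a minimal sketch, not a description of any tool used by the group. It assumes that links are expressed with the SKOS mapping properties skos:exactMatch and skos:closeMatch or with owl:sameAs, and that two resources belong to different datasets when their URIs have different host names. The vocab-a, vocab-b and vocab-c URIs and the function names are hypothetical and serve only as an illustration.

```python
from collections import Counter
from urllib.parse import urlparse

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"

# Predicates treated as cross-dataset links in this sketch (an assumption).
LINK_PREDICATES = {SKOS + "exactMatch", SKOS + "closeMatch", OWL + "sameAs"}

# Hypothetical triples (subject, predicate, object); the example.org URIs
# are invented and do not refer to any published vocabulary.
TRIPLES = [
    ("http://vocab-a.example.org/subjects/123", SKOS + "exactMatch",
     "http://vocab-b.example.org/concepts/abc"),
    ("http://vocab-a.example.org/subjects/123", SKOS + "prefLabel",
     "Cartography"),
    ("http://vocab-a.example.org/subjects/456", SKOS + "closeMatch",
     "http://vocab-c.example.org/id/789"),
    ("http://vocab-a.example.org/subjects/456", SKOS + "broader",
     "http://vocab-a.example.org/subjects/123"),
]


def host(uri):
    """Return the host part of a URI, used here as a crude dataset identifier."""
    return urlparse(uri).netloc


def cross_dataset_links(triples):
    """Keep only link triples whose subject and object live on different hosts."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p in LINK_PREDICATES and host(s) != host(o)
    ]


def count_links_by_target(triples):
    """Count cross-dataset links per target host."""
    return Counter(host(o) for _, _, o in cross_dataset_links(triples))


if __name__ == "__main__":
    for target, n in count_links_by_target(TRIPLES).items():
        print(target, n)
```

Counting links per target in this way is only a rough indicator of how well a vocabulary is connected; it says nothing about the quality of the links, which in practice often depends on manual review.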
