Difference between revisions of "Draft Relevant Technologies"

Revision as of 18:01, 27 June 2011

Relevant Technologies

Linked Data is an emerging technology, so most tools are still under development. Fortunately, the principles of Linked Data are not tied to any particular tool; they are tied to Web standards themselves. In many situations, production and consumption of Linked Data can be layered onto or interwoven with existing applications without the need for massive redevelopment efforts. The following examples are not exhaustive, but are intended to illustrate a few broad categories. From a non-technical perspective, these technologies are relevant because they support the creation and use of HTTP URIs that identify and describe discrete and recognizable individuals.

Discrete and bulk access to information

The Semantic Web has been around for many years, but Linked Data gives it a major boost in the form of "Cool URIs". Linked Data http URIs are "Cool" because raw RDF can be easily and automatically negotiated and rendered into an HTML format for human (browser) consumption. The DBpedia resource http://dbpedia.org/resource/Jane_Austen is a good example. This is great for diagnosing data and for serendipitous discovery, but the atomic nature of Linked Data http URIs makes them impractical for high-volume network access. Fortunately, more and more Linked Datasets are being published in bulk and consistently described using the VoID Vocabulary.
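As a rough illustration of the negotiation mentioned above, a client can ask for different representations of the same URI by changing the HTTP Accept header. The sketch below only builds the requests and does not send them; the helper name build_request and the two media types chosen are assumptions made for this example.

```python
from urllib.request import Request

RESOURCE = "http://dbpedia.org/resource/Jane_Austen"


def build_request(uri, media_type):
    """Return a GET request that asks for one representation of a URI."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two different requested representations.
html_request = build_request(RESOURCE, "text/html")
rdf_request = build_request(RESOURCE, "application/rdf+xml")

print(html_request.full_url, html_request.get_header("Accept"))
print(rdf_request.full_url, rdf_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual lookup; which representation comes back depends on the server.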

Linked Data front-ends to existing data stores

Related Use Case Cluster : Cluster VocAlign

Unlike information represented hierarchically in typical XML documents, resources published as Linked Data allow information to be freed from use-case-specific hierarchies and thus available for unexpected reuse. This not only makes the information easier to mash up, it also makes tools and services easier to mash up. This is true for both producers and consumers of Linked Data. For example, an existing relational database can be mounted as Linked Data and SPARQL by using D2R Server. The W3C RDB2RDF Working Group is currently working on standards for such mappings. Similarly, Linked Data can be produced from existing SRU databases with a few rewrite rules. If the information is already available from a SPARQL endpoint, then a Linked Data front-end like Pubby can be used to automate the URIs. Last, but not least, XSLT can be useful for converting generic XML into RDF/XML.
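To make the idea of mounting a relational database as RDF more concrete, here is a minimal hand-rolled sketch that turns table rows into N-Triples. It is not how D2R Server or the RDB2RDF mappings work internally; the example.org namespace, the book table and the function names are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for minted URIs


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(conn, table, key_column):
    """Emit one triple per non-key, non-NULL column of every row.

    The table name is interpolated directly into the SQL, so it must
    come from trusted configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = f"<{BASE}vocab/{table}_{column}>"
            literal = escape_literal(value)
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Emma', 'Jane Austen')")

triples = rows_to_ntriples(conn, "book", "id")
for line in triples:
    print(line)
```

The primary key becomes part of the subject URI and each remaining column becomes a predicate, which is the basic shape of a direct table-to-RDF mapping.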

Tools for data designers

Related Use Case Cluster : Cluster VocAlign

Another boost for Linked Data is the growing use of OWL for purposes of data design. Prior to OWL, domain experts could use RDFS to create domain-specific vocabularies, but there was no way to map equivalencies across vocabularies. Among other features, OWL includes an upgrade to RDFS to support ontology mapping. This allows experts to describe their domain using community idioms, while still being interoperable with related or more common idioms. A variety of tools related to OWL can be found on the W3C's RDF wiki and OWL wiki.
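The effect of such a mapping can be sketched with plain tuples standing in for RDF triples. The example below assumes a hypothetical local property that has been declared equivalent to the Dublin Core title property with owl:equivalentProperty, and it applies the equivalence in one direction only; a real OWL reasoner does considerably more.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Hypothetical community property mapped to a widely used one.
LOCAL_HEADING = "http://example.org/lib/heading"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

mappings = [(LOCAL_HEADING, OWL_EQUIVALENT_PROPERTY, DCTERMS_TITLE)]
data = [("http://example.org/book/1", LOCAL_HEADING, "Emma")]


def apply_equivalences(data, mappings):
    """Return the data plus one extra triple per mapped property."""
    equivalents = {
        subject: obj
        for subject, predicate, obj in mappings
        if predicate == OWL_EQUIVALENT_PROPERTY
    }
    result = list(data)
    for subject, predicate, obj in data:
        if predicate in equivalents:
            result.append((subject, equivalents[predicate], obj))
    return result


merged = apply_equivalences(data, mappings)
for triple in merged:
    print(triple)
```

A consumer that only understands the common property can now read the data, while the publisher keeps describing it with the community idiom.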

SKOS and related tools

Related Use Case Cluster : Cluster VocAlign

Yet another key technology boost is being provided by SKOS, which is an OWL ontology for dealing with a broad base of conceptual schemes including the management of preferred and alternate labels. Many SKOS-related tools are listed on the W3C's SKOS community wiki.
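As a small illustration of preferred and alternate labels, the sketch below indexes skos:prefLabel and skos:altLabel values so that a search on any label leads back to the concept and its preferred form. The concept URI and labels are invented for this example, and language tags are ignored for brevity.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
ALT_LABEL = SKOS + "altLabel"

CONCEPT = "http://example.org/concept/serials"  # hypothetical concept URI
triples = [
    (CONCEPT, PREF_LABEL, "Serials"),
    (CONCEPT, ALT_LABEL, "Periodicals"),
    (CONCEPT, ALT_LABEL, "Journals"),
]


def build_label_index(triples):
    """Map every lower-cased label to the set of concepts that carry it."""
    index = {}
    for concept, predicate, label in triples:
        if predicate in (PREF_LABEL, ALT_LABEL):
            index.setdefault(label.lower(), set()).add(concept)
    return index


def preferred_label(triples, concept):
    """Return the first preferred label found for a concept, or None."""
    for subject, predicate, label in triples:
        if subject == concept and predicate == PREF_LABEL:
            return label
    return None


index = build_label_index(triples)
for concept in index.get("periodicals", set()):
    print(concept, preferred_label(triples, concept))
```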

Microformats, Microdata and RDFa

Related Use Case Cluster : Cluster Social Uses

Microformats, Microdata and RDFa all provide ways to embed structured data into web pages. As historically the emphasis on publishing information on the web has had to do with publishing web pages, these technologies provide ways to enhance what is already there rather than necessarily deploying separate infrastructure. RDFa supports expression of RDF data in this way and is therefore the most directly interoperable with other linked data infrastructure.

Microdata, which is defined with the new HTML5, provides another way of doing this. It has notably gained prominence for Search Engine Optimisation purposes with the announcement of http://schema.org/ by Google, Microsoft and Yahoo. This particular type of microdata does not appear to be intended to represent arbitrarily complex data, and the vocabulary that they have published places special emphasis on commerce and tourism. Though it is extensible in principle, it would require a lot of extension to express library information in this way, as most of the required vocabulary is lacking. There is some level of interoperability with linked data thanks to the efforts at http://schema.rdfs.org/, but at this time it seems that it would be difficult, using this approach, to cultivate the high level of interconnectedness between library and other datasets that is possible with linked data.

It should be noted that the http://schema.org/ protagonists do support harvesting of RDFa data and have pledged to continue doing so. It therefore does not appear that publishing HTML pages marked up with RDFa means "missing out" on the opportunities afforded by microdata. Modulo bugs in the search engines' parsers, it is even possible to do both in the same web page. If for some reason it is not possible to make use of the full expressive power of RDF with RDFa, some structured data is better than none.

Web Application Frameworks

Related Use Case Cluster : Cluster Archives

As the Web has grown in popularity, the software development community has created a variety of software libraries that make it easier to create, maintain and reuse web applications. These libraries are often referred to as web application frameworks, and typically implement the Model-View-Controller (MVC) pattern in some fashion. In addition, web application frameworks have typically encoded and encouraged best practices with respect to the REST Architectural Style and Resource Oriented Architecture, which have informed much of the standardization around web technologies.

A common component to web application frameworks is a URI routing mechanism, which allows software developers to define http URI patterns, and map them to controllers, which in turn generate an HTTP response using the appropriate views and models. This activity encourages best practices with respect to Cool URIs, and also forces the developer to think about the resources that she is making available on the Web. Linked Data's focus on naming resources with http URIs, and delivering representations of them (HTML for humans, and RDF for machines) makes it a natural fit for web application frameworks which already provide some of the scaffolding for these activities. The wide availability of web application frameworks in many different programming languages and operating system environments has led to them being heavily used in the cultural heritage sector.
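The routing mechanism described above can be sketched in a few lines without any particular framework. The URI patterns, controller functions and dispatch helper below are hypothetical and only illustrate how a pattern is matched and handed to a controller.

```python
import re


def show_book(book_id):
    """Controller: build a response body for a single book resource."""
    return f"<h1>Book {book_id}</h1>"


def list_books():
    """Controller: build a response body for the collection resource."""
    return "<h1>All books</h1>"


# Each route pairs a URI pattern with the controller that handles it.
ROUTES = [
    (re.compile(r"^/books/(?P<book_id>\d+)$"), show_book),
    (re.compile(r"^/books/?$"), list_books),
]


def dispatch(path):
    """Return (status, body) for the first route whose pattern matches."""
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            return 200, controller(**match.groupdict())
    return 404, "Not Found"


print(dispatch("/books/42"))
print(dispatch("/books"))
print(dispatch("/authors/7"))
```

Writing the route table forces a decision about which resources exist and what their URIs look like, which is the same design step that Linked Data publishing requires.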

However, web developers are sometimes put off Semantic Web (Linked Data) technologies because they feel they would need to throw away their current application, swapping their database for a triplestore and their database query language for SPARQL. This is simply not the case, since RDF serializations can be generated on the fly, just as web application frameworks do for HTML, XML and JSON representations. The use of http URIs to identify and link together resources in RDF's data model makes it a natural choice for serializing and sharing entity state in a database-neutral way, which has traditionally been of great interest to cultural heritage organizations and the digital preservation community.
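A minimal sketch of generating representations on the fly is shown below: one in-memory record, standing in for a database row, is rendered as HTML, JSON or N-Triples depending on the requested media type. The record, the example.org base URI and the render helper are assumptions for illustration, and the media type lookup is a plain dictionary match rather than full Accept header parsing.

```python
import json
from html import escape

BASE = "http://example.org/book/"        # hypothetical URI space
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set

record = {"id": 1, "title": "Emma", "creator": "Jane Austen"}


def quote(value):
    """Quote a string as an N-Triples literal (backslash and quote only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_html(r):
    return f"<h1>{escape(r['title'])}</h1><p>{escape(r['creator'])}</p>"


def to_json(r):
    return json.dumps(r, sort_keys=True)


def to_ntriples(r):
    subject = f"<{BASE}{r['id']}>"
    return "\n".join([
        f"{subject} <{DC}title> {quote(r['title'])} .",
        f"{subject} <{DC}creator> {quote(r['creator'])} .",
    ])


SERIALIZERS = {
    "text/html": to_html,
    "application/json": to_json,
    "application/n-triples": to_ntriples,
}


def render(r, media_type):
    """Pick a serializer by media type, falling back to HTML."""
    return SERIALIZERS.get(media_type, to_html)(r)


print(render(record, "application/n-triples"))
```

The underlying store never changes; RDF is simply one more view alongside the ones the application already produces.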

Content Management Systems

Related Use Case Cluster : Cluster Social Uses, Cluster Digital Objects, Cluster Archives

Just as web application frameworks have evolved as the Web has spread, so has the class of web applications known as content management systems (CMS). CMS are often built using a web application framework, but provide out-of-the-box functionality for easily creating/editing/presenting content (text, images, video) on the Web, and for managing workflows associated with the content. Since CMS are typically built using web frameworks, the same best practices for naming resources with http URIs are naturally followed. The wide availability of content management systems has led to heavy use in the cultural heritage sector. Some content management systems such as Drupal are starting to expose structured database information to machine clients by seamlessly layering it into their HTML using RDFa. As a result, data consumers such as Google Scholar, Google Maps, Facebook, etc. are starting to leverage this structured metadata in their own service offerings. Conversely, Drupal is also starting to make plugins available to consume RDF, such as VARQL and SPARQL Views.

Web Services for Library Linked Data

Related Use Case Cluster : Cluster BibData, Cluster Authority data

Theoretically, most domain-specific Web Service API capabilities could be refactored as Linked Data URIs, OWL, SPARQL, and SPARQL/Update. But even though it should be possible to layer a Linked Data URI front-end on an existing back-end datastore, it may not be so easy for the back-end to support SPARQL and SPARQL/Update access. Security, robustness and performance considerations could also preclude supporting SPARQL in production situations. Where they are offered, SPARQL endpoints and bulk RDF downloads can greatly facilitate discovery and reuse of the published Linked Data. Most web developers, however, face a steep learning curve before being able to exploit them, and for many application requirements this is too much of a burden.

Web Services for the most common uses should be offered as an alternative. Most Web Service APIs tend to be domain-specific, though, and require custom-coded agents. This means they should be well-documented. More general approaches to web service interfaces include OpenSearch (which can be documented using a Description Document), the Linked Data API and the ongoing work of the W3C RDF Web Applications Working Group on RDF and RDFa APIs. Some Linked Datasets could also benefit from syndicated access using Atom Syndication Format and/or RSS.

A few Linked Data implementations have endeavored to implement Web Services to enhance discovery and use of resources, often by providing some form of an application programming interface (API). Agrovoc and STW provide an API to discover resources based on relationships in the data, among many more web services. VIAF, Library of Congress, and STW offer autosuggest services for resources, delivering JSON responses ready for consumption in AJAX browser applications. (In principle, though, JSON could be content-negotiable via the Linked Data URI, just like HTML and RDF.) Agrovoc and STITCH/CATCH include support for RDF responses. Some services provide full-fledged SOAP APIs, while others support a RESTful approach.
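The general shape of an autosuggest service can be sketched as a prefix match over labels that returns JSON. This is not the API of VIAF, the Library of Congress or STW; the URIs, labels, response fields and the suggest function are invented for this example.

```python
import json

# Hypothetical resources keyed by URI, with one display label each.
LABELS = {
    "http://example.org/concept/inflation": "Inflation",
    "http://example.org/concept/interest": "Interest rate",
    "http://example.org/concept/investment": "Investment",
    "http://example.org/concept/trade": "Trade",
}


def suggest(prefix, limit=5):
    """Return a JSON array of resources whose label starts with prefix."""
    wanted = prefix.lower()
    matches = [
        {"uri": uri, "label": label}
        for uri, label in sorted(LABELS.items(), key=lambda item: item[1])
        if label.lower().startswith(wanted)
    ]
    return json.dumps(matches[:limit])


print(suggest("in", limit=2))
```

Because each suggestion carries the resource URI, a browser application can follow it to the full Linked Data description without knowing how the data is stored.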

By focusing on request parameters and response formats to provide enhanced discovery, Linked Data Web Services diminish, if not eliminate, the requirement that data be stored in a triplestore or be made searchable via SPARQL. And, because web service APIs are common, web services can lower the barrier to entry.