Draft Relevant Technologies

From Library Linked Data


Revision as of 22:24, 4 September 2011


Appendix B: Relevant Technologies

Linked Data is an emerging technology, so most tools are still under development. The principles of Linked Data are not tied to any particular tool; rather, they are tied directly to Web standards. In many situations, the production and consumption of Linked Data can be layered or interwoven with existing applications without requiring massive redevelopment efforts. The list of tools and technologies below is not exhaustive, but is intended to illustrate a few broad categories. From a non-technical perspective, these technologies are relevant because they encourage the creation and discovery of reusable vocabularies and provide ways to combine those terms into reusable (syntactic) statements.

Using Uniform Resource Identifiers (URIs) to identify things that aren't located on the Web

In the early days of the Web, it was unclear whether "HTTP URIs" (also known as "URLs") should be used to identify things that are not "located" on the Web. That concern was the basis for defining new URI schemes such as URNs and "info" URIs. These uncertainties were eventually resolved by a report from the Joint W3C/IETF URI Planning Interest Group (RFC 3305) and a resolution of the W3C Technical Advisory Group on the issue known as "httpRange-14". In the Linked Data paradigm, it is generally expected that HTTP URIs will also be used to identify "real world objects." Nevertheless, many applications have been built on the other identifier schemes. Using owl:sameAs is a good way to map these non-resolvable URI schemes to their HTTP URI equivalents. Even if this mapping is not done, non-resolvable URIs are still useful in RDF and SPARQL.
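As a minimal sketch of the owl:sameAs mapping described above, the following Python fragment emits one N-Triples statement per pair of identifiers. The URN and the example.org URI are hypothetical and used only for illustration; a real mapping would use an institution's own identifiers.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(non_resolvable_uri: str, http_uri: str) -> str:
    """Return one N-Triples line stating that both URIs identify the same thing."""
    return f"<{non_resolvable_uri}> <{OWL_SAME_AS}> <{http_uri}> ."


# Hypothetical identifiers (assumptions, not taken from any real dataset).
mappings = {
    "urn:isbn:9780141439518": "http://example.org/book/9780141439518",
}

lines = [same_as_triple(urn, http) for urn, http in mappings.items()]
print("\n".join(lines))
```

Statements like these can be loaded alongside existing data, so that applications built on URNs or "info" URIs can still be joined to resources named with HTTP URIs.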

Discrete and bulk access to information

The principles of Linked Data were introduced around 2006, leading to the formalization of "Cool URIs for the Semantic Web" in 2008. What makes Linked Data identifiers special is the ability to help humans and machines understand, process, and link information across a wide variety of use cases. The DBpedia resource for Jane Austen (http://dbpedia.org/resource/Jane_Austen) is a good example. Resolvable URIs are great for casual use, diagnosing data, and serendipitous discovery, but discrete HTTP GET requests may be impractical for datasets with a large number of individuals. Fortunately, more and more Linked Datasets are being published as RDF dumps and consistently described using the VoID Vocabulary.
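A discrete request for a single resource can be sketched as an HTTP GET with an Accept header asking for an RDF serialization. The sketch below only builds the request and does not send it. The helper name rdf_request is illustrative, and the choice of text/turtle is an assumption: a given server may offer other RDF media types instead.

```python
from urllib.request import Request


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request that asks the server for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


req = rdf_request("http://dbpedia.org/resource/Jane_Austen")
print(req.full_url, req.get_header("Accept"))

# To actually fetch the data (one network round trip per resource):
#   from urllib.request import urlopen
#   body = urlopen(req).read()
```

Issuing one such request per individual is what becomes impractical for large datasets, which is where RDF dumps are the better fit.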

Front ends for mapping existing data stores to Linked Data/RDF

Related Use Case Cluster: Cluster VocAlign

Unlike information represented hierarchically in typical XML documents, resources published as Linked Data allow information to be freed from use-case-specific hierarchies and thus made available for unexpected reuse. This not only makes the information easier to mash up, it also makes tools and services easier to mash up. This is true for both producers and consumers of Linked Data. For example, an existing relational database can be exposed as Linked Data and through a SPARQL endpoint by using D2R Server. The W3C RDB2RDF Working Group is currently working on standards for such mappings. Similarly, Linked Data can be produced from existing SRU databases with a few rewrite rules. If the information is already available from a SPARQL endpoint, then a Linked Data front-end like Pubby can be used to automate the serving of resolvable URIs. Last, but not least, XSLT can be useful for converting generic XML into RDF/XML.
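The general idea of mapping relational rows to RDF can be sketched in a few lines of Python. This is not the D2R Server mapping language or an RDB2RDF standard, only an illustration of the principle: each row gets a URI built from its primary key, and each column becomes a property. The table layout and the example.org namespace are assumptions.

```python
import sqlite3

BASE = "http://example.org/author/"            # hypothetical URI namespace
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(conn: sqlite3.Connection) -> list:
    """Turn each row of the (assumed) author table into one N-Triples line."""
    triples = []
    for author_id, name in conn.execute("SELECT id, name FROM author ORDER BY id"):
        triples.append(f'<{BASE}{author_id}> <{FOAF_NAME}> "{escape_literal(name)}" .')
    return triples


# An in-memory database stands in for an existing relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO author (name) VALUES (?)", ("Jane Austen",))

triples = rows_to_ntriples(conn)
print("\n".join(triples))
```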

Tools for data designers

Related Use Case Cluster: Cluster VocAlign

Application profiles provide a popular way to document how a community of practice defines a domain model and a pattern for re-using particular vocabularies with particular constraints in describing particular types of resources. The current version of the OWL Web Ontology Language, which provides properties to represent alignments across vocabularies (ontology mappings), allows experts to describe their domain using community idioms while remaining interoperable with related or more common idioms. A variety of tools related to OWL can be found on the W3C's RDF wiki and OWL wiki. Unified Modeling Language (UML) tools are also valuable in helping designers represent and manipulate domain models visually. The Ontology Definition Metamodel (ODM) specification should help bridge some of the gaps between UML and OWL.

SKOS and related tools

Related Use Case Cluster: Cluster VocAlign

Yet another key technology boost is being provided by SKOS, which is an OWL ontology for dealing with a broad base of conceptual schemes including the management of preferred and alternate labels. Many SKOS-related tools are listed on the W3C's SKOS community wiki.
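To illustrate what the management of preferred and alternate labels looks like in practice, the following sketch builds a small Turtle description of a SKOS concept. The concept URI and labels are hypothetical, and the helper assumes the labels contain no characters that need escaping.

```python
PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_concept_turtle(uri, pref_label, alt_labels=(), lang="en"):
    """Return Turtle for one concept with a preferred label and alternate labels."""
    statements = [f'skos:prefLabel "{pref_label}"@{lang}']
    statements += [f'skos:altLabel "{label}"@{lang}' for label in alt_labels]
    body = " ;\n    ".join(["a skos:Concept"] + statements)
    return f"{PREFIX}\n\n<{uri}>\n    {body} ."


# Hypothetical concept, for illustration only.
turtle = skos_concept_turtle("http://example.org/concept/cats", "Cats", ["Felines"])
print(turtle)
```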

Microformats, Microdata, and RDFa

Related Use Case Cluster: Cluster Social Uses

Microformats, Microdata, and RDFa all provide ways to embed structured data into Web pages. As historically the emphasis on publishing information on the Web has meant publishing Web pages, these technologies provide ways to enhance what is already there rather than necessarily deploying additional infrastructure. RDFa supports the expression of RDF data embedded directly in Web pages; of the three, therefore, it is the most directly interoperable with other Linked Data infrastructure.

Microdata, which is defined in the new HTML5 specification under development, provides another way of doing this. Microdata has notably gained prominence for Search Engine Optimisation purposes with the announcement of Schema.org by Google, Microsoft, and Yahoo. This particular type of microdata does not appear to be intended to represent arbitrarily complex data, and the vocabulary that they have published places special emphasis on commerce and tourism. Though in principle extensible, the microdata schemas would require a lot of extension to express library information, as most of the required vocabulary is lacking. There is some level of interoperability with Linked Data thanks to the efforts of Schema.RDFS.org, but it currently seems that it would be difficult, using this approach, to cultivate the high level of interconnectedness between library and other datasets that is possible with Linked Data.

It should be noted that the Schema.org protagonists do support harvesting of RDFa data and have pledged to continue doing so, so it does not appear to be the case that by publishing HTML pages marked up with RDFa one might somehow "miss out" on the opportunities afforded by microdata. But for bugs in the search engines' parsers, it should even be possible to do both in the same Web page. If for some reason it is not possible to make use of the full expressive power of RDF with RDFa, some structured data is better than none.
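As a rough sketch of what embedding RDF in a Web page with RDFa involves, the function below renders an HTML fragment whose attributes carry the structured data alongside the human-readable text. The book URI is hypothetical, the choice of Dublin Core element properties is an assumption made for brevity, and the markup uses the RDFa 1.1 prefix attribute.

```python
from html import escape


def book_rdfa(uri: str, title: str, creator: str) -> str:
    """Render an HTML fragment describing a book, annotated with RDFa attributes."""
    return (
        f'<div prefix="dc: http://purl.org/dc/elements/1.1/" about="{escape(uri)}">\n'
        f'  <span property="dc:title">{escape(title)}</span> by\n'
        f'  <span property="dc:creator">{escape(creator)}</span>\n'
        f'</div>'
    )


# Hypothetical resource, for illustration only.
fragment = book_rdfa("http://example.org/book/1", "Pride & Prejudice", "Jane Austen")
print(fragment)
```

A browser displays the fragment as ordinary text, while an RDFa-aware parser can extract a title and a creator statement about the book's URI from the same markup.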

Web Application Frameworks

Related Use Case Cluster: Cluster Archives

As the Web has grown in popularity, the software development community has created a variety of software libraries that make it easier to create, maintain, and reuse web applications. These libraries are often referred to as web application frameworks, and typically implement the Model-View-Controller (MVC) pattern in some fashion. In addition, web application frameworks have typically encoded and encouraged best practices with respect to the REST Architectural Style and Resource Oriented Architecture, which have informed much of the standardization around web technologies.

A common component of web application frameworks is a URI routing mechanism, which allows software developers to define HTTP URI patterns and map them to controllers, which in turn generate an HTTP response using the appropriate views and models. This activity encourages best practices with respect to Cool URIs, and also forces the developer to think about the resources that she is making available on the Web. Linked Data's focus on naming resources with HTTP URIs and delivering representations of them (HTML for humans, RDF for machines) makes it a natural fit for web application frameworks, which already provide some of the scaffolding for these activities. The wide availability of web application frameworks in many different programming languages and operating system environments has led to them being heavily used in the cultural heritage sector.
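The routing mechanism described above can be sketched without any particular framework. In this minimal Python illustration, URI patterns are regular expressions registered with a decorator, and a dispatcher hands the matched path segments to the controller. The names route, dispatch, and show_author are hypothetical and do not correspond to any specific framework's API.

```python
import re

ROUTES = []  # list of (compiled pattern, controller) pairs


def route(pattern):
    """Register the decorated controller for paths matching the pattern."""
    def register(controller):
        ROUTES.append((re.compile(pattern), controller))
        return controller
    return register


@route(r"^/authors/(?P<author_id>\d+)$")
def show_author(author_id):
    # A real controller would load a model and render a view here.
    return 200, f"Author {author_id}"


def dispatch(path):
    """Find the first matching route and call its controller."""
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            return controller(**match.groupdict())
    return 404, "Not Found"


print(dispatch("/authors/42"))
print(dispatch("/unknown"))
```

Writing the pattern forces a decision about what the resource is and how it is named, which is the same decision Linked Data publishing requires.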

However, web developers are sometimes put off Semantic Web (Linked Data) technologies because they feel they would need to throw away their current application, swapping their database for a triplestore and their database query language for SPARQL. This is simply not the case, since RDF serializations can be generated on the fly, just as web application frameworks do for HTML, XML, and JSON representations. The use of HTTP URIs to identify and link together resources in RDF's data model makes it a natural choice for serializing and sharing entity state in a database-neutral way, which has traditionally been of great interest to cultural heritage organizations and the digital preservation community.
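A small sketch can make the point about generating RDF on the fly concrete: the same record, which here stands in for a row from an existing database, is serialized as HTML, JSON, or N-Triples depending on the requested media type. The record, the example.org namespace, and the function names are assumptions. The negotiation is deliberately simplified and ignores q-values in the Accept header.

```python
import json

RECORD = {"id": 1, "name": "Jane Austen"}   # stands in for a database row
BASE = "http://example.org/authors/"        # hypothetical URI namespace
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def to_html(record):
    return f"<h1>{record['name']}</h1>"


def to_json(record):
    return json.dumps(record, sort_keys=True)


def to_ntriples(record):
    return f'<{BASE}{record["id"]}> <{FOAF_NAME}> "{record["name"]}" .'


SERIALIZERS = {
    "text/html": to_html,
    "application/json": to_json,
    "application/n-triples": to_ntriples,
}


def render(record, accept):
    """Pick the first supported media type listed in the Accept header value."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](record)
    return "text/html", to_html(record)  # fall back to HTML for humans


print(render(RECORD, "application/n-triples"))
```

The RDF view is just one more serializer next to the others; neither the database nor the query language has to change.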

Content Management Systems

Related Use Case Cluster: Cluster Social Uses, Cluster Digital Objects, Cluster Archives

Just as web application frameworks have evolved as the Web has spread, so has the class of web applications known as content management systems (CMS). CMSs are often built using a web application framework, but provide out-of-the-box functionality for easily creating, editing, and presenting content (text, images, video) on the Web, and for managing workflows associated with the content. Since CMSs are typically built using web frameworks, the same best practices for naming resources with HTTP URIs are naturally followed. The wide availability of content management systems has led to heavy use in the cultural heritage sector. Some content management systems such as Drupal are starting to expose structured database information to machine clients by seamlessly layering it into their HTML using RDFa. As a result, data consumers such as Google Scholar, Google Maps, Facebook, etc. are starting to leverage this structured metadata in their own service offerings. Conversely, Drupal is also starting to make plugins available to consume RDF, such as VARQL and SPARQL Views.

Web Services for library Linked Data

Related Use Case Cluster: Cluster BibData, Cluster Authority data

In theory, most domain-specific Web Service API capabilities could be refactored as Linked Data URIs, OWL, SPARQL, and SPARQL/Update. But even though it should be possible to layer a Linked Data URI front-end on an existing back-end datastore, it may not be so easy for the back-end to support SPARQL and SPARQL/Update access. Security, robustness, and performance considerations could also preclude supporting SPARQL in production situations. Where they are offered, SPARQL endpoints and bulk RDF downloads can greatly facilitate discovery and re-use of the published Linked Data. Most Web developers, however, face a steep learning curve before being able to exploit them, and for many application requirements this imposes too heavy a burden.

Web Services for the most common uses should be offered as an alternative. However, most Web Service APIs tend to be domain-specific, requiring custom-coded agents. This means they should be well documented. More general approaches to Web Service interfaces include OpenSearch (which can be documented using a Description Document), the Linked Data API, and the ongoing work of the W3C RDF Web Applications Working Group on RDF and RDFa APIs. Some Linked Datasets could also benefit from syndicated access using the Atom Syndication Format or RSS.

A few Linked Data implementations have endeavored to implement Web Services to enhance discovery and use of resources, often by providing some form of API. For example, AGROVOC and the STW Thesaurus for Economics provide APIs for discovering resources based on relationships in the data. VIAF, the ID.LOC.GOV service of the Library of Congress, and STW offer autosuggest services for resources, delivering JSON responses ready for consumption in AJAX browser applications. (In principle, though, JSON responses could be content-negotiable via the Linked Data URI, as are responses in HTML and RDF.) AGROVOC and STITCH/CATCH include support for RDF responses. Some services provide full-fledged SOAP APIs, while others support a RESTful approach.
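The kind of autosuggest service mentioned above can be sketched as a prefix lookup over labels that returns JSON pairing each label with its Linked Data URI. This is only an illustration: the labels, URIs, and response shape are hypothetical and do not reproduce the actual response formats of VIAF, ID.LOC.GOV, or STW.

```python
import json

# Hypothetical label index; a real service would query its own datastore.
LABELS = {
    "http://example.org/concept/1": "Economics",
    "http://example.org/concept/2": "Econometrics",
    "http://example.org/concept/3": "Ecology",
}


def suggest(prefix: str, limit: int = 10) -> str:
    """Return a JSON array of label/URI pairs whose label starts with the prefix."""
    wanted = prefix.casefold()
    hits = sorted(
        (label, uri)
        for uri, label in LABELS.items()
        if label.casefold().startswith(wanted)
    )
    return json.dumps([{"label": label, "uri": uri} for label, uri in hits[:limit]])


print(suggest("econ"))
```

Because each suggestion carries a URI, a browser application can follow it to the full Linked Data description without the service itself exposing SPARQL.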

By focusing on request parameters and response formats to provide enhanced discovery, Linked Data Web Services diminish, if not eliminate, the requirement that data be stored in a triple store or be made searchable via SPARQL. And, because Web Service APIs are common, Web Services can lower the barrier to entry to adopting a Linked Data approach.
