Draft Relevant Technologies


Appendix B: Relevant Technologies

Linked Data is an emerging technology, so most tools are still in development. The principles of Linked Data are not tied to any particular tool; rather, they are tied directly to Web standards. In many situations, the production and consumption of Linked Data can be layered onto or interwoven with existing applications without massive redevelopment. This list of tools and technologies is not exhaustive, but is intended to illustrate a few broad categories. From a non-technical perspective, these technologies are relevant because they encourage the creation and discovery of reusable vocabularies and provide ways to combine those terms into reusable (syntactic) statements.

Using URIs to identify things not actually located on the Web

In the early days of the Web, it was unclear whether "HTTP URIs" (also known as "URLs") should be used to identify things that are not "located" on the Web. That concern was the basis for defining new URI schemes such as URNs and "info" URIs. These uncertainties were eventually resolved by a report from the Joint W3C/IETF URI Planning Interest Group (RFC 3305) and by a resolution of the W3C Technical Architecture Group (TAG) on the issue known as "httpRange-14". In the Linked Data paradigm, HTTP URIs are generally expected to identify "real world objects" as well. Nevertheless, many applications have been built on the other identifier schemes. The owl:sameAs property offers a good way to map these non-resolvable URIs to their HTTP URI equivalents. Even without such a mapping, non-resolvable URIs remain usable in RDF and SPARQL.
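As a minimal sketch of such a mapping, the Python below emits owl:sameAs statements in N-Triples syntax. The URNs and the example.org HTTP URIs are hypothetical placeholders, and the small crosswalk dictionary stands in for whatever identifier mapping an institution actually maintains.

```python
# Minimal sketch: publish owl:sameAs links from non-resolvable identifiers
# to HTTP URIs. All identifiers below are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(legacy_uri: str, http_uri: str) -> str:
    """Return one N-Triples statement asserting legacy_uri owl:sameAs http_uri."""
    if not http_uri.startswith(("http://", "https://")):
        raise ValueError(f"expected an HTTP URI, got {http_uri!r}")
    return f"<{legacy_uri}> <{OWL_SAME_AS}> <{http_uri}> ."


# Hypothetical crosswalk: legacy identifier -> HTTP URI equivalent.
CROSSWALK = {
    "urn:isbn:9780000000002": "http://example.org/book/9780000000002",
    "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66": "http://example.org/item/1",
}

if __name__ == "__main__":
    for legacy, http in CROSSWALK.items():
        print(same_as_triple(legacy, http))
```

The resulting lines can be loaded into any RDF store alongside the rest of the data, so consumers holding only the legacy identifier can still reach the resolvable HTTP URI.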

Discrete and bulk access to information

The principles of Linked Data were introduced circa 2006, leading to a formalized notion of "Cool URIs" in 2008. What makes Linked Data identifiers special is that they help humans and machines understand, process, and link information across a wide range of use cases; the DBpedia resource for Jane Austen (http://dbpedia.org/resource/Jane_Austen) is a good example. Resolvable URIs are well suited to casual use, diagnosing data, and serendipitous discovery, but issuing a discrete HTTP GET request per resource may be impractical for datasets with large numbers of individuals. Fortunately, linked datasets are increasingly being published as RDF dumps and consistently described using the VoID Vocabulary.
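The two access patterns can be sketched with the Python standard library. `rdf_request` builds (but does not send) a content-negotiated GET for a single resource, while `void_description` produces a small VoID description in Turtle that advertises a bulk dump through void:dataDump. Both function names are illustrative, and the dataset URI, dump URL, and triple count in the usage are hypothetical.

```python
import urllib.request

VOID = "http://rdfs.org/ns/void#"


def rdf_request(resource_uri: str) -> urllib.request.Request:
    """Build (without sending) a GET that asks for RDF rather than HTML."""
    return urllib.request.Request(
        resource_uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    )


def void_description(dataset_uri: str, dump_url: str, triples: int,
                     example_resource: str) -> str:
    """Return a minimal VoID description (Turtle) pointing at a bulk RDF dump."""
    return (
        f"@prefix void: <{VOID}> .\n\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f"    void:dataDump <{dump_url}> ;\n"
        f"    void:triples {triples} ;\n"
        f"    void:exampleResource <{example_resource}> .\n"
    )


if __name__ == "__main__":
    # Discrete access: one request per resource (sending it needs network access).
    req = rdf_request("http://dbpedia.org/resource/Jane_Austen")
    print(req.get_method(), req.full_url, req.get_header("Accept"))

    # Bulk access: advertise a single dump (all values are hypothetical).
    print(void_description("http://example.org/dataset/catalog",
                           "http://example.org/dumps/catalog.nt.gz",
                           1000000,
                           "http://example.org/book/1"))
```

A harvester that finds the VoID description can fetch one file instead of issuing a request per individual, while the per-resource request remains useful for spot checks and browsing.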

Front ends for mapping existing data stores to Linked Data and RDF

Related Use Case Cluster: Cluster VocAlign

Unlike information represented hierarchically in typical XML documents, information published as Linked Data is freed from use-case-specific hierarchies and is thus available for unexpected reuse. This makes not only the information easier to mash up, but also the tools and services that handle it, for producers and consumers of Linked Data alike. For example, an existing relational database can be exposed as Linked Data and through a SPARQL endpoint using D2R Server. The W3C RDB2RDF Working Group is currently working on standards for such mappings. Similarly, Linked Data can be produced from existing SRU databases with a few rewrite rules. If the resources are already available from a SPARQL endpoint, a Linked Data front end such as Pubby can automate the content-negotiable Cool URI behavior for each resource. XSLT (Extensible Stylesheet Language Transformations) can be useful for converting generic XML into RDF/XML.
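The core idea behind relational-to-RDF front ends (table rows become resources, columns become properties) can be sketched with Python's built-in sqlite3 module. This is not D2R Server's mapping language or an RDB2RDF standard; the `book` table, the column-to-property dictionary, and the example.org URI space are hypothetical, and Dublin Core terms are chosen only as a familiar vocabulary.

```python
import sqlite3

BASE = "http://example.org/"          # hypothetical URI space for published data
DCT = "http://purl.org/dc/terms/"     # Dublin Core terms namespace

# Hypothetical column -> property mapping, loosely analogous to a mapping file.
COLUMN_TO_PROPERTY = {"title": DCT + "title", "issued": DCT + "issued"}


def nt_literal(value) -> str:
    """Serialize a value as an N-Triples plain literal, escaping special characters."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def rows_to_ntriples(conn: sqlite3.Connection):
    """Yield one N-Triples line per non-null mapped column of each book row."""
    cursor = conn.execute("SELECT id, title, issued FROM book ORDER BY id")
    for row_id, title, issued in cursor:
        subject = f"<{BASE}book/{row_id}>"   # row primary key -> resource URI
        for column, value in (("title", title), ("issued", issued)):
            if value is not None:
                yield f"{subject} <{COLUMN_TO_PROPERTY[column]}> {nt_literal(value)} ."


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, issued TEXT)")
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                     [(1, "Pride and Prejudice", "1813"), (2, "Persuasion", None)])
    for line in rows_to_ntriples(conn):
        print(line)
```

The database itself is untouched; the RDF view is generated from it on demand, which is the property that makes such front ends attractive for existing systems.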

Tools for data designers

Related Use Case Cluster: Cluster VocAlign

Application profiles provide a popular way to document how a community of practice defines a domain model and a pattern for reusing particular vocabularies, with particular constraints, to describe particular types of resources. The current version of the OWL Web Ontology Language, which provides properties for representing alignments across vocabularies (ontology mappings), allows experts to describe their domain using community idioms while remaining interoperable with related or more widely used idioms. A variety of tools related to OWL can be found on the W3C's RDF wiki and OWL wiki. Unified Modeling Language (UML) tools help designers represent and manipulate domain models visually. The Ontology Definition Metamodel (ODM) specification should help bridge some of the gaps between UML and OWL.
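To make vocabulary alignment concrete, the sketch below applies the RDFS subproperty rule, one of the simplest alignment mechanisms available to OWL ontologies: if a community property is declared rdfs:subPropertyOf a more widely used property, every statement using the former also holds for the latter. The example.org/vocab# properties are hypothetical; dcterms:creator and dcterms:title are real Dublin Core terms. A production system would use an RDF library and a reasoner rather than hand-rolled tuples.

```python
DCT = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/vocab#"   # hypothetical community vocabulary

# Alignment axioms of the form  local:p rdfs:subPropertyOf dcterms:q ,
# kept here as a plain dict (sub-property -> super-property).
SUB_PROPERTY_OF = {
    LOCAL + "writtenBy": DCT + "creator",
    LOCAL + "mainTitle": DCT + "title",
}


def entail_super_properties(triples):
    """Apply the RDFS subproperty rule until nothing new is derived:
    (s, p, o) and p rdfs:subPropertyOf q  =>  (s, q, o)."""
    result = set(triples)
    frontier = list(result)
    while frontier:
        s, p, o = frontier.pop()
        q = SUB_PROPERTY_OF.get(p)
        if q is not None and (s, q, o) not in result:
            result.add((s, q, o))
            frontier.append((s, q, o))  # q may itself have a super-property
    return result


if __name__ == "__main__":
    data = {("http://example.org/book/1", LOCAL + "writtenBy",
             "http://example.org/person/austen")}
    for triple in sorted(entail_super_properties(data)):
        print(triple)
```

The community keeps its own property for local use, while consumers that only understand dcterms:creator still see the statement.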

SKOS and related tools

Related Use Case Cluster: Cluster VocAlign

Another key technology is the Simple Knowledge Organization System (SKOS), an OWL ontology for expressing a broad range of concept schemes, such as thesauri, classification schemes, and subject heading lists, with support for preferred and alternative labels. Many SKOS-related tools are listed on the W3C's SKOS community wiki.
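As an illustration of preferred and alternative labels, the following sketch models SKOS concepts as simple records, builds a case-insensitive lookup from any label to its concept, and serializes a concept as Turtle. The concept URI and labels are hypothetical; skos:Concept, skos:prefLabel, and skos:altLabel come from the SKOS namespace.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    uri: str
    pref_label: str                                      # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)  # skos:altLabel


def label_index(concepts):
    """Map every preferred and alternative label (case-folded) to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index.setdefault(label.casefold(), concept)
    return index


def turtle_literal(text: str, lang: str) -> str:
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concept: Concept, lang: str = "en") -> str:
    """Serialize one concept with its labels as a Turtle snippet."""
    lines = [f"<{concept.uri}> a skos:Concept ;",
             f"    skos:prefLabel {turtle_literal(concept.pref_label, lang)}"]
    for alt in concept.alt_labels:
        lines[-1] += " ;"
        lines.append(f"    skos:altLabel {turtle_literal(alt, lang)}")
    lines[-1] += " ."
    return f"@prefix skos: <{SKOS}> .\n\n" + "\n".join(lines)


if __name__ == "__main__":
    cats = Concept("http://example.org/concept/cats", "Cats",
                   ["Felis catus", "Domestic cat"])
    print(label_index([cats])["domestic cat"].pref_label)
    print(to_turtle(cats))
```

Searching by an alternative label lands on the same concept, and the concept can always be displayed under its single preferred label.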

Microformats, Microdata, and RDFa

Related Use Case Cluster: Cluster Social Uses

Microformats, Microdata, and RDFa all provide ways to embed structured data in Web pages. Because publishing information on the Web has historically meant publishing Web pages, these technologies make it possible to enhance what is already there rather than deploy additional infrastructure. RDFa expresses RDF data directly within Web pages; of the three, it is therefore the most directly interoperable with other Linked Data infrastructure.
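To show what this interoperability means in practice, here is a deliberately simplified reader that turns RDFa Lite attributes (vocab, typeof, resource, property) into subject-predicate-object triples using Python's html.parser. It is a sketch, not a conforming RDFa processor: it ignores prefixes, multiple values per attribute, datatypes, and most RDFa processing rules, it does not distinguish IRIs from literals, and it assumes well-formed markup. The sample markup and example.org URI are hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class RDFaLiteSketch(HTMLParser):
    """Simplified RDFa Lite reader producing (subject, predicate, object) strings."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.stack = []   # one dict per open element: vocab, subject, pending property, text
        self.bnodes = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self.stack[-1] if self.stack else {"vocab": "", "subject": None}
        vocab = a.get("vocab") or parent["vocab"]
        subject = parent["subject"]
        prop = vocab + a["property"] if a.get("property") else None
        pending = None
        if a.get("typeof"):
            # typeof starts a new resource; a property on the same element links the parent to it.
            self.bnodes += 1
            new_subject = a.get("resource") or f"_:b{self.bnodes}"
            self.triples.append((new_subject, RDF_TYPE, vocab + a["typeof"]))
            if prop and subject:
                self.triples.append((subject, prop, new_subject))
            subject = new_subject
        elif prop and subject:
            value = a.get("content") or a.get("href") or a.get("resource")
            if value:
                self.triples.append((subject, prop, value))
            else:
                pending = prop  # object is the element's text, emitted at the end tag
        if tag not in VOID_ELEMENTS:
            self.stack.append({"vocab": vocab, "subject": subject,
                               "prop": pending, "text": []})

    def handle_data(self, data):
        for entry in self.stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        entry = self.stack.pop()
        if entry["prop"]:
            text = " ".join("".join(entry["text"]).split())
            self.triples.append((entry["subject"], entry["prop"], text))


if __name__ == "__main__":
    page = ('<div vocab="http://schema.org/" typeof="Book" resource="http://example.org/book/1">'
            '<span property="name">Pride and Prejudice</span>'
            '<span property="author" typeof="Person"><span property="name">Jane Austen</span></span>'
            '</div>')
    reader = RDFaLiteSketch()
    reader.feed(page)
    reader.close()
    for triple in reader.triples:
        print(triple)
```

The triples extracted from the page have the same shape as those from any other Linked Data source, so they can be merged with them directly.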

Microdata, which is defined in the HTML5 specification currently under development, provides another way of doing this. Microdata has gained notable prominence for Search Engine Optimization purposes since Google, Microsoft, and Yahoo announced Schema.org. This particular use of microdata does not appear to be intended to represent arbitrarily complex data, and the vocabulary published by Schema.org places special emphasis on commerce and tourism. Although microdata vocabularies are extensible in principle, they would need to be heavily extended to express library information, since most of the required vocabulary is lacking. Some interoperability with Linked Data exists thanks to the efforts of Schema.RDFS.org, but with this approach it currently seems difficult to cultivate the high level of interconnectedness between library and other datasets that Linked Data makes possible.

It should be noted that the Schema.org sponsors do support harvesting of RDFa data and have pledged to continue doing so, so publishers of HTML pages marked up with RDFa do not appear to "miss out" on the opportunities afforded by microdata. Barring bugs in the search engines' parsers, it should even be possible to use both in the same Web page. Ultimately, some structured data is better than none.
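A minimal sketch of annotating the same element with both RDFa Lite and microdata attributes, pointing both at the same Schema.org terms. Whether a particular search engine's parser handles the combination correctly is exactly the caveat about parser bugs, so generated pages should be validated before deployment. The `book_markup` helper and its inputs are hypothetical.

```python
from html import escape

SCHEMA = "http://schema.org/"


def book_markup(uri: str, name: str, date_published: str) -> str:
    """Render one book carrying both RDFa Lite and microdata attributes.

    vocab/typeof/resource/property are RDFa Lite; itemscope/itemtype/itemid/
    itemprop are microdata. Both refer to the same Schema.org terms.
    """
    u = escape(uri, quote=True)
    return (
        f'<div vocab="{SCHEMA}" typeof="Book" resource="{u}"\n'
        f'     itemscope itemtype="{SCHEMA}Book" itemid="{u}">\n'
        f'  <h1 property="name" itemprop="name">{escape(name)}</h1>\n'
        f'  <p>Published <span property="datePublished" itemprop="datePublished">'
        f'{escape(date_published)}</span></p>\n'
        f'</div>'
    )


if __name__ == "__main__":
    print(book_markup("http://example.org/book/1", "Pride and Prejudice", "1813"))
```

Keeping both attribute sets on the same elements avoids duplicating visible content, at the cost of slightly noisier markup.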

Web Application Frameworks

Related Use Case Cluster: Cluster Archives

As the Web has grown in popularity, the software development community has created a variety of software libraries that make it easier to create, maintain, and reuse Web applications. These libraries are often referred to as Web application frameworks, and typically implement the Model-View-Controller (MVC) pattern in some fashion. In addition, Web application frameworks have typically encoded and encouraged best practices with respect to the Representational State Transfer (REST) Architectural Style and Resource Oriented Architecture, which have informed much of the standardization around Web technologies.

A common component of Web application frameworks is a URI routing mechanism that allows software developers to define HTTP URI patterns and map them to controllers which, in turn, generate an HTTP response using the appropriate views and models. This practice encourages best practices with respect to Cool URIs and also forces developers to think about the resources that they are making available on the Web. Linked Data's focus on naming resources with HTTP URIs, and on delivering representations of those resources (HTML for humans and RDF for machines), makes it a natural fit for Web application frameworks, which already provide some of the scaffolding for these activities. The availability of Web application frameworks in many different programming languages and operating system environments has led to their widespread use in the cultural heritage sector.
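A framework-neutral sketch of the routing and content negotiation described above: a resource URI pattern is matched, the Accept header is negotiated, and the client is redirected with 303 See Other to either an HTML page or an RDF document. The /resource/, /page/, and /data/ path layout is an assumption modeled on the pattern used by DBpedia and Pubby. Real frameworks provide their own routing and negotiation APIs, which should be preferred; the negotiation here is simplified (it honours q-values and `*/*` or `type/*` wildcards but ignores other media-type parameters).

```python
import re

# Hypothetical URI layout: /resource/{id} names the thing itself,
# /page/{id} describes it in HTML, /data/{id} describes it in RDF.
RESOURCE = re.compile(r"^/resource/([A-Za-z0-9_-]+)$")
OFFERED = ("text/html", "text/turtle", "application/rdf+xml")


def negotiate(accept: str, offered=OFFERED):
    """Return the offered media type with the highest q-value, or None."""
    prefs = {}
    for part in accept.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        if fields[0]:
            prefs[fields[0]] = q
    best, best_q = None, 0.0
    for media in offered:
        q = prefs.get(media, prefs.get(media.split("/")[0] + "/*", prefs.get("*/*", 0.0)))
        if q > best_q:            # strict ">" keeps the first offered type on ties
            best, best_q = media, q
    return best


def route(path: str, accept: str):
    """Return (status, headers) for a request to a resource URI."""
    match = RESOURCE.match(path)
    if not match:
        return 404, {}
    media = negotiate(accept or "*/*")
    if media is None:
        return 406, {}
    target = "page" if media == "text/html" else "data"
    return 303, {"Location": f"/{target}/{match.group(1)}", "Vary": "Accept"}


if __name__ == "__main__":
    print(route("/resource/austen", "text/turtle"))
    print(route("/resource/austen", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

The 303 redirect keeps the URI of the thing distinct from the URIs of the documents describing it, which is the Cool URI behavior front ends such as Pubby automate.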

Web developers are sometimes put off by Semantic Web (Linked Data) technologies because they believe they must throw away their current applications and swap their databases for triple stores and their database query languages for SPARQL. This is simply not the case: RDF serializations can be generated on the fly, just as Web application frameworks already generate HTML, XML, and JSON representations. Because RDF uses HTTP URIs to identify and link resources, it is a natural choice for serializing and sharing entity state in a database-neutral way, a goal of long-standing interest to cultural heritage organizations and the digital preservation community.
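A minimal sketch of generating RDF on the fly: one application model, several serializations produced on request, with no triple store involved. The `Book` dataclass stands in for whatever ORM or database record a framework already provides; the example.org URIs and the choice of Dublin Core properties are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict

BASE = "http://example.org/"        # hypothetical URI space
DCT = "http://purl.org/dc/terms/"


@dataclass
class Book:
    """Stand-in for an existing application model (e.g. an ORM record)."""
    id: int
    title: str
    creator_id: int


def to_json(book: Book) -> str:
    return json.dumps(asdict(book))


def to_ntriples(book: Book) -> str:
    subject = f"<{BASE}book/{book.id}>"
    title = book.title.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return "\n".join([
        f'{subject} <{DCT}title> "{title}" .',
        f"{subject} <{DCT}creator> <{BASE}person/{book.creator_id}> .",
    ])


# The same record, rendered per requested media type.
SERIALIZERS = {"application/json": to_json, "application/n-triples": to_ntriples}


def render(book: Book, media_type: str) -> str:
    return SERIALIZERS[media_type](book)


if __name__ == "__main__":
    book = Book(1, "Pride and Prejudice", 7)
    print(render(book, "application/json"))
    print(render(book, "application/n-triples"))
```

Note that the RDF output links the book to a person by URI rather than by a local foreign key, which is what makes the serialization useful outside the originating database.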

Content Management Systems

Related Use Case Cluster: Cluster Social Uses, Cluster Digital Objects, Cluster Archives

Just as Web application frameworks have evolved with the spread of the Web, so has the class of Web applications known as Content Management Systems (CMS). CMSs are often built using a Web application framework but provide out-of-the-box functionality for easily creating, editing, and presenting content such as text, images, and video on the Web, and for managing workflows associated with the content. Since CMSs are typically built using Web frameworks, they tend to follow the same best practices for naming resources with HTTP URIs. The wide availability of Content Management Systems has led to their heavy use in the cultural heritage sector. Some content management systems such as Drupal are starting to expose structured database information to machine clients by seamlessly layering it into their HTML using RDFa. Data consumers such as Google Scholar, Google Maps, and Facebook are starting to leverage this structured metadata in their own service offerings. Conversely, Drupal is starting to provide plug-ins for consuming RDF, such as VARQL and SPARQL Views.

Web Services for library Linked Data

Related Use Case Cluster: Cluster BibData, Cluster Authority data

In theory, most domain-specific Web Service API capabilities could be refactored as Linked Data URIs, OWL, SPARQL, and SPARQL/Update. But even though it should be possible to layer a Linked Data URI front end on an existing back-end datastore, it may not be so easy for the back end to support SPARQL and SPARQL/Update access. Security, robustness, and performance considerations could also preclude supporting SPARQL in production. SPARQL endpoints and bulk RDF downloads can greatly facilitate discovery and reuse of published Linked Data; most Web developers, however, face a steep learning curve before they can exploit them, and for many application requirements this burden is too heavy.
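For comparison with a domain-specific API, this is roughly what a client must construct to query a SPARQL endpoint under the SPARQL 1.1 Protocol: the query text travels URL-encoded in the `query` parameter of a GET request, and the Accept header asks for JSON results. The endpoint URL and concept URI are hypothetical, and the request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode


def sparql_get_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (without sending) a SPARQL Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical query: the preferred label of one concept.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE { <http://example.org/concept/cats> skos:prefLabel ?label }"""

if __name__ == "__main__":
    req = sparql_get_request("http://example.org/sparql", QUERY)
    print(req.full_url)
```

Even this small example assumes the developer knows SPARQL syntax, the SKOS vocabulary, and the results format, which illustrates the learning curve mentioned above.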

Web Services for the most common uses should therefore be offered as an alternative. Because most Web Service APIs are domain-specific and require custom-coded agents, they should be well documented. More general approaches to Web Service interfaces include OpenSearch (which can be documented using a Description Document), the Linked Data API, and the ongoing work of the W3C RDF Web Applications Working Group on RDF and RDFa APIs. Some Linked Data datasets could also benefit from syndicated access using the Atom Syndication Format or RSS.
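OpenSearch describes a search interface with a URL template in its Description Document; clients substitute parameters such as {searchTerms} and may leave optional ones (marked with a trailing ?) empty. The sketch below expands such a template. The template URL is hypothetical, and the sketch neither reads the Description Document itself nor resolves namespaced parameter names.

```python
import re
from urllib.parse import quote

# {name} is required, {name?} is optional.
PARAM = re.compile(r"\{([A-Za-z_][\w:.-]*)(\?)?\}")


def expand_opensearch_template(template: str, values: dict) -> str:
    """Substitute URL-encoded values into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""                      # optional and absent: leave empty
        raise KeyError(f"missing required OpenSearch parameter: {name}")
    return PARAM.sub(substitute, template)


if __name__ == "__main__":
    template = "http://example.org/search?q={searchTerms}&start={startIndex?}&format=atom"
    print(expand_opensearch_template(template, {"searchTerms": "jane austen"}))
```

Because the template, not custom code, carries the service-specific details, one generic client can query many OpenSearch services.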

A few Linked Data implementations have added Web Services to enhance discovery and use of resources, often by providing some form of API. For example, AGROVOC and the STW Thesaurus for Economics provide APIs for discovering resources based on relationships in the data. VIAF, the ID.LOC.GOV service of the Library of Congress, and STW offer autosuggest services for resources, delivering JSON responses ready for consumption in AJAX browser applications. (In principle, though, JSON responses could be content-negotiable via the Linked Data URI, as are responses in HTML and RDF.) AGROVOC and STITCH/CATCH include support for RDF responses. Some services provide full-fledged SOAP APIs, while others support a RESTful approach.
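An autosuggest service of this kind reduces to prefix matching over labels with a JSON response. The sketch below is hypothetical: it does not reproduce the request or response format of VIAF, ID.LOC.GOV, or STW, and the sample labels and example.org URIs are illustrative only. In a real service the function would sit behind an HTTP handler returning Content-Type application/json, or behind content negotiation on the Linked Data URI itself.

```python
import json

# Illustrative sample data only: (label, hypothetical URI) pairs.
LABELS = [
    ("Austen, Jane, 1775-1817", "http://example.org/authority/1"),
    ("Austin (Tex.)", "http://example.org/authority/2"),
    ("Bath (England)", "http://example.org/authority/3"),
]


def autosuggest(prefix: str, limit: int = 10) -> str:
    """Return a JSON array of {label, uri} objects whose label starts with prefix."""
    needle = prefix.strip().casefold()
    if not needle:
        return "[]"
    hits = [{"label": label, "uri": uri}
            for label, uri in LABELS if label.casefold().startswith(needle)]
    hits.sort(key=lambda hit: hit["label"].casefold())
    return json.dumps(hits[:limit])


if __name__ == "__main__":
    print(autosuggest("aus"))
```

Returning the URI with each label lets a browser form store the Linked Data identifier rather than a free-text string.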

By focusing on request parameters and response formats to provide enhanced discovery, Linked Data Web Services diminish, if not eliminate, the requirement that data be stored in a triple store or be made searchable via SPARQL. And because Web Service APIs are familiar to most developers, Web Services can lower the barrier to entry for adopting a Linked Data approach.