Difference between revisions of "Draft Relevant Technologies"

From Library Linked Data

Revision as of 14:57, 12 August 2011

Relevant Technologies

Linked Data is an emerging technology, so most tools are still under development. The principles of Linked Data are not tied to any particular tool; rather, they are tied to Web standards themselves. In many situations, production and consumption of Linked Data can be layered onto or interwoven with existing applications without the need for massive redevelopment efforts. The following list of tools and technologies is not exhaustive, but is intended to illustrate a few broad categories. From a non-technical perspective, these technologies are relevant because they encourage the creation and discovery of reusable vocabularies and provide ways to combine those terms into reusable (syntactic) statements.

Using URIs to identify things that aren't located on the Web

In the early days of the Web, it wasn't clear whether http URIs should be used to identify things that aren't "located" on the Web. That uncertainty was the basis for inventing some new URI schemes like URNs and "info" URIs. These ambiguities and uncertainties were eventually resolved by RFC 3305 and httpRange-14. Now that the concern is resolved, it is reasonable to use owl:sameAs to map identifiers in these non-resolvable URI schemes to http URIs. Even if this mapping isn't done, the non-resolvable URIs are still useful in RDF and SPARQL.
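As a minimal sketch of the owl:sameAs mapping described above, the following Python snippet emits a single N-Triples statement linking a non-resolvable URI to an http URI. The function name and both identifiers are hypothetical and chosen only for illustration; a real mapping would use identifiers that the publisher actually controls.

```python
# The owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(non_resolvable_uri: str, http_uri: str) -> str:
    """Return one N-Triples line stating that both URIs identify the same thing."""
    return f"<{non_resolvable_uri}> <{OWL_SAME_AS}> <{http_uri}> ."


# Hypothetical identifiers, for illustration only.
print(same_as_triple("urn:isbn:9780141439518",
                     "http://example.org/book/9780141439518"))
```

The resulting statement can be loaded into any RDF store alongside data that uses either identifier.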

The principles of Linked Data were introduced around 2006, leading to the formalization of "Cool URIs for the Semantic Web" in 2008. What makes Linked Data identifiers special is their ability to help humans and machines understand and process information across a wide variety of use cases. The DBpedia resource http://dbpedia.org/resource/Jane_Austen is a good example.
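One way such identifiers can serve both humans and machines is the 303 redirect pattern described in "Cool URIs for the Semantic Web": the URI of the thing itself redirects to a document about it, chosen according to the client's Accept header. The sketch below only simulates that decision in Python. The /page/ and /data/ path convention is an assumption modeled on the DBpedia example, and the Accept handling is deliberately simplified (substring match, no q-values).

```python
# Media types that this sketch treats as a request for RDF (an assumption).
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def negotiate(resource_uri: str, accept: str) -> tuple[int, str]:
    """Return the status code and target document for a request on a thing URI."""
    if "/resource/" not in resource_uri:
        raise ValueError("expected a URI containing /resource/")
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    # Assumed convention: /data/ holds the RDF document, /page/ the HTML one.
    target = "/data/" if wants_rdf else "/page/"
    return 303, resource_uri.replace("/resource/", target, 1)


print(negotiate("http://dbpedia.org/resource/Jane_Austen", "text/html"))
print(negotiate("http://dbpedia.org/resource/Jane_Austen", "application/rdf+xml"))
```

Because the thing and the documents about it have distinct URIs, statements about Jane Austen are not confused with statements about a web page describing her.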

Front ends for mapping existing data stores to Linked Data/RDF

Related Use Case Cluster : Cluster VocAlign

Unlike information represented hierarchically in typical XML documents, resources published as Linked Data allow information to be freed from use-case-specific hierarchies and thus made available for unexpected reuse. This not only makes the information easier to mash up, it also makes tools and services easier to mash up. This is true for both producers and consumers of Linked Data. For example, an existing relational database can be exposed as Linked Data and through SPARQL by using D2R Server. The W3C RDB2RDF Working Group is currently working on standards for such mappings. Similarly, Linked Data can be produced from existing SRU databases with a few rewrite rules. If the information is already available from a SPARQL endpoint, then a Linked Data front-end like Pubby can be used to automate the URIs. Last, but not least, XSLT can be useful for converting generic XML into RDF/XML.
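The general idea of mapping relational rows to RDF can be sketched in a few lines of Python using only the standard library. This is not D2R Server's mapping language or the RDB2RDF standard; the table, the example.org namespace, and the choice of the Dublin Core title property are assumptions made for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for minted URIs
DCT_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(value: str) -> str:
    """Escape the characters that would break a quoted N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Turn each row of the (hypothetical) book table into one N-Triples line."""
    lines = []
    for book_id, title in conn.execute("SELECT id, title FROM book ORDER BY id"):
        subject = f"{BASE}book/{book_id}"  # primary key becomes part of the URI
        lines.append(f'<{subject}> <{DCT_TITLE}> "{escape_literal(title)}" .')
    return lines


# A throwaway in-memory database standing in for an existing data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Emma')")
for line in rows_to_ntriples(conn):
    print(line)
```

The existing database stays untouched; the RDF view is derived from it on demand.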

Tools for data designers

Related Use Case Cluster : Cluster VocAlign

Another boost for Linked Data is the growing use of OWL for purposes of data design. Prior to OWL, domain experts could use RDFS to create metadata element sets, but there was no way to map equivalencies across vocabularies. Among other features, OWL includes an upgrade to RDFS to support ontology mapping. This allows experts to describe their domain using community idioms, while still being interoperable with related or more common idioms. A variety of tools related to OWL can be found on the W3C's RDF wiki and OWL wiki. Unified Modeling Language (UML) tools are also valuable for helping designers represent and manipulate domain models visually. The Ontology Definition Metamodel (ODM) specification should help bridge some of the gaps between UML and OWL.
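To illustrate what a vocabulary mapping buys a data consumer, the following Python sketch rewrites triples that use a community-specific property into triples that use a more common one. It is not an OWL reasoner: it treats a table of owl:equivalentProperty-style pairs as a simple one-way lookup, and the libvocab namespace is hypothetical.

```python
# Hypothetical mapping of a community property to a widely used equivalent,
# in the spirit of an owl:equivalentProperty statement.
EQUIVALENT_PROPERTY = {
    "http://example.org/libvocab#title": "http://purl.org/dc/terms/title",
}


def align(triples):
    """Replace each mapped predicate with its equivalent; leave others unchanged."""
    return [(s, EQUIVALENT_PROPERTY.get(p, p), o) for s, p, o in triples]


local_data = [
    ("http://example.org/book/1", "http://example.org/libvocab#title", "Emma"),
]
print(align(local_data))
```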

SKOS and related tools

Related Use Case Cluster : Cluster VocAlign

Yet another key technology boost is being provided by SKOS, which is an OWL ontology for dealing with a broad base of conceptual schemes including the management of preferred and alternate labels. Many SKOS-related tools are listed on the W3C's SKOS community wiki.
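The role of preferred and alternate labels can be sketched as a lookup that resolves either kind of label to the same concept URI. The dictionary below is a hypothetical, in-memory stand-in for a SKOS concept scheme; the keys mirror skos:prefLabel and skos:altLabel but this is not a SKOS parser.

```python
# Hypothetical concepts keyed by URI, with SKOS-style labels.
CONCEPTS = {
    "http://example.org/concept/cats": {
        "prefLabel": "Cats",
        "altLabel": ["Felines", "Domestic cats"],
    },
}


def find_concept(label, concepts):
    """Return the URI of the concept whose preferred or alternate label matches."""
    wanted = label.casefold()
    for uri, labels in concepts.items():
        candidates = [labels["prefLabel"], *labels.get("altLabel", [])]
        if any(wanted == candidate.casefold() for candidate in candidates):
            return uri
    return None


print(find_concept("felines", CONCEPTS))
```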

Microformats, Microdata and RDFa

Related Use Case Cluster : Cluster Social Uses

Microformats, Microdata and RDFa all provide ways to embed structured data into web pages. As historically the emphasis on publishing information on the web has had to do with publishing web pages, these technologies provide ways to enhance what is already there rather than necessarily deploying separate infrastructure. RDFa supports expression of RDF data in this way and is therefore the most directly interoperable with other linked data infrastructure.

Microdata, which is defined with the new HTML5, provides another way of doing this. It has notably gained prominence for Search Engine Optimisation purposes with the announcement of http://schema.org/ by Google, Microsoft and Yahoo. This particular type of microdata does not appear to be intended to represent arbitrarily complex data, and the vocabulary that they have published places special emphasis on commerce and tourism. Though it is in principle extensible, it would require a lot of extension to express library information in this way, as most of the required vocabulary is lacking. There is some level of interoperability with linked data thanks to the efforts at http://schema.rdfs.org/, but at this time it seems that it would be difficult, using this approach, to cultivate the high level of interconnectedness between library and other datasets that is possible with linked data.

It should be noted that the http://schema.org/ protagonists do support harvesting of RDFa data and have pledged to continue doing so. It therefore does not appear that publishing HTML pages marked up with RDFa would somehow "miss out" on the opportunities afforded by microdata. Barring bugs in the search engines' parsers, it is even possible to do both in the same web page. If for some reason it is not possible to make use of the full expressive power of RDF with RDFa, some structured data is better than none.
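To make the idea of embedding RDF in a web page concrete, here is a small Python sketch that generates an HTML fragment carrying RDFa attributes (about, typeof, property). The function name and the example URI are hypothetical, and the FOAF vocabulary is used only as a familiar illustration; the sketch produces markup but does not validate it.

```python
from html import escape

FOAF_NS = "http://xmlns.com/foaf/0.1/"


def person_rdfa(uri: str, name: str) -> str:
    """Return an HTML fragment that describes a person using RDFa attributes."""
    return (
        f'<div xmlns:foaf="{FOAF_NS}" about="{escape(uri)}" typeof="foaf:Person">'
        f'<span property="foaf:name">{escape(name)}</span>'
        "</div>"
    )


# Hypothetical URI, for illustration only.
print(person_rdfa("http://example.org/person/1", "Jane Austen"))
```

A browser renders only the name, while an RDFa-aware parser can also extract the statements that the resource is a foaf:Person with that foaf:name.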

Web Application Frameworks

Related Use Case Cluster : Cluster Archives

As the Web has grown in popularity, the software development community has created a variety of software libraries that make it easier to create, maintain and reuse web applications. These libraries are often referred to as web application frameworks, and typically implement the Model-View-Controller (MVC) pattern in some fashion. In addition, web application frameworks have typically encoded and encouraged best practices with respect to the REST Architectural Style and Resource Oriented Architecture, which have informed much of the standardization around web technologies.

A common component to web application frameworks is a URI routing mechanism, which allows software developers to define http URI patterns, and map them to controllers, which in turn generate an HTTP response using the appropriate views and models. This activity encourages best practices with respect to Cool URIs, and also forces the developer to think about the resources that she is making available on the Web. Linked Data's focus on naming resources with http URIs, and delivering representations of them (HTML for humans, and RDF for machines) makes it a natural fit for web application frameworks which already provide some of the scaffolding for these activities. The wide availability of web application frameworks in many different programming languages and operating system environments has led to them being heavily used in the cultural heritage sector.
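A URI routing mechanism of the kind described above can be sketched with a list of URI patterns mapped to controller names. The patterns and controller names here are hypothetical, and real frameworks offer richer routing syntax; the point is only that each pattern names a class of resources.

```python
import re

# Hypothetical routes: each http URI path pattern maps to a controller name.
ROUTES = [
    (re.compile(r"^/book/(?P<book_id>\d+)$"), "show_book"),
    (re.compile(r"^/author/(?P<name>[\w-]+)$"), "show_author"),
]


def route(path: str):
    """Return (controller, parameters) for the first matching pattern, else None."""
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            return controller, match.groupdict()
    return None


print(route("/book/42"))
print(route("/author/jane-austen"))
```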

However, web developers are sometimes turned off Semantic Web (Linked Data) technologies because they feel that they would need to throw away their current application, swap their database for a triplestore, and replace their database query language with SPARQL. This is simply not the case, since RDF serializations can be generated on the fly just as web application frameworks do for HTML, XML and JSON representations. The use of http URIs to identify and link together resources in RDF's data model makes it a natural choice for serializing and sharing entity state in a database-neutral way, which has traditionally been of great interest to cultural heritage organizations and the digital preservation community.
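The point that RDF can be generated on the fly, like any other representation, can be sketched as a single view function that serializes the same record as HTML, JSON, or N-Triples depending on the Accept header. The record shape, the function name, and the simplified Accept handling (substring match, no q-values) are assumptions for illustration; no triplestore is involved.

```python
import json
from html import escape

DCT_TITLE = "http://purl.org/dc/terms/title"


def render(record: dict, accept: str) -> tuple[str, str]:
    """Return (content type, body) for one record, chosen by the Accept header."""
    uri, title = record["uri"], record["title"]
    if "application/n-triples" in accept:
        literal = title.replace("\\", "\\\\").replace('"', '\\"')
        return "application/n-triples", f'<{uri}> <{DCT_TITLE}> "{literal}" .\n'
    if "application/json" in accept:
        return "application/json", json.dumps(record, sort_keys=True)
    # Default to HTML for browsers and unknown clients.
    return "text/html", f"<h1>{escape(title)}</h1>"


# Hypothetical record, as it might come from an existing database model.
book = {"uri": "http://example.org/book/1", "title": "Emma"}
for accept in ("text/html", "application/json", "application/n-triples"):
    print(render(book, accept))
```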

Content Management Systems

Related Use Case Cluster : Cluster Social Uses, Cluster Digital Objects, Cluster Archives

Just as web application frameworks have evolved as the Web has spread, so has the class of web applications known as content management systems (CMS). CMS are often built using a web application framework, but provide out-of-the-box functionality for easily creating/editing/presenting content (text, images, video) on the Web, and for managing workflows associated with the content. Since CMS are typically built using web frameworks, the same best practices for naming resources with http URIs are naturally followed. The wide availability of content management systems has led to heavy use in the cultural heritage sector. Some content management systems such as Drupal are starting to expose structured database information to machine clients by seamlessly layering it into their HTML using RDFa. As a result, data consumers such as Google Scholar, Google Maps, Facebook, etc. are starting to leverage this structured metadata in their own service offerings. Conversely, Drupal is also starting to make plugins available to consume RDF, such as VARQL and SPARQL Views.

Web Services for Library Linked Data

Related Use Case Cluster : Cluster BibData, Cluster Authority data

Theoretically, most domain-specific Web Service API capabilities could be refactored as Linked Data URIs, OWL, SPARQL, and SPARQL/Update. But even though it should be possible to layer a Linked Data URI front-end on an existing back-end datastore, it may not be so easy for the back-end to support SPARQL and SPARQL/Update access. Security, robustness and performance considerations could also preclude supporting SPARQL in production situations. Furthermore, although SPARQL endpoints and bulk RDF downloads can greatly facilitate discovery and reuse of the published Linked Data, most web developers face a steep learning curve before being able to exploit them, and for many application requirements this is too much of a burden.

Web Services for the most common uses should be offered as an alternative. Most Web Service APIs tend to be domain-specific, though, and require custom-coded agents. This means they should be well-documented. More general approaches to web service interfaces include OpenSearch (which can be documented using a Description Document), the Linked Data API and the ongoing work of the W3C RDF Web Applications Working Group on RDF and RDFa APIs. Some Linked Datasets could also benefit from syndicated access using Atom Syndication Format and/or RSS.
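As a rough illustration of how a generic client can use an OpenSearch Description Document, the sketch below fills in a URL template of the kind such a document advertises. The template URL is hypothetical. The parameter names searchTerms and startPage and the trailing "?" for optional parameters follow the OpenSearch template syntax, but prefixed parameter names (such as those from extensions) are not handled here.

```python
import re
from urllib.parse import quote


def fill_template(template: str, params: dict) -> str:
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# Hypothetical template, as it might appear in a Description Document.
template = "http://example.org/search?q={searchTerms}&page={startPage?}"
print(fill_template(template, {"searchTerms": "jane austen"}))
```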

A few Linked Data implementations have endeavored to implement Web Services to enhance discovery and use of resources, often by providing some form of an application programming interface (API). Agrovoc and STW provide an API to discover resources based on relationships in the data, among many more web services. VIAF, Library of Congress, and STW offer autosuggest services for resources, delivering JSON responses ready for consumption in AJAX browser applications. (In principle, though, JSON could be content-negotiable via the Linked Data URI, just like HTML and RDF.) Agrovoc and STITCH/CATCH include support for RDF responses. Some services provide full-fledged SOAP APIs, while others support a RESTful approach.
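An autosuggest service of this general kind can be sketched as a function that matches a typed prefix against known labels and returns JSON. The response shape (a list of label and URI pairs), the function name, and the sample data are hypothetical and do not reproduce the format of any of the services named above.

```python
import json

# Hypothetical label-to-URI index standing in for an authority file.
LABELS = {
    "Austen, Jane": "http://example.org/person/1",
    "Bronte, Charlotte": "http://example.org/person/2",
}


def suggest(prefix: str, labels: dict, limit: int = 10) -> str:
    """Return a JSON list of {label, uri} entries whose label starts with prefix."""
    wanted = prefix.casefold()
    hits = [
        {"label": label, "uri": uri}
        for label, uri in sorted(labels.items())
        if label.casefold().startswith(wanted)
    ]
    return json.dumps(hits[:limit])


print(suggest("au", LABELS))
```

Each suggestion carries the resource's URI, so a browser application can follow it to the full Linked Data description.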

By focusing on request parameters and response formats to provide enhanced discovery, Linked Data Web Services diminish, if not eliminate, the requirement that data be stored in a triplestore or be made searchable via SPARQL. And, because web service APIs are common, web services can lower the barrier to entry.