LLOD-aware services
Introduction
This document recommends best practices for building web services that are aware of Linguistic Linked Open Data (LLOD). LLOD denotes the representation of linguistic resources in accordance with linked data principles. Under these principles, each entity is identified by an HTTP URI that, when dereferenced, provides RDF-based information about the entity, including links to related entities.
LLOD-aware services are services that consume resources available as Linked Data as input, process them, and output an RDF resource that can in turn be published as Linked Data.
Use Case
Assume that we want to develop a service that consumes two terminologies that have been published as linguistic linked data using the guidelines published at []. The service is invoked by pointing it to two resources and is expected to return a set of links between the two terminological resources, indicating which concepts of the two terminologies should be regarded as equivalent. In our example, we will assume that the two terminological resources to be linked are
- The Interactive Terminology of Europe (IATE), available at DataHub as RDF
- The European Migration Network (EMN) terminology, also available at DataHub as RDF
Recommendations
We recommend building LLOD-aware web services using RESTful interfaces. Each resource provided by a REST-service is given its own URI. HTTP methods operate on resources in the following way:
- POST asks the service to create/index a new resource available as Linked Data, for instance by pointing the service to its DataHub entry
- PUT asks the service to update the index with the new version of a resource
- GET asks the service to deliver the results of the algorithm it implements
We assume that each LLOD-aware service has a persistence layer in which snapshots of LD resources can be stored. This is because downloading a resource just in time, when the algorithm is actually invoked by a GET request, can take too long. Therefore, it is recommended to download and update relevant LD datasets asynchronously. There are multiple possible ways of providing RDF data to a service. For now, we assume that there is RDF-based metadata containing the location of an RDF dump in a dcat:accessURL field. This metadata should be the input to the service.
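The asynchronous download recommended above could be organised roughly as follows. This is only a minimal sketch, not the implementation of our service: the function names, the in-memory snapshots dictionary standing in for the persistence layer, and the example accessURL are all assumptions made for illustration, and the actual download is replaced by a stub.

```python
import queue
import threading


def start_snapshot_worker(fetch, snapshots):
    """Start a daemon thread that downloads queued datasets into `snapshots`."""
    jobs = queue.Queue()

    def worker():
        while True:
            access_url = jobs.get()
            try:
                # fetch() is a placeholder for the real download of the RDF dump.
                snapshots[access_url] = fetch(access_url)
            except Exception as exc:
                # Record the failure instead of killing the worker thread.
                snapshots[access_url] = exc
            finally:
                jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()
    return jobs


def fake_fetch(access_url):
    # Stand-in for an HTTP download; returns a dummy payload.
    return "dump of " + access_url


snapshots = {}  # stands in for the persistence layer (assumption)
jobs = start_snapshot_worker(fake_fetch, snapshots)
jobs.put("http://example.org/dumps/terminology.nt")  # hypothetical accessURL
jobs.join()  # only needed in this demo, to wait for the background download
```

With such a queue, a POST or PUT handler only has to enqueue the accessURL and can answer immediately, while a later GET reads from the stored snapshot.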
Results of the service should be returned in RDF/XML, Turtle or JSON-LD. Ideally, all three return types would be supported through content negotiation.
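Content negotiation over these three formats might look like the following sketch. The function name, the choice of Turtle as the default, and the simplified handling of the Accept header (exact media types and "*/*" only, no "type/*" ranges) are assumptions for illustration.

```python
SUPPORTED = ["application/rdf+xml", "text/turtle", "application/ld+json"]


def choose_media_type(accept_header, supported=SUPPORTED, default="text/turtle"):
    """Pick the supported media type with the highest q-value in an Accept header.

    Returns None if nothing acceptable is supported (the caller could then
    answer with HTTP 406).
    """
    if not accept_header:
        return default
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "*/*":
            candidate = default
        elif media_type in supported:
            candidate = media_type
        else:
            continue
        # Strict comparison: on ties, the media type listed first wins.
        if q > best_q:
            best, best_q = candidate, q
    return best
```

For example, a request with the header "application/ld+json;q=0.8, text/turtle" would be answered in Turtle, because Turtle carries the implicit q-value 1.0.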
An example service for linking terminological resources
The example service, which we have implemented as a proof of concept to illustrate the best practice described here, induces matches between two terminological resources in RDF and creates links between the associated concepts.
Terminologies are expected to use the lemon vocabulary for representing lexical resources in RDF. Entries are compared based on their lemon:writtenRep property. Two concepts, each associated with an entry via lemon:reference, are regarded as equivalent if the written representations of their entries match. A link between the concepts is then created using skos:exactMatch.
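The matching step described above can be illustrated with a short sketch. It assumes that the entries of each terminology have already been extracted into (written representation, concept URI) pairs, and it compares written representations by exact string equality; whether the service normalises case or takes language tags into account is not specified here. The function name and the example.org URIs used in the test data are hypothetical.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def link_concepts(source_entries, target_entries):
    """Return (source concept, skos:exactMatch, target concept) triples.

    Each entry is a (written_rep, concept_uri) pair, i.e. the entry's
    lemon:writtenRep value and the concept it points to via lemon:reference.
    """
    # Index the target terminology by written representation.
    by_written_rep = defaultdict(set)
    for written_rep, concept in target_entries:
        by_written_rep[written_rep].add(concept)

    # Look up every source entry in the index and emit one link per match.
    links = set()
    for written_rep, concept in source_entries:
        for target_concept in by_written_rep.get(written_rep, ()):
            links.add((concept, SKOS_EXACT_MATCH, target_concept))
    return sorted(links)
```

Indexing one terminology by written representation avoids comparing every pair of entries, which matters for large resources.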
Three basic operations are supported:
- Adding a new resource to the database
- Updating a resource in the database
- Retrieving links between resources in the database
In line with the recommendation above, the linking service exposes a RESTful HTTP interface, which is described in the following.
Note: Support for adding and updating datasets is deactivated in our implementation. Datasets currently available for linking are IATE and EMN.
Adding a new resource to the database
HTTP request
A new resource is added by sending an HTTP POST request to
http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/resource/{resourceURL}
- {resourceURL} needs to point to RDF metadata giving the resource location as dcat:accessURL.
HTTP response
- The service returns status code 202 for a valid request.
- If the database already contains a resource for the given URL, the service returns status code 400.
- If no valid resource could be found at the given URL, the service returns status code 422.
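The three response cases listed above could be decided as in the following sketch. The function name, the dictionary used as index, and the load_metadata callable are hypothetical; the order in which the two error cases are checked is also an assumption.

```python
def add_resource(index, resource_url, load_metadata):
    """Return the HTTP status code for a POST that registers `resource_url`.

    `index` maps already registered resource URLs to their dcat:accessURL.
    `load_metadata` is a hypothetical callable that returns the
    dcat:accessURL found in the metadata, or None if there is none.
    """
    if resource_url in index:
        return 400  # a resource for this URL is already in the database
    access_url = load_metadata(resource_url)
    if access_url is None:
        return 422  # no valid resource could be found at the given URL
    # Register the resource; the actual download would happen asynchronously.
    index[resource_url] = access_url
    return 202
```

Status code 202 (Accepted) fits the asynchronous design: the request has been accepted, but the dataset has not necessarily been downloaded and indexed yet.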
Example Call
The call to ask the service to index the IATE dataset would be as follows:
POST http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/resource/http://datahub.io/dataset/iate-rdf
Correspondingly, the call to ask the service to index the EMN dataset would be as follows:
POST http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/resource/http://datahub.io/dataset/emn
Note: Support for this operation is deactivated.
Updating a resource in the database
HTTP request
An existing resource is updated by sending an HTTP PUT request to
http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/resource/{resourceURL}
- {resourceURL} needs to point to RDF metadata giving the resource location as dcat:accessURL.
HTTP response
- The service returns status code 202 for a valid request.
- If no valid resource could be found at the given URL, the service returns status code 422.
Example Call
The call to ask the service to update the current version of the IATE dataset in the index is as follows:
PUT http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/resource/http://datahub.io/dataset/iate-rdf
Note: Support for this operation is deactivated.
Retrieving links between resources in the database
HTTP request
A linking for all concepts in a dataset is retrieved by sending an HTTP GET request to
http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/linking/dataset/{sourceURL}?target={targetURL}
A linking for a single concept is retrieved by sending an HTTP GET request to
http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/linking/concept/{sourceURL}?target={targetURL}
- The optional parameter target can be specified in order to retrieve links to a single target resource. Otherwise, the entire database will be searched for matches.
- The RDF serialization format can be specified by setting the request's Accept header.
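A client could assemble such a request as in the following sketch, which only builds the request and does not send it. The function name and its parameters are assumptions; the source and target URLs are inserted as-is, following the URL patterns above, since this document does not state whether the service expects them to be percent-encoded.

```python
import urllib.request

BASE = "http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService"


def build_linking_request(source_url, target_url=None, scope="dataset",
                          accept="application/n-triples"):
    """Build (without sending) a GET request for the linking endpoint."""
    if scope not in ("dataset", "concept"):
        raise ValueError("scope must be 'dataset' or 'concept'")
    url = f"{BASE}/linking/{scope}/{source_url}"
    if target_url is not None:
        # The target parameter is optional and restricts the search to one resource.
        url += f"?target={target_url}"
    # The Accept header carries the requested RDF serialization format.
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")
```

The returned object can be passed to urllib.request.urlopen to actually perform the call.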
HTTP response
- The set of generated links is returned as an N-Triples file, or in another RDF serialization format if the request specifies one.
- If the given resource is not found in the index, the service returns status code 404.
Example Call
The call to retrieve all the links between IATE and EMN would be as follows:
GET http://sc-lider.techfak.uni-bielefeld.de/LinkingWebService/linking/dataset/http://datahub.io/dataset/emn?target=http://datahub.io/dataset/iate-rdf
It would return the following result:
<http://ec.europa.eu/dgs/home-affairs/what-we-do/networks/european_migration_network/glossary/index_a_en.htm#absconding> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://tbx2rdf.lider-project.eu/data/iate/IATE-3544259> .
<http://ec.europa.eu/dgs/home-affairs/what-we-do/networks/european_migration_network/glossary/index_a_en.htm#accommodationcentre> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://tbx2rdf.lider-project.eu/data/iate/IATE-878245> .
<http://ec.europa.eu/dgs/home-affairs/what-we-do/networks/european_migration_network/glossary/index_a_en.htm#acquisitionofcitizenship> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://tbx2rdf.lider-project.eu/data/iate/IATE-3549121> .
<http://ec.europa.eu/dgs/home-affairs/what-we-do/networks/european_migration_network/glossary/index_a_en.htm#actofpersecution> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://tbx2rdf.lider-project.eu/data/iate/IATE-3549123> .
<http://ec.europa.eu/dgs/home-affairs/what-we-do/networks/european_migration_network/glossary/index_a_en.htm#actorofprotection> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://tbx2rdf.lider-project.eu/data/iate/IATE-3549124> .
…
Note: N-Triples is the only output format currently supported by our implementation.
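A client that receives such a result could read the links with a few lines of code. The sketch below assumes standard N-Triples syntax with IRIs enclosed in angle brackets and handles only triples whose three terms are IRIs, which is all the linking result contains; for general RDF data, a full RDF library would be the safer choice. The function name is hypothetical.

```python
import re

# One N-Triples statement consisting of three IRIs, terminated by a dot.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_links(ntriples_text):
    """Parse IRI-only N-Triples lines into (subject, predicate, object) tuples."""
    links = []
    for line in ntriples_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        links.append(match.groups())
    return links
```

Each returned tuple pairs an EMN concept with the IATE concept it has been matched to, with skos:exactMatch as the predicate.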