Feature:ServiceDescriptions

From SPARQL Working Group


Feature: Service Description

Given the variety of SPARQL implementations, and differences in datasets and extension functions, a method of discovering a SPARQL endpoint's capabilities and summary information of its data is needed.

Feature description

Many SPARQL implementations support a variety of SPARQL extensions (many proposed here for standardization), extension functions (for use in FILTERs), and different entailment regimes. Moreover, the differences between the datasets provided by SPARQL endpoints are often hard to grasp without some existing knowledge of the underlying data. This proposal suggests that these differences may be described by the endpoints themselves, detailing both the capabilities of the endpoint and the data.

Such a service description mechanism might be implemented in several ways. Four options are:

  • Introducing new SPARQL syntax for requesting the description
  • Adding this information to a SPARQL protocol query response (e.g. in an HTTP header field)
  • Adding a new protocol operation (e.g. HTTP OPTIONS) for returning the description
  • Having this information available as a named graph (where the graph name is related or equal to the endpoint's URI)

Some servers may support SPARQL requests via protocols other than HTTP, notably ODBC, UDBC, IODBC and JDBC. Service descriptions are useful even in such "closed" environments, improving interoperability and reducing the risk of misconfiguration. While these protocols are out of scope for the spec, a non-normative part may recommend a uniform way of requesting service descriptions, such as a recommended name and signature for a procedure to call to fetch the description.

Additionally, SPARQL endpoint descriptions can be collected or served by third-party services such as search engines or repositories.

Example

RDF::Query (and its related endpoint code) provides service descriptions (based primarily on the DARQ and SADDLE vocabularies) referenced in the HTTP response headers of a query:

% curl -H "Accept: text/plain" -i http://kasei.us/sparql -F "query=ASK {}"
HTTP/1.1 200 OK
mod_perl/2.0.2 Perl/v5.8.8
X-endpoint-description: http://kasei.us/sparql?about=1
Content-Type: text/plain; charset=utf-8

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head></head>
<results><boolean>true</boolean></results>
</sparql>

With http://kasei.us/sparql?about=1 returning the service description RDF:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <http://darq.sf.net/dose/0.1#> .
@prefix saddle: <http://www.w3.org/2005/03/saddle/#> .
@prefix sparql: <http://kasei.us/2008/04/sparql#> .
@prefix void: <http://rdfs.org/ns/void#> .
[] a sd:Service ;
	rdfs:label "SPARQL Endpoint for kasei.us" ;
	sd:url <http://kasei.us/sparql> ;
	sd:totalTriples 12729 ;
	saddle:queryLanguage [ rdfs:label "SPARQL" ; saddle:spec <http://www.w3.org/TR/rdf-sparql-query/> ] ;
	saddle:queryLanguage [ rdfs:label "RDQL" ; saddle:spec <http://www.w3.org/Submission/RDQL/> ] ;
	saddle:resultFormat [
		rdfs:label "SPARQL Query Results XML" ;
		saddle:mediaType "application/sparql-results+xml" ;
		saddle:spec <http://www.w3.org/TR/rdf-sparql-XMLres/>
	] ;
	saddle:resultFormat [
		rdfs:label "RDF/XML" ;
		saddle:mediaType "application/rdf+xml" ;
		saddle:spec <http://www.w3.org/TR/rdf-syntax/>
	] ;
	saddle:resultFormat [
		rdfs:label "SPARQL Query Results JSON" ;
		saddle:mediaType "application/sparql-results+json" ;
		saddle:spec <http://www.w3.org/TR/rdf-sparql-json-res/>
	] ;

	sparql:extensionFunction <java:com.hp.hpl.jena.query.function.library.sha1sum> ;
	sparql:extensionFunction <java:com.ldodds.sparql.Distance> ;

	sparql:sparqlExtension <http://kasei.us/2008/04/sparql-extension/service> ;
	sparql:sparqlExtension <http://kasei.us/2008/04/sparql-extension/unsaid> ;
	sparql:sparqlExtension <http://kasei.us/2008/04/sparql-extension/federate_bindings> .
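To follow the X-endpoint-description header in a transcript like the one above, a client only needs the header block of the response. The following is a minimal Python sketch, assuming raw `curl -i` style output; the function names are hypothetical. It skips the status line and any line that is not a "Name: value" field (such as the mod_perl line), and lower-cases field names because HTTP field names are case-insensitive.

```python
from typing import Dict, Optional


def parse_response_headers(raw: str) -> Dict[str, str]:
    """Parse the header block of a raw HTTP response (as printed by `curl -i`).

    Returns a dict keyed by lower-cased field name, because HTTP field
    names are case-insensitive.
    """
    # The header block ends at the first empty line; tolerate CRLF or LF.
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    headers: Dict[str, str] = {}
    for line in head.split("\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # not a "Name: value" field line
        headers[name.strip().lower()] = value.strip()
    return headers


def endpoint_description_url(raw: str) -> Optional[str]:
    """Return the X-endpoint-description value, or None if it is absent."""
    return parse_response_headers(raw).get("x-endpoint-description")
```

When using urllib.request instead of raw output, the response's headers object already performs case-insensitive lookup, so only the final `.get` step is needed.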

Existing Implementation(s)

As mentioned, RDF::Query (and its associated endpoint code) implements this.

Virtuoso supports DBpedia VoiD data through a special default graph, <http://dbpedia.org/stats/void#>, which can be queried using the SPARQL endpoint. Virtuoso can also provide access to multiple "storages" via the same web service endpoint: a "storage" is a named set of named RDF data sources, each of which is either a "traditional" quad store or an RDF View. One of the storages could be used for self-description of the database without any technical difficulty. Currently, the configuration of RDF storages is kept as an RDF graph and is modified by SPARUL operations over that graph; that graph could be extended to hold any additional data needed for the service description.
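A client can target a graph like this one through the SPARQL Protocol's `default-graph-uri` request parameter. The sketch below builds such a GET URL in Python, assuming the endpoint lives at http://dbpedia.org/sparql and honours `default-graph-uri`; the function name and the query are hypothetical. Percent-encoding matters here: the `#` in the graph IRI must not be read as a URL fragment.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint, query, default_graph_uris=()):
    """Build a SPARQL Protocol GET URL with a 'query' parameter and
    zero or more 'default-graph-uri' parameters."""
    params = [("query", query)]
    params += [("default-graph-uri", g) for g in default_graph_uris]
    # Append with '&' if the endpoint URL already carries a query string.
    sep = "&" if urlsplit(endpoint).query else "?"
    return endpoint + sep + urlencode(params)


VOID_GRAPH = "http://dbpedia.org/stats/void#"
# Hypothetical query: list a few triples from the VoiD statistics graph.
STATS_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

if __name__ == "__main__":
    url = build_query_url("http://dbpedia.org/sparql", STATS_QUERY, [VOID_GRAPH])
    print(url)
    # Round trip: the '#' is percent-encoded, so it survives as part of the value.
    print(parse_qs(urlsplit(url).query))
```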

Garlik's JXT implements something similar to this feature. It differs from the description above in two ways: the response header is "X-Endpoint-Description:", and the URI given is relative to the endpoint (/description). Making the description URI relative is desirable because the server does not necessarily know its routable address from the point of view of the HTTP client.
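With a relative value, the client resolves it against the URL it actually sent the query to, using standard RFC 3986 reference resolution (Python's `urllib.parse.urljoin`). A minimal sketch follows; the function name is hypothetical and the example.org URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_description_url(request_url: str, header_value: str) -> str:
    """Resolve a (possibly relative) endpoint-description URI.

    The base is the URL the client itself used, so a relative value such as
    "/description" works even when the server does not know its own
    externally routable address. Absolute values pass through unchanged.
    """
    return urljoin(request_url, header_value.strip())
```

For example, a query sent to http://example.org/sparql with a header value of /description resolves to http://example.org/description.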

Strawman Proposals for Requesting Service Descriptions

HTTP OPTIONS: From RFC 2616:

The OPTIONS method represents a request for information about the communication options available on the request/response chain identified by the Request-URI. This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.

OPTIONS can return response content, and this is where the SPARQL endpoint could return its description.
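A minimal sketch of the OPTIONS approach using only the Python standard library: a toy endpoint answers OPTIONS on a hypothetical /sparql path with a small Turtle description, and a client fetches it. The handler, the path, the description content and the text/turtle media type are assumptions for illustration. RFC 2616 leaves the format of an OPTIONS response body undefined, so the media type would need to be agreed on.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical description using the sd: namespace proposed on this page.
DESCRIPTION = (
    "@prefix sd: <http://www.w3.org/2009/sparql/service-description#> .\n"
    "<http://localhost/sparql> a sd:Endpoint ;\n"
    "    sd:url <http://localhost/sparql> .\n"
)


class EndpointHandler(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        # Answer OPTIONS with the allowed methods plus the description body.
        body = DESCRIPTION.encode("utf-8")
        self.send_response(200)
        self.send_header("Allow", "GET, POST, OPTIONS")
        self.send_header("Content-Type", "text/turtle; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_description(host, port, path="/sparql"):
    """Send OPTIONS to the endpoint and return (status, content type, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("OPTIONS", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for the toy endpoint.
    server = HTTPServer(("127.0.0.1", 0), EndpointHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        status, ctype, body = fetch_description("127.0.0.1", server.server_address[1])
        print(status, ctype)
        print(body)
    finally:
        server.shutdown()
        server.server_close()
```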

Strawman Proposal for Service Description Vocabulary and URIs

The working group should define a vocabulary and instance URIs for the following:

  • URIs for the SPARQL 1.1 language and its subsets (query, update, safe, ...).
  • An rdfs:Class for SPARQL endpoints (the DARQ vocabulary defines sd:Service for this purpose).
  • URIs for built-in functions (both filter/project and aggregate functions) that lack existing URIs.
  • URIs for supported entailment regimes.
  • An rdf:Property relating an endpoint to its supported entailment regime(s).
  • An rdf:Property for supported extension functions (both filter/project and aggregate functions). We may want two different properties to distinguish between aggregate and non-aggregate functions.
  • An rdf:Property for supported extensions to SPARQL itself.
  • An rdf:Property for relating an endpoint to a description of its data. This property will be agnostic as to which vocabulary is used to describe the data, but we should work with those currently working on VoiD 2.0 to ensure compatibility, as it is an obvious choice for this.

Several other properties would be nice to have, but more discussion would be necessary before seriously considering these:

  • saddle:resultFormat declares the MIME types in which query results can be returned.
  • saddle:queryLanguage declares the query languages the endpoint supports. It has previously been used to indicate support for, e.g., both SPARQL and RDQL (less relevant these days), and might fit in with the idea of subsets of SPARQL 1.1 (query, update, safe, ...).
  • An rdf:Property to declare supported background rules. This was mentioned in the notes from DAWG F2F4, but it is unclear whether there is enough existing work to support such a property.
  • An rdf:Property to mark an endpoint as definitive. Again, this was mentioned in the DAWG F2F4 notes, but its usefulness is unclear given the open question of how granular such definitiveness would be.

Proposed service description schema:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix sd: <http://www.w3.org/2009/sparql/service-description#> .

sd:Language a rdfs:Class .

sd:sparql11 a sd:Language .

sd:Endpoint a rdfs:Class ;
	dc:description "The class of SPARQL endpoints" .

sd:url a rdf:Property ;
	rdfs:domain sd:Endpoint ;
	dc:description "Relates a SPARQL endpoint to the URL used to access it with the SPARQL Protocol." .

sd:Function a rdfs:Class ;
	dc:description "Extension functions that may be called from a SPARQL query." .

sd:ScalarFunction a rdfs:Class ;
	rdfs:subClassOf sd:Function ;
	dc:description "Functions that return a scalar RDF term." .

sd:AggregateFunction a rdfs:Class ;
	rdfs:subClassOf sd:Function ;
	dc:description "Functions that may be used as aggregators." .

sd:EntailmentRegime a rdfs:Class ;
	dc:description "The class of entailment regimes that may be supported by an endpoint." .

sd:supportedEntailment a rdf:Property ;
	rdfs:domain sd:Endpoint ;
	rdfs:range sd:EntailmentRegime ;
	dc:description "Relates a SPARQL endpoint to the entailment regime used for BGP matching." .

sd:extensionFunction a rdf:Property ;
	rdfs:domain sd:Endpoint ;
	rdfs:range sd:Function ;
	dc:description "Indicates that a SPARQL endpoint supports a specific extension function." .

sd:languageExtension a rdf:Property ;
	rdfs:domain sd:Endpoint ;
	dc:description "Indicates that a SPARQL endpoint supports a specific language extension." .

### @@ should we have a property for linking an endpoint to a description of a dataset, or a property that can be used several times to describe a (possibly named) graph?
sd:datasetDescription a rdf:Property ;
	rdfs:domain sd:Endpoint ;
	dc:description "Relates a SPARQL endpoint to a description of the endpoint's RDF dataset." .
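As a sketch of how an endpoint might publish a description using the proposed schema, the Python function below emits N-Triples with the proposed sd: terms. The function name, the blank-node label and the choice of N-Triples as the output format are assumptions. The proposal does not yet define IRIs for entailment regimes, so any passed in are placeholders, and IRIs are assumed to be valid already (they are not escaped).

```python
from typing import Iterable, List, Tuple

SD = "http://www.w3.org/2009/sparql/service-description#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_endpoint(
    url: str,
    entailments: Iterable[str] = (),
    functions: Iterable[str] = (),
    extensions: Iterable[str] = (),
) -> str:
    """Serialize a service description as N-Triples using the proposed sd: terms."""
    node = "_:endpoint"  # the endpoint itself is a blank node here
    triples: List[Tuple[str, str, str]] = [
        (node, RDF_TYPE, SD + "Endpoint"),
        (node, SD + "url", url),
    ]
    for prop, iris in (
        ("supportedEntailment", entailments),
        ("extensionFunction", functions),  # rdfs:range sd:Function is implied
        ("languageExtension", extensions),
    ):
        for iri in iris:
            triples.append((node, SD + prop, iri))
    # Predicates and objects are all IRIs in this sketch.
    return "".join(f"{s} <{p}> <{o}> .\n" for s, p, o in triples)


if __name__ == "__main__":
    print(describe_endpoint(
        "http://kasei.us/sparql",
        functions=["java:com.ldodds.sparql.Distance"],
        extensions=["http://kasei.us/2008/04/sparql-extension/service"],
    ))
```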

Existing Specification / Documentation

We are not aware of any documentation on SPARQL syntax extensions.

Relevant vocabularies:

Compatibility

Introduction of a new SPARQL query form (with a new keyword) would not introduce any incompatibility with existing SPARQL queries.

The use of an out-of-band method for conveying the service description (such as HTTP headers in a SPARQL protocol response) would not interfere with existing queries.

The use of special named graphs for service description data might cause incompatibilities where the graph name is already in use as the name of a non-service-description RDF graph. Using the same URI to name a graph that has different content in different datasets is at odds with global naming by URI (cf. the use of the host name "localhost").

Links to postponed Issues

The serviceDescription issue was previously postponed by the DAWG:

"whereas the serviceDescription designs aren't maturing in the timescale of the current schedule, and implementation experience is somewhat thin, RESOLVED to postpone serviceDescriptions"

Related Features

Support for service descriptions may be useful in the presence of Feature:BasicFederatedQuery, allowing a query optimizer to make use of knowledge of the remote endpoint's supported features and/or available data. In addition, a service description may contain a logical identifier for the service and provide two distinct sets of logical identifiers for the available datasets: one for locally stored datasets and one for remote datasets (for which the service acts as a proxy). If proxies sign the requests they forward with their logical names, it becomes possible to detect configuration errors such as a ring of proxies that forward requests to each other indefinitely.
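The ring-detection idea can be sketched in a few lines, assuming each proxy appends its logical identifier to a chain carried with the forwarded request and refuses to forward a request whose chain already contains its own identifier. Real signing would involve cryptographic signatures; here it is modelled only as appending the identifier. All names, including the routing table, are hypothetical.

```python
from typing import Dict, List, Sequence


class ProxyLoopError(Exception):
    """Raised when a request returns to a proxy that already forwarded it."""


def forward_chain(proxy_id: str, chain: Sequence[str]) -> List[str]:
    """Append proxy_id to the chain of forwarders, refusing to loop."""
    if proxy_id in chain:
        raise ProxyLoopError(f"loop: {' -> '.join([*chain, proxy_id])}")
    return [*chain, proxy_id]


def simulate_forwarding(start: str, upstream: Dict[str, str]) -> List[str]:
    """Follow proxy -> upstream links from start until reaching a service
    that holds the data locally (i.e. has no upstream entry)."""
    chain: List[str] = []
    node = start
    while node in upstream:  # node acts as a proxy for remote data
        chain = forward_chain(node, chain)
        node = upstream[node]
    return chain + [node]
```

A correctly configured chain such as a → b → c ends at the service holding the data, while a misconfigured pair where a forwards to b and b forwards back to a raises ProxyLoopError on the second visit to a instead of looping forever.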

Service descriptions would benefit Feature:ReturnFormatKeyword by allowing the discovery of supported result formats.
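Once a client knows the media types an endpoint advertises (e.g. the saddle:mediaType values in the RDF::Query description), it can pick a result format locally before sending the query. The following is a simplified Python sketch; the function names are hypothetical, and it ignores the rule that more specific media ranges take precedence over wildcards, honouring only q-values and header order.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    prefs: List[Tuple[str, float]] = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((parts[0].lower(), q))
    prefs.sort(key=lambda pair: -pair[1])  # stable sort: ties keep header order
    return prefs


def choose_result_format(advertised: Sequence[str], accept: str) -> Optional[str]:
    """Pick the advertised media type that best matches the client's preferences."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for candidate in advertised:
            c = candidate.lower()
            if media_range in ("*/*", c):
                return candidate
            if media_range.endswith("/*") and c.startswith(media_range[:-1]):
                return candidate
    return None
```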

A service description may also report some basic facts about the endpoint's cost model and supported pragmas.

Champions

  • Greg Williams (Rensselaer Polytechnic Institute)
  • Ivan Mikhailov / OpenLink seconds, provided that 1) the feature leaves SPARQL syntax unchanged, 2) no special graph names are reserved for the feature, and 3) in the case of HTTP, service description requests differ from plain SPARQL requests only in &key=value pairs and/or HTTP header lines, and do not require an additional web page.

Use cases

A description of one or more use cases, the solution of which requires this feature. Multiple use cases can be added to each feature.

References