SparqlEndpointDescription

From W3C Wiki

This page is largely obsolete; most of what is discussed here is addressed by voiD: http://rdfs.org/ns/void-guide

This page collects ideas and proposals for describing SPARQL endpoints. There is a growing number of SPARQL endpoints and of tools that access their data. Endpoint descriptions can be used to announce endpoint capabilities and contents, support discovery through service directories, and supply browsing and federation hints.

The initial version of this page was based on discussions between MaxVölkel, BastianQuilitz, DaveBeckett and RichardCyganiak.

Note: OrriErling proposed another list of properties to be described (SparqlEndpointDescription2).

Retrieving endpoint self-descriptions

HTTP GET <endpointurl>

This is the simplest way to obtain an RDF/XML file that includes all metadata about the "endpointurl". RDF Forms (or something like it) would be a possible format.
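A minimal sketch of this plain HTTP approach, assuming the endpoint answers a GET on its own URL with RDF/XML when asked through the `Accept` header. The endpoint URL and the helper name `build_description_request` are hypothetical and not part of any proposal on this page.

```python
import urllib.request


def build_description_request(endpoint_url):
    """Build a GET request asking the endpoint URL itself for RDF/XML."""
    return urllib.request.Request(
        endpoint_url,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


# Hypothetical endpoint URL, for illustration only.
request = build_description_request("http://example.org/sparql")
print(request.get_method(), request.full_url)

# To actually fetch the description (requires network access and a
# server that supports this behaviour):
# with urllib.request.urlopen(request) as response:
#     rdf_xml = response.read().decode("utf-8")
```

The network call is left commented out because whether a given server returns a description at its endpoint URL is exactly what is being proposed here, not something that can be assumed.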

This is a proposed method for retrieving a self-description from a SPARQL endpoint. To retrieve an RDF graph describing the endpoint, this SPARQL query is submitted to the endpoint:


DESCRIBE <servicename>


where `<servicename>` is a URI representing the service. In the resulting RDF graph, `<servicename>` represents the endpoint. Clients must be aware that the result triples may or may not be part of the regular dataset that is queried by `SELECT`, `CONSTRUCT` and `ASK` queries.
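As a rough illustration, the `DESCRIBE` query can be sent through the SPARQL protocol's HTTP GET binding by passing it in the `query` parameter. The sketch below only builds the request URL; the endpoint URL and the helper name `describe_url` are assumptions, and the endpoint URL is assumed to carry no query string of its own.

```python
from urllib.parse import urlencode


def describe_url(endpoint_url, service_name):
    """Return a GET URL that submits DESCRIBE <service_name> to the endpoint."""
    query = "DESCRIBE <%s>" % service_name
    # urlencode percent-encodes the query so it is safe inside a URL.
    return endpoint_url + "?" + urlencode({"query": query})


# In the common case the service name is the endpoint URL itself.
endpoint = "http://example.org/sparql"  # hypothetical endpoint
url = describe_url(endpoint, endpoint)
print(url)
```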

The service name URI should be the service endpoint URL. In situations where this is not feasible (e.g. the endpoint is accessed locally through a Java API and therefore doesn't have an obvious service URL), we need a SPARQL extension:


SELECT SERVICENAME


The result is a SPARQL result with one binding of one variable:


?servicename
-------------
<servicename>


where `<servicename>` is the URI representing the service. Clients can use this extension to retrieve the service name and then submit a `DESCRIBE` query with this URI as an argument.
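The two-step lookup could look roughly like the sketch below. It assumes that a server implementing the proposed `SELECT SERVICENAME` extension answers in the standard SPARQL Query Results XML Format; the sample response, the endpoint URI, and the helper name `extract_service_name` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_RESULTS_NS = "{http://www.w3.org/2005/sparql-results#}"

# Hypothetical response to `SELECT SERVICENAME` (an assumption: the
# extension is only proposed here, so no server is known to return this).
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="servicename"/></head>
  <results>
    <result>
      <binding name="servicename"><uri>http://example.org/sparql</uri></binding>
    </result>
  </results>
</sparql>"""


def extract_service_name(results_xml):
    """Return the URI bound to ?servicename in a SPARQL XML result."""
    root = ET.fromstring(results_xml)
    for binding in root.iter(SPARQL_RESULTS_NS + "binding"):
        if binding.get("name") == "servicename":
            uri = binding.find(SPARQL_RESULTS_NS + "uri")
            if uri is not None and uri.text:
                return uri.text.strip()
    raise ValueError("no servicename binding found")


# Step 1: read the service name; step 2: build the follow-up DESCRIBE query.
service_name = extract_service_name(SAMPLE_RESPONSE)
describe_query = "DESCRIBE <%s>" % service_name
print(describe_query)
```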

@@@ Issue: Capitalization of query and variable?

Pro

  • can be implemented on all existing SPARQL servers that support `DESCRIBE` (except the `SELECT SERVICENAME` part, which is not necessary in the common case)
  • simple implementation
  • easy to remember
  • works regardless of protocol because it operates at the query language level

Con

  • uses a data query to retrieve metadata -- violates separation of concerns
  • needs language extension for the `SELECT SERVICENAME` part
  • clients may expect that the description graph's triples are accessible over `SELECT` as well
  • `DESCRIBE` is icky, in part because it doesn't follow the TAG's recommendations on when to use GET; "Use GET if [...] The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup)"

Design alternatives

  • extend SPARQL protocol: <endpoint_url?meta> (but what about non-HTTP bindings? It also can't be faked on existing servers)
  • HTTP `MGET` on endpoint URL (extending HTTP is heavyweight, and what about non-HTTP bindings? --RC)
  • `DESCRIBE SERVICE` (nice, but not much nicer than DESCRIBE <url>, and requires extension of QL --RC)
  • `DESCRIBE <this>` where `<this>` is some “magic” URI (but magic URIs cause trouble when processing with normal RDF tools, e.g. merging several service descriptions)
  • `DESCRIBE ?x WHERE { ?x rdf:type foo:SparqlEndpoint }` (rather complex, can't mention other endpoints in description)
  • store description in a special named graph
  • don't bother with all this and just HTTP GET the description from some URL that may or may not be managed by the server

Vocabularies for endpoint descriptions

The method described above returns an RDF graph containing a resource that is known to represent the endpoint. This section is a collection of things one could say about an endpoint.

Basic metadata

  • `rdfs:label`, `rdfs:comment`
  • Dublin Core metadata, e.g. `dc:title`, `dc:description`, `dc:creator`, `dc:publisher`, `dc:rights`
  • copyright -- `cc:license`
  • last updated
  • contents -- `saddle:dataSet`
  • signatures, SWP warrants (see SWP vocabulary, PDF)
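To make the basic metadata concrete, the following sketch serializes a few statements about an endpoint as N-Triples, using only `rdfs:label`, `dc:title` and `dc:creator`. The endpoint URI, the literal values, and the helper names are assumptions chosen for illustration; the other properties in the list are left out because their exact URIs are not given on this page.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"


def literal_triple(subject, predicate, text):
    """Format one N-Triples statement with a plain literal object."""
    # Only backslash and double quote are escaped, which is enough for
    # the single-line values used in this sketch.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '<%s> <%s> "%s" .' % (subject, predicate, escaped)


def basic_metadata(endpoint_url, label, title, creator):
    """Return N-Triples lines describing the endpoint resource."""
    return [
        literal_triple(endpoint_url, RDFS + "label", label),
        literal_triple(endpoint_url, DC + "title", title),
        literal_triple(endpoint_url, DC + "creator", creator),
    ]


# Hypothetical endpoint and values.
triples = basic_metadata(
    "http://example.org/sparql",
    "Example endpoint",
    "Example SPARQL service",
    "Example maintainer",
)
print("\n".join(triples))
```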

Endpoint capabilities

  • Kendall's SADDLE stuff
  • level of inference (RDFS, OWL-DL, etc.)
  • supported SPARQL extension functions: `sl:extensionFunctions`
  • supported non-standard SPARQL extensions (e.g. in Andy Seaborne's extended ARQ query language)
  • features not implemented, e.g. "I can't do OPTIONAL"
  • other supported query languages: `saddle:queryLanguage`
  • supported result formats: `saddle:resultFormat`
  • Using (maybe data-mined) schemas for this purpose: Describing SPARQL source contents

Issue: capabilities of named graphs might differ from each other, e.g. `:graph2` might be `:graph1` plus inference

Browsing / Rendering hints

for generic SPARQL endpoint browsers/visualizers

  • known classes and properties; links to schemas/ontologies -- `owl:imports`, `saddle:vocabulary`
  • Fresnel lenses
  • class/property instance counts
  • property selectivities (low-selectivity properties make better facets)
  • good namespace prefixes
  • human-readable representation of the endpoint's contents -- `saddle:humanInterface`
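The instance counts and property selectivities mentioned above could be derived from the data roughly as follows. This sketch assumes one particular definition of selectivity, namely distinct values divided by number of uses, so that a low value means few distinct values and therefore a more useful facet; the page itself does not fix a definition. The sample triples and prefixed names are made up.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_statistics(triples):
    """Count uses and distinct values per property, and derive selectivity."""
    uses = defaultdict(int)
    values = defaultdict(set)
    for _subject, predicate, obj in triples:
        uses[predicate] += 1
        values[predicate].add(obj)
    return {
        p: {
            "uses": uses[p],
            "distinct_values": len(values[p]),
            # Assumed definition: distinct values per use of the property.
            "selectivity": len(values[p]) / uses[p],
        }
        for p in uses
    }


def class_instance_counts(triples):
    """Count distinct subjects per class, based on rdf:type statements."""
    instances = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            instances[obj].add(subject)
    return {cls: len(members) for cls, members in instances.items()}


# Made-up sample data, written with prefixed names for brevity.
sample = [
    ("ex:a", RDF_TYPE, "ex:Book"),
    ("ex:b", RDF_TYPE, "ex:Book"),
    ("ex:c", RDF_TYPE, "ex:Person"),
    ("ex:a", "ex:language", "en"),
    ("ex:b", "ex:language", "en"),
    ("ex:a", "ex:isbn", "111"),
    ("ex:b", "ex:isbn", "222"),
]

stats = property_statistics(sample)
counts = class_instance_counts(sample)
# Properties ordered from lowest to highest selectivity (best facets first).
ranked = sorted(stats, key=lambda p: stats[p]["selectivity"])
print(counts)
print(ranked)
```

Under this definition `ex:language` ranks ahead of `ex:isbn` as a facet, since every book shares one language value while each ISBN is unique.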

Query performance and federation

  • what kinds of queries can the endpoint answer quickly?
    • `saddle:vocabulary`
  • selectivities and instance counts
  • query federation stuff by BastianQuilitz -- DARQ service descriptions
  • overall number of triples
  • schedule/frequency of changes (small stores that don't change often can be cached and queried locally)
  • mirrors of this store

See also: WebDescriptionProposals