Feature: Query by reference
Extend the SPARQL protocol to allow queries to be referenced by URL rather than being included in the query string.
The endpoint would obtain the query by dereferencing the supplied URL and parsing the response as a SPARQL query.
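As a rough illustration of the request handling described above, the sketch below picks the query text from either an inline `query` parameter or a referenced `query-uri` parameter. The function name `resolve_query` and the injected `fetch` callable are hypothetical, and rejecting requests that carry both parameters is an assumption, not something the proposal specifies.

```python
from urllib.parse import parse_qs


def resolve_query(query_string, fetch):
    """Return the SPARQL query text for one protocol request.

    `fetch` is a hypothetical callable that maps a URL to the text found
    there; injecting it keeps the retrieval policy out of this function.
    """
    params = parse_qs(query_string)
    has_inline = "query" in params
    has_reference = "query-uri" in params

    # Assumption: a request naming both forms is ambiguous and is rejected.
    if has_inline and has_reference:
        raise ValueError("request carries both query and query-uri")
    if has_inline:
        return params["query"][0]
    if has_reference:
        return fetch(params["query-uri"][0])
    raise ValueError("request carries neither query nor query-uri")
```

An endpoint that does not want to support the feature at all could simply pass a `fetch` that always raises.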
Just as an endpoint may decide not to process a query if its dataset is unacceptable (e.g. if the query is over arbitrary data and the endpoint uses a fixed dataset), the endpoint might also choose not to process queries of this form. This allows developers to address concerns over the protocol being used as a vector for potential DoS attacks. Caching of retrieved queries would also mitigate this.
The server named by the URL SHOULD be cache-friendly and provide the metadata needed for caching: current and expiration date-times, ETag, etc.
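One way an endpoint might use that metadata is to revalidate with the stored ETag and reuse its copy when the origin answers 304 Not Modified. The sketch below is a minimal illustration under stated assumptions: `QueryCache` is a hypothetical name, and `fetch(url, etag)` is a hypothetical transport hook returning `(status, body, etag)`. Expiration handling is left out for brevity.

```python
class QueryCache:
    """Cache of retrieved queries keyed by URL, revalidated by ETag."""

    def __init__(self, fetch):
        # Hypothetical hook: fetch(url, etag) -> (status, body, etag).
        # When etag is not None, the hook is expected to send it in an
        # If-None-Match header.
        self._fetch = fetch
        self._entries = {}  # url -> (etag, body)

    def get(self, url):
        cached = self._entries.get(url)
        etag = cached[0] if cached else None
        status, body, new_etag = self._fetch(url, etag)

        if status == 304 and cached:
            # Origin confirmed our copy is current; no body was transferred.
            return cached[1]
        if status == 200:
            if new_etag:
                self._entries[url] = (new_etag, body)
            return body
        raise IOError("could not retrieve query: HTTP %d" % status)
```

With this shape, a repeated request for the same query costs the origin a conditional response rather than a full transfer.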
Existing Specification / Documentation
This is backwards compatible.
There are some immediate security concerns:
This feature may be used as a vector for DoS attacks in two ways. The first is an attack on the endpoint itself, e.g. ?query-uri=http://largefile.example/massive.rq, which induces the endpoint to request a very large file unless it can quickly ascertain that the file is not an acceptable SPARQL query. This poses an implementation burden and possibly significant bandwidth/hosting costs.
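A simple defence against the large-file case is to stop reading once a size limit is passed, before any parsing is attempted. The sketch below assumes a file-like response body; the name `read_bounded` and the 64 KiB default are illustrative choices, not values from the proposal.

```python
import io

MAX_QUERY_BYTES = 64 * 1024  # assumed limit; real queries are usually far smaller


def read_bounded(stream, limit=MAX_QUERY_BYTES, chunk_size=8192):
    """Read a whole body from `stream`, refusing anything over `limit` bytes."""
    chunks = []
    total = 0
    while True:
        # Never ask for more than one byte past the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError("referenced query exceeds %d bytes" % limit)
        chunks.append(chunk)
    return b"".join(chunks)


# Usage: a small body is returned whole, an oversized one is rejected early.
small = read_bounded(io.BytesIO(b"ASK {}"), limit=16)
```

The endpoint still pays for the connection and the first `limit` bytes, so this bounds the cost rather than removing it.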
Second, the feature can be used as a secondary vector through public endpoints that implement it: the attacker issues a small request, and the SPARQL endpoint host escalates it by issuing a larger one. For example, the attacker could issue 1000 requests to public SPARQL endpoints of the form ?query-uri=http://largefile.example/dvd.img, resulting in a DoS attack on largefile.example for very little effort on the part of the attacker.
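An endpoint could limit its own contribution to such an attack by capping how often it fetches from any one target host. The sketch below is a sliding-window limiter under stated assumptions: `HostRateLimiter` is a hypothetical name, the thresholds are arbitrary, and the caller supplies the current time in seconds. It only bounds a single endpoint's share of the traffic; it does nothing about an attacker who spreads requests over many endpoints.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class HostRateLimiter:
    """Allow at most `max_requests` fetches per target host per window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # host -> timestamps of recent fetches

    def allow(self, url, now):
        host = urlsplit(url).hostname
        times = self._seen[host]
        # Forget fetches that have fallen out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_requests:
            return False
        times.append(now)
        return True
```

In a running service, `now` would typically come from `time.monotonic()`.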
In addition, SPARQL endpoints are often run inside corporate firewalls, which gives them access to internal data that would not normally be available. For example, ?query-uri=http://internal.example/employees.rdf might cause the SPARQL endpoint to emit a syntax error that reveals sensitive data. Alternatively, normally firewalled internal services could be invoked, such as ?query-uri=http://internal.example/services/erase-records%34id=*.
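One possible mitigation for the internal-access concern is to refuse any URL whose host resolves to a non-public address. The sketch below uses the standard `ipaddress` module; the function name `is_public_target` and the injectable `resolve` parameter are hypothetical. It is not a complete defence: redirects and a second DNS lookup at fetch time can still reach internal hosts unless the fetch is pinned to the address that was checked.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_target(url, resolve=socket.getaddrinfo):
    """Return True only if every address the URL's host resolves to is global."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    default_port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(parts.hostname, parts.port or default_port)
    except OSError:
        return False
    for info in infos:
        # info[4] is the socket address; drop any IPv6 scope id ("%eth0").
        literal = info[4][0].split("%")[0]
        if not ipaddress.ip_address(literal).is_global:
            return False
    return bool(infos)
```

Rejecting on any non-global address, rather than accepting on any global one, is a deliberately conservative choice for hosts that resolve to both.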
Links to postponed Issues
Query by reference was postponed by the DAWG.
Who would be willing to push/draft the specification/advocate for this feature (name of organization and WG members)?
Originally suggested by Leigh Dodds in a message of 2009-Mar-06.