comments on SPARQL 1.1 Query Last Call WD: builtin functions

Hello WG,

I have a couple of comments on the current Last Call working draft for 
SPARQL 1.1 Query, mainly about the set of built-in functions (section 
17.4). These comments arose while I was implementing these functions in 
Sesame's SPARQL processor.

1. String functions

The current set of built-in string functions seems rather arbitrarily 
chosen, with little evidence of use cases or requirements backing it 
up.

For example, while both fn:string-length and fn:substring are included, 
fn:substring-before and fn:substring-after are not, nor is there any 
form of 'indexOf' function. As a result, it is currently not possible in 
SPARQL to extract a substring based on a character match.
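
To illustrate the kind of operation I mean, here is a minimal Java 
sketch of what fn:substring-before and fn:substring-after compute, built 
on an 'indexOf' primitive. The class and method names are hypothetical 
and exist only for this example; the sketch ignores collations and is 
not taken from Sesame or from the draft.

```java
final class SubstringSketch {

    // Part of s before the first occurrence of match; "" if match does not occur.
    static String substringBefore(String s, String match) {
        int i = s.indexOf(match);
        return i < 0 ? "" : s.substring(0, i);
    }

    // Part of s after the first occurrence of match; "" if match does not occur.
    static String substringAfter(String s, String match) {
        int i = s.indexOf(match);
        return i < 0 ? "" : s.substring(i + match.length());
    }

    public static void main(String[] args) {
        String mbox = "alice@example.org"; // made-up sample value
        System.out.println(substringBefore(mbox, "@")); // prints "alice"
        System.out.println(substringAfter(mbox, "@"));  // prints "example.org"
    }
}
```

Neither result can be expressed with fn:substring alone, because the 
start offset depends on where the match occurs.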

My comment is not that these functions should or should not be included 
per se, but rather a question: what criteria did the WG use to decide 
which functions 'make the cut'?

2. Hash functions

Perhaps my strongest objection to the current Working Draft is the 
inclusion of six variations for calculating a hash. Arguably, 
calculating a hash is a _very_ outlying use case that comes up rarely in 
practical applications of SPARQL. I'm not denying there are valid use 
cases for it, but adding six different varieties seems, frankly, 
outlandish.

There is a practical consideration for me in this as well: on the Java 
platform, SHA-224 in particular is not supported by the default 
cryptography architecture. The fact that SPARQL includes it forces me to 
add a third-party dependency to my SPARQL implementation for a feature 
that very few users will ever need. I find this wasteful and an 
unnecessary burden, both on implementors and on users of the software.
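
As a rough illustration of the platform issue, the following Java sketch 
probes which digest algorithms a given runtime provides. The class name 
HashSupportSketch is hypothetical, and the algorithm list is my 
assumption of the JCA names corresponding to the six hash functions in 
the draft. The outcome depends on the JRE version and the installed 
security providers, so no particular result is implied.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

final class HashSupportSketch {

    // Assumed JCA names for the six hash functions in the draft.
    static final String[] ALGORITHMS = {
        "MD5", "SHA-1", "SHA-224", "SHA-256", "SHA-384", "SHA-512"
    };

    // True if some installed provider implements the named digest.
    static boolean isSupported(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    // Checks every algorithm in ALGORITHMS, preserving their order.
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String algorithm : ALGORITHMS) {
            result.put(algorithm, isSupported(algorithm));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : probe().entrySet()) {
            System.out.println(e.getKey() + ": "
                + (e.getValue() ? "available" : "NOT available"));
        }
    }
}
```

Any algorithm reported as unavailable would have to come from a 
third-party provider or be implemented by hand.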

Given that the SPARQL specification allows custom functions to be added, 
so that any vendor who needs to can extend the language, I would suggest 
that this kind of niche functionality has no place in the core spec and 
should be removed, or at the very least that only a minimal set of hash 
functions (2 or 3, tops) should be required. In picking this subset, the 
WG should IMHO consider which algorithms are most commonly used and 
supported on various platforms.

Regards,

Jeen Broekstra

Received on Tuesday, 2 August 2011 00:24:45 UTC