Re: "Simple Lists" (was Re: ISSUE-77: Should we mark rdf:Seq as archaic (cf ISSUE-24)) from Andy Seaborne on 2011-10-17 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 17 Oct 2011 11:47:43 +0100
To: public-rdf-wg@w3.org
Message-ID: <4E9C07CF.5010901@epimorphics.com>
> Thinking about it today, I wonder if we can define a "Simple List" (or
> "Proper List", or something) as a list that can be losslessly
> transmitted via turtle's (...) syntax.  That means its structure is
> b-nodes, with no extra arcs, etc.   (Interestingly, it also means you
> can't include rdf:type rdf:List arcs....)   Then, we encourage tools to
> read/write simple lists, and to work with them efficiently.
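
A rough sketch of how a tool might test the "Simple List" condition proposed above. This is illustrative only: triples are modelled as plain Python tuples, blank nodes are assumed to be strings starting with "_:", and the helper name simple_list_members is hypothetical. It treats any extra outgoing arc on a list cell (including rdf:type) as disqualifying, and does not check incoming arcs.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def simple_list_members(triples, head):
    """Return the members of the list at `head`, or None if it is not 'simple'."""
    members, node, seen = [], head, set()
    while node != NIL:
        if not is_bnode(node) or node in seen:
            return None  # a named cell, or a cycle
        seen.add(node)
        arcs = [(p, o) for (s, p, o) in triples if s == node]
        firsts = [o for (p, o) in arcs if p == FIRST]
        rests = [o for (p, o) in arcs if p == REST]
        # Exactly one rdf:first and one rdf:rest, and no other arcs.
        if len(arcs) != 2 or len(firsts) != 1 or len(rests) != 1:
            return None
        members.append(firsts[0])
        node = rests[0]
    return members
```

A mangled list (a missing rdf:rest, say) yields None rather than a shorter, wrong list, which is the "no longer provide data" behaviour described in the quoted text.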

> Simple lists have the advantage over Seq, I believe, that in the face of
> truth-preserving RDF operations (subsetting, merging, various sound
> inferences), they never produce wrong data.

Seq from bNodes similarly.

> In the worst case, they no
> longer provide data -- the simple list is mangled in some way -- but
> it'll never just tell you the wrong thing.   I think this is a big win.
>
>> Ivan:
>>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
>>
>> SPARQL 1.2 will not solve anything I'm afraid.  SPARQL 1.1 Query has
>> gone as far as it can, except maybe a little extra syntactic sugar with
>>
>> { ?list rdf:rest*/rdf:first ?member }
>>
>> It's much better than handling Seqs.
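
To make the rdf:rest*/rdf:first path above concrete, here is a minimal sketch of what it computes over a set of triples. It is not how any SPARQL engine is implemented; triples are plain tuples and the function name is hypothetical. The point is that the members come back as an unordered bag, with no positions attached.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_members_unordered(triples, head):
    """Sketch of evaluating `head rdf:rest*/rdf:first ?member`."""
    # rdf:rest* : every node reachable from head by zero or more rdf:rest arcs.
    reachable, frontier = {head}, [head]
    while frontier:
        node = frontier.pop()
        for (s, p, o) in triples:
            if s == node and p == REST and o not in reachable:
                reachable.add(o)
                frontier.append(o)
    # /rdf:first : one rdf:first step from each reachable node.
    # The result is a bag: callers must not rely on its order.
    return [o for (s, p, o) in triples if s in reachable and p == FIRST]
```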
>
> I'm trying to brainstorm ways to shoe-horn list handling into SPARQL.  I
> don't know if there's any elegant way, but maybe there's a hack that's
> not too bad.

http://openjena.org/ARQ/rdf_lists.html
http://openjena.org/ARQ/library-propfunc.html

> One approach is to update the results format to allow lists of values
> where it currently allows single values.  And then offer some way to
> signal you want Simple Lists to be returned as list values instead of
> b-nodes.    One way to do that would be a LISTRESULT function that takes
> a simple list's starting bnode and returns something that the results
> format serializes as a list.  Essentially, it's just a way of saying you
> want a list result here.
>
> So...
>
>    SELECT ?x ?y LISTRESULT(?z) WHERE...

(Don't forget subqueries - this puts lists into the SPARQL data model as 
first-class items and I think doing it in SPARQL, and not RDF, is not 
going to work)

There are several aspects of lists:

1/ Accessing lists
2/ Returning lists
3/ Naming lists
4/ Manipulating lists (in SPARQL Update)

(much of the on-the-ground driver for skolemization is being able to 
deal with lists)

It might be possible to address some of these without needing fundamental 
respec'ing.

> would require ?z to be bound to a simple list and would pack the list
> elements into the result format in a manner specific to that results
> format (XML, JSON, etc).

There is a trap here - it's a relatively simple change to put lists into 
the results formats, but it then affects every consumer of results. 
Small(ish) spec change, big consequences.

> Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
> be used to make sure the size of the returned list is manageable.
>
> Another approach, perhaps, would be some kind of dis-aggregator, a pair
> of builtins that work together to make a list appear like many different
> results:
>
> Data:
>     eg:Alice eg:likes ( eg:Bob eg:Charlie ).
>
> Query:
>     SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
>     WHERE { ?who eg:likes ?list . }
>
> would return results:
>      eg:Alice 0 eg:Bob
>      eg:Alice 1 eg:Charlie
>
> although not necessarily in that order.

See property functions above.
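
The dis-aggregation idea quoted above can be sketched outside SPARQL as follows. ITERINDEX and ITERVALUE are the hypothetical builtins from the proposal, not part of SPARQL; this sketch simply emits one (subject, index, value) row per list member. It assumes well-formed, acyclic lists, and the function name is made up for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def disaggregate(triples, predicate):
    """Return one (subject, index, value) row per member of each list object."""
    rows = []
    for (who, p, head) in triples:
        if p != predicate:
            continue
        node, index = head, 0
        while node != NIL:
            # Assumes each cell has an rdf:first and an rdf:rest (well-formed list).
            value = next(o for (s, q, o) in triples if s == node and q == FIRST)
            rows.append((who, index, value))
            node = next(o for (s, q, o) in triples if s == node and q == REST)
            index += 1
    return rows
```

Because each row carries its own index, the rows can be returned in any order and the list can still be reassembled by the consumer.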

By the way - see GROUP_CONCAT as an example of wanting to construct 
lists as results.

And F&O sequences.

> That's pretty messy to spec and implement, but might be pretty nice to
> use.
>
>
>> SPARQL Update can manipulate lists but it's ugly:
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
>>
>>
>> The fundamental problem in SPARQL is that any order is lost; so this
>> list access works for some cases, where the order does not matter.
>>
>> Even if a special order preserving construct were available, order is
>> lost in the rest of the query.  An order-preserving QL would not be
>> SPARQL 1.2 - it would have a completely different basis (e.g. no
>> chance of implementing using hash joins), would be very hard to have
>> a parallel implementation (see "big data" graph languages), and still
>> does not work when two different ordered subresults need combining.
>>
>> Fundamentally, there are two problems:
>>
>> 1/ Encoding in triples
>> 2/ Lists aren't the only datastructure.
>>
>> Reification, containers and collections encode data structure in triples
>> but if the app can see "triples" then this leaks through to the
>> application.  It also means there can be the possibility of 'bad' data
>> as Jeremy says.  Seeing the triples is confusing at best.
>
> Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
> and that mostly people pull what they want out of a graph and ignore the
> rest.
>
>> The structure we have may not say what you want:
>>     List(1 2 3) != List(3 2 1)
>> but if a list is being used to express an unordered collection, a higher
>> level convention has to be communicated.
>>
>> I think the only complete solution will involve putting structural
>> literals into RDF itself, so they are not triple-encoded and can't be
>> 'bad'.  When treated as first-class literals with equality rules,
>> accessors, and combining rules, then implementations can store them
>> specially, provide good APIs, and application programmers won't have to
>> learn about the encoding rules.
>
> That sounds pretty hard.  Do you have some design in mind...?

RDF 2.  Not this WG.

Add "list" to IRI, BNode and literal.
Or a subtype of literal, but as it has its own syntax etc. it feels different.

IF this is to advance, I think it needs serious scoping and 
investigation with all the stakeholders involved.  RDF-WG isn't that 
place either by our current timeline, nor by the constituency of people 
involved.

An XG perhaps?

You could add into the change mix adding graph literals to RDF 2+ -- all 
the changes need to be considered, weighed, and prioritized.

----------------------------
@prefix op: <http://www.w3.org/2005/xpath-functions#> .

(op:numeric-add 2 3) .
----------------------------
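
For comparison with the list-as-term form above, this is roughly what today's triple encoding of the same three-item list looks like. A sketch only: triples are plain tuples, the blank node labels (_:b0, _:b1, ...) are generated for illustration, and encode_list is a hypothetical helper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def encode_list(items):
    """Return (head, triples) for the rdf:first/rdf:rest encoding of `items`."""
    cells = [f"_:b{i}" for i in range(len(items))]  # one blank node per item
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        next_cell = cells[i + 1] if i + 1 < len(cells) else NIL
        triples.append((cell, FIRST, item))
        triples.append((cell, REST, next_cell))
    head = cells[0] if cells else NIL
    return head, triples


head, triples = encode_list(["op:numeric-add", 2, 3])
```

Three items become six triples over three blank nodes, and it is this encoding that an application sees if it looks at the graph as triples.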

>
>       - Sandro

 Andy
Received on Monday, 17 October 2011 10:48:26 UTC