Re: "Simple Lists" (was Re: ISSUE-77: Should we mark rdf:Seq as archaic (cf ISSUE-24))

On Sun, 2011-10-16 at 19:13 +0100, Steve Harris wrote:
> On 15 Oct 2011, at 23:35, Sandro Hawke wrote:
> 
> > On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
> >> 
> >> On 15/10/11 19:09, Jeremy Carroll wrote:
> >>> 
> >>> I think both the Seq and the List constructs present technical issues.
> >>> Basically it is because both present the possibility of 'bad' data and
> >>> no clarity about what one should do in the face of it.
> >> 
> >> +1
> >> 
> >>> We can easily form ill-formed lists with rdf:first or rdf:rest either
> >>> missing or multiple.
> >>> We can easily form ill-formed sequences with duplicate or missing rdf:_2
> >> 
> >> although Seq are very fragile and lists are merely fragile.  The 
> >> duplicate rdf:_2 by merging is really nasty.
> >> 
> >>> The consumer of such ill-formed data is in a bind
> >>> And what's worse is that formally the ill-formed data is not ill-formed,
> >>> it is just triples.
> >>> 
> >>> We could label both with a health warning ...
> >> 
> >> Sandro said that:
> >>> I think Turtle makes RDF Collections seem quite
> >>> nice, and hopefully that will quickly set the tone (perhaps with a
> >>> little help from us) for APIs and SPARQL 1.2 (?) having nice list
> >>> handling functions that are as efficient as native (non-destructive)
> >>> list handling functions. (I hope some APIs do this already.)
> >> 
> >> and the point about Turtle syntax, and the convenience of writing, is 
> >> important.
> >> 
> >> Jena has container and collection APIs to make working with containers, 
> >> collections easier but the details leak out if you can take a triple view.
> > 
> > Thinking about it today, I wonder if we can define a "Simple List" (or
> > "Proper List", or something) as a list that can be losslessly
> > transmitted via turtle's (...) syntax.  That means its structure is
> > b-nodes, with no extra arcs, etc.   (Interestingly, it also means you
> > can't include rdf:type rdf:List arcs....)   Then, we encourage tools to
> > read/write simple lists, and to work with them efficiently.
> 
> To be honest, I think the ref:Collection structure is just a dead duck.
> 
> It works OK for lists on the order of 10 items, but is pretty impractical for thousands, or tens of thousands.

My claim is that in the typical or best cases, Simple Lists can be
handled as efficiently as any general List structure, using your choice
of single-linked list, doubly-linked lists, arrays, or possibly even
something more complicated.

Basically, I'm suggesting systems only turn them into something visible
as triples when someone really wants that, and in general, no one should
want that very much.    I think it will happen in toy systems and
debugging systems, where people do ?p ?o queries; but in deployed apps,
where the software is going to take some action based on particular
predicates, it's not going to be querying for rdf:first or rdf:rest
triples.

Certainly this would be more work for triple stores, but I think it's
it's probably a better option than coming up with something completely
new.

> I would be mildly opposed to anything which promoted it's use further.
> 
> > Simple lists have the advantage over Seq, I believe, that in the face of
> > truth-preserving RDF operations (subsetting, merging, various sound
> > inferences), they never produce wrong data.  In the worst case, they no
> > longer provide data -- the simple list is mangled in some way -- but
> > it'll never just tell you the wrong thing.   I think this is a big win.
> 
> True, but detecting if a Collection is mangled is computationally expensive.

I'm picturing Simple List handling being done using native list
structures, which can't get mangled.   So the only detection is when
someone creates a Simple List *not* using provided list machinery, which
should be a rare case.

   -- Sandro

> - Steve
> 
> >> Ivan:
> >>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
> >> 
> >> SPARQL 1.2 will not solve anything I'm afraid.  SPARQL 1.1 Query has 
> >> gone as far as it can, except maybe a little extra syntactic sugar with
> >> 
> >> { ?list rdf:rest*/rdf:first ?member }
> >> 
> >> It's much better than handling Seqs.
> > 
> > I'm trying to brainstorm ways to shoe-horn list handling into SPARQL.  I
> > don't know if there's any elegant way, but maybe there's a hack that's
> > not too bad.   
> > 
> > One approach is to update the results format to allow lists of values
> > where it currently allows single values.   And then offer some way to
> > signal you want Simple Lists to be returned as list values instead of
> > b-nodes.    One way to do that would be a LISTRESULT function that takes
> > a simple list's starting bnode and returns something that the results
> > format serializes as a list.  Essentially, it's just a way of saying you
> > want a list result here.
> > 
> > So...
> > 
> >  SELECT ?x ?y LISTRESULT(?z) WHERE...
> > 
> > would require ?z to be bound to a simple list and would pack the list
> > elements into the result format in a manner specific to that results
> > format (XML, JSON, etc).  
> > 
> > Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
> > be used to make sure the size of the returned list is manageable.
> > 
> > Another approach, perhaps, would be some kind of dis-aggregator, a pair
> > of builtins that work together to make a list appear like many different
> > results:
> > 
> > Data:
> >   eg:Alice eg:likes ( eg:Bob eg:Charlie ).   
> > 
> > Query:
> >   SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
> >   WHERE ?who eg:likes ?list.
> > 
> > would return results:
> >    eg:Alice 0 eg:Bob
> >    eg:Alice 1 eg:Charlie
> > 
> > although not necessarily in that order.
> > 
> > That's pretty messy to spec and implement, but might be pretty nice to
> > use.
> > 
> > 
> >> SPARQL Update can manuipuate lists but it's ugly:
> >> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
> >> 
> >> 
> >> The fundamental problem in SPARQL is that any order is lost; so this 
> >> list access works for some cases, where the order does not matter.
> >> 
> >> Even if a special order preserving construct were available, order is 
> >> lost in the rest of the query.  An order-presering QL would not be 
> >> SPARQL 1.2 - it would be have completely different basis, (e.g. no 
> >> chance of implementing use hash joins), would be very hard to have 
> >> parallel implementation (see "big data" graph languages), and still does 
> >> not work when two ordered different subresults need combining.
> >> 
> >> Fundamentally, there are two problems:
> >> 
> >> 1/ Encoding in triples
> >> 2/ Lists aren't the only datastructure.
> >> 
> >> Reification, containers and collections encode data structure in triples 
> >> but if the app can see "triples" then this leaks through to the 
> >> application.  It also means there can be the possibility of 'bad' data 
> >> as Jeremy says.  Seeing the triples is confusing at best.
> > 
> > Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
> > and that mostly people pull what they want out of a graph and ignore the
> > rest. 
> > 
> >> The structure we have may not say what you want:
> >>   List(1 2 3) != List(3 2 1)
> >> but if a list is being used to express an unordered collection, a higher 
> >> level convention has to be communicated.
> >> 
> >> I think the only complete solution will involve putting structural 
> >> literals into RDF itself, so they are not triple-encoded and can't be 
> >> 'bad'.  When treated as first-class literals with equality rules, 
> >> accessors, and combining rules, then implementations can store them 
> >> specially, provide good APIs, and application programmer won't have to 
> >> learn about the encoding rules.
> > 
> > That sounds pretty hard.  Do you have some design in mind...?
> > 
> >     - Sandro
> > 
> > 
> > 
> > 
> 
> 

Received on Sunday, 16 October 2011 18:26:50 UTC