Re: RDF-ISSUE-25 (Deprecate Reification): Should we deprecate (RDF 2004) reification? [Cleanup tasks]

On Fri, 2011-04-08 at 11:02 +0100, Richard Cyganiak wrote:
> On 8 Apr 2011, at 05:42, Sandro Hawke wrote:
> > <u1> { <a> <b> 1, 2 }
> > <u2> { <a> <c> 3, 4 }
> > 
> > would be:
> > 
> > <u1> eg:hasTriple [ rdf:subject <a>; rdf:predicate <b>; rdf:object 1 ],
> >                  [ rdf:subject <a>; rdf:predicate <b>; rdf:object 2 ].
> > 
> > <u2> eg:hasTriple [ rdf:subject <a>; rdf:predicate <c>; rdf:object 3 ],
> >                  [ rdf:subject <a>; rdf:predicate <c>; rdf:object 4 ].
> > 
> > So, why do SPARQL folks prefer TriG and N-Quads to these forms?  I don't
> > know.    
> 
> The second is about five times more verbose. 

I'm not sure what you mean by "the second", but yes, these forms *feel*
verbose.    But are they?

Compared to TriG, these forms use about 10 bytes more per graph and
about 45 bytes more per triple.   I expect that, when gzip'd, those
numbers would drop to about 2 bytes and 5 bytes respectively.   Probably
not a big deal.
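
A rough way to check estimates like these is to serialize the same toy
dataset both ways and compare sizes before and after gzip.  The sketch
below is only that: the dataset is synthetic, the two serializers are
hand-rolled string builders (not a real TriG or Turtle writer), and the
names to_trig, to_reified and overhead_per_triple are made up for this
illustration.  The exact numbers will depend on the data.

```python
import gzip

def to_trig(dataset):
    """TriG-style text: one '<name> { triples }' block per graph."""
    lines = []
    for name, triples in dataset.items():
        body = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
        lines.append(f"{name} {{ {body} }}")
    return "\n".join(lines) + "\n"

def to_reified(dataset):
    """Single-graph text using eg:hasTriple plus rdf:subject/predicate/object."""
    lines = []
    for name, triples in dataset.items():
        nodes = ",\n    ".join(
            f"[ rdf:subject {s}; rdf:predicate {p}; rdf:object {o} ]"
            for s, p, o in triples
        )
        lines.append(f"{name} eg:hasTriple {nodes} .")
    return "\n".join(lines) + "\n"

def sizes(text):
    """Return (raw byte count, gzip'd byte count) for a string."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))

def overhead_per_triple(dataset):
    """Extra bytes per triple of the reified form over TriG: (raw, gzip'd)."""
    n = sum(len(ts) for ts in dataset.values())
    trig_raw, trig_gz = sizes(to_trig(dataset))
    reif_raw, reif_gz = sizes(to_reified(dataset))
    return (reif_raw - trig_raw) / n, (reif_gz - trig_gz) / n

# Synthetic data (an assumption for illustration): 50 graphs of 20 triples.
dataset = {
    f"<g{g}>": [(f"<s{g}_{i}>", "<p>", str(i)) for i in range(20)]
    for g in range(50)
}
raw_extra, gz_extra = overhead_per_triple(dataset)
print(f"extra bytes per triple: raw {raw_extra:.1f}, gzip'd {gz_extra:.1f}")
```

Dividing the whole difference by the triple count folds the per-graph
overhead into the per-triple figure, which is good enough for a sanity
check on a dataset of this shape.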

If you're storing the reified graphs in a g-box, then yes, you have a
~5x expansion of the original graphs, but is that a fair comparison?
With the other approaches, you CAN'T store the result in a g-box.  So
this is comparing apples to ... empty space.

> It is unsuitable for hand-writing.

Agreed, much like N-Triples or N-Quads.   Or RDFa or RDF/XML. 

>  To be even remotely readable and efficiently processable, it relies on something that is not significant in RDF: order of statements.

I don't agree with your characterization here.   This depends on the
ordering of statements only in the way that a lot of RDF data which
serializes objects does: if the parser hands you the triples with good
locality, the data can be streamed; otherwise it must be buffered or
queried.
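
To make the locality point concrete, here is a minimal sketch of a
decoder that turns a stream of triples in the eg:hasTriple encoding back
into quads and records how many partially seen reified triples it had to
hold at once.  The tuple representation, the blank node labels and the
function names encode and decode are assumptions made for the example,
not part of any existing tool.

```python
RDF_PARTS = {"rdf:subject": "s", "rdf:predicate": "p", "rdf:object": "o"}

def encode(dataset):
    """Flatten {graph: [(s, p, o), ...]} into reification triples, kept local."""
    triples, n = [], 0
    for graph, members in dataset.items():
        for s, p, o in members:
            n += 1
            node = f"_:t{n}"  # hypothetical blank node label
            triples += [
                (graph, "eg:hasTriple", node),
                (node, "rdf:subject", s),
                (node, "rdf:predicate", p),
                (node, "rdf:object", o),
            ]
    return triples

def decode(triples):
    """Return (quads, peak), where peak is the largest number of
    incomplete reified triples buffered at any one time."""
    pending = {}  # reified-triple node -> slots seen so far
    quads, peak = [], 0
    for s, p, o in triples:
        if p == "eg:hasTriple":
            node, slot, value = o, "g", s
        elif p in RDF_PARTS:
            node, slot, value = s, RDF_PARTS[p], o
        else:
            continue  # not part of the encoding; ignored in this sketch
        parts = pending.setdefault(node, {})
        parts[slot] = value
        peak = max(peak, len(pending))
        if len(parts) == 4:  # graph, subject, predicate and object all known
            quads.append((parts["g"], parts["s"], parts["p"], parts["o"]))
            del pending[node]
    return quads, peak

dataset = {
    "<u1>": [("<a>", "<b>", "1"), ("<a>", "<b>", "2")],
    "<u2>": [("<a>", "<c>", "3"), ("<a>", "<c>", "4")],
}
local = encode(dataset)
by_predicate = sorted(local, key=lambda t: t[1])

print(decode(local)[1])         # peak buffer with good locality
print(decode(by_predicate)[1])  # peak buffer when sorted by predicate
```

With the triples for each reified statement kept together, the buffer
never holds more than one entry; sorted by predicate, every reified
triple stays pending until the last group of statements arrives.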
 
>  It is brittle because it raises the question of what to do with incomplete reified triples. 

It seems to me this is just like with any other error or omission in
the inputs.  You can ignore it, ask a human for help, or whatever, as
appropriate to your situation.    The semantics are clear enough: some
graph has some triples, but you haven't been told exactly what they are.
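
As one possible way of handling this, a consumer could simply report
which reified triples are missing which parts and leave the decision to
the application.  This is a sketch under the same assumed tuple
representation as the eg:hasTriple examples; find_incomplete is a
hypothetical helper, not an existing API.

```python
SLOTS = {"rdf:subject", "rdf:predicate", "rdf:object"}

def find_incomplete(triples):
    """Map each reified-triple node to the set of rdf: parts it lacks."""
    seen = {}
    for s, p, o in triples:
        if p == "eg:hasTriple":
            seen.setdefault(o, set())  # node is known even if nothing else mentions it
        elif p in SLOTS:
            seen.setdefault(s, set()).add(p)
    return {node: SLOTS - have for node, have in seen.items() if have != SLOTS}

triples = [
    ("<u1>", "eg:hasTriple", "_:t1"),
    ("_:t1", "rdf:subject", "<a>"),
    ("_:t1", "rdf:predicate", "<b>"),
    ("_:t1", "rdf:object", "1"),
    ("<u1>", "eg:hasTriple", "_:t2"),
    ("_:t2", "rdf:subject", "<a>"),
    ("_:t2", "rdf:predicate", "<b>"),
    # rdf:object for _:t2 was never supplied
]
print(find_incomplete(triples))
```

Whether the caller then drops the node, logs it, or asks a human is a
policy choice outside the encoding itself.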

> Its verbosity explodes exponentially when one wants to say that Alice said that Bob said that Charlie said something.

I don't think so.   

How would you do this in TriG, which also doesn't support nesting?
Something like this:

<Alice> eg:said <AlicesClaim>.
<AlicesClaim> { <Bob> eg:said <BobsClaim> }
<BobsClaim> { <Charlie> eg:said <CharliesClaim> }

That structure translates exactly to what I'm describing above, with
just the constant-factor expansion.
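
The growth can be checked by counting triples as the chain of speakers
gets longer.  The sketch below builds the reified form of such a chain;
eg:said stands in for the "said" predicate, and the function name and
blank node labels are invented for the example.

```python
def nested_claims_reified(speakers):
    """Encode 'A said that B said that ...' in the eg:hasTriple form.

    The first statement is asserted directly; each later statement
    lives in the graph named by the previous speaker's claim.
    """
    triples = []
    for i, name in enumerate(speakers):
        stmt = (f"<{name}>", "eg:said", f"<{name}sClaim>")
        if i == 0:
            triples.append(stmt)
        else:
            graph = f"<{speakers[i - 1]}sClaim>"
            node = f"_:t{i}"  # hypothetical blank node label
            triples += [
                (graph, "eg:hasTriple", node),
                (node, "rdf:subject", stmt[0]),
                (node, "rdf:predicate", stmt[1]),
                (node, "rdf:object", stmt[2]),
            ]
    return triples

for depth in (3, 6, 12):
    chain = [f"Person{k}" for k in range(depth)]
    print(depth, len(nested_claims_reified(chain)))
```

Each extra level of "said" adds four triples, so a chain of n speakers
costs 4n - 3 triples against n statements in the TriG-like form: a
constant factor, not an exponential one.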

> > If you put that into N-Triples and sort it by predicate, performing the import is going to
> > require holding the entire structure in memory.  But a valid response might be, "don't do that".
> 
> "Don't do that" is not a practical response. The order of statements is not significant in RDF, and not maintained by many systems.

I disagree that it's not a practical response.   The point here is that
if you want to convey a multi-graph dataset in a format which allows it
to be parsed with constant memory and linear time, you will need to pay
attention to the order in which triples occur in that serialization.
And what you have to do is exactly what you have to do when serializing
Turtle to avoid extraneous node ids -- using [] and () whenever
possible.  It's not a new or special or obscure thing.

This approach has the feature that you *can* store your multi-graph
dataset in a single graph, but that doesn't mean you ever have to do
that.  Just as you parse TriG with a special parser and load the data
into a quad-store, if you're doing serious multi-graph work, you would
presumably keep using a special parsing system and special store, for
efficiency, even though with this approach you don't really have to.
 
    -- Sandro (who can't believe he's defending RDF reification...)

Received on Friday, 8 April 2011 18:04:40 UTC