Re: Bound and Unbound Datasets

On 06/03/2013 09:54 AM, Gregg Kellogg wrote:
> On Jun 3, 2013, at 8:34 AM, Sandro Hawke <sandro@w3.org> wrote:
>
>> It looks to me like we have two very different camps concerning datasets. ISSUE-131 has brought this to light again, but the camps long predate that issue. The division is between the people who have been using datasets with application-dependent semantics for a long time and the people who want to build things which require standard interoperable semantics for datasets. I'm in the latter camp, and was arguing for it for a long time, but I decided some months ago I could live without standard semantics via a very convoluted mechanism. I agreed to document that mechanism, but as I have contemplated doing so, I've been dragging my feet because it's pretty weird and I think the group won't like it. (Talking off-list to Pat about it yesterday, I think it's safe to say he hated it.)
>>
>> So I have an alternative proposal.  Let's have two kinds of datasets:
>>
>> * "Unbound" datasets are what's been in SPARQL and rdf-concepts so far.   According to the standard they are just structure, with no semantics.  In practice, their semantics are determined by the application in which they are used.
>>
>> * "Bound" datasets have the following semantics:
>>       (1) for the dataset to be true, the default graph must be true;
>>       (2) graph names denote the graphs they are paired with.
> I think many people have assumed that their applications could use this meaning. In JSON-LD, a named graph appears within an object which may have other properties asserted on the graph name. Certainly, from a pure semantics (unbound) perspective, there is no relationship, except that intuitively this is what people will expect (at least the 1% who end up using this feature). For the Payswarm use case, being able to sign a graph means asserting a signature on the graph name, which relates to the content of the graph itself. If you can't do this, then a big set of use cases becomes unavailable.
>
> Strictly speaking, a Bound dataset would be a subset of an Unbound dataset, AFAICT.
>
>> I suggest we indicate a dataset is bound by putting the magic triple { <> a rdf:BoundDataset } in its default graph. (This triple would be treated specially in the RDF semantics for any system which implements/recognizes bound datasets; to other systems, e.g. SPARQL, it's just another triple.) If a dataset does not have this flag, it's unbound. Of course, being unbound, it has application-specific semantics and so an application may choose to treat it as bound.
> My concern here is that if you mix bound and unbound datasets, you now either create a contradiction or all of a sudden, things which were previously unbound now seem to be bound.

Yeah, but if you merge datasets without paying attention to their 
intended semantics, you're going to get garbage anyway, given the 
variety of dataset semantics we know are out there.     So I'm thinking 
this wouldn't actually make things worse.

>   A more minimal mechanism could be to have a rdf:GraphIdentifier type that could be asserted in the default graph which would indicate that the subject does denote the graph which it is also used to name.

Yeah, that is an alternative.   It would still have to be magic, and I 
expect the semantics would be more awkward.  It looks like a class that 
you might infer, or something, but I suspect that would be very, very 
hard to specify and implement.

And my sense is that users will either want bound datasets or their own 
semantics.    If they want some kind of mix, well, that's just another 
kind of their own semantics.

> This still doesn't completely solve the issue, as dataset merging still creates a problem if the same name were coming from an Unbound dataset. It might actually be better to have an rdf:UnboundDataset class, which would then _infect_ a bound dataset with which it was merged, so that you would no longer be able to count on graph names denoting their graphs.

... except that we can't assume existing datasets, without either of 
these triples, are bound.   So I think we have to treat "old" datasets 
as unbound.    (Maybe my terminology isn't quite right. "unbound" should 
be "not-necessarily-bound".)
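
A minimal sketch of that default, assuming a default graph is modelled as a plain Python set of (subject, predicate, object) IRI strings and that the relative IRI <> has already been resolved to the dataset's own IRI. The rdf:BoundDataset term is only the one proposed in this thread, not an existing part of the RDF vocabulary, and is_bound is a hypothetical helper name:

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"
# Hypothetical term: proposed in this thread, not defined by any RDF spec.
RDF_BOUND_DATASET = RDF_NS + "BoundDataset"


def is_bound(default_graph, dataset_iri):
    """Return True only if the default graph carries the flag triple
    (dataset_iri, rdf:type, rdf:BoundDataset).

    A dataset without the flag is reported as not bound, which here
    means "not-necessarily-bound": the application may still choose
    to apply bound semantics on its own.
    """
    return (dataset_iri, RDF_TYPE, RDF_BOUND_DATASET) in default_graph


# An "old" dataset: ordinary triples, no flag.
old_default = {
    ("http://example.org/a", "http://example.org/p", "http://example.org/b"),
}

# The same default graph with the flag triple added for its own dataset IRI.
flagged_default = old_default | {
    ("http://example.org/ds", RDF_TYPE, RDF_BOUND_DATASET),
}
```

Note that the check is tied to the dataset's own IRI, so a flag triple copied in from some other dataset during a merge would not, under this sketch, mark the merged result as bound.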

> For practical purposes, I think we really need a concept of a bound dataset, or at least a bound graph within a dataset.

Yep, that's my conclusion.

        -- Sandro

> Gregg
>
>> I think this would solve a lot of problems, and not raise too many. I expect many of the folks who wanted us to standardize named graphs, fix reification, etc., when this group was chartered, would much prefer having this option to having only the half-solution that's in our specs now.
>>
>>       -- Sandro
>>

Received on Monday, 3 June 2013 22:26:15 UTC