Re: Bound and Unbound Datasets

On Jun 3, 2013, at 8:34 AM, Sandro Hawke <sandro@w3.org> wrote:

> It looks to me like we have two very different camps concerning datasets. ISSUE-131 has brought this to light again, but the camps long predate that issue. The division is between the people who have been using datasets with application-dependent semantics for a long time and the people who want to build things which require standard interoperable semantics for datasets. I'm in the latter camp, and was arguing for it for a long time, but I decided some months ago I could live without standard semantics via a very convoluted mechanism. I agreed to document that mechanism, but as I have contemplated doing so, I've been dragging my feet because it's pretty weird and I think the group won't like it. (Talking off-list to Pat about it yesterday, I think it's safe to say he hated it.)
> 
> So I have an alternative proposal.  Let's have two kinds of datasets:
> 
> * "Unbound" datasets are what's been in SPARQL and rdf-concepts so far.   According to the standard they are just structure, with no semantics.  In practice, their semantics are determined by the application in which they are used.
> 
> * "Bound" datasets have the following semantics:
>      (1) for the dataset to be true, the default graph must be true;
>      (2) graph names denote the graphs they are paired with.

I think many people have assumed that their applications could use this meaning. In JSON-LD, a named graph appears within an object which may have other properties asserted on the graph name. Certainly, from a pure semantics (unbound) perspective, there is no relationship between the name and the graph, but intuitively this is what people will expect (at least the 1% who end up using this feature). For the Payswarm use case, being able to sign a graph means asserting a signature on the graph name, which relates to the content of the graph itself. If you can't do this, then a big set of use cases becomes unavailable.

Strictly speaking, a Bound dataset would be a subset of an Unbound dataset, AFAICT.

> I suggest we indicate a dataset is bound by putting the magic triple { <> a rdf:BoundDataset } in its default graph. (This triple would be treated specially in the RDF semantics for any system which implements/recognizes bound datasets; to other systems (e.g. SPARQL) it's just another triple.) If a dataset does not have this flag, it's unbound. Of course, being unbound, it has application-specific semantics and so an application may choose to treat it as bound.

My concern here is that if you mix bound and unbound datasets, you either create a contradiction or, all of a sudden, things which were previously unbound now seem to be bound. A more minimal mechanism could be to have an rdf:GraphIdentifier type that could be asserted in the default graph, which would indicate that the subject does denote the graph it is also used to name.
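
To make the per-graph idea concrete, here is a rough Python sketch of how a consumer might work out which graph names are declared to denote their graphs. Everything in it is an assumption for illustration: rdf:GraphIdentifier is only a term floated in this thread (it is not in the RDF vocabulary), the dataset is modeled as plain sets of string triples, and the function name and example.org IRIs are made up.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
# Hypothetical term from this thread; not part of the RDF vocabulary.
GRAPH_IDENTIFIER = RDF + "GraphIdentifier"


def bound_graph_names(default_graph: Set[Triple],
                      named_graphs: Dict[str, Set[Triple]]) -> Set[str]:
    """Return the graph names that the default graph types as rdf:GraphIdentifier."""
    declared = {s for (s, p, o) in default_graph
                if p == RDF_TYPE and o == GRAPH_IDENTIFIER}
    # Only names that are actually paired with a graph in this dataset count.
    return declared & set(named_graphs)


# Illustrative data: g1 is declared as a graph identifier and carries a
# signature (the Payswarm-style use case); g2 carries no declaration.
default_graph = {
    ("http://example.org/g1", RDF_TYPE, GRAPH_IDENTIFIER),
    ("http://example.org/g1", "http://example.org/signature", '"abc123"'),
}
named_graphs = {
    "http://example.org/g1": {
        ("http://example.org/a", "http://example.org/p", "http://example.org/b"),
    },
    "http://example.org/g2": {
        ("http://example.org/c", "http://example.org/p", "http://example.org/d"),
    },
}

bound = bound_graph_names(default_graph, named_graphs)
```

With this data only g1 comes back as bound, so the signature on g1 could be read as a statement about that graph's content, while g2 keeps whatever application-specific meaning it had.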

This still doesn't completely solve the issue, as dataset merging still creates a problem if the same name were coming from an Unbound dataset. It might actually be better to have an rdf:UnboundDataset class, which would then _infect_ a bound dataset with which it was merged, so that you would no longer be able to count on graph names denoting their graphs.
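
A rough sketch of what that "infection" rule could look like, again in Python and again under assumptions: rdf:UnboundDataset is only the class suggested above, the Dataset type and merge function are hypothetical, "<>" is modeled as an explicit dataset IRI, and blank-node renaming during merge is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
# Hypothetical class from this thread; not part of the RDF vocabulary.
UNBOUND_DATASET = RDF + "UnboundDataset"


@dataclass
class Dataset:
    iri: str  # stands in for "<>", the dataset's own IRI (an assumption)
    default_graph: Set[Triple] = field(default_factory=set)
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)

    def flag(self) -> Triple:
        return (self.iri, RDF_TYPE, UNBOUND_DATASET)

    def is_unbound(self) -> bool:
        return self.flag() in self.default_graph


def merge(a: Dataset, b: Dataset, iri: str) -> Dataset:
    """Union two datasets; the result is unbound if either input is unbound."""
    merged = Dataset(iri)
    for src in (a, b):
        # Copy default-graph triples, dropping the source's own flag triple.
        merged.default_graph |= {t for t in src.default_graph if t != src.flag()}
        # Graphs that share a name are unioned under that name.
        for name, graph in src.named_graphs.items():
            merged.named_graphs.setdefault(name, set()).update(graph)
    if a.is_unbound() or b.is_unbound():
        # The unbound flag "infects" the merged dataset.
        merged.default_graph.add(merged.flag())
    return merged


G1 = "http://example.org/g1"
bound_ds = Dataset("http://example.org/ds1",
                   named_graphs={G1: {("urn:a", "urn:p", "urn:b")}})
unbound_ds = Dataset("http://example.org/ds2",
                     default_graph={("http://example.org/ds2", RDF_TYPE, UNBOUND_DATASET)},
                     named_graphs={G1: {("urn:c", "urn:p", "urn:d")}})

result = merge(bound_ds, unbound_ds, "http://example.org/merged")
```

Here both inputs use the name g1, so after the merge that name is paired with the union of the two graphs, and the result is flagged unbound, meaning a consumer could no longer count on g1 denoting its graph.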

For practical purposes, I think we really need a concept of a bound dataset, or at least a bound graph within a dataset.

Gregg

> I think this would solve a lot of problems, and not raise too many.     I expect many of the folks who wanted us to standardize named graphs, fix reification, etc, when this group was chartered, would much prefer having this option to having only the half-solution that's in our specs now.
> 
>      -- Sandro

Received on Monday, 3 June 2013 16:55:10 UTC