Re: ISSUE-41 - NULL handling - Proposal

David,

On 17 May 2011, at 20:46, David McNeil wrote:

> I propose that we resolve ISSUE-41 (for R2RML) as follows:
> 
> 1) by default R2RML will suppress triples when the subject, predicate, or object columns are NULL (this applies to any of the columns used in template expressions as well as direct column references)
> 
> 2) if the application needs other handling for NULL values then a SQLQuery can be defined in the mapping to convert NULL values to some other application specific value
> 
> 3) if there is consensus that a more convenient mechanism than a SQLQuery is required then we can add properties to be used in subject, predicate, or object maps that specify the value to use for NULLs. Something like:
> 
>   rr:nullValue "foo"
> 
>   This would mean to use the value "foo" anytime a NULL column value is encountered in the context of this subject, predicate, or object map.
> 
> 4) define a value to indicate that a unique blank node should be generated for each NULL value. Something like:
> 
>    rr:nullValue rr:blank
> 
>    This means that a different blank node should be generated for every NULL column encountered in the context of this subject, predicate or object map. Richard's email [1] points out that SQL NULL is handled differently in the case of aggregates. It seems like this is an important point to take into consideration if the mapper decides to generate blank nodes for NULLs because it seems to me this would make SPARQL aggregate results differ from their SQL equivalents.

+1 on 1 and 2.
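
To make 1 and 2 concrete, here is a rough sketch of how they interact. It is not R2RML syntax or any real mapper: expand_template and map_rows are hypothetical helpers, the emp table and the example.com IRIs are invented for illustration, and SQLite stands in for the source database. A NULL in any referenced column (including one used in a template) suppresses the triple, and a query with COALESCE is what the application would write if it wanted a stand-in value instead.

```python
import re
import sqlite3


def expand_template(template, row):
    """Fill {column} placeholders; return None if any referenced column is NULL."""
    values = {}
    for col in re.findall(r"\{(\w+)\}", template):
        if row[col] is None:
            return None
        values[col] = row[col]
    return template.format(**values)


def map_rows(rows, subject_template, predicate, object_column):
    """Hypothetical mapper: emit one triple per row, skipping rows with NULLs."""
    triples = []
    for row in rows:
        subject = expand_template(subject_template, row)
        obj = row[object_column]
        if subject is None or obj is None:
            continue  # point 1: suppress the triple
        triples.append((subject, predicate, str(obj)))
    return triples


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "sales"), (2, None)])

SUBJECT = "http://example.com/emp/{id}"

# Point 1: map the table as-is; the row with a NULL dept yields no triple.
default_rows = conn.execute("SELECT id, dept FROM emp ORDER BY id").fetchall()
default_triples = map_rows(default_rows, SUBJECT, "ex:dept", "dept")

# Point 2: the mapping author supplies a query that replaces NULL first.
coalesced_rows = conn.execute(
    "SELECT id, COALESCE(dept, 'unknown') AS dept FROM emp ORDER BY id"
).fetchall()
coalesced_triples = map_rows(coalesced_rows, SUBJECT, "ex:dept", "dept")

print(default_triples)
print(coalesced_triples)
```

With the default behaviour only employee 1 gets an ex:dept triple; with the COALESCE query employee 2 gets one too, with the object "unknown".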

I'd argue quite strongly against doing 3 or 4. Neither adds any power beyond what we already get with 2, and neither is worth the effort for such a rare modelling situation. We don't need them.
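
The aggregate concern you raise under 4 can be illustrated with a small sketch. It makes several assumptions: the emp table is invented, SQLite stands in for the source database, the "_:nullN" labels are made-up blank node identifiers, and the SPARQL side is only modelled as a count of the object values that would exist in the graph, not run through a real SPARQL engine.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)", [(1, "sales"), (2, None), (3, None)]
)

# SQL: COUNT(column) ignores NULLs.
sql_count = conn.execute("SELECT COUNT(dept) FROM emp").fetchone()[0]

depts = [dept for (dept,) in conn.execute("SELECT dept FROM emp ORDER BY id")]

# Default behaviour (point 1): NULLs produce no triple, so nothing to count.
suppressed_objects = [d for d in depts if d is not None]

# Point 4: every NULL becomes its own fresh blank node (hypothetical labels).
fresh_ids = itertools.count()
blank_node_objects = [
    d if d is not None else f"_:null{next(fresh_ids)}" for d in depts
]

print(sql_count, len(suppressed_objects), len(blank_node_objects))
```

Under these assumptions a count over the suppressed mapping agrees with SQL's COUNT(dept), while a count over the blank node mapping also includes the two generated nodes.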

I'd be quite happy if we could resolve this quickly for R2RML. For the direct mapping it might be more difficult. The issues at stake are quite different. The direct mapping requires the “most correct” solution. R2RML doesn't need to always do the “most correct” thing out of the box, because one can always change the behaviour, so we should aim for the “most convenient” solution.

@Chairs: Can we perhaps break ISSUE-41 into two separate issues, one for R2RML and one for Direct Mapping, with the aim of quickly resolving the former?

Best,
Richard




> 
> Personally I would just do #1 & 2. I would omit #3 and #4 from the spec at this time in the interest of getting an initial version published. But if it was necessary to gain consensus then I think I could back 3 & 4 as well.
> 
> -David
> 
> [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0053.html

Received on Tuesday, 17 May 2011 21:46:56 UTC