RDB2RDF Working Group Teleconference -- 24 May 2011

<trackbot> Date: 24 May 2011

mhausenblas, when will you join us?

<Ashok> scribe: cygri

Admin

<Ashok> scribenick: cygri

PROPOSAL: Accept the minutes of last meeting http://www.w3.org/2011/05/17-rdb2rdf-minutes.html

(no objections)

RESOLUTION: Accept the minutes of last meeting http://www.w3.org/2011/05/17-rdb2rdf-minutes.html

Dealing with RDB NULL values (ISSUE-41 and ISSUE-42)

ISSUE-41?

<trackbot> ISSUE-41 -- Define how rr:column, rr:template, etc. handle NULL column values -- open

<trackbot> http://www.w3.org/2001/sw/rdb2rdf/track/issues/41

ISSUE-42?

<trackbot> ISSUE-42 -- How is the direct mapping suppose to handle NULL values wrt Information Preserving -- open

<trackbot> http://www.w3.org/2001/sw/rdb2rdf/track/issues/42

Ashok: are we ready to decide on these or should we discuss some more?
... I think at least on ISSUE-42 there was a concise proposal from david
... ISSUE-41 might be more complicated

cygri: proposal from david was here: http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML

1) by default R2RML will suppress triples when the subject, predicate, or object columns are NULL (this applies to any of the columns used in template expressions as well as direct column references)

2) if the application needs other handling for NULL values then a SQLQuery can be defined in the mapping to convert NULL values to some other application specific value
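
A minimal sketch of the suppression rule in point 1, assuming a row is a Python dict in which `None` stands for SQL NULL. The helper names `expand_template` and `row_to_triples` and the example.com IRIs are hypothetical, and the IRI-safe encoding a real R2RML processor would apply to template values is left out.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template: str, row: dict) -> Optional[str]:
    """Expand {column} placeholders; return None if any referenced column is NULL."""
    columns = PLACEHOLDER.findall(template)
    if any(row.get(col) is None for col in columns):
        return None
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def row_to_triples(row: dict, subject_template: str, predicate_object_columns):
    """Emit one triple per (predicate, column) pair, skipping NULL subjects and objects."""
    subject = expand_template(subject_template, row)
    if subject is None:
        return []  # a NULL in the subject template suppresses every triple for the row
    triples = []
    for predicate, column in predicate_object_columns:
        value = row.get(column)
        if value is None:
            continue  # NULL object column: no triple is generated
        triples.append((subject, predicate, value))
    return triples
```

With a row such as `{"id": 7, "name": "Smith", "salary": None}` the sketch yields only the name triple; the salary triple is simply absent.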

<ericP> minor wording nit: i think you can s/by default//

<ericP> that is, r2rml will never encode a NULL value in an RDF graph

(discussion on R2RML vs direct mapping)

<Souri> select name, NVL(salary,0) from employees;
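
A small sketch of the same idea as Souri's query, run against an in-memory SQLite database. `NVL` is Oracle-specific, so the sketch swaps in the standard `COALESCE`; the function name `salaries_with_default` and the sample rows are assumptions for illustration only.

```python
import sqlite3


def salaries_with_default(rows, default=0):
    """Load (name, salary) rows and return them with NULL salaries replaced by `default`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    # COALESCE returns its first non-NULL argument, like NVL with two arguments.
    result = conn.execute(
        "SELECT name, COALESCE(salary, ?) FROM employees ORDER BY name",
        (default,),
    ).fetchall()
    conn.close()
    return result
```

A mapping author could place a query of this shape in the mapping so that the NULL never reaches the triple generator.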

EFranconi: does R2RML have schema information?

Ashok: it's not well spelled out. i argue yes because it's in the queries

ivan: whoever writes R2RML for a specific DB has to know about the schema of that DB
... so in this case, the author of the mapping is in total control

Ashok: I agree

Souri: the mapping author knows about the schema, and specifies how to map it to RDF

<EFranconi> So, is this a sort of a programming language?

<EFranconi> got it

Ashok: so question comes up, does the translation spell out an RDF schema? i'd say yes

<Souri> +1 to R2RML author is in full control (knows the DB schema and the target RDF schema)

EFranconi: so we can use R2RML to generate a constant for the nulls?

Ashok: yes
... you can do anything you like

EFranconi: that's fine with me

<EFranconi> +1

EFranconi: I'm ok with that as resolution to ISSUE-41

<Souri> +1

<dmcneil> +q

<ericP> IS NULL

dmcneil: is there something in the SQL standard to test for null?

<ericP> e.g. WHERE foo.bar IS NULL

ericP: yes, IS NULL
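
A short sketch of why the test has to be `IS NULL` rather than `= NULL`, using an in-memory SQLite table named after ericP's `foo.bar` example. The function name `null_test_counts` is hypothetical.

```python
import sqlite3


def null_test_counts(values):
    """Return (rows matched by IS NULL, rows matched by = NULL) for the given values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (bar INTEGER)")
    conn.executemany("INSERT INTO foo VALUES (?)", [(v,) for v in values])
    is_null = conn.execute("SELECT COUNT(*) FROM foo WHERE bar IS NULL").fetchone()[0]
    # A comparison with NULL evaluates to NULL, which WHERE treats as not true.
    equals_null = conn.execute("SELECT COUNT(*) FROM foo WHERE bar = NULL").fetchone()[0]
    conn.close()
    return is_null, equals_null
```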

<Souri> what about => rr:graphTemplate "http://example.com/graph/{job}/{etype}"

<Seema> +1 to the proposed resolution of Issue 41

PROPOSAL: resolve ISSUE-41 per dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML

Souri: if a graph template is null, where does the triple go to?

cygri: let's treat it as a quad, so if graph goes null then we don't create a triple at all

dmcneil: that's my first reaction too

<Souri> If one or more of the columns used in a rr:graphTemplate, then corresponding triples will not be generated either.

<Souri> If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.

PROPOSAL: resolve ISSUE-41 per dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML

<Ashok> as extended by Souri : If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.

<Souri> +1

<ivan> +1

<nunolopes> +1

<boris> +1

<dmcneil> +1

<alexdeleon> +1

<Marcelo> +1

<EFranconi> +1

RESOLUTION: resolve ISSUE-41 per dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML incl. Souri's extension
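
A minimal sketch of the resolution including Souri's extension, treating each generated statement as a quad so that a NULL in any graph template column suppresses it. Rows are assumed to be dicts with `None` for NULL; the helper names, the subject template and the predicate IRI are hypothetical, and only the graph template string comes from the discussion.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand(template, row):
    """Return the expanded template, or None if any referenced column is NULL."""
    if any(row.get(col) is None for col in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def row_to_quad(row, graph_template, subject_template, predicate, object_column):
    """Build (graph, subject, predicate, object), or None if any component is NULL."""
    graph = expand(graph_template, row)
    subject = expand(subject_template, row)
    obj = row.get(object_column)
    if graph is None or subject is None or obj is None:
        return None  # no quad, hence no triple in any graph
    return (graph, subject, predicate, obj)
```

For example, a row whose `etype` column is NULL produces nothing for the template `http://example.com/graph/{job}/{etype}`, even though its subject and object are both present.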

ISSUE-42, NULL in direct mapping

<privera> +1

Ashok: why is 42 different?

ericP: because no opportunity to inject a SQL query to modify the mapping
... although you can create a view
... but this null-handling view is not in some external configuration

Ashok: from email my impression is that most of the WG said, don't generate a triple
... then there was back and forth ... i'm not sure where enrico stands now

EFranconi: the semantics for null is defined in sql
... my goal is that this WG generates something that respects the semantics of queries
... the WG proposed to drop nulls without showing how to preserve query semantics
... i'm happy if the WG can show how to preserve semantics
... i think it's easier to show this if the nulls are expressed as constants in the RDF
... easier or more convincing

Ashok: to make these special constants work, we'd have to change SPARQL, right?
... and we can't do that

<Marcelo> +q

Ashok: that's why we're a bit stuck
... so might be better not to generate triples

EFranconi: i think that's not the point
... no matter how we map them, naive queries will not preserve the semantics of nulls
... in all cases, we will have to prescribe a way how to write queries to keep the null semantics and get the right answers
... the RDF graph will never be correct with respect to null

ericP: we have two different query languages here
... so we can't just copy and paste
... have to look at user expectations
... querying for nulls in SQL is in two cases especially
... one, i query for missing information

<dmcneil> -q

<Zakim> ericP, you wanted to ask for use cases against which we can measure the preservation of semantics

<EFranconi> +q

ericP: i'd like to have test queries that we can use to measure if we keep semantics
... enrico, do you have specific use cases for queries that one can do in SQL but not in SPARQL

Marcelo: we talked a lot about null semantics in sql

... but that's not well defined

<EFranconi> Sorry my phone crashed

Marcelo: so it's hard to talk about null semantics if the semantics of null is not well defined
... we have to agree on the null semantics

<EFranconi> I'm back

cygri: it's hard to show correctness if null semantics is not well defined. so we might have to lower our expectations w.r.t. showing correctness of the mapping

<EFranconi> The semantics of NULLs in SQL is *well* defined

EFranconi: semantics of null in SQL is well defined in the spec
... what's not known is the model-theoretic semantics
... we know the behaviour

<Marcelo> +q

EFranconi: (scribe fails to keep up)
... behaviour can be reproduced in SPARQL up to a certain expressivity
... the null-ignoring mapping makes the sql-to-sparql translation of some simple queries quite hard
... bottom line: by imitating what sql does, we can re-construct the behaviour up to a certain expressivity
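
One SQL behaviour of the kind under discussion, offered as an illustration and not necessarily the case EFranconi had in mind: under three-valued logic, a row with a NULL salary matches neither `salary = x` nor `salary <> x`. The sketch below uses an in-memory SQLite table; the function name `partition_counts` and the sample data are assumptions.

```python
import sqlite3


def partition_counts(salaries, threshold):
    """Return (rows equal to threshold, rows not equal to threshold, total rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(s,) for s in salaries])

    def count(where, *params):
        sql = f"SELECT COUNT(*) FROM employees WHERE {where}"
        return conn.execute(sql, params).fetchone()[0]

    equal = count("salary = ?", threshold)
    not_equal = count("salary <> ?", threshold)  # NULL rows are not counted here either
    total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
    conn.close()
    return equal, not_equal, total
```

When the two counts do not add up to the total, the difference is the NULL rows; a translated SPARQL query over a graph that omits those triples has to reproduce that gap deliberately.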

Marcelo: i don't agree that the semantics is well-defined
... proving the correctness requires well-defined semantics
... hard e.g. with aggregates
... translation will be very hard with sparql 1.1

<Ashok> cygri: Two different notions of what is a correct translation

<Ashok> ... query answering notion of correctness

<Ashok> ... another way of defining correctness based on the meaning of a graph

<Ashok> ... model theory and semantics ... model theory not well defined wrt NULLs

<Ashok> cygri: Ignoring model-theoretic approach is not good

Souri: from practical point of view ... let's consider a db with some sparse tables

<Ashok> ... we have concrete test cases on using RDF triples in OWL and RDF

Souri: if we can capture the schema, and the present data, but leave out the null triples, then we can still reconstruct the old database

<dmcneil> +q

Souri: and if we can guarantee that, then we will be able to translate queries
... so if the direct mapping always maps the schema too, and skips nulls, then we should be ok

<Ashok> Souri: DM should always generate the schema

<Ashok> ... then we know where the missing values are

EFranconi: i maintain that nulls are well-defined, by their behaviour
... but i agree we want to avoid complexity
... so that's why i would limit expressivity

<Souri> My proposal: maintain data equivalence (allowing converting either way, without loss of info) => this can be done by DM 1) always generating schema triples and 2) skipping generation of triples for NULL values
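
A rough sketch of the data-equivalence argument, assuming a single table with a non-NULL key column. The triple vocabulary is simplified to plain strings, and `direct_map` and `reconstruct` are hypothetical helpers, not the direct mapping as specified.

```python
def direct_map(table, columns, key, rows):
    """Emit schema triples for every column, then data triples with NULLs skipped."""
    triples = [(f"{table}#{col}", "rdfs:domain", table) for col in columns]
    for row in rows:
        subject = f"{table}/{row[key]}"
        triples.append((subject, "rdf:type", table))
        for col in columns:
            if row[col] is not None:
                triples.append((subject, f"{table}#{col}", row[col]))
    return triples


def reconstruct(table, triples):
    """Rebuild the rows; a column with no triple for a subject is read back as NULL."""
    columns = [s.split("#", 1)[1] for s, p, o in triples
               if p == "rdfs:domain" and o == table]
    subjects = [s for s, p, o in triples if p == "rdf:type" and o == table]
    rows = []
    for subject in subjects:
        values = {p: o for s, p, o in triples if s == subject}
        rows.append({col: values.get(f"{table}#{col}") for col in columns})
    return rows
```

Because the schema triples list every column, the missing data triples mark exactly where the NULLs were, so mapping and reconstructing returns the original rows.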

EFranconi: Souri is right, if we have schema we still have the same information
... the information is there, so we can define correct queries
... but it will be more complex, and i still have to see a concrete proposal
... and is the result compositional?

<Souri> ?x rdf:type :EMP . ?p rdfs:range :EMP . OPTIONAL (?x ?p ?val)

<Souri> correction:=> ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL (?x ?p ?val)

<Souri> correction again (syntax) => ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL { ?x ?p ?val }
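
A plain-Python emulation of how the corrected pattern would evaluate over a list of triples, with `None` standing for an unbound `?val`. The function name `optional_bindings` and the sample triples are assumptions; this imitates the pattern's result and is not a SPARQL engine.

```python
def optional_bindings(triples, cls):
    """Return (x, p, val) for every instance x and property p of cls; val may be None."""
    instances = [s for s, p, o in triples if p == "rdf:type" and o == cls]
    properties = [s for s, p, o in triples if p == "rdfs:domain" and o == cls]
    bindings = []
    for x in instances:
        for prop in properties:
            matches = [o for s, p, o in triples if s == x and p == prop]
            # OPTIONAL keeps the (x, p) pair even when no value triple exists.
            for val in matches or [None]:
                bindings.append((x, prop, val))
    return bindings
```

An unbound `?val` in the result then plays the role that NULL played in the original table.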

dmcneil: if we had distinct null values for each property, would that solve the RIF/OWL problems?

Ashok: we're out of time
... who wants to take this further?

cygri: i'd like to know what spec changes this all implies

adjourned

<ericP> http://128.31.35.171/2001/sw/rdb2rdf/r2rml/

<Ashok> meeting: RDB2RDF


<Ashok> regrets: Alexandre
