Meeting: RDB2RDF Working Group Teleconference
15:44:50 [trackbot]
Date: 24 May 2011
+ +49.133.6.aaaa
mhausenblas, when will you join us?
scribe: cygri
16:08:03 [privera]
present+ Percy
16:08:29 [cygri]
Topic: Admin
16:08:34 [Ashok]
scribenick: cygri
16:08:49 [cygri]
PROPOSAL: Accept the minutes of last meeting
16:09:15 [cygri]
(no objections)
16:09:23 [cygri]
RESOLUTION: Accept the minutes of last meeting
16:09:43 [cygri]
Topic: Dealing with RDB NULL values (ISSUE-41 and ISSUE-42)
16:09:49 [trackbot]
ISSUE-41 -- Define how rr:column, rr:template, etc. handle NULL column values -- open
16:09:52 [trackbot]
ISSUE-42 -- How is the direct mapping suppose to handle NULL values wrt Information Preserving -- open
16:10:11 [cygri]
Ashok: are we ready to decide on these or should we discuss some more?
16:10:38 [EFranconi]
EFranconi has joined #RDB2RDF
16:10:44 [cygri]
... I think at least on ISSUE-42 there was a concise proposal from david
16:10:50 [cygri]
... ISSUE-41 might be more complicated
16:11:49 [cygri]
cygri: proposal from david was here:
16:11:56 [Marcelo]
Marcelo has joined #rdb2rdf
16:12:17 [cygri]
1) by default R2RML will suppress triples when the subject, predicate, or object columns are NULL (this applies to any of the columns used in template expressions as well as direct column references)
16:12:24 [cygri]
2) if the application needs other handling for NULL values then a SQLQuery can be defined in the mapping to convert NULL values to some other application specific value
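Point 1 of the proposal can be sketched as follows. This is a minimal Python illustration, not R2RML's normative algorithm: the names `expand_template` and `generate_triple`, the example IRIs, and the column names are all hypothetical, a missing or `None` column value stands in for SQL NULL, and template escaping is ignored.

```python
import re
from typing import Dict, Optional, Tuple

# Matches {column} placeholders; template escaping is deliberately ignored here.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template: str, row: Dict[str, object]) -> Optional[str]:
    """Fill {column} placeholders from row; return None if any referenced column is NULL."""
    if any(row.get(col) is None for col in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def generate_triple(row: Dict[str, object], subject_template: str,
                    predicate_template: str,
                    object_column: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: build one triple, or suppress it if any term depends on a NULL."""
    subject = expand_template(subject_template, row)
    predicate = expand_template(predicate_template, row)  # a constant IRI has no placeholders
    value = row.get(object_column)
    if subject is None or predicate is None or value is None:
        return None  # suppressed, as point 1 of the proposal requires
    return (subject, predicate, str(value))


rows = [
    {"empno": 7369, "ename": "SMITH", "salary": 800},
    {"empno": 7499, "ename": "ALLEN", "salary": None},  # NULL salary
]
for row in rows:
    print(generate_triple(row, "http://example.com/emp/{empno}",
                          "http://example.com/ns#salary", "salary"))
```

For the sample rows, the first call returns a salary triple for employee 7369 and the second returns None, because ALLEN's salary is NULL.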
16:13:08 [ericP]
minor wording nit: i think you can s/by default//
16:18:35 [cygri]
(discussion on R2RML vs direct mapping)
16:19:09 [Souri]
select name, NVL(salary,0) from employees;
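Souri's query illustrates point 2: rewrite the query so NULLs become an application-specific value before mapping. A minimal sketch using Python's built-in sqlite3 module; SQLite has no NVL, so it uses the standard SQL COALESCE, which does the same job for two arguments. The table and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("SMITH", 800), ("ALLEN", None)])  # None is stored as SQL NULL

# Mapping the table directly: ALLEN's NULL salary would suppress that triple.
raw = conn.execute(
    "SELECT name, salary FROM employees ORDER BY name").fetchall()

# Query rewritten in the style of Souri's example; COALESCE stands in for NVL.
patched = conn.execute(
    "SELECT name, COALESCE(salary, 0) FROM employees ORDER BY name").fetchall()

print(raw)      # [('ALLEN', None), ('SMITH', 800)]
print(patched)  # [('ALLEN', 0), ('SMITH', 800)]
```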
16:19:48 [cygri]
EFranconi: does R2RML have schema information?
16:20:00 [cygri]
Ashok: it's not well spelled out. i argue yes because it's in the queries
16:20:40 [cygri]
ivan: whoever writes R2RML for a specific DB has to know about the schema of that DB
16:21:10 [ivan]
ack iv_an_ru__
16:21:10 [cygri]
... so in this case, the author of the mapping is in total control
16:21:13 [ivan]
ack ivan
16:21:20 [cygri]
Ashok: I agree
16:21:45 [cygri]
Souri: the mapping author knows about the schema, and specifies how to map it to RDF
16:21:47 [EFranconi]
So, is this a sort of a programming language?
16:22:09 [EFranconi]
got it
16:23:04 [cygri]
Ashok: so question comes up, does the translation spell out an RDF schema? i'd say yes
16:24:26 [Souri]
+1 to R2RML author is in full control (knows the DB schema and the target RDF schema)
16:25:00 [cygri]
EFranconi: so we can use R2RML to generate a constant for the nulls?
16:25:02 [cygri]
Ashok: yes
16:25:23 [cygri]
... you can do anything you like
16:25:24 [cygri]
EFranconi: that's fine with me
16:26:06 [cygri]
... I'm ok with that as resolution to ISSUE-41
16:28:14 [cygri]
dmcneil: is there something in the SQL standard to test for null?
16:28:19 [cygri]
ericP: yes, IS NULL
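A small sqlite3 sketch (hypothetical table and data) of why IS NULL is needed: a comparison with NULL evaluates to unknown rather than true, so `WHERE salary = NULL` matches no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("SMITH", 800), ("ALLEN", None)])

# "= NULL" is never true: the comparison evaluates to unknown for every row.
eq_null = conn.execute(
    "SELECT name FROM employees WHERE salary = NULL").fetchall()

# IS NULL is the predicate SQL provides for this test.
is_null = conn.execute(
    "SELECT name FROM employees WHERE salary IS NULL").fetchall()

# Comparing NULL with NULL yields NULL (None in Python), not true.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(eq_null, is_null, null_eq_null)  # [] [('ALLEN',)] None
```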
16:28:22 [Souri]
what about => rr:graphTemplate "{job}/{etype}"
16:28:25 [Seema]
+1 to the proposed resolution of Issue 41
16:28:54 [cygri]
PROPOSAL: resolve ISSUE-41 per dmcneil's proposal in
16:30:55 [cygri]
cygri: let's treat it as a quad, so if graph goes null then we don't create a triple at all
16:31:08 [cygri]
dmcneil: that's my first reaction too
If one or more of the columns used in a rr:graphTemplate, then corresponding triples will not be generated either.
16:32:28 [Souri]
If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.
16:33:00 [cygri]
PROPOSAL: resolve ISSUE-41 per member:dmcneil's proposal in
16:33:51 [Ashok]
as extended by Souri : If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.
16:34:08 [Souri]
16:34:09 [ivan]
16:34:10 [cygri]
16:34:12 [nunolopes]
16:34:14 [boris]
16:34:14 [dmcneil]
16:34:17 [alexdeleon]
16:34:19 [Marcelo]
16:34:24 [EFranconi]
16:34:41 [cygri]
RESOLUTION: resolve ISSUE-41 per dmcneil's proposal in incl. Souri's extension
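The resolution, including Souri's extension for rr:graphTemplate, amounts to treating each generated statement as a quad, as cygri suggested. A minimal sketch under assumed names (`expand_template`, `generate_quad`, and made-up rows using the columns of Souri's "{job}/{etype}" example); it illustrates the rule and is not spec text.

```python
import re
from typing import Dict, Optional, Tuple

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")  # {column}; escaping ignored


def expand_template(template: str, row: Dict[str, object]) -> Optional[str]:
    """Fill placeholders from row, or return None if any referenced column is NULL."""
    if any(row.get(col) is None for col in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def generate_quad(row: Dict[str, object], subject_template: str, predicate: str,
                  object_column: str,
                  graph_template: str) -> Optional[Tuple[str, str, str, str]]:
    """Hypothetical helper: a NULL in any position, graph included, drops the statement."""
    subject = expand_template(subject_template, row)
    graph = expand_template(graph_template, row)
    value = row.get(object_column)
    if subject is None or graph is None or value is None:
        return None  # not demoted to the default graph; no triple at all
    return (subject, predicate, str(value), graph)


row_ok = {"empno": 7369, "ename": "SMITH", "job": "CLERK", "etype": "FT"}
row_null_graph = {"empno": 7499, "ename": "ALLEN", "job": "CLERK", "etype": None}
print(generate_quad(row_ok, "emp/{empno}", "ns#ename", "ename", "{job}/{etype}"))
print(generate_quad(row_null_graph, "emp/{empno}", "ns#ename", "ename", "{job}/{etype}"))
```

The second row yields None even though its subject and object are non-NULL, because the graph template references the NULL etype column.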
16:34:55 [cygri]
Topic: ISSUE-42, NULL in direct mapping
16:35:10 [cygri]
Ashok: why is 42 different?
16:35:26 [cygri]
ericP: because no opportunity to inject a SQL query to modify the mapping
16:35:34 [cygri]
... although you can create a view
16:35:59 [cygri]
... but this null-handling view is not in some external configuration
16:36:56 [cygri]
Ashok: from email my impression is that most of the WG said, don't generate a triple
16:37:12 [cygri]
... then there was back and forth ... i'm not sure where enrico stands now
16:37:30 [cygri]
EFranconi: the semantics for null is defined in sql
16:38:09 [cygri]
... my goal is that this WG generates something that respects the semantics of queries
16:38:22 [cygri]
... the WG proposed to drop nulls without showing how to preserve query semantics
16:38:41 [cygri]
... i'm happy if the WG can show how to preserve semantics
16:39:00 [cygri]
... i think it's easier to show this if the nulls are expressed as constants in the RDF
16:39:10 [cygri]
... easier or more convincing
16:39:22 [ericP]
q+ to ask for use cases against which we can measure the preservation of semantics
16:39:35 [cygri]
Ashok: to make these special constants work, we'd have to change SPARQL, right?
16:39:38 [cygri]
... and we can't do that
16:40:03 [cygri]
... that's why we're a bit stuck
16:40:10 [cygri]
... so might be better not to generate triples
16:40:24 [cygri]
EFranconi: i think that's not the point
16:40:48 [cygri]
... no matter how we map them, naive queries will not preserve the semantics of nulls
16:41:13 [cygri]
... in all cases, we will have to prescribe how to write queries that keep the null semantics and get the right answers
16:41:24 [cygri]
... the RDF graph will never be correct with respect to null
16:41:46 [cygri]
ericP: we have two different query languages here
16:41:52 [cygri]
... so we can't just copy and paste
16:41:57 [cygri]
... have to look at user expectations
16:42:16 [cygri]
... in SQL, people query for nulls especially in two cases
16:42:46 [ivan]
ack dmcneil
16:42:46 [cygri]
... one, i query for missing information
16:42:50 [ivan]
ack ericP
16:42:50 [Zakim]
ericP, you wanted to ask for use cases against which we can measure the preservation of semantics
... i'd like to have test queries that we can use to measure if we keep semantics
16:43:48 [cygri]
... enrico, do you have specific use cases for queries that one can do in SQL but not in SPARQL?
16:44:30 [cygri]
Marcelo: we talked a lot about null semantics in sql
16:44:36 [cygri]
... but that's not well defined
16:44:37 [EFranconi]
Sorry my phone crashed
16:44:54 [cygri]
... so it's hard to talk about null semantics if the semantics of null is not well defined
16:45:06 [cygri]
... we have to agree on the null semantics
16:48:14 [Ashok]
ack next
16:48:49 [cygri]
cygri: it's hard to show correctness if null semantics is not well defined. so we might have to lower our expectations w.r.t. showing correctness of the mapping
16:48:56 [EFranconi]
The semantics of NULLs in SQL is *well* defined
16:49:22 [cygri]
EFranconi: semantics of null in SQL is well defined in the spec
16:49:35 [cygri]
... what's not known is the model-theoretic semantics
16:49:48 [cygri]
... we know the behaviour
16:50:57 [cygri]
... (scribe fails to keep up)
16:51:38 [cygri]
... behaviour can be reproduced in SPARQL up to a certain expressivity
16:52:32 [cygri]
... the null-ignoring mapping makes the sql-to-sparql translation of some simple queries quite hard
16:53:02 [Ashok]
ack next
16:53:08 [cygri]
... bottom line: by imitating what sql does, we can re-construct the behaviour up to a certain expressivity
16:53:35 [Ashok]
ack next
16:53:41 [cygri]
Marcelo: i don't agree that the semantics is well-defined
16:54:01 [cygri]
... proving the correctness requires well-defined semantics
16:54:17 [cygri]
... hard e.g. with aggregates
16:54:42 [cygri]
... translation will be very hard with sparql 1.1
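To make the aggregate concern concrete: SQL aggregates over a column already skip NULLs, but COUNT(*) does not. A hedged sqlite3 sketch with made-up data shows that, under a hypothetical NULL-skipping mapping, counting salary triples matches COUNT(salary), while COUNT(*) has to come from something else, such as the rdf:type triples of the row subjects. It covers one simple case only and is not a general translation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("SMITH", 800), ("ALLEN", None), ("WARD", 1250)])

# SQL side: COUNT(*) counts rows, COUNT(salary) and AVG(salary) skip NULLs.
count_rows, count_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary) FROM employees").fetchone()

# RDF side under a hypothetical NULL-skipping mapping.
rows = conn.execute("SELECT name, salary FROM employees").fetchall()
type_triples = [(f"emp/{name}", "rdf:type", "Employee") for name, _ in rows]
salary_triples = [(f"emp/{name}", "ns#salary", sal)
                  for name, sal in rows if sal is not None]

print(count_rows, len(type_triples))      # 3 3
print(count_salary, len(salary_triples))  # 2 2
print(avg_salary, sum(t[2] for t in salary_triples) / len(salary_triples))  # 1025.0 1025.0
```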
16:55:19 [Ashok]
cygri: Two different notions of what is a correct translation
16:55:54 [Ashok]
... query answering notion of correctness
16:56:23 [Ashok]
... another way of defining correctness based on the meaning of a graph
16:57:07 [Ashok]
... model theory and semantics ... model theory not well defined wrt NULLs
16:57:43 [Ashok]
cygri: Ignoring the model-theoretic approach is not good
16:58:29 [cygri]
Souri: from practical point of view ... let's consider a db with some sparse tables
16:58:29 [Ashok]
... we have concrete test cases on using RDF triples in OWL and RDF
16:59:04 [Ashok]
ack next
16:59:34 [cygri]
... if we can capture the schema, and the present data, but leave out the null triples, then we can still reconstruct the old database
16:59:50 [cygri]
... and if we can guarantee that, then we will be able to translate queries
17:00:30 [cygri]
... so if the direct mapping always maps the schema too, and skips nulls, then we should be ok
17:00:52 [Ashok]
Souri: DM should always generate the schema
17:01:06 [Ashok]
... then we know where the missing values are
17:01:18 [cygri]
EFranconi: i maintain that nulls are well-defined, by their behaviour
17:01:40 [cygri]
... but i agree we want to avoid complexity
17:02:03 [cygri]
... so that's why i would limit expressivity
17:02:49 [Souri]
My proposal: maintain data equivalence (allowing converting either way, without loss of info) => this can be done by DM 1) always generating schema triples and 2) skipping generation of triples for NULL values
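Souri's proposal can be sketched as a round trip. This is a minimal Python illustration under stated assumptions, not the direct mapping's actual output: a single table with a non-NULL single-column key, string cell values, hypothetical IRIs such as `EMP/7369` and `EMP#sal`, schema triples encoded only as `rdfs:domain` statements, and hypothetical functions `direct_map` and `reconstruct`.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, Optional[str]]


def direct_map(table: str, columns: List[str], key: str, rows: List[Row]) -> Set[Triple]:
    """Hypothetical direct mapping: schema triples always, data triples only for non-NULL cells."""
    triples: Set[Triple] = set()
    for col in columns:  # 1) schema: each column is a property of the table's class
        triples.add((f"{table}#{col}", "rdfs:domain", table))
    for row in rows:     # 2) data: assumes row[key] is never NULL
        subject = f"{table}/{row[key]}"
        triples.add((subject, "rdf:type", table))
        for col in columns:
            if row[col] is not None:  # NULL cells generate nothing
                triples.add((subject, f"{table}#{col}", row[col]))
    return triples


def reconstruct(table: str, triples: Set[Triple]) -> List[Row]:
    """Rebuild the rows; a declared property with no value for a subject becomes NULL (None)."""
    columns = sorted(s.split("#", 1)[1] for s, p, o in triples
                     if p == "rdfs:domain" and o == table)
    subjects = sorted(s for s, p, o in triples if p == "rdf:type" and o == table)
    values = {(s, p): o for s, p, o in triples}
    return [{col: values.get((subj, f"{table}#{col}")) for col in columns}
            for subj in subjects]


emp = [{"empno": "7369", "ename": "SMITH", "sal": "800"},
       {"empno": "7499", "ename": "ALLEN", "sal": None}]
triples = direct_map("EMP", ["empno", "ename", "sal"], "empno", emp)
assert reconstruct("EMP", triples) == emp  # same rows back, NULL included
```

The schema triples are what make the NULL recoverable: without `EMP#sal rdfs:domain EMP`, nothing in the graph would say that ALLEN's row has a salary column at all. Rows come back ordered by subject IRI, which happens to match the input order here.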
17:02:51 [cygri]
... Souri is right, if we have schema we still have the same information
17:03:09 [cygri]
... the information is there, so we can define correct queries
17:03:30 [cygri]
... but it will be more complex, and i still have to see a concrete proposal
17:03:59 [cygri]
... and is the result compositional?
17:04:17 [Souri]
?x rdf:type :EMP . ?p rdfs:range :EMP . OPTIONAL (?x ?p ?val)
17:04:31 [Souri]
correction:=> ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL (?x ?p ?val)
17:05:01 [Souri]
correction again (syntax) => ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL { ?x ?p ?val }
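Evaluating Souri's corrected query by hand shows how OPTIONAL brings the NULLs back: a property declared for :EMP that has no value for some ?x still yields a solution, with ?val unbound. A minimal nested-loop sketch over a hand-written triple set (hypothetical function `emp_solutions`); it is not a SPARQL engine, and results are sorted only to make the output deterministic.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def emp_solutions(triples: Set[Triple]) -> List[Tuple[str, str, Optional[str]]]:
    """Evaluate ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL { ?x ?p ?val } by nested loops."""
    xs = sorted(s for s, p, o in triples if p == "rdf:type" and o == ":EMP")
    ps = sorted(s for s, p, o in triples if p == "rdfs:domain" and o == ":EMP")
    solutions = []
    for x in xs:        # the two required patterns share no variable: cross product
        for p in ps:
            vals = sorted(o for s, q, o in triples if s == x and q == p)
            if vals:
                solutions.extend((x, p, v) for v in vals)
            else:
                solutions.append((x, p, None))  # ?val unbound: the NULL cell reappears
    return solutions


data = {
    (":e1", "rdf:type", ":EMP"), (":e2", "rdf:type", ":EMP"),
    (":ename", "rdfs:domain", ":EMP"), (":sal", "rdfs:domain", ":EMP"),
    (":e1", ":ename", "SMITH"), (":e1", ":sal", "800"),
    (":e2", ":ename", "ALLEN"),  # no :sal triple, the NULL was skipped
}
for solution in emp_solutions(data):
    print(solution)
```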
17:05:32 [cygri]
dmcneil: if we had distinct null values for each property, would that solve the RIF/OWL problems?
17:05:44 [cygri]
Ashok: we're out of time
17:06:02 [cygri]
... who wants to take this further?
17:07:04 [Souri]
q+ for a one sentence
17:07:43 [cygri]
cygri: i'd like to know what spec changes this all implies
meeting: RDB2RDF
17:08:30 [cygri]
rssagent, generate minutes
17:08:35 [Ashok]
Chair: Ashok
17:08:37 [cygri]
rrsagent, generate minutes
rrsagent, make logs public
17:08:57 [privera]
privera has left #RDB2RDF
17:09:26 [Ashok]
regrests: Alexandre
17:09:42 [Ashok]
regrets: Alexandre
17:11:44 [mhausenblas]
oh hey Souri
17:23:34 [cygri]
are you ok to take an action to implement the ISSUE-41 changes?
17:23:51 [cygri]
(which may mean, change nothing ... not sure)
