15:53:33 RRSAgent has joined #RDB2RDF
15:53:33 logging to http://www.w3.org/2012/05/15-RDB2RDF-irc
15:53:49 zakim, this will be rdb2rdf
15:53:49 ok, Ashok; I see SW_RDB2RDF()12:00PM scheduled to start in 7 minutes
15:54:00 chair: Ashok
15:54:15 meeting: RDB2RDF Teleconference
15:58:30 nunolopes has joined #rdb2rdf
15:58:45 SW_RDB2RDF()12:00PM has now started
15:58:52 + +1.781.273.aaaa
15:59:02 Zakim, aaaa is OpenLink_Software
15:59:02 +OpenLink_Software; got it
15:59:08 Zakim, OpenLink_Software is temporarily me
15:59:08 +MacTed; got it
15:59:22 dmcneil has joined #RDB2RDF
16:00:05 +Ashok_Malhotra
16:00:15 + +1.314.395.aabb
16:00:26 zakim, who is on the phone?
16:00:26 On the phone I see MacTed, Ashok_Malhotra, +1.314.395.aabb
16:00:45 +??P26
16:00:56 Zakim, ??P26 is me
16:01:12 zakim, mhausenblas is temporarily me
16:01:12 sorry, cygri, I do not recognize a party named 'mhausenblas'
16:01:14 + +3539149aacc
16:01:18 juansequeda has joined #rdb2rdf
16:01:24 zakim, aacc is me
16:01:24 +cygri; got it
16:01:32 + +575737aadd
16:01:38 Zakim, aadd is me
16:01:38 +juansequeda; got it
16:02:14 present: Richard, David, Ted, Ashok, Juan, Nuno
16:02:41 regrets: Eric, Ivan, Michael, Boris
16:02:48 zakim, nunolopes is with me
16:02:48 +nunolopes; got it
16:03:13 -??P26
16:03:30 zakim, pick a victim
16:03:30 Not knowing who is chairing or who scribed recently, I propose +1.314.395.aabb
16:03:35 I can do it
16:03:50 scribenick: David
16:03:50 Zakim, aabb is dmcneil
16:03:50 +dmcneil; got it
16:03:52 joerg has joined #RDB2RDF
16:03:53 Zakim, who's here?
16:03:53 On the phone I see MacTed, Ashok_Malhotra, dmcneil, cygri, juansequeda
16:03:54 cygri has cygri, nunolopes
16:03:54 On IRC I see joerg, juansequeda, dmcneil, nunolopes, RRSAgent, Zakim, Ashok, cygri, LeeF, MacTed, betehess, trackbot, ericP
16:04:09 scribenick: dmcneil
16:04:27 Topic: 1. Admin
PROPOSAL: Accept the minutes of last meeting http://www.w3.org/2012/05/08-RDB2RDF-minutes.html
16:05:06 minutes accepted
16:05:45 +??P6
16:05:47 Topic: 2. Implementability for tables w/o primary key
16:06:04 where we were: we spoke last time about what to do
16:06:16 + +1.603.897.aaee
16:06:40 one thing we spoke about was writing some text describing the disconnect between the DM and the R2RML
16:07:04 there was a sense that we should bite the bullet and address this issue by extending R2RML
16:07:13 Richard & Eric wrote up a proposal
16:07:23 proposal was here: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0075.html
16:07:31 and he thought we had reasonable support for the proposal
16:07:57 but, then David argued that we should do nothing because you can use vendor-specific SQL in R2RML to accomplish the goal of mapping tables without primary keys
16:08:02 + +1.603.897.aaff
16:08:38 dmcneil: core issue is that vendor-specific sql is needed
16:08:54 ... i argue best approach is to do it with views
16:09:07 ... because that puts the burden on the DM implementer
16:09:18 David: I think the problem can be done with views, so no change is needed
16:10:09 ... whatever SQL query would be needed to actually compute the DM, just write that into a view
16:11:00 Which vendor does not have a generate_series/rownum function?
16:11:10 ... example, postgres has a function for computing sequences. same with other DBs
16:11:33 ... then the usual R2RML mechanisms can be used to generate blank nodes
16:11:45 q?
16:12:15 +q
16:12:25 q+
16:12:51 ashok: so we need to find out whether all the DBs have some mechanism like that?
16:12:53 Souri_ has joined #RDB2RDF
16:13:11 Need to find out whether SQL Server and DB2 have such a function
16:13:23 dmcneil: suppose DB2 doesn't have it. how does hiding this behind RowBlankNode help then?
16:14:52 cygri: one concern is the way R2RML is defined in terms of Core SQL 2008
16:15:10 ... we acknowledged that people would use vendor specific SQL
16:15:40 ... the way we dealt with that is that if you want to use a db specific dialect of SQL then you are defining a vendor specific extension of R2RML
16:15:44 zakim, who is on the phone?
16:15:44 On the phone I see MacTed, Ashok_Malhotra, dmcneil, cygri, juansequeda, ??P6, +1.603.897.aaee, +1.603.897.aaff
16:15:47 cygri has cygri, nunolopes
16:16:03 ... taking that to its logical conclusion, then we are saying that R2RML as specified cannot be used to implement the Direct Mapping
16:16:24 ... since the general assumption is it is not possible to support this case with generic SQL
16:16:35 ... so the user must immediately extend R2RML for this case
16:16:38 +q
16:16:52 ... this is not a particularly pleasant situation
16:16:58 ashok: neither of these is perfect
16:17:04 q+
16:17:49 ack next
16:17:54 ack next
16:18:34 dmcneil: I think that is a specious argument, because we expect that most mappings will have vendor-specific SQL in them
16:18:55 cygri: correction, by default embedded SQL is expected to be Core SQL 2008
16:19:21 macted: if we did that (not sure what "that" references) it was a serious mistake
16:19:43 http://www.w3.org/2001/sw/rdb2rdf/r2rml/#conformance
16:19:47 This specification defines R2RML for databases that conform to Core SQL 2008, as defined in ISO/IEC 9075-1:2008 [SQL1] and ISO/IEC 9075-2:2008 [SQL2]. Processors and mappings may have to deviate from the R2RML specification in order to support databases that do not conform to this version of SQL.
16:20:27 from the spec "The absence of a SQL version identifier indicates that no claim to Core SQL 2008 conformance is made."
16:20:52 ack next
16:20:54 +q
16:21:26 MacTed: suggests that he thought we had agreement last week
16:21:32 discussion of who dissented
16:21:49 cygri: eric strongly believes that preserving cardinality is important
16:21:59 macted: let them choose the DM variant that preserves it then
16:22:15 cygri: the argument was made that the cardinality preserving option is more correct
16:22:29 ... I argued that that is just an option, implementations are free to implement it
16:22:53 ... was also ok with putting in warnings about the non-cardinality preserving option
16:23:31 ... from last week we said if we had more time we would define a way to make this work in R2RML
16:23:54 ... the point was raised that for backwards compatibility we cannot remove it later
16:24:18 ... suggested wording that "we might remove this in the future" was not well received
16:24:35 Seema has joined #rdb2rdf
16:24:39 ... also, they don't want options in the DM, just a single monolithic approach
16:24:57 ashok: last week, we spoke about some text that ivan had crafted
16:25:07 ... are you speaking about that, or a previous position they has
16:25:10 s/has/had
16:25:19 cygri: I am speaking about some text that I drafted
16:25:37 ... ivan drafted some text proposing no change but saying they are incompatible
16:26:04 http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0054.html
16:26:16 PROPOSAL A: In the DM spec, replace the following text: [[ If the table has no primary key, the row node is a fresh blank node that is unique to this row. ]] with this: [[ If the table has no primary key, the row node is a blank node. Distinct blank nodes MUST be generated for rows with distinct column values. For duplicate rows with identical values, implementations SHOULD generate a fresh blank for each duplicate row (resulting in a non-lean RDF graph [R
16:27:00 I still support Proposal A :)
16:27:37 o In the DM, instead of "is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language" say "is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language for tables which have at least one unique key" o Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key" o Add a Note
16:27:44 MacTed has joined #RDB2RDF
16:28:54 cygri: last week we said we would explore adding something to R2RML
16:29:13 ... then the objection from david, which i think is reasonable, I can see where he is coming from
16:29:26 ... did eric come up with a use case?
16:29:27 http://www.w3.org/2001/sw/rdb2rdf/wiki/Non-unique_Tables
16:29:35 ashok: yes, it is on the wiki, but it is a bit complicated
16:31:03 ashok: do you support one of these two proposals
16:31:27 dmcneil: i think, that other than doing nothing, adding a rownum column to R2RML is the most interesting
16:31:54 cygri: the problem with that approach is that it leans too much toward a particular implementation
16:32:05 ... implementations could choose something more efficient than rownum
16:32:42 ... for example, this mysql query pasted (not sure which query) numbers the rows, but doesn't use rownum
16:33:05 how about rr:genRowId ?
16:33:10 ... since rownum forces a particular approach that requires the user to spell out what the blank node identifier looks like
16:33:12 q+
16:33:17 In a ROWNUM capable DB, the mapping processor implicitly converts it to the following R2RML mapping (the actual implementation may vary from DB to DB based upon how the equivalent of ROWNUM can be implemented) rr:logicalTable [ rr:sqlQuery """ Select ROWNUM AS "rr:rownum", t.* from Wonderland t order by "rr:rownum" """ ] rr:subjectMap [ rr:template "http://Wonderland/my_rownum={\"rr:rownum\"}" ] We can also say that rr:rownum cannot be used when log
16:33:36 ack next
16:34:35 present+: Souri
16:34:55 present+: Seema
16:34:56 A SQL query is a SELECT query in the SQL language that can be executed over the input database. The string must conform to the production in [SQL2]
16:34:57 dmcneil: since SQL defaults to not be SQL 2008 in an R2RML view, this means adding an R2RML view by default makes a vendor specific mapping
16:35:20 cygri: actually the SQL version identifier doesn't affect the processing at all, per the spec
16:37:48 souri: regarding the rownum discussion
16:37:50 ack next
16:37:57 ... for every row we need a unique id
16:38:27 ... if we knew the target database, then we could write vendor-specific SQL
16:38:51 ... but, if we present that to the user, then the user may not understand that and it is not portable to other database backends
16:39:00 ... therefore we want a logical representation
16:39:12 ... if "rownum" has too much meaning with it
16:39:26 ... then we can use "rowidentifier" or something
16:39:46 ... this could be used in blank node generation or URI generation
16:40:15 q+
16:40:20 ... we need one R2RML construct, which provides a point of indirection for whatever vendor-specific mechanism it will be translated to
16:40:39 ashok: isn't the ability to add a column that gets its values from a function part of SQL-2008?
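A minimal sketch of the "rowidentifier" pseudo-column idea discussed above, written in plain Python rather than SQL so that it does not depend on any vendor's ROWNUM. The function names, the `rowidentifier` key, the sample rows, and the `_:row{n}` label template are illustrative assumptions, not part of R2RML or of any proposal text. Note that the identifiers depend on iteration order, so nothing here makes them stable across separate accesses to the table.

```python
def with_row_identifier(rows, column="rowidentifier"):
    """Return copies of the rows with an extra column numbered 1, 2, 3, ...

    The numbering follows iteration order, so it is only meaningful
    within a single pass over the table.
    """
    return [{**row, column: n} for n, row in enumerate(rows, start=1)]


def blank_node_label(row, column="rowidentifier"):
    """Build a blank node label from the pseudo-column (hypothetical template)."""
    return f"_:row{row[column]}"


# A keyless table with one exact duplicate row (made-up sample data).
wonderland = [
    {"name": "Alice", "city": "Oxford"},
    {"name": "Alice", "city": "Oxford"},
    {"name": "Hatter", "city": "London"},
]

numbered = with_row_identifier(wonderland)
labels = [blank_node_label(row) for row in numbered]
print(labels)  # ['_:row1', '_:row2', '_:row3']
```

The two duplicate rows receive different labels only because the pseudo-column differs, which is the point of indirection souri describes.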
16:41:06 souri: not sure, but this is very common
16:41:30 ashok: so we could add a column in the SQL query that gives distinct numbers
16:41:35 souri: a sequence generator, right?
16:41:49 ... in Oracle, access to the sequence generator is a DML operation
16:42:00 q?
16:42:09 ack next
16:42:24 cygri: regarding souri's point about semantics of the pseudo-rownum column
16:42:31 ... agree all we need is an identifier
16:42:47 ... understand that calling it rownum, does not mean it must be literally the db's rownum
16:43:06 ... just one tiny step from that to the RowBlankNode proposal
16:43:19 ... we leave it completely up to the implementation what the blank node template is
16:43:30 +q
16:43:51 ... since blank node identifiers have no semantics, they just must be unique
16:44:26 ... so i can't see the usefulness of letting the user put the rownum into a template
16:44:31 ack next
16:45:31 +EricP
16:45:53 dmcneil: we worked out how regular blank node ids work, we based that on whether the blank node template produces the same value
16:45:59 present+: Eric
16:46:20 ... would need that same capability for these new RowBlankNode things
16:46:37 MacTed: the question comes down to doing a dump of the data
16:46:39 +q
16:46:54 q+ to mention jena api
16:48:07 ack next
16:48:09 ack next
16:48:10 cygri, you wanted to mention jena api
16:48:19 dmcneil: I am talking about within the context of a query, not between queries
16:48:38 cygri: ted, what you say is completely correct for SPARQL queries
16:48:42 -??P6
16:48:51 ... can't tell if the blank node ids between queries are the same resource or not
16:48:59 ... per the spec
16:49:09 ... but, looking at the jena api
16:49:31 ... must look at constraints of jena api
16:49:42 ... which expects blank nodes to have a persistent identity
16:51:04 ... this leads to the need to go back to the database and graph properties for a specific blank node
16:51:18 ted: once you go back to the database, you cannot rely on the blank nodes IDs
16:51:20 yes
16:51:37 +q
16:52:23 cygri: the RDF working group has been arguing about this for a year
16:53:26 several people said they don't want to talk about blank node ID
16:53:31 dmcneil: this is relevant
16:53:44 dmcneil, i believe it is all worked out in the proposal. at least i thought about it hard. i need to answer your email, sorry i didn't get around to do that yet
16:53:59 ... because we worked out the blank node ID semantics carefully for the existing mechanism, not so much for the new RowBlankNode
16:54:00 +q
16:54:11 ashok: we have two options on the table, how to proceed
16:54:19 Is that all the disagreement about? Nothing else?
16:55:08 dmcneil: there is a third option, let the DM use R2RML views to implement this
16:55:22 ashok: I like that option, but I thought richard disagreed
16:55:38 cygri: my position is: it can be done, but it cannot be done in a way that conforms with the spec
16:55:46 ... because it requires vendor-specific SQL
16:55:54 ... it is going to be slow, and no way to make it fast
16:56:33 q?
16:59:26 ashok: some would argue that since SQL is such a sprawling spec, anytime someone writes SQL they are using vendor-specific SQL
17:01:16 cygri: if the argument is that the DM can be implemented on R2RML, then the question is how?
17:01:44 dmcneil: the DM is implemented on a specific database, so the DM generates an R2RML view that uses that database's features to implement the DM
17:02:02 cygri: that leads to very inefficient implementations
17:02:54 Identical rows: We do not care whether they are assigned Ids <1,2,...,n> or (because their contents are identical)
17:03:38 cygri: stable identifiers are needed so the next query produces the same identifiers
17:03:44 macted: you are not going to get it
17:03:54 ... the database is free to change the ids
17:04:12 ashok: saying the identifiers are stable is going beyond the spec
17:04:32 ... three choices:
17:04:40 ... 1) do nothing
17:04:45 Two rows and may be assigned Ids _:b1 and _:b2 during one access and _:b2 and _:b1 during another access. Is this acceptable or not?
17:04:55 ... 2) Richard's proposal
17:05:00 q+
17:05:05 ... 3) Souri's proposal
17:05:08 -q
17:05:09 souri, yes it is. but if they are _:b3 and _:b4 that's not acceptable
17:05:20 ... how do we come to agreement?
17:05:37 macted: do nothing is not an option because currently there is a "must" in there
17:05:45 cygri: "do nothing" means
17:05:57 ... DM and R2RML are two entirely separate beasts
17:06:05 ... just a violation of the basic premise of the DM
17:06:06 +q
17:06:30 eric: it could still be the default behavior for all but the case of duplicate keys
17:06:42 ack next
17:06:46 souri: still trying to understand the requirement
17:06:57 seema has joined #rdb2rdf
17:07:09 ... does the blank node ID need to be stable, or can it change?
17:07:27 cygri: yes, they can be scrambled but they cannot change to a different set of blank node IDs
17:07:34 macted: what!?
17:07:43 _blank1 a, b, c1
17:07:43 _blank2 a, b, c2
17:07:56 _blank2 a, b, c1
17:07:56 _blank1 a, b, c2
17:08:29 cygri: oh, then i misunderstood the example
17:08:50 ... still the case without a primary key, right?
17:09:11 ... in this case it would have to be a stable blank node label across queries
17:09:26 souri: inside the same translation, you may access the table twice, in two places
17:09:31 ... the access order may be different
17:09:43 ... based on that you may not be able to join the same rows
17:10:00 ... so this even applies within the scope of the same query process
17:10:11 macted: how is this relevant?
17:10:22 souri: if we generate a blank node ID for a row
17:10:37 ... then the same row from different parts of a query should generate the same blank node ID so they can be joined
17:13:31 cygri: yes, that is what is required, and that is part of what makes it so difficult
17:14:18 macted: my model was that the blank node IDs are generated on the result set, not during the query
17:14:24 {?p :fname ?fnm} ... complex stuff ... {?p :lname ?lnm}
17:14:30 ... this group is not about translating SPARQL to SQL
17:15:45 cygri: the RDF concepts doc says "you don't know anything about blank nodes except whether they are the same"
17:16:00 macted: but that applies to query results, not the underlying data
17:17:27 q+
17:18:33 ericP: counter examples: jena, 4store, …
17:18:36 +1 ericP
17:18:49 ack next
17:18:58 ack next
17:19:51 dmcneil: I think Richard's earlier statement that "doing nothing means DM and R2RML are completely separate" is quite overstated
17:20:12 souri: I am still trying to understand the target for generating blank node IDs
17:20:26 macted: for DM we need to maintain cardinality
17:20:35 ... i.e. every row in the result set
17:20:56 q+ ericP
17:22:01 ack next
17:22:29 eric: teh value of a bnode is not something that can be referenced later
17:22:33 s/teh/the/
17:23:07 ... the jena API over a SPARQL endpoint does not allow bnode IDs to be submitted again in subsequent queries
17:23:16 q?
17:23:23 ... so it is ok that Jena over SQL does not provide persistent bnode IDs
17:23:34 ashok: we are over time
17:23:40 ... it is not clear how to make progress
17:24:33 ... need either new proposals, or someone to change their position
17:24:47 juan: can we summarize the current options and who supports them?
17:25:08 ashok: if we only talk about changes to R2RML, then yes there are 3 options
17:25:13 ... 1) do nothing
17:25:22 ... 2) Richard's idea - add RowBlankNode
17:25:42 ... 3) Souri's idea - add pseudo-column: "rowidentifier"
17:26:17 ... 4) add wording saying "the DM is different in this special case"
17:26:48 macted: I am less clear on these options than when we started
17:26:52 2) and 3) are variations of “fix R2RML”
17:26:59 4) is proposal A from last time
17:27:14 1) is B from last time
17:27:54 - +1.603.897.aaff
17:27:54 macted: what happened to last week's proposal to strike the word "should"
17:27:59 cygri: that is now option 4
17:28:44 macted: i still like 4
17:28:48 juan: me too
17:29:25 >>> [[
17:29:25 >>> If the table has no primary key, the row node is a blank node. Distinct blank nodes MUST be generated for rows with distinct column values. For duplicate rows with identical values, implementations SHOULD generate a fresh blank for each duplicate row (resulting in a non-lean RDF graph [RDF Semantics]). However, if the underlying database system does not provide any means to reliably differentiate among the rows, then
17:29:25 implementations MAY re-use the same blank node for multiple duplicate rows (resulting in a lean RDF graph). Implementations SHOULD document and advertise their chosen behavior.
17:29:25 >>> ]]
17:29:27 The above replaces the following sentence in the current DM spec --
17:29:30 >>> [[
17:29:31 >>> If the table has no primary key, the row node is a fresh blank node that is unique to this row.
17:29:33 >>> ]]
17:31:02 eric: I object to losing cardinality on the basis of something that R2RML cannot do
17:31:15 ashok: would you be willing to word-smith it?
17:31:33 eric: no, because I disagree with the premise of losing cardinality
17:32:07 ... from the DM's perspective there is no reason the MUST should be relaxed to SHOULD
17:32:31 i don't know how to handle it *with acceptable performance*, i should say
17:33:11 eric: why are we breaking interop on the DM because R2RML cannot handle this case?
17:33:19 +q
17:33:51 q+
17:34:02 ack next
17:34:31 dmcneil: but R2RML can handle it, use R2RML views
17:36:16 q+
17:36:33 q+ to answer: because the DM is the default mapping for R2RML
17:36:40 eric: why don't we just tell R2RML users that they are losing cardinality in these cases?
17:37:15 q?
17:37:20 ack next
17:37:55 eric: there is no issue in DM, the issue is an interop issue in R2RML
17:38:17 cygri: I was working on the assumption that the DM is a default mapping for R2RML
17:38:32 ... that should answer the question of why I expect the DM to accommodate the capabilities of R2RML
17:39:01 ... if there are restrictions in R2RML, which is 1.0, then...
17:39:28 eric: we have a case where R2RML cannot preserve cardinality
17:39:38 ... we have a DM which provides default mapping for R2RML
17:40:03 ... the places where R2RML cannot preserve cardinality, should be identified as the places where problems will occur
17:40:57 cygri: R2RML as it stands, the user specifies the identities of the rows
17:41:02 +1
17:41:12 ... by specifying the columns or the templates
17:41:28 ... if they lose cardinality they lose it because of how it was mapped, it is transparent
17:41:29 +1
17:41:59 ... the other point is: what do you suggest for me as an R2RML implementor
17:42:10 ... push a button and get an automatic mapping
17:42:29 ... what should that mapping be in the case of a table without primary keys
17:42:44 eric: it should be what it is, just not promise to be the DM
17:42:55 cygri: how to communicate to users that it is not the DM?
17:43:10 eric: tell them R2RML does not have the ability to preserve cardinality in this case
17:43:24 cygri: how should we describe the default mapping we implement?
17:43:40 eric: say "it is similar to the DM except repeated rows will be collapsed into one"
17:43:49 cygri: can we write that into the R2RML spec?
17:43:58 eric: yes, that would be good
17:44:13 cygri: so remove the R2RML reference from DM
17:44:22 ... instead add a sentence to the R2RML spec
17:44:43 ... saying the default mapping is the "DM - repeated rows"
17:45:08 ashok: why remove R2RML ref from DM?
17:45:16 The Direct Mapping is intended to provide a default behavior for http://www.w3.org/TR/2012/CR-r2rml-20120223/ [R2RML].
17:45:21 cygri: there is a sentence in DM saying DM is default behavior for R2RML
17:45:25 ... that sentence must go
17:45:33 eric: we could add a caveat to it
17:45:56 ashok: eric can you work with richard and david on this?
17:46:07 eric: yes, i think the wording is close to what ivan proposed
17:46:27 cygri: it would also have to address the repeated rows caveat
17:47:19 seems to be consensus that we will try to develop wording around this
17:47:40 ashok: we will try to work it out in email
17:47:44 thanks :)
17:48:18 -cygri
17:48:20 -dmcneil
17:48:20 - +1.603.897.aaee
17:48:21 -EricP
17:48:21 -Ashok_Malhotra
17:48:23 -MacTed
17:48:28 -juansequeda
17:48:29 SW_RDB2RDF()12:00PM has ended
17:48:29 Attendees were +1.781.273.aaaa, MacTed, Ashok_Malhotra, +1.314.395.aabb, +3539149aacc, cygri, +575737aadd, juansequeda, nunolopes, dmcneil, +1.603.897.aaee, +1.603.897.aaff, EricP
17:48:37 rrsagent, make logs public
17:48:50 rrsagent, make minutes
17:48:50 I have made the request to generate http://www.w3.org/2012/05/15-RDB2RDF-minutes.html Ashok