14:56:50 RRSAgent has joined #RDB2RDF
14:56:50 logging to http://www.w3.org/2009/10/22-RDB2RDF-irc
14:56:51 +??P4
14:56:52 RRSAgent, make logs world
14:56:54 Zakim, this will be 7322733
14:56:55 Meeting: RDB2RDF Working Group Teleconference
14:56:55 Date: 22 October 2009
14:56:55 ok, trackbot; I see SW_RDB2RDF()11:00AM scheduled to start in 4 minutes
14:56:56 me dialing in
14:58:11 Marcelo has joined #rdb2rdf
14:59:02 Ashok has joined #rdb2rdf
14:59:26 Zakim, who is here?
14:59:26 I notice SW_RDB2RDF()11:00AM has restarted
14:59:27 On IRC I see Ashok, Marcelo, RRSAgent, Ahmed, angela_UNITN, Seema, metatomix, mhausenblas, nunolopes, MacTed, hhalpin, iv_an_ru, Zakim, trackbot
14:59:29 On the phone I see Seema, ??P4, metatomix, +39.046.188.aaaa, ??P25, mhausenblas
14:59:44 whalb has joined #RDB2RDF
14:59:50 lma has joined #RDB2RDF
15:00:07 +OpenLink_Software
15:00:19 Zakim, OpenLink_Software is temporarily MacTed
15:00:19 Zakim, aaaa is angela_UNITN
15:00:23 Zakim, mute me
15:00:33 +MacTed; got it
15:00:34 +angela_UNITN; got it
15:00:37 MacTed should now be muted
15:00:55 +hhalpin
15:01:03 Zakim, who's on the phone?
15:01:11 On the phone I see Seema, ??P4, metatomix, angela_UNITN, ??P25, mhausenblas, MacTed (muted), hhalpin
15:01:36 +Ashok_Malhotra
15:01:44 +Souri
15:01:51 +whalb
15:02:02 Zakim, nunolopes is with mhausenblas
15:02:04 +nunolopes; got it
15:02:22 Chair: Ahmed
15:02:29 +LeeF
15:02:51 I'm trying to connect...
15:02:54 +??P39
15:03:00 cygri has joined #rdb2rdf
15:03:12 Zakim, cygri is with mhausenblas
15:03:12 +cygri; got it
15:03:23 scribe: hhalpin
15:03:23 cgi-irc has joined #RDB2RDF
15:03:36 Zakim, next agenda
15:03:36 agendum 3. ""WG Tools"" taken up [from hhalpin]
15:03:42 Souri has joined #rdb2rdf
15:04:03 Zakim, who's here?
15:04:03 On the phone I see Seema, ??P4, metatomix, angela_UNITN, ??P25, mhausenblas, MacTed (muted), hhalpin, Ashok_Malhotra, Souri, whalb, LeeF, ??P39
15:04:05 On IRC I see Souri, cgi-irc, cygri, lma, whalb, Ashok, Marcelo, RRSAgent, Ahmed, angela_UNITN, Seema, metatomix, mhausenblas, nunolopes, MacTed, hhalpin, iv_an_ru, Zakim, trackbot
15:04:07 mhausenblas has mhausenblas, nunolopes, cygri
15:04:38 Zakim, list agenda?
15:04:41 I see 5 items remaining on the agenda:
15:04:49 3. "WG Tools" [from hhalpin]
15:04:51 4. "Goals and Objectives" [from hhalpin]
15:04:56 5. "AOB" [from hhalpin]
15:04:59 6. "Convene RDB2RDF Meeting" [from hhalpin]
15:05:01 7. "Orri Erlang (OpenLink)" [from hhalpin]
15:05:43 Zakim, drop agendum 3
15:05:43 agendum 3, "WG Tools", dropped
15:05:50 Zakim, drop agendum 4
15:05:50 agendum 4, "Goals and Objectives", dropped
15:05:52 Zakim, drop agendum 5
15:05:52 agendum 5, "AOB", dropped
15:05:56 Zakim, next agendum
15:05:56 agendum 6. ""Convene RDB2RDF Meeting"" taken up [from hhalpin]
15:06:20 +??P52
15:06:21 Are we doing a roll call?
15:06:26 yes, Souri
15:06:27 Ahmed is doing a roll call.
15:06:30 Zakim, who's here?
15:06:30 On the phone I see Seema, ??P4, metatomix, angela_UNITN, ??P25, mhausenblas, MacTed (muted), hhalpin, Ashok_Malhotra, Souri, whalb, LeeF, ??P39, ??P52
15:06:33 On IRC I see Souri, cgi-irc, cygri, lma, whalb, Ashok, Marcelo, RRSAgent, Ahmed, angela_UNITN, Seema, metatomix, mhausenblas, nunolopes, MacTed, hhalpin, iv_an_ru, Zakim, trackbot
15:06:36 Zakim, ??P52 is me
15:06:37 mhausenblas has mhausenblas, nunolopes, cygri
15:06:39 +iv_an_ru; got it
15:06:47 wow, I've got the connection!
15:06:51 PROPOSED: to approve RDB2RDF Weekly -- 15th October 2009 as a true record
15:06:56 +1
15:07:04 (note that I fixed the angela's name!)
15:07:11 thanks
15:07:17 http://www.w3.org/2009/10/15-RDB2RDF-minutes.html
15:07:26 (need another +1)
15:08:24 +Stevef
15:08:39 Orri will be here momentarily. timezone mixup.
15:08:46 soeren has joined #RDB2RDF
15:09:27 RESOLVED: approved RDB2RDF Weekly -- 15th October 2009 as a true record
15:10:27 RESOLVED: RDB2RDF Weekly meets next Oct 29th.
15:10:27 Michael: background about DST see http://www.timeanddate.com/time/aboutdst.html
15:12:12 RESOLVED: To meet at same time for next 2 weeks just to prevent daylights saving errors, and then adopt whatever the best time from the poll.
15:12:41 zakim, unmute me
15:12:41 MacTed should no longer be muted
15:13:11 Zakim, mute me
15:13:13 MacTed should now be muted
15:13:22 mhausenblas: Everyone tried wiki?
15:13:32 ... your normal w3c login should work
15:13:40 ... you can put your thoughts there
15:14:35 hhalpin: any first?
15:14:51 Do you have the URL for the Wiki handy?
15:15:03 http://www.w3.org/2001/sw/rdb2rdf/wiki/Main_Page
15:15:53 hhalpin: is talking about scribing
15:16:04 ... and I'm taking more notes
15:16:06 ... and so on.
15:16:16 me: testing my username
15:16:24 (as a last resort, I can provide some comments to Orri's slides, finally, that's my SPARQL2SQL translator. The problem is that my connection is bad as usual)
15:16:39 this is ben
15:16:46 http://www.w3.org/2001/sw/rdb2rdf/track/actions/open
15:16:51 To see open actions.
15:16:59 Ben has joined #RDB2RDF
15:17:04 That's better
15:17:23 ACTION: hhalpin to double-check EricP's status and have him send in his proposals about mapping
15:17:23 Created ACTION-8 - Double-check EricP's status and have him send in his proposals about mapping [on Harry Halpin - due 2009-10-29].
15:17:36 http://www.w3.org/2001/sw/rdb2rdf/track/actions/open
15:17:58 ACTION [DONE]: Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation
15:17:58 Sorry, couldn't find user - [DONE]
15:18:09 ACTION [DONE]: mhausenblas Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation
15:18:09 Sorry, couldn't find user - [DONE]
15:18:18 ACTION-2
15:18:20 ACTION-2?
15:18:20 ACTION-2 -- Michael Hausenblas to contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation -- due 2009-10-08 -- OPEN
15:18:20 http://www.w3.org/2001/sw/rdb2rdf/track/actions/2
15:18:39 close ACTION-2
15:18:39 ACTION-2 Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation closed
15:19:02 ACTION-3?
15:19:02 ACTION-3 -- Michael Hausenblas to draft a proposal for presentation order on the Wiki -- due 2009-10-08 -- OPEN
15:19:02 http://www.w3.org/2001/sw/rdb2rdf/track/actions/3
15:19:05 A pointer to Zakim tutorial would be good.
15:19:07 Ben has joined #RDB2RDF
15:19:07 close ACTION-3
15:19:07 ACTION-3 Draft a proposal for presentation order on the Wiki closed
15:19:22 http://www.w3.org/2001/12/zakim-irc-bot.html
15:19:28 Ahmed has joined #RDB2RDF
15:19:35 http://www.w3.org/2002/01/UsingZakim
15:19:46 and also
15:19:47 http://www.w3.org/2002/03/RRSAgent
15:20:13 close ACTION-3
15:20:13 ACTION-3 Draft a proposal for presentation order on the Wiki closed
15:20:17 close ACTION-4
15:20:17 ACTION-4 Draft first invitation mail to WG closed
15:20:22 close ACTION-5
15:20:22 ACTION-5 Re-send Marcelo invited expert form closed
15:20:26 close ACTION-6
15:20:26 ACTION-6 Add material to the Wiki closed
15:20:28 close ACTION-7
15:20:29 ACTION-7 Send email out to europeans explaining daylight savings time.
closed
15:21:42 http://www.w3.org/2001/sw/rdb2rdf/wiki/images/9/96/Relational2RDF.ppt
15:21:45 Zakim, next agendum
15:21:45 agendum 7. ""Orri Erlang (OpenLink)"" taken up [from hhalpin]
15:22:02 ACTION: mhausenb to put Ahmed's proposals regarding R2RML requirements onto the Wiki
15:22:02 Created ACTION-9 - Put Ahmed's proposals regarding R2RML requirements onto the Wiki [on Michael Hausenblas - due 2009-10-29].
15:22:07 I think Ivan can give the presentation.
15:22:14 yes
15:22:16 ivan: I'll give the presentation
15:22:26 ivan: I'm Ivan from OpenLink
15:22:47 ivan: I'm responsible for all the SPARQL
15:22:55 ivan: at least the transformation parts.
15:23:18 (background about Ivan see http://www.linkedin.com/in/ivanmikhailov)
15:23:42 ivan: mapping from relational to RDF why?
15:24:01 ah hah -- Zakim says conference is full, so Orri cannot get in.
15:24:30 -MacTed
15:24:49 ivan: we want to save time
15:24:57 I've dropped phone
15:25:19 + +87875aabb
15:25:39 ... : we want to allow relational data of course to be accessed by RDF.
15:25:42 Orri?
15:25:54 Zakim, aabb is Orri
15:25:54 +Orri; got it
15:26:05 ACTION: hhalpin will add 10 more participants to telecon
15:26:05 Created ACTION-10 - Will add 10 more participants to telecon [on Harry Halpin - due 2009-10-29].
15:26:20 Please mention the slide# or slide title
15:26:41 ivan - can you put the slide number in IRC?
15:26:52 orri: let's categorize structured data on the Web
15:27:01 ... : by exposing all content from any Web 2.0 application
15:27:23 ... : data-warehouse people who are deeply concerned with different identifiers, data not joining cleanly
15:27:36 orri: these two sites have different requirements
15:27:41 still slide #2 methinks
15:27:41 ... if you have a straightforward mapping
15:27:50 ... then you just extract RDF and then load it into RDF.
15:27:57 ... if you put more effort into mapping
15:28:00 ... you can do it both ways.
15:28:06 ... translate SPARQL into SQL
15:28:19 ...
then you don't have to make a logically equivalent RDF store.
15:28:57 slide #4
15:29:19 orri: this involves converting everything
15:29:26 ... to RDF but it has pros and cons
15:29:46 ... the main pro is any query you can do a variable in predicate
15:29:56 ... high risk that it won't work well in SQL, big unions
15:30:12 ... another pro is if you want to materialize inference.
15:30:30 ... union and join across inferred and non-inferred triples, no end of trouble
15:30:36 ... lots of different data sources
15:30:46 ... pouring them all into bucket of RDF might be a good solution
15:30:56 ... so no a priori need to give an exact schema
15:31:11 ... if it does not link or have an intersection, the benefit is negligible.
15:31:17 orri: the cons are updating
15:31:19 ... latency
15:31:28 ... large space, bigger than equivalent relational data.
15:31:38 ... lots of research into compression of RDF.
15:31:52 ... the task specific though will always be more compact and more efficient.
15:32:16 who's speaking there?
15:32:20 Ashok
15:32:32 Ashok: the queries will be slower against RDF, no?
15:32:38 orri: certainly the case right now
15:33:10 slide #5
15:33:17 orri: what are benefits of mapping on demand?
15:33:25 No synchronization to do.
15:33:35 orri: don't have to cross multiple databases
15:33:43 ... if you can push all work into single relational databases
15:33:47 ... you get all benefits of optimization
15:33:53 ... if you want to add sources
15:34:10 ... then you don't have to copy into RDF, saving space
15:34:25 orri: Cons are the same, non-SQL sources, inferences, but experience
15:34:34 ... has shown that inference can be done in mapping
15:34:40 ... in particular sub-classes and sub-properties
15:34:45 slide #6
15:34:58 orri: about virtuoso
15:35:20 ... we do mapping of SPARQL to SQL schema from any relational database
15:35:31 ... not just our own, but do the rest via data access drivers
15:35:53 ...
this would allow us to join DB2 and Oracle for example
15:36:01 ... we map it all into Virtuoso SQL
15:36:08 ... then we can deal with it as a single SQL dialect
15:36:18 ... and so do RDF mapping across single SQL dialect
15:36:30 ... we store physical quads
15:36:43 ... we store up to 8 billion quads, with full text index
15:37:00 ... can do ranking of entities, and do other relational database functionality, transactions etc.
15:37:06 slide #7
15:37:09 ... for mapping to be useful
15:37:19 ... one of the large factors
15:37:26 ... SPARQL as a query language
15:37:32 ... lacked aggregation and GROUP BY
15:37:52 ... but all vendors added, albeit in a vendor-specific manner
15:38:00 ... SPARQL 1.1 should standardize all of these.
15:38:18 ... SPARQL can without reliance on extensions
15:38:23 ... derive subqueries etc.
15:38:28 ... within the standard.
15:38:34 ... if you have this mapping.
15:38:49 ... some mappings are still difficult
15:39:07 ... we would prefer it if a single SQL statement were produced even if it was ugly
15:39:16 ... as to not have problems re execution plans
15:39:23 ... you must be fairly clever in mapping
15:39:27 ... to do mapping on fly
15:39:47 ... must know when some kind of RDF entity (person, article) can come from any table
15:40:06 ... union of customers across many tables, queries then become joins between unions, a mess to optimize
15:40:19 ... so what can be optimized in mapping layer!
15:40:33 ... do NOT do needless joins, this must be done in mapping
15:40:35 slide #8
15:40:39 ... cases for integration
15:40:56 ... similar but heterogeneous schemas combined
15:41:11 ... each application has users, comment, comment field
15:41:15 ... but each slightly different
15:41:20 ... so we map them all into SIOC vocabulary
15:41:34 ... six relational tables
15:41:57 ... so union of all types of comments to union of all types of posts
15:42:02 ... intelligence is needed there.
15:42:16 Ahmed: How is foreign key being handled?
15:42:26 ...
better covered by direct reference, what does that mean?
15:42:30 orri: we handle foreign keys
15:42:33 ... just like primary keys
15:42:37 ... so say order has primary key
15:42:43 ... of order number and line number
15:42:46 ... so we declare it exactly
15:42:51 ... we use the names
15:43:09 ... give exactly same declaration of one key
15:43:12 ... as to another.
15:44:18 ... say that ORDER has customer
15:44:23 ... translate ORDER primary key
15:44:26 q+
15:44:27 ... into constant
15:44:35 ... then translates foreign key to a URI.
15:44:48 ... print some expression
15:44:49 ack me
15:45:04 rcygi: I can explain it
15:45:07 ... you define a function
15:45:14 ... your input to a function is the IDs with an integer
15:45:19 ... output is a URI
15:45:23 ... if you define a subject
15:45:29 ... you put the primary key into that function
15:45:43 ... then the order number would go into the order identifier function, which generates the subject
15:45:45 s/rcygi/cygri
15:45:51 ... so to get foreign key to order table
15:46:12 ... put foreign key value, a column on order table, into the new table
15:46:19 ... and you do the same thing on the customer table
15:46:22 ... as you use the same function
15:46:26 ... these two will match in the end
15:46:55 orri: you are joining you compare functions to foreign key to primary key
15:47:00 ... but if you know they are bijections
15:47:14 ... you can just do normal joins without running functions
15:47:21 orri: normalization
15:47:29 ... you may want to flatten things
15:47:36 ... extra depth, as in SQL views
15:47:48 ... more familiar way is to do it via SQL views
15:47:55 ... qualify a mapping rule
15:48:10 ... this subject will have this subject with this value ONLY if a certain SQL condition holds
15:48:20 ... so we can start making things arbitrarily complex
15:48:38 ... policy functions can then be implemented in the view, but also in the database
15:48:49 ... so we get benefits that you don't get from RDF warehouse view.
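The key-to-URI functions that cygri and orri describe above can be illustrated with a small sketch. It is not Virtuoso's or D2R's actual mapping syntax: the `http://example.org` base, the `ex:` property names, the table contents, and every function name are assumptions made up for illustration. It only shows that applying the same function to a customer's primary key and to an order's foreign key yields matching URIs, and that an invertible (bijective) function lets a join compare the raw key values instead.

```python
# Hypothetical sketch: all names, URIs and rows are illustrative only.
BASE = "http://example.org"  # assumed placeholder domain


def customer_iri(customer_id: int) -> str:
    """Map a customer key to a URI (the 'function' from the discussion)."""
    return f"{BASE}/customer/{customer_id}"


def customer_id_from_iri(iri: str) -> int:
    """Inverse of customer_iri; its existence is what makes the map a bijection."""
    prefix = f"{BASE}/customer/"
    if not iri.startswith(prefix):
        raise ValueError(f"not a customer URI: {iri}")
    return int(iri[len(prefix):])


def order_line_iri(order_no: int, line_no: int) -> str:
    """Multi-part primary key (order number, line number) mapped to one URI."""
    return f"{BASE}/order/{order_no}/line/{line_no}"


def build_triples(customers, order_lines):
    """Emit (subject, predicate, object) triples from two toy tables."""
    triples = []
    for c in customers:
        # Subject comes from the customer table's primary key.
        triples.append((customer_iri(c["id"]), "ex:name", c["name"]))
    for o in order_lines:
        subject = order_line_iri(o["order_no"], o["line_no"])
        # Object comes from the foreign key, passed through the SAME function,
        # so it matches the subject generated from the customer table.
        triples.append((subject, "ex:customer", customer_iri(o["customer_id"])))
    return triples


customers = [{"id": 7, "name": "Acme"}]
order_lines = [{"order_no": 100, "line_no": 1, "customer_id": 7}]
triples = build_triples(customers, order_lines)
```

Because `customer_id_from_iri` undoes `customer_iri`, comparing two generated URIs is equivalent to comparing the underlying keys, which is why a translator can emit an ordinary SQL join on the key columns without evaluating the functions.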
15:49:18 ... kilometers to miles, store once, have a conversion function. like computed column in a relational table.
15:49:24 ... for each entity for each mapping
15:49:29 slide #9
15:49:34 ... you need a way to determine a URI from a key
15:49:44 ... %s done like in C
15:49:51 ... then a place holder for domain name
15:49:56 ... multi-part keys
15:50:01 ... line number, order number
15:50:45 slide #10
15:50:50 TPC-H
15:50:53 minimum baseline
15:50:55 q+
15:51:15 these queries given direct into SPARQL.
15:51:19 ack Souri
15:51:21 ack Souri
15:51:27 Souri: Question about multiple tables
15:51:47 ... what if I made a function that gave things the same URIs?
15:51:55 ... should we prevent that?
15:52:00 MacTed?
15:52:02 orri: we don't always forbid that.
15:52:24 ... employee and dept and mixing, but blogposts and wiki articles you might want to combine
15:52:35 ... ask for number 16
15:52:40 ... so you might want that.
15:52:49 iv_an_ru or MacTed can you provide the correct URI for the TPC-H demo please?
15:53:04 The URL http://demo.openlinksw.com/tpc-h/ does not work
15:53:09 ... might want to infer that some URIs are distinct and should not join
15:53:26 ... like joining a qualified subject to an unqualified one
15:53:36 well spotted Marcelo, hence I asked Ivan/Ted ... ;)
15:54:03 orri: I will fix URIs!
15:54:50 ... all of these 22 queries
15:55:17 ... they got SPARQL expression of similar length, all single SQL converted to single SPARQL queries
15:55:49 ... for complicated queries, overhead of mapping can be negligible
15:55:56 ... some tweaks in the SQL
15:56:05 ... but we take care of these idioms in a virtual database layer
15:56:09 ... barring pathological queries
15:56:20 ... the mapping should not generate substantial penalties
15:56:31 slide #11
15:56:33 ... keys in URI
15:56:42 ... but the URI does not specify what table it came from
15:56:55 ... the relational database does not know some of these things aren't meaningful
15:57:00 ...
little any database can do
15:57:05 ... intelligence must be in mapping layer
15:57:24 ... one must know about databases
15:57:39 ... in particular what a relational optimizer can and can't do.
15:57:43 ... where data is located
15:57:47 ... avoiding joining too much
15:57:53 ... a cost model
15:58:12 ... so we need to know how databases work.
15:58:16 lma has joined #RDB2RDF
15:58:42 Ahmed: Are you expecting mapping to have access to remote database stats
15:58:54 orri: If we have some idea
15:58:58 ... we can say, import cardinality
15:59:05 ... depends on type of remote database
15:59:11 ... so we try to push things to remote database
15:59:20 ... o/w use a cost model to determine joins
15:59:37 Ahmed: We have a difficulty in keeping these statistics current even with just relational databases
15:59:46 Orri: We keep them current as much as we can.
16:00:18 ... we have maps into integrating vocabularies
16:00:26 ... we are presently working on enterprise accounts
16:00:42 slide #14
16:01:03 orri: we use this all internally at OpenLink
16:01:09 ... all our customers have URIs.
16:01:14 ... questions?
16:01:21 Ahmed: If you have a 50 terabyte warehouse
16:01:29 ... if you convert that all into RDF
16:01:33 ... how big is blow-up?
16:01:39 orri: depends on compression
16:01:42 we're already at the top of the hour, so we should do a round-up
16:01:54 ... columns are generally preferred
16:01:58 ... so we try to stick to that.
16:01:58 next week soeren is planned
16:02:00 see http://www.w3.org/2001/sw/rdb2rdf/wiki/Initial_Round_of_Presentations
16:02:07 ... that's the theory
16:02:12 ... practice is not quite as good
16:02:18 ... 4-5 times bigger is not unreasonable
16:02:27 ... so we need to get that better
16:02:31 ... but reasons are not fundamental
16:02:41 ... we imagine type-specific compression would work in rdf
16:03:14 Ahmed: What's average of slow-down of queries with mapping?
16:03:19 orri: an order of magnitude
16:03:31 ...
we don't want to replace a warehouse with RDF at the moment.
16:03:37 http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/
16:03:43 mhausenblas: send questions to mailing list
16:03:57 ... soeren is up next week!
16:04:05 ... I might become a father on that day!
16:04:45 ACTION: hhalpin to ask EricP for back-up
16:04:45 Created ACTION-11 - Ask EricP for back-up [on Harry Halpin - due 2009-10-29].
16:04:49 Congrats, Soeren!
16:05:01 http://www.slideshare.net/soeren1611/triplify-1341084
16:05:23 orri: so on warehouse side
16:05:32 ... if you have regular relational warehouse
16:05:36 ... it's not RDF's strong point
16:05:50 -iv_an_ru
16:06:08 ... RDF is unbeatable for structured queries against heterogeneous data
16:06:10 (oops, I've lost the connection)
16:06:24 ... strong need for queries and schema-less, then we win.
16:07:13 thanks orri!
16:07:15 -??P4
16:07:16 -metatomix
16:07:16 -whalb
16:07:18 -Orri
16:07:18 -Stevef
16:07:19 -mhausenblas
16:07:20 trackbot, end meeting
16:07:20 Zakim, list attendees
16:07:21 -LeeF
16:07:21 RRSAgent, please draft minutes
16:07:21 I have made the request to generate http://www.w3.org/2009/10/22-RDB2RDF-minutes.html trackbot
16:07:22 RRSAgent, bye
16:07:22 I see 4 open action items saved in http://www.w3.org/2009/10/22-RDB2RDF-actions.rdf :
16:07:22 ACTION: hhalpin to double-check EricP's status and have him send in his proposals about mapping [1]
16:07:22 recorded in http://www.w3.org/2009/10/22-RDB2RDF-irc#T15-17-23
16:07:22 ACTION: mhausenb to put Ahmed's proposals regarding R2RML requirements onto the Wiki [2]
16:07:22 recorded in http://www.w3.org/2009/10/22-RDB2RDF-irc#T15-22-02
16:07:22 ACTION: hhalpin will add 10 more participants to telecon [3]
16:07:22 recorded in http://www.w3.org/2009/10/22-RDB2RDF-irc#T15-26-05
16:07:22 ACTION: hhalpin to ask EricP for back-up [4]
16:07:22 recorded in http://www.w3.org/2009/10/22-RDB2RDF-irc#T16-04-45
16:07:22 -Souri
16:07:23 Thanks!
16:07:25 -Seema
16:07:26 As of this point the attendees have been Seema, metatomix, +39.046.188.aaaa, mhausenblas, MacTed, angela_UNITN, hhalpin, Ashok_Malhotra, Souri, whalb, nunolopes, LeeF, cygri,
16:07:31 ... iv_an_ru, Stevef, +87875aabb, Orri
16:07:33 -??P39
16:07:36 -angela_UNITN
16:07:38 metatomix has left #rdb2rdf