16:59:21 RRSAgent has joined #RDB2RDF
16:59:21 logging to http://www.w3.org/2009/12/15-RDB2RDF-irc
16:59:23 RRSAgent, make logs world
16:59:23 Zakim has joined #RDB2RDF
16:59:25 Zakim, this will be 7322733
16:59:25 ok, trackbot; I see SW_RDB2RDF()12:00PM scheduled to start in 1 minute
16:59:26 Meeting: RDB2RDF Working Group Teleconference
16:59:26 Date: 15 December 2009
16:59:36 Ashok has joined #rdb2rdf
16:59:42 Chair: Michael
16:59:57 soeren has joined #RDB2RDF
17:00:02 Agenda: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2009Dec/0008.html
17:00:16 jsequeda has joined #RDB2RDF
17:00:25 Marcelo has joined #rdb2rdf
17:01:04 Do we have telcons on Dec 22 and 27 ?
17:01:15 no, Ashok ;)
17:01:22 slackers
17:01:27 Zakim, this is 7322733
17:01:27 ok, MacTed; that matches SW_RDB2RDF()12:00PM
17:01:31 Zakim, who's here?
17:01:31 On the phone I see Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, [IPcaller], EricP, OpenLink_Software
17:01:34 On IRC I see Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
17:01:46 +mhausenblas
17:01:48 Thanks, Michael!
17:01:48 Zakim, OpenLink_Software is temporarily MacTed
17:01:48 +MacTed; got it
17:01:51 Zakim, mute me
17:01:51 MacTed should now be muted
17:01:57 zakim, [IPcaller] is soeren
17:01:57 +soeren; got it
17:02:01 cygri has joined #rdb2rdf
17:02:03 zakim, aaa is me
17:02:03 sorry, whalb, I do not recognize a party named 'aaa'
17:02:09 Zakim, cygri is with me
17:02:09 +cygri; got it
17:02:11 zakim, aaaa is me
17:02:11 +whalb; got it
17:02:15 scribenick: cygri
17:02:23 RRSAgent, draft minutes
17:02:23 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:02:30 rrsagent, make logs public
17:02:40 Zakim, who's here?
17:02:41 On the phone I see Seema, whalb, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, soeren, EricP, MacTed (muted), mhausenblas
17:02:46 mhausenblas has mhausenblas, cygri
17:02:50 On IRC I see cygri, Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
17:03:19 +[IPcaller]
17:03:21 topic: Admin
17:03:30 Topic: Admin
17:04:08 MacTed = Ted Thibodeau
17:04:16 correct
17:04:46 aacc is me
17:04:53 Zakim, aacc is angela_UNITN
17:04:53 +angela_UNITN; got it
17:04:54 aadd is heiko
17:05:08 Zakim, aadd is HeikoStoermer
17:05:08 +HeikoStoermer; got it
17:05:13 right
17:05:33 +[IPcaller.a]
17:05:46 Zakim, aabb is jsequeda
17:05:46 +jsequeda; got it
17:05:49 present+ Orri
17:06:18 PROPOSAL: Accept the minutes of the 8 December 2009 telecon,
17:06:19 http://www.w3.org/2009/12/08-RDB2RDF-minutes.html
17:06:24 +1
17:06:26 +1
17:06:28 +1
17:06:30 +1
17:06:48 RESOLVED: Accept the minutes of the 8 December 2009 telecon
17:06:52 Use Case planning
17:06:55 Topic: Use Case planning
17:07:14 mhausenblas: http://www.w3.org/2001/sw/rdb2rdf/wiki/Use_Cases_and_Requirements
17:07:23 mhausenblas: invite ppl to add their use cases
17:08:12 Ashok: format? HTML or only wiki?
17:08:52 mhausenblas: initially collaborate on the wiki, then turn into proper WG Note with help of EricP
17:09:08 Soeren: present use cases as database schemas?
17:09:48 mhausenblas: rather keep it on user level, e.g., "we have a web shop..."
17:10:05 or "combine crm system with web shop"
17:10:34 for now, it's structured brainstorming
17:11:05 number of use cases we're aiming at?
17:11:22 EricP: a size that we can manage
17:11:32 Topic: Presentation - Okkam/ENS
17:11:50 http://www.w3.org/2001/sw/rdb2rdf/wiki/images/c/cf/Okkam.pdf
17:12:03 Heiko Störmer is presenting
17:12:35 work is part of OKKAM, EU project
17:12:49 ENS -- Entity Naming System
17:12:59 s/Heiko Störmer/Heiko Stoermer
17:13:05 thanks mhausenblas!
17:13:06 slide 2
17:13:14 rrsagent, draft minutes
17:13:14 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:14:27 slide 3
17:14:43 ENS provides services for re-use of identifiers
17:14:58 several public services
17:15:33 ID search, ID creation, ID management (alternative IDs), create+update profiles of entities
17:15:57 scalable architecture
17:16:14 access through SOAP services, REST is coming
17:16:23 web frontends
17:16:59 slide 4
17:17:10 benefits from using ENS
17:18:08 heiko: easily retrieve all data attached to the same ID
17:18:13 thx ericP!
17:18:50 ... maintain metadata about entities
17:19:10 ... profile updates based on popularity
17:19:33 ... application in business intelligence
17:19:39 ... integrate data across systems
17:20:03 ... potentially get links to stuff outside on the web for free
17:20:25 ... e.g. other people talking about your product (SAP use case)
17:20:27 slide 5
17:20:38 heiko: architecture
17:20:45 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:20:58 ... storage
17:21:13 ... lifecycle, e.g. ageing, merging, splitting of IDs
17:21:22 ... entity matching (queries)
17:21:48 ... access management: no mining queries ("give me all XYZ")
17:21:52 ... access APIs
17:21:54 slide 6
17:21:59 heiko: scalability
17:22:24 ... storage has distributed index, and distributed entity store, both clustered
17:22:37 ... replication+sharding
17:22:42 LeeF has joined #rdb2rdf
17:22:47 ... solr
17:23:12 ... ENS Core does life cycle etc, also clustered
17:23:24 slide 7
17:23:47 heiko: currently also working on offline processing
17:23:58 ... batch processing, deduplication, data quality assessment etc
17:24:03 hhalpin has joined #rdb2rdf
17:24:12 slide 8
17:24:28 heiko: under development for 2 years, version 2 coming
17:24:51 ... now at 7.5M records, system scales to 50M
17:25:12 ... want to be at 50M records and capability of 500M at project end 06/2010
17:25:15 slide 9
17:25:23 regrets+ Ben_Szekely
17:25:41 regrets+ Nuno
17:25:54 heiko: entity repository = ID + attached entity description
17:25:59 regrets+ Ahmed
17:26:03 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:26:21 slide 10
17:26:29 heiko: challenges
17:26:41 ... no defined fixed schema, just vocabularies
17:26:57 ... we don't define vocabularies
17:27:05 ... users specify name-value pairs
17:27:19 ... matching afterwards is difficult
17:27:50 ... users can use whatever vocab they want, "professor" instead of "person", we must deal with that
17:27:53 slide 11
17:28:18 heiko: internal representation: XML documents with name-value pairs describing the entities
17:28:23 ... and alternative identifiers
17:28:40 ... can be interpreted as linked data style sameAs
17:28:43 ... e.g. dbpedia URI
17:29:03 ... API call for retrieving the canonical OKKAM ID for an alternative identifier
17:29:10 slide 12
17:29:21 heiko: current content of the repo
17:29:29 ... wikipedia, geonames, manually created
17:29:35 ... total 7.5M entities
17:29:47 ... currently adding DBLP
17:30:14 ... no restriction w.r.t. types of entities, we can manage everything
17:30:16 slide 13
17:30:51 heiko: entity ID search
17:31:03 ... user submits key-value pairs as query
17:31:14 ... query must be matched against profiles
17:31:25 ... result is canonical identifier
17:31:40 ... skip slide 14
17:31:42 slide 15
17:31:59 heiko: 2 phase process in search
17:32:19 ... 1. entity search, 2. refined entity matching
17:32:36 ... entity search is for recall, pull out everything that is relevant, that's fast
17:32:55 ... refined matching then to increase precision, can be more expensive
17:33:06 ... return match or no match
17:33:19 slide 16
17:33:24 heiko: bridging to database integration
17:33:44 ... expose two DBs as two knowledge bases (graph)
17:34:04 ... typical approach for integration: owl:sameAs between records in diff DBs
17:34:07 slide 17
17:34:24 heiko: owl:sameAs has strong semantics, you forget where the data came from
17:34:58 ... (slide 18) better: use same ID everywhere
17:35:08 ... OKKAM ID as "mediator" in the middle
17:35:19 ... without undesirable consequences of sameAs
17:35:21 slide 19
17:36:45 + +44.131.208.aaee
17:36:46 heiko: you can give local identifiers and then connect them to OKKAM ID
17:36:58 ... then you can merge based on the ID, with desired semantic rules
17:37:00 slide 20
17:37:29 heiko: a database alignment project with okkam
17:37:45 ... client has bunch of databases
17:37:52 ... want unified view
17:38:00 ... convert them all to RDF
17:38:09 ... use ENS to align
17:38:44 ... so entities are linked without having to merge the graphs
17:38:55 Zakim, aaee is hhalpin (sorry about disconnect!)
17:38:55 I don't understand you, hhalpin
17:39:19 Zakim, aaee is hhalpin
17:39:19 +hhalpin; got it
17:39:42 slide 22
17:40:19 heiko: in RDB you have PKs, so unique ID is often a number
17:40:24 ... in RDF you need a URI
17:40:38 rrsagent, draft minutes
17:40:38 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:40:59 ... ENS is the thing that can enable stepping from the RDB world to the RDF world
17:41:18 ... afterwards, coreference is syntactically evident
17:41:42 ... so okkam provides mapping between local ID and global OKKAM ID
17:41:59 ... DERI has sig.ma application
17:42:22 ... you can give it an okkam ID and it will give view on all data out there that uses the ID
17:42:27 +q
17:42:35 Q&A
17:43:08 ericP: similar to Shared Names project? Concept Wiki?
17:43:25 heiko: they do life science IDs, we do all domains
17:43:43 ... they are vertical app
17:44:10 ericP: different proteins are sometimes the same, sometimes not considered the same
17:44:25 ... predicated similarity?
17:45:05 heiko: frequently raised point... up until which point is X the same when you start replacing all its parts?
17:45:11 ... we don't deal with that kind of semantics
17:45:23 q+
17:45:37 ... what's the same or not is in your knowledge base
17:46:29 ... if you describe things differently from me, if we need insulation, we will have two different entities
17:47:24 ericP: when I do SPARQL queries, should engine be aware of OKKAM?
17:47:50 heiko: no SPARQL interface yet
17:48:11 ack Souri
17:49:22 Souri: q related to goal of this WG... how do you do mapping in the DBs?
17:50:12 heiko: that's up to mapping infrastructure. we just provide a URI. ENS is not a mapping layer between DB and RDF. ENS is ID management
17:50:16 -soeren
17:50:30 q+
17:50:55 Souri: do you hand an ID to the user, "build your DB using this"? or does user give all his IDs to the ENS?
17:51:55 heiko: can do two things. first, whenever I create an entity, ENS assigns it an ID. when someone else wants to talk about same entity, ID is already there in the ENS
17:52:37 ... second, we already have distributed data. you give data to the ENS, it gives you an ID (existing or newly created). repeat for different data sources, you get same ID
17:52:59 ack Ashok
17:53:24 Ashok: are okkam IDs URIs? what's the structure of the URI?
17:53:33 heiko: yes, they are URIs
17:53:42 http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
17:53:50 heiko: that's an okkam ID
17:54:21 ... it's a UUID
17:54:31 ack jsequeda
17:54:48 you can aggregate data by okkamID using sig.ma for example
17:54:49 http://sig.ma/search?q=http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
17:55:34 jsequeda: let's say i have legacy DB about companies with PKs. so I would map my PKs to okkam IDs?
17:55:56 heiko: yes you want to have the okkam ID somewhere in your data, because then it's stable
17:56:03 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:56:23 Zakim, list attendees
17:56:23 As of this point the attendees have been Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren,
17:56:26 ... either do it entity by entity, or use batch processor where you send the data to the ENS
17:56:27 ... cygri, whalb, [IPcaller], angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
17:56:35 ... privacy issues of course
17:57:05 jsequeda: how much disambiguation do you do? how tell apart oracle the company and oracle the DB?
17:57:24 heiko: if you just have a string, we can do nothing for you. need more info in your record
17:57:52 ... sometimes can fall back on global popularity. IBM the company vs IBM the band
17:58:49 ... in practice, today: build a slightly more elaborate description of your entity; do it one by one; send query to ENS
17:59:19 ... real examples from use case partners have sufficient detail
17:59:48 ... structure of query: simplest is bag of words; more complex is key value pairs; easy to pull that from a DB and that helps us a great deal
18:00:15 mhausenblas: further questions on the mailing list
18:00:18 Topic: AOB
18:00:32 +1
18:00:35 mhausenblas: no telecon on december 22nd and 29th
18:00:41 +1
18:00:49 PROPOSAL: reconvene jan 5th
18:00:57 http://www.w3.org/2001/sw/rdb2rdf/wiki/ScribeList
18:01:09 next scribe is Souri
18:02:02 microsoft patent ... apparently does not come from SQL Server team but perhaps Live Search
18:03:41 Email on the New York Semantic Web mailing list
18:03:42 Actually it's not a patent yet, just an application. The USPTO is looking at ways to improve discovery of prior art, and has a pilot program where you can participate in the examination process. So if you know of prior art, post it here:
18:03:46 http://www.peertopatent.org/
18:04:58 there is a date that "prior art" must exist before, associated with the patent ... but I forget whether that's the "submission date" or something else
18:06:17 Oracle has a paper in VLDB 2005
18:06:43 -MacTed
18:06:45 [adjourned]
18:06:50 -Souri
18:06:51 -mhausenblas
18:06:52 -[IPcaller.a]
18:06:53 -[IPcaller]
18:06:53 -EricP
18:06:54 -Ashok_Malhotra
18:06:54 -Seema
18:06:55 -whalb
18:06:56 RRSAgent, draft minutes
18:06:56 I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
18:06:59 -jsequeda
18:07:00 -angela_UNITN
18:07:10 -hhalpin
18:07:12 -HeikoStoermer
18:07:13 SW_RDB2RDF()12:00PM has ended
18:07:15 Attendees were Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren, cygri, whalb, [IPcaller],
18:07:17 ... angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
18:08:33 Zakim, bye
18:08:33 Zakim has left #rdb2rdf
18:08:39 RRSAgent, bye
18:08:39 I see no action items
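
Illustrative sketch (not part of the meeting record): the presentation describes a two-phase search (a fast recall step, then a more expensive refined match that returns match or no match) and a service that hands back an existing or newly created ID for a submitted description. The Python below is a toy, in-memory approximation of that idea only. It is not the OKKAM/ENS API: the class name, the method names, the 0.5 threshold, the scoring rule, and the example.org URI prefix are all assumptions made up for illustration.

```python
import uuid


class ToyEntityRegistry:
    """In-memory stand-in for an entity naming service (hypothetical, NOT the ENS API)."""

    def __init__(self, threshold=0.5):
        self.profiles = {}          # entity ID -> dict of name-value pairs
        self.threshold = threshold  # assumed cut-off for the refined match

    @staticmethod
    def _tokens(description):
        # Lower-cased word tokens taken from the values only.
        return {t for v in description.values() for t in str(v).lower().split()}

    def _candidates(self, query):
        # Phase 1 (recall): every profile sharing at least one value token.
        tokens = self._tokens(query)
        return [eid for eid, prof in self.profiles.items()
                if tokens & self._tokens(prof)]

    def _score(self, query, profile):
        # Phase 2 (precision): fraction of query values found verbatim in the
        # profile. Attribute names are ignored, since users pick their own vocabulary.
        values = {str(v).lower() for v in profile.values()}
        hits = sum(1 for v in query.values() if str(v).lower() in values)
        return hits / len(query)

    def search(self, query):
        """Return the best matching ID, or None for 'no match'."""
        best_id, best_score = None, 0.0
        for eid in self._candidates(query):
            score = self._score(query, self.profiles[eid])
            if score > best_score:
                best_id, best_score = eid, score
        return best_id if best_score >= self.threshold else None

    def get_or_create(self, description):
        """Return an existing ID for the description, or mint a new one."""
        eid = self.search(description)
        if eid is None:
            # Assumed URI shape; example.org is a placeholder namespace.
            eid = "http://example.org/entity/" + str(uuid.uuid4())
            self.profiles[eid] = dict(description)
        return eid


# Two hypothetical source databases describing the same company.
registry = ToyEntityRegistry()
crm_row = {"name": "Oracle Corporation", "city": "Redwood Shores", "kind": "company"}
shop_row = {"company_name": "Oracle Corporation", "location": "Redwood Shores"}

crm_id = registry.get_or_create(crm_row)
shop_id = registry.get_or_create(shop_row)

# Local primary keys (invented values) mapped to the shared global ID.
local_to_global = {("crm", 17): crm_id, ("shop", 4021): shop_id}
```

In this toy, both rows resolve to one ID because their values agree even though their column names differ, while a bare query such as {"name": "Oracle"} yields no match, which mirrors the point that a lone string is not enough to disambiguate. A real matcher would need far more than exact value comparison.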