IRC log of RDB2RDF on 2009-12-15
Timestamps are in UTC.
- 16:59:21 [RRSAgent]
- RRSAgent has joined #RDB2RDF
- 16:59:21 [RRSAgent]
- logging to http://www.w3.org/2009/12/15-RDB2RDF-irc
- 16:59:23 [trackbot]
- RRSAgent, make logs world
- 16:59:23 [Zakim]
- Zakim has joined #RDB2RDF
- 16:59:25 [trackbot]
- Zakim, this will be 7322733
- 16:59:25 [Zakim]
- ok, trackbot; I see SW_RDB2RDF()12:00PM scheduled to start in 1 minute
- 16:59:26 [trackbot]
- Meeting: RDB2RDF Working Group Teleconference
- 16:59:26 [trackbot]
- Date: 15 December 2009
- 16:59:36 [Ashok]
- Ashok has joined #rdb2rdf
- 16:59:42 [mhausenblas]
- Chair: Michael
- 16:59:57 [soeren]
- soeren has joined #RDB2RDF
- 17:00:02 [mhausenblas]
- Agenda: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2009Dec/0008.html
- 17:00:16 [jsequeda]
- jsequeda has joined #RDB2RDF
- 17:00:25 [Marcelo]
- Marcelo has joined #rdb2rdf
- 17:01:04 [Ashok]
- Do we have telcons on Dec 22 and 27 ?
- 17:01:15 [mhausenblas]
- no, Ashok ;)
- 17:01:22 [ericP]
- slackers
- 17:01:27 [MacTed]
- Zakim, this is 7322733
- 17:01:27 [Zakim]
- ok, MacTed; that matches SW_RDB2RDF()12:00PM
- 17:01:31 [MacTed]
- Zakim, who's here?
- 17:01:31 [Zakim]
- On the phone I see Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, [IPcaller], EricP, OpenLink_Software
- 17:01:34 [Zakim]
- On IRC I see Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
- 17:01:46 [Zakim]
- +mhausenblas
- 17:01:48 [Ashok]
- Thanks, Michael!
- 17:01:48 [MacTed]
- Zakim, OpenLink_Software is temporarily MacTed
- 17:01:48 [Zakim]
- +MacTed; got it
- 17:01:51 [MacTed]
- Zakim, mute me
- 17:01:51 [Zakim]
- MacTed should now be muted
- 17:01:57 [soeren]
- zakim, [IPcaller] is soeren
- 17:01:57 [Zakim]
- +soeren; got it
- 17:02:01 [cygri]
- cygri has joined #rdb2rdf
- 17:02:03 [whalb]
- zakim, aaa is me
- 17:02:03 [Zakim]
- sorry, whalb, I do not recognize a party named 'aaa'
- 17:02:09 [mhausenblas]
- Zakim, cygri is with me
- 17:02:09 [Zakim]
- +cygri; got it
- 17:02:11 [whalb]
- zakim, aaaa is me
- 17:02:11 [Zakim]
- +whalb; got it
- 17:02:15 [mhausenblas]
- scribenick: cygri
- 17:02:23 [mhausenblas]
- RRSAgent, draft minutes
- 17:02:23 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:02:30 [mhausenblas]
- rrsagent, make logs public
- 17:02:40 [mhausenblas]
- Zakim, who's here?
- 17:02:41 [Zakim]
- On the phone I see Seema, whalb, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, soeren, EricP, MacTed (muted), mhausenblas
- 17:02:46 [Zakim]
- mhausenblas has mhausenblas, cygri
- 17:02:50 [Zakim]
- On IRC I see cygri, Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
- 17:03:19 [Zakim]
- +[IPcaller]
- 17:03:30 [cygri]
- Topic: Admin
- 17:04:08 [MacTed]
- MacTed = Ted Thibodeau
- 17:04:16 [MacTed]
- correct
- 17:04:46 [angela_UNITN]
- aacc is me
- 17:04:53 [MacTed]
- Zakim, aacc is angela_UNITN
- 17:04:53 [Zakim]
- +angela_UNITN; got it
- 17:04:54 [angela_UNITN]
- aadd is heiko
- 17:05:08 [MacTed]
- Zakim, aadd is HeikoStoermer
- 17:05:08 [Zakim]
- +HeikoStoermer; got it
- 17:05:13 [HeikoStoermer]
- right
- 17:05:33 [Zakim]
- +[IPcaller.a]
- 17:05:46 [jsequeda]
- Zakim, aabb is jsequeda
- 17:05:46 [Zakim]
- +jsequeda; got it
- 17:05:49 [mhausenblas]
- present+ Orri
- 17:06:18 [cygri]
- PROPOSAL: Accept the minutes of the 8 December 2009 telecon,
- 17:06:19 [cygri]
- http://www.w3.org/2009/12/08-RDB2RDF-minutes.html
- 17:06:24 [whalb]
- +1
- 17:06:26 [Marcelo]
- +1
- 17:06:28 [cygri]
- +1
- 17:06:30 [soeren]
- +1
- 17:06:48 [cygri]
- RESOLVED: Accept the minutes of the 8 December 2009 telecon
- 17:06:55 [cygri]
- Topic: Use Case planning
- 17:07:14 [cygri]
- mhausenblas: http://www.w3.org/2001/sw/rdb2rdf/wiki/Use_Cases_and_Requirements
- 17:07:23 [cygri]
- mhausenblas: invite ppl to add their use cases
- 17:08:12 [cygri]
- Ashok: format? HTML or only wiki?
- 17:08:52 [cygri]
- mhausenblas: initially collaborate on the wiki, then turn into proper WG Note with help of EricP
- 17:09:08 [cygri]
- Soeren: present use cases as database schemas?
- 17:09:48 [cygri]
- mhausenblas: rather keep it on user level, e.g., "we have a web shop..."
- 17:10:05 [cygri]
- or "combine crm system with web shop"
- 17:10:34 [cygri]
- for now, it's structured brainstorming
- 17:11:05 [cygri]
- number of use cases we're aiming at?
- 17:11:22 [cygri]
- EricP: a size that we can manage
- 17:11:32 [cygri]
- Topic: Presentation - Okkam/ENS
- 17:11:50 [cygri]
- http://www.w3.org/2001/sw/rdb2rdf/wiki/images/c/cf/Okkam.pdf
- 17:12:03 [cygri]
- Heiko Störmer is presenting
- 17:12:35 [cygri]
- work is part of OKKAM, EU project
- 17:12:49 [cygri]
- ENS -- Entity Naming System
- 17:12:59 [mhausenblas]
- s/Heiko Störmer/Heiko Stoermer
- 17:13:05 [cygri]
- thanks mhausenblas!
- 17:13:06 [cygri]
- slide 2
- 17:13:14 [mhausenblas]
- rrsagent, draft minutes
- 17:13:14 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:14:27 [cygri]
- slide 3
- 17:14:43 [cygri]
- ENS provides services for re-use of identifiers
- 17:14:58 [cygri]
- several public services
- 17:15:33 [cygri]
- ID search, ID creation, ID management (alternative IDs), create+update profiles of entities
- 17:15:57 [cygri]
- scalable architecture
- 17:16:14 [cygri]
- access through SOAP services, REST is coming
- 17:16:23 [cygri]
- web frontends
- 17:16:59 [cygri]
- slide 4
- 17:17:10 [cygri]
- benefits from using ENS
- 17:18:08 [cygri]
- heiko: easily retrieve all data attached to the same ID
- 17:18:13 [cygri]
- thx ericP!
- 17:18:50 [cygri]
- ... maintain metadata about entities
- 17:19:10 [cygri]
- ... profile updates based on popularity
- 17:19:33 [cygri]
- ... application in business intelligence
- 17:19:39 [cygri]
- ... integrate data across systems
- 17:20:03 [cygri]
- ... potentially get links to stuff outside on the web for free
- 17:20:25 [cygri]
- ... e.g. other people talking about your product (SAP use case)
- 17:20:27 [cygri]
- slide 5
- 17:20:38 [cygri]
- heiko: architecture
- 17:20:45 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:20:58 [cygri]
- ... storage
- 17:21:13 [cygri]
- ... lifecycle, e.g. ageing, merging, splitting of IDs
- 17:21:22 [cygri]
- ... entity matching (queries)
- 17:21:48 [cygri]
- ... access management: no mining queries ("give me all XYZ")
- 17:21:52 [cygri]
- ... access APIs
- 17:21:54 [cygri]
- slide 6
- 17:21:59 [cygri]
- heiko: scalability
- 17:22:24 [cygri]
- ... storage has distributed index, and distributed entity store, both clustered
- 17:22:37 [cygri]
- ... replication+sharding
- 17:22:42 [LeeF]
- LeeF has joined #rdb2rdf
- 17:22:47 [cygri]
- ... solr
- 17:23:12 [cygri]
- ... ENS Core does life cycle etc, also clustered
- 17:23:24 [cygri]
- slide 7
- 17:23:47 [cygri]
- heiko: currently also working on offline processing
- 17:23:58 [cygri]
- ... batch processing, deduplication, data quality assessment etc
- 17:24:03 [hhalpin]
- hhalpin has joined #rdb2rdf
- 17:24:12 [cygri]
- slide 8
- 17:24:28 [cygri]
- heiko: under development for 2 years, version 2 coming
- 17:24:51 [cygri]
- ... now at 7.5M records, system scales to 50M
- 17:25:12 [cygri]
- ... want to be at 50M records and capability of 500M at project end 06/2010
- 17:25:15 [cygri]
- slide 9
- 17:25:23 [mhausenblas]
- regrets+ Ben_Szekely
- 17:25:41 [mhausenblas]
- regrets+ Nuno
- 17:25:54 [cygri]
- heiko: entity repository = ID + attached entity description
- 17:25:59 [mhausenblas]
- regrets+ Ahmed
- 17:26:03 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:26:21 [cygri]
- slide 10
- 17:26:29 [cygri]
- heiko: challenges
- 17:26:41 [cygri]
- ... no defined fixed schema, just vocabularies
- 17:26:57 [cygri]
- ... we don't define vocabularies
- 17:27:05 [cygri]
- ... users specify name-value pairs
- 17:27:19 [cygri]
- ... matching afterwards is difficult
- 17:27:50 [cygri]
- ... users can use whatever vocab they want, "professor" instead of "person", we must deal with that
- 17:27:53 [cygri]
- slide 11
- 17:28:18 [cygri]
- heiko: internal representation: XML documents with name-value pairs describing the entities
- 17:28:23 [cygri]
- ... and alternative identifiers
- 17:28:40 [cygri]
- ... can be interpreted as linked data style sameAs
- 17:28:43 [cygri]
- ... e.g. dbpedia URI
- 17:29:03 [cygri]
- ... API call for retrieving the canonical OKKAM ID for an alternative identifier
- 17:29:10 [cygri]
- slide 12
- 17:29:21 [cygri]
- heiko: current content of the repo
- 17:29:29 [cygri]
- ... wikipedia, geonames, manually created
- 17:29:35 [cygri]
- ... total 7.5M entities
- 17:29:47 [cygri]
- ... currently adding DBLP
- 17:30:14 [cygri]
- ... no restriction w.r.t. types of entities, we can manage everything
- 17:30:16 [cygri]
- slide 13
- 17:30:51 [cygri]
- heiko: entity ID search
- 17:31:03 [cygri]
- ... user submits key-value pairs as query
- 17:31:14 [cygri]
- ... query must be matched against profiles
- 17:31:25 [cygri]
- ... result is canonical identifier
- 17:31:40 [cygri]
- ... skip slide 14
- 17:31:42 [cygri]
- slide 15
- 17:31:59 [cygri]
- heiko: 2 phase process in search
- 17:32:19 [cygri]
- ... 1. entity search, 2. refined entity matching
- 17:32:36 [cygri]
- ... entity search is for recall, pull out everything that is relevant, that's fast
- 17:32:55 [cygri]
- ... refined matching then to increase precision, can be more expensive
- 17:33:06 [cygri]
- ... return match or no match
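
A minimal sketch of the two-phase search described on slide 15, not the ENS implementation. The profile layout, the word-overlap recall step, the per-key scoring, and the 0.75 threshold are all assumptions made for illustration.

```python
# Toy two-phase lookup: a cheap recall step, then a stricter precision step.
# PROFILES, the scoring rule, and the threshold are illustrative assumptions.

PROFILES = {
    "id-1": {"name": "Oracle Corporation", "type": "company", "city": "Redwood Shores"},
    "id-2": {"name": "Oracle Database", "type": "software", "vendor": "Oracle Corporation"},
    "id-3": {"name": "IBM", "type": "company", "city": "Armonk"},
}


def tokens(values):
    """Lower-cased bag of words over a collection of strings."""
    return {word for value in values for word in value.lower().split()}


def entity_search(query):
    """Phase 1 (recall): keep every profile sharing at least one word with the query."""
    wanted = tokens(query.values())
    return [pid for pid, profile in PROFILES.items() if wanted & tokens(profile.values())]


def refined_match(query, candidates, threshold=0.75):
    """Phase 2 (precision): score candidates key by key; return one ID or None."""
    best_id, best_score = None, 0.0
    for pid in candidates:
        profile = PROFILES[pid]
        agreeing = sum(
            1 for key, value in query.items()
            if profile.get(key, "").lower() == value.lower()
        )
        score = agreeing / len(query)
        if score > best_score:
            best_id, best_score = pid, score
    return best_id if best_score >= threshold else None


def lookup(query):
    """Run both phases and return a single identifier, or None for no match."""
    return refined_match(query, entity_search(query))
```

Calling `lookup({"name": "Oracle Corporation", "type": "company"})` passes all three toy profiles through the first phase and keeps only `id-1` after the second. A query with too little detail, such as `{"name": "Oracle"}`, yields no match.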
- 17:33:19 [cygri]
- slide 16
- 17:33:24 [cygri]
- heiko: bridging to database integration
- 17:33:44 [cygri]
- ... expose two DBs as two knowledge bases (graph)
- 17:34:04 [cygri]
- ... typical approach for integration: owl:sameAs between records in diff DBs
- 17:34:07 [cygri]
- slide 17
- 17:34:24 [cygri]
- heiko: owl:sameAs has strong semantics, you forget where the data came from
- 17:34:58 [cygri]
- ... (slide 18) better: use same ID everywhere
- 17:35:08 [cygri]
- ... OKKAM ID as "mediator" in the middle
- 17:35:19 [cygri]
- ... without undesirable consequences of sameAs
- 17:35:21 [cygri]
- slide 19
- 17:36:45 [Zakim]
- + +44.131.208.aaee
- 17:36:46 [cygri]
- heiko: you can give local identifiers and then connect them to OKKAM ID
- 17:36:58 [cygri]
- ... then you can merge based on the ID, with desired semantic rules
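
A rough sketch of the "mediator" idea from slides 18 and 19, assuming two hypothetical databases whose local keys have already been mapped to one shared identifier. The table contents and the `urn:example:` identifier are placeholders, not real OKKAM data.

```python
from collections import defaultdict

# Hypothetical rows from two databases, each with its own primary keys.
CRM = {17: {"name": "ACME Ltd", "contact": "sales@acme.example"}}
SHOP = {"C-204": {"name": "Acme Limited", "orders": 12}}

# Assumed mapping from (source, local key) to one shared identifier.
# The identifier value is a placeholder, not a real OKKAM ID.
GLOBAL_ID = {
    ("crm", 17): "urn:example:entity-1",
    ("shop", "C-204"): "urn:example:entity-1",
}


def align(sources):
    """Group records by shared ID while keeping each record's source."""
    view = defaultdict(list)
    for source, table in sources.items():
        for key, row in table.items():
            shared = GLOBAL_ID.get((source, key))
            if shared is not None:
                view[shared].append({"source": source, "local_id": key, **row})
    return dict(view)


unified = align({"crm": CRM, "shop": SHOP})
```

Each record in `unified` still carries its `source` and `local_id`, so the two descriptions sit side by side under one identifier and it remains visible where each one came from.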
- 17:37:00 [cygri]
- slide 20
- 17:37:29 [cygri]
- heiko: a database alignment project with okkam
- 17:37:45 [cygri]
- ... client has bunch of databases
- 17:37:52 [cygri]
- ... want unified view
- 17:38:00 [cygri]
- ... convert them all to RDF
- 17:38:09 [cygri]
- ... use ENS to align
- 17:38:44 [cygri]
- ... so entities are linked without having to merge the graphs
- 17:38:55 [hhalpin]
- Zakim, aaee is hhalpin (sorry about disconnect!)
- 17:38:55 [Zakim]
- I don't understand you, hhalpin
- 17:39:19 [mhausenblas]
- Zakim, aaee is hhalpin
- 17:39:19 [Zakim]
- +hhalpin; got it
- 17:39:42 [cygri]
- slide 22
- 17:40:19 [cygri]
- heiko: in RDB you have PKs, so unique ID is often a number
- 17:40:24 [cygri]
- ... in RDF you need a URI
- 17:40:38 [mhausenblas]
- rrsagent, draft minutes
- 17:40:38 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:40:59 [cygri]
- ... ENS is the thing that can enable stepping from the RDB world to the RDF world
- 17:41:18 [cygri]
- ... afterwards, coreference is syntactically evident
- 17:41:42 [cygri]
- ... so okkam provides mapping between local ID and global OKKAM ID
- 17:41:59 [cygri]
- ... DERI has sig.ma application
- 17:42:22 [cygri]
- ... you can give it an okkam ID and it will give view on all data out there that uses the ID
- 17:42:27 [Souri]
- +q
- 17:42:35 [cygri]
- Q&A
- 17:43:08 [cygri]
- ericP: similar to Shared Names project? Concept Wiki?
- 17:43:25 [cygri]
- heiko: they do life science IDs, we do all domains
- 17:43:43 [cygri]
- ... they are vertical app
- 17:44:10 [cygri]
- ericP: different proteins are sometimes the same, sometimes not considered the same
- 17:44:25 [cygri]
- ... predicated similarity?
- 17:45:05 [cygri]
- heiko: frequently raised point... up to which point is X still the same when you start replacing all its parts?
- 17:45:11 [cygri]
- ... we don't deal with that kind of semantics
- 17:45:23 [Ashok]
- q+
- 17:45:37 [cygri]
- ... what's the same or not is in your knowledge base
- 17:46:29 [cygri]
- ... if you describe things differently from me, if we need insulation, we will have two different entities
- 17:47:24 [cygri]
- ericP: when I do SPARQL queries, should engine be aware of OKKAM?
- 17:47:50 [cygri]
- heiko: no SPARQL interface yet
- 17:48:11 [mhausenblas]
- ack Souri
- 17:49:22 [cygri]
- Souri: q related to goal of this WG... how do you do mapping in the DBs?
- 17:50:12 [cygri]
- heiko: that's up to mapping infrastructure. we just provide a URI. ENS is not a mapping layer between DB and RDF. ENS is ID management
- 17:50:16 [Zakim]
- -soeren
- 17:50:30 [jsequeda]
- q+
- 17:50:55 [cygri]
- Souri: do you hand an ID to the user, "build your DB using this"? or does user give all his IDs to the ENS?
- 17:51:55 [cygri]
- heiko: can do two things. first, whenever I create an entity, ENS assigns it an ID. when someone else wants to talk about same entity, ID is already there in the ENS
- 17:52:37 [cygri]
- ... second, we already have distributed data. you give data to the ENS, it gives you an ID (existing or newly created). repeat for different data sources, you get same ID
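
The "existing or newly created" behaviour heiko describes can be sketched as a get-or-create lookup. This is an in-memory stand-in under strong assumptions: `ToyNamingService` and its `resolve` method are hypothetical names, matching is plain equality on normalised key-value pairs (the real ENS does fuzzy entity matching), and the identifiers are placeholders.

```python
import uuid


class ToyNamingService:
    """In-memory stand-in for an identifier service (hypothetical, for illustration)."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def _key(description):
        # Assumption: two descriptions denote the same entity only if their
        # normalised key-value pairs are identical.
        return frozenset(
            (k.strip().lower(), str(v).strip().lower()) for k, v in description.items()
        )

    def resolve(self, description):
        """Return (identifier, created): reuse a known ID or mint a new one."""
        key = self._key(description)
        if key in self._ids:
            return self._ids[key], False
        identifier = f"urn:example:{uuid.uuid4()}"
        self._ids[key] = identifier
        return identifier, True
```

Resolving the same description from two data sources returns the same identifier, with `created` set to `True` only on the first call.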
- 17:52:59 [mhausenblas]
- ack Ashok
- 17:53:24 [cygri]
- Ashok: are okkam IDs URIs? what's the structure of the URI?
- 17:53:33 [cygri]
- heiko: yes, they are URIs
- 17:53:42 [HeikoStoermer]
- http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
- 17:53:50 [cygri]
- heiko: that's an okkam ID
- 17:54:21 [cygri]
- ... it's a UUID
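
A small sketch of pulling the UUID out of the identifier pasted above. It assumes, from this single example only, that the last path segment is the literal prefix `ok` followed by a UUID. The function name `uuid_from_okkam_uri` is made up for illustration.

```python
import uuid
from urllib.parse import urlparse

EXAMPLE = "http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1"


def uuid_from_okkam_uri(uri):
    """Return the UUID embedded in an OKKAM-style URI, or None if it does not fit."""
    local_name = urlparse(uri).path.rsplit("/", 1)[-1]
    # Assumed layout: "ok" + UUID, inferred from the one example in the log.
    if not local_name.startswith("ok"):
        return None
    try:
        return uuid.UUID(local_name[2:])
    except ValueError:
        return None
```

The function returns `None` rather than raising for a URI that does not follow the assumed layout, so callers can treat "not an OKKAM-style ID" as an ordinary case.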
- 17:54:31 [mhausenblas]
- ack jsequeda
- 17:54:48 [angela_UNITN]
- you can aggregate data by okkamID using sig.ma for example
- 17:54:49 [angela_UNITN]
- http://sig.ma/search?q=http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
- 17:55:34 [cygri]
- jsequeda: let's say i have legacy DB about companies with PKs. so I would map my PKs to okkam IDs?
- 17:55:56 [cygri]
- heiko: yes you want to have the okkam ID somewhere in your data, because then it's stable
- 17:56:03 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 17:56:23 [mhausenblas]
- Zakim, list attendees
- 17:56:23 [Zakim]
- As of this point the attendees have been Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren,
- 17:56:26 [cygri]
- ... either do it entity by entity, or use batch processor where you send the data to the ENS
- 17:56:27 [Zakim]
- ... cygri, whalb, [IPcaller], angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
- 17:56:35 [cygri]
- ... privacy issues of course
- 17:57:05 [cygri]
- jsequeda: how much disambiguation do you do? how tell apart oracle the company and oracle the DB?
- 17:57:24 [cygri]
- heiko: if you just have a string, we can do nothing for you. need more info in your record
- 17:57:52 [cygri]
- ... sometimes can fall back on global popularity. IBM the company vs IBM the band
- 17:58:49 [cygri]
- ... in practice, today: build a slightly more elaborate description of your entity; do it one by one; send query to ENS
- 17:59:19 [cygri]
- ... real examples from use case partners have sufficient detail
- 17:59:48 [cygri]
- ... structure of query: simplest is bag of words; more complex is key value pairs; easy to pull that from a DB and that helps us a great deal
- 18:00:15 [cygri]
- mhausenblas: further questions on the mailing list
- 18:00:18 [cygri]
- Topic: AOB
- 18:00:32 [ericP]
- +1
- 18:00:35 [cygri]
- mhausenblas: no telecon on december 22nd and 29th
- 18:00:41 [Marcelo]
- +1
- 18:00:49 [cygri]
- PROPOSAL: reconvene jan 5th
- 18:00:57 [mhausenblas]
- http://www.w3.org/2001/sw/rdb2rdf/wiki/ScribeList
- 18:01:09 [cygri]
- next scribe is Souri
- 18:02:02 [cygri]
- microsoft patent ... apparently does not come from SQL Server team but perhaps Live Search
- 18:03:41 [jsequeda]
- Email on the New York Semantic Web mailing list
- 18:03:42 [jsequeda]
- Actually it's not a patent yet, just an application. The USPTO is looking at ways to improve discovery of prior art, and has a pilot program where you can participate in the examination process. So if you know of prior art, post it here:
- 18:03:46 [jsequeda]
- http://www.peertopatent.org/
- 18:04:58 [MacTed]
- there is a date that "prior art" must exist before, associated with the patent ... but I forget whether that's the "submission date" or something else
- 18:06:17 [Souri]
- Oracle has a paper in VLDB 2005
- 18:06:43 [Zakim]
- -MacTed
- 18:06:45 [mhausenblas]
- [adjourned]
- 18:06:50 [Zakim]
- -Souri
- 18:06:51 [Zakim]
- -mhausenblas
- 18:06:52 [Zakim]
- -[IPcaller.a]
- 18:06:53 [Zakim]
- -[IPcaller]
- 18:06:53 [Zakim]
- -EricP
- 18:06:54 [Zakim]
- -Ashok_Malhotra
- 18:06:54 [Zakim]
- -Seema
- 18:06:55 [Zakim]
- -whalb
- 18:06:56 [mhausenblas]
- RRSAgent, draft minutes
- 18:06:56 [RRSAgent]
- I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
- 18:06:59 [Zakim]
- -jsequeda
- 18:07:00 [Zakim]
- -angela_UNITN
- 18:07:10 [Zakim]
- -hhalpin
- 18:07:12 [Zakim]
- -HeikoStoermer
- 18:07:13 [Zakim]
- SW_RDB2RDF()12:00PM has ended
- 18:07:15 [Zakim]
- Attendees were Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren, cygri, whalb, [IPcaller],
- 18:07:17 [Zakim]
- ... angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
- 18:08:33 [mhausenblas]
- Zakim, bye
- 18:08:33 [Zakim]
- Zakim has left #rdb2rdf
- 18:08:39 [mhausenblas]
- RRSAgent, bye
- 18:08:39 [RRSAgent]
- I see no action items