IRC log of RDB2RDF on 2009-12-15

Timestamps are in UTC.

16:59:21 [RRSAgent]
RRSAgent has joined #RDB2RDF
16:59:21 [RRSAgent]
logging to http://www.w3.org/2009/12/15-RDB2RDF-irc
16:59:23 [trackbot]
RRSAgent, make logs world
16:59:23 [Zakim]
Zakim has joined #RDB2RDF
16:59:25 [trackbot]
Zakim, this will be 7322733
16:59:25 [Zakim]
ok, trackbot; I see SW_RDB2RDF()12:00PM scheduled to start in 1 minute
16:59:26 [trackbot]
Meeting: RDB2RDF Working Group Teleconference
16:59:26 [trackbot]
Date: 15 December 2009
16:59:36 [Ashok]
Ashok has joined #rdb2rdf
16:59:42 [mhausenblas]
Chair: Michael
16:59:57 [soeren]
soeren has joined #RDB2RDF
17:00:02 [mhausenblas]
Agenda: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2009Dec/0008.html
17:00:16 [jsequeda]
jsequeda has joined #RDB2RDF
17:00:25 [Marcelo]
Marcelo has joined #rdb2rdf
17:01:04 [Ashok]
Do we have telcons on Dec 22 and 27 ?
17:01:15 [mhausenblas]
no, Ashok ;)
17:01:22 [ericP]
slackers
17:01:27 [MacTed]
Zakim, this is 7322733
17:01:27 [Zakim]
ok, MacTed; that matches SW_RDB2RDF()12:00PM
17:01:31 [MacTed]
Zakim, who's here?
17:01:31 [Zakim]
On the phone I see Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, [IPcaller], EricP, OpenLink_Software
17:01:34 [Zakim]
On IRC I see Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
17:01:46 [Zakim]
+mhausenblas
17:01:48 [Ashok]
Thanks, Michael!
17:01:48 [MacTed]
Zakim, OpenLink_Software is temporarily MacTed
17:01:48 [Zakim]
+MacTed; got it
17:01:51 [MacTed]
Zakim, mute me
17:01:51 [Zakim]
MacTed should now be muted
17:01:57 [soeren]
zakim, [IPcaller] is soeren
17:01:57 [Zakim]
+soeren; got it
17:02:01 [cygri]
cygri has joined #rdb2rdf
17:02:03 [whalb]
zakim, aaa is me
17:02:03 [Zakim]
sorry, whalb, I do not recognize a party named 'aaa'
17:02:09 [mhausenblas]
Zakim, cygri is with me
17:02:09 [Zakim]
+cygri; got it
17:02:11 [whalb]
zakim, aaaa is me
17:02:11 [Zakim]
+whalb; got it
17:02:15 [mhausenblas]
scribenick: cygri
17:02:23 [mhausenblas]
RRSAgent, draft minutes
17:02:23 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:02:30 [mhausenblas]
rrsagent, make logs public
17:02:40 [mhausenblas]
Zakim, who's here?
17:02:41 [Zakim]
On the phone I see Seema, whalb, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, soeren, EricP, MacTed (muted), mhausenblas
17:02:46 [Zakim]
mhausenblas has mhausenblas, cygri
17:02:50 [Zakim]
On IRC I see cygri, Marcelo, jsequeda, soeren, Ashok, Zakim, RRSAgent, angela_UNITN, whalb, Seema, Souri, HeikoStoermer, mhausenblas, MacTed, iv_an_ru, trackbot, ericP
17:03:19 [Zakim]
+[IPcaller]
17:03:30 [cygri]
Topic: Admin
17:04:08 [MacTed]
MacTed = Ted Thibodeau
17:04:16 [MacTed]
correct
17:04:46 [angela_UNITN]
aacc is me
17:04:53 [MacTed]
Zakim, aacc is angela_UNITN
17:04:53 [Zakim]
+angela_UNITN; got it
17:04:54 [angela_UNITN]
aadd is heiko
17:05:08 [MacTed]
Zakim, aadd is HeikoStoermer
17:05:08 [Zakim]
+HeikoStoermer; got it
17:05:13 [HeikoStoermer]
right
17:05:33 [Zakim]
+[IPcaller.a]
17:05:46 [jsequeda]
Zakim, aabb is jsequeda
17:05:46 [Zakim]
+jsequeda; got it
17:05:49 [mhausenblas]
present+ Orri
17:06:18 [cygri]
PROPOSAL: Accept the minutes of the 8 December 2009 telecon,
17:06:19 [cygri]
http://www.w3.org/2009/12/08-RDB2RDF-minutes.html
17:06:24 [whalb]
+1
17:06:26 [Marcelo]
+1
17:06:28 [cygri]
+1
17:06:30 [soeren]
+1
17:06:48 [cygri]
RESOLVED: Accept the minutes of the 8 December 2009 telecon
17:06:55 [cygri]
Topic: Use Case planning
17:07:14 [cygri]
mhausenblas: http://www.w3.org/2001/sw/rdb2rdf/wiki/Use_Cases_and_Requirements
17:07:23 [cygri]
mhausenblas: invite ppl to add their use cases
17:08:12 [cygri]
Ashok: format? HTML or only wiki?
17:08:52 [cygri]
mhausenblas: initially collaborate on the wiki, then turn into proper WG Note with help of EricP
17:09:08 [cygri]
Soeren: present use cases as database schemas?
17:09:48 [cygri]
mhausenblas: rather keep it on user level, e.g., "we have a web shop..."
17:10:05 [cygri]
or "combine crm system with web shop"
17:10:34 [cygri]
for now, it's structured brainstorming
17:11:05 [cygri]
number of use cases we're aiming at?
17:11:22 [cygri]
EricP: a size that we can manage
17:11:32 [cygri]
Topic: Presentation - Okkam/ENS
17:11:50 [cygri]
http://www.w3.org/2001/sw/rdb2rdf/wiki/images/c/cf/Okkam.pdf
17:12:03 [cygri]
Heiko Störmer is presenting
17:12:35 [cygri]
work is part of OKKAM, EU project
17:12:49 [cygri]
ENS -- Entity Naming System
17:12:59 [mhausenblas]
s/Heiko Störmer/Heiko Stoermer
17:13:05 [cygri]
thanks mhausenblas!
17:13:06 [cygri]
slide 2
17:13:14 [mhausenblas]
rrsagent, draft minutes
17:13:14 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:14:27 [cygri]
slide 3
17:14:43 [cygri]
ENS provides services for re-use of identifiers
17:14:58 [cygri]
several public services
17:15:33 [cygri]
ID search, ID creation, ID management (alternative IDs), create+update profiles of entities
17:15:57 [cygri]
scalable architecture
17:16:14 [cygri]
access through SOAP services, REST is coming
17:16:23 [cygri]
web frontends
17:16:59 [cygri]
slide 4
17:17:10 [cygri]
benefits from using ENS
17:18:08 [cygri]
heiko: easily retrieve all data attached to the same ID
17:18:13 [cygri]
thx ericP!
17:18:50 [cygri]
... maintain metadata about entities
17:19:10 [cygri]
... profile updates based on popularity
17:19:33 [cygri]
... application in business intelligence
17:19:39 [cygri]
... integrate data across systems
17:20:03 [cygri]
... potentially get links to stuff outside on the web for free
17:20:25 [cygri]
... e.g. other people talking about your product (SAP use case)
17:20:27 [cygri]
slide 5
17:20:38 [cygri]
heiko: architecture
17:20:45 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:20:58 [cygri]
... storage
17:21:13 [cygri]
... lifecycle, e.g. ageing, merging, splitting of IDs
17:21:22 [cygri]
... entity matching (queries)
17:21:48 [cygri]
... access management: no mining queries ("give me all XYZ")
17:21:52 [cygri]
... access APIs
17:21:54 [cygri]
slide 6
17:21:59 [cygri]
heiko: scalability
17:22:24 [cygri]
... storage has distributed index, and distributed entity store, both clustered
17:22:37 [cygri]
... replication+sharding
17:22:42 [LeeF]
LeeF has joined #rdb2rdf
17:22:47 [cygri]
... solr
17:23:12 [cygri]
... ENS Core does life cycle etc, also clustered
17:23:24 [cygri]
slide 7
17:23:47 [cygri]
heiko: currently also working on offline processing
17:23:58 [cygri]
... batch processing, deduplication, data quality assessment etc
17:24:03 [hhalpin]
hhalpin has joined #rdb2rdf
17:24:12 [cygri]
slide 8
17:24:28 [cygri]
heiko: under development for 2 years, version 2 coming
17:24:51 [cygri]
... now at 7.5M records, system scales to 50M
17:25:12 [cygri]
... want to be at 50M records and capability of 500M at project end 06/2010
17:25:15 [cygri]
slide 9
17:25:23 [mhausenblas]
regrets+ Ben_Szekely
17:25:41 [mhausenblas]
regrets+ Nuno
17:25:54 [cygri]
heiko: entity repository = ID + attached entity description
17:25:59 [mhausenblas]
regrets+ Ahmed
17:26:03 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:26:21 [cygri]
slide 10
17:26:29 [cygri]
heiko: challenges
17:26:41 [cygri]
... no defined fixed schema, just vocabularies
17:26:57 [cygri]
... we don't define vocabularies
17:27:05 [cygri]
... users specify name-value pairs
17:27:19 [cygri]
... matching afterwards is difficult
17:27:50 [cygri]
... users can use whatever vocab they want, "professor" instead of "person", we must deal with that
17:27:53 [cygri]
slide 11
17:28:18 [cygri]
heiko: internal representation: XML documents with name-value pairs describing the entities
17:28:23 [cygri]
... and alternative identifiers
17:28:40 [cygri]
... can be interpreted as linked data style sameAs
17:28:43 [cygri]
... e.g. dbpedia URI
17:29:03 [cygri]
... API call for retrieving the canonical OKKAM ID for an alternative identifier
17:29:10 [cygri]
slide 12
17:29:21 [cygri]
heiko: current content of the repo
17:29:29 [cygri]
... wikipedia, geonames, manually created
17:29:35 [cygri]
... total 7.5M entities
17:29:47 [cygri]
... currently adding DBLP
17:30:14 [cygri]
... no restriction w.r.t. types of entities, we can manage everything
17:30:16 [cygri]
slide 13
17:30:51 [cygri]
heiko: entity ID search
17:31:03 [cygri]
... user submits key-value pairs as query
17:31:14 [cygri]
... query must be matched against profiles
17:31:25 [cygri]
... result is canonical identifier
17:31:40 [cygri]
... skip slide 14
17:31:42 [cygri]
slide 15
17:31:59 [cygri]
heiko: 2 phase process in search
17:32:19 [cygri]
... 1. entity search, 2. refined entity matching
17:32:36 [cygri]
... entity search is for recall, pull out everything that is relevant, that's fast
17:32:55 [cygri]
... refined matching then to increase precision, can be more expensive
17:33:06 [cygri]
... return match or no match
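
A minimal sketch of the two-phase search described above, assuming a small in-memory set of profiles. The profile data, the function names, the scoring rule, and the 0.75 threshold are all hypothetical illustrations, not the ENS implementation.

```python
from typing import Optional

# Hypothetical in-memory profiles: entity ID -> name-value pairs.
PROFILES = {
    "id-1": {"name": "IBM", "type": "company", "city": "Armonk"},
    "id-2": {"name": "IBM", "type": "band"},
    "id-3": {"name": "Oracle", "type": "company"},
}


def candidate_search(query: dict) -> list:
    """Phase 1 (recall): keep every profile sharing at least one value."""
    wanted = {str(v).lower() for v in query.values()}
    return [
        entity_id
        for entity_id, profile in PROFILES.items()
        if wanted & {str(v).lower() for v in profile.values()}
    ]


def refined_match(query: dict, candidates: list,
                  threshold: float = 0.75) -> Optional[str]:
    """Phase 2 (precision): score each candidate attribute by attribute."""
    best_id, best_score = None, 0.0
    for entity_id in candidates:
        profile = PROFILES[entity_id]
        hits = sum(
            1 for key, value in query.items()
            if str(profile.get(key, "")).lower() == str(value).lower()
        )
        score = hits / len(query)  # fraction of query pairs that agree
        if score > best_score:
            best_id, best_score = entity_id, score
    # Return a match only when the best candidate is good enough.
    return best_id if best_score >= threshold else None


def find_entity(query: dict) -> Optional[str]:
    """Cheap recall step first, then the more expensive precision step."""
    return refined_match(query, candidate_search(query))
```

The first phase deliberately over-selects, so the more expensive comparison only runs on a short candidate list.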
17:33:19 [cygri]
slide 16
17:33:24 [cygri]
heiko: bridging to database integration
17:33:44 [cygri]
... expose two DBs as two knowledge bases (graph)
17:34:04 [cygri]
... typical approach for integration: owl:sameAs between records in diff DBs
17:34:07 [cygri]
slide 17
17:34:24 [cygri]
heiko: owl:sameAs has strong semantics, you forget where the data came from
17:34:58 [cygri]
... (slide 18) better: use same ID everywhere
17:35:08 [cygri]
... OKKAM ID as "mediator" in the middle
17:35:19 [cygri]
... without undesirable consequences of sameAs
17:35:21 [cygri]
slide 19
17:36:45 [Zakim]
+ +44.131.208.aaee
17:36:46 [cygri]
heiko: you can give local identifiers and then connect them to OKKAM ID
17:36:58 [cygri]
... then you can merge based on the ID, with desired semantic rules
17:37:00 [cygri]
slide 20
17:37:29 [cygri]
heiko: a database alignment project with okkam
17:37:45 [cygri]
... client has bunch of databases
17:37:52 [cygri]
... want unified view
17:38:00 [cygri]
... convert them all to RDF
17:38:09 [cygri]
... use ENS to align
17:38:44 [cygri]
... so entities are linked without having to merge the graphs
17:38:55 [hhalpin]
Zakim, aaee is hhalpin (sorry about disconnect!)
17:38:55 [Zakim]
I don't understand you, hhalpin
17:39:19 [mhausenblas]
Zakim, aaee is hhalpin
17:39:19 [Zakim]
+hhalpin; got it
17:39:42 [cygri]
slide 22
17:40:19 [cygri]
heiko: in RDB you have PKs, so unique ID is often a number
17:40:24 [cygri]
... in RDF you need a URI
17:40:38 [mhausenblas]
rrsagent, draft minutes
17:40:38 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:40:59 [cygri]
... ENS is the thing that can enable stepping from the RDB world to the RDF world
17:41:18 [cygri]
... afterwards, coreference is syntactically evident
17:41:42 [cygri]
... so okkam provides mapping between local ID and global OKKAM ID
17:41:59 [cygri]
... DERI has sig.ma application
17:42:22 [cygri]
... you can give it an okkam ID and it will give view on all data out there that uses the ID
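
A minimal sketch of the alignment idea from the presentation: local primary keys from two databases are mapped to one shared global ID, and records are then grouped by that ID while keeping track of where each record came from. The rows, the example.org identifier, and the function name are hypothetical, and the mapping is assumed to have been obtained already.

```python
from collections import defaultdict

# Hypothetical rows from two databases, keyed by local primary key.
crm_rows = {17: {"name": "ACME Corp", "phone": "555-0100"}}
shop_rows = {"C-204": {"customer": "ACME Corp.", "orders": 3}}

# Hypothetical alignment result: (source, local PK) -> global identifier.
local_to_global = {
    ("crm", 17): "http://example.org/entity/e1",
    ("shop", "C-204"): "http://example.org/entity/e1",
}


def unified_view(sources: dict, mapping: dict) -> dict:
    """Group records by shared global ID, keeping each record's origin."""
    view = defaultdict(list)
    for source_name, rows in sources.items():
        for pk, row in rows.items():
            global_id = mapping.get((source_name, pk))
            if global_id is not None:  # unmapped rows are left out
                view[global_id].append({"source": source_name, "pk": pk, **row})
    return dict(view)


view = unified_view({"crm": crm_rows, "shop": shop_rows}, local_to_global)
```

Because every grouped record carries its `source` and `pk`, the records are linked through the shared ID without losing provenance.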
17:42:27 [Souri]
+q
17:42:35 [cygri]
Q&A
17:43:08 [cygri]
ericP: similar to Shared Names project? Concept Wiki?
17:43:25 [cygri]
heiko: they do life science IDs, we do all domains
17:43:43 [cygri]
... they are vertical app
17:44:10 [cygri]
ericP: different proteins are sometimes the same, sometimes not considered the same
17:44:25 [cygri]
... predicated similarity?
17:45:05 [cygri]
heiko: frequently raised point... up to which point is X still the same when you start replacing all its parts?
17:45:11 [cygri]
... we don't deal with that kind of semantics
17:45:23 [Ashok]
q+
17:45:37 [cygri]
... what's the same or not is in your knowledge base
17:46:29 [cygri]
... if you describe things differently from me, if we need insulation, we will have two different entities
17:47:24 [cygri]
ericP: when I do SPARQL queries, should engine be aware of OKKAM?
17:47:50 [cygri]
heiko: no SPARQL interface yet
17:48:11 [mhausenblas]
ack Souri
17:49:22 [cygri]
Souri: q related to goal of this WG... how do you do mapping in the DBs?
17:50:12 [cygri]
heiko: that's up to mapping infrastructure. we just provide a URI. ENS is not a mapping layer between DB and RDF. ENS is ID management
17:50:16 [Zakim]
-soeren
17:50:30 [jsequeda]
q+
17:50:55 [cygri]
Souri: do you hand an ID to the user, "build your DB using this"? or does user give all his IDs to the ENS?
17:51:55 [cygri]
heiko: can do two things. first, whenever I create an entity, ENS assigns it an ID. when someone else wants to talk about same entity, ID is already there in the ENS
17:52:37 [cygri]
... second, we already have distributed data. you give data to the ENS, it gives you an ID (existing or newly created). repeat for different data sources, you get same ID
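
A toy sketch of the "existing or newly created" behaviour described above, assuming exact matching on normalized name-value pairs. The class name, the example.org URI prefix, and the matching rule are hypothetical; the real service matches fuzzily against entity profiles.

```python
import uuid


class ToyEntityRegistry:
    """Toy stand-in for an ID service: same description -> same ID."""

    def __init__(self) -> None:
        self._ids = {}

    @staticmethod
    def _key(description: dict) -> frozenset:
        # Assumption: two descriptions denote the same entity only if their
        # name-value pairs are equal after trimming and lower-casing.
        return frozenset(
            (k.strip().lower(), str(v).strip().lower())
            for k, v in description.items()
        )

    def get_or_create(self, description: dict) -> str:
        """Return the existing ID for this description, or mint a new one."""
        key = self._key(description)
        if key not in self._ids:
            self._ids[key] = f"http://example.org/entity/{uuid.uuid4()}"
        return self._ids[key]
```

Submitting the same description from two data sources returns the same identifier, which is what lets the sources be linked afterwards.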
17:52:59 [mhausenblas]
ack Ashok
17:53:24 [cygri]
Ashok: are okkam IDs URIs? what's the structure of the URI?
17:53:33 [cygri]
heiko: yes, they are URIs
17:53:42 [HeikoStoermer]
http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
17:53:50 [cygri]
heiko: that's an okkam ID
17:54:21 [cygri]
... it's a UUID
17:54:31 [mhausenblas]
ack jsequeda
17:54:48 [angela_UNITN]
you can aggregate data by okkamID using sig.ma for example
17:54:49 [angela_UNITN]
http://sig.ma/search?q=http://www.okkam.org/entity/ok5f23a5ce-a683-4c4d-ae73-b78cdc17aec1
17:55:34 [cygri]
jsequeda: let's say i have legacy DB about companies with PKs. so I would map my PKs to okkam IDs?
17:55:56 [cygri]
heiko: yes you want to have the okkam ID somewhere in your data, because then it's stable
17:56:03 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
17:56:23 [mhausenblas]
Zakim, list attendees
17:56:23 [Zakim]
As of this point the attendees have been Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren,
17:56:26 [cygri]
... either do it entity by entity, or use batch processor where you send the data to the ENS
17:56:27 [Zakim]
... cygri, whalb, [IPcaller], angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
17:56:35 [cygri]
... privacy issues of course
17:57:05 [cygri]
jsequeda: how much disambiguation do you do? how do you tell apart Oracle the company and Oracle the DB?
17:57:24 [cygri]
heiko: if you just have a string, we can do nothing for you. need more info in your record
17:57:52 [cygri]
... sometimes can fall back on global popularity. IBM the company vs IBM the band
17:58:49 [cygri]
... in practice, today: build a slightly more elaborate description of your entity; do it one by one; send query to ENS
17:59:19 [cygri]
... real examples from use case partners have sufficient detail
17:59:48 [cygri]
... structure of query: simplest is bag of words; more complex is key value pairs; easy to pull that from a DB and that helps us a great deal
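
A minimal sketch of pulling the two query shapes mentioned above (key-value pairs and a bag of words) out of a database row, using Python's standard `sqlite3` module. The table, its columns, the sample row, and the helper names are hypothetical, and no call to the ENS itself is shown.

```python
import sqlite3

# Hypothetical legacy table with a local primary key.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT, url TEXT)"
)
conn.execute("INSERT INTO company VALUES (1, 'Oracle', 'Redwood Shores', NULL)")


def row_to_query(row: sqlite3.Row, skip=("id",)) -> dict:
    """Turn a row into key-value pairs, dropping the local PK and NULLs."""
    return {
        key: row[key]
        for key in row.keys()
        if key not in skip and row[key] is not None
    }


def row_to_bag_of_words(row: sqlite3.Row, skip=("id",)) -> list:
    """Simplest query form: just the words, without the column names."""
    return [
        word
        for value in row_to_query(row, skip).values()
        for word in str(value).split()
    ]


row = conn.execute("SELECT * FROM company WHERE id = 1").fetchone()
```

The key-value form keeps the column names, which gives the matcher more to work with than the bare words do.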
18:00:15 [cygri]
mhausenblas: further questions on the mailing list
18:00:18 [cygri]
Topic: AOB
18:00:32 [ericP]
+1
18:00:35 [cygri]
mhausenblas: no telecon on december 22nd and 29th
18:00:41 [Marcelo]
+1
18:00:49 [cygri]
PROPOSAL: reconvene jan 5th
18:00:57 [mhausenblas]
http://www.w3.org/2001/sw/rdb2rdf/wiki/ScribeList
18:01:09 [cygri]
next scribe is Souri
18:02:02 [cygri]
microsoft patent ... apparently does not come from SQL Server team but perhaps Live Search
18:03:41 [jsequeda]
Email on the New York Semantic Web mailing list
18:03:42 [jsequeda]
Actually it's not a patent yet, just an application. The USPTO is looking at ways to improve discovery of prior art, and has a pilot program where you can participate in the examination process. So if you know of prior art, post it here:
18:03:46 [jsequeda]
http://www.peertopatent.org/
18:04:58 [MacTed]
there is a date that "prior art" must exist before, associated with the patent ... but I forget whether that's the "submission date" or something else
18:06:17 [Souri]
Oracle has a paper in VLDB 2005
18:06:43 [Zakim]
-MacTed
18:06:45 [mhausenblas]
[adjourned]
18:06:50 [Zakim]
-Souri
18:06:51 [Zakim]
-mhausenblas
18:06:52 [Zakim]
-[IPcaller.a]
18:06:53 [Zakim]
-[IPcaller]
18:06:53 [Zakim]
-EricP
18:06:54 [Zakim]
-Ashok_Malhotra
18:06:54 [Zakim]
-Seema
18:06:55 [Zakim]
-whalb
18:06:56 [mhausenblas]
RRSAgent, draft minutes
18:06:56 [RRSAgent]
I have made the request to generate http://www.w3.org/2009/12/15-RDB2RDF-minutes.html mhausenblas
18:06:59 [Zakim]
-jsequeda
18:07:00 [Zakim]
-angela_UNITN
18:07:10 [Zakim]
-hhalpin
18:07:12 [Zakim]
-HeikoStoermer
18:07:13 [Zakim]
SW_RDB2RDF()12:00PM has ended
18:07:15 [Zakim]
Attendees were Seema, +43.316.876.aaaa, +1.562.249.aabb, +039046188aacc, +39.046.188.aadd, Ashok_Malhotra, Souri, EricP, mhausenblas, MacTed, soeren, cygri, whalb, [IPcaller],
18:07:17 [Zakim]
... angela_UNITN, HeikoStoermer, jsequeda, +44.131.208.aaee, hhalpin
18:08:33 [mhausenblas]
Zakim, bye
18:08:33 [Zakim]
Zakim has left #rdb2rdf
18:08:39 [mhausenblas]
RRSAgent, bye
18:08:39 [RRSAgent]
I see no action items