RDB2RDF -- 30 Oct 2008

<Ashok> Wolfgang, are you there?

<whalb> hi, i'm on irc only

<Ashok> Which room did you book at the hotel?

<whalb> the small meeting room

<Ashok> I'm wondering if we can get the bigger room?

<whalb> should be no problem, i suppose

<Ashok> You will have to speak with the hotel ... they will only speak to you since you are the organizer

<whalb> ok, so you want the bigger room?

<Ashok> yes, that would be nice

<Ashok> can you phone them?

<whalb> just called - the bigger room is occupied for today

<Ashok> Please ask for tomorrow

<whalb> tomorrow is no problem - will cost eur 75 then instead of eur 45

<whalb> i can also try to convince the lady at the front desk to move to the bigger room today in the afternoon maybe?

<Ashok> That would be great!

<whalb> so for this afternoon and tomorrow?

<Ashok> BTW, where is Michael?

<Ashok> We need a projector

<whalb> he will be there with the projector soon

<Ashok> RDB2RDF XG f2f Agenda 1. Welcome and introductions 2. Appointment of minute taker 3. Approve minutes of October 10, 2008 meeting 5. Agenda bashing 6. Action Items: - Jenny/Cathy to send out use case to the group. - Orri to update the requirements document 7. Satya Sahoo and Wolfgang Halb – Update on Literature Survey http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt 8. Usecase discussion · Automatic mapping from RDB schema to R

<Ashok> Wolfgang, please phone Michael .... we need the projector

Approval of minutes of 10 Oct

<ericP> APPROVED: no objections

Next telecon

<Soeren> http://docs.google.com/Presentation?docid=dg6b38v4_3dnqsk7dc

<Soeren> that's the link to the triplify presentation

<ericP> PROPOSED: 6Nov (13Nov cancelled)

<ericP> APPROVED: no objections

Action Items

<ericP> Jenn: we have @@@ documents back now

<ericP> ... contract for getting Ordnance Survey data will be sent to the XG

<ericP> ... will need inspection by XG participants

<whalb> i just called the hotel and have arranged the bigger room for tomorrow

<whalb> unfortunately it is not possible to have bigger room today, sorry

Soeren's presentation

<whalb> michael should be there soon

<ericP> [google presentation clinic ensues]

<whalb> 6Nov is a Thursday? do you mean 7Nov and cancel 14Nov telecon?

<Soeren> http://docs.google.com/Presentation?id=dg6b38v4_3dnqsk7dc

<Ashok> Right!

<Ashok> So, the correct telcons details are: Cancel Nov 14 Telcon. Next telcon Nov 7

<ericP> [slide showing slow growth of SW]

<ericP> Soeren: if we put data in the SW, search engines will index it in structured ways

<ericP> ... most indexed docs are no longer simple html

<ericP> ... idea of Triplify is to make it easy for custodians of relational data to publish their data on the SW

<ericP> idea was to e

<ericP> Soeren:

<ericP> ... simple configuration -- didn't invent a new scheme

<ericP> ... tested with MySQL

<ericP> ... should work with other DBs

<ericP> ... configured with SQL queries

<ericP> [ex with wordpress]

<ericP> Soeren: re-using existing vocabularies is done by renaming the result columns

<ericP> ... URI and literal constraints are appended to the generated SQL

<ericP> ... can use aggregates and transformations (e.g. SHA1)
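To make the Triplify points above concrete, here is a minimal Python sketch of the idea. It is not Triplify's actual (PHP) configuration format. It assumes a hypothetical `users` table, a hypothetical base URI, and a `CONFIG` dict holding one SQL query per class. Result columns are renamed to existing vocabulary terms such as `foaf:name`, a `WHERE id = ?` constraint is appended when a single URI is dereferenced, and SHA1 is registered as an SQL function so the query can emit `foaf:mbox_sha1sum` instead of a raw e-mail address.

```python
import hashlib
import sqlite3

BASE = "http://example.org/triplify/"  # hypothetical base URI

# Hypothetical Triplify-style configuration: one SQL query per class.
# The first column is the row id; the other columns are renamed (aliased)
# to terms from existing vocabularies such as FOAF.
CONFIG = {
    "person": ('SELECT id, name AS "foaf:name", '
               "sha1('mailto:' || email) AS \"foaf:mbox_sha1sum\" "
               "FROM users"),
}


def triples(con, cls, row_id=None):
    """Yield (subject, predicate, object) triples for one configured class."""
    sql, params = CONFIG[cls], ()
    if row_id is not None:
        # Dereferencing one URI: append a constraint to the configured SQL.
        # Assumes the configured query has no WHERE clause of its own.
        sql += " WHERE id = ?"
        params = (row_id,)
    cur = con.execute(sql, params)
    columns = [d[0] for d in cur.description]  # aliases become predicates
    for row in cur:
        subject = f"{BASE}{cls}/{row[0]}"
        for predicate, value in zip(columns[1:], row[1:]):
            yield subject, predicate, value


con = sqlite3.connect(":memory:")
# Register SHA1 as an SQL function so configured queries can transform values.
con.create_function("sha1", 1, lambda s: hashlib.sha1(s.encode()).hexdigest())
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "Alice", "alice@example.org"), (2, "Bob", "bob@example.org")])
```

Because the mapping is plain SQL, aggregates or other transformations would be added the same way, inside the configured query.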

<ericP> ... ...

<ericP> ... we need logs of RDB updates so a crawler can hit those changes

<ericP> Ashok: could introduce this need to DAWG2

<ericP> ...

<ericP> Soeren: [wordpress details on crawling updates]

<ericP> ... you can annotate updates (e.g. provenance)

<ericP> ... crawlers can discriminate
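A sketch of the update-log idea, assuming a hypothetical in-memory `UpdateLog` in which each entry carries a timestamp, the URI of the changed resource and a provenance annotation. A crawler asks only for changes since its last visit and can discriminate by passing its own filter; none of these names come from Triplify.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Update:
    """One entry in a hypothetical RDB update log exposed to crawlers."""
    timestamp: datetime
    resource: str    # URI of the changed resource
    provenance: str  # annotation, e.g. which user or tool made the change


class UpdateLog:
    def __init__(self):
        self.entries = []

    def record(self, update):
        self.entries.append(update)

    def since(self, last_crawl, accept=lambda u: True):
        """Updates newer than last_crawl that pass the crawler's own filter."""
        return [u for u in self.entries
                if u.timestamp > last_crawl and accept(u)]
```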

<ericP> ... working on OpenStreetMap

<ericP> ... 160 GB of geo data

<ericP> ... already deployed triplify for this data

<ericP> ... the schema is structured in terms of points

<ericP> ... Triplify configuration would need to know the ID

<ericP> ... we are envisioning a spatial extension

<ericP> ... URL-encode long,lat and radius
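One way the envisioned spatial extension could work, sketched under assumptions: the URL layout `.../near/<lon>,<lat>/<radius in metres>` is hypothetical, and points are filtered with the haversine great-circle distance on a spherical Earth. In a real deployment the filter would be pushed down into SQL rather than run in Python.

```python
import math
import re

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

# Hypothetical URL layout: .../near/<lon>,<lat>/<radius in metres>
NEAR = re.compile(r"/near/(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)$")


def parse_near(url):
    """Decode (lon, lat, radius) from a spatial URL."""
    m = NEAR.search(url)
    if m is None:
        raise ValueError(f"not a spatial URL: {url}")
    return tuple(float(g) for g in m.groups())


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def points_near(url, points):
    """IDs of points (id -> (lon, lat)) within the radius encoded in url."""
    lon, lat, radius = parse_near(url)
    return [pid for pid, (plon, plat) in points.items()
            if haversine_m(lon, lat, plon, plat) <= radius]
```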

<ericP> ... in some places, OpenStreetMap is much more detailed than Google

<ericP> ... I find Virtuoso and Triplify complementary

<ericP> ... haven't experienced performance issues 'cause the queries are pushed down

<jsequeda> http://docs.google.com/Presentation?id=df7pbdbw_463cspkrfdh

Juan's preso

<ericP> [slide 2]

<ericP> Juan: need a motivation to create linked data

<ericP> ... this direct mapping is an easy way to get this started

<ericP> [slide 3]

<ericP> Juan: have a survey of folks who have used a direct mapping

<ericP> ... have worked out completeness

<ericP> ... demo at ESWC

<ericP> [slide 4]

<ericP> [slide 5]

<ericP> Juan: everyone does this the same way

<ericP> ... we formalized this obvious stuff
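The "obvious" direct mapping can be sketched in a few lines. This is an illustration under assumptions, not Juan's formalization: a hypothetical base URI, one class per table, one subject URI per row built from its primary key, one property per column, foreign-key values turned into links to the referenced row, and NULLs producing no triple.

```python
BASE = "http://example.org/db/"  # hypothetical base URI


def row_uri(table, pk):
    return f"{BASE}{table}/{pk}"


def direct_mapping(schema, rows):
    """schema: table -> {"pk": column, "fks": {column: referenced table}}
    rows: table -> list of dicts (column -> value)."""
    triples = []
    for table, records in rows.items():
        meta = schema[table]
        for rec in records:
            s = row_uri(table, rec[meta["pk"]])
            # rdf:type kept as a CURIE for brevity; the class is the table.
            triples.append((s, "rdf:type", f"{BASE}{table}"))
            for col, val in rec.items():
                if val is None:
                    continue  # NULLs produce no triple
                p = f"{BASE}{table}#{col}"
                if col in meta["fks"]:
                    # Foreign keys become links to the referenced row's URI.
                    triples.append((s, p, row_uri(meta["fks"][col], val)))
                else:
                    triples.append((s, p, val))
    return triples
```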

<ericP> [slide 6]

<ericP> [slide 8]

<ericP> Juan: the idea is to shift the problem from an RDB-ontology mapping problem to an ontology-ontology mapping problem

<ericP> [slide 10]

<ericP> Juan's slides

EricP demo

<Ashok> Eric shows demo

<Ashok> Has a paper that shows completeness

<Ashok> Pushes query down to SQL
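As a rough illustration of pushing a query down to SQL, the sketch below translates a star-shaped basic graph pattern (one subject variable, predicates of the hypothetical form `BASE + table + "#" + column`, as a direct mapping might produce) into a single SQL SELECT. It handles only one table and ignores joins, OPTIONAL and FILTER, so it is far from a complete SPARQL-to-SQL rewriter.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI of a direct mapping


def bgp_to_sql(patterns):
    """Translate a star-shaped basic graph pattern into one SQL query.

    patterns: (subject, predicate, object) tuples sharing one subject
    variable; each predicate is BASE + table + "#" + column, and each
    object is either a variable such as "?n" or a constant string.
    """
    subjects = {s for s, _, _ in patterns}
    tables, select, where, params = set(), [], [], []
    for _, predicate, obj in patterns:
        table, column = predicate[len(BASE):].split("#")
        tables.add(table)
        if obj.startswith("?"):
            select.append(f"{column} AS {obj[1:]}")  # variable: project it
        else:
            where.append(f"{column} = ?")  # constant: filter on it
            params.append(obj)
    if len(subjects) != 1 or len(tables) != 1:
        raise ValueError("this sketch only handles one subject over one table")
    # Identifiers come from a trusted mapping, so they are inlined here.
    sql = f"SELECT {', '.join(select) or '1'} FROM {tables.pop()}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Ann", "Oxford"), (2, "Bob", "Leipzig")])
```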

<Ashok> Ashok: Net this out for us ... what does this tell us?

<Ashok> Eric: I have another paper that is probably better for you guys

<ericP>

<ericP> LOD-friendly

<Ashok> Above has mapping rules to be friendly to linked data

Michael: I have now put the presentations so far at http://esw.w3.org/topic/Rdb2RdfXG/FirstF2F

<jsequeda> http://docs.google.com/Presentation?id=df7pbdbw_484hm92qccx Chris' presentation

BSBM [Chris]

Michael: I've updated http://esw.w3.org/topic/Rdb2RdfXG/FirstF2F to point to Soeren's presentation as well (GDocs)

<ericP> ACTION: ChrisB to email the XG list to solicit other SPARQL-SQL rewriters to test [recorded in http://www.w3.org/2008/10/30-RDB2RDF-minutes.html#action01]

<ericP> Chris: [re generating URIs]

<ericP> ... we have D2RQ and D2R Server

<ericP> ... in the mapping file, you can create URIs using patterns

<ericP> ... there is a placeholder which is filled with a value from the database

<ericP> ... you can also use any (invertible) translation function you like

<ericP> Ashok: so when I specify the mapping, I can write my own function, or otherwise use an existing one?

<ericP> Chris: as long as it's invertible
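The URI-pattern mechanism Chris describes can be sketched as follows. The `@@Table.Column@@` placeholder syntax mirrors D2RQ's `uriPattern`, but `fill` and `invert` are hypothetical helpers, not D2RQ code. Inverting a pattern is what lets a request for a URI be turned back into a database lookup, which is why the translation has to be invertible; here that is assumed to hold when no value contains the literal text that follows its placeholder.

```python
import re

PLACEHOLDER = re.compile(r"@@([A-Za-z_]\w*\.[A-Za-z_]\w*)@@")


def fill(pattern, row):
    """Replace each @@Table.Column@@ placeholder with the row's value."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)


def invert(pattern, uri):
    """Recover column values from a URI generated by `pattern`, or None."""
    regex, columns, pos = "", [], 0
    for m in PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders must match exactly.
        regex += re.escape(pattern[pos:m.start()]) + "(.+?)"
        columns.append(m.group(1))
        pos = m.end()
    regex += re.escape(pattern[pos:])
    match = re.fullmatch(regex, uri)
    if match is None:
        return None
    return dict(zip(columns, match.groups()))
```

Recovered values come back as strings; converting them to column types is left to the caller.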

OWL

<ericP> Bijan:

<ericP> mhausenblas: could you liaise with OWL2?

<ericP> Bijan: heading into Last Call (LC). useful to get comments [early]

<ericP> ... my part of OWL is based on DLs

<Ashok> XG may want to comment on OWL 2 specs

<ericP> ... which is meant to reason over UML and relational DBs

<ericP> ... so you can work at the conceptual level of your diagram

<ericP> ... you get reasoning functions to help with your conceptual models

<ericP> ... want to ask Q's of your conceptual view and have it work on your RDB

<ericP> ... can also use OWL for data-enrichment instead of reconciliation

<ericP> ... TAMBIS project had a bunch of bio databases

<ericP> ... provided faceted browsing (as opposed to conjunctive queries)

<ericP> ... i understand we are working on a lang to go bi-directional between RDBs and OWL

<ericP> ... don't want to compete on the syntax

<ericP> ... want the user to have confidence, and to know what things they are trading off when picking a store

<ericP> ... also might be nice if it were usable

<ericP> ... some proponents of using SQL as a mapping language

<ericP> ... RDBs traditionally do model checking

<ericP> ... as in, you have a model in memory and you just check it

<ericP> ... when you go to RDFS, you can't do this so easily

<ericP> ... when you get to OWL in full glory, you can have very different data which fits the model (i.e. more than one model to check)

<ericP> ... (this is why OWL is good for ont-mapping)

<ericP> ... OWL folks working on profiles with expressivity/speed balances

<ericP> ... OWL-QL (least expressive) can be expanded into datalog only by representing the rule heads

<ericP> ... OWL-RL (primarily from Oracle) looks at the set of OWL which can be expressed in datalog

<ericP> ... since you can do this by expansion of queries, you can do distributed query answering

<ericP> ... you send your query to different DBs; the data can overlap. it will rewrite your query from the global schema into local schemas
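A much-simplified sketch of rewriting a query from the global schema into local ones: a query for instances of a class is expanded with every entailed subclass and then unioned over the source queries mapped to each class. The class hierarchy, the `SOURCES` table and the SQL strings are hypothetical, and real OWL QL rewriting also handles properties, existentials and conjunctive queries.

```python
# Hypothetical global ontology: subclass axioms (subclass -> superclass).
SUBCLASS_OF = {"Student": "Person", "Professor": "Person", "PhDStudent": "Student"}

# Hypothetical mappings from global classes to queries over local sources.
SOURCES = {
    "Person": ["SELECT id FROM hr.people"],
    "Student": ["SELECT sid FROM registry.students"],
    "PhDStudent": ["SELECT pid FROM grad.candidates"],
}


def subclasses(cls):
    """All classes entailed to be subclasses of cls, including cls itself."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF.items():
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def rewrite(cls):
    """Expand a query for instances of cls into a union over local sources."""
    queries = []
    for c in sorted(subclasses(cls)):
        queries.extend(SOURCES.get(c, []))
    return "\nUNION\n".join(queries)
```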

<ericP> ... you can download the QuOnto DL engine

<ericP> ... meta: here are some theoretical results; the group should focus on commercializing/standardizing this

<ericP> ... EL is a cut down DL

<ericP> ... you can use construct arbitrarily

<ericP> ... useful in Bio where they want rich ontologies

<ericP> ... in relational, you need to massage the data to extend it to the EL model

<ericP> ericP: you mean, add transitive closure tables?

<ericP> Bijan: by analogy yes, perhaps not specifically

<ericP> ... there's a Horrocks/Perez paper which generates the minimal datalog

<ericP> ... recursive datalog is expressible in SQL99, but you may not want to
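To illustrate the last point, SQL:1999 recursive common table expressions can evaluate a recursive Datalog rule directly. This sketch uses Python's built-in `sqlite3` (SQLite supports `WITH RECURSIVE`) and a hypothetical `part_of` table; the alternative ericP mentions is to materialize the same result into a transitive-closure table ahead of time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical 'part_of' relation, e.g. from an anatomy ontology.
con.execute("CREATE TABLE part_of (child TEXT, parent TEXT)")
con.executemany("INSERT INTO part_of VALUES (?, ?)",
                [("finger", "hand"), ("hand", "arm"), ("arm", "body")])

# Datalog:  anc(X, Y) :- part_of(X, Y).
#           anc(X, Z) :- part_of(X, Y), anc(Y, Z).
CLOSURE = """
WITH RECURSIVE anc(child, ancestor) AS (
    SELECT child, parent FROM part_of
    UNION
    SELECT p.child, a.ancestor
    FROM part_of AS p JOIN anc AS a ON p.parent = a.child
)
SELECT ancestor FROM anc WHERE child = ? ORDER BY ancestor
"""


def ancestors(node):
    """All transitive ancestors of node, computed inside the database."""
    return [row[0] for row in con.execute(CLOSURE, (node,))]
```

Using `UNION` rather than `UNION ALL` removes duplicates, so the recursion stops once no new pairs appear.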

<ericP> ACTION: Bijan to send an annotated bibliography to the XG list [recorded in http://www.w3.org/2008/10/30-RDB2RDF-minutes.html#action02]

<ericP> Bijan: a decision matrix would be useful

Generating URIs for relational data (Prof. Themis Palpanas)

<ericP> Themis: need to gen ids when going from rdbms to rdf

<ericP> ... instead of arbitrary house style, have a structured (standard) way of doing this

<ericP> ... this will help them be unique and global

<ericP> Ashok: you want IDs for individual entities

<ericP> ... but in RDBs, an entity might be spread out amongst tables

<ericP> Themis: different ways of attacking this

<ericP> ... in many cases, it's clear what the entity is

<ericP> ... e.g. Persons, Parts, Products

<ericP> ... many DBs talk about these things

<ericP> Soeren: when you have linked data providers, you want to use the same identifier

<ericP> ... want to see different ids which are linked to the same id

<ericP> Themis: we have an Entity Naming System

<ericP> ... assigns and manages IDs for these entities

<ericP> ... store only the minimal info to differentiate them

<ericP> ... use case: someone asks for X and it either already existed, or it gets created

<ericP> ... this can be a step in exporting the data to RDF

<ericP> ... gives a more structured way to mint IDs
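A toy, in-memory stand-in for the ENS workflow Themis describes: ask for an entity by its minimal distinguishing attributes and get back either the existing ID or a freshly minted one. The class, its method names, the namespace URI and the normalization rule are all hypothetical; a real ENS would be a shared service, not a local dict.

```python
import uuid


class EntityNamingSystem:
    """Toy, in-memory stand-in for an ENS (hypothetical API)."""

    def __init__(self, namespace="http://ens.example.org/id/"):
        self.namespace = namespace
        self._by_key = {}

    @staticmethod
    def _key(entity_type, attributes):
        # Keep only the minimal attributes needed to tell entities apart,
        # normalised so trivial spelling differences do not mint new IDs.
        return (entity_type,) + tuple(sorted(
            (k, str(v).strip().lower()) for k, v in attributes.items()))

    def lookup_or_create(self, entity_type, **attributes):
        """Return the existing ID for this entity, or mint and store a new one."""
        key = self._key(entity_type, attributes)
        if key not in self._by_key:
            self._by_key[key] = self.namespace + uuid.uuid4().hex
        return self._by_key[key]
```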

<ericP> Michael: I feel that an ENS breaks distribution and won't scale

<ericP> ... however, would work for schema names

<ericP> Orri: nice for schema names or the name of the bar where we gathered

<ericP> ... don't think it will work for the credit card transaction

<ericP> Themis: ENS not applicable everywhere

<ericP> ... don't need different names for geolocations

<ericP> Cathy: i don't think the in-house will work politically

<ericP> Soeren: DNS is already political

<ericP> ... many folks are not happy with current DNS

<Ashok> +present Michael Hausenblas

<Ashok> +present +Soeren Auer

<ericP> Themis: not talking about it being central. can work like wikipedia

<Ashok> +present Orri Erling

<Ashok> +present Ivan Mikhailov

<Ashok> +present Themis Palpanas

<Ashok> +present Axel Polleres

<Ashok> +present Juan Sequeda

<Ashok> +present Jenny Green

<Ashok> +present Cathy Dolbear

<Ashok> +present Eric Prud'hommeaux

<Ashok> +Present Ashok Malhotra

<ericP> Michael: when in a DB, you operate under closed world semantics

<ericP> ... when you hit the SW, it's open-world

<ericP> Michael: will take action to see where ENS will and won't work

<ericP> Themis: makes sense in some senses 'cause it helps with the data integration problem

<ericP> ... using this kind of system does not preclude in-house ids

<ericP> orri: if i make a uri in my domain, i lean on DNS

<ericP> ... sameAs is expensive at query time

<ericP> ... the less sameAs, the better

<ericP> ... at least, there is nothing wrong with having a world registry for these identifiers

<ericP> ... sameAs is more expensive than named graphs

<ericP> ivan: in @@@ scenarios, the primary key suffices for generating the shared identifier

<ericP> ... wikipedia-style disambiguation is trivial

<ericP> Themis: entities coming out of news agencies [are a good candidate]

<ericP> orri: news use case is quite clear

<ericP> Themis: dblp has different entities for the same real-world entity

<ericP> ... we are working on a process and tools to help DB folks with the ID-generation process

<ericP> ACTION: Michael to work with Themis on use cases for ENS [recorded in http://www.w3.org/2008/10/30-RDB2RDF-minutes.html#action03]

b3s.openlinksw.com demo

<Ashok> http://b3s.openlinksw.com/

<ericP> orri: chris said that you couldn't fit the heterogeneous db from the billion triple challenge in an RDB

<ericP> ... 1.5B triples running on 2 machines

<ericP> ... 12 clusters spread across 2 machines

<ericP> ... we've extended SPARQL in all the necessary ways

<ericP> ... re: connection between

<ericP> ... both ends of the transitive relation are supplied; clouds grow until they meet
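Orri's "clouds grow until they meet" describes a bidirectional breadth-first search. A minimal sketch, assuming an undirected graph given as a symmetric adjacency dict (for a directed relation one side would follow edges forwards and the other backwards). It expands the smaller frontier one full level at a time and returns the shortest connecting path length.

```python
from collections import deque


def connected(graph, start, goal):
    """Bidirectional BFS over a symmetric adjacency dict.

    Returns the length of the shortest path between start and goal,
    or None if they are not connected.
    """
    if start == goal:
        return 0
    dist_s, dist_g = {start: 0}, {goal: 0}
    queue_s, queue_g = deque([start]), deque([goal])
    while queue_s and queue_g:
        # Grow the smaller cloud by one full BFS level.
        if len(queue_s) <= len(queue_g):
            queue, dist, other = queue_s, dist_s, dist_g
        else:
            queue, dist, other = queue_g, dist_g, dist_s
        best = None
        for _ in range(len(queue)):
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt in other:  # the two clouds meet here
                    cand = dist[node] + 1 + other[nxt]
                    best = cand if best is None else min(best, cand)
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        if best is not None:
            return best
    return None
```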

<AxelPolleres> http://www.polleres.net/presentations/20081022w3cTPAC-talk.pdf

<ericP> ashok: you get RDF and it's indexed by default

<ericP> ... and you don't need a schema

<Ashok> http://demo.openlinksw.com/tpc-h

<AxelPolleres> http://xsparql.deri.org/ http://www.polleres.net/presentations/20081022w3cTPAC-talk.pdf (demo and slides for XSPARQL)

<Cathy> http://docs.google.com/Doc?id=dc9g594b_0hkgq652p&hl=en Ordnance Survey use case description

<ericP> AxelPolleres: in conventional SPARQL, there are no extensible generators

<ericP> ... the base language is XQuery and we have injected some SPARQL bits

<ericP> ... i've seen several proposals for mapping langs

<ericP> ... generally a canonical approach à la ericP,

<ericP> ... @@@ [ schema-based ] mapping

<ericP> ... or proprietary


Ordnance Survey use case [Cathy]

<ericP> Cathy: 3 spatial databases in 3 Oracle tablespaces

<ericP> ... a warehouse with more semantic annotation than SQL provides

<Ashok> Users can write SPARQL and use ontologies better than writing SQL

<ericP> GeraldReif: so far we've heard about materialization and query-down

<ericP> ... but there is also query update

<ericP> ... we want to re-use the mappings that are there to do update

<ericP> ... if you insert only one triple, you maybe can't fill the whole database row

<ericP> ... the system can tell you what else you need to maintain a consistent state
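A sketch of how a system could tell the user what else is needed, assuming SQLite and a hypothetical `product` table. It reads the column metadata with `PRAGMA table_info` and reports the NOT NULL columns without a default that the inserted triples did not supply; for example, a single `rdfs:label` triple would only cover the `label` column.

```python
import sqlite3


def missing_columns(con, table, provided):
    """Columns that must still be supplied before a row can be inserted.

    A column is required if it is declared NOT NULL, has no default and
    is not an INTEGER PRIMARY KEY (which SQLite fills in automatically).
    """
    required = []
    for _cid, name, col_type, notnull, default, pk in con.execute(
            f"PRAGMA table_info({table})"):
        auto_id = pk == 1 and col_type.upper() == "INTEGER"
        if notnull and default is None and not auto_id and name not in provided:
            required.append(name)
    return required


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, label TEXT NOT NULL, "
            "price REAL NOT NULL, note TEXT, status TEXT NOT NULL DEFAULT 'new')")
```

The returned column names are the kind of structured error GeraldReif mentions below.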

<ericP> ivan: we have a related proposal on the esw wiki

<ericP> ashok: where will you add this extra info?

<ericP> GeraldReif: to the mapping?

<ericP> orri: SQL knows which cols are NULLable

<ericP> ... triples are unordered, but to apply them to a RDB, you have to order them to fulfill referential constraints
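Orri's ordering point amounts to a topological sort over foreign-key dependencies. A sketch using Python's standard `graphlib` (3.9+), assuming the incoming triples have already been grouped into rows per table and that the hypothetical `foreign_keys` dict lists, for each table, the tables it references.

```python
from graphlib import TopologicalSorter


def insert_order(foreign_keys):
    """Order tables so that referenced rows are inserted first.

    foreign_keys maps each table to the set of tables it references.
    Raises graphlib.CycleError for cyclic references, which would need
    deferred constraints or a two-phase insert instead.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


def order_rows(rows, foreign_keys):
    """Sort (table, row) pairs so every referenced table comes first."""
    rank = {t: i for i, t in enumerate(insert_order(foreign_keys))}
    return sorted(rows, key=lambda r: rank[r[0]])
```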

<ericP> Soeren: usual updates are ordered

<ericP> orri: in SPARUL, you make a graph

<ericP> ... then you do the equivalent of a bulk load

<ericP> GeraldReif: you know which bits are missing, you can return structured errors helping the user fix what's missing

observed reqs

<ericP> ashok: conf language should be an XML language, not RDF

<ericP> ericP: you're nearly creating an XML syntax for SPARQL

<ericP> ashok: general mapping lang will be actually quite complex, but a simple profile

<ericP> Cathy: do you expect the complex lang for an upcoming doc?

<ericP> ashok: probably simple

<ericP> orri: expect the simple language to be adequate for ETL

<ericP> ericP: why not RDF?

<ericP> orri: our meta-representation is something no one would want to write

<Ashok> orri: RDF is harder to write

<Ashok> Eric disagrees

<Ashok> Discussion about whether language shd be XML or RDF

<AxelPolleres> not all RDF can be written in RDF/XML ... does it need more?

<Ashok> Eric: Our data (language) should be RDF also. We are asking that all data be in RDF

<AxelPolleres> take http://foo.bar/ in predicate position.
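Axel's example can be checked mechanically: RDF/XML writes each predicate as an XML element name, so the predicate URI must split into a namespace plus a non-empty NCName local part. The sketch below uses a simplified, ASCII-only NCName pattern (an assumption; the real XML rules allow more characters) and finds that a URI such as `http://foo.bar/` has no valid split, so it cannot be serialized as a predicate in RDF/XML.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def split_predicate(uri):
    """Split uri into (namespace, local name) as an RDF/XML property element
    needs, or return None if no valid split exists.

    Takes the longest NCName suffix, i.e. the shortest namespace.
    """
    for i in range(len(uri)):
        local = uri[i:]
        if NCNAME.fullmatch(local):
            return uri[:i], local
    return None
```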

<Ashok> scribe: Ashok

Eric: If you do not have an RDF syntax there will be opposition

Orri: The output of the mapping will be RDF or RDFS
... The compromise that we reached w/Eric was that we have a SPARQL-like language which we wrap in RDF.

Mapping must be: · Customizable – names, data mapping, RDF/ontology generation (change names, data manipulation, etc.)

· Ability to add additional semantics – basically rules

Jenny: Need rules both for mapping from db to ontology and between ontologies

Orri: How expressive are the rules?

Jenny: we use SWRL ... we will use RIF

Orri: Do you have recursion?

Jenny: Yes. Gives examples of rules firing other rules
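Rules firing other rules can be illustrated with naive forward chaining to a fixpoint. This is a generic sketch, not SWRL or RIF: the two rules (a transitive `within` and its inverse `contains`) and the place names are hypothetical, and a fact derived by one rule in one round triggers the other rule in the next.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear (naive evaluation).

    facts: set of (subject, predicate, object) triples.
    rules: functions mapping the current fact set to derived triples.
    """
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= set(rule(facts)) - facts
        if not new:
            return facts  # fixpoint reached
        facts |= new


def within_transitive(facts):
    # within(a, b) and within(b, c)  =>  within(a, c)
    return {(a, "within", c)
            for (a, p, b) in facts if p == "within"
            for (b2, q, c) in facts if q == "within" and b2 == b}


def contains_inverse(facts):
    # within(a, b)  =>  contains(b, a)
    return {(b, "contains", a) for (a, p, b) in facts if p == "within"}
```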

<Cathy> Jenny: In order to define a concept you have to define what the property is of that concept so you're defining a concept using the concept's properties

<Cathy> Ashok: Orri, you'd like to push the rules down to the SQL layer, but that may not be possible

<Cathy> Jenny: There's enough info in the mappings of the concepts you use in the rules, to get a subset of the graph, and then fire the rules on that subset.

<Cathy> if that didn't work, it'd be because you'd written the rules too verbosely

<Cathy> Ashok: performance is better if you push the rules down to SQL, but you might not always want to do that

<Cathy> Jenny: Our rules are DL safe, so not full SWRL.

Jenny: Rules are DL safe rules

<Cathy> Jenny: so the rules are always satisfiable

<Cathy> Ashok: are there other kinds of semantics people would like?

<Cathy> Orri: We'd need to classify the type of rule, but that'll come later

<Cathy> Michael: RIF is for rule exchange, so it might not be appropriate

<Cathy> Ashok: In terms of adding semantics, we want rules, and OWL semantics

<mhausenblas> Michael: why not using N3 http://www.w3.org/DesignIssues/Notation3

<Cathy> Jenny: So we propose that the WG look into whether people are needing to put rules into the mapping language, or are people wanting to push them up to the reasoner. Is it the job of the mapping file, or of the reasoner/rule engine?

Jenny: Look at how people are using rules and then decide.

<mhausenblas> Michael: see also recent W3C team submission http://www.w3.org/TeamSubmission/n3/

MH: N3 allows rules to be integrated with RDF

· Generating identifiers for Relational data

Michael: Dereferenceable http URIs

Ashok: Access rights are important
... Do not want to reinvent database access management, etc.

Orri: With ETL there is no security, but if the query is mapped back to the database then the database can enforce access rights

<mhausenblas> scribenick: mhausenblas

Themis: can use ENS for this?

Orri: there are persistent entities of public interest (places, people, etc.) that might have, for example, a DBpedia URI
... can also be used with ENS
... using internal lookup function a mapping can be done
... using ENS where appropriate

<Ashok> Ashok: · Integrating data from other sources – impact on queries

Ashok: not trying to extend it to too many cases (XML, spreadsheets, etc.)

Orri: we could for example say that implementations may allow the coexistence of virtual triples beside other triples

Ashok: two other things left
... (from Bijan's morning session)
... some guidance on the OWL subset used

Orri: propose to start with QL/RL

Ashok: last point

(Gerald mentioned it)

scribe: re updating stuff

Orri: updating is overkill - out of scope

Michael: doubt that we are chartered to do that
... should we mention it?

Orri: yes

Ivan: bit different phrasing ... couple it with SPARQL Update [scribe unsure]

Ashok: not preferable
... so we mention it to state 'not in this phase'

Michael: is data provenance and licensing an issue?

Ashok: we should mention it but we will not be able to solve that issue, I guess

Orri: should mention but not deep

Ashok: Let's sleep over it and contemplate

- DRAFT -
