See also: IRC log
<trackbot> Date: 22 October 2009
me dialing in
<iv_an_ru> I'm trying to connect...
<scribe> scribe: hhalpin
<Souri> Are we doing a roll call?
<mhausenblas> yes, Souri
Ahmed is doing a roll call.
<iv_an_ru> wow, I've got the connection!
PROPOSED: to approve RDB2RDF Weekly -- 15th October 2009 as a true record
<mhausenblas> +1
(note that I fixed Angela's name!)
<angela_UNITN> thanks
http://www.w3.org/2009/10/15-RDB2RDF-minutes.html
(need another +1)
<MacTed> Orri will be here momentarily. timezone mixup.
RESOLUTION: approved RDB2RDF Weekly -- 15th October 2009 as a true record
RESOLUTION: RDB2RDF Weekly meets next Oct 29th.
<mhausenblas> Michael: background about DST see http://www.timeanddate.com/time/aboutdst.html
RESOLUTION: To meet at the same time for the next 2 weeks just to prevent daylight saving errors, and then adopt whatever the best time from the poll is.
mhausenblas: Has everyone tried the wiki?
... your normal w3c login should work
... you can put your thoughts there
hhalpin: any first?
<Souri> Do you have the URL for the Wiki handy?
<mhausenblas> http://www.w3.org/2001/sw/rdb2rdf/wiki/Main_Page
hhalpin: is talking about scribing
... and I'm taking more notes
... and so on.
<cgi-irc> me: testing my username
<iv_an_ru> (as a last resort, I can provide some comments to Orri's slides, finally, that's my SPARQL2SQL translator. The problem is that my connection is bad as usual)
<cgi-irc> this is ben
http://www.w3.org/2001/sw/rdb2rdf/track/actions/open
To see open actions.
<Ben> That's better
<scribe> ACTION: hhalpin to double-check EricP's status and have him send in his proposals about mapping [recorded in http://www.w3.org/2009/10/22-RDB2RDF-minutes.html#action01]
<trackbot> Created ACTION-8 - Double-check EricP's status and have him send in his proposals about mapping [on Harry Halpin - due 2009-10-29].
http://www.w3.org/2001/sw/rdb2rdf/track/actions/open
ACTION [DONE]: Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation
<trackbot> Sorry, couldn't find user - [DONE]
ACTION [DONE]: mhausenblas Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation
<trackbot> Sorry, couldn't find user - [DONE]
<mhausenblas> ACTION-2
<mhausenblas> ACTION-2?
<trackbot> ACTION-2 -- Michael Hausenblas to contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation -- due 2009-10-08 -- OPEN
<trackbot> http://www.w3.org/2001/sw/rdb2rdf/track/actions/2
<mhausenblas> close ACTION-2
<trackbot> ACTION-2 Contact Soeren for presenting Triplify and ask Richard to provide D2R documentation a week in advance of his presentation closed
ACTION-3?
<trackbot> ACTION-3 -- Michael Hausenblas to draft a proposal for presentation order on the Wiki -- due 2009-10-08 -- OPEN
<trackbot> http://www.w3.org/2001/sw/rdb2rdf/track/actions/3
<Souri> A pointer to Zakim tutorial would be good.
close ACTION-3
<trackbot> ACTION-3 Draft a proposal for presentation order on the Wiki closed
http://www.w3.org/2001/12/zakim-irc-bot.html
http://www.w3.org/2002/01/UsingZakim
<mhausenblas> and also
<mhausenblas> http://www.w3.org/2002/03/RRSAgent
close ACTION-3
<trackbot> ACTION-3 Draft a proposal for presentation order on the Wiki closed
close ACTION-4
<trackbot> ACTION-4 Draft first invitation mail to WG closed
close ACTION-5
<trackbot> ACTION-5 Re-send Marcelo invited expert form closed
close ACTION-6
<trackbot> ACTION-6 Add material to the Wiki closed
close ACTION-7
<trackbot> ACTION-7 Send email out to europeans explaining daylight savings time. closed
http://www.w3.org/2001/sw/rdb2rdf/wiki/images/9/96/Relational2RDF.ppt
<mhausenblas> ACTION: mhausenb to put Ahmed's proposals regarding R2RML requirements onto the Wiki [recorded in http://www.w3.org/2009/10/22-RDB2RDF-minutes.html#action02]
<trackbot> Created ACTION-9 - Put Ahmed's proposals regarding R2RML requirements onto the Wiki [on Michael Hausenblas - due 2009-10-29].
I think Ivan can give the presentation.
<Ben> yes
ivan: I'll give the presentation
... I'm Ivan from OpenLink
... I'm responsible for all the SPARQL
... at least the transformation parts.
<mhausenblas> (background about Ivan see http://www.linkedin.com/in/ivanmikhailov)
ivan: mapping from relational to RDF why?
<MacTed> ah hah -- Zakim says conference is full, so Orri cannot get in.
ivan: we want to save time
<MacTed> I've dropped phone
... we want to allow relational data of course to be accessed by RDF.
Orri?
<scribe> ACTION: hhalpin will add 10 more participants to telecon [recorded in http://www.w3.org/2009/10/22-RDB2RDF-minutes.html#action03]
<trackbot> Created ACTION-10 - Will add 10 more participants to telecon [on Harry Halpin - due 2009-10-29].
<Souri> Please mention the slide# or slide title
ivan - can you put the slide number in IRC?
orri: let's categorize structured data on the Web
... by exposing all content from any Web 2.0 application
... data-warehouse people who are deeply concerned with different identifiers, data not joining cleanly
orri: these two sides have different requirements
<mhausenblas> still slide #2 methinks
orri: if you have a straightforward mapping
... then you just extract RDF and then load it into an RDF store.
... if you put more effort into mapping
... you can do it both ways.
... translate SPARQL into SQL
... then you don't have to make a logically equivalent RDF store.
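A minimal sketch of the on-demand route described above (translate the query into SQL instead of copying the data into an RDF store). It is not Virtuoso's translator: the `customer` table, the `MAPPING` dictionary, the `example.org` URIs, and the function names are all assumptions made for illustration, and only a single triple pattern of the form (?s <predicate> ?o) is handled.

```python
import sqlite3

# Hypothetical mapping: RDF predicate -> (table, key column, value column).
MAPPING = {
    "http://example.org/vocab#name": ("customer", "id", "name"),
}
# Hypothetical template that turns a customer key into a subject URI.
SUBJECT_TEMPLATE = "http://example.org/customer/%d"


def pattern_to_sql(predicate):
    """Translate the triple pattern (?s <predicate> ?o) into one SQL query."""
    table, key_col, value_col = MAPPING[predicate]
    return f"SELECT {key_col}, {value_col} FROM {table} ORDER BY {key_col}"


def query_on_demand(conn, predicate):
    """Run the translated SQL and return (subject URI, object) bindings."""
    rows = conn.execute(pattern_to_sql(predicate)).fetchall()
    return [(SUBJECT_TEMPLATE % key, value) for key, value in rows]


# The relational data stays where it is; no triples are materialized.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
bindings = query_on_demand(conn, "http://example.org/vocab#name")
```

The extract-and-load alternative would instead iterate over every row once, write the resulting triples to a separate store, and then have to keep that copy in sync.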
<mhausenblas> slide #4
orri: this involves converting everything
... to RDF but it has pros and cons
... the main pro is that in any query you can have a variable in the predicate
... high risk that it won't work well in SQL, big unions
... another pro is if you want to materialize inference.
... union and join across inferred and non-inferred triples, no end of trouble
... lots of different data sources
... pouring them all into bucket of RDF might be a good solution
... so no a priori need to give an exact schema
... if it does not link or have an intersection, the benefit is negligible.
... the cons are updating
... latency
... large space, bigger than equivalent relational data.
... lots of research into compression of RDF.
... a task-specific representation, though, will always be more compact and more efficient.
who's speaking there?
<mhausenblas> Ashok
Ashok: the queries will be slower against RDF, no?
orri: certainly the case right now
<mhausenblas> slide #5
orri: what are benefits of mapping on demand?
No synchronization to do.
orri: don't have to cross multiple databases
... if you can push all work into a single relational database
... you get all benefits of optimization
... if you want to add sources
... then you don't have to copy into RDF, saving space
... Cons are the same, non-SQL sources, inferences, but experience
... has shown that inference can be done in mapping
... in particular sub-classes and sub-properties
slide #6
orri: about Virtuoso
... we do mapping of SPARQL to SQL schema from any relational database
... not just our own, but do the rest via data access drivers
... this would allow us to join DB2 and Oracle for example
... we map it all into Virtuoso SQL
... then we can deal with it as a single SQL dialect
... and so do RDF mapping across single SQL dialect
... we store physical quads
... we store up to 8 billion quads, with full text index
... can do ranking of entities, and do other relational database functionality, transactions etc.
slide #7
orri: for mapping to be useful
... one of the large factors
... SPARQL as a query language
... lacked aggregation and GROUP BY
... but all vendors added them, albeit in a vendor-specific manner
... SPARQL 1.1 should standardize all of these.
... SPARQL can without reliance on extensions
... derive subqueries etc.
... within the standard.
... if you have this mapping.
... some mappings are still difficult
... we would prefer it if a single SQL statement were produced even if it was ugly
... so as not to have problems with execution plans
... you must be fairly clever in mapping
... to do mapping on the fly
... must know when some kind of RDF entity (person, article) can come from any table
... union of customers across many tables, queries then become joins between unions, a mess to optimize
... so what can be optimized in mapping layer!
... do NOT do needless joins, this must be done in mapping
slide #8
orri: cases for integration
... similar but heterogeneous schemas combined
... each application has users, comment, comment field
... but each slightly different
... so we map them all into SIOC vocabulary
... six relational tables
... so union of all types of comments to union of all types of posts
... intelligence is needed there.
Ahmed: How is foreign key being handled?
... better covered by direct reference, what does that mean?
orri: we handle foreign keys
... just like primary keys
... so say order has primary key
... of order number and line number
... so we declare it exactly
... we use the names
... give exactly same declaration of one key
... as to another.
... say that ORDER has customer
... translate ORDER primary key
... into constant
... then translates foreign key to a URI.
... print some expression
cygri: I can explain it
... you define a function
... the input to the function is the ID, an integer
... output is a URI
... if you define a subject
... you put the primary key into that function
... then the order number would go into the order identifier function, which generates the subject
... so to get foreign key to order table
... put foreign key value, a column on order table, into the new table
... and you do the same thing on the customer table
... as you use the same function
... these two will match in the end
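A minimal sketch of the idea cygri describes, assuming two hypothetical tables (`customers`, `orders`), made-up `ex:` predicates, and `example.org` URIs. The point it illustrates is only that the same key-to-URI function is applied to the primary key on one side and to the foreign key on the other, so the generated URIs match.

```python
def customer_uri(customer_id):
    """Hypothetical function: integer customer ID in, URI out."""
    return "http://example.org/customer/%d" % customer_id


def order_uri(order_no):
    """Hypothetical function: integer order number in, URI out."""
    return "http://example.org/order/%d" % order_no


# Toy rows standing in for two relational tables.
customers = [{"id": 7, "name": "Acme"}]
orders = [{"order_no": 100, "customer_id": 7}]

triples = []
for c in customers:
    # Primary key goes through customer_uri to form the subject.
    triples.append((customer_uri(c["id"]), "ex:name", c["name"]))
for o in orders:
    # Foreign key goes through the SAME function to form the object.
    triples.append(
        (order_uri(o["order_no"]), "ex:customer", customer_uri(o["customer_id"]))
    )

# Because both sides used customer_uri, the order links to a known customer.
customer_subjects = {s for s, p, _ in triples if p == "ex:name"}
linked = [(s, o) for s, p, o in triples if p == "ex:customer" and o in customer_subjects]
```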
orri: when you are joining, you compare functions of the foreign key and the primary key
... but if you know they are bijections
... you can just do normal joins without running functions
... normalization
... you may want to flatten things
... extra depth, as in SQL views
... more familiar way is to do it via SQL views
... qualify a mapping rule
... this subject will have this predicate with this value ONLY if a certain SQL condition holds
... so we can start making things arbitrarily complex
... policy functions can then be implemented in the view, but also in the database
... so we get benefits that you don't get from RDF warehouse view.
... kilometers to miles, store once, have a conversion function, like a computed column in a relational table.
... for each entity for each mapping
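A minimal sketch of the two ideas orri mentions here: a mapping rule that only fires when a condition holds, and a value that is stored once and converted on the fly. The `trip` row layout, the `ex:` predicates, and the `is_public` condition are assumptions made for illustration; in a real mapping the condition would be a SQL expression evaluated by the database.

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile


def km_to_miles(km):
    """Conversion function: kilometers are stored, miles are computed."""
    return km / KM_PER_MILE


def map_trip(row):
    """Map one hypothetical 'trip' row to triples."""
    subject = "http://example.org/trip/%d" % row["id"]
    triples = [
        (subject, "ex:distanceKm", row["distance_km"]),
        # Computed value, comparable to a computed column in a table.
        (subject, "ex:distanceMiles", round(km_to_miles(row["distance_km"]), 2)),
    ]
    # Qualified rule: this triple exists ONLY if the condition holds.
    if row["is_public"]:
        triples.append((subject, "ex:visibility", "public"))
    return triples


private_trip = map_trip({"id": 1, "distance_km": 16.09344, "is_public": False})
public_trip = map_trip({"id": 2, "distance_km": 16.09344, "is_public": True})
```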
<mhausenblas> slide #9
orri: you need a way to determine a URI from a key
... %s done like in C
... then a place holder for domain name
... multi-part keys
... line number, order number
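A minimal sketch of a C-style URI template with a host placeholder and a multi-part key (order number, line number), as described on this slide. The template string, the regular expression, and the function names are assumptions and not Virtuoso's actual syntax. The inverse function shows why such a template can be treated as a bijection: the key parts can be recovered from the URI, so a join can compare the raw key columns instead of the generated strings.

```python
import re

# Hypothetical template: %s is the host placeholder, %d are the key parts.
TEMPLATE = "http://%s/order/%d/line/%d"
PATTERN = re.compile(r"^http://([^/]+)/order/(\d+)/line/(\d+)$")


def line_uri(host, order_no, line_no):
    """Build a URI from a multi-part key, printf-style."""
    return TEMPLATE % (host, order_no, line_no)


def parse_line_uri(uri):
    """Invert the template; return None if the URI has a different shape."""
    match = PATTERN.match(uri)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


uri = line_uri("example.org", 100, 3)
key = parse_line_uri(uri)
```

A URI minted by a different template (for example a customer URI) does not parse here, which is one way a mapping layer could tell that two URIs can never join.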
slide #10
TPC-H
minimum baseline
these queries given direct into SPARQL.
Souri: Question about multiple tables
... what if I made a function that gave things the same URIs?
... should we prevent that?
<mhausenblas> MacTed?
orri: we don't always forbid that.
... employee and dept and mixing, but blogposts and wiki articles you might want to combine
... ask for number 16
... so you might want that.
<mhausenblas> iv_an_ru or MacTed can you provide the correct URI for the TPC-H demo please?
<Marcelo> The URL http://demo.openlinksw.com/tpc-h/ does not work
orri: might want to infer that some URIs are distinct and should not join
... like joining a qualified subject to an unqualified one
<mhausenblas> well spotted Marcelo, hence I asked Ivan/Ted ... ;)
orri: I will fix URIs!
... all of these 22 queries
... they got SPARQL expressions of similar length, each single SQL query converted to a single SPARQL query
... for complicated queries, overhead of mapping can be negligible
... some tweaks in the SQL
... but we take care of these idioms in a virtual database layer
... barring pathological queries
... the mapping should not generate substantial penalties
slide #11
orri: keys in URI
... but the URI does not specify what table it came from
... the relational database does not know some of these things aren't meaningful
... little any database can do
... intelligence must be in mapping layer
... one must know about databases
... in particular what a relational optimizer can and can't do.
... where data is located
... avoiding joining too much
... a cost model
... so we need to know how databases work.
Ahmed: Are you expecting mapping to have access to remote database stats?
orri: If we have some idea
... we can say, import cardinality
... depends on type of remote database
... so we try to push things to remote database
... otherwise use a cost model to determine joins
Ahmed: We have a difficulty in keeping these statistics current even with just relational databases
Orri: We keep them current as much as we can.
... we have maps into integrating vocabularies
... we are presently working on enterprise accounts
slide #14
orri: we use this all internally at OpenLink
... all our customers have URIs.
... questions?
Ahmed: If you have a 50 terabyte warehouse
... if you convert that all into RDF
... how big is blow-up?
orri: depends on compression
<mhausenblas> we're already at the top of the hour, so we should do a round-up
orri: columns are generally preferred
... so we try to stick to that.
<mhausenblas> next week soeren is planned
<mhausenblas> see http://www.w3.org/2001/sw/rdb2rdf/wiki/Initial_Round_of_Presentations
orri: that's the theory
... practice is not quite as good
... 4-5 times bigger is not unreasonable
... so we need to get that better
... but reasons are not fundamental
... we imagine type-specific compression would work in rdf
Ahmed: What's average of slow-down of queries with mapping?
orri: an order of magnitude
... we don't want to replace a warehouse with RDF at the moment.
<mhausenblas> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/
mhausenblas: send questions to mailing list
... soeren is up next week!
... I might become a father on that day!
<scribe> ACTION: hhalpin to ask EricP for back-up [recorded in http://www.w3.org/2009/10/22-RDB2RDF-minutes.html#action04]
<trackbot> Created ACTION-11 - Ask EricP for back-up [on Harry Halpin - due 2009-10-29].
<iv_an_ru> Congrats, Soeren!
<mhausenblas> http://www.slideshare.net/soeren1611/triplify-1341084
orri: so on warehouse side
... if you have regular relational warehouse
... it's not RDF's strong point
... RDF is unbeatable for structured queries against heterogeneous data
<iv_an_ru> (oops, I've lost the connection)
orri: strong need for queries and schema-less, then we win.
thanks orri!
trackbot, end meeting