See also: IRC log
<nunolopes> Zakim, ??P26 is me
<dmcneil> I can do it
<Ashok> scribenick: David
<MacTed> scribenick: dmcneil
minutes accepted
where we were: we spoke last time about what to do
one thing we spoke about was writing some text describing the disconnect between the DM and the R2RML
there was a sense that we should bite the bullet and address this issue by extending R2RML
Richard & Eric wrote up a proposal
<cygri> proposal was here: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0075.html
and he thought we had reasonable support for the proposal
but, then David argued that we should do nothing because you can use vendor-specific SQL in R2RML to accomplish the goal of mapping tables without primary keys
<cygri> dmcneil: core issue is that vendor-specific sql is needed
<cygri> ... i argue best approach is to do it with views
<cygri> ... because that puts the burden on the DM implementer
<Ashok> David: I think the problem can be done with views, so no change is needed
<cygri> ... whatever SQL query would be needed to actually compute the DM, just write that into a view
<juansequeda> Which vendor does not have a generate_series/rownum function?
<cygri> ... example, postgres has a function for computing sequences. same with other DBs
<cygri> ... then the usual R2RML mechanisms can be used to generate blank nodes
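A rough illustration of the view-based approach dmcneil describes, not taken from the meeting: a table without a primary key is wrapped in a view that adds a per-row number, and blank node labels are then derived from that column. SQLite's implicit rowid stands in here for whatever vendor-specific row-numbering feature a given database offers; the table name follows the Wonderland example pasted later in the discussion, while the column names, sample data, view name and the row_blank_nodes helper are all hypothetical.

```python
import sqlite3


def row_blank_nodes(conn, view):
    """Yield (blank_node_label, row_values) pairs from a view whose
    first column is a per-row number (hypothetical helper)."""
    for rn, *values in conn.execute(f'SELECT * FROM "{view}" ORDER BY rn'):
        yield f"_:row{rn}", tuple(values)


conn = sqlite3.connect(":memory:")
# A table with no primary key and two identical rows (sample data is made up).
conn.execute("CREATE TABLE wonderland (fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO wonderland VALUES (?, ?)",
    [("Alice", "Liddell"), ("Alice", "Liddell"), ("Mad", "Hatter")],
)
# Vendor-specific part: SQLite's implicit rowid plays the role of ROWNUM
# or a sequence function in other databases.
conn.execute(
    "CREATE VIEW wonderland_rn AS "
    "SELECT rowid AS rn, fname, lname FROM wonderland"
)

nodes = list(row_blank_nodes(conn, "wonderland_rn"))
for label, values in nodes:
    print(label, values)
```

Each of the three rows, including the two duplicates, ends up with its own label, which is the cardinality-preserving behavior under discussion. The sketch also shows the portability concern: the view definition depends on a SQLite-only feature.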
+q
<cygri> ashok: so we need to find out whether all the DBs have some mechanism like that?
<Ashok> Need to find out whether SQL Server and DB2 have such a function
<cygri> dmcneil: suppose DB2 doesn't have it. how does hiding this behind RowBlankNode help then?
cygri: one concern is that
R2RML is defined in terms of Core SQL 2008
... we acknowledged that people would use vendor specific
SQL
... the way we dealt with that is that if you want to use a db
specific dialect of SQL then you are defining a vendor specific
extension of R2RML
... taking that to its logical conclusion, then we are saying
that R2RML as specified cannot be used to implement the Direct
Mapping
... since the general assumption is it is not possible to
support this case with generic SQL
... so the user must immediately extend R2RML for this case
+q
scribe: this is not a particularly pleasant situation
ashok: neither of these is perfect
dmcneil: I think that is a specious argument, because we expect that most mappings will have vendor-specific SQL in them
cygri: correction, by default embedded SQL is expected to be SQL CORE 2008
macted: if we did that (not sure what "that" references) it was a serious mistake
<cygri> http://www.w3.org/2001/sw/rdb2rdf/r2rml/#conformance
<Ashok> This specification defines R2RML for databases that conform to Core SQL 2008, as defined in ISO/IEC 9075-1:2008 [SQL1] and ISO/IEC 9075-2:2008 [SQL2]. Processors and mappings may have to deviate from the R2RML specification in order to support databases that do not conform to this version of SQL.
from the spec "The absence of a SQL version identifier indicates that no claim to Core SQL 2008 conformance is made."
+q
MacTed: suggests that he thought we had agreement last week
discussion of who dissented
cygri: eric strongly believes that preserving cardinality is important
macted: let them choose the DM variant that preserves it then
cygri: the argument was made that
the cardinality preserving option is more correct
... I argued that that is just an option, implementations are
free to implement it
... was also ok with putting in warnings about the
non-cardinality preserving option
... from last week we said if we had more time we would define
a way to make this work in R2RML
... the point was raised that for backwards compatibility we
cannot remove it later
... suggested wording that "we might remove this in the future"
was not well received
... also, they don't want options in the DM, just a single
monolithic approach
ashok: last week, we spoke about
some text that ivan had crafted
... are you speaking about that, or a previous position they
had
cygri: I am speaking about some
text that I drafted
... ivan drafted some text proposing no change but saying they
are incompatible
<cygri> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0054.html
<Ashok> PROPOSAL A: In the DM spec, replace the following text: [[ If the table has no primary key, the row node is a fresh blank node that is unique to this row. ]] with this: [[ If the table has no primary key, the row node is a blank node. Distinct blank nodes MUST be generated for rows with distinct column values. For duplicate rows with identical values, implementations SHOULD generate a fresh blank for each duplicate row (resulting in a non-lean RDF graph [R
<juansequeda> I still support Proposal A :)
<Ashok> o In the DM, instead of "is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language" say "is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language for tables which have at least one unique key" o Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key" o Add a Note
cygri: last week we said we would
explore adding something to R2RML
... then the objection from david, which i think is reasonable,
I can see where he is coming from
... did eric come up with a use case?
<cygri> http://www.w3.org/2001/sw/rdb2rdf/wiki/Non-unique_Tables
ashok: yes, it is on the wiki,
but it is a bit complicated
... do you support one of these two proposals
dmcneil: i think, that other than doing nothing, adding a rownum column to R2RML is the most interesting
cygri: the problem with that
approach is that it leads too much to a particular
implementation
... implementations could choose something more efficient than
rownum
... for example, this mysql query pasted (not sure which query)
numbers the rows, but doesn't use rownum
<Souri_> how about rr:genRowId ?
cygri: since rownum forces a particular approach that requires the user to spell out what the blank node identifier looks like
<Ashok> In a ROWNUM capable DB, the mapping processor implicitly converts it to the following R2RML mapping (the actual implementation may vary from DB to DB based upon how the equivalent of ROWNUM can be implemented) <Tmap1> rr:logicalTable [ rr:sqlQuery """ Select ROWNUM AS "rr:rownum", t.* from Wonderland t order by "rr:rownum" """ ] rr:subjectMap [ rr:template "http://Wonderland/my_rownum={\"rr:rownum\"}" ] We can also say that rr:rownum cannot be used when log
<cygri> A SQL query is a SELECT query in the SQL language that can be executed over the input database. The string must conform to the production <direct select statement: multiple rows> in [SQL2]
dmcneil: since SQL defaults to not be SQL 2008 in an R2RML view, this means adding an R2RML view by default makes a vendor specific mapping
cygri: actually the SQL version identifier doesn't affect the processing at all, per the spec
souri: regarding the rownum
discussion
... for every row we need a unique id
... if we knew the target database, then we could write
vendor-specific SQL
... but, if we present that to the user, then the user may not
understand that and it is not portable to other database
backends
... therefore we want a logical representation
... if "rownum" has too much meaning with it
... then we can use "rowidentifier" or something
... this could be used in blank node generation or URI
generation
... we need one R2RML construct, which provides a point of
indirection for whatever vendor-specific mechanism it will be
translated to
ashok: isn't the ability to add a column that gets its values from a function part of SQL-2008?
souri: not sure, but this is very common
ashok: so we could add a column in the SQL query that gives distinct numbers
souri: a sequence generator,
right?
... in Oracle, access to the sequence generator is a DML
operation
cygri: regarding souri's point
about semantics of the pseudo-rownum column
... agree all we need is an identifier
... understand that calling it rownum, does not mean it must be
literally the db's rownum
... just one tiny step from that to the RowBlankNode
proposal
... we leave it completely up to the implementation what the
blank node template is
+q
scribe: since blank node
identifiers have no semantics, they just must be unique
... so i can't see the usefulness of letting the user put the
rownum into a template
dmcneil: we worked out how
regular blank node ids work, we based that on whether the blank
node template produces the same value
... would need that same capability for these new RowBlankNode
things
MacTed: the question comes down to doing a dump of the data
+q
<Zakim> cygri, you wanted to mention jena api
dmcneil: I am talking about within the context of a query, not between queries
cygri: ted, what you say is
completely correct for SPARQL queries
... can't tell if the blank node ids between queries are the
same resource or not
... per the spec
... but, looking at the jena api
... must look at constraints of jena api
... which expects blank nodes to have a persistent
identity
... this leads to the need to go back to the database and grab
properties for a specific blank node
ted: once you go back to the database, you cannot rely on the blank nodes IDs
yes
+q
cygri: the RDF working group has been arguing about this for a year
several people said they don't want to talk about blank node ID
dmcneil: this is relevant
<cygri> dmcneil, i believe it is all worked out in the proposal. at least i thought about it hard. i need to answer your email, sorry i didn't get around to do that yet
dmcneil: because we worked out the blank node ID semantics carefully for the existing mechanism, not so much for the new RowBlankNode
+q
ashok: we have two options on the table, how to proceed
<Souri_> Is that all the disagreement about? Nothing else?
dmcneil: there is a third option, let the DM use R2RML views to implement this
ashok: I like that option, but I thought richard disagreed
cygri: my position is: it can be
done, but it cannot be done in a way that conforms with the
spec
... because it requires vendor-specific SQL
... it is going to be slow, and no way to make it fast
ashok: some would argue that since SQL is such a sprawling spec, anytime someone writes SQL they are using vendor-specific SQL
cygri: if the argument is that the DM can be implemented on R2RML, then the question is how?
dmcneil: the DM is implemented on a specific database, so the DM generates an R2RML view that uses that database's features to implement the DM
cygri: that leads to very inefficient implementations
<Souri_> Identical rows: We do not care whether they are assigned Ids <1,2,...,n> or <n,...,2,1> (because their contents are identical)
cygri: stable identifiers are needed so the next query produces the same identifiers
macted: you are not going to get
it
... the database is free to change the ids
ashok: saying the identifiers are
stable is going beyond the spec
... three choices:
... 1) do nothing
<Souri_> Two rows <a, b, c1> and <a, b,c2> may be assigned Ids _:b1 and _:b2 during one access and _:b2 and _:b1 during another access. Is this acceptable or not?
ashok: 2) Richard's
proposal
... 3) Souri's proposal
-q
<cygri> souri, yes it is. but if they are _:b3 and _:b4 that's not acceptable
scribe: how do we come to agreement?
macted: do nothing is not an option because currently there is a "must" in there
cygri: "do nothing" means
... DM and R2RML are two entirely separate beasts
... just a violation of the basic premise of the DM
+q
eric: it could still be the default behavior for all but the case of duplicate keys
souri: still trying to understand
the requirement
... does the blank node ID need to be stable, or can it
change?
cygri: yes, they can be scrambled but they cannot change to a different set of blank node IDs
macted: what!?
<MacTed> _blank1 a, b, c1
<MacTed> _blank2 a, b, c2
<MacTed> _blank2 a, b, c1
<MacTed> _blank1 a, b, c2
cygri: oh, then i misunderstood
the example
... still the case without a primary key, right?
... in this case it would have to be a stable blank node label
across queries
souri: inside the same
translation, you may access the table twice, in two
places
... the access order may be different
... based on that you may not be able to join the same
rows
... so this even applies within the scope of the same query
process
macted: how is this relevant?
souri: if we generate a blank
node ID for a row
... then the same row from different parts of a query should
generate the same blank node ID so they can be joined
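A small sketch of the joining problem souri raises, under assumptions that are not from the meeting: labels assigned by scan position differ when the same table is read in a different order, so the two accesses cannot be joined on the blank node, whereas labels derived from row content stay the same but cannot tell identical rows apart. The rows <a, b, c1> and <a, b, c2> come from souri's earlier example; the function names and the use of a SHA-1 digest are illustrative choices only.

```python
import hashlib


def positional_labels(rows):
    """Label each row by its position in the scan (order-dependent)."""
    return {row: f"_:b{i}" for i, row in enumerate(rows, start=1)}


def content_labels(rows):
    """Label each row by a digest of its values (order-independent,
    but identical rows would share one label)."""
    labels = {}
    for row in rows:
        digest = hashlib.sha1(repr(row).encode("utf-8")).hexdigest()[:8]
        labels[row] = f"_:b{digest}"
    return labels


# The same table accessed twice within one query, in different orders.
first_access = [("a", "b", "c1"), ("a", "b", "c2")]
second_access = list(reversed(first_access))

pos_1, pos_2 = positional_labels(first_access), positional_labels(second_access)
con_1, con_2 = content_labels(first_access), content_labels(second_access)

# Can the two accesses be joined on the blank node label?
positional_join_ok = all(pos_1[r] == pos_2[r] for r in first_access)
content_join_ok = all(con_1[r] == con_2[r] for r in first_access)
print(positional_join_ok, content_join_ok)
```

In this sketch the positional labels swap between the two accesses, while the content-based labels agree. The content-based scheme would, however, collapse fully identical rows into one node, which is the cardinality issue the group is debating.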
cygri: yes, that is what is required, and that is part of what makes it so difficult
macted: my model was that the blank node IDs are generated on the result set, not during the query
<Souri_> {?p :fname ?fnm} ... complex stuff ... {?p :lname ?lnm}
macted: this group is not about translating SPARQL to SQL
cygri: the RDF concepts doc says "you don't know anything about blank nodes except whether they are the same"
macted: but that applies to query results, not the underlying data
<cygri> ericP: counter examples: jena, 4store, …
<MacTed> +1 ericP
dmcneil: I think Richard's earlier statement that "doing nothing means DM and R2RML are completely separate" is quite overstated
souri: I am still trying to understand the target for generating blank node IDs
macted: for DM we need to
maintain cardinality
... i.e. every row in the result set
eric: the value of a bnode is not
something that can be referenced later
... the jena API over a SPARQL endpoint does not allow bnode
IDs to be submitted again in subsequent queries
... so it is ok that Jena over SQL does not provide persistent
bnode IDs
ashok: we are over time
... it is not clear how to make progress
... need either new proposals, or someone to change their
position
juan: can we summarize the current options and who supports them?
ashok: if we only talk about
changes to R2RML, then yes there are 3 options
... 1) do nothing
... 2) Richard's idea - add RowBlankNode
... 3) Souri's idea - add pseudo-column: "rowidentifier"
... 4) add wording saying "the DM is different in this special
case"
macted: I am less clear on these options than when we started
<cygri> 2) and 3) are variations of “fix R2RML”
<cygri> 4) is proposal A from last time
<cygri> 1) is B from last time
macted: what happened to last week's proposal to strike the word "should"
cygri: that is now option 4
macted: i still like 4
juan: me too
<MacTed> >>> [[
<MacTed> >>> If the table has no primary key, the row node is a blank node. Distinct blank nodes MUST be generated for rows with distinct column values. For duplicate rows with identical values, implementations SHOULD generate a fresh blank for each duplicate row (resulting in a non-lean RDF graph [RDF Semantics]). However, if the underlying database system does not provide any means to reliably differentiate among the rows, then
<MacTed> implementations MAY re-use the same blank node for multiple duplicate rows (resulting in a lean RDF graph). Implementations SHOULD document and advertise their chosen behavior.
<MacTed> >>> ]]
<MacTed> The above replaces the following sentence in the current DM spec --
<MacTed> >>> [[
<MacTed> >>> If the table has no primary key, the row node is a fresh blank node that is unique to this row.
<MacTed> >>> ]]
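To make the two behaviors in the replacement text concrete, here is a minimal sketch, assuming rows are available as tuples of column values. It is not drawn from either spec; the function name row_nodes, its preserve_cardinality flag and the sample rows are hypothetical. Distinct rows always receive distinct blank nodes; duplicate rows receive either a fresh node each (non-lean graph) or one shared node (lean graph).

```python
from itertools import count


def row_nodes(rows, preserve_cardinality=True):
    """Return one blank node label per input row.

    preserve_cardinality=True:  a fresh blank node for every row,
                                including duplicates (non-lean graph).
    preserve_cardinality=False: duplicate rows re-use one blank node
                                (lean graph).
    """
    fresh = count(1)
    reused = {}
    labels = []
    for row in rows:
        if preserve_cardinality:
            labels.append(f"_:b{next(fresh)}")
        else:
            if row not in reused:
                reused[row] = f"_:b{next(fresh)}"
            labels.append(reused[row])
    return labels


# Made-up rows from a table without a primary key; the first two are identical.
rows = [("a", "b"), ("a", "b"), ("c", "d")]

non_lean = row_nodes(rows)                          # SHOULD behavior
lean = row_nodes(rows, preserve_cardinality=False)  # MAY behavior
print(non_lean)
print(lean)
```

With the sample rows, the first call yields three distinct labels and the second yields two, with the duplicate rows sharing one label.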
eric: I object to losing cardinality on the basis of something that R2RML cannot do
ashok: would you be willing to word-smith it?
eric: no, because I disagree with
the premise of losing cardinality
... from the DM's perspective there is no reason the MUST should
be relaxed to SHOULD
<cygri> i don't know how to handle it *with acceptable performance*, i should say
eric: why are we breaking interop on the DM because R2RML cannot handle this case?
+q
dmcneil: but R2RML can handle it, use R2RML views
eric: why don't we just tell
R2RML users that they are losing cardinality in these
cases?
... there is no issue in DM, the issue is an interop issue in
R2RML
cygri: I was working on the
assumption that the DM is a default mapping for R2RML
... that should answer the question of why I expect the DM to
accommodate the capabilities of R2RML
... if there are restrictions in R2RML, which is 1.0,
then...
eric: we have a case where R2RML
cannot preserve cardinality
... we have a DM which provides default mapping for R2RML
... the places where R2RML cannot preserve cardinality, should
be identified as the places where problems will occur
cygri: R2RML as it stands, the user specifies the identities of the rows
+1
scribe: by specifying the columns
or the templates
... if they lose cardinality they lose it because of how it was
mapped, it is transparent
+1
scribe: the other point is: what
do you suggest for me as an R2RML implementor
... push a button and get an automatic mapping
... what should that mapping be in the case of a table without
primary keys
eric: it should be what it is, just not promise to be the DM
cygri: how to communicate to users that it is not the DM?
eric: tell them R2RML does not have the ability to preserve cardinality in this case
cygri: how should we describe the default mapping we implement?
eric: say "it is similar to the DM except repeated rows will be collapsed into one"
cygri: can we write that into the R2RML spec?
eric: yes, that would be good
cygri: so remove the R2RML
reference from DM
... instead add a sentence to the R2RML spec
... saying the default mapping is the "DM - repeated rows"
ashok: why remove R2RML ref from DM?
<cygri> The Direct Mapping is intended to provide a default behavior for http://www.w3.org/TR/2012/CR-r2rml-20120223/ [R2RML].
cygri: there is a sentence in DM
saying DM is default behavior for R2RML
... that sentence must go
eric: we could add a caveat to it
ashok: eric can you work with richard and david on this?
eric: yes, i think the wording is close to what ivan proposed
cygri: it would also have to address the repeated rows caveat
seems to be consensus that we will try to develop wording around this
ashok: we will try to work it out in email
thanks :)
Scribes: David, dmcneil
Present: Richard David Ted Ashok Juan Nuno Souri Seema Eric
Regrets: Eric Ivan Michael Boris
Date: 15 May 2012
Minutes URL: http://www.w3.org/2012/05/15-RDB2RDF-minutes.html