CommentResponse:JP-2

From SPARQL Working Group

Draft

Hi Jorge,

XPath is designed for XML processing, where XML nodes and values are treated differently. XPath evaluation returns distinct XML nodes but may return duplicate values. A single evaluation of an XPath expression cannot mix XML nodes and values; see the numbered list in [1]. "XQuery 1.0 and XPath 2.0 Functions and Operators" provides the function fn:distinct-values to remove duplicate values from a sequence [2].

An RDF graph does not have this distinction between nodes and values. Graph nodes (vertices) are IRIs, blank nodes or literals, with no separation between them. Repetition of literals is significant: consider SUM applied to a purchase order where two items have the same price. Multiple paths to the same endpoint therefore do matter.
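To illustrate the purchase-order point, here is a minimal Python sketch. The item names and prices are made up for illustration and do not come from any specification. Summing over every match gives the order total, while summing over distinct values drops one of the two equally priced items.

```python
# Hypothetical purchase order: (item, price) pairs; two items share a price.
order_lines = [("pen", 5), ("notebook", 5), ("stapler", 12)]

# Bag semantics: every match contributes, as in a SUM over all solutions.
prices = [price for _item, price in order_lines]
total_with_duplicates = sum(prices)

# Set semantics: equal literals collapse into one value before summing.
total_distinct_only = sum(set(prices))

print(total_with_duplicates)  # 22
print(total_distinct_only)    # 17
```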

SPARQL property paths do not apply uniqueness to results, and a property path expression is, where appropriate, the same as its expansion in terms of triple patterns. This is not a matter of efficiency: the answers concerning duplicate literal values would be rather unexpected if only distinct values were returned.
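The equivalence with triple patterns can be sketched in Python as follows. This is an illustration only, not the specification's algebra: the graph, the predicate names and the helper functions match and eval_sequence are all hypothetical. A sequence path is evaluated as a join of two triple patterns, so two routes to equal literals produce two results.

```python
# Hypothetical graph as (subject, predicate, object) triples.
triples = [
    ("order1", "item", "line1"),
    ("order1", "item", "line2"),
    ("line1", "price", 5),
    ("line2", "price", 5),
]

def match(subject, predicate):
    """Return the objects of all triples with this subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def eval_sequence(start, first, second):
    """Evaluate the path first/second as two joined triple patterns,
    { start first ?mid . ?mid second ?end }, keeping one result per match."""
    return [end for mid in match(start, first) for end in match(mid, second)]

prices = eval_sequence("order1", "item", "price")
print(prices)               # [5, 5]
print(sorted(set(prices)))  # [5]
```

The second print shows what a distinct-only evaluation would return: the fact that there are two priced lines is lost.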

This leaves the ArbitraryLengthPath operation, which is used for "+" in paths. It traverses cycles once, terminating the search on encountering an edge already traversed in that evaluation of ArbitraryLengthPath. In an earlier design, cycles were terminated by detecting already-visited nodes, but the WG considers edge traversal the better choice. The new design takes one more step of evaluation on a cycle than the first design and leaves better prospects for future standardization.
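The edge-based termination described above might look roughly like the following Python sketch. It is not the algorithm text from the specification. The function name one_or_more and the edge-list representation are hypothetical, and it assumes that "already traversed" means "already used on the current path", which is one possible reading.

```python
def one_or_more(edges, start):
    """Return the bag of nodes reachable from start by one or more edges.

    Assumption: a branch stops when it would reuse an edge that is
    already on its own path (one reading of "edge already traversed").
    """
    results = []

    def walk(node, used):
        for edge in edges:
            source, target = edge
            if source != node or edge in used:
                continue  # not an outgoing edge, or already traversed
            results.append(target)
            walk(target, used | {edge})

    walk(start, frozenset())
    return results

# A two-node cycle: the walk goes a -> b -> a, then stops at the reused edge.
cycle = [("a", "b"), ("b", "a")]
print(one_or_more(cycle, "a"))  # ['b', 'a']
```

On this cycle the start node "a" is itself reported, because the edge back to it has not been traversed before. A node-based check that treats the start node as visited would stop one step earlier.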

SPARQL has the keyword DISTINCT, so an application can choose between duplicates and no duplicates. A query engine can exploit this if it chooses to; use with sub-queries means that solution modifiers can be applied to specific parts of the query, such as a path.

An implementation is free to implement evaluation in any way it chooses, provided it produces the same answers. The WG felt that using an algorithm was the most helpful way to specify the feature, especially for implementers.

Property paths have been implemented in a number of systems (see [3] for a partial list) and found to be useful.

We would be grateful if you would acknowledge that your comment has been answered by sending a reply to this mailing list.

Andy
On behalf of the SPARQL working group.

[1] http://www.w3.org/TR/xpath20/#id-path-expressions
[2] http://www.w3.org/TR/xpath-functions/#func-distinct-values
[3] http://esw.w3.org/SPARQL/Extensions/Paths


Hello Andy,

Thank you very much for your response and for considering my comments,
and sorry for the late reply.

There are a couple of comments that you have not answered.

""
As a separate but very important issue, notice that the XPath language
does not consider duplicate paths when evaluating expressions (XPath
is evaluated in the "there exists" way that I mentioned before). Thus,
counting paths in SPARQL would be somewhat in contradiction with
previously proposed path languages considered by the W3C.
""

I think that if this W3C Recommendation is in discordance with a
previous Recommendation about a similar topic, then DAWG should have
strong reasons for that, and make them clear in the specification. The
specification should also advise the reader about this issue.

Besides that comment, you have said nothing about efficiency of
evaluation. Notice that this is not related to a particular way of
implementing the language. It is about the huge efficiency impact that
any implementation will suffer in practice. You have not acknowledged
that in your response. Have you considered this as an issue?

Another comment that is not covered by your response is whether there
exists a use case that demands counting different paths. In your
response, it seems that the reason for counting paths is to make
easier the job of the implementors (by reusing algebra operators).
Contrary to what the group thinks, I think that not counting paths
gives the implementor more freedom since paths could be implemented in
several different ways, being just one of them by reusing algebra
operators. Can you please clarify whether there are use cases about
this? This would help a lot.

If you respond to the comments above I can consider my comments answered.

I have a couple of additional words. Please do not consider them as a
formal objection to the process, but just as my opinion.

I still strongly disagree with your design decisions about property
paths. In particular, I insist that it is a mistake to define the
semantics in the presence of cycles in a non-standard way and by
forcing a particular algorithm to evaluate them. In your response you
say that there can be corner cases, but it is not only a problem of
corner cases. From my point of view it will become a problem of
adoption of the standard. In this point I think that the group should
not neglect that there is a lot of related (theoretical and practical)
work in this area that has handled cycles in a completely different
way.

To conclude, I do think that the property-paths material in the
current specification is far from being mature. Considering that the
group is on a tight schedule, I think that it would be better not to
include property paths in this round of standardization than to
include them in their current form.

Thank you very much for considering my comments.
- jorge