This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5223 - [XPath] Casting rules in 3.5.2 General Comparisons (editorial)
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 2.0
Version: Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Don Chamberlin
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2007-10-23 20:59 UTC by Hans-Juergen Rennau
Modified: 2008-02-26 20:24 UTC

Description Hans-Juergen Rennau 2007-10-23 20:59:34 UTC
The casting rules for xs:untypedAtomic described in items a-c depend on whether the condition "the other operand is an instance of T" is considered to hold when the other operand's dynamic type is a subtype of T.

Consider these examples:
ex 1:    <a>1</a> = xs:Name("A1")
ex 2:    <a>A1 </a> = xs:Name("A1")

If the right operand is "an instance of xs:string", ex1 and ex2 yield false. Otherwise, ex1 yields a cast error, and ex2 yields true.
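
The two readings can be modelled with a minimal Python sketch. It assumes a simplified, ASCII-only check for xs:Name, and the names cast_to_string, cast_to_name and general_eq are hypothetical helpers, not part of any processor's API:

```python
import re

# Assumption: ASCII-only approximation of the XML Name production.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


class CastError(Exception):
    """Stands in for the dynamic error raised when a cast fails."""


def cast_to_string(lexical):
    # xs:string preserves whitespace, so the lexical form is kept unchanged.
    return lexical


def cast_to_name(lexical):
    # xs:Name collapses whitespace before the value is validated.
    collapsed = " ".join(lexical.split())
    if not _NAME.fullmatch(collapsed):
        raise CastError(f"{lexical!r} is not castable to xs:Name")
    return collapsed


def general_eq(untyped, name_value, cast):
    # Cast the untyped operand with the chosen rule, then compare as strings.
    return cast(untyped) == name_value


print(general_eq("1", "A1", cast_to_string))    # ex 1, cast to xs:string
print(general_eq("A1 ", "A1", cast_to_string))  # ex 2, cast to xs:string
print(general_eq("A1 ", "A1", cast_to_name))    # ex 2, cast to xs:Name
```

Calling general_eq("1", "A1", cast_to_name) raises CastError in this model, which corresponds to the cast error in ex 1 under the second reading.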

Testing 3 major processors, I found both behaviours!

I believe "is an instance of T" is meant in accordance with the "instance of" operator, that is, to include all subtypes of T. However, the term is nowhere defined, and, interestingly, not used when explaining the semantics of the "instance of" operator. Therefore perhaps it would be worthwhile to add a note to b), making it clear that the situation "other operand is an instance of a subtype of xs:string" is included.

With kind regards,
Hans-Juergen Rennau
Comment 1 Michael Kay 2007-10-23 21:19:32 UTC
I think "A is an instance of T" definitely includes the case where A is an instance of a subtype of T. I haven't looked to see whether we say that clearly anywhere, but it is undoubtedly the intent.

The interesting thing about your example is that it sheds new light on the phrase "is cast to the dynamic type of the other value". I had always assumed that it was intended that one should cast to the primitive type of the other value, that is, in your example, to cast to xs:string. In fact it never occurred to me that casting to xs:Name could give a different result for the comparison, but your example clearly shows that because of whitespace normalization, it can.

I find it hard to believe that we really intended to require casting to the derived type, because that would cause a large number of errors in places where a false result is surely more reasonable. Also, instead of optimization using indexes or hash tables being difficult, it becomes virtually impossible. It would also defy expectations on substitutability: if a developer writes //a[.=1000] in the knowledge and belief that a is typed as xs:int, it's unreasonable that this should fail at run-time because someone has created a subtype in which a is an xs:byte.

Michael Kay
Comment 2 Michael Kay 2007-10-24 08:22:27 UTC
Reassigned to XPath.

I realized that my rationale in comment #1 was OK in principle, but flawed in the detail. Here is a better example.

Consider the following function:

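<xsl:function name="f:x" as="xs:boolean">
  <!-- XSLT 2.0 requires a prefixed function name; the prefix f is assumed to be bound to a user namespace. -->
  <xsl:param name="y" as="xs:integer"/>
  <xsl:sequence select="exists($input//a[.=$y])"/>
</xsl:function>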

where $input is untyped, and it is known that the <a> elements have values whose lexical form makes them castable to integer.

Now it seems entirely unreasonable to me that this function should cause a dynamic error when someone calls it supplying a value of type xs:negativeInteger, merely because casting one of the <a> values to xs:negativeInteger fails. The writer of the function should not have to defend against that possibility.

Michael Kay
Comment 3 Hans-Juergen Rennau 2007-10-28 23:09:47 UTC
(In reply to comment #2)

It took me some time to fully understand the implications of your remarks! We must pay attention to the particular situation when one operand is a formal parameter, which implies: the query writer has no possibility whatsoever to know *exactly* which subtype of the formal parameter type has been provided by the function call (unless he himself wrote the call, of course).

The "right" to provide any subtype of the formal parameter type is really a vital aspect of function call semantics! This implies a general rule, pertaining to the semantics of any expression (excepting sequence-type related expressions like "typeswitch"): the semantics should warrant that changing any subexpression's type annotation to a derived type does not affect the expression's evaluation result. (In a P.S. I try to formulate this rule more formally.)

So your remarks reveal that the present semantics of general comparisons should indeed be changed, because they constitute a conceptual bug, no less. (Although practical consequences will be very rare, because present rules a) and b) exclude any trouble as long as the operand compared with the untypedAtomic operand is of any numeric type, or a string-derived type.) Here comes a proposal for new rules, which should replace "3.5.2 General Comparisons", rules a) to c):

<proposedNewText>
(a) If both atomic values are instances of xs:untypedAtomic, then the values are cast to the type xs:string.
(b) If exactly one of the atomic values is an instance of xs:untypedAtomic, it is cast to a type depending on the other value's dynamic type T according to the following rules, in which V denotes the value to be cast:
(b1) If T is an instance of a numeric type, V is cast to xs:double
(b2) If T is an instance of xs:dayTimeDuration, V is cast to xs:dayTimeDuration
(b3) If T is an instance of xs:yearMonthDuration, V is cast to xs:yearMonthDuration
(b4) In all other cases, V is cast to the primitive base type of T

Note:
The special treatment of the duration types is required to avoid errors that may arise when comparing the primitive type xs:duration with any duration type.
</proposedNewText>
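
The selection of the cast target under the proposed rules can be sketched in Python as follows. This is only an illustration: the BASE table covers a small subset of the built-in type hierarchy, any type missing from it is assumed to be primitive, and cast_target is a hypothetical helper name:

```python
# Parent links for a small subset of the built-in atomic type hierarchy.
BASE = {
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:byte": "xs:short",
    "xs:short": "xs:int",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:Name": "xs:token",
    "xs:token": "xs:normalizedString",
    "xs:normalizedString": "xs:string",
    "xs:dayTimeDuration": "xs:duration",
    "xs:yearMonthDuration": "xs:duration",
}
# xs:integer and its subtypes reach xs:decimal through BASE.
NUMERIC = {"xs:decimal", "xs:float", "xs:double"}
DURATION_SUBTYPES = ("xs:dayTimeDuration", "xs:yearMonthDuration")


def ancestry(type_name):
    # Return the type followed by its base types, ending at the primitive.
    chain = [type_name]
    while chain[-1] in BASE:
        chain.append(BASE[chain[-1]])
    return chain


def cast_target(other_type):
    """Type to which an xs:untypedAtomic operand is cast, given the other operand's type."""
    if other_type == "xs:untypedAtomic":
        return "xs:string"                      # rule (a)
    chain = ancestry(other_type)
    if NUMERIC.intersection(chain):
        return "xs:double"                      # rule (b1)
    for duration in DURATION_SUBTYPES:
        if duration in chain:
            return duration                     # rules (b2) and (b3)
    return chain[-1]                            # rule (b4): primitive base type


for t in ("xs:negativeInteger", "xs:Name", "xs:dayTimeDuration", "xs:duration"):
    print(t, "->", cast_target(t))
```

In this model a value compared with an xs:negativeInteger is cast to xs:double and a value compared with an xs:Name is cast to xs:string, so supplying a subtype no longer changes which cast is attempted.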

Finally, one question concerning the rule: 
<quote>
If a cast operation called for by these rules is not successful, a dynamic error is raised.
</quote>

Might we not completely drop this rule? It constitutes a permanent threat to queries' runtime safety, and what does it protect, which quality does it assert? It seems quite natural to discard any value pair where the cast is not possible as simply not having the required magnitude relationship.

With kind regards -
Hans-Juergen Rennau


P.S.
An attempt at a formal rule to be observed when defining expression semantics in order to protect the "right" of a function caller to provide a formal parameter's subtype.

<rule>
Consider an expression E containing a subexpression U which has the value V of type T. Let E neither contain any type-related subexpression (like typeswitch) nor explicitly refer to any type S that is a subtype of T (like "let $x as S := ..."). For any value V1 from the value space of T let V2 be a value obtained by replacing the type annotation of V1 by a subtype of T. Then XPath expression semantics SHOULD guarantee the following rules, where E(V) denotes the value of E, as dependent on the value V of subexpression U:
- if E(V1) raises an error, E(V2) raises an error
- if E(V1) evaluates to a value V3, E(V2) evaluates to a value V4 which can be obtained from V3 by replacing the type annotation by a subtype.
</rule>

In particular, if a certain value V of the subexpression raises no error, submitting the same value with a subtyped type annotation should also raise no error. And this requirement is exactly what the present rules of 3.5.2 do not meet.

P.P.S. Greetings from my wife and Marina.
Comment 4 Michael Kay 2007-10-29 09:03:32 UTC
Thanks for your comment - your proposed reformulation of the type conversion rule seems very precise and (speaking personally of course) I favour it.

We had a lot of debates about the "fail vs. return false" question. On the whole I was personally inclined to favour the "return false" approach. In fact this semantic is the one that was eventually adopted for some analogous cases including the functions distinct-values(), index-of(), and deep-equal() (and also for key() in XSLT). It's also (more-or-less) the semantics we adopted for pattern matching in XSLT: a failure during attempted matching is treated as no-match.

Although I would have preferred the "return false" behaviour, I don't think there is any rationale that would justify a change to the spec at this stage. However, there is some latitude under the "errors and optimization" rules.

Arguably a conformant implementation could exploit the "errors and optimization" rules to deliver false for the general comparison "a"=3. In section 2.3.4 we discuss the example //product[id = 47]. We say "if an implementation can find (for example, by using an index) the product element-nodes that have an id child with the value 47, it is allowed to return these nodes as the result of the path expression, without searching for another product node that would raise an error because it has an id child whose value is not an integer." I think this includes the case where the set of product elements with id=47 is empty; in this case we can return an empty sequence without testing that all id's are numeric. That's equivalent to returning false rather than an error from the general comparison.
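
The difference between a full scan and an index lookup can be illustrated with a small Python sketch. It assumes each product has a single id held as a string, and scan, build_index and lookup are hypothetical names; it shows one possible strategy, not what any particular processor does:

```python
def scan(products, wanted):
    # Casts every id to a number, so one non-numeric id raises ValueError.
    return [p for p in products if float(p["id"]) == wanted]


def build_index(products):
    # Index only the ids that cast cleanly; the others never take part in a lookup.
    index = {}
    for p in products:
        try:
            key = float(p["id"])
        except ValueError:
            continue
        index.setdefault(key, []).append(p)
    return index


def lookup(index, wanted):
    # A hash lookup never visits the products whose id is not numeric.
    return index.get(float(wanted), [])


products = [{"id": "12"}, {"id": "abc"}]
print(lookup(build_index(products), 47))  # no product has id 47
```

Here scan(products, 47) raises ValueError because of the "abc" id, while the indexed lookup returns an empty list, which corresponds to returning false rather than an error from the general comparison.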

There is of course one big disadvantage to returning false in such cases - it makes it quite hard for the user who has made a genuine mistake (like writing id=47 instead of id='47') to work out what has gone wrong. This I think is the reason the spec is written as it is.

Michael Kay
Comment 5 Michael Kay 2007-10-30 16:03:37 UTC
(Discussed on 2007-10-30. No clear consensus on what the original intention of the WG was - some thought we intended the untypedAtomic value to be cast to the primitive type, some that we intended it to be cast to the specific type, others that we never gave the question any thought... Will come back to it.)
Comment 6 Don Chamberlin 2008-02-26 20:10:46 UTC
This bug report was discussed by the working group on 26 Feb 2008. The group decided to accept the changes labeled as <proposedNewText> in Comment #3 and to make no other changes. Hans-Juergen, if this resolution is acceptable to you, please change the status of this bug to "Closed".
Regards,
Don Chamberlin (for the Query and XSL working groups)