This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24569 - Least common types and lattices
Summary: Least common types and lattices
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0 (show other bugs)
Version: Last Call drafts
Hardware: PC All
: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on: 24568
Blocks:
  Show dependency treegraph
 
Reported: 2014-02-06 21:34 UTC by C. M. Sperberg-McQueen
Modified: 2014-05-15 14:00 UTC (History)
0 users

See Also:


Attachments

Description C. M. Sperberg-McQueen 2014-02-06 21:34:58 UTC
In section 19.2 [1], XSLT 3.0 says that two items do not necessarily have a unique  least common type, because item types form a lattice, not a hierarchy:

  In some cases the above entries require computation of the least 
  common type of two types T and U. Since item types form a lattice 
  rather than a hierarchy, there may be a set of types V such that 
  T and U are both subtypes of every type in V, and no type in V 
  is unambiguously the "least" common type in the sense that all 
  the others are subtypes of it. In this situation the choice of 
  which type in V to use as the inferred static type is 
  implementation-defined.

[1] http://www.w3.org/TR/xslt-30/#determining-static-type

I believe that in a lattice, any two points have a unique least upper bound or join -- if they don't, one is not dealing with a lattice.

The claim that itemtypes form a lattice is also found in XDM [2].

[2] http://www.w3.org/TR/xpath-datamodel-30/#types-hierarchy

If any two types do in fact have a least common type, then the paragraph quoted above from section 19.2 should probably go away; if we do have pairs of types that lack a unique least common type, then the claim that our types form a lattice should be dropped from both specs and any other spec that makes it.
Comment 1 Michael Kay 2014-02-07 00:41:58 UTC
I have to confess that I was quite unaware of this mathematical definition of a lattice, as will be evident to any informed reader of this paragraph!

I'm having trouble reconstructing the thinking that led me to believe that the lowest common supertype of two types was not unique. Certainly, I'm having trouble finding a counter-example. I may have simply been assuming that there was no unique least common supertype on the basis that I could see no proof that there was.

Perhaps we need to distinguish the set of types that are actually defined in the in-scope schema definitions from the set of types that the type system permits to be defined. This only really makes a difference for unions, where we can easily construct a counter-example if we restrict ourselves to named union types: if union(A, B, C) and union(A, B, D) both "exist" and union(A, B) does not "exist", then union(A, B, C) and union(A, B, D) are both common supertypes of A and B, and neither is the lesser.

But for the purpose of type inference in section 19.2, I don't think we're constrained to infer a type that is present in the in-scope schema definitions, so this counter-example isn't pertinent. And I can't think of another (though I have no idea how to go about proving its absence).
Comment 2 C. M. Sperberg-McQueen 2014-02-07 02:50:23 UTC
A simple case for which two XSD simple types do not have a least common supertype expressible in XSD:  given simple types A and B, the union of A and B is a supertype (as defined by the XDM / XPath substitution rules, and also by XSD substitutability), and so is the union of B and A (which is distinct in XSD, since unions are ordered).

For a few minutes today I thought that it might be the case that function items might also lack a least common supertype, but I have come to believe that for any functions involving simple types that do have least common supertypes and greatest common subtypes, there will be a least common supertype.  However, when functions are defined on unions, the problem mentioned above will prevent a unique solution.

The only caveat here is that the example of union(A,B) and union(B,A) applies to types as they are defined by XSD.  Whether it also applies to XSD types as they occur filtered through the XDM spec is currently beyond my ken.  

See also bug 24568 against XDM.
Comment 3 Michael Kay 2014-02-07 09:31:19 UTC
The only XSD union types that are item types are "pure union types" as defined in 

http://www.w3.org/TR/xpath-30/#dt-pure-union-type

I believe it is a consequence of the rules in

http://www.w3.org/TR/xpath-30/#id-itemtype-subtype

that given two union types U = union(A, B) and V = union(B, A), then U and V are "equal" in the type graph, that is, they are subtypes of each other and they have the same supertypes and subtypes, even though they are distinct in the sense that they define different cast operations on strings.
Comment 4 Michael Kay 2014-02-07 09:48:41 UTC
I just blurted:

>then U and V are "equal" in the type graph, that is, they are subtypes of each other 

which has the implication that the type graph contains a cycle. Is that in itself enough to show that the graph is not a lattice?
Comment 5 Michael Kay 2014-02-07 10:06:24 UTC
Perhaps there is a line of reasoning as follows. The subset relation defines a lattice (see http://www.proofwiki.org/wiki/Power_Set_is_Complete_Lattice). Every type has a value space S(T), and if T is a subtype of U then S(T) is a subset of S(U) (However, the converse does not follow). 

This means that if we consider the value spaces defined by our types, then the value spaces define a lattice under the subset relation. The types themselves do not necessarily form a lattice, because two different types may have the same value space.

When we do type inference for streamability, we are only interested in the value space, and not in any other properties of the type. So we can regard two types that have the same value space as equivalent for the purpose. If equivalent types in this sense are regarded as a single type, then the types (hopefully) form a lattice.
Comment 6 C. M. Sperberg-McQueen 2014-02-13 11:21:32 UTC
We discussed this at our face-to-face meeting in Prague.

We concluded that whether we have a type lattice depends on exactly what we wish to regard as a "type" for purposes of this analysis.

One solution would be to specify that the points in the lattice are groups of types equivalent in the sense that subtype-itemtype(A,B) and subtype-itemtype(B,A).

Another would be to ignore whether itemtypes are or are not a lattice, and simply define an ad hoc type inferencing scheme for use here.  We are interested primarily in two bits of information:  (a) can there be children? (b) can members of the set be numeric?  It might be possible to capture this with a very simple type hierarchy, or a simple 2x2 diagram.  

A 2x2 matrix could give us better information.  For example it would allow us to infer that the static type of (element(), xs:string) is non-numeric, whereas a type hierarchy or lattice that looks anything like what's described in our specs must end up with a least common supertype of item() or something similar.

We concluded that this issue needs further study.
Comment 7 Michael Kay 2014-02-14 07:49:51 UTC
One way forward on this might be to say that the inferred type is always a subset of the set of disjoint types {decimal, double, float, boolean, string, dateTime, ..., document-node(), element(), attribute(), text(), ..., function(*)}. The "lowest common supertype" of two types is then simply the union of the two sets. An expression is then "possibly numeric" if it includes decimal, double, or float in the subset, and it is "possibly parental" if it includes document-node() or element() in the subset.
Comment 8 C. M. Sperberg-McQueen 2014-04-10 16:28:16 UTC
See also some further discussion in the thread beginning at  https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Apr/0019.html (member-only link).
Comment 9 Michael Kay 2014-04-24 14:45:06 UTC
Following email and telcon discussion, section 19.2 has been recast to use the new concspt of U-types, which can be cleanly composed under union, intersection, and difference.