2790 – Instance of with union type results in surprising results

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Data Model 1.0
Version:	Candidate Recommendation
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Norman Walsh
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2006-02-01 16:14 UTC by Michael Rys
Modified:	2006-06-29 18:57 UTC
CC List:	0 users

Attachments
Proposals adopted for bugs 2768 and 2790. (18.47 KB, text/html) 2006-04-03 17:41 UTC, Jonathan Robie

Description Michael Rys 2006-02-01 16:14:51 UTC

Let's assume that we have

Schema:

declare element E of type U
define type U restricts xs:anySimpleType { T1 | T2 }
define type T1 restricts xs:int
define type T2 restricts xs:string

Instances that are validated according to the schema

<E>42</E>
<E xsi:type="T2">42</E>

The question is what the results of the following queries are:

Q1: for $e in /E 
return $e instance of element(E, U)
 
Q2: for $e in /E 
return $e instance of element(E, T1)

Q3: for $e in /E 
return $e instance of element(E, T2)

Q4: for $e in /E
return data($e) instance of T1

Q5: for $e in /E
return data($e) instance of T2

Q6: for $e in /E
return data($e) instance of U

Let's look at the validation and data model generation where I still think we 
have a need for further clarification.

XSD and PSVI generation: This is not fully clear yet. We all agree that the 
types T1 and T2 are not subtypes in the XQuery type system but that they are 
member types of the union type U. 

This is what I found in the XML Schema document about this type of validation 
(and to be honest, I cannot clearly understand how this applies to the given 
example):

Schema Information Set Contribution: Element Validated by Type

If an element information item is ·valid· with respect to a ·type definition· 
as per Element Locally Valid (Type) (§3.3.4), in the ·post-schema-validation 
infoset· the item has a property: 

PSVI Contributions for element information items 
[schema normalized value] 

The appropriate case among the following: 
1. If clause 3.2 of Element Locally Valid (Element) (§3.3.4) and 
   Element Default Value (§3.3.5) above have not applied and either 
   the ·type definition· is a simple type definition or its {content 
   type} is a simple type definition, then the ·normalized value· of 
   the item as ·validated·.

2. otherwise ·absent·.

Furthermore, the item has one of the following alternative sets of properties: 

Either 

PSVI Contributions for element information items 
[type definition] 
An ·item isomorphic· to the ·type definition· component itself. 

[member type definition] 
If and only if that type definition is a simple type definition with {variety} 
union, or a complex type definition whose {content type} is a simple type 
definition with {variety} union, then an ·item isomorphic· to that member of 
the union's {member type definitions} which actually ·validated· the element 
item's ·normalized value·. 

Some of my schema experts think that this means that if xsi:type is given, 
only the type given in xsi:type is being preserved for the element's type, 
since validation will pick the type given in xsi:type directly and not look at 
the union type at all. Let's call that interpretation A.

On the other hand, this seems like it is losing type information and is in 
contradiction to what we expect from the data model document, which says:

3.3.1.1 Element and Attribute Node Type Names

The precise definition of the schema type of an element or attribute 
information item depends on the properties of the PSVI. In the PSVI, [Schema 
Part 1] only guarantees the existence of either the [type definition] 
property, or the [type definition namespace], [type definition name] and [type 
definition anonymous] properties. If the type definition refers to a union 
type, there are further properties defined, that refer to the type definition 
which actually validated the item's normalized value. These properties are not 
used to determine the schema type of the node but they may be used to 
determine the typed value of the node, as described in 3.3.1.2 Typed Value 
Determination.

This explanation seems to be clear, but according to interpretation A of the 
schema document, you would not have the node's type if an xsi:type value has 
been present. But let's assume that interpretation A is wrong and that we can 
map the PSVI into the following data model instance (let's call this 
interpretation B):

element E of type U{42 of type T1}
element E of type U{"42" of type T2}

Note that according to interpretation A we would get:

element E of type U{42 of type T1}
element E of type T2{"42" of type T2}

Now let's look at what the answers should be for Q1 to Q6 given interpretations A 
and B:

Q1 - A: true false
Q1 - B: true true

Q2 - A: false false
Q2 - B: false false

Q3 - A: false true
Q3 - B: false false

Q4 - A: true false
Q4 - B: true false

Q5 - A: false true
Q5 - B: false true

Q6: Parse error since U is not an atomic type.
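
The answers for Q1 to Q5 can be reproduced with a small model. The following Python sketch is not an XQuery implementation. It assumes that "instance of" reduces to comparing a type annotation with the expected type by walking restriction bases, and that membership in a union does not count as derivation. The names BASE, Element, derives_from and answers are made up for this illustration. Q6 is left out because it is a static error.

```python
from dataclasses import dataclass

# Restriction bases from the example schema. Union membership (T1, T2 in U)
# is deliberately NOT recorded here: members are not subtypes of the union.
BASE = {"T1": "xs:int", "T2": "xs:string", "U": "xs:anySimpleType"}


def derives_from(actual, expected):
    """True if `actual` is `expected` or reaches it via restriction bases."""
    t = actual
    while t is not None:
        if t == expected:
            return True
        t = BASE.get(t)
    return False


@dataclass(frozen=True)
class Element:
    name: str
    node_type: str   # type annotation of the element node
    value_type: str  # type of the atomic value returned by data()


# The two validated instances under each interpretation.
INTERPRETATION_A = [Element("E", "U", "T1"), Element("E", "T2", "T2")]
INTERPRETATION_B = [Element("E", "U", "T1"), Element("E", "U", "T2")]


def answers(instances):
    """Evaluate Q1 to Q5 for each instance, in document order."""
    def elem(t):
        return [e.name == "E" and derives_from(e.node_type, t) for e in instances]

    def atom(t):
        return [derives_from(e.value_type, t) for e in instances]

    return {"Q1": elem("U"), "Q2": elem("T1"), "Q3": elem("T2"),
            "Q4": atom("T1"), "Q5": atom("T2")}
```

Under this model only Q1 and Q3 differ between the two interpretations, which is where the node's type annotation, rather than its typed value, is inspected.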

Obviously, from a type consistency point of view, in my personal opinion, 
interpretation B is the only one that makes sense. However, interpretation A 
seems to be what the schema processor implies according to our reading.

The question is, is interpretation A correct (and therefore schema's semantics 
inconsistent) or interpretation B (and therefore the schema spec needs to be 
fixed or clarified)?

According to (member-only) http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2005Dec/0025.html, 
we need to fix the PSVI to Data model mapping with:

<cite>
So data model construction could/should be fixed to always use the
declared type for the node's type.  The only time this will be
different from the [type definition] is when xsi:type has been used.
</cite>

Comment 1 Mary Holstege 2006-02-28 16:01:05 UTC

Verification from Schema WG: Yes, interpretation A is correct:
xsi:type sidesteps union processing.  (The type of xsi:type 
must be validly derived from the declared type; however, that
derivation is not necessarily restriction.)

The correct fix is as cited at the bottom of the initial comment.
Quoting Henry's message:
"Note this information is already available in the PSVI, as
the {type definition} of the [element declaration] PSVI property."

This means expanding the requirement for which PSVI properties
must be reported.  Currently the [element declaration] property
is not required.

Comment 2 Norman Walsh 2006-03-14 16:09:50 UTC

See also bug #2768.

Comment 3 C. M. Sperberg-McQueen 2006-03-20 18:54:47 UTC

A proposal for changing the relevant part of the Data Model is at
http://lists.w3.org/Archives/Public/www-archive/2006Mar/att-0021/bugs.2768.2790.html__charset_ISO-8859-1

It follows the suggestion mentioned in comment #1, namely using
the declared type, not the [type definition] property, in cases
where xsi:type has been used to name a member of a union.  It
does not use the declared type, however, when xsi:type is used to
name a type derived from the declared type (where 'derived' means
derived in the narrow sense, excluding the sense in which members
are treated as 'derived' from unions).
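
Read as a rule, the approach summarized in this comment might look like the following Python sketch. It is only an illustration under stated assumptions: types are plain strings, UNION_MEMBERS is a hypothetical lookup table for the example schema from the description, and nested unions are ignored. Falling back to the [type definition] when no declared type is available is also an assumption, not something the comment states.

```python
from typing import Optional

# Hypothetical table: union type name -> names of its member types.
UNION_MEMBERS = {"U": ("T1", "T2")}


def node_type_name(type_definition: str, declared_type: Optional[str]) -> str:
    """Pick the type annotation for an element node (illustrative only)."""
    if declared_type is None:
        # Assumption: with no declaration available, keep [type definition].
        return type_definition
    if type_definition in UNION_MEMBERS.get(declared_type, ()):
        # xsi:type named a member of the declared union: report the union.
        return declared_type
    # No xsi:type, or xsi:type names a type derived in the narrow sense.
    return type_definition
```

With this rule, node_type_name("T2", "U") gives "U", so the element with xsi:type="T2" from the description is annotated as U, as in interpretation B.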

Comment 4 Jonathan Robie 2006-04-03 17:41:39 UTC

Created attachment 416
Proposals adopted for bugs 2768 and 2790.

These proposals were adopted on 3 May 2006.

Comment 5 Jonathan Robie 2006-04-03 17:42:31 UTC

We have closed this by adopting the attached proposals.

Comment 6 Joanne Tong 2006-06-23 12:49:22 UTC

Additional concerns raised from members mailing list: 
http://lists.w3.org/Archives/Member/w3c-xsl-query/2006Jun/0017.html

For 2768

1. Part of the new text reads "modified by the rules in [Schema Part 1] 
which make it into a function". I'm not sure which rules you are talking 
about. Seems to me that all union aspects are handled in part 2. (I may be 
missing something in part 1. But if I'm having a hard time finding the 
relevant rules, I'm sure the users/readers will also have difficulties. 
You may want to have explicit reference to specific sections/constraints.)

2. anySimpleType is handled separately. How about anyAtomicType? In Schema 
1.1, similar to anySimpleType, anyAtomicType can also give multiple values 
for a given lexical input. If Query handles anyAtomicType in the same way, 
then there is a potential problem of which value to choose; if Query 
handles anyAtomicType differently, then there is a potential mismatch 
between Query and schema 1.1.

3. The sentence "the typed value is the result of applying M to the string 
value" surprised me a bit. Not saying it's wrong, but I didn't expect that 
values in Query are the same values as in Schema. For example, values of 
hexBinary and those of base64Binary don't overlap in schema. Is this also 
true for Query values? And for an atomic type, applying M (schema 
lexical->value mapping) gives you a single value, where Query would need a 
sequence.

4. For example, type U is declared as a union of xs:integer|xs:decimal 
and the lexical value to be validated is "1". According to Schema, this 
value is then validated as an xs:integer and the name from the members 
type definition is xs:integer. However, the proposal introduces the 
concept that lexical mapping should be applied to determine its 
type-value. In this example, the integer lexical mapping is still 
applied, but the actual value yields a value from the decimal value space. 
Thus, the actual type information, xs:integer, is lost.

The stuff in Schema on the mapping from the lexical space to the value 
space for a datatype doesn't give us much to latch onto, but here are some 
of the things that I think support my understanding of the value space:

According to section 4.2.1 of Schema Part 2,[1] "for any a and b in the 
·value space· if a = b, then a and b cannot be distinguished (i.e., 
equality is identity)" and "if a datatype T' is ·derived· by ·restriction· 
from an atomic datatype T then the ·value space· of T' is a subset of the 
·value space· of T. Values in the ·value space·s of T and T' can be 
compared according to the above rules."
According to section 2.2,[2] a value space can be defined, among other 
ways, "by restricting the ·value space· of an already defined datatype" or 
"as a combination of values from one or more already defined ·value 
space·(s)"

So I read all that as saying that the values in the value space of a type 
derived by restriction are the same values as the values in the primitive 
type from which it's derived and that the values in the value space of a 
union type are the same values in the value spaces of the member types. 
So, for a union type like xs:int|xs:decimal, I believe that the two points 
in the lexical space "1" and "1.0" both map to the same value (one) in the 
value space of the union type, and there is no way of distinguishing the 
value to which those two points in the lexical space map - in 
particular, that there is no type information associated with those 
values.

The stuff in Schema Part 1 about unions only seems to be helpful for 
unions like xs:gYear|xs:int versus xs:int|xs:gYear where the lexical 
string "2006" maps to the gYear value 2006 in the first case and to the 
decimal value 2006 in the second case.

5. "the W3C XML Schema specification defines a function M mapping the 
lexical representation of a value onto the value itself"
Why refer to "function M" when this term is not really defined in Schema 
(and if so, where in Schema?). Isn't it more of a "relation" rather than 
an actual "function"?
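
Point 4 above turns on which member type validates a given lexical form. As a rough illustration, the following Python sketch tries the members of a union in declaration order and returns the first one whose lexical space accepts the input. The regular expressions are simplified stand-ins assumed for this example (no time zones on xs:gYear, no range check on xs:int), and LEXICAL and validating_member are hypothetical names.

```python
import re

# Simplified lexical checks, assumed for illustration only.
LEXICAL = {
    "xs:gYear": re.compile(r"-?\d{4,}"),
    "xs:int": re.compile(r"[+-]?\d+"),
    "xs:integer": re.compile(r"[+-]?\d+"),
    "xs:decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
}


def validating_member(lexical, members):
    """Return the first member type, in order, that accepts `lexical`."""
    for member in members:
        if LEXICAL[member].fullmatch(lexical):
            return member
    return None  # no member validates the input
```

In this sketch "2006" is claimed by xs:gYear in the union xs:gYear|xs:int and by xs:int in xs:int|xs:gYear, while "1" and "1.0" in xs:integer|xs:decimal are claimed by different members even though they denote the same number.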



For 2790

1. (omit)

2. (editorial) Given that you already require the [element/attribute 
declaration] property, it feels unnecessarily confusing to refer to [type 
definition anonymous] property later. I would just say "I need [xxx 
declaration] and [type definition] properties from PSVI." And (for the 
"otherwise" case) use {name} property of the [type definition] to determine 
which name to use, instead of consulting [type definition anonymous].

3. The [element declaration] property may be absent. (For example, an 
element matching a wildcard but having xsi:type.) You may need to take 
care of this special case both in the definition of "declared type" (what 
if [element declaration] is absent) and when "declared type" is used (what 
if it's absent).

Comment 7 Norman Walsh 2006-06-29 18:56:59 UTC

Really actually fixed now. Per the decisions made at the June 2006 f2f.