4273 – [FS] data on element()

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4273 - [FS] data on element()

Summary: [FS] data on element()

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Formal Semantics 1.0 (show other bugs)
Version:	Proposed Recommendation
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Dyck
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:	find better solution for 1.1
Keywords:

Depends on:
Blocks:

Reported:	2007-01-23 09:29 UTC by Tim Mills
Modified:	2009-02-18 09:35 UTC (History)
CC List:	3 users (show)

See Also:

Attachments

Description Tim Mills 2007-01-23 09:29:39 UTC

Consider the query:

declare variable $x as element(a, xs:integer) external;
declare variable $y as element() := $x;

let $z as xs:integer := fn:data($y)
  return $z;

From FS 3.5.4, we know that:

element(a, xs:integer) = element a of type xs:integer
element() = element * of type xs:anyType

From FS 3.5.1, we know that:

xs:integer <: xs:anyType

From FS 5.14, we know that:

1. the static type of $x is element a of type xs:integer
2. the static type of $y is element * of type xs:anyType

$y is assignable from $x because:

element a of type xs:integer <: element * of type xs:anyType

Now consider the type checking of fn:data($y).  This depends on the "data on" judgements in FS 7.2.6.  As these judgements get quite involved, I refer to XQuery 2.5.2, point 4a:

"For an element node, the relationship between typed value and string value depends on the node's type annotation, as follows: If the type annotation is xs:untyped or xs:anySimpleType or denotes a complex type with mixed content (including xs:anyType), then the typed value of the node is equal to its string value, as an instance of xs:untypedAtomic"

Therefore fn:data($y) is of type xs:untypedAtomic.  Now,

not(xs:untypedAtomic <: xs:integer)

and therefore "let $z as xs:integer := fn:data($y) return $z;" will fail to type check.

Consider the differences if the argument to fn:data had been $x instead of $y.

let $z as xs:integer := fn:data($x)
  return $z;

From FS 7.2.6, "When applied to an element type whose type annotation denotes a simple type or a complex type of simple content, data on returns the element's simple type."  

So fn:data($x) is of type xs:integer, and therefore the query will type check correctly.

The strange thing is that $x and $y are the same value.  The only difference is their static types at type check time - during evaluation their dynamic types will be identical.  Thus an implementation which only performs type checking at run time will execute both queries happily.

Have I made some mistake, or is there a problem here?

Comment 1 Jerome Simeon 2007-02-20 00:40:55 UTC

Tim,

This is a personal reply for now, as the working groups haven't gotten a chance to look at this yet.

After a quick review of this, it seems that you may be right that there is a problem here. I think it may have some impact on type soundness. I.e., the static and dynamic types maybe inconsistent. In your example, this could lead to worse problems than the one you mention. For instance consider the following variation of your example:

declare variable $x as element(a, xs:integer) external;
declare variable $y as element() := $x;

fn:lower-case($y)

it seems that here the static typing will let this query pass (based on the fact that fn:data($y) will have xs:untypedAtomic as static type, and that xs:untypedAtomic can be promoted to xs:string.

It also seem that it can happen for more cases than the one you mention, as in XML Schema, a typed with simple content (T1) can be derived from a type with mixed content (T2), and that a similar example could be built using this.

Except if I missed something in the 'data on' rules (which I may have), this is a serious problem.

A possible solution would be to change the rule for 'data on' to return xs:anyAtomicType in the case of type with mixed content, but I don't know if that would be an acceptable solution for you of the WGs. Note that I'm not sure we can do better, as we could always have the case you mention created for any atomic type so that seems a reasonable solution.

Any feedback on this would be great.
Thanks,
- Jerome

Comment 2 Tim Mills 2007-02-20 13:35:00 UTC

Thanks for confirming that this might be a problem.

I agree that it seems reasonable to define:

-------------------------------------------------------
data on element * of type xs:anyType : xs:anyAtomicType

(although should it be xs:anyAtomicType or xs:anyAtomicType* ?)

But once you make this change, problems arise such as:

1 + <a>2</a>

Under the current rules for "data on" fn:data(<a>2</a>) is of type
xs:untypedAtomic, and so the query will type check correctly because the
xs:untypedAtomic values gets "fixed up" by function call argument conversion.

Changing "data on" to return anything other than xs:untypedAtomic means 
function call argument conversion isn't going to happen and so this query
will fail static type checking.

Secondly, if I understand the data model correctly, values cannot be
union types.  i.e. during schema validation a specific type from a union
gets selected by considering each type in the union in declaration order.
So what is the dynamic type of fn:data(<a>2</a>)?








But once you introduce this, you are landed with the problem that,
because only xs:untypedAtomic values are automatically converted by
function calls.

Comment 3 Michael Rys 2007-02-20 18:55:23 UTC

Changing fn:data() to return xs:anyAtomicType* in this case is not a viable solution, since it would violate the core of the design of XQuery to work on untyped data.

Your example is an unfortunate consequence of the assert (as xs:integer) not promoting untyped values as the other type assertions (function parameters) do. I would be strongly opposed to change our type inference rules to make this edge case work though. People should use a treat as if they run into this situation.

Comment 4 Jerome Simeon 2007-02-21 00:08:57 UTC

Michael,

I am not sure I fully follow your response. The type in this case is already untyped data. However, I think the example that Tim pointed out shows a problem with type soundness which may break notably some optimizers which rely on the fact that the static type is consistent with the dynamic type, which it isn't here.

Do you disagree that there is a problem? or disagree with the stated solution?

Thanks,
- Jerome

Comment 5 Michael Rys 2007-02-21 00:32:35 UTC

There is an issue in that data on will return xs:untypedAtomic, while dynamically it returns xs:integer. I see that. 

The problem with xs:anyType is that it serves both as a specific content type with lax validation that can be mixed and as the most general type in our type hierarchy. In the former application we want to use it like mixed and make it map to untyped atomic values during atomization. In the second case we want it to be xs:anyAtomicType*.

So I am not sure about the solution yet. 

Best regards
Michael

Comment 6 Tim Mills 2007-02-21 07:53:17 UTC

"However, I think the example that Tim pointed out shows a problem
with type soundness which may break notably some optimizers which rely on the
fact that the static type is consistent with the dynamic type, which it isn't
here."

This describes the problem succinctly.  I would expect the dynamic type of any
expression to be a subtype of the expression's static type.  fn:data seems to be
broken in this respect.

Comment 7 Tim Mills 2007-02-26 12:19:31 UTC

This leads me to the conclusion that, when constructing nodes with construction mode preserve, the associated type-name should not be element().  Rather it should be a type I've named "xs:typed" (for want of anything better), defined as:

 define type xs:typed restricts xs:anyType {
    attribute * of type xs:anySimpleType,
    ( element * of type xs:anyType | text | comment | processing-instruction )*
  }

I believe this to be correct because, 

1. Any copied attributes will retain their type annotations, therefore their most specific common type is attribute() = attribute * of type xs:anySimpleType.

2. Any copied elements will retain their type annotations, therefore their
most specific common type is element() = element * of type xs:anyType.

3. Text, comment and processing instructions nodes are copied through.

4. Enclosed expressions which are a subtype of xs:anySimpleType are converted into text nodes.  i.e. <a>{1}</a> is an element containing a text node, not an integer.

i.e. fn:data() on any constructed node (regardless of construction mode) will can never return a value of any atomic type other than xs:untypedAtomic.

e.g. In the query 1 + <a>2</a>, fn:data(<a>2</a>) is of type xs:untypedAtomic, regardless of construction mode.

In my original example, fn:data($y) is of xs:anySimpleType.  Since xs:integer is a subtype of xs:anySimpleType, treat as type assertions (explicit if static typing is required, otherwise implicit) will work.  

I've now implemented this, together with:

-------------------------------------------------------
data on element * of type xs:anyType : xs:anySimpleType

and it appears to work in all XQTS tests.

Comment 8 Michael Kay 2007-10-21 23:15:25 UTC

I'm wondering whether there is a problem here with the dynamic semantics as well as the static semantics.

It seems that a value S of type element(*, xs:string) is not substitutable for a value A of type element(*, xs:anyType) even though the latter type subsumes the former, because A can be used as an operand of an arithmetic expression and S can't. This seems to be broken at a fairly deep level, and I can't quite see how Tim's proposal is supposed to fix it.

Comment 9 Tim Mills 2007-10-22 09:17:39 UTC

declare variable $x as element(a, xs:anyType) external;
$x + 1

This fails static type checking because we cannot determine statically that data($x) is convertible to a numeric value (ensuring that + is defined).  Note that I believe data on xs:anyType to be xs:anyAtomicType* and not xs:untypedAtomic as stated in the specification.

If we type check optimistically, it will pass and run to completion in the event that the data($x) is convertible to a numeric value.  Otherwise a runtime type check error will occur.  By type checking optimistically, the implementation effectively introduces extra type constraints which it checks at runtime.  It is as if the type of $x were treated as element(*, T) where T is any type convertible (using fs:convert-operand) to a numeric type).


declare variable $x as element(a, xs:string) external;
$x + 1

As expected, this will fail type checking for all inputs.

Comment 10 Michael Kay 2007-10-22 10:53:20 UTC

>Note that I believe data on xs:anyType to be xs:anyAtomicType* and not xs:untypedAtomic as stated in the specification.

I'm not sure why you believe that. Elements labelled as xs:anyType work the same way as any other element with mixed content: the typed value is the concatenated content of the contained text nodes, stripped of any start and end tags, typed as untypedAtomic. This is essential for the correct functioning of document-based applications that use a schema.

Comment 11 Tim Mills 2007-10-22 11:57:43 UTC

What follows is my arguement that data on xs:anyType = xs:anyAtomicType*

element(*, xs:anyType) is the most general type of an element.

For the type system to work, I argue that:

T <: xs:anyType
--------------------------------------------
data on T <: data on element(*, xs:anyType)

Let's break it down into 3 specific cases of T (covered in FS) which are necessarily always subtypes of element(*, xs:anyType).  

1. simple type or complex types of simple content.

statEnv |-  ElementType type lookup TypeReference
statEnv |-  TypeReference expands to Type
statEnv  |-  Type <: (attribute*, Type1)      
statEnv  |-  Type1 <: xs:anyAtomicType*
----------------------------------------
statEnv |-  data on ElementType : Type1

So if the Type1 = xs:anyAtomicType*, we know that

data on element(*, xs:anyAtomicType*) = xs:anyAtomicType*

and therefore xs:anyAtomicType* must be a subtype of 'data on' applied to element(a,  xs:anyType).

2.  complex type of mixed content

statEnv |-  ElementType type lookup of type TypeName
statEnv |-  TypeName of elem/type expands to expanded-QName      statEnv.typeDefn(expanded-QName) =
  define type TypeName Derivation mixed { Type1 }
--------------------------------------------------------------
statEnv |-  data on ElementType : xs:untypedAtomic

We know that data on applied to such an element is xs:untypedAtomic.

3.  The data on filter is not defined on any element type whose type annotation denotes a complex type of complex content and therefore applying data on to such a node raises a static error.

So data on applied to element(*, xs:anyType) must be the union of data on applied to elements defined in the above cases.  By my reckoning, that's xs:anyAtomicType*.

I'm willing to proved wrong though.   If you have an example that you think won't work, I'd be happy to give it a whirl.  I'm sure it would also be a useful addition to XQTS.

Comment 12 Michael Kay 2007-10-22 12:14:28 UTC

Perhaps we're arguing at cross-purposes here. I was pointing out that by definition, data() applied to an element annotated as xs:anyType returns untypedAtomic. You, I think, are saying that if all you know about a value V is that it satisfies element(*,xs:anyType), then all you know about data(V) is that it satisfies xs:anyAtomicType*. Both correct. Pax.

Comment 13 Tim Mills 2007-10-22 13:12:47 UTC

Mike, if I understand you correctly, when you say:

"an element annotated as xs:anyType"

and 

"a value V is that satisfies element(*,xs:anyType)"

you are getting at the difference between the type actually assigned by schema verification and whether the type assigned matches element(*, xs:anyType).

Lets assume that after schema validation, an element E is typed to be xs:anyType.  i.e. it's type is element(E, xs:anyType).  In XQuery 2.5.2 Typed Value and String Value, I find the following strange:

"If the type annotation is xs:anySimpleType or ... (including xs:anyType), then the typed value of the node is equal to its string value, as an instance of xs:untypedAtomic. "

which I take to mean:

(1) data on element(*, xs:anySimpleType) = xs:untypedAtomic
(2) data on element(*, xs:anyType) = xs:untypedAtomic

Each of these violates the rule for substitution to work, namely:

T <: U
----------------------------------------------
data on element(*, T) <: data on element(*, U)

But the following does not hold if (1) and (2) hold

element(*, xs:integer) <: element(*, xs:anySimpleType)
----------------------------------------------------------------------
data on element(*, xs:integer) <: data on element(*, xs:anySimpleType)

and neither does:

element(*, xs:integer) <: element(, xs:anyType)
---------------------------------------------------------------
data on element(*, xs:integer) <: data on element(, xs:anyType)

because xs:integer is not a subtype of xs:untypedAtomic.  Make xs:untypedAtomic a subtype of all the other atomic types and I'd guess that this would all work out; but this isn't how untypedAtomic has been defined.

I believe (1) and (2) to be wrong because the type of the data isn't known to be untypedAtomic, it's just that we don't know what the type is.  From XML Schema,

"The content of the element declared in this way is unconstrained, so the element value may be 423.46, but it may be any other sequence of characters as well, or indeed a mixture of characters and elements. "

which I'd hope means:

the element value may be the decimal 423.46

as opposed to 

the element value may be the untypedAtomic 423.46

Comment 14 Michael Kay 2007-10-22 14:04:24 UTC

>"If the type annotation is xs:anySimpleType or ... (including xs:anyType), then >the typed value of the node is equal to its string value, as an instance of >xs:untypedAtomic. "

>which I take to mean:

>(1) data on element(*, xs:anySimpleType) = xs:untypedAtomic
>(2) data on element(*, xs:anyType) = xs:untypedAtomic

Yes, these rules are quite explicit in the data model and elsewhere, and an awful lot of things would break if they changed.

>Each of these violates the rule for substitution to work, namely:

Yes, hence comment #8.

In many ways xs:untypedAtomic is intended to mean "unknown type". At one stage during WG discussions I proposed that the relationship between untyped and the set of real types was the same as the relationship between null and the set of real values; we should not treat it as a type like any other. In particular, expressions such as xs:untypedAtomic :> T should return "unknown" rather than true or false.

Where are the theorists when we need them?

Comment 15 Tim Mills 2007-10-22 16:35:35 UTC

If I understand you correctly, you are saying that, for example, the value of an attribute known only to be of (static) type xs:anySimpleType (through validation or construction) would be represented by a value whose (dynamic) type is xs:untypedAtomic.  Since xs:untypedAtomic is a subtype of xs:anySimpleType the type system works correctly.

Comment 16 Michael Dyck 2007-10-23 08:13:35 UTC

I agree that Static Type Analysis (as defined in the FS) is unsound, due to the static typing rules for fn:data().

I'm not convinced that there's a problem in the dynamic semantics, as suggested in Comment #8.

Re Comment #14: The statements in the XDM and XQuery specs about typed value specify the run-time behaviour of fn:data (operating on XDM values), whereas the "data on" judgment is (roughly speaking) the compile-time analog (operating on types). We can change the latter without changing the former.

Specifically, if we change the "complex type of mixed content" case of 'data on' so that the inferred static type is xs:anyAtomicType*, then I think the static typing rules for fn:data() would be sound, and wouldn't conflict with the existing specifications of the dynamic behaviour of fn:data(). However, static type analysis would reject more queries than currently.

Comment 17 Tim Mills 2007-10-23 09:01:22 UTC

As Bill Patton has pointed out in a message to public-qt-comments, data on is not always defined.  Quoting FS:

"The data on filter is not defined on any element type whose type annotationXQ denotes a complex type of complex content and therefore applying data on to such a node raises a static error."

He therefore concludes that fn:data applied to a value whose static type is known to be element() must necessarily raise a static type checking exception (because it is conceivable that the element supplied at runtime is has complex content).

If however, this was replaced with:

"If ElementType denotes a complex type of complex content then the static type is 'none' and applying fn:data to such a node raises a dynamic error [err:FOTY0012]."

Thus 'data on' is always defined.  That's nice, because the type declaration of fn:data is:

fn:data($arg as item()*) as xs:anyAtomicType*

i.e. it would be surprising to an end user if fn:data couldn't pass static type checking for any item().  

The cases where 'data on' is 'none' is exactly the cases where fn:data is defined to raise the error [err:FOTY0012], and of course, fn:error has type 'none'.

While I realise that avoiding these sort of runtime errors is static type checking's raison d'etre, however we risk making a static typing implementation completely unusable.

XQTS test K2-Axes-81 is an example of a query which is painless with optimistic typing.

<a>
    <b id="person0">
        <c/>
    </b>
</a>/*[attribute::id eq "person0"]/c

To make this pass static typing, it has to be modified to:

<a>
    <b id="person0">
        <c/>
    </b>
</a>/*[(data(attribute::id) treat as xs:untypedAtomic) eq "person0"]/c

Not pretty, but it works.  The static type of data(attribute::id) is xs:anyAtomicType*.  However, the computed value of the attribute will be xs:untypedAtomic, meaning that 'treat as' will not raise a runtime type error.

So

data on element(*, xs:anyAtomicType*) = xs:anyAtomicType*

but when fn:data is applied to an element whose schema type no more specific than anyType, fn:data will return an untypedAtomic value.

Comment 18 Michael Kay 2007-10-23 09:15:18 UTC

On reflection, I agree with comment #16. While the lack of substitutability in the dynamic semantics is something that people need to watch out for, it's probably liveable with. On the static side, I have a feeling that one thing that we could really do with is an "all data is untyped" switch (in the static context) so that users working in a non-schema-aware world don't suffer all these usability problems. Perversely, if you know that all data is untyped then you can often do much better type inference than when you don't.

Comment 19 Michael Rys 2007-10-23 18:03:35 UTC

Actually the static type inference rules (except for the official parent axis rules I think) were designed that if you know that an element or attribute is untyped, you will stay untyped. All you have to do to implement your switch is to make sure that the root document node is typed as containing untyped elements and you are fine. 

That's how we implemented the static typing of the non-schema constrained XML data type (and skip sections) in our product and it works fine.

Best regards
Michael

Comment 20 Michael Dyck 2007-11-03 23:55:43 UTC

To make this comment more concise, let me introduce the abbreviation:
    ct(foo) = "complex type with foo content"
where 'foo' is one of "complex", "mixed", "element-only", or "empty"

(In reply to comment #17)
> As Bill Patton has pointed out in a message to public-qt-comments, data on
> is not always defined.

I haven't been able to find his message about 'data on'. Could you provide a
pointer?

> Quoting FS:
> 
> "The data on filter is not defined on any element type whose type
> annotationXQ denotes a complex type of complex content and therefore
> applying data on to such a node raises a static error."
>
> He therefore concludes that fn:data applied to a value whose static type
> is known to be element() must necessarily raise a static type checking
> exception (because it is conceivable that the element supplied at runtime
> is has complex content).

Yes, the SequenceType element() matches any element node, including those
for which fn:data raises a type error, so type soundness dictates that
'data on' applied to the type denoted by element() must raise a type error.
Of course, this disagrees with the existing rules for 'data on', and also
disagrees with my proposed solution in Comment #16. And probably wouldn't
be very pleasant to work with.


> If however, this was replaced with:
> 
> "If ElementType denotes a complex type of complex content then the static
> type is 'none' and applying fn:data to such a node raises a dynamic error
> [err:FOTY0012]."
>
> ...
> The cases where 'data on' is 'none' is exactly the cases where fn:data is
> defined to raise the error [err:FOTY0012], ...

No, I don't think that's true. The snag (or one of them) is that a ct(mixed) (Tm, say) can derive a ct(element-only) (Teo).  An element node with type annotation Tm has a typed-value (and thus fn:data result) of type xs:untypedAtomic, whereas an element node with type annotation Teo has an undefined typed-value (and fn:data raises an error). But because of the 'derives-from' relation, Tm matches both nodes. So 'data on' applied to Tm should "cover" both outcomes.

In fact, you pointed that out yourself when quoting Bill Patton: element() denotes an element type with ct(mixed), and it matches all element nodes, some of which have a type annotation that is a ct(element-only), and thus cause fn:data to raise an error.

On the other hand, you suggested changing the error [err:FOTY0012] from a type error to a dynamic error, and that might be most of the solution. It would mean that static typing is no longer obliged to reflect the possibility of that error. 

So my revised proposal is this:

(1) In the XQuery spec, sections 2.4.2 & 2.5.2, change FOTY0012 from a type
    error to a dynamic error.

(2) In the FS spec, section 7.2.6, Static Type Analysis:
    (2a) In the last rule, the one dealing with ct(mixed), change the result
         type from xs:untypedAtomic to xs:anyAtomicType* (as in Comment #16).
    (2b) Replace the last paragraph with a rule that says:
             When applied to an element type whose type annotation denotes
             a complex type of element-only content, the 'data on' filter
             returns the type 'empty'.

Comment 21 Tim Mills 2007-11-05 08:30:44 UTC

That makes sense.  

So the static type of fn:data(complex type of element-only) = empty-sequence(),
but applying fn:data() a value of such a type would raise a dynamic error.

Would that require a change in the definition of error XPST0005?

err:XPST0005

    During the analysis phase, it is a static error if the static type assigned to an expression other than the expression () or data(()) is empty-sequence().

Recalling that optimizations are allowed to avoid errors, I'm a bit concerned that statically typing a call to fn:data() might introduce some strange behaviour.  e.g. An optimizer might use the identity:

count( $seq ) = 0 when $seq has type empty-sequence()

Is this a valid concern?

Comment 22 Michael Dyck 2007-11-05 22:32:24 UTC

(In reply to comment #21)
> That makes sense.  
> 
> So the static type of fn:data(complex type of element-only) =
> empty-sequence(), but applying fn:data() a value of such a type
> would raise a dynamic error.

It depends on what you mean by "a value of such a type" --
    "a value whose type annotation is such a type"
    (in which case, yes, an error)
or
    "a value matched by an element type whose type reference is such a type"
    (in which case, not necessarily an error).

As I understand it, some ct(element-only) can derive ct(empty). Therefore, some element type whose type reference is a ct(element-only) matches an element node whose type annotation is a ct(empty). We know from the XDM spec that the latter has a typed-value (and thus an fn:data() result) of the empty sequence, therefore 'data on' applied to the former (if it yields a type) must be a supertype of 'empty'. I think the only other dynamic possibility is an error, so if it's a dynamic error, then the static type might as well be just 'empty'.


> Would that require a change in the definition of error XPST0005?
> 
> err:XPST0005
> During the analysis phase, it is a static error if the static type assigned
> to an expression other than the expression () or data(()) is empty-sequence().

I suspect that would stay as is.


> Recalling that optimizations are allowed to avoid errors, I'm a bit concerned
> that statically typing a call to fn:data() might introduce some strange
> behaviour.  e.g. An optimizer might use the identity:
> 
> count( $seq ) = 0 when $seq has type empty-sequence()
> 
> Is this a valid concern?

If the expression $seq has the static type 'empty', then it's a static error according to the rule quoted above. Are you suggesting that an implementation might rewrite "count($seq)" as "0", thereby avoiding the static error? If so, note that the proposed text in Bug 4446 Comment #1 says "Expressions must not be rewritten in such a way as to create or remove static
errors."

Comment 23 Michael Dyck 2007-11-13 06:14:19 UTC

In addition to the solution I proposed in Comment #20, there might be an alternative involving use of the "expands to" judgment, but I'm not sure what it would look like yet.

Comment 24 zhen hua liu 2008-04-28 23:30:30 UTC

 To make static type analysis sound, fn:data(xs:anyType) has to return xs:anyAtomicType  instead of xs:untypedAtomic. However doing so will make
optimistic based static typing based on current formal semantic spec
 almost useless for XML documents whose type information is unknown until run time. That is why optimistic static typing implementation has to rely on users to explicitly provide the input if the XML documents is untyped or schema based during query compile time. This is one more argument why optimistic static typing is much more preferable than that of  pessimestic static typing.

Comment 25 Tim Mills 2008-04-29 15:20:13 UTC

Regarding Comment #24, could you perhaps give a concrete example?

I'm not quite sure how having an unsound type system could be circumvented by employing optimistic static typing.

We have implemented both pessimistic and optimistic static typing.  The latter just introduces run-time type checks where pessimistic checking determines that a runtime type check error is not inevitable.

This bug report is one of the older unresolved reports.  Is the solution proposed in Comment #20 likely to be adopted in the near future?

Comment 26 Michael Dyck 2009-02-07 21:46:33 UTC

(Following up comment #23)
> In addition to the solution I proposed in Comment #20, there might be an
> alternative involving use of the "expands to" judgment, but I'm not sure
> what it would look like yet.

More specifically, I was thinking that rather than doing a necessarily
general case analysis on a (possibly) high-level type, we could construct
its "union interpretation", and then do a more specific case analysis
on each type in the union (and then take the union of the results).

I still believe this is possible, and would in some situations furnish a
more specific static type. However, writing the necessary inference rules
turns out to be harder than I thought (due to the way that the FS
represents information in the type hierarchy), and I'm not confident
that I'd get it right.

So, here is my summary of the situation:

 -- If there is to be no change to any document other than FS, then FS must
    be changed so that "data on element()" (among other things) raises a
    type error.
    
    E.g.,
       fn:data(<a>2</a>)
    would raise a type error under the Static Typing Feature.

 -- If that's unacceptable, then the minimal outside-FS impact is to
    change FOTY0012 from a type error into a dynamic error as described in
    comment #20.
    
    "data on element()" would then be xs:anyAtomicType* rather than an
    error, so, e.g.
       fn:data(<a>2</a>)
    would have static type xs:anyAtomicType*. But queries such as
        1 + <a>2</a>
    would still raise a type error during static analysis, as pointed out
    in comment #2.

 -- If that's still unacceptable, then I believe the next step would
    involve the introduction of a type such as "xs:typed" described in
    comment #7.
    
    If I understand correctly, that type would be the type annotation for
    unvalidated elements, and 'data on element(*,xs:typed)' would yield
    the wished-for xs:untypedAtomic. So it separates the two uses of
    xs:anyType pointed out in comment #5.
    
    However, this would be a fairly fundamental change to the XDM, F+O, and
    XQuery/XPath docs, and I suspect it would not be accepted as an erratum
    against 1.0/2.0.

Comment 27 Michael Dyck 2009-02-07 21:49:11 UTC

I'm not sure that the XQuery test suite has a good test case for this
issue. I suggest adding one that's pretty much the same as fn-data-1,
but change the query body to:

    let $e as element() := $input-context1/examples:E6-Root/examples:E6
    return fn:data($e)

The expected result is still err:FOTY0012, but the current (unsound) FS rules
would deduce a static type of xs:untypedAtomic.

Comment 28 Tim Mills 2009-02-09 12:15:11 UTC

Thanks for looking at this report again.

I think separating out the 'typed' and 'untyped' cases is the only way to get this (usefully) correct.  If this can't be fixed for XQuery 1.0, then the Formal Semantics document needs to come with a prominent warning.

In our implementation, we use xs:typed as the type annotation for elements created under the 'preserve' construction mode.  We also use it for elements validated against xs:any with processContents="lax" when the element's type is not defined in an imported schema.  That is, under construction mode "preserve", newly created elements are typed as if they had been validated against <xs:any processContents="lax">.

Perhaps parallels can be drawn with the process of validating an element against a simple union type.  Suppose we validate against an XML Schema simple type 'foo' defined as the union of xs:float or xs:date.  Following vaildation, the type annotation is either xs:float or xs:date, but not 'foo'.  Similarly when validating against xs:anyType (via xs:any), we would only ever end up with type annotations which were subtypes of xs:anyType.

There is still a problem with stating that the typed value of an attribute with type annotation xs:anySimpleType is xs:untypedAtomic (XQ 2.5.2 Typed Value and String Value), since again the union of "data on" applied to the constituents of this union type is xs:anySimpleType, not xs:untypedAtomic.

Comment 29 Michael Dyck 2009-02-11 00:18:00 UTC

Whoops! It has come to my attention that it's impossible to derive types
from xs:untyped. Therefore, 'data on element(*,xs:untyped)' can be
xs:untypedAtomic without having to worry about subtypes. Consequently,
I'll revise some statements about the examples in my comment #26:

If construction mode is 'strip':
 -- The static type of         <a>2</a>  is element(a,xs:untyped).
 -- The static type of fn:data(<a>2</a>) is xs:untypedAtomic.
 -- The static type of     1 + <a>2</a>  is xs:integer.

But if construction mode is 'preserve', then what I said before holds:
 -- The static type of <a>2</a> is element(a,xs:anyType).
 -- Currently, static analysis of fn:data(<a>2</a>)  or  1+<a>2</a> 
    must raise type error FOTY0012.
 -- If we made FOTY0012 a dynamic error, the static type of
    fn:data(<a>2</a>) would be xs:anyAtomicType*, and static analysis
    of 1+<a>2</a> would have to raise type error XPTY0004.

Comment 30 Tim Mills 2009-02-11 09:53:19 UTC

I agree with what you say in Comment #29, but think that that if 

1 + <e>2</e> 

can't give the answer 3 in static typing mode then the static typing mode is too inconvenient to use.  

Assigning the type element(*, xs;typed) to elements in construction mode 'preserve' solves this problem and renders the static typing mode more useful,  and perhaps even usable.

Comment 31 zhen hua liu 2009-02-11 17:33:10 UTC

The  subtle point  here is
that  the concept of xs:anyType is overloaded with the following 3 distinct meanings:
(1) The most  general type at the root of the type hierarchy
(2)  A concrete type during run time to describe node which is not fully schema validated
(3) An abstract type during static analysis to describe node which, during run time,
 might be non-schema validated (xs:untyped)
or strictly schema validated (a specific schema type) or not-fully validated (xs:anyType).

Therefore, we need to interpret the xs:anyType depending on the context in which it is used.
=================================

During static type analysis, we use definition (3), atomization of node of xs:anyType shall be xs:anyAtomicType for element node and xs:anySimpleType for attribute node.
Since static type defined by
the current formal semantics is pessimistic (any potential run time type error shall be raised),
more  static type errors shall be raised when one uses xs:anyType.

 In other words, xs:anyType is NOT nstatic type friendly, users shall either use xs:untyped or specific schema type to describe the node type statically.
That is why when SQL/XML standard defines XML type, it allows implementation defined
behavior to default XML to XML(xs:anyType) or XML(xs:untyped) so that vendors may
choose their friendly default depending on if they implement pessimistic static typing.

=======================================
When xs:anyType is used as part of the sequence type syntax http://www.w3.org/TR/xquery/#id-sequencetype-syntax,  it uses definition (1).
=============================
When reading http://www.w3.org/TR/xquery/#id-typed-value,  use definition (2).

Comment 32 Tim Mills 2009-02-11 17:37:14 UTC

Regarding Comment #31, I wounder if thjs overloading was accidental rather than by design.  I suspect that it was not intentional.

Comment 33 zhen hua liu 2009-02-11 17:41:16 UTC

for comment #30, I agree that it is too restrictive to assign xs:anyType to
node constructor for preserve type mode, therefore,  implementation shall infer
static type of <e>2</e> as 
more specific type element(xs:integer), this makes it passing static typing.

Comment 34 Michael Rys 2009-02-11 18:01:44 UTC

I need to catch up on this thread.

But for now please let me say that you cannot infer a type of element(a,xs:int) for the XQuery expression <a>2</a> since there is no type information associated with the content of a unless you validate. 

Now you can infer that type for <a>{2}</a>.

Comment 35 zhen hua liu 2009-02-11 19:57:48 UTC

for comment #32, I don't think this is by design either. Since I was not in the working group in the past, so I would not know. However, I do recall that xs:anyType used to be an abstract type in some early version of the xquery spec ?

Comment 36 Michael Kay 2009-02-11 22:36:09 UTC

>static type of <e>2</e> as more specific type element(xs:integer)

Sorry, but we spent at least six months of WG time trying to find a way to make that kind of thing work, and failed. Even with <e>{2}</e> it's too difficult, because it doesn't extend smoothly to more complex content expressions.

Comment 37 Tim Mills 2009-02-12 10:25:36 UTC

This thread is getting quite long, so it might be worth a quick summary of where we've got to.

xs:anyType is the root of the type system

xs:anySimpleType is equivalent to xs:anyAtomicType* and is the union of all atomic types.

element() is shorthand for element(*, xs:anyType).  All element types are a subtype of this type.

attribute() is shorthand for attribute(*, xs:anySimpleType).  All attribute types are a subtype of this type.

XQuery 1.0 specifies that with construction mode 'preserve':

constructed elements have the type annotation element(*, anyType)
constructed attributes have the type annotation attribute(*, xs:untypedAtomic)

and with construction mode 'strip':

a constructed element has the type annotation element(*, xs:untyped)
a constructed attribute has the type annotation attribute(*, xs:untypedAtomic)

An unknown element validated against XML Schema's xs:any with processContents="lax" has the type annotation element(*, xs:anyType)

An unknown element validated against XML Schema's xs:any with processContents="strip" has the type annotation element(*, xs:untyped)

Additionally, an XML Schema may declare an attribute's type to be xs:anySimpleType, resulting in the type annotation attribute(*, xs:anySimpleType).

fn:data can be used to return the typed value of an element or attribute.  The type of fn:data applied to an argument is defined in XQuery Formal Semantics using a judgement called "data on".

XQuery 1.0 states that the typed value of an element with type annotation element(*, xs:anyType) is xs:untypedAtomic.  

Similarly, it states that the typed value of an attribute with type annotation attribute(*, xs:anySimpleType) is also xs:untypedAtomic.

Unforunately this results in an unsound type system, because:

data on T for any particular atomic type T is T

xs:anySimpleType is the union of all atomic types

therefore:

data on xs:anySimpleType is the union of data on T for all atomic types T.

i.e. data on xs:anySimpleType is xs:anySimpleType.

But XQuery 1.0 states that data on xs:anySimpleType is xs:untypedAtomic.

data on xs:anyType is also xs:anySimpleType.  

(In fact it is the union of xs:anySimpleType and the 'error' type (relevant when an element is a complex type containing only other elements), but this is exactly equivalent to xs:anySimpleType.)

But XQuery 1.0 states that data on xs:anyType is xs:untypedAtomic.

It is a requirement of XQuery 1.0 that a query such as:

1 + <e>2</e>

returns 3.  This is achieved via an implicit call to fn:data on element e returning a value of type xs:untypedAtomic, followed by type promotion to xs:integer.  Be clear that <e>2</e> does not have type element(*, xs:integer).

If the static typing feature is to reflect the behaviour of a dynamically typed system a query such as:

1 + <e>2</e>

must pass static typing.  If it doesn't then the static typing feature is essentiallyl useless since it will needlessly prevent valid queries from executing.

This can be achieved by avoiding the type annotations attribute(*, xs:anySimpleType) and element(*, xs:anyType).  We define xs:typed

 define type xs:typed restricts xs:anyType {
    attribute * of type xs:anySimpleType,
    ( element * of type xs:anyType | text | comment | processing-instruction )*
  }

such that data on xs:typed is xs:untypedAtomic.

Instead of assigning the type annotation element(*, xs:anyType) during construction or validation, the type annotation element(*, xs:typed) would be assigned.

Instaed of assigning the type attribute(*, xs:anySimpleType) during construction orvalidation, the type annotation attribute(*, xs:untypedAtomic) would be assigned.

This leaves xs:anyType and xs:anySimpleType as abstract types.

I hope I've provided an accurate summary - I'm a little hazy  on schema validation.

Comment 38 Michael Dyck 2009-02-13 04:21:04 UTC

(In reply to comment #37)
> 
> constructed elements have the type annotation element(*, anyType)

This (and similar statements) misuse the term "type annotation".
You can say that an element node matches element(*,anyType) [an item type],
or that the node's type annotation is anyType [a schema type], but not that
its type annotation is element(*, anyType).

> XQuery 1.0 states that the typed value of an element with type annotation
> element(*, xs:anyType) is xs:untypedAtomic.  

More correctly: The typed value of an element node with type annotation
xs:anyType is a value of type xs:untypedAtomic.

> Unforunately this results in an unsound type system, because:
> ... data on xs:anySimpleType is xs:anySimpleType.
> But XQuery 1.0 states that data on xs:anySimpleType is xs:untypedAtomic.

No, the XQuery spec makes no statement about the 'data on' judgment. It
says that if you've got a node whose type annotation is xs:anySimpleType,
its typed value (and thus, the result of applying fn:data to it) is a value
of type xs:untypedAtomic. That statement does *not* imply that 'data on
xs:anySimpleType' is xs:untypedAtomic. Rather, it implies (assuming a sound
static type system) that (roughly speaking):
    xs:untypedAtomic <: data on element(*,xs:anySimpleType)
    xs:untypedAtomic <: data on attribute(*,xs:anySimpleType)
Note that these two constraints are satisfied by the rules for 'data on' in
FS 7.2.6. Those rules have some problems, but that isn't one of them.

> If the static typing feature is to reflect the behaviour of a dynamically
> typed system a query such as:
> 
> 1 + <e>2</e>
> 
> must pass static typing.

That's not required for soundness, though it might well be demanded for
usability.

> If it doesn't then the static typing feature is essentiallyl useless
> since it will needlessly prevent valid queries from executing.

Any static typing feature will reject some valid queries. The challenge is
to not be too annoying about it.

> This can be achieved by avoiding the type annotations
> attribute(*,xs:anySimpleType) and element(*,xs:anyType).
> We define xs:typed
> 
>  define type xs:typed restricts xs:anyType {
>     attribute * of type xs:anySimpleType,
>     ( element * of type xs:anyType | text | comment | processing-instruction )*
>   }
> 
> such that data on xs:typed is xs:untypedAtomic.
>
> Instead of assigning the type annotation element(*, xs:anyType) during
> construction or validation, the type annotation element(*, xs:typed)
> would be assigned.

Although this might eliminate nodes with type annotation xs:anyType,
it doesn't eliminate expressions with static type element(*,xs:anyType).
For example, consider:

    1 + <f><e>2</e></f> / e

(which, according to the XQuery spec, also yields the value 3). Even though
the 'f' element node might be given a type annotation of xs:typed, static
analysis would still infer a static type of element(e,xs:anyType)* for the
PathExpr. If 'data on element(e,xs:anyType)' is xs:anyAtomicType*, then the
'+' would be obliged to raise XPTY0004 during static analysis.

Comment 39 Tim Mills 2009-02-13 08:55:59 UTC

Thanks for your corrections.

From what you've said, it looks as though using the static typing feature renders XQuery unusable.  Can you suggest anything which might offer some hope?

Comment 40 zhen hua liu 2009-02-13 17:11:53 UTC

for commment #39, here are my two cents of static typing:
in my implementation and customer experiences, I found supporting optimistic static typing is more useful. This allows me to leverage static typing as
query optimzation techniques to eliminate as much dynamic typing as possible, report type errors that otherwise would be dynamic type errors statically, catch invalid xpath,  etc. Such strategies work the best when the input XML node is
either untyped or conforming to particular XML schema. That is, users have to
be cooperative.  However, for XML node is xs:anyTyped statically, then the value of static typing diminishes quickly
and  the implementation has to use dynamic typing technique. That is why I advocate the concept of optimistic static typing.

Comment 41 Michael Dyck 2009-02-13 19:07:41 UTC

(In reply to comment #40)
> in my implementation and customer experiences, I found supporting
> optimistic static typing is more useful.

Okay, but that doesn't really help resolve this issue.

(1) Even with optimistic static typing, the typing rules still need to be
    correct, which they aren't currently.

(2) I'm not going to introduce optimistic static typing as an erratum to
    FS 1.0.

Comment 42 Michael Rys 2009-02-13 22:08:56 UTC

Note that optimistic static typing is an optimization of the current dynamic typing rules and does not belong into the standard... We have had this discussions a dozen time and always decided in this way.

Comment 43 Tim Mills 2009-02-16 08:35:00 UTC

Regarding Comment #38, although it fails type checking under the current rules, is there a reason why a query such as

1 + <f><e>2</e></f> / e

can't pass static typing?

Comment 44 Michael Rys 2009-02-16 10:33:17 UTC

It could pass static typing if we would go and do some more precise inference for anonymous types... then we could... but given that writing such an expression is rather bad for many different reasons, I don't think we should.

Comment 45 Michael Dyck 2009-02-17 06:49:17 UTC

(In reply to comment #43)
> Regarding Comment #38, although it fails type checking under the current rules,
> is there a reason why a query such as
> 
> 1 + <f><e>2</e></f> / e
> 
> can't pass static typing?

Actually, I believe it does succeed under the current rules, if
construction mode is 'strip'.

When construction mode is 'preserve', it can't succeed because the XQuery
spec mandates a type annotation of xs:anyType for a constructed node, so
soundness dictates that FS can't infer anything more specific for the
constructor expression. I don't know what Michael Rys means when he refers
to doing "more precise inference for anonymous types", unless he's also
imagining changing the XQuery spec. (If, in some cases, the XQuery spec
assigned a more specific type annotation, then the FS might be able to
reflect that in its static inferences.)

(I also don't know what he means when he says that "writing such an
expression is rather bad for many different reasons".)

Comment 46 Tim Mills 2009-02-17 09:30:34 UTC

Regarding Comment #45, yes - I was talking about the 'preserve' case.  As this is the default construction mode, I'm guessing casual users not realise the benefits of 'strip'.

"When construction mode is 'preserve', it can't succeed because the XQuery
spec mandates a type annotation of xs:anyType for a constructed node, so
soundness dictates that FS can't infer anything more specific for the
constructor expression."

...and yet at runtime we can be sure that fn:data applied to such an element will be xs:untypedAtomic.  Is there nothing that can be done to improve matters?

Comment 47 zhen hua liu 2009-02-17 15:49:43 UTC

for #45, #46, some implementations use the idea of structured typing techniques to resolve the issue. However, I won't list details here as this is beyond the scope of what current pessimesitc static typing spec can do. If certain parts of
the standard does not make sense and serve user's application need, I have to choose to ignore those parts.

Comment 48 Michael Dyck 2009-02-17 20:43:52 UTC

(In reply to comment #46)
> 
> "When construction mode is 'preserve', it can't succeed because the XQuery
> spec mandates a type annotation of xs:anyType for a constructed node, so
> soundness dictates that FS can't infer anything more specific for the
> constructor expression."
> 
> ...and yet at runtime we can be sure that fn:data applied to such an element
> will be xs:untypedAtomic.  Is there nothing that can be done to improve
> matters?

The problem is that the FS rules don't know that the type annotation of
the node is xs:anyType, they just know that it's some subtype of xs:anyType
(including xs:anyType itself), and the (proper) subtypes lead to typed
values with types other than xs:untypedAtomic.  I think it would be
difficult to make the type system capable of distinguishing the two.

Comment 49 Michael Dyck 2009-02-17 20:44:51 UTC

Here is (I hope) my final summary of the possible resolutions for this bug,
in order of increasing impact (as judged by me):

0) Do nothing, leave the 'data on' judgment unsound.

1) In the case of an element type whose type annotation is a complex type
   of mixed content, where the 'data on' judgment currently has a rule that
   infers xs:untypedAtomic (which is unsound), instead raise a type error.
   (So, remove the rule and put in a paragraph saying it's an error.) This
   is the minimal change that achieves soundness.

2) Write rules to examine the type hierarchy in the ISSD. In some cases
   (where the ISSD contains no complex types of element-only content),
   static analysis could avoid raising a type error from 'data on'. But
   inferring xs:untypedAtomic would still be unsound, we would have to
   infer xs:anyAtomicType* (which might cause subsequent type errors,
   depending on the context).

3) Change fn:data's type error err:FOTY0012 to a dynamic error, so static
   analysis is no longer required to raise it statically.

4) Changes beyond that are probably out of scope for the 1.0/2.0 second
   edition timeframe.

Comment 50 Michael Dyck 2009-02-17 20:46:07 UTC

At their joint meeting this morning, the Working Groups approved solution
#1 in comment #49. Therefore, I'm marking this issue resolved-FIXED.

We realize that this only partially satisfies the concerns raised over the
long history of this issue.  In the 1.1 timeframe we can increase our scope
to find a better solution.

If you accept this resolution, please mark the issue CLOSED.

Comment 51 Michael Dyck 2009-02-18 00:35:01 UTC

This issue has been entered as FS erratum E061, and the proposed fix has
been committed to the source files for the next edition of the FS document.

Comment 52 Tim Mills 2009-02-18 09:35:09 UTC

I'd have hoped for option (3), but if it is to be sorted out in XQuery 1.1, I'll close it.  Thanks for looking into it.