This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3818 - [FS] Static typing of $input-context1/works[1]/employee[1]/empnum[1]
Summary: [FS] Static typing of $input-context1/works[1]/employee[1]/empnum[1]
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Formal Semantics 1.0 (show other bugs)
Version: Candidate Recommendation
Hardware: PC Windows XP
: P2 normal
Target Milestone: ---
Assignee: Jerome Simeon
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard: consider for 1.1
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-12 10:25 UTC by Tim Mills
Modified: 2008-07-09 02:01 UTC (History)
1 user (show)

See Also:


Attachments

Description Tim Mills 2006-10-12 10:25:03 UTC
We've found that we can't statically type check:

(: insert-start :)
declare variable $input-context1 as document-node()? external;
(: insert-end :)

count(() is $input-context1/works[1]/employee[1]/empnum[1])

(XQTS test nodeexpression3).

The expression expands to:

count(op::is-same-node((), 
  fs:apply-ordering-mode(
    fs:distinct-doc-order-or-atomic-sequence(
      let $fs::seq as node()* := 
        fs:apply-ordering-mode(
          fs:distinct-doc-order-or-atomic-sequence(
            let $fs::seq as node()* := 
              fs:apply-ordering-mode(
                fs:distinct-doc-order-or-atomic-sequence(
                  let $fs::seq as node()* := $input-context1 return 
                  ( let $fs::last := count($fs::seq) return 
                    ( for $fs::dot at $fs::pos in $fs::seq return 
                      let $fs::seq := child::works return 
                      ( subsequence($fs::seq, 1, 1) ) 
                    )
                  )
                )
              ) return 
              ( let $fs::last := count($fs::seq) return
                ( for $:fs::dot at $fs::pos in $fs::seq return 
                  let $fs::seq := child:::employee return 
                  ( subsequence($fs::seq, 1, 1) )
                )
              )
            )
          ) return 
          ( let fs::last := count($:s::seq) return 
            ( for $:fs::dot at $:s::pos in $:fs::seq return 
              let :fs::seq := child:::empnum return 
              ( subsequence($fs::seq, 1, 1) )
             )
          )
        )
      )
    )
  )
}

The sxpressions works[1], employee[1] and empnum[1] can all be typed as element()?, (under the rules of 7.2.13).  However the use of "let fs::seq as node()*" means that fs:seq gets typed as node()*, rather than the more specific type element()?.

By the end of this, we can only tell that:

$input-context1/works[1]/employee[1]/empnum[1]

is of type node()*, when it is obviously element()?.

In 4.8.3 Let Expression, the typing rule states that:

statEnv |-  Expr1 : Type1
Type0 = [ SequenceType ]sequencetype
statEnv  |-  Type1 <: Type0
statEnv  |-  VarName of var expands to Variable
statEnv + varType(Variable1 => Type0 )  |-  Expr2 : Type2
------------------------------------------------------------------------
statEnv |-  let $VarName1 as SequenceType := Expr1 return Expr2 : Type2

i.e. $VarName1 gets treated as having type Type0, not the more specific type that Type1.  Shouldn't Variable1 have type Type1?
Comment 1 Liam R E Quin 2006-11-02 01:12:27 UTC
The XML Query Working Group thanks you for your comment.

You're right, there is an issue here that we will need to address.  We have left it open and recorded it as a potential erratum against the specification.
Comment 2 Tim Mills 2006-11-02 13:25:15 UTC
Thanks.

Similar problems exist with other expressions which use TypeDeclarations, e.g. every, some, for.
Comment 3 Jerome Simeon 2006-11-02 17:29:08 UTC
I'm not sure I understand whether explicit type declarations in some every for are the same problem. In that case, I think we assume the user wants to enforce a particular type and that we should use that type.
- Jerome
Comment 4 Tim Mills 2006-11-02 17:45:30 UTC
If not, then a whole bunch of XQTS tests will fail static typing.

My reading of "treat as" is that its purpose is to ensure that its argument type is at least a subtype of the specified type, i.e. a narrowing of the type from type T to type S where T is a subtype of S.  Would there ever be any reason to use "treat as" to broaden a type from type S to type T?

The use as a type narrower adds type information.  Use as a type broadener throws away type information.  Is that ever helpful?
Comment 5 Michael Rys 2006-11-02 18:16:15 UTC
It use to broaden the static type would make sense if the type system provides data hiding/encapsulation which it does not. However, it also does not do harm and so we decided to not forbid it.
Comment 6 Michael Kay 2006-11-02 18:59:00 UTC
In fact we explicitly decided to permit a broadening "treat as". There was a rule against it at one stage, and it placed a great and unnecessary burden on users to understand the static type inferencing. When I assert EXP treat as element(employee) I am asserting that EXP will always return an employee. I want the system to tell me if EXP doesn't return an employee, but I don't want it to reject my query because the system was able to work out for itself that EXP was an employee without my telling it.
Comment 7 Tim Mills 2006-11-03 08:17:14 UTC
"When I assert EXP treat as element(employee) I am asserting that EXP will always return an employee. I want the system to tell me if EXP doesn't return an employee, but I don't want it to reject my query because the system was able to work out for itself that EXP was an employee without my telling it."

In your example above, the user has told the system explicitly that EXP will always return an employee.  However, in some cases the system is able to determine that not only would EXP return an employee, but it would return some specific subtype of employee.  In doing so, the system is capable of _not_ rejecting some queries.  However, the current type rules are preventing use of this information.  So there exist situations in which a user, in trying to add extra type information, is actually forcing to the system to throw away more specific type information and _causing_ a query to be rejected.


Consider:

Type S, type T where S is a subtype of T.
Function f($arg as S) as xs:boolean
Variable $s, proven to be of type S*.
Query: some $x as T in $s satisfies f($x)

With the current typing rules, $x is treated as if it had type T, so this statement fails static typechecking because function f requires an argument of type S.

With static type checking turned off, this query will succeed.

However, since the system already knows that $s is of type T*, it can infer that:
1. each $x must be of type S
2. S is a subtype of T, therefore no dynamic typing errors will be raised.
3. f($x) is therefore type-checkable.

The current typing rules mean that it is forced to forget (1) above, because the typing rule states that we must treat $x as being of type T.  This leads to the unnecessary rejection of this query.

I'd argue that 

$s treat as T 

should have static type S if we know that type($s) = S and S is a subtype of T; otherwise, it should have static type T.
Comment 8 Michael Kay 2006-11-03 09:04:16 UTC
"In doing so, the system is capable of _not_ rejecting some queries.  However, the current type rules are preventing use of this information.  So there exist situations in which a user, in trying to add extra type information, is actually forcing to the system to throw away more specific type information and _causing_ a query to be rejected."

If a user is calling a function that expects an element(employee) and says "treat as element(employee)" then they are effectively saying "please do dynamic type checking on this" (probably because they are bewildered by the static type error that they got before they added the "treat as"). This case should never give a static error. 

On the other hand, if they call a function that expects element(employee) and they say "treat as node()", then a static error seems perfectly reasonable. The rule is designed for the former case, which is much more likely to occur in practice.

I'm afraid I don't have time, however, for debate on the details of pessimistic static typing. I think the whole edifice is pretty unusable, which is why most implementors are ignoring it (even those who claim to support static typing are using their own custom inference rules). In fact, I think that users of static typing systems will be obliged to litter their code with "treat as" expressions, forcing the system to behave exactly like one with dynamic typing. All your comments pointing out perfectly reasonable expressions that fail under static typing serve merely to reinforce my views on this.

That's a personal comment of course.
Comment 9 Tim Mills 2006-11-03 10:18:12 UTC
"On the other hand, if they call a function that expects element(employee) and
they say "treat as node()", then a static error seems perfectly reasonable. The
rule is designed for the former case, which is much more likely to occur in
practice."

Any implementation not implementing static typ checking could not raise such an error, because "treat as" does not change the dynamic type of a value.

The use of static typing can alert users to the possibility of unforseen dynamic errors.  While I appreciate your viewpoint, my personal opinion is that I'd rather know of a potential problem at query compile time than at some point in the future when someone sends in the "wrong sort" of data to the query and triggers a dynamic error.

We are pointing out the static type checking problems we find in XQTS in the hope that the formal semantics type checking rules can be made as precise and usable as possible.  We are currently failing about 200 of the 14000 or so minimal conformance tests.  I'd argue this shows that the static typing approach is fairly workable.  The formal semantics specification gave us the confidence that XQuery/XPath had been well thought through.

Comment 10 Michael Kay 2006-11-03 10:36:13 UTC
"The use of static typing can alert users to the possibility of unforseen
dynamic errors.  While I appreciate your viewpoint, my personal opinion is that
I'd rather know of a potential problem at query compile time than at some point
in the future..."

No quarrel with that. But the vast majority of these tests are schemaless. Without a schema, mistyping an element name in a path such as /a/b/c is not going to give you a compile time error. The only thing it will give you is a cardinality error if you use it somewhere that a singleton is required. Nine times out of ten that error is spurious because the user knows that x/@price is actually a singleton, and having to assert it with a "treat" doesn't add much value. Static typing in the absence of a schema, in my view, detects more spurious errors than real errors, and it fails to detect most real errors. In the presence of a schema, by contrast, most real errors (such as mistyping an element name) can be picked up just as well (at compile time) by an optimistic static typing system as by a pessimistic one, while also reporting far fewer spurious errors.
Comment 11 Tim Mills 2006-11-03 10:52:43 UTC
I think we might be roughly in agreement!

We intend that, if static type checking is turned off by the user, the system will use static type checking so that only the minimal number of run time type checks need to be added to the query representation.

However, to ensure we get the minimal number of run time type checks, we want the static typing rules to be as good as is possible.  This is why we have been reporting the problems we have encountered.
Comment 12 Jerome Simeon 2007-02-20 01:15:19 UTC
A couple of comments on this thread.

First, the type assertion let $x as Type := Expr ... has intentionally a semantics different from 'treat as'. There used to be a distinction and a separate operator in the language for it (called 'assert as') to allow user to enforce a specific type and check that this is the correct type at both static and run time. I think a large part of the thread (from comment #4 on) is related to treat as. Although I agree with most of what is being said, I don't think it should apply to the particular issue that is being reported here.

Second, I do think there is a problem with the particular query that Tim reports. I think the intent is that his particular query should pass static typing. The issue seems to be with the 'as node()' type assertion introduced during normalization of step expressions. (5th normalization rule in Section 4.2 Path Expressions -- just before 4.2.1). I think a way to fix this issue which is consistent with the rest of the spect is to introduce a special function fs:node-sequence() which checks whether the input is a sequence of nodes, and if yes propagates the corresponding input type as output. Such a function could go after Section 7.1.4 and have the following typing rules:

statEnv  |-   Type <: node*
---------------------------------------------------------
statEnv |-  (FS-URI,"node-sequence") ( Type ) : Type

Comments?
- Jerome
Comment 13 Tim Mills 2007-02-20 14:04:33 UTC
Although your workaround would fix this specific problem, my reading of the FS document gives me the impression that the static typing
rules for constructs:

some $x as ...
every $x as ...
let $x as ...
treat as ...

ensure that a static-typing implementation _must_ reject perfectly reasonable 
queries whenever the static type can be proven to be a subtype of the user-specified type declaration.  

In our implementation, we have solved the problem by having type rules such that
if the computed static type is a subtype of the type declaration, then the
static type is taken as the computed static type.  Otherwise, the static type
is taken as the user-specified type declaration.  In the first specific case I gave, this gives the same behaviour as your suggested node-sequence function.

I'm convinced that this gives the most usable and consistent behaviour.




Comment 14 Jim Melton 2007-02-26 00:07:49 UTC
The fix for this bug does not appear in the Recommendation of 23 January 2007.  It will be considered for a future publication (either an Errata document or some possible future version of the specification). 
Comment 15 Michael Dyck 2007-03-06 08:21:47 UTC
(In reply to comment #12)
> I think a way to fix this issue
> which is consistent with the rest of the spect is to introduce a special
> function fs:node-sequence() which checks whether the input is a sequence of
> nodes, and if yes propagates the corresponding input type as output. Such a
> function could go after Section 7.1.4 and have the following typing rules:
> 
> statEnv  |-   Type <: node*
> ---------------------------------------------------------
> statEnv |-  (FS-URI,"node-sequence") ( Type ) : Type

In the premise, I think "node" should be
    [[ node() ]]_sequencetype
as in 7.1.4.
Comment 16 Tim Mills 2007-03-14 13:41:12 UTC
Since treat as throws away type information, use of "/" also throws away type information because:

[/]Expr
==
[(fn:root(self::node()) treat as document-node())]Expr

unless you alter the semantics of "treat as" to:

statEnv |- Expr: T1
statEnv |- T1 <: T2
-------------------------------
statEnv |- Expr treat as T2 : T1


statEnv |- Expr: T1
statEnv |- not(T1 <: T2)
-------------------------------
statEnv |- Expr treat as T2 : T2

Comment 17 Michael Dyck 2007-05-03 02:07:49 UTC
With respect to the original problem, the working groups have approved the solution given in Comment #12 (as amended by Comment #15). Consequently, I am marking this bug FIXED. Please indicate your acceptance of this resolution by marking the bug CLOSED.
Comment 18 Tim Mills 2007-05-03 07:19:06 UTC
Sorry, but my view is that the proposed fix is a hack to get around the more general problem I described in Comment 13 ( "treat as" / "as" throws away static type information).

Close the bug if you like, but I would be grateful for an explanation of why the type rules actively throw away information, the effect of which appears only to have the purpose of making static typing implementations reject perfectly reasonable queries.
Comment 19 Michael Dyck 2008-05-15 08:34:00 UTC
(In reply to comment #18)
> Sorry, but my view is that the proposed fix is a hack to get around the more
> general problem I described in Comment 13 ( "treat as" / "as" throws away
> static type information).

I'm inclined to agree.
 
> Close the bug if you like, but I would be grateful for an explanation of why
> the type rules actively throw away information, the effect of which appears
> only to have the purpose of making static typing implementations reject
> perfectly reasonable queries.

I haven't yet found an explanation. I think we should reconsider those type rules for the 1.1 doc (and I have marked the Status Whiteboard accordingly).

Comment 20 Michael Dyck 2008-07-09 02:01:16 UTC
This issue has been entered as FS erratum E021, and its fix has been committed to the source files for the next edition of the FS document.

Consequently, I'm marking this issue CLOSED.