This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11352 - %nondeterministic and independent compilations of modules
Summary: %nondeterministic and independent compilations of modules
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3.0 (show other bugs)
Version: Member-only Editors Drafts
Hardware: Macintosh Mac System 9.x
: P2 blocker
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-19 18:31 UTC by Daniela Florescu
Modified: 2011-02-05 22:29 UTC (History)
6 users (show)

See Also:


Attachments

Description Daniela Florescu 2010-11-19 18:31:05 UTC
Dear all,

the way the %deterministic and %non-deterministic functionality is designed right now
disallows independent compilation of XQuery modules, and this is a blocker (I think).

http://www.w3.org/XML/Group/qtspecs/specifications/xquery-11/html/xquery-11.html#FunctionDeclns

The reason is: %nondeterministic can ONLY be added on external functions, and it is not allowed 
on user-defined functions.

Yes, it is true that for user-defined functions, this property can be inferred by the compiler.

But in case of modules that invoke other modules, that requires the bodies of ALL modules to be
compiled together. Hence, no independent  compilation of XQuery
modules is allowed by the current version of the specification, and that's a serious problem.

The solution is to require the %nonderministic to be added for ALL functions that are nondeterministic
(user-defined as well as externals).
The compiler should double check and give an error in case of inconsistency between what the programmer says and what can be automatically inferred.

Note: that is the reason of why modern languages like Java require the exceptions to be declared by hand, 
even though this can also be detected by the compiler. That's for independent compilation of code.

Am I missing something ?

Best regards
Dana
Comment 1 Michael Kay 2010-11-19 23:48:35 UTC
I think it's usual in most modern language systems that when module A does "import module B", the compiler has access to information about the public objects exported by B. For example, I would expect it to know what public functions are present in B and what their signatures are. Equally, I would expect it to know which of these functions are deterministic. This doesn't require the modules to be compiled at the same time; it merely requires the compiler of module A to have access to information generated by the compiler of module B, for example some kind of interface definition, often added to the object code for this purpose.
Comment 2 Daniela Florescu 2010-11-20 01:23:49 UTC
Mike,

everything you said WOULD work, if modules would be required to disclose
that information about their publicly exposed functions (signature AND semantic
properties/annotations), and then stick to what they declared in the first place.

But they DON'T today. Only signature is required. Semantic properties are left in the air.

If a function f() is calling another function g() in an another module, and g's body definition
changes (e.g. changes from being deterministic to being non-deterministic), so does f()'s, as a side-effect. And changes in the semantic definition of f() mean different optimization/indexing/caching/etc.

Let me try to be more explicit: changes in the BODY of a function in ONE module (not visible signature)
in a module can have effect on the semantics/compilation/optimization of ANOTHER module.

That's not what I call independent compilation of modules.

Hence my blocking bug request.

This is not a good foundation for a modern programming language.

Example:

module 1.
==========

module namespace ns1= uri1;

import module namespace ns2=uri2;

declare function ns1:foo1()
{
    ns2: bar1()
}


declare function  ns1: foo2()
{	
	ns2:bar2()
}



module 2
========


module namespace ns2= uri2;

import module namespace ns1=uri1;

declare function ns2:bar1()
{
    ns1: foo2()
}


declare % nondeterministic function  ns2:bar2() external;


================

Neither one of the two modules can be compiled without the ***content**** of the other module,
(I am talking here about signatures and/or headers) which is terribly bad.

If I change the ns2:bar2 to be %deterministic, this changes the semantics/optimization/runtime/indexing/
caching/etc of both ns1:* functions !

I deal with tens of thousands of lines of XQuery right now, and that's a blocking bug.

Thanks for taking it seriously, best regards
Dana
Comment 3 Michael Dyck 2010-11-20 07:48:16 UTC
(In reply to comment #2)
> 
> everything you said WOULD work, if modules would be required to disclose
> that information about their publicly exposed functions (signature AND
> semantic properties/annotations), and then stick to what they declared
> in the first place.
> 
> But they DON'T today. Only signature is required. Semantic properties are left
> in the air.

The XQuery spec doesn't explicitly say what information a module is "required to disclose" under separate compilation (since the spec doesn't address separate compilation). Rather, it's implicitly required to disclose whatever information the implementation needs in order to behave conformantly. For public functions, this clearly has to be more than just signatures, as that's not enough to detect if a variable depends on itself (for instance).
Comment 4 Jonathan Robie 2010-11-30 18:43:36 UTC
In the current internal working draft, we allow user-defined functions to be declared %deterministic or %nondeterministic, we also allow implementations to use static analysis to determine whether a function is deterministic, and we do not say what to do if there is a conflict between the two:

<quote>
A function declaration may use function annotations to specify that a function is %deterministic (which is the default) or %nondeterministic. [Definition: A deterministic function is a function that always evaluates to the same result if it is invoked multiple times with the same arguments during the evaluation of a query.] [Definition: A nondeterministic function is a function that is not guaranteed to always return the same result when it is invoked multiple times with the same arguments during the evaluation of a query.] It is a static error [err:XQST0106] if a function's annotations contain more than one annotation named %deterministic or %nondeterministic. An XQuery processor can use static analysis to determine whether a user-defined function is deterministic.
</quote>

F&O has a more precise definition of nondeterministic. At Oxford, we agreed to use this definition in both specifications, changing it from 'stable' to 'nondeterministic' (see http://www.w3.org/Bugs/Public/show_bug.cgi?id=8221#c2).

See:

http://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-30/html/Overview.html#stable
http://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-30/html/Overview.html#dt-identical

So you can declare whether a function is deterministic, and we have a precise definition of determinism. At present, we allow users to declare a nondeterministic function %deterministic, and we give implementations no guidance whether to raise an error in this case, or what error to raise if they so choose. 

I think we should do one of the following:

1. Require an implementation to raise an error if the functions determinism (as determined by static analysis) does not match its declaration. This is my preferred option.
2. If we can't get consensus for that, allow an implementation to raise an error in this case, and specify the error. Make it implementation-defined whether the error is actually raised.

I'm not convinced that the default should be 'deterministic'. Given the definition of deterministic in F&O, it's easy enough to determine, statically, whether a function is deterministic or not. Perhaps we should simply require implementations to do so?
Comment 5 Mary Holstege 2010-12-02 15:00:37 UTC
(In reply to comment #4)
> 
> 1. Require an implementation to raise an error if the functions determinism (as
> determined by static analysis) does not match its declaration. This is my
> preferred option.
> 2. If we can't get consensus for that, allow an implementation to raise an
> error in this case, and specify the error. Make it implementation-defined
> whether the error is actually raised.
> 
> I'm not convinced that the default should be 'deterministic'. Given the
> definition of deterministic in F&O, it's easy enough to determine, statically,
> whether a function is deterministic or not. Perhaps we should simply require
> implementations to do so?

I am opposed to either requiring the error or changing the default.  It is perfectly reasonable to label a deterministic function as non-deterministic: maybe it is a stub that is expected to be non-deterministic when the full implementation is done, or maybe it is an attempt to see what the performance implications of the use of a non-deterministic function would be.  It is also perfectly reasonable to label a function which is technically non-deterministic as deterministic. Consider a function that constructs a sequence of nodes and then applies some path expression on them, so they return in document order. Well, document order is undefined and therefore technically non-deterministic in this case. Maybe, however, the developer knows that in the full context of the use of this application that non-determinism makes no material difference to the execution of the program. I see no reason why we should force an error because a human was smarter than a static analyzer and wanted to provide an optimizer hint.  

For usability and compatibility reasons, the default should not be non-deterministic, as that requires people with existing modules to either have to rewrite every single function in them in order to use that module in a 3.0 processor just to get the same performance characteristics as before.  This forced change would be doubly bad because it would only show up as a mysterious performance hit and the fact is that most functions in XQuery are deterministic. We should choose defaults to match reality.
Comment 6 Michael Kay 2010-12-02 15:08:38 UTC
I agree with comment #5.

I think %deterministic is telling the processor it is allowed to treat a function as deterministic. For example

declare %deterministic function local:x() { <a/> }

is telling the processor that you don't care whether several calls on local:x return the same node rather than different nodes. That is, it overrides what the processor can learn by static inference.
Comment 7 Daniela Florescu 2010-12-07 15:29:41 UTC
There are two questions about the semantics of %(non)deterministic annotations.

1. What is the required semantics in case the user's declaration is inconsistent
with what is automatically inferred by the compiler, and
2. What is the default annotation

My understanding of the current status quo is that:
1. Non defined
2. Deterministic

My own answers to each question is bellow:

1. About the inconsistency: there are two cases:
a. The user specifies  (explicitly or using the default behavior) that a function
is non-deterministic, while the compiler infers that it is deterministic.
This is not a semantic problem, it is just an optimization problem. A deterministic function 
can be used anywhere a non-deterministic function can be used (but not the other
way around), but better optimization can be done in case of deterministic functions.

The standard should allow  an XQuery implementation to use
the most restrictive annotation (aka deterministic) that is inferred by the compiler,
in the same way we allow XQuery implementations to use a more precise type
inference (as opposed to the standard one) to improve optimization.

b. The user specifies  (explicitly or using the default behavior) that a function
is deterministic, while the compiler infers that it is non-deterministic.

This is a semantic error: user had wrong expectations about the XQuery code
she/he has written.

The standard should require XQuery implementations to raise an error in this case.

2. In order to be backwards compatible with the semantics of XQuery 1.0 functions that does
allow non-determinism (as Andrew pointed out), I do not see any other solution then to make the
default  non-deterministic.

( I do not think that making the default different in XQuery 1.0 vs. XQuery 3.0 
is an acceptable solution.)
Comment 8 Daniela Florescu 2010-12-07 15:34:04 UTC
> The standard should allow  an XQuery implementation to use
> the most restrictive annotation (aka deterministic) that is inferred by the
> compiler,
> in the same way we allow XQuery implementations to use a more precise type
> inference (as opposed to the standard one) to improve optimization.
> 

BTW, this should answer the complain in comment #5 about people needing to
revisit their existing code to get the performance they used to have. They don't need to do that.
Comment 9 Martin Probst 2011-01-12 10:15:55 UTC
I just want to note that Dana is correct in her assessment that automatic inference of %nondeterministic does require inspection of method bodies, and together with cyclical dependencies between modules this means that requires to inspect method bodies of other modules during compilation.

This cannot be resolved by just including the inferred %nondeterministic property of a function in its signature, because of the cyclical dependency.

This could be resolved by including its non-local transitive dependencies in its signature, but that feels to me like extending the concept of "function signature" a bit too much.
Comment 10 Michael Kay 2011-01-12 11:03:24 UTC
(In reply to comment #9)
> I just want to note that Dana is correct in her assessment that automatic
> inference of %nondeterministic does require inspection of method bodies, and
> together with cyclical dependencies between modules this means that requires to
> inspect method bodies of other modules during compilation.
> 

But this analysis has nothing to do with %deterministic or %nondeterministic annotations. When a module is compiled, the compiler needs information about the modules that it imports, and in consequence, if two modules M1 and M2 each imports the other, then they must be compiled together. To break such a cyclic dependency, we would need to introduce some kind of interface definition as an intermediary (so neither module refers to the other, but both refer to the interface definition).

And separate compilation is still possible to the extent that modules do not have cyclic dependencies.

Equally, the inferencing of %[non-]deterministic is problematic in the case of recursive functions, whether or not they appear in different modules. So perhaps there is a problem, but the title of the bug doesn't characterize it correctly?
Comment 11 Martin Probst 2011-01-12 11:36:09 UTC
> Equally,  the inferencing  of %[non-]deterministic  is problematic  in the
> case  of recursive  functions, whether  or  not they  appear in  different
> modules. So perhaps there  is a problem, but the title  of the bug doesn't
> characterize it correctly?

Depends on what you mean by problematic. There is a relatively straightforward algorithm to resolve the issue (mark directly nondeterministic functions as, well, nondeterministic; build transitive closure of functions; mark functions as nondeterministic that call nondeterministic functions. Either build complete transitive closure, or do the marking step for as long as nothing changed anymore).

If you apply some dynamic programming techniques to this, the algorithm isn't even terribly inefficient. So I don't really see an issue there. Or am I missing something?

In any case, I agree, the problem this bug should be about is cyclical imports and their impact on the dependency chain for compilation, it's unrelated to nondeterminism annotations.
Comment 12 Jim Melton 2011-02-05 22:23:24 UTC
In its meeting of 2011-01-25 (minutes at [1]), the XML Query WG decided to remove the %deterministic and %non-deterministic annotations (and associated annotation assertions) from XQuery 3.0.

[1] http://lists.w3.org/Archives/Member/w3c-xsl-query/2011Jan/0290.html (member only)
Comment 13 Jim Melton 2011-02-05 22:29:26 UTC
Because you were present when this bug was considered by the WG and you agreed with the resolution, I'm marking this bug CLOSED.