13674 – [XQ30] schema-element() types behave differently in different modules.

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 3.0
Version:	Member-only Editors Drafts
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jonathan Robie
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2011-08-04 17:22 UTC by Oliver Hallam
Modified:	2011-11-11 08:48 UTC

Description Oliver Hallam 2011-08-04 17:22:02 UTC

I have been trying to modify our typing judgements to allow for separate compilation of modules (there were several places in the FS rules where this was not possible), and have run into the following issue with schema-element() types.

Consider the following:

1. A schema with target namespace "urn:S1" which declares an element {urn:S1}E.
2. A schema with target namespace "urn:S2" which declares an element {urn:S2}E which substitutes for {urn:S1}E.


Now, consider the module (urn:M1):


module namespace m1="urn:M1";
import schema namespace s1="urn:S1";

declare function m1:function($x as schema-element(s1:E)) as xs:boolean
{
    $x instance of schema-element(s1:E)
};


The function m1:function accepts any element in the in-scope schema elements that substitutes for s1:E.  Since this module imports only urn:S1, in this case it accepts only elements named s1:E.  Clearly this function should always return true.



Consider the following query.

import schema namespace s1="urn:S1";
import schema namespace s2="urn:S2";
import module namespace m1="urn:M1";

let $function as function(schema-element(s1:E)) as xs:boolean := m1:function#1
let $argument as schema-element(s1:E) := validate { <s2:E /> }
return $function($argument)


In this query the type schema-element(s1:E) also matches an s2:E element, as this is in the substitution group for s1:E.

Following the rules given in the XQuery spec, both these type judgements are fine.

Since the function takes a schema-element(s1:E) and we are passing it a schema-element(s1:E) no argument conversion takes place.

However, the actual value of the function call is false, which indicates that something odd is going on, and the type system is not sound.
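
The mismatch can be reproduced with a small model. The following Python sketch is not XQuery and is not taken from any implementation. It assumes that each module's in-scope schema definitions can be modelled as a map from a head element name to the set of element names allowed to substitute for it; the names ISSD_M1, ISSD_MAIN and matches_schema_element are invented for illustration.

```python
# Hypothetical model: each module's in-scope schema definitions (ISSD)
# map a head element name to the names in its substitution group.
S1_E = "{urn:S1}E"
S2_E = "{urn:S2}E"

# Module urn:M1 imports only urn:S1, so it knows only s1:E.
ISSD_M1 = {S1_E: {S1_E}}

# The main query imports urn:S1 and urn:S2, so s2:E substitutes for s1:E.
ISSD_MAIN = {S1_E: {S1_E, S2_E}, S2_E: {S2_E}}


def matches_schema_element(issd, head, element_name):
    """Model of schema-element(head) matching, judged against one module's ISSD."""
    return element_name in issd.get(head, set())


# The main query accepts the validated s2:E as schema-element(s1:E) ...
accepted_by_caller = matches_schema_element(ISSD_MAIN, S1_E, S2_E)
# ... but the 'instance of' test inside m1:function sees only M1's ISSD.
result_in_m1 = matches_schema_element(ISSD_M1, S1_E, S2_E)

print(accepted_by_caller, result_in_m1)
```

Under this model the same type name gives two different answers for the same element, which is the unsoundness described above.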


I can see several ways of resolving this, none of which seem entirely satisfactory:

1. schema-element(E) matches any element that substitutes for E, regardless of whether this schema was imported in the current module.
  That is, when matching a schema-element(E) an implementation should use all the schema elements that it is aware of, and not just those that have been imported.

2. It is a static error if schema-element(E) is used ambiguously in different modules.
  If a module imports a schema which contains an element E' in namespace N' that substitutes for an element E in namespace N then it is a static error if any module contains a schema-element(E) without importing the namespace N', or more simply if any module imports N without importing N'.

3. schema-element(E) is shorthand for the union of matching element types.
  Specifically, schema-element(E) is expanded to the union of element types element (E', T') where E' and T' are the name and type respectively of an in-scope schema element that substitutes for E.

4. Importing a schema also imports any other namespaces the implementation is aware of that substitute elements in it.

A few notes on each suggestion:
1:
  This agrees with the static typing rules given in FS.  (FS applies all its type judgements based on the fact that it knows every type that can appear).
  When compiling modules separately this hampers static analysis, as you cannot say anything about the name of an element that matches a type T without knowing which other schemas are available in other modules.
  Changing a schema import in one module could break another module.  This may not be a problem in practice!

2:
  This is the simplest option to implement in the spec - just adding one more error condition.
  This prevents the following use case:
  * There is a simple schema S
  * There are two incompatible extensions to S, S1 and S2.
  * There are two queries Q1 and Q2 importing S1 and S2 which both use a common module M importing S.
  Now, to use M in Q1 it must import S1.  To use M in Q2 it must import S2.  But it is an error if M imports both.
  Whilst the extra error may seem somewhat arbitrary to users, I suspect that it will very rarely be seen.

3:
  Painful to explain in the spec, as we don't have arbitrary union types.
  This could lead to more costly type checking when calling functions in precompiled modules.
  The meaning of the type schema-element(E) would depend on its context.  This would cause issues when exposing types to the outside world, and would hurt APIs.
  Potential for bugs.

4:
  Implementations are already allowed to do this since the schemas that are imported by a "schema import" are implementation defined.
  This is similar to option 1, but performs the same task by adding more types to the in-scope schema types.
  This can also be viewed as similar to option 2, but implicitly imports missing schemas rather than raising an error.
  This prevents a module from being compiled until all schemas are known.

My opinion is that whilst option 3 seems cleanest in theory, it is probably the worst in practice!  Option 1 seems nicest from a "no surprises" user perspective, but currently my preference would lie with option 2 as this allows the greatest flexibility to implementors.
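
The simpler form of option 2 ("any module imports N without importing N'") can be sketched as a link-time check. This is a hedged Python model, not a proposal for spec text. The function name find_ambiguous_imports and both input maps are assumptions made for illustration: one map records which schema namespaces each module imports, the other records which namespaces declare elements that substitute for elements of another namespace.

```python
def find_ambiguous_imports(module_imports, substitutes_into):
    """Return (module, N, N') triples that violate the simpler rule of option 2.

    module_imports:   dict of module name -> set of imported schema namespaces
    substitutes_into: dict of N' -> set of namespaces N for whose elements
                      N' declares substitutes (hypothetical input)
    """
    # Only namespaces imported by some module of the query take part.
    imported_anywhere = set().union(*module_imports.values())
    errors = []
    for module, imports in sorted(module_imports.items()):
        for n_prime in sorted(imported_anywhere):
            for n in sorted(substitutes_into.get(n_prime, set())):
                # The module can name schema-element(E) in N but cannot see N'.
                if n in imports and n_prime not in imports:
                    errors.append((module, n, n_prime))
    return errors


# The example from this report: the main query imports both schemas,
# while module urn:M1 imports only urn:S1.
errors = find_ambiguous_imports(
    {"main": {"urn:S1", "urn:S2"}, "urn:M1": {"urn:S1"}},
    {"urn:S2": {"urn:S1"}},
)
print(errors)
```

In this model the query from the report is rejected because urn:M1 imports urn:S1 without urn:S2. The same check also rejects the Q1/M use case from the notes on option 2.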

Comment 1 Michael Kay 2011-08-05 23:07:19 UTC

I think you are describing a particular instance of what is actually a more general problem. Although we define that a type named T must be identical in every module, it remains true that an element may be valid against type T in the context of one schema S1, and invalid against the same type T in the context of a different schema S2. Substitution groups are one reason for that; strict and lax wildcards are another; types derived by extension are another case; in XSD 1.1, notQName in wildcards provides yet another. 

Saxon's approach to the problem you describe is to extend the consistency constraints so that in all static and dynamic schemas used for a particular query, an element declaration E must have the same substitution group, and a type T must have the same set of types derived by extension.

(As for wildcards, I think I avoid making inferences about wildcards that might be invalidated if element declarations are added to or removed from the schema. But I wouldn't be 100% confident I've got this right.)

Comment 2 Oliver Hallam 2011-08-08 17:30:35 UTC

Saxon's behaviour sounds like what I wanted to implement, but this condition proves difficult to enforce if you are using a 3rd party schema library.

So am I right in saying that Saxon's behaviour matches the behaviour in suggestion 1 above (schema-element matches any element in the substitution group, not just those in the imported schemas)?

Comment 3 Michael Kay 2011-08-08 22:42:56 UTC

Certainly in Saxon, schema-element(E) will match any element known to be in the substitution group of E - I think the spec is clear on that:

<quote>
schema-element(E) matches an element if (inter alia) "The name of the candidate node matches ... the name of an element in a substitution group headed by an element named ElementName." - it doesn't say that the element name has to be declared in an imported schema or that the element declaration has to be "in scope" (whatever that might mean).
</quote>

To make this work, Saxon imposes a rather crude constraint: as soon as a schema namespace is referenced in an "import schema" declaration in any query module, that namespace is "sealed". Sealing a namespace prevents certain changes being made to the schema components in that namespace, in particular, it prevents the substitution group of elements in that namespace being subsequently extended. This is a rather draconian way of ensuring consistency between the compile-time and run-time schemas. The effect is that the full substitution group of an element named in a schema-element() test is always known statically.
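
The sealing idea can be illustrated with a toy registry. This Python sketch is an assumption-laden model and does not reflect Saxon's actual classes or API. SchemaRegistry and its methods are hypothetical names, and the only change it blocks is extending a substitution group whose head lies in a sealed namespace.

```python
class SchemaRegistry:
    """Hypothetical schema registry; not Saxon's actual API."""

    def __init__(self):
        self._groups = {}     # head (namespace, local name) -> set of members
        self._sealed = set()  # namespaces named in some 'import schema'

    def add_substitute(self, head, member):
        """Add member to the substitution group headed by head."""
        namespace, _local = head
        if namespace in self._sealed:
            raise ValueError(f"namespace {namespace} is sealed")
        self._groups.setdefault(head, {head}).add(member)

    def import_schema(self, namespace):
        """Model of 'import schema': seals the namespace from now on."""
        self._sealed.add(namespace)

    def substitution_group(self, head):
        return frozenset(self._groups.get(head, {head}))


registry = SchemaRegistry()
head = ("urn:S1", "E")
registry.add_substitute(head, ("urn:S2", "E"))  # allowed before sealing
registry.import_schema("urn:S1")                # seals urn:S1

try:
    registry.add_substitute(head, ("urn:S3", "E"))
    extended_after_seal = True
except ValueError:
    extended_after_seal = False
```

Once urn:S1 is sealed, the group headed by s1:E can no longer grow, so the membership seen at compile time is the membership at run time.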

Comment 4 Oliver Hallam 2011-08-09 11:47:30 UTC

I didn't find the meaning of the spec to be as clear as you suggested.
Section 2.1.1 defines the substitution groups as a component of the In-scope schema definitions, which quite specifically vary from module to module, which led to my original interpretation.

4.12 makes it clear that the schema definitions are not imported from another module, and hence the substitution groups vary from module to module.


In the end I have gone in the direction of weakening some type checking rules when running with optimistic typing, to assume that you know nothing statically about the name of a schema-element match, and not to make any inferences about the child axis on a known schema type.

Without having full control over the schemas we have had to go for the even more draconian approach of disallowing dynamic schemas entirely if the query is compiled in static-typing mode.

Comment 5 Jonathan Robie 2011-09-10 16:13:48 UTC

(In reply to comment #3)

> <quote>
> schema-element(E) matches an element if (inter alia) "The name of the candidate
> node matches ... the name of an element in a substitution group headed by an
> element named ElementName." - it doesn't say that the element name has to be
> declared in an imported schema or that the element declaration has to be "in
> scope" (whatever that might mean).
> </quote>

The spec also says this:

2.5.5 SequenceType Matching

An XQuery 3.0 implementation must be able to determine relationships among the types in type annotations in an XDM instance and the types in the in-scope schema definitions (ISSD). An XQuery 3.0 implementation must be able to determine relationships among the types in ISSDs used in different modules of the same query.

Together with what Mike quoted, I think that makes the spec clear here. Am I missing something? Is there a bug that we need to fix in the spec?

Comment 6 Jonathan Robie 2011-09-27 16:41:09 UTC

We do not know if there is a bug here. Please reopen it if you believe there is a bug, and state clearly what the bug is.

Comment 7 Michael Kay 2011-09-29 22:46:30 UTC

Reopening because I don't think the issue has been adequately addressed.

Arguably, the statements on consistency in 2.2.5 already cover the issue. But they gloss over some of the difficulties, and fail to draw attention to some significant pitfalls.

I propose adding to the penultimate bullet of 2.2.5 Consistency Constraints (which starts "For a given query, define a participating ISSD ...") the following: 

<add>
"Equivalence" here means that validating an instance against one ISSD will always have the same effect as validating the same instance against the other ISSD (that is, it will produce the same PSVI, insofar as the PSVI is used during subsequent processing). This means, for example, that the membership of the substitution group of an element declaration in one ISSD must be the same as that of the corresponding element declaration in the other ISSD; that the set of types derived by extension from a given type must be the same; and that in the presence of a strict or lax wildcard, the set of global element (or attribute) declarations capable of matching the wildcard must be the same.
</add>


Michael Kay

Comment 8 Tim Mills 2011-09-30 07:26:14 UTC

With Mike's text in place, is it therefore a static error if these conditions are not met?

Comment 9 Michael Kay 2011-09-30 08:11:51 UTC

(In reply to comment #8)
> With Mike's text in place, is it therefore a static error if these conditions
> are not met?

No, the intro to section 2.2.5 explains it:

In order for XQuery 3.0 to be well defined, the input XDM instance, the static context, and the dynamic context must be mutually consistent. The consistency constraints listed below are prerequisites for correct functioning of an XQuery 3.0 implementation. Enforcement of these consistency constraints is beyond the scope of this specification. This specification does not define the result of a query under any condition in which one or more of these constraints is not satisfied.

In other words, if the consistency constraints aren't satisfied, anything might happen. In practice I think you have several options: (a) design your product so inconsistencies can't happen, (b) detect the inconsistencies and report them nicely as static (or perhaps link-time) errors, (c) tell the user it's their job to get it right, and if they don't, you're not answerable for the consequences, (d) ensure that your product can give some kind of reasonable result even if inconsistencies are present (e.g. by merging substitution groups at query link time, and avoiding pessimistic static type checking).
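
Option (d), merging substitution groups at query link time, might look roughly like the following. This is a minimal Python sketch under the assumption that each module's ISSD is reduced to a map from head element name to substitution-group members. The function merge_substitution_groups is a hypothetical name and does not describe any product's behaviour.

```python
def merge_substitution_groups(issds):
    """Union the substitution groups of every participating ISSD."""
    merged = {}
    for issd in issds:
        for head, members in issd.items():
            merged.setdefault(head, set()).update(members)
    return merged


S1_E = "{urn:S1}E"
S2_E = "{urn:S2}E"

issd_m1 = {S1_E: {S1_E}}                      # module that imports only urn:S1
issd_main = {S1_E: {S1_E, S2_E}, S2_E: {S2_E}}  # query that imports both

linked = merge_substitution_groups([issd_m1, issd_main])
print(sorted(linked[S1_E]))
```

After merging, every module evaluates schema-element(s1:E) against the same group. The cost is that no module can rely statically on the group being closed.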

Comment 10 Jonathan Robie 2011-09-30 13:14:15 UTC

(In reply to comment #7)

> I propose adding to the penultimate bullet of 2.2.5 Consistency Constraints
> (which starts "For a given query, define a participating ISSD ...") the
> following: 
> 
> <add>
> "Equivalence" here means that validating an instance against one ISSD will
> always have the same effect as validating the same instance against the other
> ISSD (that is, it will produce the same PSVI, insofar as the PSVI is used
> during subsequent processing). This means, for example, that the membership of
> the substitution group of an element declaration in one ISSD must be the same
> as that of the corresponding element declaration in the other ISSD; that the
> set of types derived by extension from a given type must be the same; and that
> in the presence of a strict or lax wildcard, the set of global element (or
> attribute) declarations capable of matching the wildcard must be the same.
> </add> 

I think this rule changes behavior, and would be incompatible with XQuery 1.0.

Suppose module A imports schema A, and module B imports no schema at all. Validation will work differently in the two modules.

It's important to allow different modules to use different sets of schemas. This has been possible since XQuery 1.0, and it's very useful when querying mixed sets of documents.

Comment 11 Jonathan Robie 2011-09-30 13:20:02 UTC

(In reply to comment #7)
> Reopening because I don't think the issue has been adequately addressed.
 
What is the issue? Could you please clearly state what problem needs to be solved?

I closed the issue because nobody has yet said what the bug is. You reopened it without saying what the bug is, referring to "the issue" as though it were clear what the issue is.

Comment 12 Michael Kay 2011-10-07 14:44:44 UTC

(In reply to comment #11)
> (In reply to comment #7)
> > Reopening because I don't think the issue has been adequately addressed.
> 
> What is the issue? Could you please clearly state what problem needs to be
> solved?

It is explained in great detail in the initial bug report, and is then generalised to a wider problem in the first paragraph of comment #1. In summary, different modules can have different schemas and this means that an element validated with type T in one module can be invalid against type T in another module, which violates type soundness. My proposal in comment #7 is to "fix" this by strengthening the existing rules requiring the schemas used by different modules to be consistent with each other.

Comment 13 Jonathan Robie 2011-10-07 15:09:48 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #7)
> > > Reopening because I don't think the issue has been adequately addressed.
> > 
> > What is the issue? Could you please clearly state what problem needs to be
> > solved?
> 
> It is explained in great detail in the initial bug report, and is then
> generalised to a wider problem in the first paragraph of comment #1. In
> summary, different modules can have different schemas and this means that an
> element validated with type T in one module can be invalid against type T in
> another module, which violates type soundness. 

I believe the following text - which already exists - says that types that are known in one module must be consistent with types known in another module, and that types known in a module must be consistent with types in an instance that it queries:

<quote>
An XQuery 3.0 implementation must be able to determine relationships among the
types in type annotations in an XDM instance and the types in the in-scope
schema definitions (ISSD). An XQuery 3.0 implementation must be able to
determine relationships among the types in ISSDs used in different modules of
the same query.
</quote>

> My proposal in comment #7 is to
> "fix" this by strengthening the existing rules requiring the schemas used by
> different modules to be consistent with each other.

I think your proposal does more than that: it would require all modules to use the same schema definitions. That's something we've never required in the past, and it would significantly change behavior. For instance, with your proposal, I can't have a module that imports no schemas so that I can do some things with weaker typing assumptions. That's something that's very useful to do sometimes, e.g. when dealing with different kinds of HTML I may want one module to make very few assumptions so I can read various dialects, but another module might want to import a specific schema so I can create valid HTML of some kind and work with HTML known to be a valid instance of that kind.

So I don't think it's currently broken, and I think that your proposal would break it.

Comment 14 Ghislain Fourny 2011-10-07 15:38:15 UTC

Like Jonathan, I feel that the specification addresses this issue already.

However, clarifying along the lines of Michael's formulation in comment 7 might help. If we decide to do so, I would like to make the following suggestions:

1. Remove the last two sentences of the bullet point mentioned in comment 7 (as in the current specification)

("Furthermore, if two participating ISSDs each contain a definition of a schema type T, the set of types derived by extension from T must be equivalent in both ISSDs. Also, if two participating ISSDs each contain a definition of an element name E, the substitution group headed by E must be equivalent in both ISSDs.")

i.e., replace these by the formulation proposed in comment 7, because (i) this is covered by the latter, and (ii) these sentences also use the word "equivalent" for something other than the definition of a schema type (set of types, substitution group). Comment 7 uses the terminology "same as" instead.

2. Reformulate (with the purpose of addressing Jonathan's concern in comments 10 and 13) the first sentence suggested in comment 7 as:

The "equivalence" of two schema type definitions here means that validating an instance according to the first definition will always have the same effect as validating the same instance according to the second definition.

This way, if I am correct, the absence of a schema type definition in one of the ISSDs would not prevent the two ISSDs from being consistent, as the bullet point says "If two participating ISSDs contain a definition for the same schema type, ...".

Comment 15 Michael Kay 2011-10-07 17:47:15 UTC

Rereading this, I think that Oliver's original issue is in fact covered by the rule in 2.2.5 ("if two participating ISSDs each contain a definition of an element name E, the substitution group headed by E must be equivalent in both ISSDs.")

But there are other potential inconsistencies that are not covered, for example the fact that strict wildcards may be matched in one schema and not the other. It's very unfortunate that adding an element declaration to a schema can make an unrelated element invalid.

It's possible that an implementation may be able to tolerate such inconsistencies, provided it avoids making too many inferences from knowledge of types. For example, it might refrain from making any inferences about things that match wildcards, or it might assume statically that every element has an open-ended substitution group. It still feels very painful that revalidating an element that is already annotated as valid, against the same type, could fail, and we should certainly allow implementations to prevent this happening. The section on inconsistencies is essentially pointing out a list of conditions under which all bets are off, and unless we're prepared to specify exactly what happens when these inconsistencies arise, we should play safe by including things in the list.

Comment 16 Jonathan Robie 2011-10-11 14:03:12 UTC

Here's a formulation, based on comment #14, that may address Mike Kay's concerns and mine at the same time.

<keep>
For a given query, define a participating ISSD as the in-scope
schema definitions of a module that is used in evaluating the
query. If two participating ISSDs contain a definition for the
same schema type, element name, or attribute name, the
definitions must be equivalent in both ISSDs. 
</keep>

<del> 
Furthermore, if two participating ISSDs each contain a definition
of a schema type T, the set of types derived by extension from T
must be equivalent in both ISSDs. Also, if two participating
ISSDs each contain a definition of an element name E, the
substitution group headed by E must be equivalent in both ISSDs.
</del>

<add>
"Equivalence" here means that validating an instance against type
T in one ISSD will always have the same effect as validating the
same instance against type T in the other ISSD (that is, it will
produce the same PSVI, insofar as the PSVI is used during
subsequent processing). This means, for example, that the
membership of the substitution group of an element declaration in
one ISSD must be the same as that of the corresponding element
declaration in the other ISSD; that the set of types derived by
extension from a given type must be the same; and that in the
presence of a strict or lax wildcard, the set of global
element (or attribute) declarations capable of matching the
wildcard must be the same.  
</add>
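
Because the retained sentence applies only to names defined in both ISSDs, a module that imports no schema stays consistent with every other module. The Python sketch below models just the substitution-group part of the rule and does not cover derived types or wildcards. It assumes an ISSD is a map from head element name to group members, and inconsistent_heads is a hypothetical helper name.

```python
def inconsistent_heads(issd_a, issd_b):
    """Heads defined in both ISSDs whose substitution groups differ."""
    shared = issd_a.keys() & issd_b.keys()
    return sorted(head for head in shared if issd_a[head] != issd_b[head])


S1_E = "{urn:S1}E"
S2_E = "{urn:S2}E"

issd_no_schema = {}                             # module importing no schema
issd_m1 = {S1_E: {S1_E}}                        # module importing only urn:S1
issd_main = {S1_E: {S1_E, S2_E}, S2_E: {S2_E}}  # query importing both

no_schema_conflicts = inconsistent_heads(issd_no_schema, issd_main)
m1_conflicts = inconsistent_heads(issd_m1, issd_main)
print(no_schema_conflicts, m1_conflicts)
```

In this model the schema-less module raises no conflict, while the module from the original report is flagged because its group for s1:E differs from the main query's.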

Comment 17 Jonathan Robie 2011-10-11 14:17:26 UTC

(In reply to comment #15)
> It still feels very painful that revalidating an
> element that is already annotated as valid, against the same type, could fail,
> and we should certainly allow implementations to prevent this happening.

I'm not sure how generally you mean this.

Suppose Module A creates a validated element, and Module B - which does not import any schemas - tries to validate the element with strict validation. That will fail. It has to. I don't think we should leave that up to the implementation.

Comment 18 Michael Kay 2011-10-25 17:03:53 UTC

Decided to solve this by improving the text on consistency constraints on the lines proposed in comment #16.

Comment 19 Tim Mills 2011-11-11 08:48:43 UTC

Thanks.