5738 – [XQuery] Constraints on schemas

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED REMIND

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 1.0
Version:	Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jonathan Robie
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2008-06-10 10:35 UTC by Michael Kay
Modified:	2009-02-24 16:01 UTC
CC List:	3 users

Description Michael Kay 2008-06-10 10:35:13 UTC

Section 2.2.5 (Consistency Constraints) defines some constraints relating the statically-known schema definitions to the dynamically-known schema definitions. Specifically:

# For every node that has a type annotation, if that type annotation is found in the in-scope schema definitions (ISSD), then its definition in the ISSD must be equivalent to its definition in the data model schema. Furthermore, all types that are derived by extension from the given type in the data model schema must also be known by equivalent definitions in the ISSD.

# For every element name EN that is found both in an XDM instance and in the in-scope schema definitions (ISSD), all elements that are known in the data model schema to be in the substitution group headed by EN must also be known in the ISSD to be in the substitution group headed by EN.

I think that as written, these constraints are too strong. They can be read as meaning that if you import a schema for namespace X into a module, then you must also import every namespace that contains (a) a type derived by extension from a type that you use at run-time, or (b) an element in the substitution group of an element that you use at run-time.

I think there are two things wrong here. Firstly, static knowledge of types derived by extension, and of substitution groups, is needed only by processors that want to do static type inferencing or pessimistic static type checking. If you don't use the information then it doesn't matter if it's not there, or if it is wrong. Secondly, although the processor might need this information at compile time, that doesn't mean that it has to be in the ISSD (i.e., explicitly imported into each query module). It could, for example, be in a schema document that is imported transitively by a schema document that has been imported into the ISSD, or one that is imported into other query modules.

I would suggest changing the introductory paragraph from

<old>
Some of the consistency constraints use the term data model schema. [Definition: For a given node in an XDM instance, the data model schema is defined as the schema from which the type annotation of that node was derived.] For a node that was constructed by some process other than schema validation, the data model schema consists simply of the schema type definition that is represented by the type annotation of the node.
</old>

to

<new>
Some of the consistency constraints use the term *data model schema*. [Definition: For a given node in an XDM instance, the data model schema is defined as the schema that was used during the validation episode that caused the node to acquire its type annotation.] For a node that was constructed by some process other than schema validation, the data model schema consists simply of the schema type definition that is represented by the type annotation of the node.

Some of the consistency constraints also use the term *static schema*. [Definition: The static schema consists of all schema components that are explicitly included in the ISSD of a query module, together with all other schema components that the query processor makes use of during static analysis.]
</new> 

Then change the two rules cited above to the following three rules:

<new>
# For every node that has a type annotation, if that type annotation is found in the static schema, then its definition in the static schema must be equivalent to its definition in the data model schema.

# If the processor performs any static analysis using information in the static schema, then for every type T that appears in both the static schema and the data model schema, all types appearing in the data model schema that are derived by extension (directly or indirectly) from T must also be present, with equivalent definitions, in the static schema.

# If the processor performs any static analysis using information in the static schema, then for every element name EN that is found both in an XDM instance and in the static schema, all elements that are known in the data model schema to be in the substitution group headed by EN must also be present in the static schema as members of the substitution group headed by EN.
</new>
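
A minimal sketch of how the three proposed rules might be checked, in Python, under a deliberately simplified model. The names `TypeDef`, `Schema`, `extensions_of` and `violations` are hypothetical, "equivalent definitions" is approximated by plain equality of the toy records, and only chains made up entirely of extension steps are followed. It is meant to show that rules 2 and 3 only bite when static analysis is performed, not to describe any real processor.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical stand-in for a schema type definition."""
    name: str
    base: Optional[str] = None   # name of the base type, if any
    by_extension: bool = False   # True if derived from base by extension
    content: str = ""            # opaque summary used to compare definitions


@dataclass
class Schema:
    """Hypothetical schema: type definitions plus substitution groups.

    Assumption: every declared element name appears as a key of
    subst_groups, possibly mapped to an empty set of members.
    """
    types: Dict[str, TypeDef] = field(default_factory=dict)
    subst_groups: Dict[str, Set[str]] = field(default_factory=dict)


def extensions_of(schema, type_name):
    """Names of types derived by extension, directly or indirectly."""
    found, frontier = set(), {type_name}
    while frontier:
        frontier = {t.name for t in schema.types.values()
                    if t.by_extension and t.base in frontier} - found
        found |= frontier
    return found


def violations(static, data_model, annotations, element_names, static_analysis):
    """Return one message per violation of the three proposed rules."""
    problems = []
    # Rule 1: annotated types found in the static schema must be equivalent.
    for name in sorted(annotations):
        if name in static.types and static.types[name] != data_model.types.get(name):
            problems.append(f"rule 1: {name}")
    if not static_analysis:
        return problems  # rules 2 and 3 are conditional on static analysis
    # Rule 2: extensions of shared types must be present and equivalent.
    for name in sorted(static.types.keys() & data_model.types.keys()):
        for ext in sorted(extensions_of(data_model, name)):
            if static.types.get(ext) != data_model.types[ext]:
                problems.append(f"rule 2: {ext}")
    # Rule 3: substitution group members must be known statically.
    for en in sorted(element_names):
        if en in static.subst_groups:
            missing = data_model.subst_groups.get(en, set()) - static.subst_groups[en]
            problems.extend(f"rule 3: {member}" for member in sorted(missing))
    return problems


# Illustrative data: the static schema knows "addr" but not its extension.
addr = TypeDef("addr")
us_addr = TypeDef("usAddr", base="addr", by_extension=True)
static = Schema(types={"addr": addr}, subst_groups={"address": set()})
runtime = Schema(types={"addr": addr, "usAddr": us_addr},
                 subst_groups={"address": {"usAddress"}})

print(violations(static, runtime, {"usAddr"}, {"address"}, static_analysis=False))
print(violations(static, runtime, {"usAddr"}, {"address"}, static_analysis=True))
```

With the illustrative data, a processor that does no static analysis reports nothing, while one that does is told that `usAddr` and `usAddress` are missing from its static view.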

Arguably this is still a bit paternalistic (and untestable). What we really want to say is that if the processor makes compile-time inferences based on its static knowledge of the schema, then its inferences had better be correct and apply equally to the runtime schema.

Comment 1 Don Chamberlin 2008-07-15 22:42:24 UTC

On 15 July 2008 the Query Working Group considered this bug report and agreed to make the proposed changes. Since the submitter of the proposal was present for the discussion, I am marking this bug report as closed.
--Don Chamberlin (for the Query Working Group)

Comment 2 Jim Melton 2008-07-29 17:25:38 UTC

During the XML Query WG's teleconference on 2008-07-29, this bug report and its previously-accepted solution were discussed extensively.  The WG is no longer certain that the previously-accepted solution is the right way to resolve the bug report and is thus re-opening the bug for further consideration.

Comment 3 Jonathan Robie 2008-07-29 17:57:05 UTC

I don't think there's a technical bug in the original spec, but I do think some further clarification would be good. I think the best solution is to simply add the following note in 2.2.5, before the bulleted list:

<add>
Note: The in-scope schema definitions can be augmented by the implementation, and may include schema components not explicitly imported in a query. See C.1 Static Context Components.
</add>

An implementation could choose to add all schemas associated with a collection to the ISSD, or to add all schemas used to validate a given document before processing that document. I think that's what the document says today, and I don't think we can change that as a bug fix.

Here's my trace of the spec:

2.1.1 Static Context
<Jonathan> [Definition: The static context of an expression is the information that is available during static analysis of the expression, prior to its evaluation.] This information can be used to decide whether the expression contains a static error. If analysis of an expression relies on some component of the static context that has not been assigned a value, a static error is raised [err:XPST0001].
The individual components of the static context are summarized below. Rules governing the scope and initialization of these components can be found in C.1 Static Context Components.

-->
http://www.w3.org/TR/xquery/#id-xq-static-context-components

That table clearly says the ISSD is augmentable by a query processor, and may include types not in the built-in types and not explicitly imported by the user.

This flexibility is intentional - we discussed it at great length early in the history of the Working Group. It does lead to query portability issues, like everything else that is augmentable in Appendix C.1. A query that relies on any of that is not portable.

Comment 4 Michael Kay 2008-07-29 18:00:09 UTC

In discussion, there were clearly a number of people who felt that the term
ISSD was intended to mean what I meant by "static schema", that is, all the
schema definitions available to the processor during static analysis.

I think there are a couple of difficulties with this. Firstly, it means that
"import schema" loses any connection with the <xsd:import> element on which it
was originally based; it can no longer be described as giving the query access
to a particular set of component names in a particular namespace. If we want to
do this we should probably say explicitly that "import schema" IS transitive,
that is, it imports all the schema components in and referenced by the
identified schema document, irrespective of their namespace, so that the user
has some kind of predictability in knowing which names can be used in a query
as a consequence of a given "import schema" declaration.

This incidentally would make "import schema" very different from "import
module", which is explicitly NOT transitive.
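
A minimal sketch of the difference between the two readings, in Python.
The namespaces, component names and the `SCHEMAS` table are made up for
illustration, and "referenced by" is approximated by following xsd:import
edges between schema documents, which is an assumption rather than anything
the spec states.

```python
# Hypothetical schema documents, keyed by target namespace.
# "names" are the components declared in that namespace;
# "imports" are the namespaces named by xsd:import in that document.
SCHEMAS = {
    "urn:example:order": {
        "names": {"order", "orderType"},
        "imports": {"urn:example:address"},
    },
    "urn:example:address": {
        "names": {"address", "addressType"},
        "imports": {"urn:example:country"},
    },
    "urn:example:country": {
        "names": {"countryCode"},
        "imports": set(),
    },
}


def referenceable_names(namespace, transitive):
    """Names a query could mention after importing one namespace.

    With transitive=False only the named namespace contributes names;
    with transitive=True every namespace reachable through xsd:import
    edges contributes as well.
    """
    seen, names, pending = set(), set(), [namespace]
    while pending:
        ns = pending.pop()
        if ns in seen:
            continue  # guard against circular imports
        seen.add(ns)
        names |= SCHEMAS[ns]["names"]
        if transitive:
            pending.extend(SCHEMAS[ns]["imports"])
    return names


print(sorted(referenceable_names("urn:example:order", transitive=False)))
print(sorted(referenceable_names("urn:example:order", transitive=True)))
```

Under the non-transitive reading the prolog tells the reader exactly which
namespaces the query depends on; under the transitive reading the set of
usable names depends on the contents of the schema documents themselves.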

(The current spec of "import schema" is very muddled, and it's not surprising,
because the schema composition story in XSD itself is so muddled. We talk about
a schema "being identified by its target namespace": but schemas don't have
target namespaces, only schema documents (and schema components) do. XSD itself
has the notion of "the schema corresponding to a schema document" - we could
use that term, though unfortunately it is itself pretty weakly-defined.)

Secondly, I think that if we adopt this wider interpretation of ISSD, then the
contents of the ISSD become very variable from one implementation to another,
which means that many queries are going to fall foul of rules like this one:

<rule>
It is a static error [err:XQST0036] to import a module if the importing
module's in-scope schema types do not include definitions for the schema type
names that appear in the declarations of variables and functions (whether in an
argument type or return type) that are present in the imported module and are
referenced in the importing module.
</rule>

when queries are moved from one implementation to another. Now it's true enough
that we already say the ISSD is "augmentable", but my interpretation of that
has always been that as an alternative to importing a schema namespace in the
prolog, you can do it from the API. An interpretation that puts the contents of
the ISSD largely in the hands of the implementation rather than the user is
going to cause a lot of interoperability problems.
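
A minimal sketch of the portability concern, in Python. The rule is reduced
to a set difference over type names; `StaticError`, `check_module_import`,
the `lib:` declarations and the two ISSD contents are all hypothetical and
stand for no particular product.

```python
class StaticError(Exception):
    """Hypothetical exception carrying an XQuery error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_module_import(in_scope_types, imported_decls, referenced):
    """Raise XQST0036 if a referenced declaration names a type not in scope.

    imported_decls maps each variable or function declared in the imported
    module to the schema type names used in its declaration; referenced is
    the subset of those declarations used by the importing module.
    """
    for decl in sorted(referenced):
        missing = imported_decls[decl] - in_scope_types
        if missing:
            raise StaticError("XQST0036", f"{decl} uses {sorted(missing)}")


def imports_cleanly(in_scope_types):
    """True if the module import raises no static error for this ISSD."""
    try:
        check_module_import(in_scope_types, LIB_DECLS, REFERENCED)
        return True
    except StaticError:
        return False


BUILT_INS = {"xs:integer", "xs:string"}
LIB_DECLS = {
    "lib:ship": {"po:addressType", "xs:string"},
    "lib:count": {"xs:integer"},
}
REFERENCED = {"lib:ship"}

# Same query, two implementations with different ISSD contents.
issd_a = BUILT_INS | {"po:addressType"}  # implementation augments the ISSD
issd_b = set(BUILT_INS)                  # implementation does not

print(imports_cleanly(issd_a), imports_cleanly(issd_b))
```

The query text is identical in both cases; only the implementation-chosen
contents of the in-scope schema types decide whether the import is an error.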

I think it's actually useful to make a distinction between "the set of types
that the schema processor knows about at analysis time" and "the set of types
that the query is allowed to mention by name". It means that when you read the
"import schema" declarations in the prolog, you know something about the
module's dependencies, just as you do if you read the "import module"
declarations.

Finally: the consistency constraints that are the subject of the original bug
clearly apply only in the case of a processor that is doing static type
inferencing.

Comment 5 Michael Dyck 2008-07-29 22:40:45 UTC

(In reply to comment #3)
>... C.1 Static Context Components.
> 
> That table clearly says the ISSD is augmentable by a query processor, and
> may include types not in the built-in types and not explicitly imported
> by the user.

For the three components of the ISSD, the table says "augmentable" in the
column labelled "Can be overwritten or augmented by implementation?", but
this must be interpreted in light of the definition for that column in the
preceding bulleted list, which I believe says that the "augmentability"
only refers to the initial value (setting) of each component.

So, while it's quite true that the 'in-scope schema types' component can
contain types other than the xs types and explicitly imported types,
(a) the implementer is obliged to specify what those extra types are, and
(b) they're in the in-scope schema types from the start (rather than being
    added as the result of a schema import).

In particular, if a processor were to implement a transitive schema import,
I don't think it could claim conformance by virtue of C.1.

Comment 6 Jonathan Robie 2008-07-30 15:04:34 UTC

(In reply to comment #4)
> In discussion, there were clearly a number of people who felt that the term
> ISSD was intended to mean what I meant by "static schema", that is, all the
> schema definitions available to the processor during static analysis.


I think the spec says that, I gave a trace in comment #3.


> I think there are a couple of difficulties with this. Firstly, it means that
> "import schema" loses any connection with the <xsd:import> element on which it
> was originally based; it can no longer be described as giving the query access
> to a particular set of component names in a particular namespace. If we want to
> do this we should probably say explicitly that "import schema" IS transitive,
> that is, it imports all the schema components in and referenced by the
> identified schema document, irrespective of their namespace, so that the user
> has some kind of predictability in knowing which names can be used in a query
> as a consequence of a given "import schema" declaration.


I think http://www.w3.org/TR/xquery/#dt-schema-import is completely vague about what schema components are imported, and whether import schema is transitive or not. Do you see clear language that says it is not transitive? This seems like something worth fixing.

> Secondly, I think that if we adopt this wider interpretation of ISSD, then the
> contents of the ISSD become very variable from one implementation to another,

I think that's true for every single thing that is augmentable by an implementation. But here is the interpretation the spec gives us:

# Default initial value: This is the initial value of the component if it is not overridden or augmented by the implementation or by a query.

# Can be overwritten or augmented by implementation: Indicates whether an XQuery implementation is allowed to replace the default initial value of the component by a different, implementation-defined value and/or to augment the default initial value by additional implementation-defined values.

# Can be overwritten or augmented by a query: Indicates whether a query is allowed to replace and/or augment the initial value provided by default or by the implementation. If so, indicates how this is accomplished (for example, by a declaration in the prolog).

As long as the implementation documents how it augments the value, it's free to do so. I don't think we should try to change this as a bug fix to a published spec.
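
Read literally, the three column definitions describe a two-stage computation of a component's value, which can be sketched in Python as follows. The function names and the `vendor:` and `po:` type names are hypothetical, and a component is modelled as a plain set of names, which is a simplification.

```python
def initial_component(default, impl_override=None, impl_extra=frozenset()):
    """Value of a static context component before the query is considered.

    The implementation may replace the default initial value
    (impl_override) and/or augment it (impl_extra).
    """
    base = default if impl_override is None else impl_override
    return frozenset(base) | frozenset(impl_extra)


def effective_component(initial, query_extra=frozenset()):
    """Value after the query itself augments the initial value,
    for example through a declaration in the prolog."""
    return frozenset(initial) | frozenset(query_extra)


# Default initial value: built-in types only (abbreviated here).
default_types = {"xs:integer", "xs:string"}

# Stage 1: implementation-defined augmentation of the initial value.
initial = initial_component(default_types, impl_extra={"vendor:geoPoint"})

# Stage 2: the query adds types via a schema import in its prolog.
effective = effective_component(initial, query_extra={"po:addressType"})

print(sorted(initial))
print(sorted(effective))
```

In this model the implementation's contribution is fixed before the prolog is processed, and anything added afterwards is attributed to the query.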

> I think it's actually useful to make a distinction between "the set of types
> that the schema processor knows about at analysis time" and "the set of types
> that the query is allowed to mention by name". It means that when you read the
> "import schema" declarations in the prolog, you know something about the
> module's dependencies, just as you do if you read the "import module"
> declarations.

We can make changes to the model for our type system in future versions of the spec if we choose to, but please not as a bug fix to a published spec.

> Finally: the consistency constraints that are the subject of the original bug
> clearly apply only in the case of a processor that is doing static type
> inferencing. 
 
I'm not sure that I believe this. Suppose the data model instance and the ISSD define the same name, but do so in two different ways. Wouldn't that affect SequenceType Matching and casting in ways that our spec does not account for?
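
A minimal sketch of that scenario, in Python. Each schema is reduced to a map from type name to base type name, `derives_from` is a hypothetical helper, and the type `my:code` is invented for illustration; real SequenceType matching involves much more than this.

```python
def derives_from(schema, actual, expected):
    """True if type `actual` is `expected` or derives from it in `schema`.

    schema maps a type name to the name of its base type (None at the root).
    """
    seen = set()
    while actual is not None and actual not in seen:
        if actual == expected:
            return True
        seen.add(actual)  # guard against a malformed, cyclic schema
        actual = schema.get(actual)
    return False


# The same name defined in two different ways.
issd_view = {"my:code": "xs:integer", "xs:integer": None, "xs:string": None}
data_model_view = {"my:code": "xs:string", "xs:integer": None, "xs:string": None}

# A node annotated my:code, tested against the SequenceType xs:integer.
print(derives_from(issd_view, "my:code", "xs:integer"))
print(derives_from(data_model_view, "my:code", "xs:integer"))
```

The answer to the same `instance of` test depends on which of the two definitions the processor consults, with or without static type inferencing.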

Jonathan

Comment 7 Michael Kay 2008-07-30 15:34:40 UTC

> In discussion, there were clearly a number of people who felt that the 
> term ISSD was intended to mean what I meant by "static schema", that 
> is, all the schema definitions available to the processor during static analysis.

# I think the spec says that, I gave a trace in comment #3.

No, I don't think it does say that. The definition of ISSD in 2.1.2 says it includes the types/declarations "in imported schemas". Also, the use of the term "in-scope" gives a strong suggestion that there are definitions and declarations that are not "in-scope"; the language is similar to that used for variables and functions, where the concept of "scope" indicates whether names are available for use within a part of the query.

# I think http://www.w3.org/TR/xquery/#dt-schema-import is completely vague about what schema components are imported, and whether import schema is transitive or not. Do you see clear language that says it is not transitive?

Not as clear as I would like. But the phrase "...a schema import specifies the target namespace of the schema to be imported" comes close: it suggests strongly that the behaviour is analogous with xsd:import (and with import module) in that it causes all names in a particular namespace to be referenceable. Neither of those other facilities, also called import, is transitive.

> then the contents of the ISSD become very variable from one 
> implementation to another,

# I think that's true for every single thing that is augmentable by an implementation.

I would prefer the context to be less open-ended, but my interpretation of this "augmentation" has always been that an implementation may provide API configuration facilities that give the user an alternative to the Query Prolog as a way of specifying such context information.

This is all straying a long way from the original bug report, however. If it's true that the user doesn't have any control over what's in the ISSD and what isn't, then it becomes even more true that the consistency rules as currently stated in 2.2.5 make very little sense.

Comment 8 Jonathan Robie 2008-07-30 15:50:11 UTC

> No, I don't think it does say that. The definition of ISSD in 2.1.2 says it
> includes the types/declarations "in imported schemas". Also, the use of the
> term "in-scope" gives a strong suggestion that there are definitions and
> declarations that are not "in-scope"; the language is similar to that used for
> variables and functions, where the concept of "scope" indicates whether names
> are available for use within a part of the query.

But C.1. clearly says this is augmentable.

> # I think http://www.w3.org/TR/xquery/#dt-schema-import is completely vague
> about what schema components are imported, and whether import schema is
> transitive or not. Do you see clear language that says it is not transitive?
> 
> Not as clear as I would like. But the phrase "...a schema import specifies the
> target namespace of the schema to be imported" comes close: it suggests
> strongly that the behaviour is analogous with xsd:import (and with import
> module) in that it causes all names in a particular namespace to be
> referenceable. Neither of those other facilities, also called import, is
> transitive.

OK, I'll buy that.

> I would prefer the context to be less open-ended, but my interpretation of this
> "augmentation" has always been that an implementation may provide API
> configuration facilities that give the user an alternative to the Query Prolog
> as a way of specifying such context information.

I like that interpretation, and I actually tried to get us to say something specific like that, but I lost that battle. The spec does not narrow it down like that. And I don't think we should change that as a bug fix.

> This is all straying a long way from the original bug report, however. If it's
> true that the user doesn't have any control over what's in the ISSD and what
> isn't, then it becomes even more true that the consistency rules as currently
> stated in 2.2.5 make very little sense. 

A user can clearly do schema import. We don't seem to even specify a way for a user to do a transitive schema import. Perhaps providing that mechanism would be helpful?

Jonathan

Comment 9 Jonathan Robie 2009-02-24 16:01:41 UTC

In the 2ed errata, I will propose a non-normative note that clarifies that schema import imports the assembled schema.