This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11095 - Lack of clarity on 'unknown' types
Summary: Lack of clarity on 'unknown' types
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3.0
Version: Working drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2010-10-19 15:35 UTC by Mary Holstege
Modified: 2011-02-22 17:17 UTC

Description Mary Holstege 2010-10-19 15:35:30 UTC
For additional context, see http://www.w3.org/Bugs/Public/show_bug.cgi?id=10291

The issue is that there is a main module and a library module.
The library imports a schema that declares a subtype of xs:int and has a function
that returns an instance of the subtype.
The main module does not import the schema, but tests whether the result of
the library module function call is an instance of xs:int.

The intended result is that this returns true and does not raise an error, but the
text does not make clear why the error is forbidden.

See also http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Sep/0062.html 
and following (member only)
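
For readers without the linked threads, the scenario can be modelled outside XQuery. The Python sketch below is illustrative only: the type name `my:small-int`, the function names, and the "strict" rule that treats any type unknown to the module as an error are assumptions made for the example, not wording from the specification.

```python
# Illustrative model of the two-module scenario. All names are hypothetical.

class XPTY0004(Exception):
    """Stands in for the XQuery type error err:XPTY0004."""

# Derivation chain of the built-in types involved (type -> base type).
BUILTIN_BASE = {
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:anyAtomicType": None,
}

# The main module imports no schema, so only built-in types are in scope.
MAIN_IN_SCOPE = dict(BUILTIN_BASE)

def library_function():
    """Library side: returns a value annotated with the schema's subtype."""
    return (42, "my:small-int")  # hypothetical subtype of xs:int

def strict_instance_of(annotation, expected, in_scope):
    """Assumed strict reading: any type unknown to the module is an error."""
    if annotation not in in_scope or expected not in in_scope:
        raise XPTY0004("type not known in this module")
    current = annotation
    while current is not None:
        if current == expected:
            return True
        current = in_scope[current]
    return False

def main_module():
    """Main side: tests the library result against xs:int."""
    _value, annotation = library_function()
    try:
        return strict_instance_of(annotation, "xs:int", MAIN_IN_SCOPE)
    except XPTY0004:
        return "err:XPTY0004"
```

Under this assumed strict reading, `main_module()` reports the error, whereas the intended result is true.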
Comment 1 John Snelson 2010-11-02 18:26:18 UTC
I believe this problem can be fixed by modifying XQuery 3.0 section 2.5.4 as follows:

1) Define S as containing every type from every module in an XQuery program.

2) Remove the first bullet from the definition of derives-from().
Comment 2 Jonathan Robie 2010-11-04 13:52:51 UTC
(In reply to comment #1)
> I believe this problem can be fixed by modifying XQuery 3.0 section 2.5.4 as
> follows:
> 
> 1) Define S as containing every type from every module in an XQuery program.

I don't like that, because it means that the schemas in all modules used in an XQuery must be consistent. That breaks an important aspect of XQuery's design - we need to be able to have modules written for different versions of a schema that evolves over time, or modules for different XML vocabularies that use the same names but are not in a namespace.

Implementations are already allowed to know the types in an instance document and to use them in these judgements, but whether they do this is implementation-defined:

> potentially, the schema used for validating the instance document; 
> whether a processor adds this schema to S is implementation-defined.

If a library creates derived types in an instance, the type information in that instance can be used to determine whether the type is an instance of xs:int.

If we want to say an implementation is not allowed to raise the error, we should require this, and not leave it implementation-defined.
Comment 3 Jonathan Robie 2011-01-25 20:43:52 UTC
We already define the term Data Model Schema, which is currently used only for consistency constraints. We can expand the definition to include atomic types:

* Definition: Data Model Schema

 [Definition: For a given node <add>or atomic value</add> in an
 XDM instance, the data model schema is defined as the schema
 from which the type annotation of that node was derived.] For a
 node <add>or atomic value</add> that was constructed by some
 process other than schema validation, the data model schema
 consists simply of the schema type definition that is
 represented by the type annotation <del>of the node</del>.

We can then use this concept to extend derives-from() in SequenceType matching, saying that the Actual Type is a definition in the Data Model Schema, and the Expected Type is a definition in S:

* SequenceType Matching 

The definition of SequenceType matching relies on a pseudo-function
named derives-from( AT, ET ), which takes an actual simple or complex
schema type AT <add>from a data model schema</add> and an expected
simple or complex schema type ET <add="alt2">from S</add>, and either
returns a boolean value or raises a type error [err:XPTY0004]. This
function is defined as follows:

# derives-from( AT, ET ) raises a type error [err:XPTY0004] if ET is
  not present in S. If AT is not present in S, derives-from( AT, ET )
  returns derives-from(gcd(AT), ET), where gcd(AT) is the most
  specific base type of AT in the data model schema that is
  present in S.
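
As a rough illustration of the proposed bullet, the following Python sketch models schemas as dictionaries mapping a type name to its base type. The representation, the helper names, the example type `my:small-int`, and the choice to return false when no base type of AT is found in S are all assumptions; the sketch covers only simple derivation chains, not unions or other cases handled by the full definition.

```python
class XPTY0004(Exception):
    """Stands in for the XQuery type error err:XPTY0004."""

def base_chain(type_name, base_of):
    """Yield type_name followed by its successive base types."""
    current = type_name
    while current is not None:
        yield current
        current = base_of.get(current)

def gcd(actual_type, data_model_schema, s):
    """Most specific base type of actual_type, per the data model schema, that is in S."""
    for candidate in base_chain(actual_type, data_model_schema):
        if candidate in s:
            return candidate
    return None

def derives_from(actual_type, expected_type, data_model_schema, s):
    """Sketch of derives-from(AT, ET) with the proposed fallback to gcd(AT)."""
    if expected_type not in s:
        raise XPTY0004(f"expected type {expected_type} is not present in S")
    if actual_type not in s:
        actual_type = gcd(actual_type, data_model_schema, s)
        if actual_type is None:
            return False  # assumption: the proposal does not cover this case
    return expected_type in base_chain(actual_type, s)

# S for a module that imports no schema (built-in types only).
S = {
    "xs:anyAtomicType": None,
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:string": "xs:anyAtomicType",
}

# Data model schema of a value created by a library that knows the subtype.
DATA_MODEL_SCHEMA = dict(S, **{"my:small-int": "xs:int"})
```

Here `derives_from("my:small-int", "xs:int", DATA_MODEL_SCHEMA, S)` falls back to `xs:int` as gcd(AT) and yields true without the main module knowing the subtype.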


If we go this route, I would be inclined to require an error if the schemas in S are not consistent, instead of merely allowing one:

* Composing S

 This determination is done by reference to a schema
 S (that is, a set of schema components). This schema S is the
 union of:

 1. the in-scope schema definitions in the static context of the
  module.

 2. potentially, the schema used for validating the instance document;
 whether a processor adds this schema to S is implementation-defined.

 3. potentially, further schema components that have been made
 available to the processor in an implementation-defined way.


A type error [err:XPTY0004] <del>may</del><add>must</add> be raised if this union does not
constitute a valid schema (for example, if there are conflicts
between types present in the static context and types used
dynamically for validating instances.)
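
A minimal Python sketch of this composition follows. It assumes each contributing schema is a dictionary from expanded type name to an opaque definition, and it checks only for conflicting definitions under the same name; real schema validity involves more than that. The function name and the tuple encoding of definitions are hypothetical.

```python
class XPTY0004(Exception):
    """Stands in for the XQuery type error err:XPTY0004."""

def compose_s(*component_sets):
    """Union the given schemas, raising if two definitions of one name differ."""
    s = {}
    for components in component_sets:
        for name, definition in components.items():
            if name in s and s[name] != definition:
                raise XPTY0004(f"conflicting definitions for {name}")
            s[name] = definition
    return s

# Example inputs: in-scope schema definitions and a schema used for validation.
in_scope = {"my:small-int": ("restriction", "xs:int")}
validation = {
    "my:small-int": ("restriction", "xs:int"),
    "my:code": ("restriction", "xs:string"),
}
conflicting = {"my:small-int": ("restriction", "xs:string")}
```

With "must" in place of "may", `compose_s(in_scope, conflicting)` is required to fail, while `compose_s(in_scope, validation)` succeeds because the shared definition is identical.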
Comment 4 Michael Dyck 2011-01-25 21:35:35 UTC
[Editorial...]

> * Definition: Data Model Schema
> 
>  [Definition: For a given node <add>or atomic value</add> in an
>  XDM instance, the data model schema is defined as the schema
>  from which the type annotation of that node was derived.]

After "that node", insert "or atomic value".
Maybe change occurrences of "node or atomic value" to "item".

Moreover, "type annotation" is currently only defined for (element and attribute) nodes, so we'd need to define it for atomic values.

Also (this is pre-existing language but) using the word "derived" might suggest derived types, so I wonder if:
    the schema from which X was derived
should be changed to something like:
    the schema that defines X

>  For a node <add>or atomic value</add> that was constructed by some
>  process other than schema validation,

(Again, this is existing language, but) I don't think it's correct to talk about nodes and atomic values being constructed by validation. Instead of:
    constructed by some process other than schema validation
maybe it should say:
    constructed from something other than a PSVI
Comment 5 Jonathan Robie 2011-01-25 22:11:27 UTC
(In reply to comment #4)
> [Editorial...]

Yes ... but you're right on both points.
Comment 6 Jonathan Robie 2011-02-22 17:17:55 UTC
In today's telcon, we agreed to resolve this bug as follows:

1. Remove reference to the schema S entirely

2. Replace consistency constraints for the data model schema with a
statement that if your data model instance can contain new types
derived by extension, or new members of substitution groups, there are
limits to the amount of static inference that can be done.

3. Require implementations to know type derivations used in the data
model.

Make the changes listed below - (from Mike Kay's email
http://lists.w3.org/Archives/Member/w3c-xsl-query/2011Feb/0085.html,
member only).

A. change section 2.7 ("Types") as outlined below.

A1. Change the title to "Schema Information".

A2. Change the intro to

The data model supports strongly typed languages such as [XML Path
Language (XPath) 3.0] and [XQuery 3.0: An XML Query Language] that have
a type system based on [Schema Part 1]. To achieve this the data model
includes (by reference) the Schema Component model described in [Schema
Part 1].

Note: The Schema Component Model includes a number of kinds of
component, such as type definitions and element and attribute
declarations, and defines the properties and relationships of these
components. Many of these components and properties are not used by the
language specifications that rely on XDM, and where this is the case,
there is no requirement for implementations to make them visible.
However, this specification makes no attempt to define the minimal
subset of the schema component model that is needed to support the
semantics of XPath and XQuery processing.

There are two main areas where the language semantics depend on
information in schema components:

(a) Expressions are evaluated with respect to a static context, which
includes schema components, specifically type definitions, element
declarations, and attribute declarations. The names of such components
may be used in language constructs only if the components are present in
the static context.

(b) Values including element and attribute nodes, and atomic values,
have a property called a type annotation whose value is a type: this is
a reference to a type definition in the Schema Component Model.

Every item in the data model has both a value and a type. In addition to
nodes, the data model can represent atomic values like the number 5 or
the string "Hello World". For each of these atomic values, the data
model contains both the value of the item (such as 5 or "Hello World")
and its type. The property that holds the type is sometimes referred to
as the type annotation: its value is a type definition component as
defined in the SCM model. This may be a built-in type (a type with a
name such as xs:integer or xs:string), or a user-defined type.

There is a constraint that the total set of components used during
expression processing (both statically and dynamically) must constitute
a valid schema. This implies, for example, that this total set does not
include two different types with the same expanded name.

Note: this makes it the responsibility of the processor to ensure that
the schema components used in the static context of a query or
expression during static analysis are consistent with the schema
components used to validate documents during query or expression
evaluation. This specification does not say how this should be achieved.

It is also a constraint that the schema available to the processor must
contain at least the components and properties needed to correctly
implement the semantics of the XPath and XQuery language. For example,
this means that given a node with a particular type annotation T, and a
function that expects an argument of type S, there must be sufficient
information available to the processor to establish whether or not T is
derived from S. As with other consistency constraints described in this
data model, it is a precondition that these constraints are satisfied;
the specifications do not speculate on what happens if they are not.

A3. Section 2.7.1. Since the type annotation property of a value is now a
type, rather than a type name, I think we can dispense with the need to
discuss the construction of names for anonymous types; and we can remove
the statement that "The data model does not represent element or
attribute declaration schema components".

A4. Throughout, wherever we refer to the "type-name" property of a node, change this to refer instead to the "schema type" property; this may require slight changes to the wording of the surrounding text.

A5. Throughout, retain "type-name" as an accessor. The definition in 5.14 does not need to change (it already says it is the name of the schema type of the node). But add "If the schema type of the node is an anonymous type, the accessor returns a synthesized name that is distinct from the names of all other types. If the node is of a kind that does not have a schema type (for example, text nodes), the accessor returns the empty sequence. The prefix of the returned QName is implementation-dependent."
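
One way to picture the accessor behaviour described in A5 is the Python sketch below. The classes, the `anon:type-N` naming pattern, and the decision to return the same synthesized name on repeated calls for the same anonymous type are assumptions for illustration; the sketch also assumes no named type uses the reserved pattern.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional

@dataclass(eq=False)  # identity-based equality and hashing
class SchemaType:
    name: Optional[str] = None  # None marks an anonymous type

@dataclass
class Node:
    kind: str
    schema_type: Optional[SchemaType] = None  # None for kinds without a schema type

_counter = itertools.count(1)
_synthesized = {}  # anonymous SchemaType -> synthesized name

def type_name(node: Node) -> List[str]:
    """Return the type name as a sequence; [] models the empty sequence."""
    schema_type = node.schema_type
    if schema_type is None:
        return []  # e.g. text nodes
    if schema_type.name is not None:
        return [schema_type.name]
    if schema_type not in _synthesized:
        # The prefix is implementation-dependent; "anon" is a placeholder.
        _synthesized[schema_type] = f"anon:type-{next(_counter)}"
    return [_synthesized[schema_type]]
```

Two nodes that share one anonymous type report the same synthesized name, and two different anonymous types report different names.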

A6. Change the title of section 3.3.1.1 from "Element and Attribute Node Type Names" to "Element and Attribute Node Types".