This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6513 - [XQuery] inconsistent terminology in definition of derives-from()
Summary: [XQuery] inconsistent terminology in definition of derives-from()
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 1.0
Version: Recommendation
Hardware: All All
Importance: P2 minor
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2009-02-01 22:17 UTC by Michael Dyck
Modified: 2009-10-06 21:21 UTC

Description Michael Dyck 2009-02-01 22:17:54 UTC
XQuery 2.5.4 [SequenceType Matching] defines the concepts of "known" and
"unknown" schema types:
    The given schema type may be "known" (defined in the in-scope
    schema definitions), or "unknown" (not defined in the in-scope
    schema definitions).

The definition of pseudo-function 'derives-from' then uses these terms, but
it also uses the phrases
    "a schema type [not] found in the in-scope schema definitions"
suggesting that this is a distinct concept from known/unknown, which I
don't believe is intended.

Therefore, please change
    "AT is a schema type found in the in-scope schema definitions"
to
    "AT is a known type"
and
    "AT is a schema type not found in the in-scope schema definitions"
to
    "AT is an unknown type"

(Alternatively, you could unfold all uses of "known" and "unknown", and get
rid of the definitions, but they seem like useful abbreviations.)
Comment 1 Michael Kay 2009-02-01 23:06:50 UTC
See also bug #5738, which discusses similar inconsistencies of terminology in section 2.2.5.

I think that using "known" to mean "present in the ISSD" is unfortunate, since the whole idea behind the rules in 2.5.4 is that the processor may have knowledge of types that have not been explicitly imported, and may use this knowledge. I suspect it is because of this difference between the defined meaning of "known" and its intuitive meaning that the word is not used more widely. So rather than using "known" more widely, I would prefer to use a more helpful term like "declared".
Comment 2 Michael Kay 2009-02-01 23:22:13 UTC
Note particularly my observation in comment #4 of bug #5738: "In discussion, there were clearly a number of people who felt that the term
ISSD was intended to mean what I meant by "static schema", that is, all the
schema definitions available to the processor during static analysis."

If, as this view would suggest, "ISSD" and "the set of types known to the processor at compile time" are synonyms, then a lot of the discussion in 2.5.4 becomes nonsense.

Comment 3 Michael Dyck 2009-02-02 01:01:04 UTC
(In reply to comment #1)
> 
> I think that using "known" to mean "present in the ISSD" is unfortunate, since
> the whole idea behind the rules in 2.5.4 is that the processor may have
> knowledge of types that have not been explicitly imported, and may use this
> knowledge.

Agreed.

> So rather than using "known" more widely, I would prefer to use a more
> helpful term like "declared".

I'd be in favour of a better term than "known", but I don't think
"declared" is it, because I can easily imagine a type being declared
(in a schema somewhere) but not present in the ISSD. Instead, I think
the clearest abbreviation would be "in scope". That is:

    AT is in scope  (or, AT is an in-scope type)
      ==
    AT is a schema type found/defined in the 'in-scope schema definitions'

Comment 4 Jonathan Robie 2009-03-16 14:34:35 UTC
(In reply to comment #1)
> See also bug #5738, which discusses similar inconsistencies of terminology in
> section 2.2.5.
> 
> I think that using "known" to mean "present in the ISSD" is unfortunate, since
> the whole idea behind the rules in 2.5.4 is that the processor may have
> knowledge of types that have not been explicitly imported, and may use this
> knowledge. I suspect it is because of this difference between the defined
> meaning of "known" and its intuitive meaning that the word is not used more
> widely. So rather than using "known" more widely, I would prefer to use a more
> helpful term like "declared".


I'm confused.

If I understand this correctly, the ISSD can be augmented by the implementation, so the ISSD contains all statically known types. There are three components in the ISSD (http://www.w3.org/TR/xquery/#dt-issd), and each of these can be augmented according to Appendix C (http://www.w3.org/TR/xquery/#id-xq-static-context-components).

I think the terms "statically known" and "statically unknown" would be more precise than "known" and "unknown". I think the term "declared" would be misleading, because this includes statically known types that are known to the implementation but not explicitly declared.

So I think the clearest change would be to use "statically known" and "statically unknown", and to use these terms consistently as suggested in comment #1.

Jonathan
Comment 5 Michael Kay 2009-03-16 15:00:10 UTC
>the ISSD contains all statically known types

This may be your understanding, but it's not mine. My understanding is that import-schema (like import-module, and like xs:import) imports a namespace, and licenses the use of names in that namespace. It does not license (or make "in-scope") any names from any other namespace, for example a namespace that is imported transitively, or a namespace that is imported by a different query module.

Although we allow implementations to augment the ISSD, my understanding is that this is designed so that, like the rest of the query prolog, an API can provide equivalent facilities to the declarations in the prolog, for example an importSchemaNamespace() method. It wasn't intended to make the effect of import schema transitive.

I agree that there's nothing explicit in the spec to say that import-schema is expected to behave in the same way as import-module and xs:import, that is, non-transitively. The sooner we clear this up, the better. And we certainly shouldn't leave it so that import-schema is transitive in some implementations, and not in others.
Comment 6 Jonathan Robie 2009-03-16 15:53:23 UTC
(In reply to comment #5)
> >the ISSD contains all statically known types
> 
> This may be your understanding, but it's not mine. My understanding is that
> import-schema (like import-module, and like xs:import) imports a namespace, and
> licenses the use of names in that namespace. It does not license (or make
> "in-scope") any names from any other namespace, for example a namespace that is
> imported transitively, or a namespace that is imported by a different query
> module.

Are you talking about the same issue that this bug is about?

Any type that is statically known is in the ISSD. Whether schema import is transitive or non-transitive, statically known definitions are in the ISSD.

Jonathan
Comment 7 Michael Kay 2009-03-16 16:24:22 UTC
>Any type that is statically known is in the ISSD.

So you keep saying. But saying it doesn't make it true.
Comment 8 Michael Kay 2009-03-16 16:57:16 UTC
>Any type that is statically known is in the ISSD.

Here's some evidence that I'm not the only one who considers this statement to be wrong. Look at error err:XQST0036

<quote>
    It is a static error to import a module if the importing module's in-scope schema types do not include definitions for the schema type names that appear in the declarations of variables and functions (whether in an argument type or return type) that are present in the imported module and are referenced in the importing module.
</quote>

Because this is a static error, it is presumably using only information known statically. And it seems to distinguish two sets of types: the types in the ISSD of module M, and the types named in the declarations of variables and functions referenced by M. Now, if the ISSD of module M includes all statically known types, then this error cannot possibly occur. So whoever wrote this rule (it wasn't me) clearly believed that it was possible for statically known types to be absent from the ISSD of a particular module.
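
The distinction drawn above can be sketched in code. This is only an illustrative model of one reading of err:XQST0036, not anything taken from the spec: the Module and Declaration classes, the missing_types function, and the type and function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    """A function or variable declaration (hypothetical model)."""
    name: str
    type_names: frozenset  # schema type names used in its signature


@dataclass
class Module:
    """A query module with its own ISSD (hypothetical model)."""
    issd: set  # type names in this module's in-scope schema definitions
    declarations: dict = field(default_factory=dict)  # name -> Declaration


def missing_types(importer, imported, referenced):
    """Type names that would trigger XQST0036 under this reading.

    Only declarations of the imported module that the importer actually
    references are checked against the importer's ISSD.
    """
    missing = set()
    for name in referenced:
        decl = imported.declarations[name]
        missing |= decl.type_names - importer.issd
    return missing


# M2 declares one function that mentions s:polygon and one that does not.
m2 = Module(
    issd={"s:polygon"},
    declarations={
        "area": Declaration("area", frozenset({"s:polygon"})),
        "label": Declaration("label", frozenset()),
    },
)
# M1 imports M2 but has not imported the schema itself.
m1 = Module(issd=set())
```

In this model the processor plainly "knows" s:polygon (it is in M2's ISSD), yet missing_types(m1, m2, {"area"}) is non-empty because the name is absent from M1's ISSD. If every statically known type were in every ISSD, the result could never be non-empty.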
Comment 9 Jonathan Robie 2009-03-16 17:55:35 UTC
(In reply to comment #8)
> >Any type that is statically known is in the ISSD.
> 
> Here's some evidence that I'm not the only one who considers this statement to
> be wrong. Look at error err:XQST0036
> 
> <quote>
>     It is a static error to import a module if the importing module's in-scope
> schema types do not include definitions for the schema type names that appear
> in the declarations of variables and functions (whether in an argument type or
> return type) that are present in the imported module and are referenced in the
> importing module.
> </quote>
> 
> Because this is a static error, it is presumably using only information known
> statically. And it seems to distinguish two sets of types: the types in the
> ISSD of module M, and the types named in the declarations of variables and
> functions referenced by M. Now, if the ISSD of module M includes all statically
> known types, then this error cannot possibly occur. So whoever wrote this rule
> (it wasn't me) clearly believed that it was possible for statically known types
> to be absent from the ISSD of a particular module.

We may be talking past each other. When you raised the issue of whether schema import is transitive, I assumed you were asking whether the assembled schema is imported, after the schema processor performs any imports. Now I suspect you're really asking whether a schema import in one module affects the ISSD of other modules. Do I understand you correctly?

As I read this, there are two ISSDs, that of the importing module and that of the imported module. Any name used in the variables and functions must be in the ISSD of the imported module. Any name used in the declaration of variables and functions in the imported module that are actually used in the importing module must be defined in the importing module's ISSD.

Each module has its own static context:

http://www.w3.org/TR/xquery/#id-module-import says:

> Each module has its own static context. A module import imports only 
> functions and variable declarations; it does not import other objects from 
> the imported modules, such as in-scope schema definitions or statically known 
> namespaces. Module imports are not transitive; that is, importing a module 
> provides access only to function and variable declarations contained directly 
> in the imported module. For example, if module A imports module B, and module 
> B imports module C, module A does not have access to the functions and 
> variables declared in module C.
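
The quoted rule can be modelled in a few lines. This is a rough sketch under assumptions: the dictionary layout, the visible_declarations function, and the declaration names are invented for illustration; only the module names A, B, C come from the quoted example.

```python
# Hypothetical module table: each module lists what it imports and what
# functions/variables it declares directly.
modules = {
    "A": {"imports": ["B"], "declares": {"a:main"}},
    "B": {"imports": ["C"], "declares": {"b:helper"}},
    "C": {"imports": [], "declares": {"c:util"}},
}


def visible_declarations(name):
    """Declarations visible in a module: its own, plus those declared
    directly in the modules it imports. No transitive closure is taken."""
    module = modules[name]
    visible = set(module["declares"])
    for imported in module["imports"]:
        visible |= modules[imported]["declares"]
    return visible
```

Here visible_declarations("A") contains b:helper but not c:util, matching the A/B/C example in the quoted text.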

Comment 10 Michael Kay 2009-03-16 18:32:55 UTC
Let's suppose module M1 does "import schema namespace S1 at s1.xsd", and s1.xsd in turn does <xs:import namespace="S2" schemaLocation="s2.xsd"/>. 

Let's suppose s1.xsd contains the definition

<xs:simpleType name="heptagon">
  <xs:restriction base="s2:polygon">
    <xs:length value="7"/>
  </xs:restriction>
</xs:simpleType>

Is M1 allowed to use the expression "$x instance of s2:polygon"?

I think our current specification doesn't give a clear answer to this question. My preference is that the answer should be no, because that seems to be suggested by the analogy with "import module" and <xs:import> (Though not with <xsl:import>). However, it could go either way. Clearly if the answer is "no", then s2:polygon is a statically known type even though it's not an in-scope type: the processor knows about it even though the user isn't allowed to mention it by name in that particular module.
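
To make the two possible answers concrete, here is a small sketch. It assumes the assembled schema can be represented as a set of (namespace, local name) pairs; the issd_after_import function and its transitive flag are hypothetical and only contrast the two readings, they do not describe what any processor does.

```python
# Hypothetical assembled schema for the example: s1.xsd defines heptagon
# in namespace S1 and pulls in polygon from namespace S2 via xs:import.
assembled_schema = {("S1", "heptagon"), ("S2", "polygon")}


def issd_after_import(target_namespace, transitive):
    """Type names added to the ISSD by 'import schema' under each reading.

    transitive=True:  every component of the assembled schema is in scope.
    transitive=False: only components in the imported namespace are.
    """
    if transitive:
        return set(assembled_schema)
    return {c for c in assembled_schema if c[0] == target_namespace}
```

Under the non-transitive reading, ("S2", "polygon") is absent from the result for "S1", so "$x instance of s2:polygon" would refer to a name that is not in scope even though the processor had to load its definition.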

Now suppose that M1 does "import module namespace M2 at m2.xq". The rules discussed in the last couple of comments make it clear (a) that schema namespaces imported into M2 are not automatically imported into M1, and (b) that the names used in functions and variables of M2 that are referenced from M1 must be in the ISSD of both modules. This at least creates the possibility that the processor for module M1 statically knows about types imported into M2 even though it does not make them part of the ISSD of M1.

There may also be other types known statically to the processor. For example, it may have a cache of types lying around from previous compilations of unrelated queries. It may even have access to a database containing a vast selection of all known types. The fact that these types are known does not and should not make them available for reference in M1 - that is, the type names are not "in scope" unless their namespace is imported.

So, the above discussion suggests three reasons why there may be types known to the processor that are not in the ISSD.

Now, getting back to this bug, examine the text:

"An unknown schema type might be encountered, for example, if a source document has been validated using a schema that was not imported into the static context. In this case, an implementation is allowed (but is not required) to provide an implementation-dependent mechanism for determining whether the unknown schema type is derived from the expected schema type. For example, an implementation might maintain a data dictionary containing information about type hierarchies."

it seems on the face of it to be saying that the processor might have knowledge about unknown types. That's pretty contradictory, until you realize that it's using "known" to mean "types in the ISSD". If we rewrite it to say:

"The given schema type may be "in-scope" (defined in the in-scope schema definitions), or "out-of-scope" (not defined in the in-scope schema definitions). An out-of-scope schema type might be encountered, for example, if a source document has been validated using schema components in a namespace that was not imported into the static context. In this case, an implementation is allowed (but is not required) to provide an implementation-dependent mechanism for determining whether the not-in-scope schema type is derived from the expected schema type. For example, an implementation might maintain a data dictionary containing information about type hierarchies."

then it seems to me to make a lot more sense.

Comment 11 Jonathan Robie 2009-03-17 09:52:45 UTC
(In reply to comment #10)
> Let's suppose module M1 does "import schema namespace S1 at s1.xsd", and s1.xsd
> in turn does <xs:import namespace="S2" schemaLocation="s2.xsd"/>. 
> 
> Let's suppose s1.xsd contains the definition
> 
> <xs:simpleType name="heptagon">
>   <xs:restriction base="s2:polygon">
>     <xs:length value="7"/>
>   </xs:restriction>
> </xs:simpleType>
> 
> Is M1 allowed to use the expression "$x instance of s2:polygon"?
> 
> I think our current specification doesn't give a clear answer to this question.
> My preference is that the answer should be no, because that seems to be
> suggested by the analogy with "import module" and <xs:import> (Though not with
> <xsl:import>). However, it could go either way. Clearly if the answer is "no",
> then s2:polygon is a statically known type even though it's not an in-scope
> type: the processor knows about it even though the user isn't allowed to
> mention it by name in that particular module.


To me, the open question is whether the assembled schema is imported, after all schema import, include, redefine, etc. has been done. When we've discussed this before, my memory is that we intended the answer to be "yes". Unlike modules, XML Schema was not designed to do encapsulation.

But if the answer were "no", this would not be a statically known type in M1 as defined in the XQuery specification. It's not in the ISSD of the module. I don't think the specification is ambiguous on this.

> Now suppose that M1 does "import module namespace M2 at m2.xq". The rules
> discussed in the last couple of comments make it clear (a) that schema
> namespaces imported into M2 are not automatically imported into M1, and (b)
> that the names used in functions and variables of M2 that are referenced from
> M1 must be in the ISSD of both modules. This at least creates the possibility
> that the processor for module M1 statically knows about types imported into M2
> even though it does not make them part of the ISSD of M1.

The term "known types" for a given module does not refer to known types in a different module. If a processor takes advantage of these types when processing M1, it is using a static typing extension. If the same base schema is extended in different ways, though, this may be a dangerous static typing extension. One module may import one extended version of the schema, another may import a different extended version of the same base schema. If M1 uses only the base schema, it can use either module. This was a design criterion way back when we did this initially.

> There may also be other types known statically to the processor. For example,
> it may have a cache of types lying around from previous compilations of
> unrelated queries. It may even have access to a database containing a vast
> selection of all known types. The fact that these types are known does not and
> should not make them available for reference in M1 - that is, the type names
> are not "in scope" unless their namespace is imported.

Sure, these types may exist in an implementation, and the processor may have static typing extensions that we know nothing about. Our specification does not describe them. 

Or it might leverage these types for optimizations. It can do that without documenting anything. Our specification does not need to say anything about that.

> So, the above discussion suggests three reasons why there may be types known to
> the processor that are not in the ISSD.

But if they are not in the ISSD, they are not statically known types for the module, as understood in our specification. If there are static typing extensions in a processor, that processor should document them. It's not our job to anticipate what kinds of static typing extensions might exist.

> Now, getting back to this bug, examine the text:
> 
> "An unknown schema type might be encountered, for example, if a source document
> has been validated using a schema that was not imported into the static
> context. In this case, an implementation is allowed (but is not required) to
> provide an implementation-dependent mechanism for determining whether the
> unknown schema type is derived from the expected schema type. For example, an
> implementation might maintain a data dictionary containing information about
> type hierarchies."
> 
> it seems on the face of it to be saying that the processor might have knowledge
> about unknown types. 

I believe the original intent was to say that a processor may have dynamic knowledge of types that are not in the static context. It might discover types by examining the schema for a document that is being queried, for instance. This was known as the "winged horse" proposal, because it said that an implementation need not be completely circumscribed by the ISSD.

To me, what is confusing about this is the "For example" part, because it specifies information that would be available statically. If there were a data dictionary, I would prefer to implement this using static typing extensions rather than a winged horse. I would change this text to say:

"For example, an implementation might explore the schema that was used to validate a document to discover type hierarchies dynamically."

> That's pretty contradictory, until you realize that it's
> using "known" to mean "types in the ISSD". If we rewrite it to say:
> 
> "The given schema type may be "in-scope" (defined in the in-scope schema
> definitions), or "out-of-scope" (not defined in the in-scope schema
> definitions). An out-of-scope schema type might be encountered, for example, if
> a source document has been validated using schema components in a namespace
> that was not imported into the static context. In this case, an implementation
> is allowed (but is not required) to provide an implementation-dependent
> mechanism for determining whether the not-in-scope schema type is derived from
> the expected schema type. For example, an implementation might maintain a data
> dictionary containing information about type hierarchies."
> 
> then it seems to me to make a lot more sense.
 
I don't think this "in-scope", "out-of-scope" distinction adds clarity. Nothing in the specification of our language depends on this distinction. An implementation can choose to make this distinction in its documentation of static typing extensions, or its documentation of the implementation-defined winged horse extensions.

Jonathan
Comment 12 Michael Kay 2009-03-17 10:50:05 UTC
>To me, the open question is whether the assembled schema is imported, after all schema import, include, redefine, etc. has been done. 

I think it's best to avoid the verb "imported" unless it's very precisely defined. The "import schema" declaration certainly causes the instantiation of a schema (=set of schema components) that may contain components in many namespaces, and the question is whether the ISSD of the module should contain the names of all those components, or only those names that are in the namespace that is the target of the import. The analogy with xs:import and with "import module" suggests that it should only be the names in the target namespace.

>But if the answer were "no", this would not be a statically known type in M1 as defined in the XQuery specification. It's not in the ISSD of the module. I don't think the specification is ambiguous on this.

The specification doesn't actually use the phrase "statically known type", let alone define it. In 2.5.4 it uses "known type" as a synonym for "type present in the ISSD", and the confusion/ambiguity in this bug report arises because of the tacit acknowledgement in the text that the processor might also know about types that are not in the ISSD.

Meanwhile, as highlighted in bug #5738, section 2.2.5 imposes a constraint that if type T is in the ISSD, then all types derived by extension from T, if they appear in a run-time instance, must also be in the ISSD. This is too strong; the section should recognize, as 2.5.4 does, that the processor may know about things even though they are not in the ISSD. (It is also too strong because a processor that doesn't know about all types derived by extension can function perfectly well provided it avoids making inferences on the assumption that it does know about all such types.)
Comment 13 Jonathan Robie 2009-03-17 13:04:33 UTC
(In reply to comment #12)
> >To me, the open question is whether the assembled schema is imported, after all schema import, include, redefine, etc. has been done. 
> 
> I think it's best to avoid the verb "imported" unless it's very precisely
> defined. The "import schema" declaration certainly causes the instantiation of
> a schema (=set of schema components) that may contain components in many
> namespaces, and the question is whether the ISSD of the module should contain
> the names of all those components, or only those names that are in the
> namespace that is the target of the import. The analogy with xs:import and with
> "import module" suggests that it should only be the names in the target
> namespace.

Actually, this is now clear. We resolved bug 5738 (w3c-xml-query-wg/2009Feb/0098.html):

ACTION: Jonathan to add a non-normative note clarifying that the
assembled schema is imported. Goal: 2ed. Mark bug as resolved.

 
> >But if the answer were "no", this would not be a statically known type in M1 as defined in the XQuery specification. It's not in the ISSD of the module. I don't think the specification is ambiguous on this.
> 
> The specification doesn't actually use the phrase "statically known type", let
> alone define it. In 2.5.4 it uses "known type" as a synonym for "type present
> in the ISSD", and the confusion/ambiguity in this bug report arises because of
> the tacit acknowledgement in the text that the processor might also know about
> types that are not in the ISSD.

As I said in an earlier comment, I think that we should use the term "statically known types" rather than just "known types".

There are two well defined ways for a processor to make use of such types:

- static typing extensions for statically known types
- the "winged horse" for dynamically known types

How they do that is implementation defined. We don't have to define that.

> Meanwhile, as highlighted in bug #5738, section 2.2.5 imposes a constraint that
> if type T is in the ISSD, then all types derived by extension from T, if they
> appear in a run-time instance, must also be in the ISSD. This is too strong;
> the section should recognize, as 2.5.4 does, that the processor may know about
> things even though they are not in the ISSD. (It is also too strong because a
> processor that doesn't know about all types derived by extension can function
> perfectly well provided it avoids making inferences on the assumption that it
> does know about all such types.)

I think you are saying that your implementation wants a static typing extension to allow extended types that are not imported into a given module, because you have a data dictionary of some sort that you know the document conforms to. I think you can do that by documenting it as a static typing extension. I don't think we have to change the spec to make that possible.

In your static typing extension, there are statically known types that are not in the ISSD; they are used for static inferences not defined in our specification.
Comment 14 Michael Dyck 2009-04-19 03:28:58 UTC
(In reply to comment #11)
>
> > "The given schema type may be "in-scope" (defined in the in-scope
> > schema definitions), or "out-of-scope" (not defined in the in-scope
> > schema definitions).
> 
> I don't think this "in-scope", "out-of-scope" distinction adds clarity.
> Nothing in the specification of our language depends on this distinction.

I don't understand how you can say this when several errors (XPST0008,
XQST0034, XQST0036, XPST0051, XQDY0084) are raised depending on whether
or not a particular name is defined in the in-scope schema definitions.
Comment 15 Jonathan Robie 2009-09-15 02:00:21 UTC
I propose that we solve this as follows:

1. In 2.5.4 SequenceType Matching, adopt the terms "in-scope" and "not in-scope", similar to what Mike Kay proposed in comment #10:

Old:

  The given schema type may be "known" (defined in the in-scope schema
  definitions), or "unknown" (not defined in the in-scope schema
  definitions). An unknown schema type might be encountered, for
  example, if a source document has been validated using a schema that
  was not imported into the static context. In this case, an
  implementation is allowed (but is not required) to provide an
  implementation-dependent mechanism for determining whether the
  unknown schema type is derived from the expected schema type. 

New:

  The given schema type may be "in-scope" (defined in the in-scope
  schema definitions), or "not in-scope" (not defined in the in-scope
  schema definitions).  An schema type that is not in-scope might be
  encountered, for example, if a source document has been validated
  using schema components in a namespace that was not imported into
  the static context. In this case, an implementation is allowed (but
  is not required) to provide an implementation-dependent mechanism
  for determining whether the not-in-scope schema type is derived from
  the expected schema type.
  

2. In 2.5.4 SequenceType Matching, change the example used for the implementation-dependent mechanism, using one that involves dynamic discovery:

Old:

  For example, an implementation might maintain a data dictionary
  containing information about type hierarchies.

New:

  For example, an implementation might explore the schema that was
  used to validate a document to discover type hierarchies
  dynamically.


3. The existing definition of schema import already says that the element declarations, attribute declarations, and type definitions are imported into the ISSD, and does not give any exceptions. Our resolution of http://www.w3.org/Bugs/Public/show_bug.cgi?id=5738 asked me to add a note clarifying that we intend this to be a transitive schema import:

Existing:


  [Definition: A schema import imports the element declarations,
  attribute declarations, and type definitions from a schema into the
  in-scope schema definitions. For each user-defined atomic type in
  the schema, schema import also adds a corresponding constructor
  function. ]

Add:

  NOTE: Unlike XML Schema's import feature, an XQuery schema import
  imports all the schema components that are in or referenced by the
  identified schema document, irrespective of their namespace.
Comment 16 Michael Dyck 2009-09-27 06:30:29 UTC
>   An schema type

"An" -> "A"

>   that is not in-scope might be encountered, for example, if
>   a source document has been validated using schema components
>   in a namespace that was not imported into the static context.

The spec has very little support for the phrasing "import a namespace". Moreover, its only mention of "schema components" is in appendix F. So I'm not clear on the benefit of changing
    a schema that was not imported
to
    schema components in a namespace that was not imported


> 3. The existing definition of schema import already says that the element
> declarations, attribute declarations, and type definitions are imported into
> the ISSD, and does not give any exceptions. Our resolution of
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=5738 asked me to add a note
> clarifying that we intend this to be a transitive schema import:
> 
> ...
> Add:
> 
>   NOTE: Unlike XML Schema's import feature, an XQuery schema import
>   imports all the schema components that are in or referenced by the
>   identified schema document, irrespective of their namespace.

I believe this is independent of the original issue, and it seems to me that it belongs in Bug 5738. But since it's here...

(a) I think the use of the phrase "the schema document" is incorrect. A schema import identifies a schema, not a schema document. A schema might come from multiple schema documents, or might not come from any schema documents at all.

(b) What about components that are referenced by components that are referenced by the identified schema? To me, the given wording doesn't indicate transitive closure.

-----------

The proposed changes do not actually address the original issue, namely that the definition of 'derives-from' fails to use defined terms where I believe it should. In fact, as given, change #1 makes things even worse, because the uses of "known" and "unknown" in 'derives-from' would now be undefined. I'm hoping that was just an oversight, and the proposal should have included the following collateral changes:

In the definition of 'derives-from', change:
   "a known type"
   and
   "a schema type found in the in-scope schema definitions"
   to
   "an in-scope type"

and change:
   "an unknown type"
   and
   "a schema type not found in the in-scope schema definitions"
   to
   "a not-in-scope type"
Comment 17 Michael Kay 2009-09-27 20:38:40 UTC
I don't think the proposed Note sufficiently clarifies what "import schema" does. In particular, it leaves the text still completely confused about the distinction between a schema document (which has a location URI and a target namespace) and a schema (which has neither).

I have opened a new bug #7737 to address this specific issue.
Comment 18 Michael Kay 2009-09-27 21:59:58 UTC
I believe a better approach to solving this problem would be along the following lines:

<new>
Some of the rules for SequenceType matching require determining whether a given schema type encountered as a type annotation in an instance document is the same as or derived from an expected schema type. 

This determination is done by reference to a schema S (that is, a set of schema components). This schema S is the union of:

(1) the in-scope schema definitions in the static context of the query module

(2) the schema used for validating the instance document

(3) potentially, further schema components that have been made available to the processor in an implementation-dependent way.

A type error [err:XPTY0004] *may* [or *must*?] be raised if this union does not constitute a valid schema (for example, if there are conflicts between types present in the static context and types used dynamically for validating instances.)
</new>

We can then reduce the definition of derives-from to:

<new>
# derives-from(AT, ET) raises a type error [err:XPTY0004] if either AT or ET is not present in S
# derives-from(AT, ET) returns true if AT and ET are both present in S, and if one or more of the following three conditions is true:

   1. AT is the same type as ET 
   2. AT is derived by restriction or extension from ET
   3. S contains some schema type IT such that derives-from(IT, ET) and derives-from(AT, IT) are true.
# Otherwise, derives-from(AT, ET) returns false
</new>

 
Comment 19 Jonathan Robie 2009-09-28 12:56:11 UTC
(In reply to comment #18)
> I believe a better approach to solving this problem would be along the
> following lines:

Yes, I think this is mostly better.


> This determination is done by reference to a schema S (that is, a set of schema
> components). This schema S is the union of:
> 
> (1) the in-scope schema definitions in the static context of the query module
> 
> (2) the schema used for validating the instance document
> 
> (3) potentially, further schema components that have been made available to the
> processor in an implementation-dependent way.


In the past, some implementers have objected to requiring (2), because they wanted to be able to determine all type information statically. The main motivation for the "winged horse" proposal was to at least allow an implementation to use (2).

We have much more implementation experience now. 


> A type error [err:XPTY0004] *may* [or *must*?] be raised if this union does not
> constitute a valid schema (for example, if there are conflicts between types
> present in the static context and types used dynamically for validating
> instances.)

We currently cover this under consistency constraints:

http://www.w3.org/TR/xquery/#id-consistency-constraints

The consistency constraints section doesn't give guidelines for what to do in the face of an inconsistency; it simply says that the language is undefined in the face of inconsistencies.

I'm not sure which of these three approaches is best:

* must
* may
* defer to the status quo in the consistency constraints section

> We can then reduce the definition of derives-from to:
> 
> <new>
> # derives-from(AT, ET) raises a type error [err:XPTY0004] if either AT or ET is
> not present in S
> # derives-from(AT, ET) returns true if AT and ET are both present in S, and if
> one or more of the following three conditions is true:
> 
>    1. AT is the same type as ET 
>    2. AT is derived by restriction or extension from ET
>    3. S contains some schema type IT such that derives-from(IT, ET) and
> derives-from(AT, IT) are true.
> # Otherwise, derives-from(AT, ET) returns false
> </new>

Nice.
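
The reduced definition quoted above can be sketched in Python as follows. This is only an illustration under stated assumptions: the schema S is modelled as a dict from a type name to the name of the single type it is directly derived from (by restriction or extension), and the type names and the `XPTY0004` exception class are hypothetical stand-ins.

```python
class XPTY0004(Exception):
    """Stand-in for the type error err:XPTY0004."""

# Hypothetical schema S: type name -> name of the type it is directly
# derived from (by restriction or extension), or None for a root type.
S = {
    "xs:anyType": None,
    "my:base": "xs:anyType",
    "my:middle": "my:base",
    "my:leaf": "my:middle",
}

def derives_from(schema, at, et):
    """Sketch of derives-from(AT, ET) relative to a schema S."""
    # First rule: both types must be present in S.
    if at not in schema or et not in schema:
        raise XPTY0004("type not present in S")
    # Walking the chain of base types covers all three conditions:
    # same type (1), direct derivation (2), derivation via some IT (3).
    current = at
    seen = set()
    while current is not None and current not in seen:
        if current == et:
            return True
        seen.add(current)
        current = schema.get(current)
    # Otherwise the result is false.
    return False
```

With this model, `derives_from(S, "my:leaf", "my:base")` is true through the intermediate type "my:middle", while a type name absent from S raises the error rather than returning false.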
Comment 20 Michael Kay 2009-09-29 16:03:35 UTC
A further observation on comment #18: the current text permits statically-unknown run-time types that are derived by restriction from statically-known types, but it does not allow "statically-unknown" run-time types that are derived by extension from statically-known types. This is because static type checking needs to know statically about all types derived by extension from types used in the query. This distinction is not retained in the new text.
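
One way to picture the distinction, as a rough Python sketch: a statically-unknown type is accepted only when its derivation steps are restrictions. This is one reading of the observation above, not text from the specification, and the type names, the `TYPES` table and the `STATICALLY_KNOWN` set are all hypothetical.

```python
# Hypothetical model: type name -> (base type name, derivation method).
TYPES = {
    "my:base": (None, None),
    "my:restricted": ("my:base", "restriction"),
    "my:extended": ("my:base", "extension"),
}

# Stand-in for the types the query knows about statically.
STATICALLY_KNOWN = {"my:base"}

def acceptable_under_current_text(at, et, types, statically_known):
    """True if `at` derives from `et` and every derivation step that starts
    at a statically-unknown type is a derivation by restriction."""
    current = at
    seen = set()
    while current is not None and current not in seen:
        if current == et:
            return True
        seen.add(current)
        base, method = types.get(current, (None, None))
        # A statically-unknown type may only be reached by restriction.
        if current not in statically_known and method != "restriction":
            return False
        current = base
    return False
```

Here "my:restricted" is accepted as a substitute for "my:base" even though it is not statically known, while "my:extended" is not. The definition proposed in comment #18 would accept both.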
Comment 21 Jonathan Robie 2009-10-06 21:21:13 UTC
The Working Group adopted a modified form of this in today's telcon. I'll mark the modifications inline.

(In reply to comment #18)
> I believe a better approach to solving this problem would be along the
> following lines:
> 
> <new>
> Some of the rules for SequenceType matching require determining whether a given
> schema type encountered as a type annotation in an instance document is the
> same as or derived from an expected schema type. 
> 
> This determination is done by reference to a schema S (that is, a set of schema
> components). This schema S is the union of:
> 
> (1) the in-scope schema definitions in the static context of the query module
> 
> (2) the schema used for validating the instance document.

Whether or not (2) is included in S is implementation-defined.

> (3) potentially, further schema components that have been made available to the
> processor in an implementation-dependent way.

(3) is also implementation-defined (not implementation-dependent).

> A type error [err:XPTY0004] *may* [or *must*?] be raised if this union does not
> constitute a valid schema (for example, if there are conflicts between types
> present in the static context and types used dynamically for validating
> instances.)
> </new>

This is MAY.
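
As a rough illustration of the adopted shape of S, here is a minimal Python sketch. It assumes each source of schema components can be modelled as a dict from component name to definition; the function name, its parameters and the `XPTY0004` exception class are hypothetical. Only name conflicts are checked, which is just one of the ways a union could fail to be a valid schema.

```python
class XPTY0004(Exception):
    """Stand-in for the type error err:XPTY0004."""

def build_schema_s(in_scope, validation_schema=None, extra_components=None,
                   raise_on_conflict=True):
    """Union of (1) in-scope schema definitions, (2) the schema used for
    validation, and (3) further components. Passing None for (2) or (3)
    models an implementation that chooses not to include them."""
    s = dict(in_scope)
    for source in (validation_schema, extra_components):
        if source is None:
            continue
        for name, definition in source.items():
            if name in s and s[name] != definition:
                # The error is MAY, so raising is optional in this sketch.
                if raise_on_conflict:
                    raise XPTY0004("conflicting definitions for " + name)
                # Arbitrary choice for illustration: keep the earlier definition.
                continue
            s[name] = definition
    return s
```

What a processor does when it declines to raise the error is not specified; keeping the earlier definition is purely an assumption of this sketch.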

> We can then reduce the definition of derives-from to:
> 
> <new>
> # derives-from(AT, ET) raises a type error [err:XPTY0004] if either AT or ET is
> not present in S
> # derives-from(AT, ET) returns true if AT and ET are both present in S, and if
> one or more of the following three conditions is true:
> 
>    1. AT is the same type as ET 
>    2. AT is derived by restriction or extension from ET

We discussed whether "or extension" should be implementation-defined or required of all implementations, and opinion was more or less evenly divided. 

Some argued that implementations need the freedom to use dynamic schemas, but also need the freedom to optimize. Others argued that this is too confusing for the user: an implementation should either use the schemas from (2) and (3) or not, and using them only partially is not a clean model. We agreed to put a note in the document asking for feedback on this point.

>    3. S contains some schema type IT such that derives-from(IT, ET) and
> derives-from(AT, IT) are true.
> # Otherwise, derives-from(AT, ET) returns false
> </new>