This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 4908 - 'The schema corresponding to a schema document' and QName resolution
Status: RESOLVED LATER
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.0/1.1 both
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
Reported: 2007-08-02 02:27 UTC by C. M. Sperberg-McQueen
Modified: 2008-09-02 11:26 UTC

Description C. M. Sperberg-McQueen 2007-08-02 02:27:54 UTC
The Schema Representation Constraint: QName resolution (schema
document) says in clause 1 that for a given QName in a schema document
to resolve to a component, it must be true that:

    That component is a member of the value of the appropriate
    property of the schema which corresponds to the schema document
    within which the ·QName· appears ...

Some problems exhibited by this passage may need attention.

First, in this context, "the schema which corresponds to the schema
document" clearly denotes the schema within which the reference is to
be resolved; typically that will be the schema currently being
constructed.  And typically (or necessarily), that schema will include
components for all the source declarations of the schema document in
which the QName occurs.

But the use of the article "the" suggests wrongly that there is just
one such schema, whereas in fact there are an infinite number of such
schemas.

So the word "the" in the phrase "the schema which ..." should probably
be removed.

Second, the notion of "the schema which corresponds to the schema
document" is also used pervasively in section 4 of Structures, where
however it appears to mean not "the current schema" but specifically
"the schema containing the components declared by the source
declarations in this schema document, plus those declared in documents
included or redefined ... from this one".  (Alternatively, in section
4 the phrase does mean "the current schema", and all of the schema
representation constraints that say things like

    The XSDL schema corresponding to <schema> contains not only the
    components corresponding to its definition and declaration
    [children], but also all the components of all the XSDL schemas
    corresponding to any <include>d schema documents.

are all vacuous, because they amount to saying that the schema being
constructed contains components identical to those in the schema being
constructed.)

In the QName resolution rule, however, "the schema which corresponds
to the schema document" cannot have this restricted sense.  It is not
(I think) required that the QName denote a component declared in the
same schema document, or in some schema documents included (or
redefined or imported) by the current schema document (or by any other
schema document in the transitive closure of the
schema-document-reference relation).  That would make cross-document
component reference much harder than it should be.  (Alternatively, I
have misunderstood: QNames ARE supposed to resolve only if declared in
the same schema document or an included/redefined/imported schema
document, and cross-document component reference is only allowed in
those circumstances.)

Unless I have misunderstood our spec badly, the phrase is correct (if
a little underspecified) in the context of section 4, and out of place
in the QName resolution rule.

Third, even in the context of schema composition, our spec does not
uniquely determine a single schema corresponding to a given schema
document: the schema is a function of (a) a given set of schema
documents that the process may start with, (b) the policy adopted for
following inter-schema-document references, (c) the availability, to
the validation process, of material at the locations where material is
sought, at the moment when it is sought (network outages and
restricted access policies might both affect this).  There may be
other factors.
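
To make the dependence on factors (a) to (c) concrete, here is a minimal
sketch in Python.  Everything in it is an assumption made for
illustration: the document names, the component names, and the
build_schema function are hypothetical, and a schema is reduced to a
set of component names.  It is not how any real processor is
implemented.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical schema documents: the component names each one declares and
# the locations of the schema documents it refers to.
DOCS: Dict[str, dict] = {
    "main.xsd": {"declares": {"order"}, "refs": ["types.xsd"]},
    "types.xsd": {"declares": {"Money"}, "refs": []},
}


def build_schema(start: Iterable[str], follow_refs: bool,
                 reachable: Set[str]) -> FrozenSet[str]:
    """Collect the component names from every document the process reads.

    start       -- (a) the documents the process begins with
    follow_refs -- (b) the policy for following inter-document references
    reachable   -- (c) the locations that can be retrieved right now
    """
    components: Set[str] = set()
    seen: Set[str] = set()
    pending = list(start)
    while pending:
        loc = pending.pop()
        if loc in seen or loc not in reachable:
            continue  # already read, or not retrievable at this moment
        seen.add(loc)
        components |= DOCS[loc]["declares"]
        if follow_refs:
            pending.extend(DOCS[loc]["refs"])
    return frozenset(components)


everything = set(DOCS)
full = build_schema(["main.xsd"], follow_refs=True, reachable=everything)
no_follow = build_schema(["main.xsd"], follow_refs=False, reachable=everything)
outage = build_schema(["main.xsd"], follow_refs=True, reachable={"main.xsd"})
```

In this toy model the same starting document main.xsd yields one
schema when references are followed and everything is reachable, and a
smaller one when the policy or the network differs, so "the schema
corresponding to main.xsd" does not pick out a single schema.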

So again the word "the" entails guarantees on which XSDL is not
prepared to make good.  Either we need to specify conditions or
simplifying assumptions which make the result of schema construction
unique, or else we need to avoid making the claim (implicit in
"the") that the result is unique or deterministic.

Fourth, I'll just note in passing that the function of this rule is to
specify what is denoted by a QName in a schema document, interpreted
as referring to a component in a given schema.  I have trouble
understanding how it could be understood to be a constraint, or what
it is intended to constrain.  (The QName? the schema? the schema
document? Huh?  It might be taken to constrain the processor, but
nothing in the text mentions the processor or expresses a rule which a
processor could follow or not follow.)  Like many of the paragraphs
labeled "constraint" in XSDL, this one specifies the meaning of a
construct and not a constraint.
Comment 1 C. M. Sperberg-McQueen 2007-08-03 18:29:33 UTC
On its telcon today, the Working Group discussed this and other
recently opened issues in the issues database and concluded (not
without some pangs of regret) that for scheduling reasons it is not
feasible for us to resolve this issue, or any of the others in the
group, before we go to Last Call.

On whether the issue / proposal discussed here is worth pursuing or
not, the WG has taken no formal decision. Accordingly I am closing
this issue with a disposition of LATER, not WONTFIX.  That means the
Working Group believes that the issue may be resolved in some future
version of the spec, and encourages whatever Working Groups are
responsible for future versions of the spec to consider this issue
at an appropriate time.  (If this bug relates both to 1.0 and 1.1,
this resolution applies only to 1.1 and leaves undetermined how to
handle it vis-a-vis 1.0.)
Comment 2 Michael Kay 2008-03-19 11:30:16 UTC
This bugzilla entry seems an appropriate place to record a report from a user today in which different schema processors were producing different results. The scenario is that the instance document references A, which imports B and C; B redefines D, while C includes D without redefinition.

Xerces and MSXML both reject this on the basis that the resulting schema contains duplicate components with the same name (presumably the redefined and unredefined versions). Saxon-SA accepts it on the theory that the redefinition is pervasive: all references to components in D throughout the whole schema are treated as references to the post-redefinition versions of the components. However, it must be said that there is nothing in "Schema Representation Constraint: Redefinition Constraints and Semantics" that appears to back up this interpretation of the requirement for pervasiveness, unless one applies a very imaginative interpretation of the phrase "the schema corresponding to [schema document] SII'".
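
The two readings can be contrasted with a small Python sketch. It is a toy model under stated assumptions, not a description of how Xerces, MSXML or Saxon-SA work internally: a schema is reduced to a mapping from component name to a definition string, D is assumed to define a single type T, and the names strict_union and pervasive are hypothetical.

```python
from typing import Dict

Schema = Dict[str, str]  # component name -> definition (toy model)

# Assumption: D defines one type T, and B redefines it.
D_COMPONENTS: Schema = {"T": "T as defined in D"}
B_REDEFINITIONS: Schema = {"T": "T as redefined by B"}


def schema_via_b() -> Schema:
    """Schema corresponding to B: D's components with B's redefinitions."""
    return {**D_COMPONENTS, **B_REDEFINITIONS}


def schema_via_c() -> Schema:
    """Schema corresponding to C: D's components, included unchanged."""
    return dict(D_COMPONENTS)


def strict_union(*schemas: Schema) -> Schema:
    """Union of per-document schemas; two different definitions under
    one name count as an error."""
    merged: Schema = {}
    for schema in schemas:
        for name, definition in schema.items():
            if merged.setdefault(name, definition) != definition:
                raise ValueError(f"duplicate component {name!r}")
    return merged


def pervasive(originals: Schema, redefinitions: Schema) -> Schema:
    """Pervasive reading: the redefined version replaces the original
    everywhere in the final schema."""
    return {**originals, **redefinitions}


# Reading 1: A's schema is the union of the schemas reached via B and C.
try:
    strict_union(schema_via_b(), schema_via_c())
    union_outcome = "accepted"
except ValueError as err:
    union_outcome = f"rejected: {err}"

# Reading 2: the redefinition applies throughout the whole schema.
pervasive_schema = pervasive(D_COMPONENTS, B_REDEFINITIONS)
```

Under the first reading the model rejects the schema because T arrives in two versions; under the second only the redefined T survives.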

Which is why I've added it to this bugzilla entry. I simply don't believe that this phrase is compatible with the goal of late binding, let alone the goal of pervasive redefinition. It's based on an incorrect belief that import and include work by converting a schema document to a set of schema components and then adding those components to the final schema. It can't work that way, because the included/imported document can be incomplete, by virtue of references to components that aren't present in the included document or its recursively included documents.
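
A minimal sketch of the incompleteness problem, in Python. The document names, the component names, and the helper functions are all hypothetical, and include is modelled only as "read this other document too".

```python
from typing import List, Optional, Set

# Hypothetical documents. inner.xsd declares an element whose type, Money,
# is declared only in the including document outer.xsd.
DOCS = {
    "outer.xsd": {"types": {"Money"}, "element_types": {},
                  "includes": ["inner.xsd"]},
    "inner.xsd": {"types": set(), "element_types": {"price": "Money"},
                  "includes": []},
}


def closure(loc: str, seen: Optional[List[str]] = None) -> List[str]:
    """The document at loc plus everything it includes, recursively."""
    seen = [] if seen is None else seen
    if loc not in seen:
        seen.append(loc)
        for included in DOCS[loc]["includes"]:
            closure(included, seen)
    return seen


def unresolved(locs: List[str]) -> Set[str]:
    """Type names referenced by the given documents but declared in none."""
    declared = set().union(*(DOCS[loc]["types"] for loc in locs))
    referenced = set().union(
        *(DOCS[loc]["element_types"].values() for loc in locs))
    return referenced - declared


# Convert-then-add: inner.xsd is turned into components on its own, so its
# reference to Money has nothing to bind to.
eager = unresolved(closure("inner.xsd"))

# Late binding: references are resolved once the whole schema is assembled.
late = unresolved(closure("outer.xsd"))
```

In this model eager contains Money and late is empty: the included document cannot be converted to a complete set of components in isolation, but its references resolve once they are bound against the assembled schema.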
Comment 3 Michael Kay 2008-09-02 11:26:15 UTC
Further to comment #2, another situation has arisen in the field: this time it appears that Xerces and XSV accept the schema, while Saxon rejects it.

To simplify the situation, schema document r.xsd contains two xs:redefine elements: first redefining a.xsd, then b.xsd. Schema document a.xsd includes b.xsd. The second redefine includes a redefinition of a type T that is defined in b.xsd.

Saxon rejects this on the following basis. The first redefine constructs a schema SII' corresponding to a.xsd, which includes the original type T, and then replaces some of these components with redefined components. The second redefine constructs a schema SII' corresponding to b.xsd, replacing the original type T with a different type T. The schema corresponding to r.xsd is the union of these two schemas, and this union is not a valid schema because it contains two different types named T.
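
The reasoning above can be traced in a short Python sketch. It is a toy model, not Saxon's implementation: a schema is a mapping from type name to a definition string, the type U and the helper names schema_for and redefine are hypothetical, and U stands in for whatever the first redefine actually replaces.

```python
from typing import Dict, List

Schema = Dict[str, str]  # type name -> definition (toy model)

# Assumption: a.xsd defines a type U and includes b.xsd; b.xsd defines T.
DOCS = {
    "a.xsd": {"types": {"U": "U from a.xsd"}, "includes": ["b.xsd"]},
    "b.xsd": {"types": {"T": "T from b.xsd"}, "includes": []},
}


def schema_for(loc: str) -> Schema:
    """Components of the document at loc plus those of included documents."""
    doc = DOCS[loc]
    components: Schema = dict(doc["types"])
    for included in doc["includes"]:
        components.update(schema_for(included))
    return components


def redefine(loc: str, replacements: Schema) -> Schema:
    """SII': the schema for loc with the named components replaced."""
    return {**schema_for(loc), **replacements}


# First redefine in r.xsd: targets a.xsd, which carries the original T along.
first = redefine("a.xsd", {"U": "U redefined in r.xsd"})

# Second redefine in r.xsd: targets b.xsd and replaces T.
second = redefine("b.xsd", {"T": "T redefined in r.xsd"})

# Names that would receive two different definitions in the union.
conflicts: List[str] = sorted(
    name for name in first.keys() & second.keys()
    if first[name] != second[name]
)
```

Here first still holds the original T while second holds the redefined one, so conflicts comes out as ["T"], which is the duplicate that the union-based reading reports.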

I think that Saxon is strictly following the rules as written in the spec, though it clearly does not have the desired outcome that redefinition should be "pervasive". However, to make this work as desired, without also allowing things that are clearly wrong (such as both redefines containing different redefinitions of T), it is necessary to invent some rules which bear no resemblance to anything written in the specification.