This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12185 - Conditional Type Assignment and substitutability
Summary: Conditional Type Assignment and substitutability
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: decided
Depends on:
Blocks:
 
Reported: 2011-02-25 13:33 UTC by Michael Kay
Modified: 2011-07-28 19:13 UTC

Description Michael Kay 2011-02-25 13:33:41 UTC
We have an "urgent feedback request" open on the rules concerning Conditional Type Assignment and substitutability. This bug report responds to this request for feedback.

First, note that there are no rules that constrain the use of elements with conditional types in substitution groups. The following (test cta0041) is entirely legal: <appendix> is substitutable for <chap> despite the fact that in <chap type="dateTime"> the value is guaranteed to contain a timezone, while in <appendix type="dateTime"> there is no such guarantee:

         <xs:complexType name="dateWithTypeAttribute">
            <xs:simpleContent>
                <xs:extension base="xs:date">
                    <xs:attribute name="type" type="xs:NCName"/>
                </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
                 
         <xs:complexType name="dateTimeStampWithTypeAttribute">
            <xs:simpleContent>
                <xs:extension base="xs:dateTimeStamp">
                    <xs:attribute name="type" type="xs:NCName"/>
                </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
         
         <xs:complexType name="dateTimeWithTypeAttribute">
            <xs:simpleContent>
                <xs:extension base="xs:dateTime">
                    <xs:attribute name="type" type="xs:NCName"/>
                </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
  
  <xs:element name="chap">
    <xs:alternative test="@type='date'" type="dateWithTypeAttribute"/>
    <xs:alternative test="@type='dateTime'" type="dateTimeStampWithTypeAttribute"/>
  </xs:element>
  
  <xs:element name="appendix" substitutionGroup="chap">
    <!-- invalid restriction of the base type (not detected until validation time) -->
    <xs:alternative test="@type='date'" type="dateWithTypeAttribute"/>
    <xs:alternative test="@type='dateTime'" type="dateTimeWithTypeAttribute"/>
  </xs:element> 

So, given a complex type definition B that allows <chap> within its content, it is perfectly acceptable for an <appendix> to appear as a substitute for <chap>.
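To make the substitution concrete, here is a sketch of an instance that the declarations above would accept. The element name 'book' is an assumption (it stands for any element whose type is B, with a content model of zero or more chap elements), and no target namespace is assumed.

```xml
<!-- Hypothetical instance: 'book' is an assumed element of type B. -->
<book>
  <!-- @type='dateTime' selects dateTimeStampWithTypeAttribute: timezone required -->
  <chap type="dateTime">2011-02-25T13:33:41Z</chap>
  <!-- @type='dateTime' selects dateTimeWithTypeAttribute: timezone optional -->
  <appendix type="dateTime">2011-02-25T13:33:41</appendix>
</book>
```

Both children carry type="dateTime", but only the chap is guaranteed to have a timezone.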

However, if I define a complex type R that restricts B, then within the content of an element whose declared type is R, it seems I can no longer substitute an arbitrary <appendix> for a <chap>: I am liable to fall foul of the "Conditional Type Substitutable" rules.

Firstly, it seems illogical to impose a restriction on the second case but none on the first. We should either require CTA type tables to be consistent in both cases, or in neither.

Secondly, the current rule is very tough to implement without imposing a significant overhead on run-time validation performance; and it's very unsatisfactory that problems with the schema should be detected during instance validation.

I would argue in favour of removing the restriction entirely. The impact of this would be that processors can no longer draw inferences by examining the type table: when the type table says that an event with location="GB" will have a time-with-timezone, they cannot take this as an invariant that applies to all subclasses. If you want to make this an invariant, you need to make it an assertion. Making this change would remove a lot of complexity with (in my view) very little loss of useful functionality.
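As an illustration of expressing the invariant as an assertion, the sketch below attaches an xs:assert to the enclosing type. The content model of B shown here is an assumption made for the example; the assertion requires every child with type="dateTime" to carry a timezone, whichever element declaration (chap or a substitute) and whichever type table governs it.

```xml
<!-- Sketch only: B's content model is assumed for illustration. -->
<xs:complexType name="B">
  <xs:sequence>
    <xs:element ref="chap" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <!-- Holds only if each child with type="dateTime" has an explicit timezone -->
  <xs:assert test="every $c in *[@type = 'dateTime']
                   satisfies exists(timezone-from-dateTime(xs:dateTime($c)))"/>
</xs:complexType>
```

With this in place, an appendix with type="dateTime" and no timezone would make its parent invalid, independently of the type table.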

An alternative which I could also accept would be to insist that when subtyping, the type tables must be the same. This brings the rule into line with Element Declarations Consistent, where we appeal to "sameness" of type tables. This is a bigger loss of functionality, but in practice I find it hard to see many people being affected by it.
Comment 1 David Ezell 2011-02-28 13:50:17 UTC
From the telcon 2011-02-25:
The WG spent quite a bit of time attempting to come to a resolution.  At the end of the telcon MSM accepted an action to attempt to create test cases for xs:override.
Comment 2 David Ezell 2011-04-01 16:10:02 UTC
Note:  comment #1 is on the wrong bug.  Ignore.
Comment 3 David Ezell 2011-04-01 16:33:52 UTC
RESOLUTION: produce a wording proposal based on the penultimate paragraph of the initial bug report.
Comment 4 C. M. Sperberg-McQueen 2011-04-22 00:10:06 UTC
To make my comments a little easier to follow, I'll break them up
into smaller units.  First, on the premise.  The bug description
says

    So, given a complex type definition B that allows <chap> within
    its content, it is perfectly acceptable for an <appendix> to
    appear as a substitute for <chap>.

    However, if I define a complex type R that restricts B, then
    within the content of an element whose declared type is R, it
    seems I can no longer substitute an arbitrary <appendix> for a
    <chap>: I am liable to fall foul of the "Conditional Type
    Substitutable" rules.

I don't think so.  

You don't say so explicitly, but I understand you to be thinking of
a pair of complex types B and R each of which has a content model
including the top-level 'chap'.  But in that case, the Conditional
Type Substitutable rules will have no objection.

[Details, for those who feel obligated to check my work: In type B,
the type table for any element named 'chap' (T_B in the validation
rule) will be the type table specified in the top-level declaration
for 'chap'; in type R, the responsible type table (T_T) will be the
same type table.  Whatever 'chap' element is presented to that type
table will get the same type both in the B context and the R
context, so the S_T and S_B mentioned in the validation rule will be
the same type definition.  Since by hypothesis R is a restriction of
B, clause 2.2 fails on its first condition, so clause 2.1 must be
satisfied; it requires that S_T be validly substitutable as a
restriction for S_B.  After a short trip through the definitions for
"validly substitutable as a restriction" and "validly substitutable"
subject to a set of blocking keywords, we fetch up eventually at
Schema Component Constraint: Type Derivation OK (Complex) in section
3.4.6.5, which tells us (if we are patient and determined) that any
type definition counts as being validly derived from itself.
Popping the stack of questions and definitions, this turns out to
mean that any type is validly substitutable for itself, and in
particular S_T is validly substitutable for S_B, given that they are
the same type.

The same argument applies to any element named 'appendix':  since B and R
use the same top-level element declaration for 'appendix', any 'appendix'
element would be assigned the same governing type definition in R as
in B, and there is no violation of restriction in that.]
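A minimal sketch of the situation assumed in this argument: B and R both refer to the same top-level declaration of 'chap', so the same type table is consulted in both contexts. The occurrence bounds are assumptions chosen only to make R a visible restriction of B.

```xml
<!-- Both content models reference the one top-level 'chap' declaration. -->
<xs:complexType name="B">
  <xs:sequence>
    <xs:element ref="chap" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="R">
  <xs:complexContent>
    <xs:restriction base="B">
      <xs:sequence>
        <!-- Same declaration, hence same type table; only the bounds narrow -->
        <xs:element ref="chap" minOccurs="1" maxOccurs="10"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```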

This does not address the question of the run-time checking being
hard on performance or of the current rule being overnice, only
the proposition that the spec imposes restrictions on the use of
substitution groups in restrictions that it does not impose on their
use in the base type.
Comment 5 C. M. Sperberg-McQueen 2011-04-22 00:51:25 UTC
A second comment, on a side point.  The bug description says 

    it's very unsatisfactory that problems with the schema should be
    detected during instance validation.

It may be unsatisfactory to find schema errors at instance
validation time (it doesn't bother me much, but I agree that it does
bother some intelligent observers), but violations of Conditional
Type Substitutable are not defined as problems with the schema; they
lead to the determination that the instance is invalid, not to the
determination that the schema is non-conforming.

This is pointed out by the note at the end of 3.4.6.3.

The bug description may be taking the view that the 'real' problem
is in the schema, not the instance, and that by making the
constraint affect instance validity instead of schema conformance
the WG was simply lying to itself.  But that presupposes some clear
accepted rule for deciding what problems are schema problems and
what problems are instance problems; if the WG has ever found
consensus on such a rule, I don't remember it.  Many things in the
spec might be clearer and cleaner if we had.  But in the absence of
such a rule, I think this is an appeal to a Platonic reality of
schemas that is not accessible to most of the WG, let alone to
readers of our spec.
Comment 6 C. M. Sperberg-McQueen 2011-04-22 01:04:22 UTC
A third comment (preparatory to actually drafting the wording
requested by the WG), concerning the relation of the proposed change
to our definition of restriction.

In XSD 1.0 the WG experimented with a set of rules that essentially
supplied an algorithm for checking restriction; in the aftermath we
discovered that the algorithm was flawed in various ways, and we
encountered difficulty understanding the algorithm well enough to
modify it reliably.  In XSD 1.1, the WG replaced the constructive
rules for restriction checking with a somewhat higher-level rule which
essentially requires that a restriction:

  (a) count things as locally valid only if they are locally valid
      against the base type, and

  (b) associate types with all attributes and children which are
      subsumed by those associated with those attributes or children
      under the base type.  

      At a first approximation, one can think of 'associating' a
      type with a child element or attribute as assigning the type
      to it, but the story is complicated by (1) skip and lax
      wildcards and (2) the difference between the declared type of
      an item and its governing type: the rules of complex type
      restriction ignore the effect of xsi:type and the effect of
      conditional type assignment.

Rule (a) guarantees that the restriction won't allow sequences of
children or sets of attributes not allowed by the base type; rule
(b) guarantees that the type of a child or attribute won't be
broadened if the enclosing complex type is restricted.

The explicit statement of what guarantees restriction is supposed to
make is given in 3.4.6.4 Content Type Restricts (Complex Content),
which I'll call CTRCC from now on.

I do not believe that we explicitly thought about
conditional type assignment when we drafted CTRCC and the rest of
the section containing it. I'm almost certain that we didn't,
because the core of the new rules was drafted in 2005, while
conditional type assignment was not added until 2007.  Similarly, I
do not believe that we explicitly thought about the constraint
Content Type Restricts when we worked on conditional type assignment
and the Conditional Type Substitutable (CTS) rule.  If we had, we
either would have tightened CTRCC to cover selected types, not just
declared types, or else we would have realized that CTS was
unnecessary, because "the usual principles of complex type
restriction" as outlined by CTRCC do not require it.

Essentially, it now appears to me in retrospect that when the WG
first addressed the issue that led to CTS being drafted, the
question was raised in a form that presupposed that the rules in
CTRCC were more stringent than in fact they are, and that we needed
to do something special with conditional type assignment, in order to
avoid violating XSD 1.1's new story about restriction.  But that
presupposition was a false one, and it was an error to allow it to
be smuggled into our work and taken as a requirement without
examination or debate.

When this bug was first opened, I was worried about the possibility
that in loosening the rules for conditional type assignment in the
context of complex type restriction of the parent, we might be
blasting a hole in the 1.1 rules governing restriction.  Having
studied CTRCC and the contexts which refer to CTS, I now think that
the change proposed will not break our rules for restriction or
require any change.  

If we make the change, restriction will not guarantee that the
governing type of any item I in a restriction will be identical to
or a restriction of the type I would have in the base type (in the
cases where both the base and the restriction assign a type to I);
that guarantee will be made of the declared type, instead.  That
guarantee has *never* been made for the governing type; it has
*always* been made only for the declared type of I.
Comment 7 C. M. Sperberg-McQueen 2011-04-22 01:41:09 UTC
A fourth comment, intended solely to offer an example illustrating
the class of schemas and instances affected by the proposed change.

The details will be tedious for some; sorry.  If the example proves
anything it is only that there ARE some schemas and instances which
will be affected by the change.  It also suggests but does not prove
that the class of affected schemas is not a large one: it consists
of schemas with two complex types B and R, each defining a local
element E, in such a way that the declared type of R/E is
substitutable for B/E as a restriction, with both local elements E
using conditional type assignment but doing so in different ways 
which don't guarantee substitutability of the selected types.

To create an example in which the Conditional Type Substitutable
rule will object to a type definition, we must postulate two element
declarations with the same name and different type tables.  For
example (to avoid confusion, I'll use the name 'zap' and assume the
existence of the various ___WithTypeAttribute types shown in the
description):

 <xs:complexType name="B">
    <xs:sequence maxOccurs="unbounded" minOccurs="0">
      <xs:element name="zap">
        <xs:alternative test="@type='date'" 
          type="dateWithTypeAttribute"/>
        <xs:alternative test="@type='dateTime'"
          type="dateTimeStampWithTypeAttribute"/>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="R">
    <xs:complexContent>
      <xs:restriction base="B">
        <xs:sequence maxOccurs="1991" minOccurs="991">
          <xs:element name="zap">
            <xs:alternative test="@type='date'" 
              type="dateWithTypeAttribute"/>
            <xs:alternative test="@type='dateTime'"
              type="dateTimeWithTypeAttribute"/>
          </xs:element>
        </xs:sequence>        
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

The Conditional Type Substitutable (CTS) rule will fail to be
satisfied for any 'zap' child of an element with type tns:R, if (and
only if) the element has type='dateTime'.  Applying the CTS rule to
such an element information item, we have

  E = <tns:zap type='dateTime'>...</tns:zap>
  T = tns:R  // enclosing type

  B = tns:B // base of T
  T_T = sequence of:
           /type::tns:R/schemaElement::tns:zap/alternative::*[1]
           /type::tns:R/schemaElement::tns:zap/alternative::*[2]
        // i.e. the type table used for zap in an instance of tns:R
  T_B = sequence of:
           /type::tns:B/schemaElement::tns:zap/alternative::*[1]
           /type::tns:B/schemaElement::tns:zap/alternative::*[2]
        // i.e. the type table used for zap in an instance of tns:B
  S_T = tns:dateTimeWithTypeAttribute
  S_B = tns:dateTimeStampWithTypeAttribute

Clause 1 is not satisfied, so clause 2 must be true.

Clause 2.2 is not satisfied, so 2.1 must be true.

Clause 2.1 requires:

  - T.{derivation method} = restriction (ok)
  - S_T is validly substitutable for S_B (see below)
  - E and B together satisfy CTS (ok)

S_T (i.e. tns:dateTimeWithTypeAttribute) is not validly
substitutable as a restriction for S_B
(tns:dateTimeStampWithTypeAttribute), though.

In consequence, under the status quo the parent element of the zap
elements will be invalid, because the parent has been too permissive
with the children and has allowed them to do things its base type
would not have allowed.  (O tempora! O mores!)
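For concreteness, an instance fragment of the kind this analysis applies to might look like the sketch below. The element name 'holder', the namespace URI, and the use of namespace-qualified local elements are all assumptions made for illustration.

```xml
<!-- Hypothetical instance fragment. Assumptions: 'holder' is declared with
     type tns:R, the namespace URI is a placeholder, and local elements
     are namespace-qualified (elementFormDefault="qualified"). -->
<tns:holder xmlns:tns="http://example.com/ns">
  <!-- Selected type in R: dateTimeWithTypeAttribute (no timezone needed).
       Selected type in B: dateTimeStampWithTypeAttribute (timezone required). -->
  <tns:zap type="dateTime">2011-04-22T01:41:09</tns:zap>
  <!-- remaining zap elements omitted; R requires 991 to 1991 in total -->
</tns:holder>
```

Under the status quo, the zap child shown is what triggers the CTS failure for its parent.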

If we eliminate CTS, the zap elements in question will no longer
cause their parent to be invalid.  The only relevant question will
be: does the content type of R restrict the content type of B
according to Schema component constraint: Content type restricts
(Complex Content) (CTRCC)?

The requirements of CTRCC are two:

  1 Every sequence of element information items which is ·locally
    valid· with respect to R is also ·locally valid· with respect to
    B.

True.  R accepts anywhere between 991 and 1991 zap elements; B
accepts zero or more.

  2 For all sequences of element information items ES which are
    ·locally valid· with respect to R, for all elements E in ES, B's
    ·default binding· for E ·subsumes· that defined by R.

Also true.  The default binding of any zap element in R is the local
element declaration /type::tns:R /schemaElement::tns:zap, and in B
it's /type::tns:B /schemaElement::tns:zap.  Since both are element
declarations, they must satisfy clause 4 of the definition of
subsumption:

  4.1 Either G.{nillable} = true or S.{nillable} = false.

True by first disjunct.

  4.2 Either G has no {value constraint}, or it is not fixed, or S
      has a fixed {value constraint} with an equal or identical
      value.

True by first disjunct.

  4.3 S.{identity-constraint definitions} is a superset of
      G.{identity-constraint definitions}.

True (vacuously).

  4.4 S disallows a superset of the substitutions that G does.

True (vacuously).

  4.5 S's declared {type definition} is ·validly substitutable as a
      restriction· for G's declared {type definition}.

True: xsd:anyType is validly substitutable as a restriction for
xsd:anyType.

Having examined the example, I do not find myself unhappy with the
proposed change; I hope others agree.
Comment 8 Michael Kay 2011-04-23 18:24:16 UTC
Responding to comment #4, in particular:

Q1: You don't say so explicitly, but I understand you to be thinking of
a pair of complex types B and R each of which has a content model
including the top-level 'chap'.  But in that case, the Conditional
Type Substitutable rules will have no objection.

A1: I'm not entirely sure what I was thinking of, but it might have been this: a complex type B that allows a sequence of chap elements whose element declaration is akin to that of element chap in my original example, and a complex type R that allows a sequence of chap elements whose element declaration (necessarily local) is akin to that of element appendix in my original example. I think this is the situation in which (under the status quo) we discover at "validation time" that R is not a valid restriction of B. But I might be wrong - I'm writing this without re-reading the rules, which is probably not a good idea.

Q2: A second comment, on a side point.  The bug description says 

    it's very unsatisfactory that problems with the schema should be
    detected during instance validation.

It may be unsatisfactory to find schema errors at instance
validation time (it doesn't bother me much, but I agree that it does
bother some intelligent observers), but violations of Conditional
Type Substitutable are not defined as problems with the schema; 

A2: I'm not saying that the spec says it's a problem with the schema. I'm saying that in practice, it's going to be the schema that has to be fixed. Just as if you define an assertion that's always false, the schema is technically correct but in practice unusable; anyone who discovers this is going to complain to the author of the schema. I'm concerned with the practicality, not with the letter of the law.
Comment 9 Sandy Gao 2011-04-25 16:01:48 UTC
1. Substitution group vs. Content type restriction

Suggestion: no change.

> Firstly, it seems illogical to impose a restriction on the second case but 
> none on the first. We should either require CTA type tables to be consistent
> in both cases, or in neither.

The treatments of CTA for these 2 cases are different. Sub-group only requires consistency between the declared types, while content type restriction requires the type tables to satisfy certain rules.

This doesn't *feel* right. But there are many other aspects that are different between the 2 cases. For sub-group, the only thing required is consistency between the declared types, and all other aspects of an element declaration can be different (block, default, fixed, nillable ...). But for content type restriction, we make sure that the *content* of the element in the restriction type is also valid against the element in the base type.

I don't know why there is this difference (its history predates my involvement in the WG), but it's been like this for 10 years. I don't think now is the right time to revisit and change this. So I suggest we leave sub-group out of this discussion.

Also observe that sub-group allows extension, but complex type restriction (obviously) doesn't.

2. CTA and content type restriction

Suggestion: need a rule for CTA.

The goal of content type restriction (as I understood it) has always been to make sure a sequence of EIIs valid against the restriction type is also valid against the base type. Note that this is not only about local validity (the sequence of QNames is allowed), but also deep validity (value/content of the child elements), which is why we made sure they have compatible type, nillable, fixed value, IDC, etc.

Again, it's been like this for 10 years, and I don't think the introduction of CTA should break that.

From comment #6:

> Similarly, I
> do not believe that we explicitly thought about the constraint
> Content Type Restricts when we worked on conditional type assignment
> and the Conditional Type Substitutable (CTS) rule.  If we had, we
> either would have tightened CTRCC to cover selected types, not just
> declared types, ...

I believe we did consider CTR. Among the proposals from 2007, some were to change CTR to ensure the "selected types" are consistent. For example (member-only link)

http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.cta.pf.200706.html#loc-testSubP

> ... restriction will not guarantee that the
> governing type of any item I in a restriction will be identical to
> or a restriction of the type I would have in the base type ...
> That guarantee has *never* been made for the governing type; it has
> *always* been made only for the declared type of I.

True. But we have never needed to *make* that guarantee for the governing type; it has always been true. Before CTA, same declared type implied same governing type, because xsi:type was the only mechanism that could make the governing type differ from the declared type.
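As a reminder of how xsi:type separates the two notions, here is a small sketch. The element name 'amount' is hypothetical: its declared type is xs:decimal, while the governing type in this instance is xs:integer.

```xml
<!-- Schema fragment (hypothetical declaration): declared type is xs:decimal -->
<xs:element name="amount" type="xs:decimal"/>

<!-- Instance fragment: xsi:type makes xs:integer the governing type -->
<amount xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="xs:integer">42</amount>
```

With CTA, a type alternative can produce the same kind of divergence without any xsi:type in the instance.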

3. What rule to enforce restriction then?

There are different options mentioned at different times. From 2007 Pisa F2F [1] and MikeK's email [2], I think the following are relevant:

1) The 2 type tables must be equivalent
2) The 2 type tables have the same length, and for the corresponding entries, the XPath tests must be the same, and the type in the restriction type table restricts that in the base type table.
3) The base type table must be a prefix of the restriction type table, and any additional entries in the restriction type table have a type that restricts the base default type.
4) Combines #2 and #3
5) Status Quo: runtime rule.

[1] http://www.w3.org/XML/Group/2007/06/xml-schema-ftf-minutes#cta-problem
[2] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2011Feb/0008.html

We chose #5 in Pisa. Given the recent discussions about its complexity, we may need to look at some of the other alternatives.
Comment 10 C. M. Sperberg-McQueen 2011-04-25 16:29:30 UTC
The wording proposal requested by the WG in comment 3 is now on the W3C server at

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b12185.html
  (member-only link)

Since Sandy Gao reminds me that the WG did not actually have phase-1 consensus on the issue but asked for a written proposal to help focus the discussion, I'm marking the issue needsAgreement, not needsReview.
Comment 11 David Ezell 2011-05-20 15:15:51 UTC
RESOLUTION: use Sandy's option #1 in Comment 9 and instruct the editors to resolve the bug that way.

MSM: I'd like to register my disappointment that we can't do better than that.
Comment 12 C. M. Sperberg-McQueen 2011-06-03 00:18:06 UTC
A revised proposal for this issue, which adds a requirement for type-table equivalence (sic!) to the definition of subsumption for default bindings and deletes the validation-rule-related apparatus for conditional type substitutability, is on the W3C server at

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b12185.bis.html
  (member-only link)

This is an attempt to execute the decision made by the WG majority in the meeting of 20 May 2011.
Comment 13 Sandy Gao 2011-06-03 13:57:41 UTC
Re: the proposal in comment #12. I wonder if the new rule needs to consider cases where the {type table} is absent. That is, change

4.6 S.{type table} is ·equivalent· to G.{type table}.

to

4.6 S.{type table} and G.{type table} either are both ·absent· or are both present and ·equivalent·.
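A sketch of the two cases the reworded clause is meant to cover, as I read it. The local element 'note' and the occurrence bounds are assumptions for illustration, and the ___WithTypeAttribute types are those from the description. The two 'zap' declarations have type tables with identical tests and types (present and equivalent), while the two 'note' declarations have no type table at all (both absent).

```xml
<xs:complexType name="B">
  <xs:sequence>
    <xs:element name="zap" minOccurs="0" maxOccurs="unbounded">
      <xs:alternative test="@type='date'" type="dateWithTypeAttribute"/>
      <xs:alternative test="@type='dateTime'" type="dateTimeStampWithTypeAttribute"/>
    </xs:element>
    <!-- No alternatives: {type table} is absent -->
    <xs:element name="note" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="R">
  <xs:complexContent>
    <xs:restriction base="B">
      <xs:sequence>
        <!-- Same tests and same types as in B: type tables equivalent -->
        <xs:element name="zap" minOccurs="1" maxOccurs="10">
          <xs:alternative test="@type='date'" type="dateWithTypeAttribute"/>
          <xs:alternative test="@type='dateTime'" type="dateTimeStampWithTypeAttribute"/>
        </xs:element>
        <!-- Again no alternatives: absent on both sides -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

By contrast, the R shown in comment 7 changes the second alternative's type, so its type table would not be equivalent to B's.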
Comment 14 C. M. Sperberg-McQueen 2011-06-03 14:55:20 UTC
I resist the change suggested in comment 13 because it takes a simple sentence that is easy to understand, doubles its length, and makes it harder to follow.  If a change of that kind is necessary, it's a fairly clear indication that we have done a poor job of identifying the appropriate primitive notions, and the correct solution is to do better.  Syntactic convolution is often a sign of inadequate design work.  

I think it follows from the definition of type table equivalence that if T1.{type table} is present and T2.{type table} is absent, they are not equivalent.  It ought to be obvious that if neither type table exists, the rule is satisfied, but XSD's attitude to null values is so poorly thought through that I think SG is right that  it's not obvious and needs to be stated explicitly.

Counter-proposal (still unnecessarily complicated and a sign of half-baked design): 

4.6 S.{type table} is ·equivalent· to G.{type table}, if either ·present·.
Comment 15 Michael Kay 2011-06-03 15:11:18 UTC
re "4.6 S.{type table} is ·equivalent· to G.{type table}, if either ·present·."

I find it hard to swallow a statement that something that doesn't exist can have properties and relationships, such as equivalence to something else that doesn't exist.

The statement works if "S.{type table}" is a reference to the property, but usually in our specs "S.{type table}" is a reference to the value of the property. The properties are equivalent but the type tables aren't, because there are no type tables. I prefer Sandy's wording.
Comment 16 David Ezell 2011-06-03 15:28:05 UTC
RESOLVED: adopt the proposal as amended in comment 13.
Comment 17 David Ezell 2011-07-28 15:55:55 UTC
This bug should be resolved in the CR at:
http://www.w3.org/TR/xmlschema11-1/

The WG appreciates the effort of the commenter in reporting this bug.  Please indicate your satisfaction with the resolution by marking it as CLOSED.

Thank you.