This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5157 - 3.4.2 example unclear
Summary: 3.4.2 example unclear
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: clarification cluster
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2007-10-08 20:02 UTC by John Arwe
Modified: 2008-03-19 19:09 UTC

Description John Arwe 2007-10-08 20:02:47 UTC
The final example in 3.4.2 states: "Without the specification of the notQName attribute, the restriction might or might not be valid, depending on whether the schema has a top-level declaration for speaker."

The example text states that the restriction would be valid if a top level decl for speaker exists (I assume top level decl == GED).  If there were a GED for speaker, then the computer CTD's speaker decl would be in a different (local) symbol space than the GED since the CTD has name= not ref=.  Is the definition of notQName telling me that it violates the partitioning of symbol spaces by treating both the local and global as "the same"? Is it telling me that the local def of speaker would still be allowed because the wildcard notQName would match ONLY the GED?

It also states that in the absence of a top level decl for speaker the restriction would not be valid because of differences in the default bindings.  Specifically it says that the base type computer maps speaker elements in the input to a ·default binding· consisting of the element declaration for speaker, and goes on to assert that this is something other than ·xs:anyType·.  Applying 3.3.2 tableau case 2 (complex type ancestor) to the example declaration, I think its {type definition} would be ·xs:anyType· according to rule 4 (1 fails due to no children, 2 fails no type, 3 fails no subs group).  So I am not seeing in this case why the default bindings are unequal.
Comment 1 C. M. Sperberg-McQueen 2008-01-04 00:49:05 UTC
Thank you for the comment.  The example and its accompanying text
(in section 3.4.2) clearly need to be revised to address the
obscurities and confusions you identify in them.

You ask 

    Is the definition of notQName telling me that it violates the
    partitioning of symbol spaces by treating both the local and
    global as "the same"?  Is it telling me that the local def of
    speaker would still be allowed because the wildcard notQName
    would match ONLY the GED?

It's not immediately clear to me how best to change the current text
to deal with these questions.  The notQName attribute on (the source
declaration of) a wildcard identifies a set of expanded names, with
the meaning that elements bearing those expanded names cannot be
attributed to / matched up with the wildcard; I'm experiencing
difficulty formulating any sentence describing the relation of the
QNames listed in the attribute to the concept of symbol spaces,
because they seem so fundamentally unconnected to me as to verge on
the non-comparable.  (This is not helpful, I realize, as an answer
to the question "How do the QNames in the notQName attribute relate
to the names in the various symbol spaces?", which I take to be
implicit in your comment.  I'm going to have to work on this.  It's
possible that the analysis of content-model matching would be
cleaner if we did bring the notion of symbol spaces to bear more
explicitly on the issue of matching input elements to wildcards.)

But with the caveat that it doesn't seem natural to say it this way,
I suppose the answer to the first question you ask explicitly is
"yes, the expanded names associated with a wildcard are not
associated a priori with any particular symbol space", even though
when elements match a wildcard, governing declarations are sought
only among top-level (global) element declarations.

I don't understand the second question well enough to venture an
answer.

By way of trying to establish some common ground, though, consider
the alternative restriction

 <xs:complexType name="computer2">
  <xs:complexContent>
   <xs:restriction base="computer">
    <xs:all>
     <xs:element name="CPU"/>
     <xs:element name="memory"/>
     <xs:element name="monitor"/>
     <!-- Any additional information about the computer -->
     <xs:any processContents="lax"/>
    </xs:all>
   </xs:restriction>
  </xs:complexContent>
 </xs:complexType>

Like the base type 'computer', this restriction 'computer2' accepts
the sequence

  <CPU/><memory/><monitor/><speaker/>

and binds the first three elements to the local element declarations
for the elements of those names.  It does not bind the fourth child
to any element declaration at all.  (The base type 'computer', by
contrast, binds all four elements to the local declarations scoped
to /type::computer.)

Two observations about this alternative restriction seem relevant:

  (1) It is unlikely to satisfy a schema author who wants
      instances of type 'computer2' not to have 'speaker' elements.
      That's why we added the example we're discussing: to show how
      to get rid of local elements like 'speaker'.

  (2) It provides less information about the instance than does
      the base type.  (The base type, to be sure, assigns the
      'speaker' element to xs:anyType, so the restriction does not
      accept instances the base type would not have accepted.)  But
      instead of a binding to a local element declaration, we get no
      binding at all.  I hope it's clear that in some significant
      sense, 'computer2' provides less information than 'computer'.
      The type 'computer' thus fails to 'subsume' the type
      'computer2' in the fundamental sense: No thing X can subsume
      any thing Y if X contains / conveys / captures more
      information than Y does.  (The specific technical definition
      of subsumption we offer is merely an attempt to operationalize
      that basic concept for purposes of XSDL.)
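
For reference while following the discussion of default bindings, the
two types under discussion can be sketched roughly as follows.  This is
a reconstruction from the descriptions in this thread, not a quotation
of the spec: the occurrence attributes (minOccurs="0") are assumptions,
and the xs prefix is assumed to be bound to
http://www.w3.org/2001/XMLSchema.

```xml
 <!-- Base type: four local element declarations, each with no type
      attribute (so xs:anyType), plus a lax wildcard.
      The minOccurs values are assumed, not taken from the spec. -->
 <xs:complexType name="computer">
  <xs:all>
   <xs:element name="CPU"/>
   <xs:element name="memory"/>
   <xs:element name="monitor"/>
   <xs:element name="speaker" minOccurs="0"/>
   <!-- Any additional information about the computer -->
   <xs:any processContents="lax" minOccurs="0"/>
  </xs:all>
 </xs:complexType>

 <!-- Restriction: no 'speaker' declaration, and the wildcard
      refuses elements whose expanded name is 'speaker'. -->
 <xs:complexType name="quietComputer">
  <xs:complexContent>
   <xs:restriction base="computer">
    <xs:all>
     <xs:element name="CPU"/>
     <xs:element name="memory"/>
     <xs:element name="monitor"/>
     <!-- Any additional information about the computer -->
     <xs:any processContents="lax" notQName="speaker" minOccurs="0"/>
    </xs:all>
   </xs:restriction>
  </xs:complexContent>
 </xs:complexType>
```

With these definitions, a 'speaker' child is bound by 'computer' to its
local declaration, while 'quietComputer' has neither a declaration nor a
wildcard that admits it.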

On the question of default bindings, you are quite right that any
'speaker' element appearing as a child of an element with type
'computer' will have xs:anyType as its declared type definition, and
it's natural to ask, in that case, why it doesn't subsume a default
binding of xs:anyType.

There are two problems here.  

First, I fear that you have been betrayed by the plausible
assumption that 'default binding' denotes the declared type
definition of whatever element declaration governs an element
instance.  Plausible assumption, but not the case: the term 'default
binding' denotes something else, the possible values of which are a
confusing mishmash of element declarations, attribute declaration +
optional value constraint pairs, or the keywords 'strict', 'lax',
and 'skip'.

If I could think of a way to make the concept less ad hoc, I would
propose it in a heartbeat.  But thus far, I have failed to make much
of a dent.

Second, the text is faulty.  The definition of 'default binding',
whatever its flaws, does at least make clear that 'xs:anyType' is
not a possible value for the default binding of anything.  (The
prose accompanying the example reflects an earlier state of the
spec, in which the concept now called 'default binding' was referred
to as 'Test[ES,P]' and xsd:anyType figured among its possible
values.)

Not only is xsd:anyType not a possible default binding, but the type
'quietComputer' does not in fact provide any default binding at all
for elements named 'speaker': complex types provide default bindings
for children only if the sequence of children is locally valid against
the type.  No sequence of children containing a 'speaker' element is
locally valid against 'quietComputer', so no such sequence gets
default bindings.

It's not clear to me, at this point, whether the prose is correct to
say "if there is a top-level declaration for 'speaker', the
restriction is valid, but if there isn't one, it's not valid".
Certainly the reasoning given is bogus.

The relevant "if X, then ..., otherwise ..." construct may be:

  If the notQName attribute is supplied as shown, the restriction is
  valid.  If it were omitted, then the restriction might be valid or
  invalid: invalid if there is no top-level declaration for
  'speaker', because then the default bindings for 'speaker'
  elements in the input would be the local element declaration (for
  'computer') and the keyword 'lax' (for 'quietComputer'); valid if
  there is a top-level declaration for 'speaker' that is subsumed by
  the local declaration in 'computer' (see clause 4 of the
  definition of 'subsume').

In practice that means the restriction would be valid unless the
top-level declaration for 'speaker' had identity constraints,
disallowed substitutions, or a type whose derivation from
xsd:anyType involves any extension steps, list construction, or
union construction.  Nillability and value constraints can also make
restrictions like this one invalid, but it's clear from what is
shown in the example that they cannot affect the validity of this
restriction.
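
To make the "valid unless" condition concrete, here are two hypothetical
top-level declarations for 'speaker'; a schema would contain at most one
of them.  Following the reasoning above, with notQName omitted the first
should leave the restriction valid (its declared type is xs:anyType, the
same as the local declaration in 'computer'), while the second should
make it invalid because its type is derived by extension.  The names
deviceType, speakerType, model, and watts are invented for illustration.

```xml
 <!-- Alternative A: no type attribute, so the declared type is xs:anyType -->
 <xs:element name="speaker"/>

 <!-- Alternative B: the declared type reaches xs:anyType via an extension step -->
 <xs:complexType name="deviceType">
  <xs:sequence>
   <xs:element name="model" type="xs:string"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="speakerType">
  <xs:complexContent>
   <xs:extension base="deviceType">
    <xs:sequence>
     <xs:element name="watts" type="xs:decimal"/>
    </xs:sequence>
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>
 <xs:element name="speaker" type="speakerType"/>
```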
  
The upshot is that although on the face of it the comment here looks
as if it ought to be editorial, the technical details are
problematic enough that this should probably be discussed by the
Schema WG.  So I'm marking this needsAgreement, rather than
editorial.

(While I'm here, I should point out that the example seems slightly
confusing to me, because as a reader I can't help thinking the
wildcard really is not very useful unless it has maxOccurs =
'unbounded'.  The use of xsd:anyType in the local declarations also
distracts me as a reader: I keep thinking "surely no one in their
right mind would do it that way!")
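
A variant along the lines of this aside, with typed local declarations
and a repeatable wildcard, might look like the following for the content
of 'quietComputer'.  The type names CPUType, memoryType, and monitorType
are hypothetical placeholders assumed to be defined elsewhere in the
schema, and the base type 'computer' would need matching changes.

```xml
 <xs:all>
  <xs:element name="CPU" type="CPUType"/>
  <xs:element name="memory" type="memoryType"/>
  <xs:element name="monitor" type="monitorType"/>
  <!-- Any additional information about the computer, any number of
       elements, but none named 'speaker' -->
  <xs:any processContents="lax" notQName="speaker"
          minOccurs="0" maxOccurs="unbounded"/>
 </xs:all>
```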

Comment 2 David Ezell 2008-01-25 20:12:33 UTC
The WG agrees that this example should be reworked.
Editors should:
1) fix the obvious error, or
2) time permitting, rework the example, or
3) delete it.

Comment 3 John Arwe 2008-02-15 17:21:44 UTC
Thank you for clarifying the default binding point that I missed.  If anyType is not a valid default binding, then clearly the two are different.  Since this is a concrete example, it might be beneficial to point out _why_ the two are different on the premise that if I missed it, others will, instead of relying on the perseverance of each reader to derive your conclusion independently.  While the full general case of possibilities might be a confusing mishmash, in this concrete example I think you have a pair of concrete values in mind (just state them, let the reader decide if they can/care enough to attempt to follow the reason that led to those values).

I think I follow the majority of the rest of your comment; I am equally confident that I am still missing some fine points.  Maybe I can better explain my (possibly derailed) train of thought that led to the two questions that sound so strange.

notQName="speaker".  If speaker is a QName, since it is unqualified, its ns URI in this example must be the default for the schema (fair, fine, no problem). I then wonder about the scope of its local portion.  Mentally I think I treated it like a local element declaration, which led to the "which symbol space is it in" question, since ("clearly") when trying to assess an instance against the schema and testing the notQName condition one needs to know the {ns URI, local name portion, symbol space of local name portion} triple of both the lvalue and the rvalue in order to properly test their equality.

It is also possible I might mis-understand the relationship between the symbol spaces of a base CTD and each of its restrictions; I assumed them to be disjoint, and it was simply part of the "valid restriction" constraints that in the case of duplicate specifications, like <monitor>, that the symbol space of the local name portion was ignored (treated as equal) in the comparison.

It might be that the semantic intent of notQName="speaker", taking it as the lvalue, is that "speaker" is treated as being in the same symbol space as the rvalue during each comparison in turn to see if the instance's <speaker> is allowed to match the schema's xs:any notQName; I simply did not think of that.  If the text was trying to tell me that, obviously I missed it (i.e. a larger more impressive club is needed for my noggin).  I could see that being expressed by saying that the symbol spaces are irrelevant during wildcard matching, or always assumed to be equal (to me those are equivalent, but they may well not be to those more versed in the schema arts).

When (1.0) it was just ns URI comparisons for the wildcard matching, this issue did not exist of course.

So to wrap around to your response, the expanded names in the wildcard are connected to symbol spaces for me precisely because the local portion of the expanded name (i.e. the NCName), in the context of a schema element declaration, exists in a symbol space.  Writing that out makes my thinking evolve a bit... during assessment, the instance whose content is being matched (attributed) to schema components has no symbol spaces.  In order to possibly match a wildcard, the instance's governing type definition cannot have a potential match(?), so "by definition" each notQName list component corresponds to a "potential instance" - since they are not local element declarations, the potential instances have no symbol spaces, and I could see from your side why the question seems terribly ill-formed.

I'm willing to leave the rest of the technical issues you pointed out as something the wg will address.  Since it is an example, I might be very tempted to take the simpler and not fully precise route.  Your "if X, then ..., otherwise ..." construct seems acceptable for that purpose, and certainly no more complex than other Structures content.
Comment 4 C. M. Sperberg-McQueen 2008-02-29 19:23:12 UTC
A wording proposal intended to resolve this issue (at least the narrow
issue of making the example clearer and less confusing) was considered
at today's WG teleconference, and adopted. 

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b5157.html
(member-only link).

The broader question of clarifying the relation of symbol spaces to 
various name-matching tasks, including particle attribution, is NOT
resolved by this proposal, and can be tracked using bug 5507.  But for
the immediate problem which led to the bug report, the WG believes the
issue has now been resolved; I'm marking the issue 'resolved' accordingly.

John, as the originator of the issue, can you examine the wording proposal
and see if it seems to you to resolve the problem satisfactorily?  If so,
please indicate by changing the status of the issue to 'CLOSED'; if not,
please explain what is still wrong and change the status to 'REOPENED'.
If we don't hear from you, we will eventually conclude that silence implied
consent.  (We normally say two weeks, but if you can't turn it around
that quickly please just say you need more time.)
Comment 5 John Arwe 2008-03-19 19:09:26 UTC
High level, yes fine resolved.

Editorial level, you might consider changing the next text to avoid the double negative
from: "if there is no notQName attribute on the wildcard"
to  : "if the notQName attribute on the wildcard is omitted"

Editor's discretion as to whether or not to make the editorial change; if not, feel free to close this.