This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6010 - [schema11] priority feedback responses
Summary: [schema11] priority feedback responses
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2008-09-02 14:17 UTC by John Arwe
Modified: 2008-10-30 16:48 UTC
3 users

See Also:


Attachments

Description John Arwe 2008-09-02 14:17:47 UTC
3.3.4.6 Schema-Validity Assessment (Element) - fallback to lax validation
SML found it necessary to specify fallback to lax validation in its specs because Schema 1.0 had not done so.

3.4.4.5 Conditional Type Substitutable in Restriction
Practitioners in the industry standards area have generally been arguing of late for schema-based mechanisms with more flexibility.  While I am unsure how often they use restrictions, where they do, this change would likely be viewed as a positive decision.

3.10.1 The Wildcard Schema Component
"The keywords defined and sibling allow a kind of wildcard which matches only elements not declared in the current schema ..."
Given that "schema" is, according to 2.1 (which has the closest thing I could find to a formal definition of this word), just a set of schema components, I'm not sure what the actual boundary of 'defined' is, nor how interoperable its definition really is.  A schema processor is allowed to put almost literally anything extra (i.e. unused) into the schema (set of components) used for assessment, no?  If there were some concept of a "minimal schema", at, say, schema-document granularity, it might be clearer...of course, if someone then re-factors the documents, your mileage may vary.
Conceptually I have no objection; I'm just not sure how wide a net it casts.
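For concreteness, the wildcard under discussion might look like the following in an XSD 1.1 schema document (a hypothetical sketch; the type name is invented):

```xml
<!-- Hypothetical XSD 1.1 fragment: this wildcard matches any element
     EXCEPT those whose names are declared in the schema used for
     assessment.  The comment's concern: adding an extra, otherwise
     unused declaration to the component set could change what this
     wildcard matches. -->
<xs:complexType name="Container"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:any notQName="##defined" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```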

3.10.1 The Wildcard Schema Component
"The keywords defined and sibling allow a kind of wildcard which matches only elements not declared in the current schema ..."
Similar question for sibling.  This is somewhat better defined than schema, but the language seems loose. {ns constraint} clause 6 talks about the containing type declaration; here, I wonder whether that should be read very literally, or as including everything that looks like a sibling element in an instance but is attributable to {base type definition} items, transitively.

4.2.3 Including modified component definitions (<redefine>)
Given the many recommendations "on the street" to avoid redefine completely, I expect its deprecation to be no issue.  Some of these recommendations no doubt came from lack of database support from some vendors, which may have now changed.
Comment 1 Noah Mendelsohn 2008-09-02 17:36:54 UTC
John Arwe wrote:

> 3.3.4.6 Schema-Validity Assessment (Element)
>  - fallback to lax validation
> SML found it necessary to specify fallback
> to lax validation in its specs
> because Schema 1.0 had not done so.

and

> I'm not sure what the actual boundary for
> 'defined' is nor how interoperable its
> definition really is.

I confess to finding it a bit odd to see these two comments sitting so close together, as LAX validation depends on a notion of which elements are "defined" that's pretty much the same as what's proposed for wildcards.  In the case of LAX, if the element is defined, it's validated.  In the case of #defined wildcards, it's disallowed.

> A schema processor is allowed to put almost
> literally anything (extra, i.e. unused) into
> the schema (set of components) used for
> assessment, no?

Yes, although it's assumed that the processor chosen for a particular application will or won't do such things according to the application's needs.  Most general purpose processors will use only those components that directly result from the XSD files provided.  By contrast, an HTML editor (to pick an example) might use a customized incremental validator that had built in knowledge of the XHTML schema.  

More to the point, the recommendation is very clear that assessment depends on knowing what the schema is, i.e. which components comprise it.  Once you know that, you know which elements will be validated during lax assessment, and which ones would be disallowed by a #defined wildcard.

Noah



Comment 2 Michael Kay 2008-09-02 20:35:41 UTC
> I'm not sure what the actual boundary for 'defined' is nor how 
> interoperable its definition really is.
 
I have to say I'm deeply uneasy about this one too. The idea that the validity of a document should depend on how far the garbage collector has got in clearing out unreferenced definitions from its schema cache is very unappealing. 

The fact that the problem already exists for the case of lax validation isn't really an excuse for making it worse. 

> More to the point, the recommendation is very clear that assessment depends 
> on knowing what the schema is, I.e. which components comprise it.

And yet we go out of our way to make it impossible for users to determine what the schema is (they can only give "hints"), by saying for example that an unresolved reference or an unsatisfied xs:include does not invalidate the schema; you just do validation without the relevant bits.

It is particularly in processing environments more complex than the classic standalone document-validation episode (for example, an XML database scenario) that the total schema is unlikely to be under the complete control of the user who initiates the validation.

Perhaps we should make it clearer that the "schema" (the set of components used in a validation episode) is something the user must specify when initiating validation, rather than everything the schema processor can find lying around in memory. But to do that we need a clearer and more predictable story on schema composition, that is, how such a schema is constructed from a set of schema documents.
Comment 3 C. M. Sperberg-McQueen 2008-09-02 23:58:04 UTC
John, thank you for the feedback.  If I read them correctly, your responses
on the various priority-feedback requests range from vigorous approval
of the change on which the XSD 1.1 spec requests feedback to 'this could
bite some unwary schema authors, but I have no strong objection' (i.e. no
stop-the-presses unroll-this-change-right-now responses).

W.r.t. 3.10.1, I believe the wording of the note relies heavily
on the phrase 'the current complex type' being taken as shorthand
for 'containing complex type definition' and understood as denoting a 
schema component (i.e. an abstract object) rather than an XML element 
(which is, I guess, also an abstract object, but a different kind).  
The intent is indeed to include siblings which are inherited from
the base type; they are present in the component, even if not present
in the source declaration.

If you can think of ways to recast this material to eliminate the
apparent ambiguity or vagueness and avert the unintended reading, the
editors and working group will be grateful for suggestions.

W.r.t. 3.10.1 and the interaction of the not-in-schema wildcard
with the underspecification of schema construction in XSD 1.0 and 1.1,
your point is, I think, well taken.  As you can see, however, other 
members of the working group believe that there is not really any
difficulty in this area.
Comment 4 John Arwe 2008-09-03 14:26:17 UTC
wrt MSM overall reading: 
correct - nothing here I plan to lay in any tracks about, just "feedback" :-)

wrt the confusion on the proximity of the fallback to lax validation and #defined comments:

SML chose to prescribe fallback to lax in order to have more predictable assessment outcomes, i.e. to do its best to remove a point of variability in the results of the schema processors that validating SML consumers inherently rely on for their functioning.  If SML were to allow processors to choose "randomly" whether or not descendants of unknown content were assessed, then SML processors (inherently dependent upon PSVI to evaluate SML constraints) would more likely differ in their SMLIF validation results wrt a single input document.  This choice might not mean much to schema processors per se, but those dependent on schema processors' output need to either propagate this 'surprising' result or remove it.
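As a concrete illustration of the variability at issue, under a lax wildcard a matched element is validated only if the processor's in-use schema happens to contain a declaration for it (a sketch, not text from either spec):

```xml
<!-- Under processContents="lax", an element matched by this wildcard
     is validated if and only if the in-use schema contains a
     declaration for it; otherwise it is skipped.  Two processors
     holding different component sets can therefore report different
     assessment outcomes for the same instance document. -->
<xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"/>
```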

not-#defined seems like a negation of fallback to lax, in the sense that you are depending pretty heavily on the ABSENCE of competing schema components.  In the fallback to lax case I don't really care how much extra cruft the schema processor wants to keep around -- unused cruft is no harm, no foul (in reality, market forces incent a minimalist bias probably).  With #defined, I have to care because it can affect assessment results in the sense of changing from notKnown or Valid to Invalid.

I certainly acknowledge that in a sense both are dependent on schema processors doing "reasonable" things to find "the right" schema components.  It seems to me that "less reasonable" choices on the part of schema processors have a stronger effect on overall assessment results (document level, what the majority of processes care about) in the case of #defined than in the case of fallback to lax.

wrt MSM 3.10.1 If you can think of ways to recast this material...

{complex type definition} (add curly brackets)?  That should at least resolve the XML vs schema component "question" simply, unambiguously, and consistently with the rest of the document.

As far as the implication that {complex type definition}'s {content type} includes the {base type definition}'s particles "by definition", I'll readily admit that I had to return to 3.4.2.3.3 and study it a bit to conclude I agree with that proposition.  Blame the many varieties of complex type, not the presentation.  And there is a Note following the tableau addressing exactly this point already, so the most change I would even consider is linking to that Note from #sibling.  I leave that possibility to the editors, but I lean mildly against it on the principle that I (the reader) should read more carefully. In "real use", I think it's also easy to test and the ("implied") answer is unsurprising.
Comment 5 Michael Kay 2008-09-03 15:06:23 UTC
>As far as the implication that {complex type definition}'s {content type} includes the {base type definition}'s particles "by definition"

Which leads to the interesting observation, which I hadn't recognized before, that when type T has a notQName="##definedSibling" wildcard, and type E is derived from T by extension by permitting various additional optional child elements, then an element that is valid against T might not be valid against E, because the set of names that match ##definedSibling is different in the two cases. Not a problem, I think, but a little unexpected.
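The observation above can be sketched as follows (hypothetical fragments; type and element names are invented).  In T the ##definedSibling set contains only "a"; in E it also contains the newly added "extra", so an <extra> child that matches the wildcard when validated against T is excluded from it when validated against E:

```xml
<!-- Base type T: wildcard excludes only T's own declared child "a". -->
<xs:complexType name="T" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element name="a"/>
    <xs:any notQName="##definedSibling" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<!-- Extension E: the effective content model now also declares
     "extra", so the inherited wildcard excludes it as well. -->
<xs:complexType name="E" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexContent>
    <xs:extension base="T">
      <xs:sequence>
        <xs:element name="extra" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```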
Comment 6 John Arwe 2008-09-11 18:41:35 UTC
The SML working group chose to endorse the following comments on its call of 2008-09-11:

3.3.4.6 -- note: was adopted as part of the SML spec

3.4.4.5 

3.10.1 part a

3.10.1 part b

4.2.3 -- note: although the SML spec no longer mentions redefine, the member submission specifically excluded it.

Comment 7 David Ezell 2008-10-30 16:47:38 UTC
The WG reviewed these comments and thanks the SML WG for their input.  The WG decided to open a new bug 6193 in order to clarify aspects of section 3.10.1.  

The WG will leave the requests for feedback at least through the transition to CR.  At transition, the WG expects to examine each request for feedback and either to retire (remove) the request, or to modify the request into a designation for a feature-at-risk.

Thanks again