3573 – Validation and invalid schemas

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3573 - Validation and invalid schemas

Status:	CLOSED FIXED

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Structures: XSD Part 1
Version:	1.0/1.1 both
Hardware:	Macintosh All

Importance:	P2 normal
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	XML Schema comments list

Keywords:	resolved

Reported:	2006-08-01 23:50 UTC by C. M. Sperberg-McQueen
Modified:	2008-03-08 15:42 UTC
CC List:	0 users

Description C. M. Sperberg-McQueen 2006-08-01 23:50:13 UTC

Section 5.1 of Structures concludes:

    With respect to the processes of the checking of schema structure
    and the construction of schemas corresponding to schema documents,
    this specification imposes no restrictions on processors after an
    error is detected. However assessment with respect to
    schema-like entities which do not satisfy all the above conditions
    is incoherent. Accordingly, conformant processors must not attempt
    to undertake assessment using such non-schemas.

Guided by the New Oxford American Dictionary that came with my
machine, I take 'incoherent' here to mean 'internally inconsistent;
illogical'.

I propose to delete the last two sentences, because I believe they are
in conflict with the rest of the spec, factually inaccurate, and
mistaken in their intent.

At some points (e.g. section 3.8.4, Validation Rule Element Sequence
Valid) the spec makes a point of observing that validation is defined
in such a way as to be possible even with content models which do not
obey the Unique Particle Attribution constraint.  But if validation is
well defined even for content models which do not obey UPA, then there
are at least some invalid schemas with which validation can be
performed in a way which appears to me not internally inconsistent or
illogical, and the spec is at pains to point out the fact.  (To avoid
confusion: by 'invalid schemas' I mean the same things mentioned by
the current text as 'schema-like entities which do not satisfy all the
above conditions'.)
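A minimal sketch of the UPA point, under stated assumptions: the content model (a?, a) is a hypothetical example, not taken from the spec, and a backtracking regular expression over child element names stands in for the "does some partition of the sequence match" style of definition. The name `locally_valid` is likewise hypothetical. The model violates Unique Particle Attribution, yet the question "does this sequence of children match?" still has a definite answer.

```python
import re

# Hypothetical content model (a?, a): an optional <a> followed by a required <a>.
# It violates Unique Particle Attribution: on seeing the first <a>, a processor
# cannot tell without lookahead which of the two particles it matches.
# Each element name is encoded as the name followed by one space.
CONTENT_MODEL = re.compile(r"(?:a )?a ")


def locally_valid(children):
    """Return True if the sequence of child element names matches the model.

    The trailing space after each name keeps an element named "aa" from being
    confused with two consecutive "a" elements.
    """
    encoded = "".join(name + " " for name in children)
    return CONTENT_MODEL.fullmatch(encoded) is not None
```

Here `locally_valid(["a"])` and `locally_valid(["a", "a"])` both hold, while an empty sequence or three `a` children do not match. The ambiguity affects only which particle an element is attributed to, not whether the sequence is accepted.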

The text quoted above seems to make the claim that if a schema
document has an invalid HTML element inside an xsd:documentation
element (let us say it's an element which is made legal by XHTML 2,
although we are now validating with an XHTML 1 schema), then
attempting to validate documents using components constructed from
that schema will lead to an inconsistency or is illogical.  I don't
see any inconsistency, and it doesn't seem illogical to me to want to
validate with such components, especially in view of the well known
rules of HTML and XHTML regarding behavior of software in the presence
of undeclared elements.

Requiring processors to fail when a schema document is invalid does
not now seem to me the right thing to do here.  In developing 1.0, the
Working Group was indeed unwilling to require them to soldier on
ignoring what they didn't understand, but I don't think it is wise to
*forbid* them to do so. (For that matter, I no longer think it was
wise not to *require* them to do so.)  I find I cannot remember the WG
actively deciding to forbid such behavior, so I can't remember any
arguments brought forward for such a rule.

I believe we should delete the two sentences without replacement, in
1.1 as a change to the status quo and in 1.0 as a bug fix.

If WG members feel that some replacement is required, I would propose
that the passage be amended to read:

    With respect to the processes of the checking of schema structure
    and the construction of schemas corresponding to schema documents,
    this specification imposes no restrictions on processors after an
    error is detected. However, any operations performed using
    schema-like entities which do not satisfy all the above conditions
    are outside the scope of this specification.

Optionally, add before the final full stop:

    and are not schema-validity assessment as that term is defined
    here

but I'm inclined to include that final bit.

Comment 1 Noah Mendelsohn 2006-08-02 00:47:30 UTC

I have some sympathy for this issue, at least insofar as I think the existing text is indeed unfortunate. I'm not quite convinced I agree with the proposed resolution.

I think the right way to slice the problem may be this:

* Our specification defines certain terminology, mappings, relations and constraints. Rather than talking about what processors must do or not do (e.g. soldiering on), I think the emphasis should be on what's defined and what isn't. So, the mapping from (purported) schema documents to components is defined only if the document in question conforms to the SfS and meets the constraints on schema documents. Assessment/validation is defined only if one has in hand a schema comprised of components that collectively meet the constraints on components, and so on.

* I agree that we should not prohibit processors from proceeding, but I don't think that means this specification has "nothing to say". The important thing it says is that what you have is (perhaps) not a schema or not a schema document, and that what you're doing is not formally assessment/validation. So, what you MUST NOT do is present an output that doesn't suitably distinguish your results as being non-conforming, in the sense that they go beyond the function for which our specification defines normative behavior. So, I agree it's OK for a processor to say: "I couldn't do conforming schema processing, but I could do something close and here's the answer." It's NOT OK for the processor to quietly patch around the problem and act as if a schema has been successfully composed, a conforming PSVI generated, etc. I believe that is something that our spec must say about this situation.

How about:

"With respect to the processes of the checking of schema structure and the construction of schemas corresponding to schema documents, this specification imposes no restrictions on processors after an error is detected, except that any results that are beyond what is specified herein (e.g. results computed in spite of one or more constraints having been violated) must be clearly distinguished as not conforming to this specification."
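
A rough sketch of what "clearly distinguished" could look like in a processor's output, under stated assumptions: the `Outcome` class, its fields, and the wording of its messages are all hypothetical and are not drawn from the spec or from any existing processor. The only idea illustrated is that a result computed despite schema errors carries a label that keeps it from being mistaken for a result from an error-free run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outcome:
    """Hypothetical report produced by a hypothetical schema processor."""

    valid: bool
    # Errors found in the schema itself (not in the instance document).
    schema_errors: List[str] = field(default_factory=list)

    @property
    def is_assessment(self) -> bool:
        # Only a run against an error-free schema is treated here as
        # the operation the specification defines.
        return not self.schema_errors

    def describe(self) -> str:
        verdict = "valid" if self.valid else "invalid"
        if self.is_assessment:
            return f"schema-validity assessment: {verdict}"
        return (
            f"best-effort result, NOT schema-validity assessment: {verdict} "
            f"({len(self.schema_errors)} schema error(s))"
        )
```

For example, `Outcome(True, ["UPA violation"]).describe()` yields a message that reports the verdict and also states that it did not come from assessment against a sound schema.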

Comment 2 C. M. Sperberg-McQueen 2006-08-02 17:05:40 UTC

I'm happy to say that any operations performed with invalid
schemas are distinct (and must be distinguished) from 
schema-validity assessment as defined in the XSD spec.  I
think it would be a mistake to say that they are "not conforming",
because that can be taken, and will, correctly or incorrectly,
be taken, as meaning that such operations are in scope and are
covered by the spec and directly violate some rule in the spec, 
rather than as meaning that such operations are outside the scope of 
the spec.

I agree with almost all of NM's comment, except for his
implicit suggestion that he objects to the alternate text 
proposed in the initial description because it says that
the spec has "nothing to say" on the matter.  Nothing in the
proposed text, and nothing in the description of the issue,
uses those words or anything equivalent to them.

Comment 3 Noah Mendelsohn 2006-08-02 18:51:15 UTC

Michael Sperberg-McQueen writes:

> I agree with almost all of NM's comment, 

Good!

> except for his implicit suggestion that he 
> objects to the alternate text proposed in 
> the initial description because it says that
> the spec has "nothing to say" on the matter. 

I spoke a bit too loosely. The concern was specifically with the phrase "this specification imposes no restrictions on processors after an error is detected." I think we do want to impose a restriction, namely that the processor's output must never confuse results of such post-error processing with results that might have come from non-error cases.

Seems like we agree on the result if not quite the way I first justified it, so I'm inclined to say we're all set.  Do you agree?  Thanks.

Noah

Comment 4 C. M. Sperberg-McQueen 2006-09-23 03:24:17 UTC

The XML Schema WG discussed this issue at its call today, 22 September 2006,
and ended by adopting the following wording for the passage in
question:

     With respect to the processes of the checking of schema structure
     and the construction of schemas corresponding to schema
     documents, this specification imposes no restrictions on
     processors in the presence of errors. However, any operations
     performed in the presence of errors are outside the scope of this
     specification and are not schema-validity assessment as that term
     is defined here.

The change of "after an error is detected" to "in the presence of
errors" was adopted in order to avoid any hint of temporal dependencies
and make the declarative nature of the case clearer.  A proposal to
say that schema processors must detect and signal all errors was
discussed but not adopted, on the grounds that for some years the
Working Group has operated under the principle that when errors are
present all bets are off, so that some WG members were unwilling to
affirm that every error situation is guaranteed detectable at
acceptable cost.