3220 – Terminology: "must"

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3220 - Terminology: "must"

Status:	CLOSED FIXED

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Datatypes: XSD Part 2
Version:	1.1 only
Hardware:	PC Windows XP

Importance:	P1 major
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	XML Schema comments list

Whiteboard:	cluster: errors and conformance
Keywords:	resolved

Reported:	2006-05-09 09:31 UTC by Michael Kay
Modified:	2008-06-07 07:49 UTC
CC List:	0 users

Description Michael Kay 2006-05-09 09:31:28 UTC

QT-approved comment:

Section 1.6 Terminology: the definition of "must" relies on the
definition of "error", and the latter definition allows processors to do
anything they like if an error occurs: this effectively makes the term
"must" rather meaningless.

Comment 1 C. M. Sperberg-McQueen 2007-10-14 20:05:12 UTC

The WG discussed this issue ('must' and 'error') at the ftf of October
2007 in Redmond.

After some flailing around and mutual incomprehension, we converged on
the following propositions:

  - For processors, failure to obey a 'must' means the processor is
    non-conformant. (Not 'in error')

  - For data (schemas + schema documents), failure to obey a 'must'
    means error; this in turn is the same as meaning non-conformance.

  - The word 'error' denotes only things that can happen in data,
    never something that happens in processors.

Note that specs vary in their policy on what conforming processors
must do when presented with non-conforming data.  Some say, in effect,
"all bets are off" and impose no constraints on conforming processors
in this situation.  In a Venn diagram showing sets for conforming
processors, conforming data (schemas and schema documents), and valid
data (i.e. valid against the schema), these specs define required
behavior only for areas +++ and ++-.  Other specs require that
processors report or reject non-conforming data, thus also imposing
constraints on areas +-+ and +--.  
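The region labels above can be enumerated mechanically. The following is only an illustrative sketch, not anything the WG wrote: it assumes each label is a triple of signs for (processor conforms, data conforms, data is valid against the schema), and the function names `region` and `constrained_regions` are hypothetical.

```python
from itertools import product


def region(processor_ok, data_ok, valid):
    """Build a label such as '++-' from three booleans (hypothetical helper).

    The order of the signs is assumed to be: processor conforms,
    data conforms, data is valid against the schema.
    """
    return "".join("+" if flag else "-" for flag in (processor_ok, data_ok, valid))


def constrained_regions(must_handle_nonconforming_data):
    """List the regions in which a spec defines required processor behavior."""
    regions = []
    for data_ok, valid in product((True, False), repeat=2):
        # Only conforming processors are constrained at all, so the first
        # sign is always '+'. Non-conforming data is covered only under the
        # second policy (report or reject).
        if data_ok or must_handle_nonconforming_data:
            regions.append(region(True, data_ok, valid))
    return regions


print(constrained_regions(False))  # "all bets are off" policy
print(constrained_regions(True))   # "report or reject" policy
```

Under the first policy this prints `['+++', '++-']`; under the second it prints `['+++', '++-', '+-+', '+--']`, matching the areas named above.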

In some ways, XSDL 1.0 and 1.1 currently take the first view; in
others, they take the second.  

We achieved consensus on instructing the editors to rewrite to follow
the usages just described, and to specify that conforming processors
MUST detect and report errors in schemas and schema documents; they
MAY reject non-conforming data; they MAY recover.

On a side issue: it's clear that a schema document is in error only if
non-conformant.  Is it non-conformant only if it is in error?
Probable consensus around the answer "yes" (the two terms are
extensionally equivalent, and maybe intensionally equivalent).

Comment 2 Noah Mendelsohn 2007-10-16 23:00:41 UTC

For what it's worth, my opinion is:

* Regarding schema documents: We have a definition in the specification of the requirements for a document to be a conforming schema document. I think the editors' draft of 2.4 is fine as it is regarding schema document conformance, and I don't think we should say anything about "errors" there. A document either conforms or it doesn't. Whether that's an error for one purpose or another is mostly beyond the scope of this Recommendation, except for use in building a schema (see next point). So, regarding conformance of documents, I would not task the editors with drafting new text. I think what we have is good.

* Regarding processors working on non-conforming schema documents: I think I'd say something along the lines of: "The rules provided herein for mapping information from an XML document into schema components, i.e. those provided for use in *schema-document aware* processors, apply only to conforming *schema documents*. Accordingly, any attempt to apply such rules or similar rules to non-conforming documents is beyond the scope of this specification.

"Note (this is to be included after the para above): the above is carefully worded to account for the fact that, although schema-document aware processors take at least some of their input in the form of schema documents, they may acquire additional schema information from other sources as well. Thus, although it would be unusual, it is not strictly prohibited for such a processor to acquire schema components from documents that resemble but don't quite conform to the rules for schema documents, or by applying new mapping rules that work around such "errors". What such a processor must not do is claim that such documents are conforming schema documents, and it must not claim to have (successfully) used the mapping rules provided herein to create components from them. In practice, almost all schema-document aware processors will treat as erroneous input documents that are not conforming *schema documents*."

I know that's a bit clumsy, but I think it's a correct explanation of how things work. My hope is that the last sentence clears things up, without unnecessarily claiming that the rules are fixed when in fact they're not. I think the underlying design and layering is good, and indeed one can imagine reasons for building processors that would fix up broken documents and proceed.

In principle, I think you could say something like the following as well, but I suspect it will be confusing to the 98% of readers to whom it is not directed anyway, so probably I'd leave it out. Just in case it motivates anyone to tune it up:

* Regarding errors in born-binary components: "As noted elsewhere, schema processors may use a variety of means to acquire the schema components to be used for validation. The assessment relation requires as a precondition a schema meeting the constraints on components set out herein. Processors may build in such components in their code, and/or may construct them based on various forms of processor-specific runtime input. Whether problems with such input cause the processor to reflect an error, and if so what sort of error, is beyond the scope of this recommendation."

I think I can just barely live with the workgroup's agreement at the F2F as recorded here in comment 1, but I'm not really happy with it. The above more directly reflects how I think things work and how we should explain things. Thank you.

Noah

Comment 3 Michael Kay 2007-10-17 09:06:31 UTC

>I think the editors' draft of 2.4 is fine as it is regarding schema document conformance, and I don't think we should say anything about "errors" there.

I think you are referring to 2.4 in Part 1 (the bug was raised against part 2).

What we established is that the three phrases:

* A schema document (or its author) MUST NOT do X
* A schema document is in error if it does X
* A schema document is not conformant if it does X
 
are all used in the specification, and are to be treated as synonymous. Are you disagreeing with that?

I think it would be useful if we could say it the same way consistently, but agreeing that the three formulations are synonymous is a good start.

(Actually I think there's a fourth usage that creeps in occasionally: there are references to constructs in a schema document being "valid", particularly in relation to the syntax of regular expressions.)

> Whether that's an error for one purpose or another is mostly beyond the scope of this Recommendation,

You seem to be overlooking that Part 2, in particular, makes frequent use of language like: "It is an ·error· for ·minInclusive· to be greater than ·maxInclusive·."

>Regarding processors working on non-conforming schema documents: ... beyond the scope of this specification.

You seem to be suggesting that although the spec is littered with rules like "It is an ·error· for ·minInclusive· to be greater than ·maxInclusive·.", it's perfectly OK for a processor to ignore such rules and not enforce them. I find it hard to believe you can really mean that. It throws interoperability out of the window. We can chuck away half the test suite, which is written to verify that processors reject erroneous schema documents.

But reading your comment more carefully, perhaps what you are suggesting is that there is an animal called a "schema document processor", and that the obligation to report errors in a schema document is placed on a schema document processor, which is not the same thing as a schema processor? This would suggest we need to identify a new entity that can be the subject of conformance statements.

>Regarding errors in born-binary components: 

If we don't define an interface for constructing such beasts, then I think it's meaningless to discuss what happens if the contract for such an interface is broken.

Michael Kay

Comment 4 Noah Mendelsohn 2007-10-19 17:50:07 UTC

Sorry.  Not sure how I missed that this was only on part 2.  I think that's the only important point of confusion here.  So, I think we can ignore my comments, unless someone is also planning to raise similar issues on part 1.  My comments were, erroneously :-), directed at part 1.  Sorry for the confusion.

Noah

Comment 5 C. M. Sperberg-McQueen 2008-02-12 22:53:54 UTC

At last month's face to face meeting, the working group agreed on a
direction for resolving bug 5293 which involves a slight change to
the categorization of errors and to the story outlined here in comment #1.
In connection with that decision, it's clear that the wording does need
to be revisited both in part 1 and in part 2.  So I fear that I must
answer the implicit question in comment #4 by saying "Of course this
change should affect both part 1 and part 2; the alternative is to have
the two specs using fundamental terms like 'must' and 'error' in
pointlessly different ways."

The formulations criticized in the bug description are not quite the
same in Structures and in Datatypes; Structures appears not to contain
any definition of 'error'; the principle that in the presence of errors
all bets are off is enunciated not in a definition of the term 'error'
but in section 5.1.

For what it's worth, I do not understand the remarks about section
2.4 in comment #2.  The proposal appears to be either (a) that there
is no need to talk about the relation of the concept "error" and the
concept "conforming schema document" at all, or (b) that wherever that
relation is clarified, it ought not to be clarified in section 2.4 of
Structures.  Proposition (b) may be true, or may not; either way, it 
seems premature to worry about where the clarification should be made.
Proposition (a) amounts to saying that we should continue to use the
terms "conformance" and "error" without clarifying how they relate to
each other; this seems to me too harebrained an idea to merit discussion.

On the question of allowing processors to correct errors silently
(as long, presumably, as their documentation does not contain any claim
that silent processing amounts to an assurance that the schema documents
processed were OK), the idea seems to run directly counter to the sense 
of the working group when the issue was discussed in Redmond.  It also 
seems to run directly counter to the idea of interoperability among
schema processors.  But there seem to be rather different ideas in the
WG about what the word "interoperability" means.

Comment 6 Noah Mendelsohn 2008-02-19 22:49:13 UTC

Michael Sperberg-McQueen writes:

> For what it's worth, I do not understand the
> remarks about section 2.4 in comment #2.  The
> proposal appears to be either (a) that there is no
> need to talk about the relation of the concept
> "error" and the concept "conforming schema
> document" at all, or (b) that wherever that
> relation is clarified, it ought not to be
> clarified in section 2.4 of Structures.

Closer to (a), certainly not (b).  See below.

> Proposition (b) may be true, or may not; either
> way, it seems premature to worry about where the
> clarification should be made.

Yes, premature and no, not my intention.  Sorry for the confusion.

> Proposition (a) amounts to saying that we should
> continue to use the terms "conformance" and
> "error" without clarifying how they relate to each
> other; this seems to me too harebrained an idea to
> merit discussion.

Well, my real preference is to talk about conformance of schema documents and schemas (i.e. sets of components), and preconditions for producing a PSVI.  I would prefer to get rid of references to the word "error", and having done that, would not have to explain the relationship between "conformance" and "error".

With respect to schema documents, I think it would be fine for us to clearly define conformance criteria, and stop there.  So, with the Recommendation in my left hand and an XML Infoset in my right, I can tell you whether the Infoset is or isn't a conforming schema document.  I see no need to mention the word error.

We also, of course, have a basic treatment of processor conformance in section 2.4 (in both Schema 1.0 and Schema 1.1).  As best I can tell, those definitions also don't appeal to an explicit notion of "error" (though section 5.1 does...see below).  My reading of them is that each of the conformance levels has certain preconditions to be applicable at all.  Implicitly, a minimally conforming processor's operation in producing a PSVI is defined only in the case that the supplied schema meets the constraints on components.  If that isn't stated clearly (and I think it's stated sort of obliquely in both 1.0 and 1.1), then I'd support saying it in so many words.

My point is, I don't think we need to talk about "errors".  I think it's better to say, for processors, the output of conforming processors is defined in the cases where the input meets the following constraints (i.e. there is a schema satisfying the constraints on components and an input Infoset).  In such cases, the PSVI is defined.  In other cases it isn't.

In short, I don't find the word error helpful in our Rec.  I think we have the abstractions we need to tell our story without appealing to that word.  Now, we do of course use the word error in some places in the Recommendation.  I've just searched for all of them, and that search confirms my feeling that in general the abstraction is not adding anything fundamental.  In many cases, I think the term is used more as a general indicator of "something not good" than as something interestingly distinct from, say, input that fails to conform.  Others, such as section 5.1, do use it in a way that's intended to have real force:  

"It is an error if a schema and all the components which are the value of any of its properties, recursively, fail to satisfy all the relevant Constraints on Schemas set out in the last section of each of the subsections of Schema Component Details (§3)."

Yes, but is calling that an error adding anything?  Would it be any less helpful to say:  "The assessment operation, and hence the determination of a PSVI, is defined only with respect to a valid schema, i.e. one in which the components comprising the values of its properties, recursively, satisfy all the relevant Constraints on Schemas set out in the last section of each of the subsections of Schema Component Details (§3)"?

Indeed, my preference would be to go through the Recommendation, looking for occurrences of the word "error", and in all or most cases, replacing them with discussions of required preconditions, conformance criteria, etc.  So, rather than explaining the relation of errors to conformance, I would try instead to make sure that we've explained conformance criteria for documents, preconditions for computing a PSVI, etc. 

That said, I'm mentioning this mainly because it's come up.  I've lived with the Schema 1.0 approach to "errors" all these years, and while I'm glad to share my opinion on how I'd fix this, I expect I can live with many compromise alternatives.  You seemed confused about what I was trying to say regarding 2.4;  I hope this clarifies it.

Noah

Comment 7 Michael Kay 2008-02-20 10:04:03 UTC

I have a personal preference for only talking about conformance in a conformance section, and using other language ("must", "error", "constraint") elsewhere. Rather than saying "A schema document is not conformant with this specification if maxOccurs is less than minOccurs", I prefer formulations like:

* minOccurs must be less than maxOccurs

* It is an error if minOccurs is not less than maxOccurs

* It is a constraint that minOccurs must be less than maxOccurs

and then have a conformance section that says errors must be reported or constraints must be enforced.

I don't have a strong feeling about the relative merits of must/error/constraint, but I think the three formulations should be equivalent - and I think the notion of "errors" is one that is familiar to many readers.

(Clearly the point about not requiring all errors to be reported is legitimate - all we should actually require for conformance is a boolean outcome that indicates whether a schema document is or is not error-free.)

A reminder about the original point of this bug report: the spec currently says that a processor can do anything it likes if minOccurs > maxOccurs, and I don't think that's acceptable.
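To make the "boolean outcome" idea concrete, here is a minimal sketch under stated assumptions. The `Particle` class and both function names are hypothetical and come from no schema processor; the sketch checks only the one rule used as the running example here (minOccurs > maxOccurs is an error), with `None` standing in for maxOccurs="unbounded".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Particle:
    """Hypothetical stand-in for the occurrence range of a particle."""
    min_occurs: int
    max_occurs: Optional[int]  # None models maxOccurs="unbounded"


def occurrence_errors(p: Particle) -> List[str]:
    """Return a (possibly empty) list of error messages for one particle."""
    errors = []
    if p.max_occurs is not None and p.min_occurs > p.max_occurs:
        errors.append(
            f"minOccurs ({p.min_occurs}) is greater than maxOccurs ({p.max_occurs})"
        )
    return errors


def is_error_free(particles: List[Particle]) -> bool:
    """The boolean outcome: True only if no particle violates the rule."""
    return not any(occurrence_errors(p) for p in particles)


print(is_error_free([Particle(1, 1), Particle(0, None)]))  # no violation
print(is_error_free([Particle(3, 2)]))                     # minOccurs > maxOccurs
```

The detailed messages are optional extras in this sketch; only the result of `is_error_free` corresponds to the outcome that conformance would actually require.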

Comment 8 Noah Mendelsohn 2008-02-20 16:56:21 UTC

> I have a personal preference for only talking about conformance in a
> conformance section, and using other language ("must", "error", "constraint")
> elsewhere. Rather than saying "A schema document is not conformant with this
> specification if maxOccurs is less than minOccurs", 

FWIW, I would have said it as either:

"To be a conforming *schema document*, maxOccurs must be greater than or equal to minOccurs".

Or in the conformance section:  "A conforming *schema document* must obey all the individual constraints on schema documents", and (presumably elsewhere) "Schema Document Constraint: maxOccurs MUST be greater than or equal to minOccurs"

I thought our status quo was pretty close in spirit to that second approach.  

In short, I don't want to say that "documents conform to our Recommendation".  I think I do want to say that a particular document does (or doesn't) conform to our rules for *schema documents*, or more briefly "this is/isn't a conforming *schema document*".

> I prefer formulations like:
> 
> [...]
> 
> * It is an error if minOccurs is not less than maxOccurs
> 
> [...]

> and then have a conformance section that says errors must be reported or
> constraints must be enforced.

Question:  don't you have to go beyond saying that "they must be enforced"?  Don't you have to say "for a document to meet the definition of *schema document*, or if you prefer, to be a conforming *schema document*, there must be no errors such as the one mentioned above"?  How would you make such a connection, or would you?  I'm really reluctant to lose the notion that to be a (termref) *schema document*, you must obey rules like this.  Thanks.

Noah

Comment 9 C. M. Sperberg-McQueen 2008-03-22 14:46:10 UTC

Wording proposals for Structures and Datatypes intended to resolve this
issue went to the XML Schema WG 21 March 2008:

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b3220.html
  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3220.html
  (member-only links)

Comment 10 C. M. Sperberg-McQueen 2008-04-28 14:38:31 UTC

The editors have pulled this wording proposal back for some more
revision before it goes to the WG again.

Comment 11 C. M. Sperberg-McQueen 2008-05-08 01:44:14 UTC

Revised forms of the wording proposals for this issue are now on the
server at the same locations as given in comment #9.

Comment 12 Noah Mendelsohn 2008-05-13 20:45:01 UTC

Since I expressed some concern with the draft text on error reporting, our chair has encouraged me to try and offer an alternative that would be more acceptable, at least to me, and I hope to others.  I also went back to the minutes of the Redmond meeting to see if I could unravel the source of the confusion on this, and I think I have.

Those minutes [1] say:

> We identified two questions we needed to answer:

>    1. What is the relation of must, conformance, and error?
>    2. What is the obligation of a processor if data do not conform?
>           * must, or may, or should, or ..., reject the data
>           * must, or may, or should, or ..., detect and report errors

> We seemed to have consensus on the answers:

>    1. If a processor or data must do X, then a processor or
>       data which does not do X is non-conformant (and conversely
>       all conformant or conforming processors and/or data do X).
>       Also there is an error if and only if some data are non-conforming.
>   2.  The obligation of a processor, if the data do not conform,
>       is to detect and report errors.

FWIW, I believe that this was discussed after I had to leave for the airport.  I'm not sure how much emphasis to put on the record that we ">seem to have< consensus" as opposed to the more typical "we >have< consensus".  Anyway, I assume that there was more or less consensus among the people in the room, but unfortunately, I would not have agreed were I there.  I leave it to our chair to decide whether my concerns should be addressed, or whether given the time pressure to ship we need to stick with the direction suggested above.

In case the chair decides that it's indeed worth some time to try and satisfy my concerns, my preferred answers to the questions would have been:

1) Fine as proposed, I think.
2) Suggested spec text:  "A conforming processor MUST NOT produce output that incorrectly indicates that nonconforming data is conforming (e.g. accepting as a valid schema document input that does not meet the pertinent constraints).  Conforming processors MUST also observe the requirements stated elsewhere in this Recommendation that certain operations, such as assessment, have as preconditions input schemas, schema documents, etc. that are conforming.  When provided with input that is not conforming, processors MAY (and indeed in most cases SHOULD) report that conformance error(s) have occurred."

I suggest that (2) replace the 2nd paragraph under "Error" (wordsmithing welcome).  Again, I do acknowledge that this proposed text does not reflect the consensus that "seemed" to be reached in Redmond, but it does reflect my preferences.  I hope this is helpful in coming to a resolution.

Thank you.

Noah


[1] http://www.w3.org/XML/Group/2007/10/xml-schema-ftf-minutes#b3220

Comment 13 C. M. Sperberg-McQueen 2008-06-07 01:00:43 UTC

The wording proposals mentioned in comment #11 were discussed and adopted
without amendment on today's WG call.  The alternative text proposed in
comment #12 was also discussed, but was not adopted.  (Since it mattered
to some WG members, let it be recorded here that the WG takes the 
requirement imposed by the wording adopted to be that conforming
processors distinguish failure of an attempted assessment due to error
in the input (whether schema or schema document) from other causes of
failure (e.g. resource problems); the text is not intended to imply, and
the WG believes it does not imply, that any particular level of granularity
in the identification of errors or their locus be achieved.  We do
provide a concrete list of error codes, but they are recommendations, not
conformance requirements.)
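
As an illustration only of the distinction described here, the following sketch separates failure due to error in the input from failure due to other causes. It is an assumption-laden toy, not the adopted wording or any processor's API: the exception classes, function names, and the `memory_available` flag are all hypothetical.

```python
class InputError(Exception):
    """Hypothetical: the schema or schema document is in error."""


class ResourceFailure(Exception):
    """Hypothetical: failure unrelated to the input (e.g. resource problems)."""


def attempt_assessment(schema_is_error_free, memory_available=True):
    """Toy assessment that fails in one of two distinguishable ways."""
    if not memory_available:
        raise ResourceFailure("out of memory")
    if not schema_is_error_free:
        raise InputError("schema or schema document is in error")
    return "assessed"


def outcome(schema_is_error_free, memory_available=True):
    """Report which kind of failure occurred, with no finer granularity."""
    try:
        return attempt_assessment(schema_is_error_free, memory_available)
    except InputError:
        return "failed: error in input"
    except ResourceFailure:
        return "failed: other cause"


print(outcome(True))
print(outcome(False))
print(outcome(True, memory_available=False))
```

Note that the sketch identifies neither which error occurred nor where; it only keeps the two causes of failure apart.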

We believe that this resolves the issue, and I am marking this record
accordingly.

Michael, as the originator of the issue and as our contact with the QT
Working Groups, would you please convey to them the XML Schema WG's
resolution of the issue, and signal their satisfaction with the
disposition of the comment by closing the issue, or in case of their
dissatisfaction let us know what is wrong, and reopen the issue?  
Thank you.  If we don't hear from you in the next week, we will assume
that silence implies consent and that the issue is resolved satisfactorily
enough not to stand in the way of progressing the specification.