This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5164 - validation vs assessment
Summary: validation vs assessment
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows XP
Importance: P1 minor
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: terminology cluster
Keywords: editorial, resolved
Depends on:
Blocks:
 
Reported: 2007-10-08 22:03 UTC by John Arwe
Modified: 2010-11-10 17:37 UTC

Description John Arwe 2007-10-08 22:03:09 UTC
It seems like a rite of passage educating people on XML schema that a "valid schema" is not what the naive expectation would be.

I don't see why 1.1 is encouraging this by choosing misleading terms.  If you want folks to learn that validation is not assessment, and usually they are interested in assessment results, then define the right terms.  eg

validation root -> assessment root
type-driven (etc) validation -> type-driven (etc) assessment 

Many of the usages of "validation" I found before the term is defined really seem to be referring to "assessment".  If the specs are not consistent with their usage, it will not be realistic to think the spec consumers will be.
Comment 1 C. M. Sperberg-McQueen 2008-01-24 19:52:30 UTC
Discussion in the XML Schema Working Group seems to indicate that the WG is
divided on whether 'validation' and 'assessment' actually have different
denotations or connotations.

It is true that the value of the [validity] property and the results people
are interested in are not the same, but it's not clear to some in the WG that
that distinction is captured by the distinction between the terms 'validation'
and 'assessment'.

Can you expound a bit more on the distinction you believe the spec should be
making?

[This comment made speaking only for myself, not for the WG.]
Comment 2 David Ezell 2008-01-24 20:14:24 UTC
<MSM> [I lean toward (a) revisiting the usages of the term 'validation', and (b) adding a Note describing the distinction of connotation and pointing out that because the two are so tightly intertwingled, the terms are frequently used interchangeably.]


RESOLUTION: Bug 5164 is marked as "NO FURTHER ACTION".  There is no commitment to make any changes, but editors will investigate improvements relating to "validation" vs. "assessment" terminology on best effort basis.

Comment 3 C. M. Sperberg-McQueen 2008-02-04 16:17:19 UTC
In an effort to make better use of Bugzilla, we are going to use the
'severity' field to classify issues by perceived difficulty.  This 
bug is getting severity=minor to reflect the existing whiteboard note
'easy'. 
Comment 4 John Arwe 2008-02-15 18:58:31 UTC
I disagree with the resolution in comment 2.

wrt comment 1, I don't understand how the wg itself can be divided when the spec itself (1.0 AND 1.1) defines the two terms to mean different things explicitly in 2.1 Overview of XSDL (unless the wg asserts that "validation" is not a derivative of "valid", which I don't buy either):

Schema-validity assessment has two aspects:
1 Determining local schema-validity, that is whether an element or attribute information item satisfies the constraints embodied in the relevant components of an XSDL schema;

2 Synthesizing an overall validation outcome for the item, combining local schema-validity with the results of schema-validity assessments of its descendants, if any, and adding appropriate augmentations to the infoset to record this outcome.

Throughout this specification, [Definition:]  the word valid and its derivatives are used to refer to clause 1 above, the determination of local schema-validity.

Throughout this specification, [Definition:]   the word assessment is used to refer to the overall process of local validation, schema-validity assessment and infoset augmentation.

If you are going to define these terms, I think you are obligated to use them consistently yourselves.  As a reader, it does not appear to me that they are consistently used today.  If the wg believes they are, then it's fair to think my reading is incorrect and during the next review I will have to carefully think about the implications of this on its correctness...could be fun and informative.

Editorially, it might be a simple partial fix to copy "schema validity is not a binary predicate" from 5 to 2.1 where the existing definitions exist.

It is a separate, although very worthwhile, question as to whether or not schema users would benefit from having a common name for the most commonly used assessment results, i.e. from the wg and spec defining a new term (or terms) to capture the most common intent when "generic" users state that an instance document is "valid" [wrt some schema].

e.g. [Definition:]   the word IYFNTH is used to refer to the condition where all of the following are true:
- an instance document's content is assessed against a set of schema components
- the document's root element is the validation root
- the assessment invocation type is element-driven validation
- the validation root has a PSVI [validity] property value of 'valid'
- all descendants of the validation root have a PSVI [validity] value of 'valid' or 'unknown'
- the validation root has a PSVI [validation attempted] property value of 'full' 

I could see for example at least two useful definitions, one allowing for lax wildcards to be missing schema components (above) and one requiring even lax wildcards to have schema components (removing 'unknown').  My set of conditions above might be an incorrect representation of this intent... if so, it just makes my point by example.
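
As a rough illustration only, the conditions listed above could be written as a predicate over a PSVI-like tree. The sketch below is in Python; the Item class, its field names, and the allow_unknown flag are hypothetical and not taken from the spec. It checks only the three [validity] / [validation attempted] conditions and assumes the caller has already arranged the first two (root element as validation root, element-driven invocation). It uses the string 'notKnown' for what the list above calls 'unknown'.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for an element information item with PSVI properties."""
    validity: str                        # 'valid', 'invalid', or 'notKnown'
    validation_attempted: str = "full"   # 'full', 'partial', or 'none'
    children: List["Item"] = field(default_factory=list)


def descendants(item: Item) -> Iterator[Item]:
    """Yield every descendant of item, depth first (item itself excluded)."""
    for child in item.children:
        yield child
        yield from descendants(child)


def is_iyfnth(root: Item, allow_unknown: bool = True) -> bool:
    """Check the [validity] and [validation attempted] conditions listed above.

    allow_unknown=True corresponds to the variant that tolerates missing
    schema components for lax wildcards; False is the stricter variant.
    """
    allowed = {"valid", "notKnown"} if allow_unknown else {"valid"}
    return (
        root.validity == "valid"
        and root.validation_attempted == "full"
        and all(d.validity in allowed for d in descendants(root))
    )
```

Switching allow_unknown gives the two candidate definitions mentioned above without duplicating the rest of the conditions.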

If it is objectionable for some reason to put such definitions in Structures, other venues like a Note would be acceptable to me.  I do see value in the schema wg encouraging that kind of common understanding in practice, even if it is not normative per se.
Comment 5 C. M. Sperberg-McQueen 2008-02-28 21:00:26 UTC
[Again, speaking for myself, not the WG.  I apologize in advance for
the length of this comment.]

Thank you, I think, for inducing me to look at this topic again more
carefully.

I confess that I had mostly regarded "schema-validity assessment" as
merely a term we had invented during the initial work on XSD 1.0, in
order to avoid using the term "validation", since "validation" seemed
at the time to be tied tightly in people's minds with DTDs.  (As I
understood it, the new term also had the beneficial side effect of
replacing a familiar four-syllable word with an unfamiliar
nine-syllable phrase meaning essentially the same thing.)  As time has
gone by, it has become clear that the term "validation" is not now so
tightly connected to DTDs as to be confusing when used in connection
with other schema languages, and I (for one) have simply started
saying "validation" instead of "schema-validity assessment" because
it's shorter and clearer.

At the face to face meeting in Florida last month, the WG declined to
accept the proposition that "validation" and "schema-validity
assessment" should be regarded (and described) as synonyms. Instead,
the WG reaffirmed the view that (if I am reconstructing our thinking
correctly) "validation" is to be narrowly construed as calculating the
[validity] property, while "assessment" is to be understood as the
process which results in the full PSVI.  The term "validation" may
also possibly convey the idea that ONLY the [validity] property of the
validation root is of interest, and that the [validity] of its
descendants is calculated only in the service of coming up with the
result for the root.

I believe that this distinction is essentially the one your bug report
urges the WG to take more seriously and use more consistently in our
wording.

For myself, I continue to have occasional difficulties with this
distinction, since neither [validity] nor the full PSVI can be
calculated without calculating the other -- the full PSVI includes the
[validity] property, and there is nothing in the PSVI (unless I have
forgotten something, in which case I'll fall back to saying "nothing
MUCH in the PSVI") that does not play a role, however indirect, in the
definition, and thus in the correct calculation, of the [validity]
property of the root -- so [validity] cannot be calculated correctly
without incidentally calculating the entire PSVI.

Still, even if the two terms are extensionally equivalent, in that no
one can perform validation without performing assessment, and vice
versa, still they can be distinct in their intension / connotation,
with one focusing on a simple ternary property and the other on a more
elaborate information structure.  (I shall try thinking of
'assessment' as a different way of saying 'annotation', and see
whether that helps.)

Having now spent a few hours looking at the spec and trying to align
its usage with the distinction just outlined, I have begun to fear
that making the spec consistent and clear on this matter doesn't look
likely to be easy.  The more I look at it, the less clear I am on 

 (a) what distinctions section 2.1 is trying to draw, 
 (b) what distinctions are actually made in the usage of the terms 
     in the spec, and 
 (c) what distinctions the spec SHOULD be drawing, in order to have 
     useful terminology, and what its usage SHOULD be.  

Of these, (c) seems the most important, but any discussion of (c) is
going to entail at least some clarification of, or bitter argument
over, (a) and (b).

I have begun to feel unsure, as a reader of the spec, whether the
discussion in 2.1 is trying to define two distinct terms, or three.
The two-term interpretation is the one I've tried to outline above:

  - validation = calculation of the [validity] property, more or
    less equivalent formally to what is done with DTDs and RelaxNG and
    other languages

  - schema-validity assessment (or "assessment" for short) =
    calculation of the full PSVI, thus a process which provides much
    more information than the Boolean or ternary value produced by
    validation
    
In this interpretation, the text's association of "validation"
specifically with LOCAL validity is slightly puzzling but assumed to
be of not great consequence.  (I doubt very much that the text
actually uses 'validation' and related terms ONLY with regard to local
validity -- the name of the [validity] property is one
counter-example, to start with.)  Ditto the odd inclusion of
"schema-validity assessment" as one of three things included in the
definition of "assessment"; if the one term is just a short form of
the other, this looks like a circular definition.

The three-term interpretation takes section 2.1 as trying to
distinguish, and provide terms for, three distinct ideas:

  - validation = calculation of local validity (only); for XSD 1.1
    we can say calculation of the [local validity] property

  - schema-validity assessment = calculation of the [validity]
    property.  Recall that the [validity] of an item is a function of
    both the [local validity] of that item and the [validity] of its
    dependents. Note, then, that schema-validity assessment, so
    defined, entails validation, but not vice versa

  - assessment = validation + schema-validity assessment + infoset
    augmentation.  
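
A minimal sketch, in Python, of the dependency described in the second item: the [validity] of an item computed from its own [local validity] and the [validity] of its dependents. The Node class and function name are hypothetical, and the model is deliberately simplified to two outcomes; it ignores the 'notKnown' value and the strict / lax distinctions that the real rules take into account.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical item carrying only a [local validity] flag and its dependents."""
    locally_valid: bool
    children: List["Node"] = field(default_factory=list)


def validity(node: Node) -> str:
    """Simplified [validity]: 'valid' only if the node is locally valid
    and none of its dependents is 'invalid'."""
    if not node.locally_valid:
        return "invalid"
    if any(validity(child) == "invalid" for child in node.children):
        return "invalid"
    return "valid"
```

Computing validity() necessarily consults locally_valid for the item and everything below it, whereas locally_valid can be read without ever calling validity(); that asymmetry is the "entails validation, but not vice versa" point.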

This interpretation seems closer to what is actually said in 2.1 than
the two-term interpretation, so for purposes of topic (a) I lean
toward it.  But the distinctions drawn and the terminology proposed
seem problematic to me.  (As a member of the WG that produced XSD 1.0,
I am of course jointly responsible with others for what's in 2.1, but
from where I now stand it looks as if I / we didn't do a very good job
here.)

First, since schema-validity assessment entails validation, it seems
odd to list them both as if they were separable.

Second, the augmentation of the input information set is a natural and
unavoidable consequence of either of the first two, so listing
augmentation as a separate item in the definition of 'assessment' also
seems odd.  The attitude toward "infoset augmentation" here looks, in
fact, like a relic of the view (never openly acknowledged but
pervasively smuggled into the text of XSD 1.0) that "the
post-schema-validation infoset" is not a set of information
automatically generated by validation / assessment, but a sort of API
or data structure.  We have done a lot to eliminate this error from
the spec, but there is more work to be done in section 2.1, if we are
to get the definitions of validation and assessment clear.

An example may help make the point clearer.  Consider an element,
validated against a governing type definition in the course of
validation / schema-validity assessment.  In the "infoset as API"
view, information like the identity of that governing type definition
may or may not be part of the PSVI, depending on whether it is or is
not present in the information presented by the validator to its
invoker.  Providing that information augments the set of information
available to the caller.  In the "infoset as set of information" view,
the identity of the governing type definition is always and
necessarily part of the PSVI, because it is a piece of information always
and necessarily present when the element is validated.  Whether that
part of the PSVI is exposed by the validator through an API or through
messages or through a data structure or by other means is relevant to
any description or use of the validator, but not to the definition of
the PSVI.

So I lean toward the view that while the current text of 2.1 is trying
to define three distinct terms for three distinct concepts, both the
distinctions between concepts and the choice of terms for the concepts
are problematic at best.

The central point from which this bug report started -- that the
[validity] property on the root is often NOT what users of XSD will
need to care about -- is a good one, as is the suggestion in comment
#4 that we define IYFNTH or some other term for (a reasonable
approximation of) what people trying to use XSD typically mean when
they say "valid document". And so, for that matter, is the suggestion
that if the XSD spec is going to claim to make a distinction between
the terms "validation" and "assessment", the usage of the words should
reflect that distinction.

I seem once more to be uncertain (1) how to connect the validation
/ assessment distinction to the concept of IYFNTH, and (2) how best
to define and use the terms in the text of the spec.

I'll have to spend more time thinking about this bug.  In the
meantime, comments from John, or from anyone reading the Bugzilla
entry, may be helpful.


Comment 6 C. M. Sperberg-McQueen 2008-05-21 18:44:28 UTC
Since the remaining issue here is essentially a question of
terminology and tone, I am marking it editorial.
Comment 7 Michael Kay 2008-11-02 20:22:32 UTC
I would like to throw another problem into the arena here, namely the word "local" as in "local validity" or "locally valid". It's not really made clear what this means.

2.1 gives the impression that "local validity" is contrasted with "overall validity", where (loosely) an element is locally valid if its attributes and child elements appear where they are allowed to appear (*), and is overall-ly valid if the content of those elements and children is also (locally and overall-ly) valid.

But this distinction isn't carried through. There are many confusing uses. For example, in 3.3.4.4 rule 3.2, we read "If T is a complex type definition, then E is ·valid· with respect to T  as per Element Locally Valid (Complex Type) (§3.4.4.2);". But when we follow the link, we find that 3.4.4.2 doesn't tell us what it means for an element E to be ·valid· with respect to a type T; rather it tells us what it means to be "locally ·valid·" (sic: not ·locally valid·). The reader is expected to know that ·valid· and "locally ·valid·" in this case are synonyms. There are many other such cases.

I think it would help to change many (perhaps most) uses of "·valid·" and "locally ·valid·" to "·locally valid·" with an appropriate definition.

(*) another glitch is that local validity includes checking of assertions, which may involve looking at the content: so it can't be said that local validity depends only on the sequence of element names and the set of attribute names that appear.
Comment 8 David Ezell 2009-02-20 16:47:42 UTC
related to bug 6015
Comment 9 John Arwe 2009-05-08 18:05:13 UTC
from Bug 5167 comment 2

  3 The definition of valid has not been modified; it is hoped that
    the resolution of bug 5164 will in passing repair the unsuitable
    reference to the otherwise anonymous "clause 1".
Comment 10 C. M. Sperberg-McQueen 2009-10-29 18:09:52 UTC
A wording proposal intended to resolve bug 5164 and bug 6015 is now on the server at

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b5164.html
  (member-only link)

Its salient features are:

1 Drops the claim that the spec uses 'valid' only to refer to local validity; this is just not true.

2 Adds a definition of validation that makes clearer the intimate relation between 'validation' and 'assessment'.

3 Points out that in all cases, validation and assessment overlap somewhat operationally, and in many cases the two terms are extensionally equivalent.  Says that when we wish to emphasize the calculation of the [validity] property, we use v-words, and when we wish to emphasize the rich fullness of the PSVI, we use a-words.  And when no particular emphasis is intended one way or the other, the choice is arbitrary and often based on historical accident.  (E.g. "PSVI" contains a v-word, not an a-word.)

4 Changes a few uses of 'validation' to 'assessment' to help make 3 true.

5 Changes all the cases where the spec currently says "X must be valid against Y, as defined in Rhubarb Locally Valid (section 89.23.2)" to say that X must be *locally* valid, since local validity is what the rule referred to actually defines.  These cases (ten or twenty of them) are the cases where the current text actually does use 'valid' to refer to local validity.

6 Provides a definition of 'valid document' (OR: of root-valid document, deep-valid document, and uniformly valid document) for the convenience of working groups who use XSD to define XML vocabularies and want to say "to conform to our spec, your document must be (root|deep|uniformly)? valid against our schema", instead of having to specify the initial conditions for validation and the required results of validation using terminology they often feel uncomfortable with.

In addition, some editorial changes were made. 

One aspect of the proposal that will need discussion is the choice between a single definition of 'valid document' and a set of several definitions which capture different conditions (root element has [validity] = valid, vs. root element has [validity] = valid and no node in the document has [validity] = invalid, vs. every node in the document has [validity]=valid).  And if the latter, the choice of terms must also be considered. 
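
The three candidate conditions can be sketched as predicates over a tree of [validity] values. This is an illustration in Python, not wording from the proposal: the Node class and function names are hypothetical, and pairing the names root-valid, deep-valid, and uniformly valid with the three conditions in the order listed is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical node carrying only its PSVI [validity] value."""
    validity: str                      # 'valid', 'invalid', or 'notKnown'
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def root_valid(root: Node) -> bool:
    # Root element has [validity] = valid.
    return root.validity == "valid"


def deep_valid(root: Node) -> bool:
    # Root is valid and no node in the document has [validity] = invalid.
    return root_valid(root) and all(n.validity != "invalid" for n in walk(root))


def uniformly_valid(root: Node) -> bool:
    # Every node in the document has [validity] = valid.
    return all(n.validity == "valid" for n in walk(root))
```

In this sketch the definitions are nested: every uniformly valid tree is deep-valid, and every deep-valid tree is root-valid, which is one reason the choice among them needs discussion.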
Comment 11 C. M. Sperberg-McQueen 2009-10-30 20:58:37 UTC
The wording proposal mentioned in comment 10 was adopted, with amendments, by the XML Schema WG in its call today.  The amendments had the form of (a) adopting version B of the definitions of document validity, (b) moving them out of section 2.1 into a new section 2.4 on Schema-validity and documents, and (c) making a few smaller changes.  The resulting form of the proposal (after amendment) can be seen (intermingled with the change for bug 7913, which is unrelated but which was also approved today) at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.nsq.html
  (member-only link; content subject to change, though none expected soon)

With this change the WG believes this issue has been resolved.

John, as the originator of the issue, please accept our thanks for holding our feet to the fire to make us clean this up.  Then please consider whether you are satisfied with the WG's consideration and disposition of your comment, and indicate satisfaction by closing the issue or dissatisfaction by reopening it, in the usual way.  If we don't hear from you in the next two weeks we will assume that you are satisfied.
Comment 12 David Ezell 2010-11-10 17:37:08 UTC
The WG reported this bug as FIXED on 2009-10-30.  We are closing this bug
as requiring no further work.  If there are issues remaining, you can reopen
this bug and enter a comment to indicate the problem.  Thanks very much for the
feedback.