[whatwg] On "validation"

 From the spec:
>     The term "validation" specifically refers to a subset of  
> conformance checking that only verifies that a document complies  
> with the requirements given by an SGML or XML DTD. Conformance  
> checkers that only perform validation are non-conforming, as there  
> are many conformance requirements described in this specification  
> that cannot be checked by SGML or XML DTDs.
>
>     To put it another way, there are three types of conformance  
> criteria:
>
>        1. Criteria that can be expressed in a DTD.
>        2. Criteria that cannot be expressed by a DTD, but can still  
> be checked by a machine.
>        3. Criteria that can only be checked by a human.
>
>     A conformance checker must check for the first two. A simple  
> DTD-based validator only checks for the first class of errors and  
> is therefore not a conforming conformance checker according to this  
> specification.

There are three things I don't like about this note:
First, it perpetuates the "Validation means only DTD validation" mantra.
Second, it mentions SGML and XML DTDs casually together.
Third, it can be read to imply that using a DTD as part of a  
conformance checker is a good idea.

In addition to the SGML and XML specifications, there are other  
specifications used in the context of XML that define "valid" in  
the context of each specification and define it to mean something  
other than what is meant in the SGML or XML specifications. RELAX NG,  
Schematron and W3C XML Schema are examples of specifications that use  
the word "valid" as a technical term that does not involve any  
kind of DTD. RELAX NG and Schematron have even made it through ISO.  
(The closest definition of validation in WA1.0/WF2.0 is validation of  
form field values.)

Despite what the W3C Validator has led people to believe, if a data  
object is valid as per SGML, it may not even be well-formed as per  
XML. Since HTML5 is not based on SGML, I think any implication  
that SGML DTDs could in any way be relevant to HTML5 (or XHTML in  
general) should be avoided.

The implication that XML DTDs could be used for partial conformance  
checking is, in my opinion, harmful because:
  * the way DTDs are normally used, which is also the only way  
sanctioned by the XML spec, contaminates the document instance (for  
example, with defaulted attribute values)
  * the document itself can smuggle grammar rules of its own into the  
process through its internal DTD subset
  * DTDs don't support namespaces
  * DTDs are hopelessly inadequate in expressing the conformance  
requirements

Suggested replacement text:

Note: XML DTDs cannot express all the conformance requirements of this  
specification. Therefore, a validating XML processor and a DTD  
cannot constitute a conformance checker. Also, since neither of the  
two authoring formats defined in this specification is an application  
of SGML, a validating SGML system cannot constitute a conformance checker.

- -

Since a large part of HTML5 involves aligning the spec with the  
real world, perhaps the term "HTML5 validation" should be defined to  
mean the same as "HTML5 conformance checking". :-)

When I tell a friend or an acquaintance about my thesis, the  
discussion usually goes more or less like this:

Me: So the working title of my master's thesis is "A Conformance  
Checking Service for Web Applications 1.0 Documents".
Friend: Come again?
Me: "A Conformance Checking Service for Web Applications 1.0 Documents".
Friend: Web 1.0 applications?
Me: "Web Applications 1.0" is the name of a spec. The nickname is  
HTML5, but that's a politically hot name.
Friend: Oooh!

And if the friend is interested enough, it continues like this:

Friend: So what is it that you are doing exactly?
Me: I'm developing a service that takes a document, which means a  
finite sequence of bytes and a Content-Type header, and checks if it  
meets the requirements of the spec.
Friend: So basically you are developing an HTML validator.
Me: Roughly, yes, but it is called a conformance checker.

And then once:

Me: The working title of my thesis is "A Conformance Checking Service  
for Web Applications 1.0 Documents".
***silence***
SemWeb guy: You mean an HTML validator?

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 16 March 2006 08:46:13 UTC