Re: [ANN] W3C Markup Validator 0.6.5 Beta #1 - "Zeldman Made Us Do It!" - Is Released!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:

>On Thu, 28 Aug 2003, Terje Bless wrote:
>>-----BEGIN PGP SIGNED MESSAGE-----
>
>Please don't. You gain nothing but confusion among some people.

Oh? I have some specific reasons for PGP signing my messages, but perhaps if
you elaborated a bit on what you mean I would have reason to reconsider?


>><http://validator.w3.org:8001/>
>
>It claims that my main page http://www.cs.tut.fi/~jkorpela/ is not
>valid, claiming that h1 is not allowed without a preceding <body>.
>
>It also claims that some end tags must not be omitted because "OMITTAG
>NO was specified".
>
>This is undescribably absurd. It ain't no 1st of April!

Well, as others have explained, you are seeing the effects of the new «Fussy
Parsing» mode. Given that I know you are familiar with the concepts, I take it
your complaint is rather that the result was misleading in the test case you
attempted; correct?

In response, let me summarize the points brought forward by Nick, Jim, and
Olivier:

* The new «Fussy Parsing» mode is an optional additional check.
  It is the beginnings of adding back the equivalent of the «Weblint»
  option that used to be available, but which had to be removed because
  it was too badly out of touch with reality.

* That the «Fussy Parsing» is enabled quietly and by default when
  submitting an URL from the front page is a mistake (a bug).
  Note that this only affects URLs submitted through the form on
  the front page of the Beta; «/check?uri=referer» is not affected
  and on the «Advanced Form» the option is visible and can be disabled.

* At some suitable point during the beta test — most likely coinciding
  with other updates and bug fixes — the relevant checkbox will be made
  visible, but still enabled by default to garner feedback and wider
  testing of this new feature. By the time we go to final release, it
  is likely that it will be disabled by default and possibly relegated
  to only the «Advanced Form». This is exactly what beta testing is for:
  to determine the exact disposition of this type of change.

* The presentation of the results when «Fussy Parsing» is enabled
  appears to be unsatisfactory. This will be rectified to the
  extent possible before release. If you have suggestions for how
  to improve it they would be most welcome.

And as a final note, the «Fussy Parsing» did *exactly* what it was intended to
do with your page! It told you that you had omitted the start and end tags for
the «body» element. I'm sure you were aware of this and had quite possibly
omitted them on purpose, but then you're hardly the target of this new
feature. As the release notes said: it's not a parse mode for fussy people,
it's a parse mode that is fussy so that you don't have to be!

It is specifically intended to make the validator more useful for the general
population of users, but _without_ compromising its objectivity. The «Fussy
Parsing» mode is implemented by fiddling with the effective SGML Declaration
on the fly — as opposed to some regex or heuristic hackery — to account for
the fact that the original SGML Declaration is badly out of touch with common
implementations.


If anyone should care to dig deeper into this, the «Fussy Parsing» mode is
implemented by passing additional warning options to OpenSP, our SGML Parser.
The specific options in use are:

* unclosed
  Warn about unclosed start and end-tags.

* empty
  Warn about empty start and end-tags.

* net
  Warn about net-enabling start-tags and null end-tags.

* refc
  Warn about omitted refc delimiters.

* data-delim
  Warn about occurrences of `<' and `&' as data.

* missing-att-name
  Warn about omitted attribute names in start tags.

* fully-tagged
  Warn if the document instance fails to be fully-tagged.  This has the
  effect of changing the SGML declaration to specify DATATAG NO, RANK NO,
  OMITTAG NO, SHORTTAG STARTTAG EMPTY NO and SHORTTAG ATTRIB OMITNAME NO.


In addition, the following options are always enabled for SGML documents:

* valid
  Has the effect of changing the SGML declaration to specify VALIDITY TYPE
  and IMPLYDEF ATTLIST NO ELEMENT NO ENTITY NO NOTATION NO.

* non-sgml-char-ref
  Warn about numeric character references to non-SGML characters.

* no-duplicate 
  Do not warn about duplicate entity declarations. 
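
For illustration only, here is a minimal Python sketch of how the two lists
above could be combined into a parser command line. The option names are the
ones listed above; the function name, the `onsgmls` command name, and the
`-w` prefix for each option are assumptions made for the example, and this is
not the validator's actual code.

```python
# Hypothetical sketch; not taken from the validator's source.
# Option names come from the lists above. The "-w" prefix and the
# "onsgmls" command name are assumptions made for illustration.

# Options that are always enabled for SGML documents.
ALWAYS_ON = ["valid", "non-sgml-char-ref", "no-duplicate"]

# Extra options that make up the "Fussy Parsing" mode.
FUSSY = [
    "unclosed",
    "empty",
    "net",
    "refc",
    "data-delim",
    "missing-att-name",
    "fully-tagged",
]


def build_parser_args(fussy, parser="onsgmls"):
    """Return an argument list: the always-on options, plus the fussy
    options when `fussy` is true, each passed as a separate -w switch."""
    options = ALWAYS_ON + (FUSSY if fussy else [])
    return [parser] + ["-w" + name for name in options]


if __name__ == "__main__":
    print(" ".join(build_parser_args(fussy=True)))
```

Keeping the two lists separate means that disabling «Fussy Parsing» only
drops the extra warnings and leaves the always-enabled options in place.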


I would be happy to discuss the details of each of these options, when and how
to enable them, as well as any other options we should add support for. I am
also open to discussion of whether there should be more fine-grained selection
possible between all of the optional switches or none of them. And as already
mentioned, the user interface to these options and their presentation is still
up for discussion.


>>[…] — «Who the heck *writes* this stuff?!?!» — […]
>
>Sorry, it seems that some virus inserted garbage text into your message.

Oh? Are you referring to the double quote marks and the em-dash? As far as I
know, those should be sent as properly encoded UTF-8. Please let me know if
they were not. Could it be that your email client does not support UTF-8?


>>When the W3C Markup Validator is running in «Fussy Parsing» mode it
>>will complain about all sorts of things that are technically legal in
>>HTML, but which is known to be problematic in practice and probably not
>>what you wanted.
>
>It's not just "complaining". It's claiming that a valid document is
>invalid. And this is apparently intentional. Hence, it does not even try
>to be a validator any more.

That does not follow; neither from the argument nor from the observed
behaviour. Please let us know what we can do to _improve_ the situation (and
I assure you, we care very deeply about making the validator as useful a tool
as it possibly can be!) instead of assuming we are deliberately trying to
sabotage you (well, or using that apparent assumption as a rhetorical device
in any case).

- -- 
Of course we are the good guys! We define what is good and evil. All other
definitions are wrong, and possibly the product of a deranged imagination.
                                                         -- Stephen Harris

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.2

iQA/AwUBP080k6PyPrIkdfXsEQKpjACeMSX1XVlKeU8yoivZNo4QptWrIbgAoKNb
2JHLdJiUpMBieHdngvcqj8Tk
=cpLY
-----END PGP SIGNATURE-----

Received on Friday, 29 August 2003 07:10:15 UTC