On validation from Henri Sivonen on 2009-06-15 (public-html@w3.org from June 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 15 Jun 2009 15:09:50 +0300
To: HTMLWG WG <public-html@w3.org>
Message-Id: <5D1706DF-0909-418F-8592-24535A5C19A1@iki.fi>

On Jun 13, 2009, at 00:08, John Foliot wrote:

> Which brings *me* back to my ongoing question: why should we care about
> validity (conformance)?

You should care about running a validator, because it points out  
mistakes you didn't intend to make and, thus, helps you find mistakes  
that would otherwise be slower (i.e. more expensive) to spot.

> It makes the discussion surrounding @summary et al
> moot: if I continue to use @summary in an HTML5 document it's
> non-conforming.  So what?  It works for my intended audience, and that
> trumps some ideal of conformance that seems to be almost meaningless in
> practice.  I get that it is "bad", but what does "bad" get me (vs. what
> being "good" will get me)?

*If* @summary were a total waste for authors, the good you'd get is an  
indication not to bother adding the wasteful stuff. (Note that here  
I'm not taking a stance on whether it is totally wasteful or whether  
the authoring conformance definition is the best way to signal to  
authors that a legacy feature is wasteful.)

On Jun 13, 2009, at 00:59, Shelley Powers wrote:

> Returns us full circle back to what I thought was a compelling point
> about Sam's font option list: are there penalties for using a
> non-conforming attribute or element in HTML5?

The penalty for using a non-conforming attribute depends on what kind  
of non-conforming attribute it is.

If the attribute has never been in an HTML spec and has never been  
implemented in browsers, the penalties are:

  1) The attribute is useless without a related program (client-side  
script or special-purpose non-browser client), so unless you wrote the  
attribute for a related program, you probably made a typo and will  
waste some time trying to figure out why whatever the non-typoed  
attribute would do isn't happening.

  2) If you did write a non-conforming attribute intentionally for a  
related program to consume and the attribute doesn't get popular on  
the public Web, a future level of HTML may use that attribute for  
something and break your pages. (If the attribute becomes popular on  
the public Web first, the definers of a future level of HTML probably  
notice it in Web crawls and avoid it even though it was in principle  
reserved for future levels of HTML.)
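
A minimal sketch of how a checker catches the typo case from point 1,
assuming a deliberately tiny, hypothetical allowlist (ALLOWED_ATTRIBUTES is
invented for illustration and is not taken from the spec or from any real
validator). Any attribute that is not on the list for its element gets
reported:

```python
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; a real checker would
# derive the permitted attributes from the specification.
ALLOWED_ATTRIBUTES = {
    "a": {"href", "title"},
    "img": {"src", "alt"},
}


class AttributeLint(HTMLParser):
    """Collects a message for every attribute missing from the allowlist."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED_ATTRIBUTES.get(tag, set())
        for name, _value in attrs:
            if name not in allowed:
                line, _column = self.getpos()
                self.messages.append(
                    f"line {line}: attribute '{name}' not allowed on <{tag}>"
                )


def lint_attributes(markup):
    """Return the list of messages for the given markup string."""
    parser = AttributeLint()
    parser.feed(markup)
    parser.close()
    return parser.messages


print(lint_attributes('<a hreff="/x">link</a>'))
```

The misspelled hreff is reported immediately, which is cheaper than
noticing later that the link does nothing.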

If the attribute has been a unilateral extension of one browser  
vendor, the penalty is that your page doesn't work in software from  
other vendors who haven't cloned the unilateral extension.

If the attribute has previously been in HTML but hasn't been  
implemented in browsers despite ample time, the penalty of using the  
attribute is that it doesn't work.

If the attribute has previously been in HTML and has been implemented
in browsers, the penalty of using it is that you or your users suffer
the badness that the definers of HTML5 tried to spare you or your
users from.

Penalties for elements are similar, although it's easier to shoot  
oneself in the foot with the parsing of unknown elements.

On Jun 13, 2009, at 01:22, Ian Hickson wrote:

> On Tue, 2 Jun 2009, John Foliot wrote:
>> What, in practical terms, will it achieve - how will it modify author
>> behavior?
>
> It's not intended to modify author behaviour, it's intended to help
> authors stay within safe boundaries.

How is making @summary or <font color=blue> non-conforming not  
intended to modify author behavior? If authors otherwise wouldn't stay  
within boundaries that you consider 'safe', isn't making/helping them  
stay within those boundaries modifying their behavior?

>> If there is not a significant penalty attached to non-conformant code,
>> why bother?
>
> By sticking only to conforming content, authors get the following
> benefits (to name but a few):
>
> * More likely to have their content be accessible. For example, authors
>   will get notified when they use features like <font color=""> instead
>   of features like <h1>.

That holds if the alternative is <h1>. If Sam's XSLT turns <font> into
style='', I don't believe accessibility got helped at all.

> * More likely to avoid unfortunate behaviour in tools. For example, by
>   making <i>p<b>q</i>r</b> non-conforming, we help authors who check
>   conformance avoid the cloning parsing behaviour that this triggers,
>   thus helping authors write pages that use less memory.

I agree.
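
A rough sketch of how a checker might flag the misnesting in that example,
assuming a simple stack of open formatting elements. The FORMATTING set and
the message wording are illustrative assumptions; this is not the HTML5 tree
builder and it does not reproduce the cloning behaviour itself:

```python
from html.parser import HTMLParser

# Illustrative subset of formatting elements; an assumption, not a spec list.
FORMATTING = {"b", "i", "em", "strong", "u"}


class NestingLint(HTMLParser):
    """Reports formatting end tags that close out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.messages = []

    def handle_starttag(self, tag, attrs):
        if tag in FORMATTING:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in FORMATTING:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif tag in self.open_tags:
            # The element being closed is not the innermost open one.
            self.messages.append(
                f"</{tag}> closed while <{self.open_tags[-1]}> is still open"
            )
            # Drop the most recent occurrence of the tag from the stack.
            index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[index]
        else:
            self.messages.append(f"stray </{tag}>")


def check_nesting(markup):
    """Return misnesting messages for the given markup string."""
    parser = NestingLint()
    parser.feed(markup)
    parser.close()
    return parser.messages


print(check_nesting("<i>p<b>q</i>r</b>"))
```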

> * More likely to avoid making authoring mistakes that result in different
>   behaviour than they intended. For example, by making "&foo="
>   non-conforming, authors that care about conformance are less likely to
>   accidentally write "&copy=" at some future point (which has a very
>   different meaning).

I agree.
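
The difference between the two strings can be seen with Python's standard
html.unescape, used here only as a convenient stand-in for text-content
decoding. One caveat: html.unescape does not apply the special rule the
HTML parsing algorithm has for semicolon-less references followed by "="
inside attribute values, so this sketch illustrates the hazard in text
content rather than in every context. The decode_text name is hypothetical:

```python
import html


def decode_text(text):
    """Decode character references the way html.unescape does."""
    return html.unescape(text)


# "foo" is not a named character reference, so the text is left alone.
print(decode_text("?a=1&foo=2"))   # ?a=1&foo=2

# "copy" is a legacy name that decodes even without a semicolon.
print(decode_text("?a=1&copy=2"))  # ?a=1©=2
```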

> * More likely to avoid hitting areas of the language that will change
>   meaning in future versions. For example, by making <color>
>   non-conforming, more authors will avoid using that element, thus if we
>   later introduce such an element, we will break fewer pages.

I agree, provided that they are mostly alone with the mistake so that  
the mistake isn't a legacy cowpath showing up in the Google index by  
the time a future revision of the HTML spec is made.

> * More likely to catch flat-out errors, such as having overlapping cells
>   in tables.

I agree.
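
For illustration, overlapping cells can be detected by assigning each cell
to grid slots and noticing when a slot is claimed twice. The sketch below
assumes a simplified model in which each row is a list of hypothetical
(label, rowspan, colspan) tuples; it ignores rowspan=0, row groups and the
other details of the real table model:

```python
def find_overlaps(rows):
    """Return (cell, earlier_cell, slot) for every doubly claimed slot."""
    occupied = {}  # (x, y) -> label of the cell that owns the slot
    overlaps = []
    for y, row in enumerate(rows):
        x = 0
        for label, rowspan, colspan in row:
            # Skip slots already taken by cells spanning down from above.
            while (x, y) in occupied:
                x += 1
            for dy in range(rowspan):
                for dx in range(colspan):
                    slot = (x + dx, y + dy)
                    if slot in occupied:
                        overlaps.append((label, occupied[slot], slot))
                    else:
                        occupied[slot] = label
            x += colspan
    return overlaps


# B spans two rows in column 1; C spans two columns in row 1 and hits B.
table = [
    [("A", 1, 1), ("B", 2, 1)],
    [("C", 1, 2)],
]
print(find_overlaps(table))
```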

On Jun 13, 2009, at 01:25, Rob Sayre wrote:

> On 6/12/09 6:22 PM, Ian Hickson wrote:
>> On Tue, 2 Jun 2009, John Foliot wrote:
>>> I pose a serious question: what is the real benefit of making
>>> unescaped ampersands non-conformant? (Of making anything
>>> "non-conformant"?)
>> It defines what QA tools like conformance checkers should highlight  
>> as problems, as an aid to authors who wish to catch mistakes they  
>> did not intend. That's it.
> That's called a lint tool. You don't understand what MUST means.


If the behavior is normative, authors can substitute one lint/validator
for another, so they aren't locked into one tool provider once they have
authored a lot of content that passes a given lint/validator. More
importantly, if tools that generate markup and lint/validators have a
shared notion of what's OK, authors are more easily able to acquire
conforming markup generators whose output doesn't result in a lot of
messages from a conforming lint/validator. That is, there's no need to
shop for a markup generator that interoperates (as in "doesn't fill your
screen with errors") with a specific lint/validator implementation.

On Jun 13, 2009, at 01:26, Jonas Sicking wrote:

> As I also work with Henri Sivonen, I happen to know that the validator
> that he is writing (and I believe have been contracted by W3C to
> maintain)

The W3C uses software that I've written in the HTML5 back end of the  
W3C Validator, but I don't have a maintenance contract with the W3C.  
The W3C uses the Mozilla-funded HTML5 validator software as permitted  
by Open Source licensing.

On Jun 13, 2009, at 01:46, John Foliot wrote:

> And so no gold star for me.

FWIW, Validator.nu has never issued gold stars or badges of any kind.

> Fact: using ARIA attributes today creates non-conformant (non-validating)
> documents,

If you use the HTML doctype (<!DOCTYPE html>), both Validator.nu and
the W3C Validator permit ARIA markup even though the documents aren't
conforming per the HTML 5 spec yet.

On Jun 14, 2009, at 14:31, Sam Ruby wrote:

> Also: it doesn't completely need to go away.  The current document  
> says MUST in places where at best it means SHOULD (at least in the  
> RFC 2119 sense of the word "there may exist valid reasons in  
> particular circumstances to ignore a particular item, but the full  
> implications must be understood and carefully weighed before  
> choosing a different course.")

If you push things hard enough, you can argue that nothing on the  
authoring side is totally MUST. I think the discussion on MUST vs.  
SHOULD isn't particularly useful there. A validator developer still  
needs to map MUSTs and SHOULDs to errors, so if you get rid of all  
MUSTs and use SHOULDs, validators will complain about SHOULDs instead.  
In fact, when HTML 5 delegates something to RFCs, Validator.nu already  
treats e.g. URI-related SHOULD violations as errors. But then, there are
other RFC SHOULDs that are simply too impractical to treat as errors.

I don't think the MUST vs. SHOULD discussion solves any problems--it  
just makes some of them Someone Else's Problems. (Often mine in the  
case of HTML5.)

> Alternately, the current document contains text that may ultimately  
> be split out.  If the authoring conformance requirements were split  
> out into what the IETF calls a "Best Current Practices" document,  
> those interested in those discussions could proceed separately.


I'll try to avoid repeating my earlier remarks on the topic of such a  
split, so this goes into title bikeshedding: I find both "Best  
Practices" and "Guidelines" to be weasel words when writing stuff that  
is phrased as requirements. Two reasons:

  1) Saying that they are 'only' guidelines or best practices allows  
the writers of such documents to be overly strict, because they can
always go "well, we didn't mean it that seriously, these are only  
guidelines or *best* practices" but others will still end up taking  
the requirements as hard requirements.

  2) Saying that something is "best practice" when you are writing new  
stuff instead of actually analyzing existing practice seems less  
honest than just claiming the authority to write new rules.

On Jun 15, 2009, at 01:26, Shelley Powers wrote:

> For FONT, there is no author conformance requirements, because it
> looks like this element is either missing, or has been pulled.

No, silence means it's not allowed. The spec is also silent on the  
specific authoring requirements on  
<supercalifragilisticexpialidocious>, which means that it's not allowed.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 15 June 2009 12:10:30 UTC