Re: Validation error frequencies

On Jan 31, 2008, at 14:26, Henri Sivonen wrote:

> I ran an analysis on recent error messages from Validator.nu.
> http://hsivonen.iki.fi/test/moz/analysis.txt


I reran the numbers.

> Validator.nu doesn't support <font> but supports style='' on every  
> element.


I added <font>, since it is in the draft, and having the validator
recognize it means that data about its attributes gets logged.

> 0198 / 400	Bad value “Content-Type” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
> 0056 / 400	Bad value “content-type” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
> 0004 / 400	Bad value “Content-type” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
> 0002 / 400	Bad value “content-Type” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.
> 0001 / 400	Bad value “CONTENT-TYPE” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.

I think we should allow the old internal encoding declaration syntax
(<meta http-equiv="Content-Type" content="text/html; charset=...">)
for text/html as an alternative to the more elegant <meta charset>
syntax. Not declaring the encoding is bad, so we shouldn't send a
negative message to authors who do declare it. Moreover, the old
syntax is interoperable.

I think we shouldn't allow this for application/xhtml+xml, though,  
because authors might think it has an effect.
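
A rough sketch of the check I have in mind, in Python purely for
illustration. The function name and parameters are hypothetical (this
is not Validator.nu code), and requiring a charset parameter in the
content attribute is my own assumption:

```python
def legacy_encoding_meta_ok(http_equiv: str, content: str, media_type: str) -> bool:
    """Hypothetical check: is <meta http-equiv=... content=...> conforming?"""
    # Match the keyword case-insensitively, so "Content-Type",
    # "content-type" and "CONTENT-TYPE" from the logs are treated alike.
    if http_equiv.strip().lower() != "content-type":
        return False
    # Only allowed in text/html. In application/xhtml+xml the declaration
    # has no effect, so reporting it keeps authors from being misled.
    if media_type != "text/html":
        return False
    # Assumption: the content value must actually declare a charset.
    return "charset=" in content.lower()
```

Under this sketch, the 198 + 56 + 4 + 2 + 1 case variants above would
all pass for text/html, while the same markup served as XHTML would
still be flagged.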

> 0120 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> WHITESPACE in QUERY.
> 0036 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> DOUBLE_WHITESPACE in QUERY.
> 0042 / 400	Bad value (redacted) for attribute “src” on element “img”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> DOUBLE_WHITESPACE in PATH.
> 0024 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> WHITESPACE in PATH.
> 0019 / 400	Bad value (redacted) for attribute “src” on element “img”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> WHITESPACE in PATH.
> 0019 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> DOUBLE_WHITESPACE in HOST.
> 0012 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> DOUBLE_WHITESPACE in PATH.
> 0007 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> WHITESPACE in FRAGMENT.
> 0003 / 400	Bad value (redacted) for attribute “href” on element  
> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
> reference: WHITESPACE in PATH.
> 0001 / 400	Bad value (redacted) for attribute “src” on element  
> “script” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
> reference: DOUBLE_WHITESPACE in PATH.
> 0001 / 400	Bad value (redacted) for attribute “src” on element  
> “input” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
> reference: WHITESPACE in PATH.
> 0001 / 400	Bad value (redacted) for attribute “src” on element “img”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> WHITESPACE in QUERY.
> 0001 / 400	Bad value (redacted) for attribute “href” on element  
> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
> reference: WHITESPACE in QUERY.
> 0001 / 400	Bad value (redacted) for attribute “href” on element  
> “link” from namespace “http://www.w3.org/1999/xhtml”: Bad IRI  
> reference: WHITESPACE in FRAGMENT.
> 0001 / 400	Bad value (redacted) for attribute “href” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad IRI reference:  
> DOUBLE_WHITESPACE in FRAGMENT.

Wow. The whitespace-in-IRI issues are far more common than I would
have thought. To the extent that U+0020 is harmless and interoperably
handled, we should probably spec a pre-processing step that suppresses
the cases that are harmless in practice.
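
For example, the pre-processing step could percent-encode U+0020 in
the path, query and fragment before the IRI check runs, while leaving
the scheme and authority alone so that whitespace in HOST is still
reported. A minimal Python sketch under those assumptions (the
function name is made up, and whether spaces are really handled
interoperably in each component is the open question):

```python
import re

# Matches an optional scheme ("http:") followed by an optional authority
# ("//host"). Both parts are optional, so this always matches, possibly
# with the empty string.
_SCHEME_AND_AUTHORITY = re.compile(r"^(?:[A-Za-z][A-Za-z0-9+.-]*:)?(?://[^/?#]*)?")


def suppress_harmless_spaces(iri: str) -> str:
    """Hypothetical pre-processing: percent-encode U+0020 outside the host.

    Spaces in the path, query and fragment become %20, so the WHITESPACE
    and DOUBLE_WHITESPACE errors there go away. Spaces in the scheme or
    authority are left for the IRI checker to report. Other whitespace
    (tabs, line breaks) is deliberately left untouched.
    """
    head = _SCHEME_AND_AUTHORITY.match(iri).group(0)
    tail = iri[len(head):]
    return head + tail.replace(" ", "%20")
```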

> 0112 / 400	Attribute “border” not allowed on element “img” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

I think it is safe to conclude that virtually no one who authors Web  
pages likes the Mosaic image border defaults.

It turns out that there are some very common "presentational"  
attributes that were allowed in HTML Transitional and that are used to  
reset bad browser defaults. That is, these attributes are not really  
used for designing pages. For example, people don't usually use the  
border attribute to set a border. Instead, these attributes are given
extreme values (0, or 100% in the case of width on <table>) to zap
arbitrary non-extreme defaults. The border attribute is virtually
always used to get rid of the default border. Sites still use CSS for
their actual designs.

I think we would make migrating existing designs vastly easier, without
giving in to presentational markup too much, if we made the very common
extreme-value resets conforming. That is, in the case of border, I
suggest making the value "0" conforming.
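
To make the idea concrete, here is a hypothetical Python allow-list
covering the resets suggested in this message (border on <img> and
<table>; cellspacing, cellpadding and width on <table>; frameborder on
<iframe>). The names are made up; only the listed value would become
conforming, and any other value would stay an error:

```python
# Hypothetical allow-list: (element, attribute) -> the single extreme
# value that would become conforming.
RESET_VALUES = {
    ("img", "border"): "0",
    ("table", "border"): "0",
    ("table", "cellspacing"): "0",
    ("table", "cellpadding"): "0",
    ("table", "width"): "100%",
    ("iframe", "frameborder"): "0",
}


def is_conforming_reset(element: str, attribute: str, value: str) -> bool:
    """Return True only for an exact extreme-value reset from the list."""
    # Element and attribute names are case-insensitive in text/html.
    expected = RESET_VALUES.get((element.lower(), attribute.lower()))
    return expected is not None and value == expected
```

So border="0" on <img> passes, while border="2" (which actually
designs a border) is still reported.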

In the case of border, I think it would also be good to get rid of the
default border in Gecko, since other browsers have been able to get
rid of it without Breaking the Web.

> 0099 / 400	Attribute “cellspacing” not allowed on element “table”  
> from namespace “http://www.w3.org/1999/xhtml” at this point.

See border. Let's make "0" conforming.

Also, border-spacing doesn't work in IE7, so leaving this to CSS
doesn't work for most authors yet.

> 0095 / 400	Attribute “cellpadding” not allowed on element “table”  
> from namespace “http://www.w3.org/1999/xhtml” at this point.

See border. Let's make "0" conforming.

> 0097 / 400	Legacy doctype.

This means no-quirks mode with a non-HTML5 doctype.

> 0095 / 400	Almost standards mode doctype.

Limited quirks.

> 0026 / 400	Quirky doctype.
> 0062 / 400	Start tag seen without seeing a doctype first.
> 0002 / 400	Non-space characters found without seeing a doctype first.
> 0001 / 400	Bogus doctype.

Quirks.

> 0095 / 400	Attribute “size” not allowed on element “input” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Common, and can't be emulated with CSS 2.1. Let's make this conforming
for the relevant input types.
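
One possible reading of "the relevant input types", as a Python
sketch. The set of text-like types below is my assumption for
illustration, not something the draft currently says, and so is the
requirement that the value be a positive integer:

```python
import re
from typing import Optional

# Assumed set of input types that render as a text field with a width
# measured in characters.
TEXT_LIKE_TYPES = {"text", "search", "url", "tel", "email", "password"}


def size_attribute_ok(input_type: Optional[str], size: str) -> bool:
    """Hypothetical check: would size="..." be conforming on this <input>?"""
    # A missing or empty type attribute means the default type, "text".
    kind = (input_type or "text").strip().lower()
    if kind not in TEXT_LIKE_TYPES:
        return False
    # ASCII digits only, and the value must be greater than zero.
    return re.fullmatch(r"[0-9]+", size) is not None and int(size) > 0
```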

> 0094 / 400	Text after “&” did not match an entity name.

Using a markup-significant character in URLs was a bad design choice,
but it is too late to change it. It would be great if the harmless
cases could be made non-errors without letting stuff like &copy turning
into the copyright sign pass silently.

I don't have a concrete suggestion at this time, though.

> 0092 / 400	Attribute “xml:lang” not allowed on element “html” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0018 / 400	Attribute “lang” not allowed on element “html” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “xml:lang” not allowed on element “q” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “xml:lang” not allowed on element “meta” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “xml:lang” not allowed on element “link” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “xml:lang” not allowed on element “em” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0004 / 400	Attribute “xml:lang” not allowed on element “span” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0004 / 400	Attribute “xml:lang” not allowed on element “a” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “lang” not allowed on element “span” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “lang” not allowed on element “div” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0002 / 400	Attribute “lang” not allowed on element “a” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “li” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “h4” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “h3” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “div” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “blockquote”  
> from namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “xml:lang” not allowed on element “abbr” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “lang” not allowed on element “p” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “lang” not allowed on element “h2” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “lang” not allowed on element “h1” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “lang” not allowed on element “em” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0001 / 400	Attribute “lang” not allowed on element “body” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

It seems that many people have copied XHTML boilerplate, but only a
few docs use xml:lang on non-root elements.

Perhaps we should allow xml:lang as a talisman in text/html if lang is
present and they have the same value. This isn't going to be fun for
libs that map HTML5 to XML, though.
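
The talisman rule could look roughly like this Python sketch. It
assumes the element's attributes are available as a name-to-value
mapping and that the two values are compared case-insensitively
(language tags are case-insensitive); both are assumptions, and the
function name is hypothetical:

```python
from typing import Mapping


def xml_lang_ok_in_text_html(attrs: Mapping[str, str]) -> bool:
    """Hypothetical check: would xml:lang on this element be conforming?"""
    if "xml:lang" not in attrs:
        return True  # nothing to check
    lang = attrs.get("lang")
    if lang is None:
        # In text/html, xml:lang on its own has no effect on the language.
        return False
    # Allowed only as a talisman that agrees with lang.
    return lang.lower() == attrs["xml:lang"].lower()
```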

> 0084 / 400	Attribute “border” not allowed on element “table” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

See <img> border. Let's make "0" conforming.

> 0081 / 400	Attribute “language” not allowed on element “script” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

I think we should allow language="JavaScript" as a talisman. Allowing  
it would clear a lot of noise from validation results of top sites.

I'm less inclined to allow version designators (e.g.
language="JavaScript1.2"), since we probably don't want to condone
legacy mode targeting.
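
In other words, something like the following hypothetical Python
check, where only the bare keyword is accepted and versioned values
are still reported. Case-insensitive matching is my assumption:

```python
def script_language_ok(language: str) -> bool:
    """Hypothetical talisman check for <script language="...">."""
    # Accept the bare keyword in any letter case ("JavaScript",
    # "javascript"). Version designators such as "JavaScript1.2" do not
    # match and stay errors, so legacy-mode targeting is not condoned.
    return language.lower() == "javascript"
```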

> 0077 / 400	Attribute “width” not allowed on element “table” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

See border. Let's make "100%" conforming.

> 0065 / 400	Attribute “width” not allowed on element “td” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Pretty common...

> 0061 / 400	Attribute “valign” not allowed on element “td” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0060 / 400	Attribute “align” not allowed on element “td” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Let's make at least align conforming. Cell alignment often depends on  
content (numbers vs. text) and setting alignment in CSS as needed is a  
pain.

> 0052 / 400	Attribute “name” not allowed on element “a” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Didn't die along with Netscape 4...

> 0042 / 400	Attribute “accesskey” not allowed on element “a” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Surprisingly common.

> 0040 / 400	Last error required non-streamable recovery.

Surprisingly common.

> 0033 / 400	Attribute “height” not allowed on element “td” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Pretty common...

> 0028 / 400	Element “center” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “body” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0004 / 400	Element “center” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “td” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “center” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “noscript”
> from namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.

I was surprised to still find <center> on actively maintained top  
sites. I guess CSS hasn't made centering easy enough.

> 0023 / 400	Attribute “frameborder” not allowed on element “iframe”  
> from namespace “http://www.w3.org/1999/xhtml” at this point.
> 0022 / 400	Attribute “width” not allowed on element “iframe” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0022 / 400	Attribute “height” not allowed on element “iframe” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Let's allow width/height as with other embedded content and allow  
frameborder='0'.

> 0020 / 400	Bad value “_top” for attribute “target” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad browsing context  
> name: Browsing context name started with the underscore and used a  
> reserved keyword “top”.
> 0006 / 400	Bad value “_top” for attribute “target” on element “form”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad browsing context  
> name: Browsing context name started with the underscore and used a  
> reserved keyword “top”.
> 0001 / 400	Bad value “_top” for attribute “target” on element “area”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad browsing context  
> name: Browsing context name started with the underscore and used a  
> reserved keyword “top”.
> 0002 / 400	Bad value “_top” for attribute “target” on element “base”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad browsing context  
> name: Browsing context name started with the underscore and used a  
> reserved keyword “top”.
> 0006 / 400	Bad value “_new” for attribute “target” on element “a”  
> from namespace “http://www.w3.org/1999/xhtml”: Bad browsing context  
> name: Browsing context name started with the underscore and used a  
> reserved keyword “new”.

Perhaps these should be conforming.
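
If they were made conforming, the target check might look like this
Python sketch. It assumes the four reserved keywords (_blank, _self,
_parent, _top) are matched case-insensitively and treats _new as a
hypothetical extra keyword behind a flag; other names starting with an
underscore remain errors:

```python
# Reserved browsing context keywords, matched case-insensitively here.
RESERVED_KEYWORDS = {"_blank", "_self", "_parent", "_top"}
# Hypothetical addition, only relevant if "_new" is made conforming.
PROPOSED_KEYWORDS = {"_new"}


def target_ok(value: str, allow_new: bool = True) -> bool:
    """Hypothetical conformance check for the target attribute."""
    if value == "":
        return False  # an empty name is never valid
    if not value.startswith("_"):
        return True  # an ordinary browsing context name
    keyword = value.lower()
    if keyword in RESERVED_KEYWORDS:
        return True
    return allow_new and keyword in PROPOSED_KEYWORDS
```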

> 0014 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “p” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0004 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “li” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0002 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “q” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0002 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “h2” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “h3” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “div” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “cite” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “acronym” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “article”
> from namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.

Perhaps <acronym> should be allowed but defined as a synonym of  
<abbr>. As deployed, <acronym> isn't exclusively used for what  
dictionaries define as acronyms.

> 0013 / 400	Attribute “classid” not allowed on element “object” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.
> 0012 / 400	Attribute “codebase” not allowed on element “object” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

These messages will only annoy authors...

> 0010 / 400	Element “nobr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “div” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0002 / 400	Element “nobr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “p” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0003 / 400	Element “nobr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “td” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0004 / 400	Element “nobr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “a” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “nobr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “form” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0002 / 400	Element “wbr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “p” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.
> 0001 / 400	Element “wbr” from namespace “http://www.w3.org/1999/xhtml”
> not allowed in this context. (The parent was element “a” from
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further
> errors from this subtree.

Let's make these conforming.

> 0010 / 400	Attribute “name” not allowed on element “map” from  
> namespace “http://www.w3.org/1999/xhtml” at this point.

Let's make this conforming.

> 0009 / 400	reference to undeclared general entity copy

Fun with XML parsers that don't process external entities...

> 0005 / 400	Element “u” from namespace “http://www.w3.org/1999/xhtml”  
> not allowed in this context. (The parent was element “a” from  
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further  
> errors from this subtree.
> 0002 / 400	Element “u” from namespace “http://www.w3.org/1999/xhtml”  
> not allowed in this context. (The parent was element “b” from  
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further  
> errors from this subtree.
> 0001 / 400	Element “u” from namespace “http://www.w3.org/1999/xhtml”  
> not allowed in this context. (The parent was element “span” from  
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further  
> errors from this subtree.
> 0001 / 400	Element “u” from namespace “http://www.w3.org/1999/xhtml”  
> not allowed in this context. (The parent was element “p” from  
> namespace “http://www.w3.org/1999/xhtml”.) Suppressing further  
> errors from this subtree.

Not very common.

> 0005 / 400	Bad value “X-UA-Compatible” for attribute “http-equiv” on  
> element “meta” from namespace “http://www.w3.org/1999/xhtml”.

Hmm.

> 0005 / 400	Bad value “search” for attribute “type” on element  
> “input” from namespace “http://www.w3.org/1999/xhtml”.
> 0001 / 400	Attribute “placeholder” not allowed on element “input”  
> from namespace “http://www.w3.org/1999/xhtml” at this point.


Let's make these conforming.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 31 January 2008 21:51:09 UTC