HTML5 Authoring Conformance Study

From HTML WG Wiki

This page reviews the main pages of a number of notable sites and reports their conformance both as HTML5 and as their declared doctype. HTML5 conformance errors are also broken down in detail.

Methodology

For each of these sites, validator.nu was used to determine what DOCTYPE is reported for the main page. Based on this, the following tests were applied:

  • http://validator.nu/ was used to check for HTML5 conformance and count errors (but not warnings or info messages). HTML5 validation and parsing mode were forced for pages that are not declared as HTML5.
  • For pages that declared themselves to be something other than HTML5, http://validator.w3.org/ was used to validate them as their declared type.
  • For pages with an XHTML doctype, http://validator.nu/ was used to check for XHTML5 conformance. XHTML5 validation mode and XML parsing mode were forced, and the "lax about content types" checkbox was checked.
  • HTML5 conformance errors as reported by validator.nu were classified through manual review. The data at http://intertwingly.net/blog/2010/03/20/Authoring-Conformance-Requirements served as preliminary input, along with a custom copy of the scripts used to generate that automatic classification. The Conformance Study Rules set is available for review.

Summary Data

Doctype Distribution

All Sites

  • HTML5 - 11
  • XHTML 1.0 Transitional - 10
  • HTML 4.01 Transitional - 7
  • XHTML 1.0 Strict - 6
  • HTML 4.01 Strict - 3
  • none - 2

Alexa Sites Only

  • HTML5 - 7
  • XHTML 1.0 Transitional - 7
  • HTML 4.01 Transitional - 7
  • XHTML 1.0 Strict - 3
  • HTML 4.01 Strict - 3
  • none - 2
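As a cross-check, the doctype tallies above can be recomputed from the per-site sections of this page. The Python sketch below does that. The two dictionaries are transcribed from the site entries. "Alexa sites" here means the Alexa top 20 plus the notable Alexa sites outside the top 20. The other group is the organizations involved with HTML5.

```python
from collections import Counter

# Declared doctypes, transcribed from the per-site sections of this page.
ALEXA_SITES = {
    # Alexa top 10
    "google.com": "HTML5", "facebook.com": "XHTML 1.0 Strict",
    "yahoo.com": "HTML 4.01 Strict", "youtube.com": "HTML 4.01 Transitional",
    "live.com": "XHTML 1.0 Transitional", "wikipedia.org": "HTML5",
    "blogger.com": "HTML 4.01 Strict", "baidu.com": "HTML5",
    "msn.com": "XHTML 1.0 Strict", "qq.com": "XHTML 1.0 Transitional",
    # Alexa top 11-20
    "yahoo.co.jp": "HTML 4.01 Transitional", "twitter.com": "XHTML 1.0 Strict",
    "google.co.in": "HTML5", "sina.com.cn": "XHTML 1.0 Transitional",
    "google.cn": "none", "google.de": "HTML5",
    "wordpress.com": "XHTML 1.0 Transitional", "myspace.com": "XHTML 1.0 Transitional",
    "microsoft.com": "XHTML 1.0 Transitional", "google.co.uk": "HTML5",
    # Notable Alexa sites outside the top 20
    "amazon.com": "none", "bing.com": "XHTML 1.0 Transitional",
    "ebay.com": "HTML 4.01 Transitional", "linkedin.com": "HTML5",
    "flickr.com": "HTML 4.01 Strict", "craigslist.org": "HTML 4.01 Transitional",
    "cnn.com": "HTML 4.01 Transitional", "orkut.com": "HTML 4.01 Transitional",
    "nytimes.com": "HTML 4.01 Transitional",
}
# Notable organizations involved with HTML5 (not part of the Alexa lists).
ORG_SITES = {
    "apache.org": "XHTML 1.0 Transitional", "apple.com": "HTML5",
    "adobe.com": "XHTML 1.0 Transitional", "chromium.org": "XHTML 1.0 Transitional",
    "ibm.com": "XHTML 1.0 Strict", "mozilla.com": "HTML5",
    "opera.com": "XHTML 1.0 Strict", "w3.org": "XHTML 1.0 Strict",
    "webkit.org": "HTML5", "whatwg.org": "HTML5",
}

alexa_counts = Counter(ALEXA_SITES.values())
all_counts = alexa_counts + Counter(ORG_SITES.values())

for label, counts in (("All Sites", all_counts), ("Alexa Sites Only", alexa_counts)):
    print(label)
    for doctype, n in counts.most_common():
        print(f"  {doctype} - {n}")
```

The recomputed totals match both lists above: 39 sites in all, 29 of them from Alexa.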

Conclusions and Bugs Filed

Bugs Filed Against Spec

Bugs Filed Against Validator

Conformance Errors that Seem Justified

  • duplicate id (highly likely to cause problems down the road, if not causing them already)
  • bad attribute syntax - clearly an authoring error in context
  • stray close tag - likely indicates an authoring error where the result is not what the author intended
  • Misnested tags - results in bad performance, unexpected DOM, incompatibility with streaming or limited error handling UAs. Results in non-interoperable behavior in legacy UAs.
  • charset attribute on inline <script> - has no effect, possibly contrary to author expectations
  • inline script defer - doesn't work as expected, makes no sense
  • Encoding error
  • Missing space in doctype - likely hazard for interop with legacy software
  • Self-closing syntax on non-void elements (used in a way that may invoke the adoption agency algorithm)
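Duplicate ids, the first item above, are easy to catch before publishing. The sketch below uses Python's standard html.parser module. It only inspects id attributes on start tags and checks nothing else about conformance. The names IdCollector and duplicate_ids are hypothetical helpers, not part of validator.nu.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records the value of every non-empty id attribute on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)
    # Tags written as <br id="x"/> go through handle_startendtag, whose default
    # implementation calls handle_starttag, so they are covered too.


def duplicate_ids(markup):
    """Return the sorted list of id values that occur more than once."""
    parser = IdCollector()
    parser.feed(markup)
    parser.close()
    return sorted(i for i, n in Counter(parser.ids).items() if n > 1)


print(duplicate_ids('<div id="nav"></div><ul id="nav"><li id="a"></ul>'))  # ['nav']
```

Empty ids (reported separately for youtube.com) are skipped on purpose, so they do not show up as duplicates of each other.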

Conformance Errors Under Discussion

  • Unclosed tag, in cases where the element is not void and the close tag is not implied; these seem likely to indicate an authoring error.
    • Why are close tags implied for some elements and not for others? In the cases used, the pages render as intended, arguing for at most a warning.
      • Some elements have implied end tags because that's what HTML4 historically allowed. In most cases these are implicitly closed by a particular start tag. For instance, <p> is closed by a subsequent <p> or by many other block-level elements. However, other elements will end up containing the whole document if unclosed, or will invoke the adoption agency algorithm. For the one specific example in the study, we have inside information that it was unintentional. --Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • Almost standards / limited quirks mode doctype - triggers nonstandard behavior which may not be fully interoperable in legacy UAs
    • The behavior is defined by the HTML5 spec, and is interoperable. At most a warning is in order.
      • Which doctype triggers which mode is specified by the HTML5 spec. The spec also defines a few of the behaviors of quirks mode. The full behavior of the modes other than standards mode is largely undocumented and not interoperable.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • autocomplete attribute on <input type=hidden> -- has no effect, perhaps contrary to author expectations.
    • In the case of facebook it is autocomplete off; facebook makes extensive use of css and javascript, both of which can change the appearance. A good case can be made for a warning, but non-conforming?
      • I'm not sure what the CSS and JavaScript have to do with it - <input type=hidden> is never autofilled, even if you somehow make it render, which the Facebook page in question does not do. In general, particular <input> elements only allow the attributes that are applicable to their particular input subtype. Input types should be treated as effectively distinct elements. So <input type=hidden autocomplete=off> makes about as much sense as <div autocomplete=off>. It has no effect, and seems to indicate misunderstanding of the model. --Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • http-equiv X-UA-Compatible -- directly invokes non-interoperable behavior
    • And yet, the pages that use this are interoperable. In fact, these tags are a key part of the strategy used by these sites to obtain interoperability despite the existence of less than fully standards compliant browsers that are still in wide use.
      • In all the cases I am aware of, this mechanism is used to request less standards-compliant and less interoperable behavior (IE7 mode) on a browser that is capable of delivering more standards-compliant and more interoperable behavior (IE8). You are correct that it is typically part of a comprehensive strategy. The goal seems to be to avoid having to send IE8 down the standards code path (which may require extra work, since IE8 still has many divergences from standards/consensus behavior), or creating a separate IE8 code path.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • Unknown attributes and unknown elements - these have to be flagged to detect typos and to protect future extensibility of the language.
    • No question on typos, but people will invent attributes. We need to separate out what is reserved to HTML (example: names consisting entirely of alphabetic characters) and what will never be used by HTML. This has already been relaxed partially, and there are proposals for more.
      • Creating a distinguished syntax for extensions would indeed not be a problem. Most of the cases in the study do not fall into a readily identifiable pattern.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • empty value for a@target attribute - has no effect, likely contrary to author expectations
    • Might be exactly what the author expects. At most, merits a warning.
      • Why would an author deliberately add markup that has no effect? This is likely a bug at least some of the time it occurs, and does not have a valid use case, since it has no effect, so on the whole it seems better to flag it.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • elements after </body> - seems likely to be an error in context, will result in DOM that author does not expect, not interoperable with UAs that have limited error handling
    • These pages appear to be rendered as the author intended. Calling all such cases an error is likely overreaching.
      • My baseline assumption is that any time an element gets reparented relative to what appears in the markup, it is an error. That's so even if specific cases render as the author intended. In fact, the reason all of the reparenting rules exist in the first place is to render documents as the author intended. However, such cases are typically not interoperable in legacy UAs, and not interoperable with limited error handling or streaming UAs. </body><script> may be a little less surprising in its effects than <i>foo<b>bar</i>baz</b>, but the basic principle is the same.--Maciej Stachowiak 17:49, 3 April 2010 (UTC)
  • Unscoped <style> inside <body> - will result in unexpectedly bad performance, since it causes a style recalc of the full page when it may have already been incrementally rendered.
    • Seems to render acceptably in at least some browsers, at most this should be a warning.
  • Whitespace in name attribute - likely results in unexpected behavior, since name is treated as a single token with no whitespace.
    • "likely"? HTML5 needs to be interoperable even in the face of bad markup. And if this is something used in high profile sites, and works today, it needs to be supported.
      • Agree on "supported", but "supported" is not the same as "allowed".--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • Whitespace in the middle of URLs - likely indicates an authoring error and/or will give unexpected results.
    • Worth further discussion. If it behaves interoperably and is useful, people will do it. I suggest a warning.
  • object element used to load plugin via classid/codebase instead of data/type - triggers non-interoperable behavior by requesting a specific piece of code by name, rather than code to handle particular content
    • Useful as a fallback
      • Looking closer, in the specific example in the study, there is a nested <embed> that does have a type attribute. Since nested object/embed is treated specially by most browsers, this particular pattern at least should be allowed.--Maciej Stachowiak 17:49, 3 April 2010 (UTC)
  • Bad <script type> value
    • the value is "text/javascript"
      • No, the bad value was "Javascript", not "text/javascript" (on microsoft.com). "Javascript" is not a valid MIME type.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • language attribute on script element with value other than "JavaScript"
    • appears on lines where the value is "javascript". Claims this attribute is obsolete.
      • No, the specific case of this that is an error had a value of "Javascript1.1" (on amazon.com). The spec makes this attribute obsolete, but allows the specific value of "Javascript" as conforming, but with a warning. I could imagine changing the validator error message or else changing the spec to make the allowed case not be a warning.--Maciej Stachowiak 17:49, 3 April 2010 (UTC)
  • Bogus image map
    • I can't find this one. An image map appears on amazon.com, ebay.com, and yahoo.co.jp, each of which uses some presentational attributes; in one case the attribute is misspelled.
      • On cnn.com there is an img element with a usemap attribute pointing to a nonexistent <map>.--Maciej Stachowiak 17:49, 3 April 2010 (UTC)
  • bad comment syntax (--->) - interop hazard with legacy software, XML serialization issue
    • is and will be widely ignored: usefulness of consecutive dashes in comments outweighs theoretical usage; validators won't protect those that wish to serialize with XML as this content will continue to exist; justifies a warning.
      • All the examples in the study used three dashes instead of two to close the comment, which in context seems like a typo rather than something useful.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • Unknown element names - likely author error
    • Need to look at on a case by case basis. The ones I can readily find are cases like "wbr" which are unknown to the standard, but apparently useful.
      • This wasn't referring to presentational elements with a specific effect - I classed all those as "presentational" even if the validator claimed they were unknown. The specific example of an unknown element is <n> on yahoo.co.jp.--Maciej Stachowiak 17:40, 3 April 2010 (UTC)
  • <script defer="true"> (value should just be "defer")
    • Does it work interoperably? If so, a warning may be in order.
      • The spec makes it an error to use values like "true" or "yes" or "on" for boolean attributes that actually take effect through mere presence, to avoid confusion about what "false", "no" or "off" would do. This is explained in the Conformance Requirements for Authors section. --Maciej Stachowiak 18:42, 3 April 2010 (UTC)
  • <meta> Content-Type claiming an XML document is text/html
    • Content is served as text/html
      • This error is no longer flagged on the site in question (w3.org). I believe that at the time the study was done originally, the page was served as XML to the validator.--Maciej Stachowiak 17:51, 3 April 2010 (UTC)
  • <acronym> - will save people time not to wonder whether to use this or abbr
    • Sounds like at most a suggestion.
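The boolean-attribute rule cited in the <script defer="true"> discussion can be expressed in a few lines of code. For a boolean attribute, the value must be the empty string or an ASCII case-insensitive match for the attribute's name, with no leading or trailing whitespace. The sketch below assumes the value has already been extracted by a parser. It uses None for an attribute written with no value at all, which is how Python's html.parser reports <script defer>. The function name is a hypothetical helper.

```python
import string

# Lowercase only A-Z and leave every other character alone, which is what an
# "ASCII case-insensitive" comparison means in the HTML spec.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_conforming_boolean_value(attr_name, value):
    """Check the value of a boolean attribute such as defer."""
    if value is None or value == "":
        return True
    # Otherwise the value must equal the attribute name, ignoring ASCII case.
    # "true", "yes", "on" and padded values like " defer" all fail.
    return value.translate(_ASCII_LOWER) == attr_name.translate(_ASCII_LOWER)


print(is_conforming_boolean_value("defer", "true"))   # False
print(is_conforming_boolean_value("defer", "defer"))  # True
```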

Conformance Errors with Unclear Value

  • Other presentational table attributes (besides the three that already have a bug filed). These typically do not seem to harm accessibility, and depending on circumstances may not be less compact than the alternative.
  • Presentational attributes on the body element. These seem unlikely to create accessibility problems. They can also only possibly occur once per page, and so omitting them does not save compactness.
  • Presentational attributes on iframe (frameborder, scrolling).
  • <center> and <font> elements.
  • Unknown <meta http-equiv> (content-style-type, content-script-type)
  • img align / input align
  • http-equiv Expires
  • http-equiv Pragma no-cache
  • <area shape="rectangle"> instead of "rect"
  • funky meta refresh values - "0;url=http://www.ebay.com/?_js=OFF" "1800;url=?refresh=1"
  • abbr on th
  • nohref on area

Alexa Top 10

google.com

  • Declared doctype: HTML5
  • Validates as declared doctype: no (54 errors)
  • Validates as HTML5: no (54 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 11
    • Presentational attribute or element - 42
      • Attributes: body: (topmargin, marginheight, bgcolor, text, link, vlink, alink), div: (width), br: (clear), table: (cellpadding, cellspacing, border), td: (align, valign, width, nowrap)
      • Elements: nobr, center, font
    • Unclosed tag (apparently <center>, the validator error is unhelpful) - 1
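An unescaped & in a URL attribute is the most frequent error category in this study, accounting for most of the errors on several large sites. When markup is generated from code, the usual fix is to escape the attribute value instead of writing & by hand. Below is a minimal sketch using html.escape from Python's standard library. The URL is a made-up example.

```python
from html import escape

# A hypothetical search URL whose query string has two parameters.
url = "http://www.example.com/search?q=html5&hl=en"

# With quote=True, escape() also escapes " and ', so the result is safe inside
# a quoted attribute value. Here only the & needs escaping.
href = escape(url, quote=True)
print(f'<a href="{href}">search</a>')
# <a href="http://www.example.com/search?q=html5&amp;hl=en">search</a>
```

Browsers decode &amp; back to & when they read the attribute, so the link target stays the same.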

facebook.com

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: no (41 errors)
  • Validates as HTML5: no (22 errors)
  • Validates as XHTML5: no (4 errors)
  • Breakdown of errors validating as HTML5:
    • autocomplete attribute on <select> element - 2
    • autocomplete attribute on <input type=hidden> - 11
    • http-equiv X-UA-Compatible - 1
    • Content model errors - 1 (div inside ul)
    • Duplicate id - 3
    • Presentational elements and attributes - 4
      • Attributes: table: (cellpadding, cellspacing, border)

Note: the autocomplete attribute is allowed on <input> elements by HTML5, but not on <input type=hidden> or <select>. It seems like a bug in the XHTML5 mode of the validator that it doesn't flag these errors. Also it's not clear if those restrictions are desirable.

yahoo.com

  • Declared doctype: HTML 4.01 Strict
  • Validates as declared doctype: no (154 errors)
  • Validates as HTML5: no (189 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 87
    • Various errors about the bad attributes on <button http://www.yahoo.com class="pa-btn-open hide-textindent"> - 76
    • Custom attributes (modid) - 19
    • Custom attribute with hyphen / possible extension point (y-pkgid) - 4
    • Stray close tag (</li>) - 2
    • Content model error (<div> as a child of <h2> - headers only allow phrasing content) - 1

Note: The markup <button http://www.yahoo.com class="pa-btn-open hide-textindent"> does not appear in the page when loaded in a browser, but it is returned to the validator.

It's unclear why this page uses invented attributes modid and y-pkgid in some cases, but data-* attributes in other cases.

youtube.com

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (265 errors)
  • Validates as HTML5: no (289 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Unescaped & in URL attribute - 160
    • Nested interactive elements (<button> as a child of <a>) - 24
    • Custom attributes: img: (image, qlicon, ql, thumb), button: (ql) - 62
    • Duplicate id - 9
    • Empty id - 2
    • Content model error (<div> as a child of <span>) - 16
    • Presentational elements and attributes - 15
      • Elements: nobr, wbr
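Whether a doctype triggers "almost standards" (limited quirks) mode depends only on its public and system identifiers. The sketch below covers only the part of the HTML5 parsing rules that applies to the doctypes in this study. The full list of public identifiers that trigger quirks mode in the spec is much longer and is deliberately left out, so this is not a complete implementation. Under these rules, an HTML 4.01 Transitional doctype without a system identifier gets full quirks mode. That may explain why orkut.com is listed under quirks mode while the other HTML 4.01 Transitional sites are listed under almost standards mode. The function name document_mode is hypothetical.

```python
# Public identifier prefixes, lowercased for a case-insensitive comparison.
LIMITED_QUIRKS_PREFIXES = (
    "-//w3c//dtd xhtml 1.0 frameset//",
    "-//w3c//dtd xhtml 1.0 transitional//",
)
HTML401_LOOSE_PREFIXES = (
    "-//w3c//dtd html 4.01 frameset//",
    "-//w3c//dtd html 4.01 transitional//",
)


def document_mode(public_id=None, system_id=None, has_doctype=True):
    """Partial sketch of doctype-based mode selection (subset of the spec)."""
    if not has_doctype:
        return "quirks"  # no doctype at all, as on google.cn and amazon.com
    # Public identifiers are ASCII in practice, so str.lower() is adequate here.
    public = (public_id or "").lower()
    if public.startswith(LIMITED_QUIRKS_PREFIXES):
        return "limited-quirks"
    if public.startswith(HTML401_LOOSE_PREFIXES):
        # The same public identifier gives a different mode depending on
        # whether a system identifier is present.
        return "limited-quirks" if system_id is not None else "quirks"
    # The spec's long list of other quirks-mode identifiers is omitted, so
    # everything else (including <!DOCTYPE html>) is reported as no-quirks.
    return "no-quirks"
```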

live.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: yes
  • Validates as HTML5: no (2 errors)
  • Validates as XHTML5: no (1 error)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Unrecognized attribute - 1 (xmlns:Web on <html>)

Note: the Web prefix declared here is not used on the page.

Here is some prior data on Web pages containing an xmlns prefix declaration: http://philip.html5.org/data/xmlns-attributes.txt

Here is the raw data: http://philip.html5.org/data/xmlns-attributes-raw.txt.bz2

Studying the raw data reveals that about 1.6% of pages (from a fairly large sample) include at least one xml namespace prefix declaration.
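A page can be counted toward a statistic like this one by checking its start tags for attributes named xmlns:something. The sketch below is a rough illustration of that check and not the script used to produce the data above. The class and function names and the namespace URI in the example are hypothetical. The percentage is then the number of matching pages divided by the number of pages sampled.

```python
from html.parser import HTMLParser


class XmlnsPrefixFinder(HTMLParser):
    """Collects namespace prefix declarations (xmlns:prefix attributes)."""

    def __init__(self):
        super().__init__()
        self.prefixes = set()

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            # html.parser lowercases attribute names, so xmlns:Web arrives
            # as xmlns:web.
            if name.startswith("xmlns:"):
                self.prefixes.add(name[len("xmlns:"):])


def has_prefix_declaration(markup):
    finder = XmlnsPrefixFinder()
    finder.feed(markup)
    finder.close()
    return bool(finder.prefixes)


print(has_prefix_declaration('<html xmlns:Web="http://example.com/ns"><body></body></html>'))  # True
```

A plain xmlns attribute (no prefix), as in most XHTML pages, is not counted.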

wikipedia.org

  • Declared doctype: HTML5
  • Validates as declared doctype: yes
  • Validates as HTML5: yes

blogger.com

  • Declared doctype: HTML 4.01 Strict
  • Validates as declared doctype: no (21 errors)
  • Validates as HTML5: no (3 errors)
  • Breakdown of errors validating as HTML5:
    • <a target=""> (specifically, an empty value for the target attribute) - 1
    • Presentational elements and attributes - 2
      • Attributes: iframe: (frameborder, scrolling)

baidu.com

  • Declared doctype: HTML5
  • Validates as declared doctype: no (4 errors)
  • Validates as HTML5: no (4 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 2
    • Missing alt text - 1 (on <area>)
    • Content model errors - 1 (<script> after </body>)

msn.com

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: yes
  • Validates as HTML5: no (15 errors)
  • Validates as XHTML5: no (18 errors)
  • Breakdown of errors validating as HTML5:
    • http-equiv X-UA-Compatible - 1
    • Leading/Trailing whitespace around URL - 8
    • Duplicate id - 3
    • Obsolete markup (abbr on th) - 3

qq.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (validator fails with encoding error)
  • Validates as HTML5: no (171 errors)
  • Validates as XHTML5: no (3 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 83
    • name attribute used on h3, h4, div - 38
    • Content model errors - 13 (<div> inside <h3>, <h3> inside <span>, <div> inside <span>, <a> inside <a>)
    • Whitespace inside URL (query) - 6
    • Presentational elements and attributes - 3
      • Attributes: input: (align), img: (align)
    • Missing src on <img> - 3
    • Unscoped <style> inside <body> - 3
    • Missing alt text - 2 (on <area>)
    • Bad doctype - 1
    • http-equiv X-UA-Compatible - 1
    • defer on inline script - 1
    • Unrecognized attribute - 1 (smartpid on <input>)
    • No space between attributes - 1
    • Stray closing tag - 1 (</a>)
    • Several errors due to misnested tags, eventually including a fatal one; validator gave up

Alexa Top 11-20

yahoo.co.jp

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (27 errors)
  • Validates as HTML5: no (349 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Unknown <meta http-equiv> (content-style-type, content-script-type) - 2
    • Presentational elements and attributes - 341
      • Attributes: img: (align), td: (align, bgcolor, width), table: (bgcolor, border, cellpadding, cellspacing, width), hr: (size, width), body: (link, vlink)
      • Elements: font, center, nobr
    • Unescaped & in URL attribute - 2
    • Unknown attribute (widh) - 1
    • Bogus markup mixed with attributes <img src="..." width:120px;height:90px; border="0"> - 1
    • Unknown element (n) - 1

twitter.com

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: no (70 errors)
  • Validates as HTML5: no (22 errors)
  • Validates as XHTML5: no (22 errors)
  • Breakdown of errors validating as HTML5:
    • Unknown http-equiv value (imagetoolbar) - 1
    • Whitespace in name attribute on <a> - 20
    • Unknown attribute (data) on <div> - 1

google.co.in

  • Declared doctype: HTML5
  • Validates as declared doctype: no (54 errors)
  • Validates as HTML5: no (54 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 11
    • Presentational attribute or element - 42
      • Attributes: body: (topmargin, marginheight, bgcolor, text, link, vlink, alink), div: (width), br: (clear), table: (cellpadding, cellspacing, border), td: (align, valign, width, nowrap)
      • Elements: nobr, center, font
    • Unclosed tag (apparently <center>, the validator error is unhelpful) - 1

Note: exact same errors as google.com.

sina.com.cn

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (validator fails due to encoding error)
  • Validates as HTML5: no (over 960 errors, validator gave up due to too many errors)
  • Validates as XHTML5: no (validator fails due to encoding error)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype
    • urn attribute on a element - 12
    • stray semicolon mixed with attributes - 1
    • duplicate id - 4
    • Whitespace in the middle of a URL attribute - 4
    • Whitespace at boundaries of a URL attribute - 7
    • Unescaped & in URL attribute - 876
    • Inline <script defer> - 2
    • Nonstandard attribute (pos on a) - 9
    • Presentational markup - 20
      • Attributes: iframe: (scrolling, frameborder, marginheight, marginwidth), table: (width, border, cellspacing, cellpadding), td: (align), input: (align)
      • Elements: font
    • Unscoped style element outside head - 5
    • Nonstandard meta http-equiv (X-UA-Compatible) - 1
    • Missing space between attributes - 9
    • Unquoted equal sign (likely a syntax error in attributes) - 1
    • Bad comment syntax (three hyphens) - 1
    • Content model errors (div in span, script in ul, ul in ul) - 6

google.cn

  • Declared doctype: none (W3C validator assumes HTML 4.01 Transitional)
  • Validates as declared doctype: no (53 errors)
  • Validates as HTML5: no (42 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 12
    • Presentational markup (same as other google main pages) - 28
    • Missing end tag (likely for <center>, hard to tell) - 1
    • Missing doctype

google.de

  • Declared doctype: HTML5
  • Validates as declared doctype: no (42 errors)
  • Validates as HTML5: no (42 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attributes - 12
    • Presentational markup (see other google main pages) - 29
    • Missing end tag (likely <center>) - 1

wordpress.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: yes
  • Validates as HTML5: no (3 errors)
  • Validates as XHTML5: no (7 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • profile attribute on <head> - 1
    • charset attribute on inline <script> (it gets ignored in this case) - 1

myspace.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (122 errors)
  • Validates as HTML5: no (44 errors)
  • Validates as XHTML5: no (9 errors, then validator dies with fatal error)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • name attribute used where it is not allowed (on <embed>), recommend id - 1
    • object element used to load plugin via classid/codebase instead of data/type - 3
    • duplicate id - 1
    • iframe with 0 width and height - 6 (this is a bug in the validator)
    • URL with leading/trailing space - 2
    • URL with interior space - 1
    • Stray close tag (</meta>) - 1
    • Unescaped & in URL attribute - 4
    • Presentational markup - 18
      • Attributes: iframe: (hspace, vspace, marginwidth, marginheight, frameborder, scrolling)
    • Nonstandard/non-interoperable attribute: allowTransparency on iframe - 3
    • Non-scoped <style> outside <head> - 8
    • Unknown <meta http-equiv> values - X-UA-Compatible, expires, Pragma - 3
    • Content model errors (div inside span) - 3

The validator seems to report an <iframe> with 0 width and height as an error, but this appears to be conforming: http://dev.w3.org/html5/spec/Overview.html#attr-dim-width
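Under the linked definition, width and height on <iframe> must be valid non-negative integers, and the spec defines a valid non-negative integer as one or more ASCII digits. "0" meets that rule. The sketch below checks only this syntax. The function name is hypothetical.

```python
def is_valid_non_negative_integer(value):
    """One or more ASCII digits and nothing else: no sign, spaces or units."""
    # str.isdigit() is avoided because it also accepts non-ASCII digits.
    return value != "" and all("0" <= ch <= "9" for ch in value)


print(is_valid_non_negative_integer("0"))      # True
print(is_valid_non_negative_integer("100px"))  # False
```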

microsoft.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (360 errors)
  • Validates as HTML5: no (270 errors)
  • Validates as XHTML5: no (2 errors, then fatal)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • <meta scheme> - 6
    • <img> element with no src - 1
    • <a target=""> (specifically, an empty value for the target attribute) - 1
    • Duplicate id - 4
    • Leading or trailing whitespace in URL attribute - 7
    • bad <script type> value ("Javascript" - not a valid MIME type) - 1
    • Unescaped & in URL attribute - 3
    • Missing alt - <input type="image"> without alt attribute - 1
    • Unknown attribute (cid, xyz, cpgn) - 234
    • Presentational markup - 2
      • Elements: u
    • <meta> element outside <head> - 2
    • <meta> X-UA-Compatible - 1
    • <area shape="rectangle"> - 4
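The bad <script type> value on microsoft.com fails because "Javascript" has no type/subtype structure, so it is not a MIME type. The sketch below checks only that basic syntax: two non-empty tokens separated by "/", each made of HTTP token characters. It does not check whether the type is actually a JavaScript type, and it ignores any parameters after ";". The function name is hypothetical.

```python
import string

# HTTP token characters: ASCII letters, digits, and these punctuation marks.
TOKEN_CHARS = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def looks_like_mime_type(value):
    """Rough syntax check for type/subtype; parameters after ';' are ignored."""
    essence = value.split(";", 1)[0].strip()
    type_, sep, subtype = essence.partition("/")
    return (sep == "/" and type_ != "" and subtype != ""
            and set(type_) <= TOKEN_CHARS and set(subtype) <= TOKEN_CHARS)


print(looks_like_mime_type("text/javascript"))  # True
print(looks_like_mime_type("Javascript"))       # False
```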

google.co.uk

  • Declared doctype: HTML5
  • Validates as declared doctype: no (42 errors)
  • Validates as HTML5: no (42 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attributes - 12
    • Presentational markup (see other google main pages) - 29
    • Missing end tag (likely <center>) - 1

Notable Alexa Sites Outside the Top 20

amazon.com

  • Declared doctype: none (W3C validator assumes HTML 4.01 transitional)
  • Validates as declared doctype: no (1192 errors)
  • Validates as HTML5: no (over 980 errors - validator gives up)
  • Breakdown of errors validating as HTML5:
    • Missing doctype - 1
    • language attribute on script element with value other than "JavaScript" - 1
    • img element with no src - 6
    • stray close tag (div) - 1
    • Unescaped & in URL attribute - 862
    • Unknown attribute (url on img, allowtransparency on iframe) - 6
    • Presentational markup - 94
      • Attributes: table: (border, cellspacing, cellpadding, width, height), td: (align, width, valign, height, background), tr: (width), iframe: (topmargin, leftmargin, frameborder, scrolling, marginheight, marginwidth), img: (align), br: (clear)
    • Non-scoped style outside head - 7
    • Self-closing syntax on non-void element (td) - 1

bing.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (12 errors)
  • Validates as HTML5: no (3 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Presentational markup - 1
      • Attributes: td: (align)
    • Content model error: <p> inside <span> - 1

ebay.com

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (218 errors)
  • Validates as HTML5: no (218 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Miscellaneous obsolete markup - nohref attribute on <area> element - 1
    • Stray ASCII DEL character (U+007F) in markup - 13
    • Duplicate ID - 2
    • Stray end tag (img) - 3
    • Table integrity (colspan=2 when no cells start in column 2) - 1
    • Missing alt (on area) - 2
    • Nonstandard attribute (_sp) - 88
    • Presentational markup - 99
      • Attributes: div: (align), table: (width, cellpadding, cellspacing, border), td: (width, valign, align, bgcolor), iframe: (frameborder, marginheight, marginwidth, scrolling)
    • Non-scoped style element outside head - 1
    • Content model error (div inside span) - 2
    • Disallowed meta refresh value ("0;url=http://www.ebay.com/?_js=OFF") - 1

linkedin.com

  • Declared doctype: HTML5
  • Validates as declared doctype: no (13 errors)
  • Validates as HTML5: no (13 errors)
  • Breakdown of errors validating as HTML5:
    • Unknown attribute (defaulturl on form) - 1
    • http-equiv X-UA-Compatible - 1
    • Unescaped & in URL attribute - 9
    • Presentational markup - 2
      • Attributes: ol: (type), li: (type)

flickr.com

  • Declared doctype: HTML 4.01 Strict
  • Validates as declared doctype: no (38 errors)
  • Validates as HTML5: no (26 errors)
  • Breakdown of errors validating as HTML5:
    • Unescaped & in URL attribute - 16
    • Presentational markup - 10
      • Attributes: table: (border, cellpadding, cellspacing), iframe: (scrolling, frameborder, marginwidth, marginheight), td: (width)

craigslist.org

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (1 error)
  • Validates as HTML5: no (5 errors)
  • Breakdown of errors validating as HTML5:
    • Encoding error - internally claims to be Latin1 but is actually UTF-8 - 1
    • "Almost standards" mode doctype - 1
    • Parse error (invokes adoption agency - <p> in <table>) - 1
    • Presentational markup - 2
      • Elements: font

cnn.com

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (60 errors)
  • Validates as HTML5: no (36 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Use of name attribute instead of id (on <img>, not <a>) - 3
    • Bogus image map - 2
    • Unescaped & in URL attribute - 4
    • Presentational markup - 16
      • Attributes: div: (align), img: (hspace, vspace), iframe: (frameborder)
    • Non-scoped <style> element outside of <head> - 1
    • Syntax error in doctype (missing space) - 1
    • Content model errors (form in span, div in span, script in ul)
    • bad syntax in meta refresh <meta http-equiv="refresh" content="1800;url=?refresh=1"> - 1

orkut.com

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (4 errors)
  • Validates as HTML5: no (43 errors)
  • Breakdown of errors validating as HTML5:
    • Quirks mode doctype - 1
    • Stray end tag (img) - 1
    • Unescaped & in URL attribute - 1
    • Presentational markup - 37
      • Attributes: table: (border, cellpadding, cellspacing, width), div: (align), td: (width, valign, nowrap)
      • Elements: font
    • style element outside head - 2
    • content model error (<script> inside <tbody>) - 1

nytimes.com

  • Declared doctype: HTML 4.01 Transitional
  • Validates as declared doctype: no (287 errors)
  • Validates as HTML5: no (183 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Tag misnesting that will lead to reparenting (div in table) - 1
    • Duplicate id - 1
    • Misnested close tag (a tag - may invoke adoption agency) - 6
    • Unescaped & in URL attribute - 161
    • Span apparently being treated as image (has src, alt, width and height attributes, no content) - 20
    • Nonstandard attribute (articleid, overflowurl) - 4
    • Presentational markup - 11
      • Attributes: table: (cellspacing)
    • Non-scoped style outside head - 5
    • Disallowed http-equiv values (Pragma, Expires) - 2
    • Nonstandard element <align=right>[sic] - 1
    • Self-closing syntax used on <span> (probably invoking adoption agency) - 5

Notable Organizations Involved with HTML5

apache.org

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: yes
  • Validates as HTML5: no (3 errors, dies on fatal error)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Changing encoding midstream (results in a fatal error) - 1

apple.com

  • Declared doctype: HTML5
  • Validates as declared doctype: no (3 errors)
  • Validates as HTML5: no (3 errors)
  • Breakdown of errors validating as HTML5:
    • Presentational markup - 1
      • Attributes: iframe: (frameborder)
    • Bad meta http-equiv values (pics-label, X-UA-Compatible) - 2

adobe.com

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (24 errors)
  • Validates as HTML5: no (10 errors)
  • Breakdown of errors validating as HTML5:
    • charset specified on stylesheet links - 3
    • "Almost standards" mode doctype - 1
    • name attribute disallowed - 1 (looks like a validator bug)
    • Unescaped & in URL attribute - 1
    • Bad comment syntax (ends in --->) - 1
    • Content model errors (bad <dl> contents) - 3

chromium.org

  • Declared doctype: XHTML 1.0 Transitional
  • Validates as declared doctype: no (99 errors)
  • Validates as HTML5: no (32 errors)
  • Breakdown of errors validating as HTML5:
    • "Almost standards" mode doctype - 1
    • Bogus xmlns (http://www.google.com/ns/jotspot on body) - 1
    • Empty value for dir attribute - 1
    • <script defer="true"> (value should be empty or "defer") - 4
    • Unknown attribute (jotid, onpropertychange) - 8
    • Inline <script defer> - 2
    • Presentational markup - 13
      • Attributes: table: (cellpadding, cellspacing, border, width), h3: (align), td: (width, valign)

ibm.com

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: yes
  • Validates as HTML5: no (6 errors)
  • Breakdown of errors validating as HTML5:
    • <meta scheme> - 5
    • meta http-equiv pics-label - 1

mozilla.com

  • Declared doctype: HTML5
  • Validates as declared doctype: yes
  • Validates as HTML5: yes

opera.com

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: yes
  • Validates as HTML5: no (1 error)
  • Breakdown of errors validating as HTML5:
    • Use of <acronym> element - 1

w3.org

  • Declared doctype: XHTML 1.0 Strict
  • Validates as declared doctype: yes
  • Validates as HTML5: no (2 errors)
  • Breakdown of errors validating as HTML5:
    • Use of <acronym> element - 2
    • <meta http-equiv> setting Content-Type to text/html in XML - 1

webkit.org

  • Declared doctype: HTML5
  • Validates as declared doctype: yes
  • Validates as HTML5: yes

whatwg.org

  • Declared doctype: HTML5
  • Validates as declared doctype: yes
  • Validates as HTML5: yes