Re: Parsing: Trailing garbage in doctype FPI (was: Re: Doctype usage data)

On Thursday 2008-05-22 20:28 -0700, L. David Baron wrote:
> On Friday 2008-05-23 03:19 +0000, Ian Hickson wrote:
> > On Mon, 3 Mar 2008, Simon Pieters wrote:
> > > > 
> > > > I've got some data about doctypes at 
> > > > http://philip.html5.org/data/doctypes.html (125K pages from dmoz.org) 
> > > > and http://philip.html5.org/data/doctypes-alexa.html (about 400 from 
> > > > Alexa's list). I'm not entirely sure what this could be useful for, 
> > > > but I'll point out a couple of things here.
> > > 
> > > [...] This means that Opera would break about 0.05% of pages of this 
> > > sample if we implemented HTML5 doctype switching, assuming that the 
> > > remaining pages I didn't look at were the same.
> 
> It looks (from the limited context in the email) that you're talking
> about making quirks-mode detection handle pages where the author has
> manually changed the "EN" in the FPI to match the language of the
> page content, or similar.
> 
> Are the data you present showing that pages with these broken
> DOCTYPEs break if they're not in quirks mode, or simply that pages
> have these broken doctypes?  It's a pretty significant difference.

Ah, it wasn't in the URLs quoted, but it was clear in
http://lists.w3.org/Archives/Public/public-html/2008Mar/0013.html
that the finding was really the former.

Given that, I don't object to this change, although I would
encourage being very hesitant to expand quirks mode to more pages.
I suppose it's a pretty small set, though.
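
For concreteness, my understanding is that the change amounts to
matching the FPI by prefix rather than exactly, so that whatever
follows the final "//" (such as an altered "EN") is ignored. A minimal
Python sketch under that assumption follows. The prefix list is a
small illustrative subset, not the full list, and the names
QUIRKS_FPI_PREFIXES and is_quirks_fpi are hypothetical:

```python
# Illustrative subset of public-identifier prefixes that trigger quirks
# mode. This is NOT the complete list; it only demonstrates the idea.
QUIRKS_FPI_PREFIXES = (
    "-//W3C//DTD HTML 3.2//",
    "-//W3C//DTD HTML 4.0 Transitional//",
    "-//IETF//DTD HTML 2.0//",
)


def is_quirks_fpi(public_id: str) -> bool:
    """Return True if the FPI starts with a known quirks prefix.

    The comparison is case-insensitive and ignores anything after the
    prefix, so a changed language code ("//DE" instead of "//EN") or
    other trailing garbage does not affect the result.
    """
    normalized = public_id.lower()
    return any(normalized.startswith(prefix.lower())
               for prefix in QUIRKS_FPI_PREFIXES)
```

With exact matching, "-//W3C//DTD HTML 4.0 Transitional//DE" would
fall out of quirks mode; with the prefix check above it is treated the
same as the "//EN" form.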

(Does anybody have any data on which quirks these pages, or
quirks-mode pages in general, actually depend on?)

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Received on Friday, 23 May 2008 06:51:53 UTC