Re: comments on draft-barth-mime-sniffing from Adam Barth on 2009-05-31 (ietf-http-wg@w3.org from April to June 2009)

From: Adam Barth <w3c@adambarth.com>
Date: Sun, 31 May 2009 13:18:33 -0700
To: Larry Masinter <masinter@adobe.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <7789133a0905311318p744c2f3m375c95ce7b4d7850@mail.gmail.com>

On Sun, May 31, 2009 at 10:22 AM, Larry Masinter <masinter@adobe.com> wrote:
> In reply to:
>
> On Sat, May 30, 2009 at 6:06 PM, Larry Masinter <masinter@adobe.com> wrote:
>>> 1.  Isn't the incidence of server misconfiguration far less than 10 years
>>> ago? Can you provide more evidence that this problem is as significant as it
>>> once was?
>
> Adam replied:
>
>> I don't have any data to compare current server behavior with
>> historical server behavior, but sniffing is required to process
>> approximately 1% of HTTP responses correctly.
>
> In the interest of monitoring this, and possibly removing content
> type sniffing in the future, is it possible to publish (and reference)
> the methodology used?

The methodology is described in
http://www.adambarth.com/papers/2009/barth-caballero-song.pdf, which
has been published in archival form in the IEEE Symposium on Security
and Privacy.  I suppose I can add a reference if you'd find that
helpful.

> I heard at least one question that this number might be inflated by
> HTML pages that were intentionally labeled as text/plain.

This is not the case because the 1% figure excludes HTTP responses
with a Content-Type header of text/plain.

> Also, the
> proportion of mislabeled HTTP responses from the searchable Internet
> may be different from the HTTP responses from the "private" Internet
> behind firewalls.

We got this data from users who have opted in to sharing anonymous
usage statistics in Google Chrome.  Thus, these figures are
representative of both the searchable and private Internet.  The
figures are also weighted by popularity.  That's why I quote the figure
as percent of HTTP responses, not as percent of HTTP resources.
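
To illustrate the responses-versus-resources distinction, here is a
minimal sketch.  It is not the measurement code behind the figures
above; the function name, the log format, and the sample URLs are all
hypothetical, chosen only to show how the two percentages can diverge
when popular resources are fetched many times.

```python
def mislabel_rates(log):
    """Return (per_response_rate, per_resource_rate).

    `log` is an iterable of (url, needs_sniffing) pairs, one entry per
    observed HTTP response.  This format is an assumption made for the
    example, not the format of any real data set.
    """
    responses = list(log)

    # Per-response rate: every fetch counts once, so a popular URL
    # contributes in proportion to how often it is requested.
    per_response = sum(1 for _, bad in responses if bad) / len(responses)

    # Per-resource rate: each distinct URL counts once, however often
    # it was fetched.
    by_url = {}
    for url, bad in responses:
        by_url[url] = by_url.get(url, False) or bad
    per_resource = sum(1 for bad in by_url.values() if bad) / len(by_url)

    return per_response, per_resource


# Hypothetical sample: one popular, correctly labeled page and two
# rarely fetched resources, one of which needs sniffing.
sample_log = (
    [("http://a.example/", False)] * 97
    + [("http://b.example/img", True)] * 2
    + [("http://c.example/doc", False)]
)

per_response, per_resource = mislabel_rates(sample_log)
```

In this made-up sample, 2 of 100 responses need sniffing, but 1 of 3
distinct resources does, so the two ways of counting give very
different numbers.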

> advise
> specific classes of user agents that they MAY wish to follow
> this behavior IF they wish to continue to be compatible with
> the deployed infrastructure, to the extent that there remains
> a significant proportion of the deployed Internet of concern
> to clients.

I agree that we shouldn't mandate that user agents sniff.  However, I
do think that IF a user agent does decide to sniff, then we
should mandate that they use this algorithm to avoid the security
and compatibility nightmare of umpteen different sniffing algorithms.

> I see no justification whatsoever for allowing conforming user
> agents to sniff types for new elements such as <video>, or
> encouraging such behavior, which is just opening the door
> for whole other categories of spoofing.  Certainly this
> isn't represented by any deployed infrastructure.

The current draft doesn't take a position on this issue.  Is there
something you'd like changed in the draft pursuant to the above?

Adam

Received on Sunday, 31 May 2009 20:19:32 UTC