RE: conneg, HTTPbis, and generic resources (status check) from Larry Masinter on 2009-11-25 (www-tag@w3.org from November 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 25 Nov 2009 07:50:13 -0800
To: Karl Dubost <karl+w3c@la-grange.net>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>
CC: Dan Connolly <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118DC9EC398@nambx04.corp.adobe.com>
> I read in the tag minutes ... http://lists.w3.org/Archives/Public/www-tag/2009Aug/0067.html 

    masinter: That was the original conception when ....
  
I said more, but I talk faster than anyone can reasonably keep up with. The "..." ellipses were paragraphs. But was there some issue with what was minuted?

# Content Negotiation on languages

Yes, using content negotiation for language selection interferes with page ranking. Presumably you mean "by Google". Whether the same is true for other search engines is unclear, but I suppose other search engines are forced to reverse-engineer Google's page ranking in the same way that browsers are forced to reverse-engineer Internet Explorer. Should there be standards specifying what parts of URIs search engines should or should not pay attention to, or whether they should index content-negotiated pages?

# Vary: Accept-Language

You noted that the "Vary" header is unlikely to help in indexing. But the "Vary:" header was never intended or designed to have any effect on search engines. In particular, server-driven content negotiation (using request headers + Vary) is unsuitable by itself for search indexers during crawling because it doesn't reveal the full list of alternatives available. I don't think the "search index crawling" use case was even considered during the design of content negotiation, but I would imagine that agent-driven negotiation would be the preferred way of letting crawlers decide which alternatives might be crawled independently (pages in different languages: yes; images in different sizes: no; text content in different charsets: maybe).
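To make that last point concrete, here is a minimal Python sketch of how a crawler might apply the rule of thumb once a server has exposed its alternates, as agent-driven negotiation allows (for example in the body of a 300 Multiple Choices response, for which HTTP does not standardize a list format). The `Alternate` record, the `POLICY` table, `plan_crawl()`, the dimension labels and the `logo-small.png` / `.latin1` URIs are all hypothetical; nothing here is part of HTTP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternate:
    """One alternate representation a server has advertised (hypothetical record)."""
    uri: str
    dimension: str  # what distinguishes it: "language", "size" or "charset"
    value: str


# Hypothetical crawl policy mirroring the rule of thumb in the text:
# languages yes, image sizes no, charsets maybe.
POLICY = {"language": "crawl", "size": "skip", "charset": "maybe"}


def plan_crawl(alternates, include_maybe=False):
    """Return the URIs of the alternates worth crawling independently."""
    chosen = []
    for alt in alternates:
        decision = POLICY.get(alt.dimension, "skip")  # unknown dimensions are skipped
        if decision == "crawl" or (decision == "maybe" and include_maybe):
            chosen.append(alt.uri)
    return chosen


alternates = [
    Alternate("http://example.com/fr/mon-lien-unique", "language", "fr"),
    Alternate("http://example.com/en/my-unique-link", "language", "en"),
    Alternate("http://example.com/logo-small.png", "size", "small"),
    Alternate("http://example.com/en/my-unique-link.latin1", "charset", "iso-8859-1"),
]
print(plan_crawl(alternates))
```

With `include_maybe=False`, only the two language variants are selected; passing `include_maybe=True` adds the charset variant as well.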

# Accept header of User agents

You note that "Accept" headers cannot be relied on to filter out requests caused by misconfiguration. I'm not sure why it is a problem that a header designed to solve one problem isn't useful for another. No client is required to send *any* Accept: headers. Could you use "Referer" or "Origin" for filtering instead?
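A minimal Python sketch of that alternative, assuming the offending site's host is known in advance; `bigone.example.com` and `from_blocked_site()` are hypothetical. Like Accept, these headers are optional: clients may omit Referer, and Origin is not sent on every request, so a request carrying neither has to be let through.

```python
from urllib.parse import urlsplit


def from_blocked_site(headers, blocked_hosts):
    """Return True if the Origin or Referer request header names a blocked host.

    `headers` maps header names to values; names are matched case-insensitively.
    Requests carrying neither header are not blocked.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("origin", "referer"):
        value = lowered.get(name)
        if not value or value == "null":  # absent, or an opaque Origin
            continue
        host = urlsplit(value).hostname  # urlsplit lower-cases the host name
        if host in blocked_hosts:
            return True
    return False


blocked = {"bigone.example.com"}  # hypothetical host of the misconfigured site
print(from_blocked_site({"Referer": "http://bigone.example.com/index.html"}, blocked))
print(from_blocked_site({"Accept": "*/*"}, blocked))
```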

Larry
--
http://larry.masinter.net

-----Original Message-----
From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf Of Karl Dubost
Sent: Wednesday, November 25, 2009 6:50 AM
To: julian.reschke@gmx.de
Cc: Dan Connolly; www-tag@w3.org
Subject: Re: conneg, HTTPbis, and generic resources (status check)


On 25 Nov 2009, at 03:43, Julian Reschke wrote:
> The only reason why no change has been made yet is that it's not totally clear how the proposed text (which looks good) is to be integrated into the current draft (more precisely, which parts of <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-08.html#rfc.section.4> it's meant to replace).

I read in the TAG minutes:

    On Sun, 30 Aug 2009 05:20:49 GMT
    In Draft minutes of TAG teleconference of 13 August 2009 from noah_mendelsohn@us.ibm.com on 2009-08-27 (www-tag@w3.org from August 2009)
    At http://lists.w3.org/Archives/Public/www-tag/2009Aug/0067.html

    masinter: That was the original conception when 
    conneg was introduced years ago. In practice, it's 
    now used for lots of other things. ... Sometimes, 
    for example, CSS media queries is used to select 
    best rep. That's not in HTTP. ... We seem to be 
    moving toward "HTTP is used for transporting 
    content; if there's variability desired, the 
    initial bit of content is used to make subsequent 
    decision." 
    
Let me share practical experience with content negotiation in a business context (a Web agency working for clients).



# Content Negotiation on languages

Accept-Language
http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-08.html#header.accept-language

HTTP provides a mechanism for negotiating the language of a resource based on the Accept-Language request header. In terms of usability this seems a good thing, because it lets one URI be shared across different cultures. In terms of SEO (Search Engine Optimization), however, it is considered bad practice: the words in the URI and the content of the page are both indexed, and both matter for ranking (that is, for visibility in the market). Search engine indexing bots do not seem able to index every individual representation of a resource, so instead of relying on

    http://example.com/my-unique-link     (fr, en)

Web agencies will choose to do:

    http://example.com/fr/mon-lien-unique  (fr)
    http://example.com/en/my-unique-link   (en)
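A minimal Python sketch of the server-side choice behind the single-URI option, assuming only `fr` and `en` exist and `en` is the default (both assumptions). It does a simplified lookup that falls back from a range such as `fr-CA` to its primary subtag `fr`; q=0 ranges are skipped but not otherwise enforced. Whatever a crawler sends (or does not send) as Accept-Language, it receives exactly one representation from http://example.com/my-unique-link. The `PAGES` mapping to the per-language paths is purely illustrative.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_language(header, available, default):
    """Pick one language for the single negotiated URI."""
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue
        if tag == "*":
            return default
        primary = tag.split("-")[0]  # simplified lookup: fr-ca -> fr
        if primary in available:
            return primary
    return default


# Hypothetical mapping to the per-language paths an agency would publish instead.
PAGES = {"fr": "/fr/mon-lien-unique", "en": "/en/my-unique-link"}

lang = choose_language("fr-CA,fr;q=0.9,en;q=0.8", set(PAGES), "en")
print(lang, PAGES[lang])
```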

I'm not sure that the Vary header would solve anything. I have not tested it against search engine indexing bots. Does anyone have details about that?

    Vary: Accept-Language

That is without even counting the issues with IE:
http://crisp.tweakblogs.net/blog/311/internet-explorer-and-cacheing-beware-of-the-vary.html
and Mark Nottingham's tests:
http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989
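What Vary actually does is tell caches which request headers were used to select the response, so a cache must not reuse a stored response for a request whose values for those headers differ. A minimal Python sketch of a cache key built that way, assuming exact string comparison of header values (real caches may normalize values first); `cache_key()` is a hypothetical helper. It also shows why `Vary: Accept-Language` fragments caches: every distinct Accept-Language string gets its own entry.

```python
def cache_key(method, uri, request_headers, vary):
    """Build a cache key from the request line plus the headers named in Vary.

    Returns None for "Vary: *", which never matches a later request.
    """
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return None
    lowered = {k.lower(): v for k, v in request_headers.items()}
    selecting = tuple((n, lowered.get(n, "")) for n in sorted(names))
    return (method, uri, selecting)


uri = "http://example.com/my-unique-link"
fr = cache_key("GET", uri, {"Accept-Language": "fr"}, "Accept-Language")
en = cache_key("GET", uri, {"Accept-Language": "en"}, "Accept-Language")
print(fr == en)  # the French and English copies get different keys
```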



# Accept header of User agents

Practical experience again. Take two sites: BigOne, with very high traffic, and SmallOne, which is not built for the traffic BigOne receives.

BigOne's Web developers add an image element by mistake:

    <img src="http://smallone.example.com/boo" alt="logo"/>

The resource http://smallone.example.com/boo is not an image but an HTML file served as text/html. Every page view on BigOne triggered a request to SmallOne, and that traffic killed SmallOne.

Since the browser is requesting a resource referenced by an img element, we expected the Accept header to be something like:

    Accept: image/*

We could then block BigOne's requests with "406 Not Acceptable", since we only have text/html for this resource. We implemented this in a test version of SmallOne.

It worked perfectly with Firefox 3.5, which sends this when requesting images:

    Accept: image/png,image/*;q=0.8,*/*;q=0.5

Unfortunately, Opera sends:

    Accept: text/html, application/xml;q=0.9, application/xhtml+xml, application/x-obml2d, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

And IE and WebKit seem to send:

    Accept: */*

No luck.
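The post does not say exactly how the test version of SmallOne made its decision, so the following Python sketch is only a hypothetical reconstruction of one heuristic that reproduces the behaviour described: answer 406 when the request prefers some image type (here image/png, image/jpeg or image/gif, an assumed list) strictly over text/html. Parameters other than q are ignored, and the function names are invented for illustration. Note that under HTTP's own matching rules Firefox's `*/*;q=0.5` still makes text/html acceptable, so the 406 is a policy choice on SmallOne's side rather than something the Accept header demands.

```python
IMAGE_TYPES = ("image/png", "image/jpeg", "image/gif")  # assumed list


def parse_accept(header):
    """Parse an Accept value into (media-range, q) pairs; other parameters are ignored."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def q_for(media_type, ranges):
    """Effective q for a concrete type: the most specific matching range wins."""
    main_type = media_type.split("/")[0]
    best = None  # (specificity, q)
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best[1] if best else 0.0


def status_for(accept):
    """406 if an image type is strictly preferred over text/html, else 200."""
    if accept is None:
        return 200  # no Accept header: any media type is acceptable
    ranges = parse_accept(accept)
    html_q = q_for("text/html", ranges)
    image_q = max(q_for(t, ranges) for t in IMAGE_TYPES)
    return 406 if image_q > html_q else 200


browsers = {
    "Firefox 3.5": "image/png,image/*;q=0.8,*/*;q=0.5",
    "Opera": ("text/html, application/xml;q=0.9, application/xhtml+xml, "
              "application/x-obml2d, image/png, image/jpeg, image/gif, "
              "image/x-xbitmap, */*;q=0.1"),
    "IE/WebKit": "*/*",
}
for name, accept in browsers.items():
    print(name, status_for(accept))
```

Only the Firefox 3.5 request is refused: Opera gives text/html the same weight as the image types, and a bare `*/*` weights everything equally, so the heuristic has nothing to go on.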



-- 
Karl Dubost
Montréal, QC, Canada
http://www.la-grange.net/karl/
Received on Wednesday, 25 November 2009 15:51:01 UTC