This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18 - For conneg, allow choosing the Accept-* headers to send.
Summary: For conneg, allow choosing the Accept-* headers to send.
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.6.0b1
Hardware: All All
Importance: P2 enhancement
Target Milestone: 1.0
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
URL:
Whiteboard:
Keywords:
Duplicates: 784
Depends on:
Blocks:
 
Reported: 2002-10-25 02:33 UTC by Terje Bless
Modified: 2008-06-15 16:20 UTC (History)
CC List: 6 users

See Also:


Attachments
patch to add accept-headers (4.82 KB, patch)
2006-05-25 22:08 UTC, Derek Young

Description Terje Bless 2002-10-25 02:33:21 UTC
Reported by Christoph Päper:

For the matter of content negotiation it would be nice if one could choose
in <http://validator.w3.org:8001/detailed.html> the Accept (text/html vs.
application/xhtml+xml), Accept-Language and perhaps Accept-Encoding HTTP
headers sent by the validator.
Comment 1 Terje Bless 2002-10-25 21:20:35 UTC
Comments from Björn Höhrmann:

Since the user is likely trying to validate the document he would get,
the Validator could just tunnel the received Accept headers to the
remote host (don't forget to make sure the Accept:-line contains all
types supported by the validator and the Accept-Language header contains
a * to ensure the user won't get a 406 response). Otherwise it would get
rather complicated to implement a form that allows for complex choices
in this regard. Just consider a resource being served as a dozen
different types (e.g., RSS, HTML, SVG 1.0, SVG 1.1, XHTML Basic 1.0,
XHTML 1.0 Strict, XHTML 1.1, some custom XHTML types, etc.)
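
The tunneling idea above can be sketched roughly as follows. This is a minimal Python sketch, not the validator's actual code: the function name `tunnel_accept_headers`, the `SUPPORTED_TYPES` list and the low `q=0.1` preference are all assumptions made for illustration.

```python
# Hypothetical list: the media types the validator can check are not
# enumerated in this thread.
SUPPORTED_TYPES = ["text/html", "application/xhtml+xml", "image/svg+xml"]


def tunnel_accept_headers(received_accept, received_accept_language):
    """Build outgoing Accept-* values from the headers the user's browser sent."""
    # Split the user's Accept header into individual media ranges.
    ranges = [r.strip() for r in (received_accept or "").split(",") if r.strip()]
    # Media types the user already named (parameters such as q= stripped).
    named = {r.split(";")[0].strip().lower() for r in ranges}
    # Add every supported type the user did not mention, at low preference,
    # so the remote host always has something it may serve.
    for media_type in SUPPORTED_TYPES:
        if media_type not in named:
            ranges.append(media_type + ";q=0.1")

    # Same treatment for languages: make sure a "*" range is present.
    languages = [
        l.strip() for l in (received_accept_language or "").split(",") if l.strip()
    ]
    named_languages = {l.split(";")[0].strip() for l in languages}
    if "*" not in named_languages:
        languages.append("*;q=0.1")

    return {
        "Accept": ", ".join(ranges),
        "Accept-Language": ", ".join(languages),
    }
```

The user's own preferences stay first and unchanged; the additions only widen what the remote host is allowed to return, which is what avoids the 406.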
Comment 2 orion suydam 2003-08-07 14:59:00 UTC
Although the tunneling feature suggested by Björn would be handy, I'd be more 
interested in being able to supply exact values for the Accept-* headers.  This 
way I can validate my WML, XHTML Mobile Profile, and CHTML output from my 
browser.
Comment 3 Bj 2004-06-02 14:28:01 UTC
*** Bug 784 has been marked as a duplicate of this bug. ***
Comment 4 Terje Bless 2004-09-01 16:47:10 UTC
Retarget 1.0. I don't think this will make it for 0.7.
Comment 5 gidyn 2005-09-01 02:41:32 UTC
A workaround for IE not handling application/xhtml+xml is to only send this MIME
type when the UA offers to accept it, otherwise text/html is sent. This means
that Mozilla browsers will be sent valid XHTML. However, the Validator will
fault, because it doesn't send the relevant Accept header, and is therefore
served the IE workaround.
Comment 6 Derek Young 2006-05-25 21:40:49 UTC
Ok.. considering this is a 3 1/2 year old bug.. 

When trying to use XHTML you will run into problems serving the correct MIME type to Internet Exploder users. There is a wonderful article at http://keystonewebsites.com/articles/mime_type.php that gets you started in the direction of correctly serving your documents as XHTML and HTML4.

Unfortunately, by doing that you are unable to validate your XHTML pages, as the validator is incapable of sending the accept headers.

Since no one really seems to be in a rush to correct this I went ahead and did it myself.. it may break stuff and be improper but I don't really care. 

Go to http://validator.w3.org/docs/install.html and install validator-0.7.2. When that is up and running, download http://glimpse.onlineok.com/validator-accept.tgz. With that you can either apply the patch inside or just untar it in your root validator directory, where it will overwrite the correct files.

You can now ask for text/html pages or application/xhtml+xml pages. Someone here earlier wanted to be able to ask for SMS type pages; it would be trivial to add support for different accept options by making changes to share/templates/en_US/popup_accept.tmpl and htdocs/accept-select.html

You can see this work at http://glimpse.onlineok.com/w3c-validator/ but please go easy on that since it is my desktop at work :)

W3C team can feel free to use my code, I don't care.. I just think this is kind of important and critical to be in the validator if you want people actually taking XHTML seriously.
Comment 7 Derek Young 2006-05-25 22:08:17 UTC
Created attachment 427
patch to add accept-headers
Comment 8 Olivier Thereaux 2006-08-30 02:40:45 UTC
(In reply to comment #7)
> Created an attachment (id=427) [edit]
> patch to add accept-headers

Thank you Derek for this patch. I think it has a few problems, however, which are hinted at in other comments: namely, text/html and application/xhtml+xml aren't the only media types accepted by this validator. I suppose the alternative is the one suggested by orion in comment #2, a free-form field for the various Accept-* headers.

Whether or not it would be worth the extra volume of options (albeit on the advanced interface) is still not entirely clear to me.
Comment 9 Victor Engmark 2007-05-07 14:06:26 UTC
Good stuff; I'm having trouble validating a page with embedded SVG. In relation to this, it would be useful to have the following choices of which "Accept-" headers will be sent:
* None
* Current browser's (as found in HTTP request; useful for checking whether any single browser will balk)
* Text field(s) (useful to test e.g. language auto-detection)
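
The three choices listed above could be modelled along these lines. This is only a sketch under assumptions: the function `choose_accept_headers`, the mode names and the dictionary-based header representation are hypothetical and do not come from the validator's code.

```python
def choose_accept_headers(mode, browser_headers=None, custom_headers=None):
    """Return the Accept-* headers to send for one of the three proposed modes.

    mode is "none", "browser" (reuse what the user's browser sent) or
    "custom" (values typed into text fields).
    """
    if mode == "none":
        return {}
    if mode == "browser":
        source = browser_headers or {}
    elif mode == "custom":
        source = custom_headers or {}
    else:
        raise ValueError("unknown mode: %r" % (mode,))

    chosen = {}
    for name, value in source.items():
        lowered = name.lower()
        # Keep only Accept and Accept-* headers that actually carry a value.
        if (lowered == "accept" or lowered.startswith("accept-")) and value.strip():
            chosen[name] = value.strip()
    return chosen
```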
Comment 10 Olivier Thereaux 2007-09-26 13:50:16 UTC
I am removing the blocker to Bug 785. 
IMHO, a default and an option are orthogonal. I would actually welcome a fix to Bug #785 as a much better solution, making this one moot. 
Comment 11 Olivier Thereaux 2007-09-27 10:20:36 UTC
After sitting with a colleague on the issue for a while today, we concluded that:
* it was interesting to provide users with a way to trigger format and language negotiation, by having the validator send custom Accept and Accept-Language headers.

* the need for custom headers was rare, since most follow the good practice of giving a specific URI to each representation of a negotiated resource. Rare, but real, and limited to a few "experts", who would probably be OK reading the documentation and adding the parameter to validation URIs themselves.

As a result, I added the accept and accept-language parameters to the validator, and documented them in the user's manual. These parameters are marked as experimental, but will be in the 0.8.2 release. For the time being there won't be any GUI for them, which is typically the case for very rare or experts-only options, such as output=n3, debug=1, etc.
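
For anyone adding the parameters to a validation URI by hand, a small Python sketch of how such a URI might be assembled is given here. The parameter names accept and accept-language come from this comment; the `/check` endpoint, the `uri` parameter and the helper name `validation_uri` are assumptions, and the user's manual remains the reference for the exact values accepted.

```python
from urllib.parse import urlencode

# Assumed endpoint; only the "accept" and "accept-language" parameter
# names are taken from this bug.
VALIDATOR_CHECK = "http://validator.w3.org/check"


def validation_uri(document_uri, accept=None, accept_language=None):
    """Build a validation URI, optionally asking for custom Accept-* headers."""
    params = [("uri", document_uri)]  # "uri" is an assumed parameter name
    if accept:
        params.append(("accept", accept))
    if accept_language:
        params.append(("accept-language", accept_language))
    # urlencode percent-encodes reserved characters such as "/" and "+".
    return VALIDATOR_CHECK + "?" + urlencode(params)


print(validation_uri("http://example.org/",
                     accept="application/xhtml+xml",
                     accept_language="fr"))
```

Percent-encoding matters here because a literal "+" in application/xhtml+xml would otherwise be read as a space in a query string.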
Comment 13 Olivier Thereaux 2007-09-27 10:34:14 UTC
(In reply to comment #12)
> see 
> http://lists.w3.org/Archives/Public/www-validator-cvs/2007Sep/0209.html
> and 
> http://lists.w3.org/Archives/Public/www-validator-cvs/2007Sep/0207.html

Patch tested and on its way for 0.8.2 release.
Comment 14 Dean Edridge 2007-09-28 06:07:40 UTC
(In reply to comment #11)
> After sitting with a colleague on the issue for a while today, we concluded
> that:
> * it was interesting to provide users with a way to trigger format and language
> negotiation, by having the validator send custom Accept and Accept-Language
> headers.
> 
> * that the need for custom headers was rare, since most follow the good
> practice to give a specific URI to each representation of a negotiated
> resource. rare, but real, and limited to a few "experts", who would probably......

Giving a specific URI for separate HTML and XHTML pages is not a good practise at all, as has been clearly pointed out before. One would run into various problems such as Internet Explorer users browsing to XHTML pages. Duplicate content issues would also arise, causing problems with search engines and usability. Not to mention the added hassles of maintaining two files.

Why is this such a big deal to fix? Does the W3C not want to encourage people to use XHTML with the correct mime type? Other validators send ACCEPT headers, why can't the W3C_validator?

Thanks
Dean 
Comment 15 Dean Edridge 2007-09-28 06:20:47 UTC
(In reply to comment #11)

> * that the need for custom headers was rare, since most follow the good
> practice to give a specific URI to each representation of a negotiated
> resource....
Really? The experts that I know of that actually *know how* to use XHTML in the real world do no such thing.

Even the W3C's home page uses content negotiation based on the user agent's capabilities and uses the same URL for XHTML (application/xhtml+xml) and HTML (text/html).

Unfortunately (due to this bug #18) it validates with the (text/html) mime type and not the (application/xhtml+xml) that the home page sends my browser.

Dean
Comment 16 Olivier Thereaux 2007-09-28 07:00:33 UTC
(In reply to comment #15)
> (In reply to comment #11)
> 
> > * that the need for custom headers was rare, since most follow the good
> > practice to give a specific URI to each representation of a negotiated
> > resource....
> Really? the experts that I know of that actually *know how* to use XHTML in the
> real world do no such thing.

Dean, there is more to content-negotiation than the hack for IE and XHTML. :)

thanks
olivier
Comment 17 Dean Edridge 2007-09-28 09:11:18 UTC
> Dean, there is more to content-negotiation than the hack for IE and XHTML. :)
> 
> thanks
> olivier
> 
The only 'hack' in my content negotiation script is the line where I have to unnecessarily check for your user-agent as it fails to give my server an accept header.

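// The validator sends no Accept header, so guard against an unset index.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (stristr($accept, "application/xhtml+xml") ||

	/* the line below is an unnecessary hack :) */
	stristr($user_agent, "W3C_Validator")) 
{
	$mime = "application/xhtml+xml";
}
else
{
	$mime = "text/html";
}
header("Content-Type: $mime; charset=utf-8");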


Thanks
Dean 
Comment 18 Stian Oksavik 2008-01-06 18:11:54 UTC
It seems to me the W3C is promoting a double standard here, ignoring HTTP while demanding strict compliance with HTML/XHTML.

My pages, which are written in XHTML 1.1, now do the following:

If the HTTP_ACCEPT header indicates acceptance of application/xhtml+xml, I serve the page as XHTML 1.1.

If the HTTP_ACCEPT header indicates acceptance of text/html (but not application/xhtml+xml), I deliberately choose to violate spec by serving it as text/html. This is a workaround to allow the content to be displayed by broken browsers; however, I make users very aware of this by printing a warning that their user agent does not support XHTML and some content may not render properly.

If the HTTP_ACCEPT header contains neither of these, I return a 406 HTTP error. This seems the correct thing to do, since these user agents are demanding content in a form that my web server is unable to offer. In fact, although I'd have to go read the HTTP spec to confirm, it seems to me that by sending a blank (but present) HTTP_ACCEPT header, the W3C validator is stating that it will not accept *any* content, regardless of format.

Yes, the root cause here is IE (and lynx, and a few older browsers, but primarily IE) not supporting XHTML. Yes, I've complained to Microsoft. Microsoft states that they'd rather not support it at all than add half-ass support by treating XHTML as tag soup. While I commend them for THAT, I fail to understand why a company with Microsoft's resources can't come up with an XHTML parser in the span of half a decade.

But the fact that IE lacks support for XHTML is no excuse for the W3C to fail to validate my valid XHTML. The fact that I choose to serve SOME content to browsers such as IE *with* a warning should be irrelevant to the W3C, since the W3C validator is supposed to validate XHTML and therefore has no excuse to lie to the browser about supporting it in the first place.

I guess people attempting to validate my sites will have to live with the 406 for now. This is the best way I can see to comply with HTTP content negotiation rules without adding a special case for the W3C; adding exceptional handling for a standards body is something I refuse to do.
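
The three rules described in this comment can be written down as a short Python sketch. It is an illustration only: the function name `negotiate` and the tuple it returns are hypothetical, and plain substring matching is assumed, so q-values and wildcard ranges such as */* are ignored.

```python
def negotiate(accept_header):
    """Return (status, content_type, show_warning) for a request's Accept value.

    A missing header is treated like an empty one here, which is what makes
    a client that sends no Accept header end up with a 406.
    """
    accept = (accept_header or "").lower()
    if "application/xhtml+xml" in accept:
        # The client claims XHTML support: serve the page as XHTML 1.1.
        return 200, "application/xhtml+xml", False
    if "text/html" in accept:
        # Fallback for browsers without XHTML support, with a visible warning.
        return 200, "text/html", True
    # Neither type is acceptable to the client.
    return 406, None, False
```

Under these rules a request without any Accept header falls through to the last branch.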
Comment 19 Olivier Thereaux 2008-01-06 22:12:20 UTC
(In reply to comment #18)
> If the HTTP_ACCEPT header indicates acceptance of text/html (but not
> application/xhtml+xml), I deliberately choose to violate spec by serving it as
> text/html.

That's your choice. Did you really need XHTML 1.1? 

> This is a workaround to allow the content to be displayed by broken
> browsers; however, I make users very aware of this by printing a warning that
> their user agent does not support XHTML and some content may not render
> properly.

Interesting method. I don't know if all the people visiting your site really need to know about such technicalities, but at least you're trying to raise awareness.

> In fact, although I'd
> have to go read the HTTP spec to confirm, it seems to me that by sending a
> blank (but present) HTTP_ACCEPT header, the W3C validator is stating that it
> will not accept *any* content, regardless of format.

If no Accept header field is present, then it is assumed that the client accepts all media types.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
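
A server that follows this rule might test acceptability roughly as sketched here in Python. The helper name `accepts` is hypothetical, `None` stands for "no Accept header was sent", and the sketch deliberately ignores the precedence rules between overlapping media ranges.

```python
def accepts(accept_header, media_type):
    """Return True if media_type is acceptable under the given Accept value."""
    if accept_header is None:
        # No Accept header at all: the client is assumed to accept everything.
        return True
    wanted_type, wanted_subtype = media_type.lower().split("/", 1)
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if "/" not in media_range:
            continue  # skip empty or malformed entries
        quality = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        range_type, range_subtype = media_range.split("/", 1)
        # A range matches literally or through the "*" wildcard; q=0 means
        # "not acceptable".
        if (quality > 0
                and range_type in ("*", wanted_type)
                and range_subtype in ("*", wanted_subtype)):
            return True
    return False
```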
 
> But the fact that IE lacks support for XHTML is no excuse for the W3C to fail
> to validate my valid XHTML. 

Please see http://validator.w3.org/docs/users.html#option-accept
Comment 20 Terje Bless 2008-01-07 06:00:04 UTC
[ Hi Stian, by the way :-) ]

(In reply to comment #18)
> It seems to me the W3C is promoting a double standard here, ignoring HTTP while
> demanding strict compliance with HTML/XHTML.

The Validator has always strived to implement the standards as closely as possible, and has on occasion preferred to follow the HTTP spec rather than a W3C issued Recommendation when the two have been in seeming conflict.

> [...] it seems to me that by sending a blank (but present) HTTP_ACCEPT header, the
> W3C validator is stating that it will not accept *any* content, regardless of format.

If you can cite an authoritative argument based on RFC2616 that indicates the current Validator behavior is incorrect then that will be a quite weighty factor in determining how to deal with this issue.

As far as we've been aware so far, the issue in this bug is one of practical implementations for publishers attempting to both use application/xhtml+xml _and_ cater to users of UAs that do not support it; apart from the general desirability of exposing generic content negotiation features to the Validator's users.

We are not aware that the Validator's behaviour in this area is in violation of any standard (see e.g. the RFC2616 cite in Olivier's response in Comment #19). If your reading of the spec indicates otherwise then please do open a new bug detailing the issue!