This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Since older versions of the Markup Validator, I've been facing a big problem, and it remains in this beta 0.8.
The validator works fine both via "File Upload" and "Direct Input", but FAILS when fired by referer ("uri=?referer"). It also fails when the complete URL is passed as a parameter.
The complete error message:
-------------------------------------------------
This page is not Valid (no Doctype found)!
Error Line 1 column 0: end of document in prolog.
-------------------------------------------------
My XHTML 1.0 Transitional page is hosted on a site that redirects to an IP-based one. Whether I enter the URL of the first one (domain-based) or the IP-based one (using my current IP), the validation fails.
However, the validator at http://www.validome.org is able to parse the doc via URL parameter without any problem. See:
BOTH FAIL:
http://validator.w3.org/check?uri=http://athrasoft.com/en/Home.htm
http://validator-test.w3.org/check?uri=http://athrasoft.com/en/Home.htm
BOTH WORK FINE:
http://www.validome.org/validate/?uri=http://athrasoft.com/en/Home.htm
http://www.validome.org/get/http://athrasoft.com/en/Home.htm
The redirection is correctly managed by W3C's validator, but the result is always "No DocType Found". Even if I enter the IP-based URL as the target, I get the error. My DocType is correct, as is the whole page.
Any clue? Thanks in advance.
Paulo França
http://www.athrasoft.com
Just discovered this also works fine: http://www.validome.org/referer So why doesn't W3C's? The link above is now on every page of my site. Just fire it from there and see it working. This proves there's no problem with my DocType, nor with calling the validator by referer. What happens to W3C's validator when checking a URL? Thanks. Paulo França
(In reply to comment #0)
> The validator works fine both via "File Upload" and "Direct Input", but FAILS
> when fired by referer ("uri=?referer"). It also fails when the complete URL is
> passed as parameter.
>
> The complete error message:
> -------------------------------------------------
> This page is not Valid (no Doctype found)!
> Error Line 1 column 0: end of document in prolog.
> -------------------------------------------------
Apparently this has nothing to do with the document type. If you turn on the "show page source" option, you will see that the validator is given an empty document to validate (and hence, does not find a doctype - or any content for that matter).
I traced the issue down to libwww-perl, the (usually extremely robust) library the validator uses to retrieve documents on the web. When using this library, one gets empty content back from your server.
Not sure what is going on between LWP and your server, but this is a very rare case. Possibly, the fact that your server is sending two "Content-Type" HTTP headers could be confusing LWP, but I'm not sure about it yet.
Hi, Olivier.
> Apparently this has nothing to do with the document type. If you turn on the
> "show page source" option, you will see that the validator is given an empty
> document to validate (and hence, does not find a doctype - or any content for
> that matter).
Yes, I noticed that. \\:^.
--o-o-o-o-o-o-o--
> Not sure what is going on between LWP and your server, but this is a very rare
> case. Possibly, the fact that your server is sending two "Content-Type" HTTP
> headers could be confusing LWP, but I'm not sure about it yet.
The server was set up and is maintained by me, so I'm free to make any modifications needed to solve the issue. As for the headers, I tested them using some online header readers and saw nothing unusual. On my page about browser compliance there's a screenshot showing that the pages are rendered correctly by a number of browsers: http://www.athrasoft.com/en/Compliance.htm
I have set up the HTTP headers (on IIS/Win2k) so that it serves two different ones:
1) UTF-8 for the forums root folder, otherwise phpbb 3 messes up accented characters. This is working fine this way.
2) Windows-1252 (the same as in the pages' meta statements) for the English/Portuguese pages on the web site. Also working fine.
I tested W3C's validator by serving the pages with several different HTTP headers (UTF-8, ISO-8859-1, etc.) and also with none at all: nothing works.
I have yet to try reading the headers from a Delphi program for which I have the source code: ICS from François Piette. I'm going to change one of its sample projects so that it traces the reading of server headers. I'll let you know if I discover something weird (such as the duplicate header you mentioned).
For now, thanks for replying so fast. Very good, this bug tracker!
Best regards, Paulo França
(In reply to comment #3)
> I am yet to try reading the headers from a Delphi program for which I got the
> source code - ICS from François Piette. I'm going to change one of its sample
> projects so that it traces the read of server headers. I'll let you know if I
> discover something weird (as the duplicate header you mentioned).
Ok, I have just done the test. This is the header data received by the tool from my server:
----------------------------------------------------------------------
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Type: text/html; charset=Windows-1252
Cache-Control: no-cache
Expires: Sun, 27 May 2007 07:41:22 GMT
Date: Sun, 27 May 2007 07:41:22 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Sat, 26 May 2007 20:28:28 GMT
ETag: "0fef96ad49fc71:892"
Content-Length: 15633
StatusCode = 200
----------------------------------------------------------------------
Ok, the "Content-Type" was partially duplicated, so I changed IIS so that it does not send that header entry. As a result, the "duplication" is gone, and only the "Content-Type: text/html" entry is returned (although not sent by IIS). Even so, W3C's validator KEEPS going crazy, and Validome too starts returning an error. Then I changed back to the original HTTP header on the server. At least Validome works fine that way.
If the partially duplicated "Content-Type" were the cause, it would have worked when I removed it, but it didn't. Back to square one again. \\:^)
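The duplicated "Content-Type" entry in a header dump like the one above can also be spotted mechanically. The following is a minimal Python sketch, not part of the validator or of ICS; the function name `duplicated_headers` and the `SAMPLE_HEADERS` constant are hypothetical, and the sample text is simply the dump quoted in this comment.

```python
def duplicated_headers(raw_headers: str) -> dict:
    """Map each header name that occurs more than once to all of its values."""
    values = {}
    # Skip the status line ("HTTP/1.1 200 OK"); every other line is "Name: value".
    for line in raw_headers.strip().splitlines()[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line
        values.setdefault(name.strip().lower(), []).append(value.strip())
    return {name: vals for name, vals in values.items() if len(vals) > 1}


# Header dump copied from the comment above.
SAMPLE_HEADERS = """HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Type: text/html; charset=Windows-1252
Cache-Control: no-cache
Expires: Sun, 27 May 2007 07:41:22 GMT
Date: Sun, 27 May 2007 07:41:22 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Sat, 26 May 2007 20:28:28 GMT
ETag: "0fef96ad49fc71:892"
Content-Length: 15633
"""

if __name__ == "__main__":
    print(duplicated_headers(SAMPLE_HEADERS))
```

Header names are compared case-insensitively because HTTP field names are case-insensitive; the values are kept in the order the server sent them.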
Your server is borked. Apart from what's evident in the trace below, the server also times out way too quickly on input. I'd venture a guess that your server is either running custom IIS plugins or is deferring far too much to homebrew CGI or ASP type code.
[[[
$ telnet athrasoft.com http
Trying 216.98.141.250...
Connected to athrasoft.com.
Escape character is '^]'.
GET /en/Home.htm HTTP/1.0
Host: athrasoft.com

HTTP/1.0 301 Found
Server: Apache
Status: 301 Found
Expires: Mon, 28 May 2007 19:25:08 GMT
Date: Sun, 27 May 2007 19:25:08 GMT
location: http://200.222.134.220:8081/en/Home.htm
Connection closed by foreign host.
$ telnet 200.222.134.220 8081
Trying 200.222.134.220...
Connected to 200222134220.user.veloxzone.com.br.
Escape character is '^]'.
GET /en/Home.htm HTTP/1.0
Host: athrasoft.com

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Type: text/html; charset=Windows-1252
Cache-Control: no-cache
Expires: Sun, 27 May 2007 19:24:58 GMT
Date: Sun, 27 May 2007 19:24:58 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Sat, 26 May 2007 20:28:28 GMT
ETag: "0fef96ad49fc71:893"
Content-Length: 15633

Connection closed by foreign host.
$
]]]
(In reply to comment #5)
> Your server is borked.
First of all, my server's IP has now changed to 201.79.145.166, but at that time it was 200.222.134.220. Just took this picture (before the IP changed): http://img518.imageshack.us/img518/4713/validomeau5.png
Sincerely, I can't see anything wrong in the trace you posted. If you use the domain-based URL, it leads you to my IP-based server.
athrasoft.com = 216.98.141.250:80 (domain monger redirector) -> 200.222.134.220:8081 (my IIS/5.0 server)
Your telnet session reached my server normally and received the server's headers, as any other header reader currently does. W3C's validator is correctly following the redirection. This is not the problem. Even if you type the final URL ( http://200.222.134.220:8081/en/Home.htm ) directly into the validator's uri field, the validator fails (W3C's, not Validome).
I didn't see any evidence of the "bork" in the trace you posted.
> $ telnet athrasoft.com http
> Trying 216.98.141.250...
> Connected to athrasoft.com.
> Escape character is '^]'.
> GET /en/Home.htm HTTP/1.0
> Host: athrasoft.com
>
> HTTP/1.0 301 Found
> Server: Apache
> Status: 301 Found
> Expires: Mon, 28 May 2007 19:25:08 GMT
> Date: Sun, 27 May 2007 19:25:08 GMT
> location: http://200.222.134.220:8081/en/Home.htm
So at this point the redirection completed fine.
> $ telnet 200.222.134.220 8081
> Trying 200.222.134.220...
> Connected to 200222134220.user.veloxzone.com.br.
> Escape character is '^]'.
> GET /en/Home.htm HTTP/1.0
> Host: athrasoft.com
>
> HTTP/1.1 200 OK
It says "OK", and "200222134220.user.veloxzone.com.br" corresponds to my server's IP (at that time).
> HTTP/1.1 200 OK
> Server: Microsoft-IIS/5.0
> Content-Type: text/html; charset=Windows-1252
> Cache-Control: no-cache
> Expires: Sun, 27 May 2007 19:24:58 GMT
> Date: Sun, 27 May 2007 19:24:58 GMT
> Content-Type: text/html
> Accept-Ranges: bytes
> Last-Modified: Sat, 26 May 2007 20:28:28 GMT
> ETag: "0fef96ad49fc71:893"
> Content-Length: 15633
>
> Connection closed by foreign host.
So the whole header has been reached.
> the server also times out way too quickly on input.
In fact, a traceroute from some locations takes some time to complete, and in some cases the traceroute is aborted. It seems the problem is really a bottleneck somewhere in the path. Validome takes 5-8 seconds to show the report; maybe it's closer to Brazil, dunno.
> I'd venture a guess that your server is either running custom
> IIS plugins or is deferring far too much to homebrew CGI or ASP
> type code.
No plugin, no ASP at all, just serving plain HTML pages.
Thanks for your effort, Olivier. I think we have found the problem: the short timeout of W3C's validator. I remember being able to traceroute my server from the USA days ago, but today it's really being a pain. I'll try again in the next few days to see if something changes in this regard. Thanks again.
Just a note: W3C's CSS Validator is validating my CSS (in the same "/en/Home.htm" page) quickly and via URI. The same goes for the Link Checker: working fine! The CSS and link validators are hosted in a subdomain of W3C, so my previous guess about the traceroute was wrong!
WORKING:
http://jigsaw.w3.org/css-validator/validator?warning=no&profile=css3&usermedium=all&uri=http://athrasoft.com/en/Home.htm
http://validator.w3.org/checklink?hide_redirects=on&hide_type=all&check=Check&uri=http://athrasoft.com/en/Home.htm
See? There's something wrong with the Markup Validator, indeed.
(In reply to comment #6)
>>Connection closed by foreign host.
>
> So the whole header has been reached.
Actually, no, that's kind of the point... :-) The server returns the headers, but not the actual content; in spite of emitting something that looks plausible in the Content-Length header field. At that point in the output I should have gotten pretty much the same thing you'd see if you did View Source in a browser, and not Connection closed by foreign host.
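The mismatch described above (a plausible Content-Length followed by no body) can be checked on any captured raw response. This is a minimal Python sketch under stated assumptions: the function name `missing_body_bytes` is hypothetical, the response is assumed to be the complete byte stream received before the connection closed, and chunked transfer encoding is not handled.

```python
def missing_body_bytes(raw_response: bytes) -> int:
    """Return how many body bytes are missing relative to the declared Content-Length.

    0 means the body is complete; a positive number means the connection
    closed before the server delivered everything it announced.
    """
    # Headers and body are separated by the first empty line (CRLF CRLF).
    head, _, body = raw_response.partition(b"\r\n\r\n")
    declared = 0
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip().decode("ascii"))
    return declared - len(body)


if __name__ == "__main__":
    # Shaped like the failing trace: headers only, then the connection closes.
    truncated = b"HTTP/1.1 200 OK\r\nContent-Length: 15633\r\n\r\n"
    print(missing_body_bytes(truncated))
```

A response shaped like the failing trace reports the full declared length as missing, which is what distinguishes "headers reached" from "document reached".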
(In reply to comment #8)
> The server returns the headers, but not the actual content...
Could you please test it again? At the time I read your previous post, I looked at IIS and realised I would have to restart it (this sometimes happens to IIS, 1 or 2 times a week). After doing so, the server got back to its normal state. Regardless of your next tests (in case you are willing to do them), the fact is that the same W3C is able to read my contents (CSS and all links), so how come the contents cannot be read? How is W3C's Link Checker able to trace every link on my pages without reading their contents first?! \\8^* Not to mention that, besides W3C's CSS Validator and Link Checker, Validome too reads my pages' contents (as do my site visitors). \\;^)
As I mentioned in my latest post about Link Checker... http://img253.imageshack.us/img253/7476/linkcheckerdv4.png (just took the snapshot) So Link Checker is able to read my contents but the Markup Validator is not?! \\:^.
(In reply to comment #9)
> Could you please test it again?
Sure. I checked a little more in-depth and it looks like the difference is caused by the value of the 'Connection' HTTP header. When it's 'close' the server fails to return a body; when the value is 'keep-alive' it performs as expected. In the trace below I have elided (marked with []) irrelevant bits for clarity:
[[[
$ telnet 201.79.145.166 8081
GET /en/Home.htm HTTP/1.0
Host: athrasoft.com
Connection: close

HTTP/1.1 200 OK
[]
Content-Length: 15617

Connection closed by foreign host.
$ telnet 201.79.145.166 8081
GET /en/Home.htm HTTP/1.0
Host: athrasoft.com
Connection: keep-alive

HTTP/1.1 200 OK
[]
Content-Length: 15617

<!DOCTYPE html []>
[]
<title> AthraSoft Components • Home of SmartPlugin </title>
[]
]]]
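The two telnet sessions above can be reproduced with a short script that sends the same hand-written request and varies only the Connection header. This is a rough Python sketch, not how the Markup Validator itself fetches documents (it uses libwww-perl); the helper names `build_request` and `fetch_raw` are hypothetical, and the address and port in the usage note are the ones quoted in the trace, which may no longer be reachable.

```python
import socket


def build_request(path: str, host: str, connection: str) -> bytes:
    """Build the same HTTP/1.0 GET request that was typed into telnet."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"Connection: {connection}",
        "",  # empty line terminates the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(address: str, port: int, request: bytes, timeout: float = 10.0) -> bytes:
    """Send the request and return every byte received until close or timeout."""
    chunks = []
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break  # server closed the connection
                chunks.append(chunk)
        except socket.timeout:
            pass  # a keep-alive server may leave the connection open
    return b"".join(chunks)
```

Calling `fetch_raw("201.79.145.166", 8081, build_request("/en/Home.htm", "athrasoft.com", "close"))` and then again with `"keep-alive"` would, on a server behaving as described, return headers only in the first case and headers plus document in the second.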
(In reply to comment #11)
> Sure. I checked a little more in-depth and it looks like the difference is
> caused by the value of the 'Connection' HTTP header. When it's 'close' the
> server fails to return a body, when the value is 'keep-alive' it performs as
> expected.
Thanks, Terje. In fact, the "keep-alive" is crucial for the redirection to succeed. However, even though it's turned on, W3C's Markup Validator keeps saying no doctype has been found (even when typing in the IP URI to bypass the redirection). Duh! It's a pity the Markup Validator does not use telnet. \\;^) Only those guys who developed the markup, link and CSS validators are able to determine why on earth the markup validator is unable to read my contents. This "feature" has been around for years! Thank you for your good will.
(In reply to comment #12)
> In fact, the "keep-alive" is crucial for the redirection to succeed.
Just to make sure I'm being clear... That the server is behaving differently by not returning the document data when the Connection header field is set to 'close' is a bug in the server and not a problem with the Validator. Both of the test cases I quoted should have behaved identically up to that point. HTTP Keep-Alive should only have had any effect _after_ the full document was returned to the client.
> Only those guys who developed the markup, link and css validators are able to
> determine why in earth the markup validator is unable to read my contents.
The Markup Validator, quite correctly, simply happens to not make use of the (optional) 'Keep-Alive' feature of HTTP, and, it would appear, the two other tools cited happen to make use of it. That the server behaves differently for the two cases is incorrect behavior in the server (IIS 5.0).
(In reply to comment #13)
> ...
> The Markup Validator, quite correctly, simply happens to not make use of the
> (optional) 'Keep-Alive' feature of HTTP, and, it would appear, the two other
> tools cited happen to make use of it. That the server behaves differently for
> the two cases is incorrect behavior in the server (IIS 5.0).
Ok, I got it now. But if the other two tools make use of the (optional) "keep-alive" feature, why doesn't the Markup Validator do the same? This would solve the problem not only for me, but for all other sites using IIS 5. The CSS Validator loads my contents fast, so using that feature wouldn't be that time-consuming. Would it be too hard to change that in the validator code? Notice I am comparing three tools from W3C, where two of them make use of keep-alive and one doesn't. If the one that doesn't is facing problems, then it should do so too, shouldn't it?!
--o-o-o-o-o-o--
> The Markup Validator, quite correctly, simply happens to not make use...
Why "quite correctly"? How would the use of the keep-alive flag hurt the validator? Sorry for my lack of knowledge: I've never developed a validator in my life. I'm just a Delphi programmer trying to self-host a server (tired of remote ones). \\:^) And thank you very much for your patience. I'm taking the opportunity to learn a bit more about servers, headers, and related subjects.
(In reply to comment #14)
> But if the other two tools make use of the (optional) "keep-alive" feature, why
> doesn't the Markup Validator do the same? This would solve the problem not only
> for me, but for all other sites using IIS 5.
We're not aware of any general problem with the Markup Validator and IIS 5.0. This appears to be a local configuration issue with your particular installation; either due to how it is configured or due to an ISAPI filter or... The Markup Validator has no need to use HTTP Keep Alive because it only requests a single document from the remote server. HTTP Keep Alive is an optional method to optimize retrieval of multiple documents from the same server within a short period of time. Both the CSS Validator and the Link Checker have need to fetch multiple documents to satisfy the request, so for these tools HTTP Keep Alive is an appropriate optimization technique to use.
(In reply to comment #15)
> We're not aware of any general problem with the Markup Validator and IIS 5.0.
Well, now you are. \\:^)
--o-o-o-o-o-o-o--
> This appears to be a local configuration issue with your particular
> installation...
It seems W3C's Markup Validator is quite sensitive in this regard. My server is reachable and its content is readable all over the web (except by that tool).
--o-o-o-o-o-o-o--
> ...either due to how it is configured or due to an ISAPI filter or...
No filter at all. All default filters removed other than the PHP interpreter (used by my web-based community).
--o-o-o-o-o-o-o--
> The Markup Validator has no need to use HTTP Keep Alive because it only
> requests a single document from the remote server. HTTP Keep Alive is an
> optional method to optimize retrieval of multiple documents from the same
> server within a short period of time. Both the CSS Validator and the Link
> Checker have need to fetch multiple documents to satisfy the request, so for
> these tools HTTP Keep Alive is an appropriate optimization technique to use.
Understood. That seems logical. However, making it respect the keep-alive flag wouldn't hurt.
--o-o-o-o-o-o-o--
Terje, I'd like to thank you and Olivier for all the time spent on this. I'll drop it and leave it as-is. I have a feeling that, if my server were behind some *$oft company or similar, this "feature" in the validator would deserve more attention - I mean inside the code itself. I still don't see any inconvenience in respecting the keep-alive. After all, I have managed to make the guys see that there is a bad side effect of the validator's decision not to take that flag into account. I have also learned a lot from you. Thanks again and sorry for any inconvenience.
Best regards, Paulo França
Back again! \\;^) Just an interesting detail...
THIS WORKS:
http://validator.w3.org/check?uri=http://www.athrasoft.com/cgi-bin/Environ.cgi
The dynamically created page above works both with W3C and Validome. It's a Delphi console application (.exe renamed to .cgi) that I wrote myself. At the very beginning of the application I write this to the console:
Content-Type: text/html; charset=windows-1252
The rest (page header and contents) is the same as in all my static (.htm) pages. Now I think I'll go nuts!! The final server-generated headers (reported by Validome) differ between static and dynamic pages:
STATIC (.htm):
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Type: text/html; charset=Windows-1252
Cache-Control: no-cache
Expires: Tue, 29 May 2007 16:11:59 GMT
Content-Location: http://201.79.145.166:8081/Default.htm
Date: Tue, 29 May 2007 16:11:59 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Tue, 29 May 2007 00:19:10 GMT
ETag: "01347fa86a1c71:893"
Content-Length: 1618
DYNAMIC (.cgi):
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Tue, 29 May 2007 16:12:03 GMT
Connection: close
Content-Type: text/html; charset=Windows-1252 <<-- my cgi did this !!
Maybe something in the server headers for the static pages holds things that it shouldn't. I'll investigate it better when I have the courage. \\;^)
Kind regards, Paulo.
(In reply to comment #17)
> THIS WORKS: [] /cgi-bin/Environ.cgi
Yes, this script works regardless of whether the client requested a keep-alive connection or not. IOW, it behaves differently from the previous test case ('/en/Home.htm').
(In reply to comment #18)
> Yes, this script works regardless of whether the client requested a keep-alive
> connection or not. IOW, it behaves differently from the previous test case
> ('/en/Home.htm').
Do you know which specific headers the W3C Markup Validator looks at? Unfortunately, there isn't a "show server headers" option (as in Validome), so one cannot tell exactly what is being taken into account regarding server headers. That would be very useful for diagnostics.
In the near future, I plan on recreating my site with dynamic pages, mainly to serve the pages with the most appropriate Content-Type. For instance, "application/xhtml+xml" for validators and those browsers which correctly support it, and "text/html" for the rest (mainly legacy browsers), based on this (very interesting) report: http://www.w3.org/People/mimasa/test/xhtml/media-types/results
Knowing which headers the W3C Markup Validator pays attention to, I will be better prepared for setting up the dynamic site.
Thanks, Paulo.
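The plan above, serving "application/xhtml+xml" only to clients that accept it and "text/html" to the rest, is usually driven by the request's Accept header. The following is a simplified Python sketch of that decision, not the poster's Delphi code and not what any W3C tool does; the function name `choose_media_type` is hypothetical, and wildcard entries such as `*/*` are deliberately ignored so that only clients naming the XHTML type explicitly receive it.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def choose_media_type(accept_header: str) -> str:
    """Pick XHTML only when the client lists it explicitly with a usable q-value."""
    quality = {}
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        quality[media_type] = q
    xhtml_q = quality.get(XHTML, 0.0)
    if xhtml_q > 0.0 and xhtml_q >= quality.get(HTML, 0.0):
        return XHTML
    return HTML


if __name__ == "__main__":
    print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
    print(choose_media_type("text/html, */*"))
```

A server that varies its response this way should also send a "Vary: Accept" response header so that caches keep the two variants apart.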
Closing bug, as I gather the problem was IIS misconfiguration. Discussion on negotiation would probably be better on www-validator mailing-list, as it is becoming off-topic for this particular bug report. Thank you!
*** Bug 5222 has been marked as a duplicate of this bug. ***