Markup Validator Updated

Part of Tools

By: Olivier Thereaux

Published: 8 August 2008

I tend to keep an eye on things done at CERN. Not just because this is the Web's mothership, but also because there is always a very slim chance that one of their experiments will happen to recreate the big bang, kill us all, re-shape the laws of the universe or do something else equally exciting and dreadful. After all, it would really be a waste to plan a release of one of our tools after the end of time. So when I started reading about the countdown to the launch of the Large Hadron Collider for August 8th, 2008, I knew it was time to push that maintenance release of the Markup Validator I had been promising “real soon now” for… the past months.

As it turns out, our friends in Switzerland will only start recreating the time just after the big bang in a month. Ah well. Until then, we will have time to enjoy sports on TV, and the Markup Validator, release 0.8.3.

This is mostly a maintenance release, fixing a few bugs and adding support for recently added or updated document types such as XHTML Basic 1.1, but it does have a number of valuable tricks up its sleeve.

For those of us using the validator not just as a web service but as a web platform, a couple of new features will make our lives even easier. First, JSON has been added to the validator's possible output formats. The format is modeled after the JSON output built by our friends at validator.nu. Try this:

GET "http://validator.w3.org/check?uri=http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html&output=json"

…you get:

{
    "url": "http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html",
    "messages": [
        
          {
              
              "type": "info",
              "subtype": "warning"
              "lastLine": "11",
              "lastColumn": 20,
              "message": "reference to non-existent ID "MMIARCH"",
              "messageid": 183,
              "explanation": "    
        [...]
    <div class="ve mid-183">        
    <p>This error can be triggered by:</p>        
    <ul>        
      <li>A non-existent input, select or textarea element</li>        
      <li>A missing id attribute</li>        
      <li>A typographical error in the id attribute</li>        
    </ul>        
    <p>Try to check the spelling and case of the id you are referring to.</p>        
  </div>        
",
          }
        
        ],
    "source": {
        "encoding": "utf-8"
    }
}
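
As a rough sketch of how a client might consume this output, the Python below builds a check URL with the `uri` and `output=json` parameters shown above and tallies the messages in a decoded response. The helper names `build_check_url` and `summarize_messages` are made up for illustration, and the field names are assumed from the sample above rather than taken from a formal schema.

```python
import json
from collections import Counter
from urllib.parse import urlencode

VALIDATOR_CHECK = "http://validator.w3.org/check"  # endpoint used in the example above


def build_check_url(document_uri, output="json"):
    """Return a check URL for document_uri (hypothetical helper)."""
    query = urlencode({"uri": document_uri, "output": output})
    return f"{VALIDATOR_CHECK}?{query}"


def summarize_messages(result):
    """Count messages by subtype when present, otherwise by type.

    The field names are assumed from the sample response above.
    """
    counts = Counter()
    for message in result.get("messages", []):
        counts[message.get("subtype") or message.get("type", "unknown")] += 1
    return dict(counts)


# A trimmed, well-formed stand-in for a decoded response (not real validator output).
sample = json.loads("""
{
  "url": "http://example.org/page.html",
  "messages": [
    {"type": "info", "subtype": "warning", "lastLine": "11", "lastColumn": 20,
     "message": "reference to non-existent ID \\"MMIARCH\\"", "messageid": 183}
  ],
  "source": {"encoding": "utf-8"}
}
""")

print(build_check_url("http://example.org/page.html"))
print(summarize_messages(sample))
```

Counting by subtype first is a choice made for this sketch, because the sample reports a warning as type "info" with subtype "warning".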

While we are looking at calling the validator and getting quick, easy-to-process results, did you know that the fastest way to get basic validation info is the validator's custom HTTP headers? They have been around for a while, but they are now properly documented, and we have added information about the number of warnings, too. Try this:

HEAD http://validator.w3.org/check?uri=http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html

    200 OK
    Date: Fri, 08 Aug 2008 15:00:49 GMT
    Content-Language: en
    Content-Type: text/html; charset=utf-8
    Client-Date: Fri, 08 Aug 2008 15:00:52 GMT
    Client-Peer: 128.30.52.49:80
    Client-Response-Num: 1
    X-W3C-Validator-Errors: 0
    X-W3C-Validator-Recursion: 1
    X-W3C-Validator-Status: Valid
    X-W3C-Validator-Warnings: 1
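
A minimal Python sketch of reading these headers from a response follows. It assumes the headers have already been collected into a plain dictionary by whatever HTTP client is in use; the function name `validation_summary` is hypothetical, and only the header names shown above are relied upon.

```python
def validation_summary(headers):
    """Extract the X-W3C-Validator-* values from a mapping of response headers.

    Header names are matched case-insensitively. A count that is missing or
    not a plain number is reported as None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    def count(name):
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value)
        return None

    return {
        "status": lowered.get("x-w3c-validator-status"),
        "errors": count("x-w3c-validator-errors"),
        "warnings": count("x-w3c-validator-warnings"),
    }


# Headers copied from the HEAD response shown above.
response_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "X-W3C-Validator-Errors": "0",
    "X-W3C-Validator-Recursion": "1",
    "X-W3C-Validator-Status": "Valid",
    "X-W3C-Validator-Warnings": "1",
}

print(validation_summary(response_headers))
```

Returning None for an absent count, rather than 0, keeps "no warnings" distinct from "the service did not report warnings".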

Another good piece of news. If you have a vested interest in XHTML, you will know this dilemma fairly well:

XHTML is supposed to be served with the application/xhtml+xml media type. That media type has a few issues, however, in particular the fact that the most widely distributed browser, up to now, still hasn't added support for it.
XHTML 1.0 defined an informative way to be “served as (legacy) HTML”, which kind of worked. But for the rest of the XHTML family…? Some people came up with clever hacks, using HTTP content negotiation to serve XHTML as application/xhtml+xml only to the agents that clearly specify they support this media type, and as text/html, by default, to the others.
What does that have to do with the Markup Validator? It does not declare an authoritative list of the media types it accepts. Actually, it can't, since there is no way in HTTP to say “Accept HTML, SVG, MathML… and any kind of XML”. It does not have to, either, since HTTP makes the Accept header optional, and its absence just means “send me what you've got”.
When checking a resource set up with the Accept hack for XHTML, the validator would be served content as text/html and, since that is not supposed to happen, would yield a warning stating, in essence, “are you certain you really want to serve XHTML 1.1 content as text/html?”

It may have been a mere warning, but it made a lot, lot, lot of people anxious and upset. So, by popular demand – and also because the XHTML working group are preparing a revised note on XHTML and media types – the warning is gone.
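
For readers who have not seen the Accept hack described above, here is a minimal server-side sketch in Python. It is an illustration under simplifying assumptions, not how any particular server implements it: the function name `pick_media_type` is hypothetical, wildcards such as `*/*` are deliberately not treated as explicit support, and only the `q` parameter is inspected.

```python
def pick_media_type(accept_header):
    """Choose the media type for an XHTML document based on the Accept header.

    Returns application/xhtml+xml only when the client lists it explicitly
    with a non-zero quality value; otherwise falls back to text/html.
    """
    xhtml, html = "application/xhtml+xml", "text/html"
    if not accept_header:
        return html  # no Accept header: "send me what you've got"

    for item in accept_header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        if media_range.lower() != xhtml:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        if quality > 0:
            return xhtml
    return html


# A browser-style header that names the XHTML media type explicitly.
print(pick_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
# A client that sends no Accept header at all, as the validator does by default.
print(pick_media_type(None))
```

With this logic, a client that sends no Accept header receives text/html, which is exactly the situation that used to trigger the warning.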

Those interested in HTTP content negotiation beyond the XHTML media type issue will be interested in some new features in the validator. In version 0.8.2 we added a way to specify the Accept and Accept-Language headers sent by the validator to the server holding the documents it checks, and in 0.8.3 we also added Accept-Charset and User-Agent. These options are still experimental, but should be useful for content-negotiated resources that do not have a specific URI for each representation.

There is more in this version, and more to come. Read the 0.8.3 release notes, learn how to send feedback or participate in the project, and join me in thanking everyone involved in this release.

Comments (7)

David Smalley - 19 January 2009 at 23:40:29 UTC

Hi,
I really love returning the information in the HTTP headers! It's removed a bottleneck where I had to run an XML parser over the results, which is always a pain in Ruby. One issue: the CSS validator never seems to return warning counts in the headers, only the error counts.
e.g.
I have a site which returns 1 warning on the html output:
Errors (23) Warnings (1) Validated CSS
But in the headers I only get the errors:
{"cache-control"=>["no-cache"], "x-w3c-validator-status"=>["Invalid"], "x-w3c-validator-errors"=>["23"], "connection"=>["close"], "content-type"=>["text/plain;charset=utf-8"], "date"=>["Mon, 19 Jan 2009 23:38:03 GMT"], "content-language"=>["en"], "server"=>["Jigsaw/2.3.0-beta1"], "content-length"=>["506746"], "pragma"=>["no-cache"]}
The same is not true for the HTML parser, which does correctly return the warnings and errors. Any chance of a fix?
Thanks,
David

Olivier Théreaux - 23 January 2009 at 00:20:05 UTC

Hi David,
It does indeed look like the markup validator is the only one that sends the number of warnings via HTTP. The CSS validator does not have that feature yet, but I'll add it to our Bugzilla so that we can code it for a future release. Thanks!

Andrew Roberts - 29 January 2009 at 12:14:57 UTC

Hi both,
Forgive me for intruding on this thread, as you both sound knowledgeable about code validation.
I'm a complete novice and looking to remove the errors thrown up by my site. Yes, I managed to lose all the silly ones with tags unclosed or nested incorrectly, but for certain errors I cannot work out why the validator is rejecting the markup.
for example

A:hover {TEXT-DECORATION: underline}
It dislikes the style tag, and there are some 30 or so other issues with the site, which is www.DiscusGroup.co.uk
Can you help point me in the direction of a user-friendly guide (or more like an idiot's guide) to help me fix these issues, or at least understand them and accept or reject them based on knowledge?
Many thanks to anyone able to assist

Toflar - 10 October 2009 at 11:13:20 UTC

Hi all
I was trying to decode your JSON with PHP and ran into some trouble because your output does not seem to be valid JSON ;-)
Examples:
current and wrong:

"subtype": "warning" "lastLine": "18",

should be:

"subtype": "warning", "lastLine": "18",

problem in short:
missing comma
current and wrong:

"explanation": "xy", }

should be:

"explanation": "xy" }

problem in short:
the explanation is the last item of this object, so putting a comma in front of the closing "}" is wrong.
current and wrong:

"explanation": "xy",} { "type": "info"

should be:

"explanation": "xy"}, { "type": "info"

problem in short:
Every message should be separated by a comma, so instead of "} {" you should use "}, {". Of course no comma should be added after the last message.
This is the first time I'm working with JSON and I might be wrong. If so, can you give me an example of how to use the PHP function json_decode() with your output?
Thanks for considering my feedback and keep up the good work!
- Ted Guild - 12 October 2009 at 11:59:16 UTC
  
  Toflar,
  Thank you for the feedback. There have been some recent fixes in the JSON output, including for the issues you noted, which I have released to the production server.

Raymond Camden - 25 June 2012 at 16:20:10 UTC

Sorry to post on an old topic here, but I'm trying to test json output with fragment or file upload calls. I've tried passing output/json as a form field, and in the URL, but it never works. I get results, but in HTML.
- Coralie Mercier - 26 June 2012 at 06:45:43 UTC
  
  Hello Raymond,
  I invite you to consult the validator feedback page at http://validator.w3.org/feedback.html to search past questions, submit reports, etc.