This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 21383 - JSON output for the HTML validator doesn't return the detected/forced doctype
Summary: JSON output for the HTML validator doesn't return the detected/forced doctype
Status: NEW
Alias: None
Product: Validator
Classification: Unclassified
Component: Templates
Version: HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-25 05:33 UTC by Art McBain
Modified: 2013-03-25 06:15 UTC

Description Art McBain 2013-03-25 05:33:25 UTC
When asking the HTML validator to output in "SOAP 1.2", it returns the following:

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
    
    <m:uri>http://www.iana.org/domains/example</m:uri>
    <m:checkedby>http://validator.w3.org/</m:checkedby>
    <m:doctype>HTML5</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>true</m:validity>
    <m:errors>
        <m:errorcount>0</m:errorcount>
        <m:errorlist>
          
        </m:errorlist>
    </m:errors>
    <m:warnings>
        <m:warningcount>1</m:warningcount>
        <m:warninglist>
        </m:warninglist>
    </m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>


This includes the doctype of the page. The API documentation ( http://validator.w3.org/docs/api ) indicates this is the detected doctype, or the one forced by the validator for validation.
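
For anyone consuming the SOAP 1.2 output programmatically, a minimal Python sketch of reading that doctype value might look like the following. The function name and the trimmed sample are my own for illustration; the namespace URI and element name are taken from the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the SOAP 1.2 response shown above.
NS = {"m": "http://www.w3.org/2005/10/markup-validator"}


def extract_doctype(soap_xml):
    """Return the text of the first <m:doctype> element, or None if absent."""
    # Parse from bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(soap_xml.encode("utf-8"))
    node = root.find(".//m:doctype", NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


# Trimmed copy of the response above, kept only for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:uri>http://www.iana.org/domains/example</m:uri>
    <m:doctype>HTML5</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>true</m:validity>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
"""

print(extract_doctype(SAMPLE))
```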

What I think would be great, and would provide parity with the SOAP 1.2 output, is for the JSON output to include the doctype as well. The JSON output for the same website is reproduced below:

{
    "url": "http://www.iana.org/domains/example",
    "messages": [
        
        ],
    "source": {
        "encoding": "utf-8",
        "type": "text/html"
    }
}


I know the JSON output is intended to match validator.nu's, but it seems quite an oversight not to return the doctype. Even more so because the data provided by the two output formats differs even though they are known to come from the same source.
Comment 1 Art McBain 2013-03-25 06:15:08 UTC
is the detected doctype or that forced*


Also, the JSON output for validator.nu does, in a roundabout way, provide the "schema" it used to validate a document.

http://validator.nu/?doc=http%3A%2F%2Fwww.iana.org%2Fdomains%2Fexample&out=json

Produces the following:

{"url":"http://www.iana.org/domains/example","messages":[{"type":"info","message":"The Content-Type was “text/html”. Using the HTML parser."},{"type":"info","message":"Using the schema for HTML5 + SVG 1.1 + MathML 3.0 + RDFa Lite 1.1."}]}

The last message indicates what it validated against, though I much prefer the shorter, more programmatic output of the W3C validator's SOAP 1.2 doctype tag.
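
To show why this is roundabout, here is a rough Python sketch of what a client has to do to recover the schema from that output. It assumes the message always starts with the exact wording "Using the schema for ", which is only what the sample above shows and is not something I know to be guaranteed; the function name is made up for illustration.

```python
import json

# Assumption: the wording is taken from the sample output above and may
# change between validator.nu releases.
PREFIX = "Using the schema for "


def schema_from_messages(payload):
    """Return the schema named in the first matching info message, or None."""
    data = json.loads(payload)
    for msg in data.get("messages", []):
        if msg.get("type") != "info":
            continue
        text = msg.get("message", "")
        if text.startswith(PREFIX):
            # Drop the prefix and the sentence-ending full stop.
            return text[len(PREFIX):].rstrip(".")
    return None


# Rebuilt from the validator.nu output above, for illustration only.
SAMPLE = json.dumps({
    "url": "http://www.iana.org/domains/example",
    "messages": [
        {"type": "info",
         "message": "The Content-Type was \u201ctext/html\u201d. Using the HTML parser."},
        {"type": "info",
         "message": "Using the schema for HTML5 + SVG 1.1 + MathML 3.0 + RDFa Lite 1.1."},
    ],
})

print(schema_from_messages(SAMPLE))
```

Scraping a human-readable sentence like this is fragile, which is the reason a dedicated field is preferable.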

Earlier I said the format was intended to match validator.nu's (which I found at http://validator.w3.org/docs/users.html ), but it seems this only applies to the existence of the "url" and "messages" properties and the format of the messages listed in the latter. Thus the doctype used for validation could be included in the extra "source" block, or as a property of the root (like "url"), without harming this compatibility.
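
To make the suggestion concrete, here is a small Python sketch of how a client could read the doctype under either placement. The "doctype" key is purely hypothetical: neither validator emits it today, and the name and both sample payloads are my own invention for illustration.

```python
import json


def doctype_from_json(payload):
    """Look for a hypothetical "doctype" key at the root, then under "source"."""
    data = json.loads(payload)
    source = data.get("source") or {}
    return data.get("doctype") or source.get("doctype")


# Hypothetical placement 1: a property of the root, next to "url".
ROOT_STYLE = json.dumps({
    "url": "http://www.iana.org/domains/example",
    "doctype": "HTML5",
    "messages": [],
    "source": {"encoding": "utf-8", "type": "text/html"},
})

# Hypothetical placement 2: inside the extra "source" block.
SOURCE_STYLE = json.dumps({
    "url": "http://www.iana.org/domains/example",
    "messages": [],
    "source": {"encoding": "utf-8", "type": "text/html", "doctype": "HTML5"},
})

# The output as it stands today, with no doctype anywhere.
CURRENT_STYLE = json.dumps({
    "url": "http://www.iana.org/domains/example",
    "messages": [],
    "source": {"encoding": "utf-8", "type": "text/html"},
})

print(doctype_from_json(ROOT_STYLE))
print(doctype_from_json(SOURCE_STYLE))
print(doctype_from_json(CURRENT_STYLE))
```

In both hypothetical shapes the "url" and "messages" properties are untouched, so a client written against validator.nu's format would keep working.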