This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Hi all, first, thank you for your well-done work. I found a bug in the SOAP output of the check script: when an invalid page contains a non-UTF-8 character, that character is passed through into your SOAP output as well, which makes the XML invalid, so I am not able to work with it in PHP. Sample page with a bad character: http://www.itrebon.cz/ubytovani-v-treboni-a-okoli_78.html I worked around this by removing non-UTF-8 characters from your output before creating the SimpleXMLElement, so I am fine now, but I wanted to let you know because, even though the underlying mistake is not in your code, you should output only valid XML in SOAP. Thank you very much! Pavel Janda
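The workaround described above (strip the offending characters, then parse) can be sketched as follows. This is a minimal Python illustration rather than the reporter's PHP code; the function names `sanitize_xml` and `parse_response` are hypothetical, and it assumes that "bad" means either bytes that are not valid UTF-8 or code points outside the XML 1.0 `Char` range.

```python
import re
import xml.etree.ElementTree as ET

# Matches any code point outside the XML 1.0 "Char" production.
_NOT_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml(raw: bytes) -> bytes:
    """Drop invalid UTF-8 bytes and characters that XML 1.0 forbids."""
    # errors="ignore" silently discards byte sequences that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    return _NOT_XML_CHAR.sub("", text).encode("utf-8")


def parse_response(raw: bytes) -> ET.Element:
    """Parse a (hypothetical) validator response after sanitizing it."""
    return ET.fromstring(sanitize_xml(raw))
```

Note that this discards data silently, so it is only suitable where losing the offending characters is acceptable, as it was for the reporter.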
Bump. I'm having this issue as well. See my example script to reproduce it: https://gist.github.com/mfairchild365/9645880
This problem occurs in the web output as well as the soap12 output. Perhaps when the error type is "Forbidden code point", the source sample should be altered to remove the invalid code point, or not shown at all.
Created attachment 1453: Don't include the forbidden code point
Created attachment 1454: Replace the invalid character instead of removing the entire line
This patch replaces the forbidden character with a question mark (?) before it is displayed. This is an improvement over the last patch, which simply prevented the entire line of context from being displayed. By showing the line with the question mark, it will hopefully be easier for people to find the location of the character and fix the problem.
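The idea behind this patch can be illustrated with a short Python sketch. It is not the attached patch itself; the name `mask_forbidden` is hypothetical, and it assumes that a "forbidden code point" is any character outside the XML 1.0 `Char` range.

```python
import re

# Matches any code point outside the XML 1.0 "Char" production
# (an assumption about what counts as "forbidden" here).
_FORBIDDEN = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def mask_forbidden(line: str, placeholder: str = "?") -> str:
    """Replace each forbidden character in a source line with a placeholder."""
    return _FORBIDDEN.sub(placeholder, line)
```

Because each forbidden character is swapped for exactly one placeholder character, the line keeps its length, so a reported column number still points at the right place.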
The output=soap12 option is obsolete, no longer maintained, and should not be used or relied on. We recommend using the current HTML checker at https://validator.w3.org/nu/ with the out=json option instead.
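A minimal Python sketch of calling the recommended checker with out=json might look like this. The helper names (`check_url`, `fetch_result`, `summarize`) and the User-Agent string are hypothetical. It assumes the checker accepts the page address in a `doc` query parameter and returns a JSON object with a `messages` list whose entries carry `type`, `lastLine`, and `message` fields.

```python
import json
import urllib.parse
import urllib.request

CHECKER = "https://validator.w3.org/nu/"


def check_url(page_url: str) -> str:
    """Build the checker request URL for a page, asking for JSON output."""
    query = urllib.parse.urlencode({"doc": page_url, "out": "json"})
    return f"{CHECKER}?{query}"


def fetch_result(page_url: str) -> dict:
    """Call the checker over the network and decode its JSON response."""
    request = urllib.request.Request(
        check_url(page_url),
        headers={"User-Agent": "example-checker-client/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(result: dict) -> list:
    """Turn the assumed 'messages' list into one readable line per message."""
    lines = []
    for msg in result.get("messages", []):
        lines.append(
            f"{msg.get('type', '?')} line {msg.get('lastLine', '?')}: "
            f"{msg.get('message', '')}"
        )
    return lines
```

Calling `summarize(fetch_result(page_url))` would then give one line per reported problem. Since JSON strings can escape arbitrary code points, this route avoids the invalid-XML problem discussed in this bug.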