<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>19241</bug_id>
          
          <creation_ts>2012-10-03 08:23:12 +0000</creation_ts>
          <short_desc>non-utf8 characters in SOAP1.2 output</short_desc>
          <delta_ts>2015-08-23 07:37:37 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>check</component>
          <version>HEAD</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Pavel Janda">pavel.janda</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>brett.bieber</cc>
    
    <cc>mfairchild</cc>
    
    <cc>mike</cc>
    
    <cc>pavel.janda</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>75205</commentid>
    <comment_count>0</comment_count>
    <who name="Pavel Janda">pavel.janda</who>
    <bug_when>2012-10-03 08:23:12 +0000</bug_when>
    <thetext>Hi all,

First, thank you for your well-done work. However, I found a bug in the SOAP output of the check script:

When an invalid page contains a non-UTF-8 character, that character is passed through into your SOAP output as well, which makes the XML invalid, and I am not able to work with XML like that in PHP.

Sample page with bad character: http://www.itrebon.cz/ubytovani-v-treboni-a-okoli_78.html

I&apos;ve worked around this by removing non-UTF-8 characters from your output before creating a SimpleXMLElement, so I am fine for now. I wanted to let you know anyway because, although the mistake does not originate in your code, the SOAP output should contain only valid XML.

Thank you very much!
Pavel Janda</thetext>
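<!--
Editorial sketch of the workaround described in the comment above, written in Python rather than the reporter's PHP. It assumes the response arrives as raw bytes; the function name and the sample payload are hypothetical, and this is only one possible way to drop invalid bytes before parsing.

```python
import xml.etree.ElementTree as ET


def strip_invalid_utf8(raw):
    """Decode bytes as UTF-8, dropping any byte sequence that is not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


# Hypothetical response fragment: the bytes 0xE1 and 0xED are single-byte
# accented letters in a legacy encoding and are not valid UTF-8 on their own.
raw_response = b"<result><source>Ubytov\xe1n\xed</source></result>"

clean_xml = strip_invalid_utf8(raw_response)
root = ET.fromstring(clean_xml)
print(root.find("source").text)
```

Dropping bytes loses the original characters, so a caller that needs them may prefer errors="replace", which substitutes U+FFFD instead.
-->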
  </long_desc><long_desc isprivate="0" >
    <commentid>102652</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Fairchild">mfairchild</who>
    <bug_when>2014-03-19 16:46:59 +0000</bug_when>
    <thetext>Bump.

I&apos;m having this issue as well.
See my example script to reproduce it: https://gist.github.com/mfairchild365/9645880</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>102665</commentid>
    <comment_count>2</comment_count>
    <who name="Brett Bieber">brett.bieber</who>
    <bug_when>2014-03-19 19:37:35 +0000</bug_when>
    <thetext>This problem occurs in the web output as well as the soap12 output.

Perhaps when the error type is &quot;Forbidden code point&quot;, the source sample should be altered to remove the invalid code point, or not shown at all.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>102681</commentid>
    <comment_count>3</comment_count>
      <attachid>1453</attachid>
    <who name="Michael Fairchild">mfairchild</who>
    <bug_when>2014-03-19 22:14:51 +0000</bug_when>
    <thetext>Created attachment 1453
Don&apos;t include the forbidden code point</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>102709</commentid>
    <comment_count>4</comment_count>
      <attachid>1454</attachid>
    <who name="Michael Fairchild">mfairchild</who>
    <bug_when>2014-03-20 21:02:18 +0000</bug_when>
    <thetext>Created attachment 1454
Replace the invalid character instead of removing the entire line

This patch replaces the forbidden character with a question mark (?) before it is displayed.  This is an improvement over the last patch, which simply prevented the entire line of context from being displayed.  By showing the line with the question mark, it will hopefully be easier for people to find the location of the character and fix the problem.</thetext>
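<!--
Editorial sketch of the idea described in the comment above: replace the offending character with a question mark before the source line is shown. It is written in Python, not the checker's own language, and it assumes a 1-based column number. The function name, the error record, and the sample line are hypothetical and are not taken from the attached patch.

```python
def mask_forbidden_char(line, column, placeholder="?"):
    """Return `line` with the character at 1-based `column` replaced by `placeholder`.

    The line is returned unchanged when `column` falls outside it.
    """
    index = column - 1
    if index < 0 or index >= len(line):
        return line
    return line[:index] + placeholder + line[index + 1:]


# Hypothetical error record and source line, for illustration only.
error = {"msg": "Forbidden code point U+0080", "line": 1, "char": 7}
source_lines = ["Ubytov\u0080n v Treboni"]

if "Forbidden code point" in error["msg"]:
    row = error["line"] - 1
    source_lines[row] = mask_forbidden_char(source_lines[row], error["char"])

print(source_lines[0])
```

Keeping the rest of the line visible preserves the context a reader needs to locate the character in the original page.
-->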
  </long_desc><long_desc isprivate="0" >
    <commentid>122728</commentid>
    <comment_count>5</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2015-08-23 07:37:37 +0000</bug_when>
    <thetext>The output=soap12 option is obsolete and no longer maintained and should no longer be used or relied on.

We recommend instead using the current HTML checker https://validator.w3.org/nu/ with the out=json option.</thetext>
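<!--
Editorial sketch of calling the recommended checker with out=json, in Python. The out=json option and the base URL come from the comment above; the "doc" query parameter and the "messages", "type", and "message" keys reflect my understanding of the checker's interface and should be verified against its documentation. The sample response is illustrative and was not captured from the service, and no network request is made here.

```python
import json
from urllib.parse import urlencode

NU_CHECKER = "https://validator.w3.org/nu/"


def build_check_url(page_url):
    """Build a checker URL that requests JSON output for `page_url`."""
    return NU_CHECKER + "?" + urlencode({"doc": page_url, "out": "json"})


def error_messages(response_text):
    """Return the text of every message of type "error" in a JSON response body."""
    payload = json.loads(response_text)
    return [
        m.get("message", "")
        for m in payload.get("messages", [])
        if m.get("type") == "error"
    ]


# Illustrative response body, not real checker output.
sample = (
    '{"messages": ['
    '{"type": "error", "lastLine": 3, "message": "Forbidden code point."}, '
    '{"type": "info", "message": "Note."}'
    ']}'
)

print(build_check_url("http://example.com/page.html"))
print(error_messages(sample))
```

Because JSON strings are decoded by the JSON parser, a consumer does not have to build an XML document from the checker's output first.
-->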
  </long_desc>
      
          <attachment
              isobsolete="1"
              ispatch="1"
              isprivate="0"
          >
            <attachid>1453</attachid>
            <date>2014-03-19 22:14:51 +0000</date>
            <delta_ts>2014-03-20 21:02:18 +0000</delta_ts>
            <desc>Don&apos;t include the forbidden code point</desc>
            <filename>19241.patch</filename>
            <type>text/plain</type>
            <size>743</size>
            <attacher name="Michael Fairchild">mfairchild</attacher>
            
              <data encoding="base64">LS0tIGEvaHR0cGQvY2dpLWJpbi9jaGVjawlNb24gU2VwIDIzIDEzOjM0OjE5IDIwMTMgKzAyMDAK
KysrIGIvaHR0cGQvY2dpLWJpbi9jaGVjawlXZWQgTWFyIDE5IDE1OjQ0OjAxIDIwMTQgLTA1MDAK
QEAgLTIzMjksNyArMjMyOSwxMCBAQAogICAgICAgICAgICAgICAgIGlmIChkZWZpbmVkKCRlcnIt
PntsaW5lfSkgJiYKICAgICAgICAgICAgICAgICAgICAgJEZpbGUtPntDb250ZW50fS0+WyRlcnIt
PntsaW5lfSAtIDFdKQogICAgICAgICAgICAgICAgIHsKLSAgICAgICAgICAgICAgICAgICAgaWYg
KGRlZmluZWQoJGVyci0+e2NoYXJ9KSAmJiAkZXJyLT57Y2hhcn0gPX4gL15bMC05XSskLykgewor
ICAgICAgICAgICAgICAgICAgICBpZiAoaW5kZXgoJGVyci0+e21zZ30sICdGb3JiaWRkZW4gY29k
ZSBwb2ludCAnKSAhPSAtMSkgeworICAgICAgICAgICAgICAgICAgICAgICAgIyBQcmV2ZW50IGRp
c3BsYXkgb2YgaW52YWxpZCBjaGFyYWN0ZXJzCisgICAgICAgICAgICAgICAgICAgIH0KKyAgICAg
ICAgICAgICAgICAgICAgZWxzaWYgKGRlZmluZWQoJGVyci0+e2NoYXJ9KSAmJiAkZXJyLT57Y2hh
cn0gPX4gL15bMC05XSskLykgewogICAgICAgICAgICAgICAgICAgICAgICAgKCRsaW5lLCAkY29s
KSA9CiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJnRydW5jYXRlX2xpbmUoCiAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgJEZpbGUtPntDb250ZW50fS0+WyRlcnItPntsaW5lfSAtIDFd
LAo=
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>1454</attachid>
            <date>2014-03-20 21:02:18 +0000</date>
            <delta_ts>2014-03-20 21:02:18 +0000</delta_ts>
            <desc>Replace the invalid character instead of removing the entire line</desc>
            <filename>19241-1.patch</filename>
            <type>text/plain</type>
            <size>687</size>
            <attacher name="Michael Fairchild">mfairchild</attacher>
            
              <data encoding="base64">LS0tIGEvaHR0cGQvY2dpLWJpbi9jaGVjawlNb24gU2VwIDIzIDEzOjM0OjE5IDIwMTMgKzAyMDAK
KysrIGIvaHR0cGQvY2dpLWJpbi9jaGVjawlUaHUgTWFyIDIwIDE1OjQ5OjM0IDIwMTQgLTA1MDAK
QEAgLTIzODgsNiArMjM4OCwxMCBAQAogICAgICAgICAgICAgICAgIGlmIChkZWZpbmVkKCRlcnIt
PntsaW5lfSkgJiYKICAgICAgICAgICAgICAgICAgICAgJEZpbGUtPntDb250ZW50fS0+WyRlcnIt
PntsaW5lfSAtIDFdKQogICAgICAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgICAgICAgaWYg
KGluZGV4KCRlcnItPnttc2d9LCAnRm9yYmlkZGVuIGNvZGUgcG9pbnQgJykgIT0gLTEpIHsKKyAg
ICAgICAgICAgICAgICAgICAgICAgICMgUHJldmVudCBkaXNwbGF5IG9mIGludmFsaWQgY2hhcmFj
dGVycworICAgICAgICAgICAgICAgICAgICAgICAgc3Vic3RyKCRGaWxlLT57Q29udGVudH0tPlsk
ZXJyLT57bGluZX0gLSAxXSwgJGVyci0+e2NoYXJ9LTEsIDEpPSc/JzsKKyAgICAgICAgICAgICAg
ICAgICAgfQogICAgICAgICAgICAgICAgICAgICBpZiAoZGVmaW5lZCgkZXJyLT57Y2hhcn0pICYm
ICRlcnItPntjaGFyfSA9fiAvXlswLTldKyQvKSB7CiAgICAgICAgICAgICAgICAgICAgICAgICAo
JGxpbmUsICRjb2wpID0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAmdHJ1bmNhdGVfbGlu
ZSgK
</data>

          </attachment>
      

    </bug>

</bugzilla>