<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>66</bug_id>
          
          <creation_ts>2002-11-03 16:09:48 +0000</creation_ts>
          <short_desc>SP does not properly escape SIs.</short_desc>
          <delta_ts>2006-10-31 02:21:34 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>Parser</component>
          <version>0.6.1</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>major</bug_severity>
          <target_milestone>1.0</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Terje Bless">link</reporter>
          <assigned_to name="Terje Bless">link</assigned_to>
          <cc>bjoern</cc>
          <cc>nick</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>139</commentid>
    <comment_count>0</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-11-03 16:09:48 +0000</bug_when>
    <thetext>Reported by Björn Höhrmann:

two documents,

  &lt;?xml version=&apos;1.0&apos; encoding=&apos;iso-8859-1&apos;?&gt;
  &lt;!DOCTYPE foo SYSTEM
    &quot;http://www.bjoernsworld.de/cgi-bin/dtd.pl?björn&quot;&gt;
  &lt;foo&gt;
  &lt;bar/&gt;
  &lt;/foo&gt;

and

  &lt;?xml version=&apos;1.0&apos; encoding=&apos;iso-8859-1&apos;?&gt;
  &lt;!DOCTYPE foo SYSTEM
    &quot;http://www.bjoernsworld.de/cgi-bin/dtd.pl?bj%c3%b6rn&quot;&gt;
  &lt;foo&gt;
  &lt;bar/&gt;
  &lt;/foo&gt;

As per XML 1.0 Second Edition section 4.2.2, XML processors must process
these documents as being equivalent. The Validator, however, does not: it
claims the second document is valid while the first document is said
to be invalid. It&apos;s somehow getting confused by the system identifier in
the first example.

Typically, XML processors get it &quot;right&quot; but request ...?bj\xF6rn or
...?bj\xC3\xB6rn or ...?bj%f6rn instead of ...?bj%c3%b6rn,
...?bj%C3%b6rn or ...?bj%C3%B6rn (which are all equivalent).
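
As a rough illustration of the escaping that section 4.2.2 describes (each
non-ASCII character is converted to UTF-8 and each resulting byte is written
as %HH), here is a minimal Python sketch. The function name
escape_system_identifier is made up for this example and is not part of SP
or the Validator, and the sketch only handles non-ASCII characters, not the
other disallowed characters the specification lists.

```python
def escape_system_identifier(sysid):
    # Hypothetical helper for illustration only; not part of SP or the Validator.
    # Assumption: only non-ASCII characters are escaped here.
    escaped = []
    for ch in sysid:
        if ch.isascii():
            # ASCII characters (including existing %HH escapes) pass through unchanged.
            escaped.append(ch)
        else:
            # Convert the character to UTF-8 and write each byte as %HH.
            escaped.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(escaped)


print(escape_system_identifier("http://www.bjoernsworld.de/cgi-bin/dtd.pl?björn"))
```

For the first document this gives a request for ...?bj%C3%B6rn, while the
already escaped system identifier of the second document passes through
unchanged.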

dtd.pl is a CGI script that outputs different DTDs depending on whether
the processor is behaving correctly:

  #!/usr/local/bin/perl -w
  print &quot;Content-Type: application/xml-dtd;charset=us-ascii\n\n&quot;;
  # foo is always declared; bar only if the query string arrived correctly escaped.
  print &quot;&lt;!ELEMENT foo (bar)&gt;\n&quot;;
  if ($ENV{&apos;QUERY_STRING&apos;} eq &quot;bj%c3%b6rn&quot; or
      $ENV{&apos;QUERY_STRING&apos;} eq &quot;bj%C3%b6rn&quot; or
      $ENV{&apos;QUERY_STRING&apos;} eq &quot;bj%C3%B6rn&quot;)
  {
    print &quot;&lt;!ELEMENT bar EMPTY&gt;\n&quot;;
  }

I.e. the document is valid for conforming processors, invalid for
non-conforming processors.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>140</commentid>
    <comment_count>1</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-11-03 16:12:06 +0000</bug_when>
    <thetext>Reassigning to Nick as he&apos;s the only one who knows enough about SP&apos;s innards to
do anything about this. Setting target to 0.7.0 as I don&apos;t think anything can be
done about this in the 0.6.0 timeframe.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>256</commentid>
    <comment_count>2</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2002-12-02 23:32:10 +0000</bug_when>
    <thetext>Ping Nick...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>263</commentid>
    <comment_count>3</comment_count>
    <who name="niq">nick</who>
    <bug_when>2002-12-03 19:04:48 +0000</bug_when>
    <thetext>The bug report seems to me to be in error.  Or, more specifically, it hinges 
on whether unescaped ö is allowed in a QUERY_STRING.  It may be unsafe(?), but
is AFAICS nevertheless legal, so ISTM SP is working correctly, and the bug is
in the server-side script.

Lynx agrees:

[nick@jarl nick]$ lynx -dump -source http://www.bjoernsworld.de/cgi-bin/dtd.pl?björn
&lt;!ELEMENT foo (bar)&gt;
&lt;!----&gt;
[nick@jarl nick]$ lynx -dump -source http://www.bjoernsworld.de/cgi-bin/dtd.pl?bj%c3%b6rn
&lt;!ELEMENT foo (bar)&gt;
&lt;!ELEMENT bar EMPTY&gt;

If someone can convince me otherwise, I could patch it fairly easily to escape 
8-bit URIs, but I fear that could introduce serious bugs when working with a 
16-bit charset.  So at the very least I&apos;d have to ask on openjade-devel.

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>265</commentid>
    <comment_count>4</comment_count>
    <who name="Bj">bjoern</who>
    <bug_when>2002-12-04 14:21:52 +0000</bug_when>
    <thetext>Niq, please have a look at section 4.2.2 of XML 1.0 Second Edition
(http://www.w3.org/TR/REC-xml#dt-sysid): XML processors MUST behave as I
described for system identifiers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>396</commentid>
    <comment_count>5</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2003-03-01 14:37:28 +0000</bug_when>
    <thetext>Retargeting to 0.7.0 as 0.6.2 is imminent, but this really needs to be resolved.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2213</commentid>
    <comment_count>6</comment_count>
    <who name="Terje Bless">link</who>
    <bug_when>2004-09-01 16:39:15 +0000</bug_when>
    <thetext>Ping Nick again, and add CC to Björn.

Putting blocker on Bug #856; should be resolved before 0.7.0 release.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2246</commentid>
    <comment_count>7</comment_count>
    <who name="Bj">bjoern</who>
    <bug_when>2004-09-06 20:10:51 +0000</bug_when>
    <thetext>Fixing this would require either changing all system identifiers in the
document and its external parsed entities (which would require major work) or
changing OpenSP, which is not exactly trivial, and I doubt such a change will
make it into OpenSP 1.5.2. This is in fact better addressed through using a
proper XML processor which would already ship with correct behavior in this
regard. So this does not seem to fit in the 0.7.0 time frame.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12679</commentid>
    <comment_count>8</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2006-10-31 02:21:34 +0000</bug_when>
    <thetext>Sounds like &quot;someone else&apos;s problem&quot; to me. There&apos;s no point in keeping a record of SP bugs here; SourceForge has a bug database, right? And given the participants in this bug&apos;s discussion, I trust the bug has been recorded there.

As Bjoern notes, &quot;This is in fact better addressed through using a 
proper XML processor which would already ship with correct behavior in this 
regard.&quot; Which is true, but not relevant to this bug.

Closing. OK?</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>