This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 66 - SP does not properly escape SIs.
Summary: SP does not properly escape SIs.
Alias: None
Product: Validator
Classification: Unclassified
Component: Parser
Version: 0.6.1
Hardware: All, OS: All
Importance: P2 major
Target Milestone: 1.0
Assignee: Terje Bless
QA Contact: qa-dev tracking
Depends on:
Reported: 2002-11-03 16:09 UTC by Terje Bless
Modified: 2006-10-31 02:21 UTC


Description Terje Bless 2002-11-03 16:09:48 UTC
Reported by Björn Höhrmann:

two documents,

  <?xml version='1.0' encoding='iso-8859-1'?>


  <?xml version='1.0' encoding='iso-8859-1'?>

As per XML 1.0 Second Edition section 4.2.2, XML processors must process
these documents as being equivalent. The Validator, however, does not: it
claims the second document is valid while the first document is said
to be invalid. It is somehow getting confused by the system identifier in
the first example.

Typically, XML processors get it "right" but request ...?bj\xF6rn,
...?bj\xC3\xB6rn or ...?bj%f6rn instead of ...?bj%c3%b6rn,
...?bj%C3%b6rn or ...?bj%C3%B6rn (which are all equivalent). The requested
resource is a CGI script that outputs different DTDs depending on whether
the processor is behaving correctly:

  #!/usr/local/bin/perl -w
  print "Content-Type: application/xml-dtd;charset=us-ascii\n\n";
  # foo is always declared; bar only if the query string was escaped correctly.
  print "<!ELEMENT foo (bar)>\n";
  if ($ENV{'QUERY_STRING'} eq "bj%c3%b6rn" or
      $ENV{'QUERY_STRING'} eq "bj%C3%b6rn" or
      $ENV{'QUERY_STRING'} eq "bj%C3%B6rn") {
    print "<!ELEMENT bar EMPTY>\n";
  }

I.e. the document is valid for conforming processors, invalid for
non-conforming processors.
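
For illustration, the escaping that section 4.2.2 asks for can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not what SP or the Validator actually does: the function name and the example.org URL are made up for the example, and only non-ASCII characters are handled (the other disallowed ASCII characters are left out).

```python
def escape_system_identifier(system_id: str) -> str:
    """Percent-escape non-ASCII characters in a system identifier.

    Each non-ASCII character is converted to its UTF-8 bytes and every
    byte is written as %HH. ASCII characters are passed through untouched
    (a simplification; hypothetical helper, not an SP/OpenSP API).
    """
    out = []
    for ch in system_id:
        if ord(ch) > 0x7F:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


# Query strings the CGI script above treats as correct.
ACCEPTED = {"bj%c3%b6rn", "bj%C3%b6rn", "bj%C3%B6rn"}

# Hypothetical system identifier; the real URL is not given in this report.
escaped = escape_system_identifier("http://example.org/dtd.cgi?bj\u00f6rn")
query = escaped.split("?", 1)[1]
print(query)              # bj%C3%B6rn
print(query in ACCEPTED)  # True
```

A processor that sends the raw byte \xF6, the raw bytes \xC3\xB6, or %f6 skips or mangles one of the two steps (UTF-8 conversion, then %HH escaping), and the script then omits the declaration for bar.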
Comment 1 Terje Bless 2002-11-03 16:12:06 UTC
Reassigning to Nick as he's the only one that knows enough about SP's innards to
do anything about this. Setting target to 0.7.0 as I don't think anything can be
done about this in the 0.6.0 timeframe.
Comment 2 Terje Bless 2002-12-02 23:32:10 UTC
Ping Nick...
Comment 3 niq 2002-12-03 19:04:48 UTC
The bug report seems to me to be in error.  Or, more specifically, it hinges 
on whether unescaped ö is allowed in a QUERY_STRING.  It may be unsafe(?), but
is AFAICS nevertheless legal, so ISTM SP is working correctly, and the bug is
in the serverside script.

Lynx agrees:

[nick@jarl nick]$ lynx -dump -source ...?björn
<!ELEMENT foo (bar)>
[nick@jarl nick]$ lynx -dump -source
<!ELEMENT foo (bar)>

If someone can convince me otherwise, I could patch it fairly easily to escape 
8-bit URIs, but I fear that could introduce serious bugs when working with a 
16-bit charset.  So at the very least I'd have to ask on openjade-devel.
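
On the 16-bit concern, one possible way around it is to escape decoded characters rather than raw input bytes, so the document encoding stops mattering. The following is only a Python sketch of that idea, assuming the parser has already decoded the document into Unicode characters; `escape_non_ascii` is a hypothetical helper, not anything in SP or OpenSP.

```python
def escape_non_ascii(text: str) -> str:
    # Work on code points, not on bytes of the document encoding:
    # each non-ASCII character becomes its UTF-8 bytes written as %HH.
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in text
    )


query = "bj\u00f6rn"

# The same query string as it would appear in an 8-bit and a 16-bit document.
latin1_bytes = query.encode("iso-8859-1")  # one byte per character
utf16_bytes = query.encode("utf-16-be")    # two bytes per character

# Decode first, then escape: both inputs give the same result.
from_latin1 = escape_non_ascii(latin1_bytes.decode("iso-8859-1"))
from_utf16 = escape_non_ascii(utf16_bytes.decode("utf-16-be"))
print(from_latin1)                # bj%C3%B6rn
print(from_latin1 == from_utf16)  # True
```

Whether SP's internals make such a decode-then-escape step easy is exactly the open question here; the sketch only shows that the escaping itself does not depend on the width of the source charset.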

Comment 4 Björn Höhrmann 2002-12-04 14:21:52 UTC
Niq, please have a look at section 4.2.2 of XML 1.0 Second Edition; 
XML processors MUST behave as I described for system identifiers.
Comment 5 Terje Bless 2003-03-01 14:37:28 UTC
Retarget 0.7.0 as 0.6.2 is imminent, but this really needs to be resolved.
Comment 6 Terje Bless 2004-09-01 16:39:15 UTC
Ping Nick again, and add CC to Björn.

Putting blocker on Bug #856; should be resolved before 0.7.0 release.
Comment 7 Björn Höhrmann 2004-09-06 20:10:51 UTC
Fixing this would require either changing all system identifiers in the 
document and its external parsed entities (which would require major work) or 
changing OpenSP, which is not exactly trivial, and I doubt such a change will 
make it into OpenSP 1.5.2. This is in fact better addressed through using a 
proper XML processor which would already ship with correct behavior in this 
regard. So this does not seem to fit in the 0.7.0 time frame.
Comment 8 Olivier Thereaux 2006-10-31 02:21:34 UTC
Sounds like "someone else's problem" to me. There's no point in keeping a record of bugs in SP, sourceforge has a bug database, right? And given the participants in this bug's discussion, I trust the bug has been recorded there. 

As Bjoern notes, "This is in fact better addressed through using a 
proper XML processor which would already ship with correct behavior in this 
regard." Which is true, but not relevant to this bug.

Closing. ok?