This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Reported by Björn Höhrmann:

Two documents,

    <?xml version='1.0' encoding='iso-8859-1'?>
    <!DOCTYPE foo SYSTEM "http://www.bjoernsworld.de/cgi-bin/dtd.pl?björn">
    <foo>
    <bar/>
    </foo>

and

    <?xml version='1.0' encoding='iso-8859-1'?>
    <!DOCTYPE foo SYSTEM "http://www.bjoernsworld.de/cgi-bin/dtd.pl?bj%c3%b6rn">
    <foo>
    <bar/>
    </foo>

As per XML 1.0 Second Edition section 4.2.2, XML processors must process these documents as being equivalent. The Validator does not: it claims the second document is valid while the first is said to be invalid. It is somehow getting confused by the system identifier in the first example. Typically, XML processors get it "right" but request ...?bj\xF6rn or ....?bj\xC3\xB6rn or ...?bj%f6rn instead of ...?bj%c3%b6rn, ....?bj%C3%b6rn or ...?bj%C3%B6rn (which are all equivalent).

dtd.pl is a CGI script that outputs different DTDs depending on whether the processor is behaving correctly:

    #!/usr/local/bin/perl -w
    print "Content-Type: application/xml-dtd;charset=us-ascii\n\n";
    # Always declare foo; declare bar only for a correctly escaped query string.
    print "<!ELEMENT foo (bar)>\n";
    if ($ENV{'QUERY_STRING'} eq "bj%c3%b6rn" or
        $ENV{'QUERY_STRING'} eq "bj%C3%b6rn" or
        $ENV{'QUERY_STRING'} eq "bj%C3%B6rn")
    {
        print "<!ELEMENT bar EMPTY>\n"
    }

I.e. the document is valid for conforming processors, invalid for non-conforming processors.
Reassigning to Nick as he's the only one that knows enough about SP's innards to do anything about this. Setting target to 0.7.0 as I don't think anything can be done about this in the 0.6.0 timeframe.
Ping Nick...
The bug report seems to me to be in error. Or, more specifically, it hinges on whether unescaped ö is allowed in a QUERY_STRING. It may be unsafe(?), but is AFAICS nevertheless legal, so ISTM SP is working correctly, and the bug is in the serverside script. Lynx agrees:

    [nick@jarl nick]$ lynx -dump -source http://www.bjoernsworld.de/cgi-bin/dtd.pl?björn
    <!ELEMENT foo (bar)>
    <!---->
    [nick@jarl nick]$ lynx -dump -source http://www.bjoernsworld.de/cgi-bin/dtd.pl?bj%c3%b6rn
    <!ELEMENT foo (bar)>
    <!ELEMENT bar EMPTY>

If someone can convince me otherwise, I could patch it fairly easily to escape 8-bit URIs, but I fear that could introduce serious bugs when working with a 16-bit charset. So at the very least I'd have to ask on openjade-devel.
Niq, please have a look at section 4.2.2 of XML 1.0 Second Edition (http://www.w3.org/TR/REC-xml#dt-sysid); XML processors MUST behave as I described for system identifiers.
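For illustration, here is a minimal sketch of the escaping as I read section 4.2.2: each non-ASCII character in the system identifier is converted to UTF-8 and every resulting byte is written as %HH. The function name escape_system_identifier is hypothetical, and the sketch deliberately ignores the ASCII characters that also need escaping (space, '<', '>' and so on). It works on decoded characters rather than on the bytes of the document encoding, which is why the same rule should apply regardless of whether the document uses an 8-bit or a 16-bit charset.

```python
def escape_system_identifier(sysid: str) -> str:
    """Escape non-ASCII characters as %HH over their UTF-8 bytes.

    Hypothetical helper for illustration only; characters that are
    already ASCII (including existing %HH escapes) are left untouched.
    """
    out = []
    for ch in sysid:
        if ord(ch) > 0x7F:
            # Convert the character to UTF-8, then escape each byte.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


# The first document is encoded as iso-8859-1, so the parser sees the
# byte 0xF6 and decodes it to the character U+00F6 before escaping.
raw = b"http://www.bjoernsworld.de/cgi-bin/dtd.pl?bj\xf6rn"
sysid = raw.decode("iso-8859-1")
print(escape_system_identifier(sysid))
```

Under these assumptions the request would carry ?bj%C3%B6rn, one of the three query strings dtd.pl accepts, and an identifier that is already escaped passes through unchanged.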
Retarget 0.7.0 as 0.6.2 is imminent, but this really needs to be resolved.
Ping Nick again, and add CC to Björn. Putting blocker on Bug #856; should be resolved before 0.7.0 release.
Fixing this would require either changing all system identifiers in the document and its external parsed entities (which would require major work) or changing OpenSP, which is not exactly trivial, and I doubt such a change will make it into OpenSP 1.5.2. This is in fact better addressed through using a proper XML processor which would already ship with correct behavior in this regard. So this does not seem to fit in the 0.7.0 time frame.
Sounds like "someone else's problem" to me. There's no point in keeping a record of bugs in SP, sourceforge has a bug database, right? And given the participants in this bug's discussion, I trust the bug has been recorded there. As Bjoern notes, "This is in fact better addressed through using a proper XML processor which would already ship with correct behavior in this regard." Which is true, but not relevant to this bug. Closing. ok?