4956 – Expected results for fn-doc-23

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED FIXED

Alias:	None

Product:	XML Query Test Suite
Classification:	Unclassified
Component:	XML Query Test Suite
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Benjamin Nguyen
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Duplicates (1):	5088
Depends on:
Blocks:

Reported:	2007-08-17 15:38 UTC by Nick Jones
Modified:	2016-03-22 14:49 UTC
CC List:	5 users

See Also:

Attachments
My result for this test case. (2.74 KB, application/zip) 2008-01-09 16:42 UTC, Andrew Eisenberg
my updated actual result (294.93 KB, application/xml) 2008-05-07 19:01 UTC, Andrew Eisenberg
Our results (2.83 KB, application/zip) 2010-03-16 16:10 UTC, Nick Jones
latest result for fn-doc-23 (2.75 KB, application/x-zip) 2010-07-06 16:14 UTC, Andrew Eisenberg

Description Nick Jones 2007-08-17 15:38:35 UTC

This query loads in a document containing much whitespace and then outputs it. I believe some of the whitespace in the expected output is incorrect. For example the last textNode element in the source is:
      <textNode>&#x20;&#x20;&#x20;&#xd;&#xd;&#xd;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xa;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;&#xd;</textNode>

In the expected results this becomes a textNode containing a series of spaces followed by a series of "&#xD;". But I think it should be:

<textNode>  &#xD;&#xD;&#xD;


(lots of newlines)


&#xD;&#xD;etc....</textNode>

I think there are other problems, but they are quite hard to describe in a 160K file of various blank characters!
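The behaviour described above can be checked with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser as a stand-in for the XQuery processor's parser, and uses a shortened version of the textNode; the variable names are illustrative only. A CR written as a character reference survives parsing as a real U+000D, whereas a literal CR in the file is normalized to LF.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the last textNode of the source document.
source = "<textNode>&#x20;&#x20;&#x20;&#xd;&#xd;&#xd;&#xa;&#xa;&#xd;&#xd;</textNode>"

# Character references are expanded after line-ending normalization,
# so each "&#xd;" arrives in the data model as a genuine CR (code point 13).
codepoints = [ord(ch) for ch in ET.fromstring(source).text]
print(codepoints)

# A literal CR in the input, by contrast, is normalized to LF by the parser.
literal_cr = ET.fromstring("<t>a\rb</t>").text
print(repr(literal_cr))
```

So the data model for this node holds spaces, then CRs, then LFs, then CRs again, and only the CRs need to be written back as "&#xD;" on output.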

Comment 1 Frans Englich 2007-10-01 10:06:12 UTC

*** Bug 5088 has been marked as a duplicate of this bug. ***

Comment 2 Frans Englich 2007-10-01 10:08:00 UTC

See report #5088 for a nifty script for debugging this test. I'll have an updated version submitted soon.

Comment 3 Frans Englich 2007-11-21 11:45:21 UTC

Updated fn-doc-23.txt in CVS with the version Nick sent in private mail. XQTS_current.zip is not yet updated.

Comment 4 Nick Jones 2007-11-22 16:57:41 UTC

Thanks. I have checked out from CVS and the version there works fine for me, although I did generate the file, so someone else might want to confirm before I close...

Comment 5 Andrew Eisenberg 2008-01-09 16:42:13 UTC

Created attachment 506
My result for this test case.

This test is failing for me. I see a difference with the expected result after line 65724. I'm at a loss to determine why this difference occurs and which is at fault.

Comment 6 Nick Jones 2008-01-10 10:38:09 UTC

I think one difference comes from section 5, "XML Output Method", of the XSLT 2.0 and XQuery 1.0 Serialization spec:

Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents.


So one difference is that we are outputting &#8232; where you are outputting the character itself. I think we are correct.

There are other differences, but I am still trying to track them down.
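The text-node half of the quoted rule can be sketched as follows. This is a minimal illustration in Python, not any product's serializer; `TEXT_NODE_REFS` and `escape_text_node` are hypothetical names, and the handling of `&`, `<` and `>` is the usual XML escaping added so the example is complete.

```python
# Characters that the quoted rule requires as character references in text nodes.
TEXT_NODE_REFS = {
    "\r": "&#xD;",         # CR
    "\u0085": "&#x85;",    # NEL
    "\u2028": "&#x2028;",  # LINE SEPARATOR (decimal 8232)
}

def escape_text_node(text: str) -> str:
    """Escape one text node's content for XML output (illustrative sketch)."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            # LF and ordinary characters pass through unchanged.
            out.append(TEXT_NODE_REFS.get(ch, ch))
    return "".join(out)

escaped = escape_text_node("a\r\n\u2028b")
print(escaped)
```

Since 0x2028 is 8232 in decimal, "&#8232;" and "&#x2028;" are equivalent spellings of the same reference, which the rule permits.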

Comment 7 Frans Englich 2008-02-29 11:55:18 UTC

Andrew, did you get any further on finding what could be wrong with this test?

Comment 8 Frans Englich 2008-04-28 12:08:24 UTC

Nick, did you get any further on tracking this down?

As someone pointed out to me, this is a messy test: it is big and hard to debug. But it also has an upside: its complexity stresses implementations in an area where the spec is complex and where implementations tend to become elaborate (e.g., compression schemes for whitespace-only nodes).

Comment 9 Nick Jones 2008-04-28 12:15:02 UTC

Sorry. This slipped off my todo list. The last difference I found between ourselves/Saxon and Andrew's output was in how we were outputting the LINE SEPARATOR, as mentioned in comment #6.

I'll try to have a look again at the other differences sometime this week and get back to you.

Comment 10 Nick Jones 2008-05-01 10:50:20 UTC

I have found another problem.

The unicode character &#x2001; (em quad)

we are serializing in UTF-8 as the following series of bytes:

0xE2 0x80 0x81

but Andrew's file has this character serialized as:

0xE2 0x80 0x3F

This can be seen at line 65606 of his file. I believe our serialization is correct, and that his output doesn't represent any valid unicode character.

Comment 11 Nick Jones 2008-05-01 10:58:40 UTC

Similarly:

The unicode character &#x205f; (medium mathematical space)

we are serializing in UTF-8 as the following series of bytes:

0xE2 0x81 0x9F

but Andrew's file has this character serialized as:

0xE2 0x3F 0x9F


It appears that the byte 0x81 appears nowhere in Andrew's output, and that all instances have been replaced by the byte 0x3F. This is definitely an invalid UTF-8 encoding, as the second and third bytes should all be higher than 0x7F.
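The byte sequences in this comment and the previous one can be verified with a short Python sketch. `is_valid_utf8` is a hypothetical helper written for this illustration; it relies only on the standard strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Correct encodings of the two characters discussed above.
em_quad = "\u2001".encode("utf-8")      # EM QUAD
math_space = "\u205f".encode("utf-8")   # MEDIUM MATHEMATICAL SPACE
print(em_quad.hex(" "), math_space.hex(" "))

# The sequences found in the other output, with 0x81 replaced by 0x3F ("?").
bad_em_quad = bytes([0xE2, 0x80, 0x3F])
bad_math_space = bytes([0xE2, 0x3F, 0x9F])
print(is_valid_utf8(bad_em_quad), is_valid_utf8(bad_math_space))
```

Both damaged sequences are rejected because 0x3F is an ASCII byte ("?") and cannot appear as a continuation byte after the lead byte 0xE2.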

Comment 12 Andrew Eisenberg 2008-05-07 19:01:25 UTC

Created attachment 551
my updated actual result

Nick, your comments led me to a bug in the serialization of my query result to a file. I compare my actual result to the expected results prior to this serialization, and so I still see a difference after line 65724.

Comment 13 Nick Jones 2008-05-08 09:09:56 UTC

I think at line 65724 we have what I was referring to in comment #6.

So this is where the lines of &#x2028; (LINE SEPARATOR) characters start, which should be output as "&#x2028;" not as a LINE SEPARATOR character (E2 80 A8).

Comment 14 Andrew Eisenberg 2008-05-09 17:58:40 UTC

I see that my product is not processing the &#x2028; characters correctly. It will be a while until I see a fix and can run this test again. Nick, if you are happy with the expected result, then I suggest that you close this bug.

Comment 15 Michael Kay 2008-06-09 11:32:34 UTC

Saxon is currently showing three differences between the expected test results and the actual results. The first (using the diagnostic query in bug #5088) is:

<difference node="813">
   <gold>32 8195 10 160</gold>
   <actual>32 8195 10 10 160</actual>
</difference>

It's entirely possible, of course, that the files have been corrupted in the course of CVS download, as in bug #5686.
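For readers without access to the diagnostic query, a report of this shape can be approximated as below. This is a hypothetical Python sketch, not the script from bug #5088: the function names are invented, and its node numbering (elements in document order, starting at 1) is an assumption that will not necessarily match the numbering in the report above.

```python
import xml.etree.ElementTree as ET

def text_codepoints(xml_text: str) -> list:
    """One list of code points per element's leading text, in document order."""
    root = ET.fromstring(xml_text)
    return [[ord(ch) for ch in (el.text or "")] for el in root.iter()]

def differences(gold_xml: str, actual_xml: str) -> list:
    """Return (node number, gold code points, actual code points) for each mismatch.

    Elements are paired positionally; if one document has extra elements,
    the surplus is ignored by zip().
    """
    pairs = zip(text_codepoints(gold_xml), text_codepoints(actual_xml))
    return [(i, g, a) for i, (g, a) in enumerate(pairs, start=1) if g != a]

# Miniature documents reproducing the shape of the first reported difference.
gold_doc = "<r><textNode>\u0020\u2003\n\u00a0</textNode></r>"
actual_doc = "<r><textNode>\u0020\u2003\n\n\u00a0</textNode></r>"
report = differences(gold_doc, actual_doc)
print(report)
```

Comparing code points rather than raw bytes keeps the report readable even when every character involved is some form of whitespace.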

Comment 16 Frans Englich 2008-07-23 10:41:07 UTC

Awaiting reply in private mail from Michael K.

Comment 17 Nick Jones 2009-10-21 13:26:58 UTC

I'm re-opening this as I believe the issue is still outstanding.

Comment 18 Frans Englich 2010-03-15 14:45:07 UTC

Nick, it seems the latest version I committed was one I received from you:

Revision 1.5
date: 2007/11/21 11:43:37;  author: fenglich;  state: Exp;  lines: +3 -3
Update with the version Nick sent me

Could you give a precise description of what is wrong?

Comment 19 Nick Jones 2010-03-16 16:10:58 UTC

Created attachment 831
Our results

Comment 20 Nick Jones 2010-03-16 16:11:55 UTC

I believe that in the expected results, fn-doc-23.txt, there are a couple of extra newlines.

Particularly in the textNode at line 66180 and the textNode at line 66424.

I've attached a file which I believe contains the correct results, zipped to hopefully preserve it correctly.

Comment 21 Andrew Eisenberg 2010-07-06 16:14:06 UTC

Created attachment 889
latest result for fn-doc-23

Comment 22 Michael Kay 2010-07-06 16:31:50 UTC

Saxon's results show a large number of differences from those in the comment #21 attachment. Here are the first few:

<difference node="739">
   <gold>10</gold>
   <actual>8232</actual>
</difference>
<difference node="740">
   <gold>10</gold>
   <actual>8232</actual>
</difference>
<difference node="741">
   <gold>10 10</gold>
   <actual>8232 8232</actual>
</difference>
<difference node="742">
   <gold>10 10</gold>
   <actual>8232 8232</actual>
</difference>

Here "gold" means the comment #21 results, "actual" means the Saxon 9.2 results. Clearly the product that produced the comment #21 results differs from Saxon in its handling of the character 8232 (x2028, the Unicode line separator character). This character has a special meaning in XML 1.1 so there may be legitimate differences here.

Saxon is still showing the same three differences from the CVS reference results as reported in comment #15.
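One way such a difference could arise is sketched below, on the assumption (not confirmed anywhere in this thread) that the other product applied XML 1.1 end-of-line handling to literal characters in its input. XML 1.1 translates CR LF, CR NEL, NEL, LINE SEPARATOR and a lone CR to a single LF before parsing; `normalize_eol_xml11` is a hypothetical name for that step.

```python
import re

# XML 1.1 end-of-line handling: the two-character sequences are listed first
# so that they collapse to one LF rather than two.
_XML11_EOL = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")

def normalize_eol_xml11(raw: str) -> str:
    """Apply XML 1.1 line-end normalization to literal input characters."""
    return _XML11_EOL.sub("\n", raw)

normalized = normalize_eol_xml11("a\u2028\u2028b")
print([ord(ch) for ch in normalized])
```

Under XML 1.0 rules only CR LF and lone CR are normalized, so a literal U+2028 would remain 8232, which matches the Saxon side of the differences listed above.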

Comment 23 Andrew Eisenberg 2010-09-08 18:52:18 UTC

At the Sept. 8 XML Query/XSL WG meeting, I was asked to remove test case fn-doc-23, and I have just done so.

Work on this item will resume after we have published XQTS 1.0.3 and collected our implementation results for the family of XQuery/XPath PER documents.

Comment 24 Norman Walsh 2011-02-23 20:24:00 UTC

Moving it over to Andrew until such time as it resurfaces.

Comment 25 Andrew Eisenberg 2011-02-23 22:16:52 UTC

I've got nothing new for this test case. I'll let Benjamin decide how to proceed.

Comment 26 O'Neil Delpratt 2016-03-22 14:49:25 UTC

I am closing this bug as I believe it has been resolved by removing the test case.