This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This query loads a document containing a large amount of whitespace and then outputs it. I believe some of the whitespace in the expected output is incorrect. For example, the last textNode element in the source is: <textNode>   







































































































































</textNode> In the expected results this becomes a textNode containing a series of spaces followed by a series of "
". But I think it should be: <textNode> 

&#D; (lots of newlines) 

etc....</textNode> I think there are other problems, but they are quite hard to describe in a 160K file of various blank characters!
*** Bug 5088 has been marked as a duplicate of this bug. ***
See report #5088 for a nifty script for debugging this test. I'll have an updated version submitted soon.
Updated fn-doc-23.txt in CVS with the version Nick sent in private mail. XQTS_current.zip is not yet updated.
Thanks. Have checked out from CVS and the version there works fine for me, although I did generate the file so someone else might want to confirm before I close....
Created attachment 506
My result for this test case.
This test is failing for me. I see a difference from the expected result after line 65724. I'm at a loss to determine why this difference occurs and which side is at fault.
I think one difference comes from section 5, "XML Output Method", of the XSLT 2.0 and XQuery 1.0 Serialization spec: Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. So one difference is that we are outputting &#x2028; where you are outputting the character itself. I think we are correct. There are other differences, but I am still trying to track them down.
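The rule quoted above can be illustrated with a minimal sketch. This is not any product's serializer: the function name `escape_whitespace` and the two tables are hypothetical, and the sketch covers only the whitespace characters named in the quoted sentence (it does not escape `&`, `<` or anything else a real serializer must handle).

```python
# Characters that the quoted rule says must be written as character
# references in text nodes. (Hypothetical table names, for illustration.)
TEXT_NODE_ESCAPES = {
    "\r": "&#xD;",         # CR
    "\u0085": "&#x85;",    # NEL
    "\u2028": "&#x2028;",  # LINE SEPARATOR
}

# Attribute nodes additionally escape NL and TAB.
ATTRIBUTE_ESCAPES = {**TEXT_NODE_ESCAPES, "\n": "&#xA;", "\t": "&#x9;"}


def escape_whitespace(value: str, in_attribute: bool = False) -> str:
    """Replace the listed characters with character references."""
    table = ATTRIBUTE_ESCAPES if in_attribute else TEXT_NODE_ESCAPES
    return "".join(table.get(ch, ch) for ch in value)


# In a text node, LINE SEPARATOR and CR are escaped but NL is left alone.
print(repr(escape_whitespace("a\u2028b\r\n")))               # 'a&#x2028;b&#xD;\n'
# In an attribute, TAB and NL are escaped as well.
print(repr(escape_whitespace("a\tb\n", in_attribute=True)))  # 'a&#x9;b&#xA;'
```

A serializer that writes U+2028 as a raw character instead of "&#x2028;" would differ from the expected result in exactly the way described here.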
Andrew, did you get any further on finding what could be wrong with this test?
Nick, did you get any further on tracking this down? As someone pointed out to me, this is a messy test. It's big and hard to debug. But it also has its good sides: its complexity stresses implementations in an area where the spec is complex and where implementations tend to become elaborate (e.g., compression schemes for whitespace-only nodes).
Sorry. This slipped off my todo list. The last difference I found between ourselves/Saxon and Andrew's output was in how we were outputting the LINE SEPARATOR as mentioned in #6. I'll try to have a look again at the other differences sometime this week and get back to you.
Have found another problem. The Unicode character U+2001 (em quad) we are serializing in UTF-8 as the following series of bytes: 0xE2 0x80 0x81, but Andrew's file has this character serialized as: 0xE2 0x80 0x3F. This can be seen at line 65606 of his file. I believe our serialization is correct, and that his output doesn't represent any valid Unicode character.
Similarly: the Unicode character U+205F (medium mathematical space) we are serializing in UTF-8 as the following series of bytes: 0xE2 0x81 0x9F, but Andrew's file has this character serialized as: 0xE2 0x3F 0x9F. It appears that the byte 0x81 appears nowhere in Andrew's output, and that all instances have been replaced by the byte 0x3F. This is definitely an invalid UTF-8 encoding, as the second and third bytes should all be higher than 0x7F.
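The byte sequences above can be checked with a short sketch. The helper names `hex_bytes` and `is_valid_utf8` are hypothetical, and replacing 0x81 with 0x3F only simulates the corruption described in this comment; it is not a claim about how the other file was actually produced.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the way they are quoted in this comment."""
    return " ".join(f"0x{b:02X}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


em_quad = "\u2001".encode("utf-8")      # em quad
math_space = "\u205F".encode("utf-8")   # medium mathematical space

print(hex_bytes(em_quad))      # 0xE2 0x80 0x81
print(hex_bytes(math_space))   # 0xE2 0x81 0x9F

# Simulate the observed damage: every 0x81 byte replaced by 0x3F ('?').
for original in (em_quad, math_space):
    corrupted = original.replace(b"\x81", b"\x3F")
    print(hex_bytes(corrupted), is_valid_utf8(corrupted))
```

Both corrupted sequences fail to decode, because 0x3F is not a continuation byte (continuation bytes lie in the range 0x80 to 0xBF).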
Created attachment 551
my updated actual result
Nick, your comments led me to a bug in the serialization of my query result to a file. I compare my actual result to the expected results prior to this serialization, and so I still see a difference after line 65724.
I think at line 65724 we have what I was referring to in comment #6. So this is where the lines of U+2028 (LINE SEPARATOR) characters start, which should be output as "&#x2028;", not as a literal LINE SEPARATOR character (E2 80 A8).
I see that my product is not processing the LINE SEPARATOR (U+2028) characters correctly. It will be a while until I see a fix and can run this test again. Nick, if you are happy with the expected result, then I suggest that you close this bug.
Saxon is currently showing three differences between the expected test results and the actual results. The first (using the diagnostic query in bug #5088) is:
<difference node="813">
  <gold>32 8195 10 160</gold>
  <actual>32 8195 10 10 160</actual>
</difference>
It's entirely possible, of course, that the files have been corrupted in the course of CVS download, as in bug #5686.
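For readers without access to the diagnostic query in bug #5088, the kind of comparison it reports can be approximated as follows. This is a hypothetical Python sketch, not the script from that bug: the function names, the sample documents, and the node numbering (a simple 1-based count of textNode elements) are all assumptions.

```python
import xml.etree.ElementTree as ET


def codepoints(text: str) -> str:
    """Render a string as space-separated decimal code points."""
    return " ".join(str(ord(ch)) for ch in text)


def diff_text_nodes(gold_xml: str, actual_xml: str):
    """Compare the text of corresponding textNode elements.

    Returns (index, gold code points, actual code points) for each pair
    that differs. Pairs are matched by position; if the documents contain
    different numbers of textNode elements, the extras are ignored.
    """
    gold = [el.text or "" for el in ET.fromstring(gold_xml).iter("textNode")]
    actual = [el.text or "" for el in ET.fromstring(actual_xml).iter("textNode")]
    return [
        (index, codepoints(g), codepoints(a))
        for index, (g, a) in enumerate(zip(gold, actual), start=1)
        if g != a
    ]


# Made-up documents that reproduce the shape of the difference shown above.
gold_doc = "<doc><textNode> \u2003\n\u00a0</textNode></doc>"
actual_doc = "<doc><textNode> \u2003\n\n\u00a0</textNode></doc>"

for index, gold, actual in diff_text_nodes(gold_doc, actual_doc):
    print(f"node {index}: gold={gold} actual={actual}")
```

Printing code points rather than the characters themselves is what makes differences in invisible whitespace readable.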
Awaiting reply in private mail from Michael K.
I'm re-opening this as I believe the issue is still outstanding.
Nick, it seems the latest version I committed is one I received from you:
Revision 1.5
date: 2007/11/21 11:43:37; author: fenglich; state: Exp; lines: +3 -3
Update with the version Nick sent me
Could you give a precise description of what is wrong?
Created attachment 831
Our results
I believe that in the expected results, fn-doc-23.txt, there are a couple of extra newlines, particularly in the textNode at line 66180 and the textNode at line 66424. I've attached a file which I believe contains the correct results, zipped to hopefully preserve it correctly.
Created attachment 889
latest result for fn-doc-23
Saxon's results show a large number of differences from those in the comment #21 attachment. Here are the first few:
<difference node="739">
  <gold>10</gold>
  <actual>8232</actual>
</difference>
<difference node="740">
  <gold>10</gold>
  <actual>8232</actual>
</difference>
<difference node="741">
  <gold>10 10</gold>
  <actual>8232 8232</actual>
</difference>
<difference node="742">
  <gold>10 10</gold>
  <actual>8232 8232</actual>
</difference>
Here "gold" means the comment #22 results, "actual" means the Saxon 9.2 results. Clearly the product that produced the comment #22 results differs from Saxon in its handling of the character 8232 (x2028, the Unicode line separator character). This character has a special meaning in XML 1.1, so there may be legitimate differences here. Saxon is still showing the same three differences from the CVS reference results as reported in comment #15.
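The "special meaning" mentioned above is XML 1.1 end-of-line handling: an XML 1.1 parser normalizes literal U+2028 and U+0085 (as well as CR and CR LF) to a single LF, whereas XML 1.0 normalizes only CR and CR LF. The sketch below illustrates that rule with a hypothetical function `normalize_eol`; whether this is what actually produced the differences above is an assumption, and the rule applies to literal characters in the parsed document, not to character references such as "&#x2028;".

```python
import re

# XML 1.0: only CR LF and lone CR are translated to LF.
_XML10_EOL = re.compile("\r\n|\r")
# XML 1.1: CR LF, CR NEL, NEL, LINE SEPARATOR and lone CR all become LF.
# Two-character sequences come first so they are consumed as one unit.
_XML11_EOL = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")


def normalize_eol(text: str, xml_version: str = "1.0") -> str:
    """Apply end-of-line normalization for the given XML version."""
    pattern = _XML11_EOL if xml_version == "1.1" else _XML10_EOL
    return pattern.sub("\n", text)


sample = "\u2028\u2028"
print([ord(ch) for ch in normalize_eol(sample, "1.0")])  # [8232, 8232]
print([ord(ch) for ch in normalize_eol(sample, "1.1")])  # [10, 10]
```

Under this assumption, a product that treats the input as XML 1.1 would report 10 where a product that treats it as XML 1.0 reports 8232, which matches the pattern in the differences listed above.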
At the Sept. 8 XML Query/XSL WG meeting, I was asked to remove test case fn-doc-23, and I have just done so. Work on this item will resume after we have published XQTS 1.0.3 and collected our implementation results for the family of XQuery/XPath PER documents.
Moving it over to Andrew until such time as it resurfaces.
I've got nothing new for this test case. I'll let Benjamin decide how to proceed.
I am closing this bug as I believe it has been resolved by removing the test case.