This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This was raised by Jonathan Robie in #1863 (marked as closed) but is still present in the 0.7 release. Literals056.txt is specified as Text comparison but contains the text &amp; where I would have expected &.
Actually, the meaning of the Text comparison is still very unclear. http://www.w3.org/XML/Query/test-suite/Guidelines%20for%20Running%20the%20XML%20Query%20Test%20Suite.html says of the Text comparison: "Text: text is compared using byte-comparison." I, and apparently other testers, have interpreted this as meaning that the result should be serialised using the text output method in UTF-8 and compared to the supplied result. (Although actually I compare the string value of the result with the expected result as XPath strings, so it is a Unicode character equality, not a byte comparison of the UTF-8 encoding.)
However, the new guidelines are now explicit that the results are always to be serialised with a specified set of serialisation parameters, including method=xml. If Literals056.txt is to be compared with the output of an XML serialisation, then it does need to be &amp; and not a bare & (and the other examples that -were- changed for bug #1863 need changing back). In this case other results would also need to be allowed (e.g. using a numeric character reference or a CDATA section to quote the &). Otherwise the guidelines need to say that text serialisation should be used for the Text comparison.
Alternatively (and perhaps preferably) all instances of Text comparison could be replaced by Fragment. That reduces the dependence on support for multiple serialisation forms, which is an optional feature for XQuery. http://www.w3.org/TR/2003/WD-xquery-20031112/#id-serialization makes it clear that even if serialisation is supported, a system need only support method="xml", which means that a conforming application may fail all tests using Text comparison.
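To illustrate the point about alternative quotings of an ampersand, here is a minimal Python sketch. It is not part of the test suite or of any tester's harness; the function name `text_of_fragment` and the `wrapper` element name are hypothetical. It shows that an escaped entity, a numeric character reference, and a CDATA section all parse to the same string value even though their bytes differ, which is why a byte-for-byte comparison of XML-serialised output is fragile.

```python
import xml.etree.ElementTree as ET


def text_of_fragment(serialized: str) -> str:
    """Return the string value of a serialised text fragment.

    The fragment is wrapped in a dummy element (hypothetical name
    "wrapper") so that it can be parsed as a well-formed document.
    """
    root = ET.fromstring(f"<wrapper>{serialized}</wrapper>")
    return "".join(root.itertext())


# Three legal ways an XML serialiser could write a single ampersand.
candidates = ["&amp;", "&#38;", "<![CDATA[&]]>"]

# String values after parsing: identical for all three forms.
values = [text_of_fragment(c) for c in candidates]

# Byte-level comparison of the serialised forms: they are not all equal.
byte_equal = len({c.encode("utf-8") for c in candidates}) == 1
```

Under these assumptions, comparing parsed string values accepts all three forms, while comparing raw bytes accepts only whichever form happens to be in the expected-result file.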
We have clarified this in the task force and will add some text to the upcoming release to make this clearer. Basically, we require all results to be serialized using XML serialization. In this case, simple string literals will be output as a top-level text node with the special XML characters escaped. So, for Literals056, the result is correct, while we had incorrect results for the output of the other special XML characters. I have now fixed these.
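The difference between the two readings can be sketched in Python. This is only an approximation under stated assumptions: `serialize_as_xml_text` and `serialize_as_text` are hypothetical names, and the standard-library `xml.sax.saxutils.escape` stands in for a real XQuery serialiser (it escapes `&`, `<` and `>`).

```python
from xml.sax.saxutils import escape


def serialize_as_xml_text(value: str) -> str:
    """Approximate method=xml output for a top-level text node:
    the special XML characters are escaped."""
    return escape(value)


def serialize_as_text(value: str) -> str:
    """Approximate method=text output: the string value is written as is."""
    return value


literal = "&"  # string value of a literal consisting of one ampersand
xml_form = serialize_as_xml_text(literal)
text_form = serialize_as_text(literal)
```

With XML serialisation required for all results, the expected-result file for such a literal has to hold the escaped form, which matches the resolution described above.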
I must say this is a slightly surprising outcome (although at least it will be consistent, even though it reverses the resolution of bug #1863). However, I now can't see any reason for the Text comparison at all: there are virtually no features of an XML serialisation that can be compared safely in a byte-for-byte manner.
In my own test harness I currently don't serialise the result at all. XML and Fragment results are compared by loading the file as XML (after wrapping in an element in the case of Fragment) and then comparing using deep-equal(). (Or I could write a different recursive equality function that took different actions on comments etc., but that wouldn't change this issue.) Currently I compare Text by reading the expected result with unparsed-text() and comparing using = with the string value of the result. With the clarification you give here I would change my test harness to treat Text in exactly the same way as Fragment.
Even if I were serialising and then comparing (which is closer to the official method) I still couldn't see any difference between Text and Fragment comparison. If the output has no attributes or namespaces, Text and Fragment are the same, and if the output does have attributes or namespaces, you'd want to compare as Fragment rather than Text so that you can write all attributes in some canonical order before comparing, wouldn't you? David
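A harness along these lines can be sketched in Python. This is not the harness described above: that one uses XPath deep-equal(), whereas this sketch swaps in the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) as the equality check. The function names and the `wrapper` element are hypothetical.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Canonical form of a well-formed document: attributes are written
    in a fixed order and character escaping is normalised. Comments are
    omitted by default."""
    return ET.canonicalize(xml_data=xml_text)


def compare_xml(actual: str, expected: str) -> bool:
    """XML comparison: both sides must be well-formed documents."""
    return canonical(actual) == canonical(expected)


def compare_fragment(actual: str, expected: str) -> bool:
    """Fragment comparison: wrap both sides in a dummy element first,
    so that bare text or several top-level nodes can be parsed."""
    def wrap(fragment: str) -> str:
        return f"<wrapper>{fragment}</wrapper>"
    return compare_xml(wrap(actual), wrap(expected))


def compare_text(actual: str, expected: str) -> bool:
    """Text comparison, if results are XML-serialised anyway:
    treated exactly like Fragment."""
    return compare_fragment(actual, expected)
```

For example, `compare_fragment('<a x="1" y="2"/>text', '<a y="2" x="1"></a>text')` holds because attribute order and the empty-element syntax are normalised before comparing, and `compare_text("&amp;", "&#38;")` holds for the same reason.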
I agree - technically, the 'Text' comparator could be subsumed by the 'Fragment' comparator. So could the 'XML' comparator (provided we do not have expected results which contain an XML declaration and/or DTD - which we do not). The original idea of the different comparators was to give the user an indication of what to expect from the results - initially, either an XML result or a textual string. But, with the additional complications raised by the different serialization methods and the necessity of adding the 'Fragment' comparator, these differences have become less important. So, basically, these comparators are just an indication to the user about what type of results to expect. The actual technical details of how each comparator is implemented are pretty similar - the comparator is more of a conceptual idea than a technical requirement. We do provide technical details of how each should be implemented in the Test Execution Guidelines document, though, as you have pointed out, these implementations are all very similar.
Thanks for the further feedback. Personally I'd prefer to see XML on all well-formed expected results and Fragment on any that are not well-formed, and no Text at all, but if you think that having some marked as Text will help some other testers, I don't actually object to that. So I'm closing this report, as at least I now know what I'm supposed to do, which is the main thing! David