2476 – Literals057/058 results: wrong comparator

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	XML Query Test Suite
Classification:	Unclassified
Component:	XML Query Test Suite
Version:	0.8.0
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Mike Rorke

Reported:	2005-11-08 14:02 UTC by Michael Kay
Modified:	2005-11-10 23:10 UTC
CC List:	0 users

Description Michael Kay 2005-11-08 14:02:02 UTC

The results for Literals057/058 should use Fragment comparator rather than Text
comparator, since the reference results contain XML entity/character references.

Similarly for Literals062/063/064/065 and 068/069.

Comment 1 David Carlisle 2005-11-08 16:31:03 UTC

Mike, see
http://www.w3.org/Bugs/Public/show_bug.cgi?id=2402
which is the same issue (except it was raised on 0.7 so it is phrased in reverse:-)

In my own harness I have just decided to treat Text as Fragment always.
(I don't see any occasions when it is safe to do as the documentation advises
and test the result of an XML serialisation using byte comparison: any character
could be arbitrarily serialised using a character ref, which would fail a
byte-comparison test), although actually I never serialise the result at all,
and do xpath comparison of the result trees.
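
As a rough illustration of comparing result trees instead of serialised bytes, here is a minimal Python sketch. It is not the harness described above: it stands in a recursive ElementTree comparison for the XPath comparison, and the names `trees_equal`, `same_result` and the `container` wrapper element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def trees_equal(a, b):
    """Recursively compare two elements: tag, attributes, text, tail, children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(trees_equal(x, y) for x, y in zip(a, b))


def same_result(actual, expected):
    """Parse both serialised results inside a wrapper element and compare trees."""
    def parse(fragment):
        # 'container' is a hypothetical wrapper so that scalar results parse too.
        return ET.fromstring("<container>" + fragment + "</container>")
    return trees_equal(parse(actual), parse(expected))
```

Because both sides are parsed before comparison, `same_result('"', '&quot;')` holds even though the two strings differ byte for byte.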

David

Comment 2 Mike Rorke 2005-11-09 18:38:52 UTC

With the new requirement that all results be serialized using the XML 
serialization option, the 'Fragment' and 'Text' comparators become essentially 
equivalent. We have retained this distinction in the catalog to try and 
indicate to the user that a scalar value is being returned ('Text' comparator) 
as opposed to some sequence of XML nodes. I believe the results as they stand 
in 0.8 are correct though, right?

Comment 3 Michael Kay 2005-11-09 22:24:53 UTC

I wasn't aware of a change in this area, and it doesn't seem to have been
reflected in the documentation. The "Guidelines for running tests" say that
compare="Text" means the results should be compared using "byte comparison",
which means that you should report a fail if your results are ["] and the
expected results are [&quot;].

I agree it makes sense to get rid of the distinction between Fragment and Text,
but such changes need to be properly documented and announced.

I've reopened the bug to allow the documentation to be fixed.

Michael Kay

Comment 4 Mike Rorke 2005-11-09 22:43:56 UTC

The results still need to be compared using 'byte comparison' - it's the way
that the results of the query are serialized that has changed. Previously, we
said that the user should use 'XML serialization' for 'Fragment' and 'XML'
comparators and 'text serialization' for 'Text'. But, it turns out that the
only required serialization method in XQuery is 'XML serialization', so we
can't rely on implementations having any form of 'text serialization'. This
should already be spelled out in the 0.8 draft where we specify the
serialization settings used to generate the results.

For the scalar results (i.e. those where the verifier is 'Text'), this means 
that they should serialize their results as though they were a top-level XML 
text node. The only changes this makes to the stored results is that the 
special XML characters (i.e. <, >, &, " and ') are serialized in their 
entitized form (i.e. &lt;, &gt;, &amp;, &quot; and &apos; respectively). The
de-entitization of these characters is a side-effect of the 'text serialization',
so should not be performed when using 'XML serialization'.

In your case, if you used a fragment verifier with a scalar result whose 
expected value was '<' - adding XML elements around this would give you 
<container><</container> which is invalid XML (the '<' must be entitized), so 
we really need to store these characters in their entitized forms.
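
The container argument can be checked with a small Python sketch. It assumes a hypothetical `container` wrapper element and uses the standard library's `xml.sax.saxutils.escape` for the entitization; the helper names `wrap` and `entitize` are made up for the example and are not part of the test suite.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap(serialized):
    """Put an already-serialized scalar result inside a container element."""
    return "<container>" + serialized + "</container>"


def entitize(value):
    """Replace the five special XML characters with their predefined entities."""
    # escape() handles &, < and > itself; the extra map adds the two quotes.
    return escape(value, {'"': "&quot;", "'": "&apos;"})


raw = "<"

# The raw character makes the wrapped result ill-formed XML.
try:
    ET.fromstring(wrap(raw))
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# The entitized form parses and yields the original character again.
round_trip = ET.fromstring(wrap(entitize(raw))).text
```

Here `raw_parses` ends up false, while `round_trip` is the single character `<`.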

Comment 5 Michael Kay 2005-11-09 23:12:59 UTC

Given a query whose result sequence is a text node containing a single
character, namely a double quotation mark, there are at least four legal outputs
of the XML serialization method:

"
&quot;
&#x22;
&#34;

I would expect most processors to use the first, but the reference results use
the second and this is also legal.

Since serialization using the XML output method may produce any of these forms
and they are all equivalent, comparison of the serialized results byte-for-byte
is clearly not an option.
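
A minimal Python sketch of the point, assuming a hypothetical `container` wrapper element: the literal character, the predefined entity, and the hexadecimal and decimal character references for U+0022 (decimal 34) differ as strings but parse to the same text.

```python
import xml.etree.ElementTree as ET

# Four serialized forms of a text node holding one double quotation mark.
forms = ['"', "&quot;", "&#x22;", "&#34;"]


def parsed_text(serialized):
    """Parse the serialized text inside a wrapper element and return its value."""
    return ET.fromstring("<container>" + serialized + "</container>").text


# All forms parse to the same single character...
values = {parsed_text(form) for form in forms}

# ...yet no two of them are equal when compared as strings.
distinct_serializations = len(set(forms))
```

`values` contains only `"`, while `distinct_serializations` is 4, so a byte-for-byte check against any single reference form rejects three legal outputs.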

Comment 6 Mike Rorke 2005-11-10 00:40:08 UTC

I was under the impression that there was a single, well-defined method for 
storing these special characters in an XML text node. Apparently, this is not 
the case and we allow multiple different options. I have updated the test 
catalog with multiple results for these cases, to handle all the different 
serialization options.

Comment 7 David Carlisle 2005-11-10 11:24:59 UTC

(In reply to comment #6)
> I was under the impression that there was a single, well-defined method for 
> storing these special characters in an XML text node. Apparently, this is not 
> the case and we allow multiple different options. I have updated the test 
> catalog with multiple results for these cases, to handle all the different 
> serialization options.

No, sorry that is not enough.
It is not just "special characters" it is _all_ characters.

When the result contains a character such as "1",
the XQuery engine is allowed, using the XML serialisation, to serialise it as
"1" or "&#49;" or "&#x31;" or anything else that will parse to give a character
1. So it is _never_ safe to do a byte comparison of the XML serialisation with
the supplied file. You always need to parse the supplied result file as XML and 
then serialise it using exactly the same canonical serialisation you are using
for your result, and then compare. In other words you need to follow the
documentation given for the Fragment comparison, not the documentation for the
Text comparison.
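
One way to sketch "parse, reserialise canonically, then compare" in Python is shown below. It assumes Python 3.8 or later for `xml.etree.ElementTree.canonicalize` (which produces C14N 2.0 output), and it adds a hypothetical `container` wrapper so that scalar results and node sequences both parse; `canonical` and `results_match` are names invented for the example, not part of any harness.

```python
import xml.etree.ElementTree as ET


def canonical(serialized):
    """Wrap a serialised result, parse it, and return its canonical form."""
    return ET.canonicalize("<container>" + serialized + "</container>")


def results_match(actual, expected):
    """Compare two serialised results after canonicalising both sides."""
    return canonical(actual) == canonical(expected)
```

With this, `results_match("1", "&#x31;")` and `results_match("&#49;", "1")` both hold, and so does `results_match('<a b="1"/>', "<a b='1'></a>")`, since quoting style and empty-element syntax are normalised as well.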

It would be simpler just to globally replace "Text" by "Fragment" in the catalog
file.

David

Comment 8 Mike Rorke 2005-11-10 23:10:10 UTC

You are correct - we need to update the test execution guidelines to state that 
XML canonicalization needs to be applied to values that use the 'Text' 
comparator too. There is still a valid distinction between 'Text' 
and 'Fragment' in the test suite though - 'Text' comparator results represent 
scalar values while 'Fragment' represents XML or sequences of XML. While the 
test harnesses may choose to implement 'Text' and 'Fragment' with the same 
semantics (i.e. add container elements and canonicalize for both), we would 
still like to retain this distinction between the different types of results in 
the catalog.