This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 2441 - xqx: character references
Summary: xqx: character references
Status: RESOLVED FIXED
Alias: None
Product: XML Query Test Suite
Classification: Unclassified
Component: XML Query Test Suite
Version: 1.0.1
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Ravindranath (Ravi) Chennoju
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-04 00:31 UTC by David Carlisle
Modified: 2007-01-23 00:48 UTC (History)

Description David Carlisle 2005-11-04 00:31:11 UTC
The XQueryX files appear to have incorrectly translated character references.
For example, Constr-elem-curlybr-4.xq
has &#x7d;
which is encoded as
                    <xqx:stringConstantExpr>
                      <xqx:value>&amp;#x7d;</xqx:value>
                    </xqx:stringConstantExpr>

which translates back with the stylesheet to
&amp;#x7d;

The translator needs to encode character references by themselves, or indeed by
the characters referenced. xq2xqx encodes this test file using
               <xqx:stringConstantExpr>
                  <xqx:value>}</xqx:value>
               </xqx:stringConstantExpr>


which does translate back to an equivalent query.

(This affects several test files.)
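
A minimal sketch of the difference between the two encodings, assuming Python's standard xml.etree.ElementTree as a stand-in for whatever XML parser reads the XQueryX file. The helper name value_text is hypothetical; the namespace URI is the one used by the XQueryX files. It shows which string each xqx:value element actually carries once the XML has been parsed.

```python
import xml.etree.ElementTree as ET

XQX_NS = "http://www.w3.org/2005/XQueryX"

def value_text(content):
    """Hypothetical helper: parse one xqx:value element and return its string value."""
    fragment = '<xqx:value xmlns:xqx="%s">%s</xqx:value>' % (XQX_NS, content)
    return ET.fromstring(fragment).text

# Encoding found in the test suite: the ampersand itself is escaped,
# so the value is the six characters & # x 7 d ;
print(repr(value_text("&amp;#x7d;")))  # '&#x7d;'

# Encoding by the character reference, or by the character referenced:
# both give the one-character string }
print(repr(value_text("&#x7d;")))      # '}'
print(repr(value_text("}")))           # '}'
```

Only the last two forms carry the same string as the literal in the original query.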
Comment 1 David Carlisle 2006-06-13 12:40:31 UTC
Still the same in 0.9.4 (this affects lots of files: any that use & in the XQuery).
Comment 2 David Carlisle 2006-07-16 00:14:18 UTC
(In reply to comment #1)
> still the same in 0.9.4 (this affects lots of files, any ones using & in the
> XQuery)
> 
Still the same in the xqueryx.zip posted to public CVS today. It affects
around 80 files as far as I can see:
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals056.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals057.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals058.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals059.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals060.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals061.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/K-Literals-47.xq
Queries/XQuery/Expressions/PrimaryExpr/Literals/K-Literals-49.xq
Queries/XQuery/Expressions/Construct/DirectConElem/Constr-elem-curlybr-3.xq
Queries/XQuery/Expressions/Construct/DirectConElem/Constr-elem-curlybr-4.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-3.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-4.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-5.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-charref-1.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-eol-3.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-eol-4.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-charref-1.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-1.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-2.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-3.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-4.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-1.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-2.xq
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-3.xq
Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConPI/Constr-comppi-space-2.xq
Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConPI/Constr-comppi-space-4.xq
Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConComment/Constr-compcomment-dash-3.xq
Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConComment/Constr-compcomment-doubledash-3.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-005.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-006.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-007.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-008.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-009.xq
Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-010.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-2.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-3.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-4.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-5.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-6.xq
Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-18.xq
Queries/XQuery/Expressions/PrologExpr/NamespaceProlog/namespaceDecl-23.xq
Queries/XQuery/Expressions/PrologExpr/VariableProlog/InternalVariablesWithout/VarDecl009.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-9.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-10.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-13.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-16.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-17.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-18.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-21.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-22.xq
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-23.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-3.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-4.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-5.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-6.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode2args-4.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode-1.xq
Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/TranslateFunc/fn-translate3args-2.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates01.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates02.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates03.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates04.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates05.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates06.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates07.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates09.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates10.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates11.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates12.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates13.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates14.xq
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates15.xq
Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/IRIToURIfunc/fn-iri-to-uri-18.xq
Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/EscapeHTMLURIFunc/fn-escape-html-uri-20.xq
Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/EscapeHTMLURIFunc/fn-escape-html-uri-21.xq
Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch04.xq
Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch05.xq
Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch06.xq
Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch07.xq
Queries/XQuery/Functions/NodeSeqFunc/SeqDocFunc/fn-doc-1.xq
Comment 3 David Carlisle 2006-07-24 13:06:25 UTC
This is still not fixed in the XqueryX.zip 1.4 posted at the weekend.
For example, Literals061 encodes "&#8364;" as
<xqx:value>&amp;#8364;</xqx:value>
rather than
 <xqx:value>&#8364;</xqx:value>

David
Comment 4 David Carlisle 2006-08-04 16:45:22 UTC
Just to note that this problem (first reported on XQTS 0.8.0, and confirmed in 0.8.{2,4,6} and 0.9.{0,4}) is still present in the current.zip file in CVS, even though we're hopefully getting close to 1.0.

David

Comment 5 Jim Melton 2006-08-10 15:58:32 UTC
I believe that I have fixed this problem in the most recent version of the XQueryX stylesheet (see Bugzilla bug # 3446, http://www.w3.org/Bugs/Public/show_bug.cgi?id=3446) and the revised stylesheet posted at http://www.w3.org/2005/XQueryX/xqueryx.xsl contains the fix.  Please try that out and let me know whether the problem persists. 
Comment 6 David Carlisle 2006-08-10 16:13:34 UTC
(In reply to comment #5)
> I believe that I have fixed this problem in the most recent version of the
> XQueryX stylesheet (see Bugzilla bug # 3446,
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3446) and the revised stylesheet
> posted at http://www.w3.org/2005/XQueryX/xqueryx.xsl contains the fix.  Please
> try that out and let me know whether the problem persists. 
> 

The problem wasn't in the stylesheet; the problem was, and is, that the XQueryX files in the test suite are wrong.

David
Comment 7 David Carlisle 2006-08-10 16:39:27 UTC
And just to confirm, the latest stylesheet, just downloaded, has no effect on these files (which is good, as the existing stylesheet was working correctly in these cases).


 saxon Queries/XQueryX/Expressions/PrimaryExpr/Literals/Literals061.xqx xqueryx.xsl 

produces

 declare variable $input-context  external ;
"&amp;#8364;"



which is not equivalent to the XQuery test

 cat Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals061.xq               
(: Name: Literals061 :)
(: Description: Test for string literal containing the character reference '&#8364;' which transaltes into the 'Euro' currency symbol :)

(: insert-start :)
declare variable $input-context external;
(: insert-end :)

"&#8364;"
Comment 8 David Carlisle 2006-08-11 10:44:55 UTC
I'm not sure why this bug has remained open for 9 months. I wouldn't have thought it that hard to fix the XQueryX files in the distribution (running
sed -e 's/&amp;/\&/g'
over them all would do the job).

However, in case fixing the generator being used does prove difficult, perhaps I should repeat the standing offer: the xq2xml distribution contains a set of XQueryX versions of the test files, all distributed under the W3C software licence, so you are welcome to use any of them. I was planning to wait until the XQTS 1.0 release before updating, but have just updated today, so there are currently 15038 XQueryX files available in
http://monet.nag.co.uk/xq2xml/xqxtest-20060811.zip
These include files that fix this problem as well as files that fix bug #3521,
and they fill in the gaps where the current test suite has no XQueryX file at all for some reason. If you want to drop some of these files into a 1.0 test suite release, feel free.

David
Comment 9 David Carlisle 2006-08-14 11:26:23 UTC
The list in comment #1 was for XQTS 0.9.4, the list updated for the current.zip
in cvs is the following 60 files:

Literals061
K-Literals-47
Constr-elem-curlybr-3
Constr-elem-curlybr-4
Constr-cont-eol-3
Constr-cont-eol-4
Constr-cont-charref-1
Constr-ws-genchref-1
Constr-ws-genchref-2
Constr-ws-genchref-3
Constr-ws-genchref-4
Constr-ws-adjchref-1
Constr-ws-adjchref-2
Constr-ws-adjchref-3
Constr-comppi-space-2
Constr-comppi-space-4
Constr-compcomment-dash-3
Constr-compcomment-doubledash-3
boundary-space-005
boundary-space-006
boundary-space-007
boundary-space-008
boundary-space-009
boundary-space-010
K-CodepointToStringFunc-9
K-CodepointToStringFunc-10
K-CodepointToStringFunc-13
K-CodepointToStringFunc-16
K-CodepointToStringFunc-17
K-CodepointToStringFunc-18
K-CodepointToStringFunc-21
K-CodepointToStringFunc-22
K-CodepointToStringFunc-23
fn-normalize-unicode1args-3
fn-normalize-unicode1args-4
fn-normalize-unicode1args-5
fn-normalize-unicode1args-6
fn-normalize-unicode2args-4
fn-normalize-unicode-1
fn-translate3args-2
surrogates01
surrogates02
surrogates03
surrogates04
surrogates05
surrogates06
surrogates07
surrogates09
surrogates10
surrogates11
surrogates12
surrogates13
surrogates14
surrogates15
fn-escape-html-uri-20
fn-escape-html-uri-21
caselessmatch04
caselessmatch05
caselessmatch06
caselessmatch07
Comment 10 David Carlisle 2006-08-15 16:48:45 UTC
In addition to the files in comment #9 (which use references in strings), the following use them in attribute values:

Constr-attr-ws-3
Constr-attr-ws-4
Constr-attr-ws-5
Constr-attr-charref-1
Comment 11 Maxim Orgiyan 2006-09-15 17:41:05 UTC
I am back to working on XQueryX, and I am
looking at this bug along with the others.

I'll also need to clarify how entity
refs are handled (possibly with Jim), since it's
not clear what the behaviour should be.

Thank you for the offer to use your XQueryX files.
That's up to the XQTTF, but it seems to
me that, for consistency, it would be better to have
all files come from a single generator.
Comment 12 David Carlisle 2006-09-15 18:29:18 UTC
>  since it's not clear what the behaviour should be.

I do not see any ambiguity in the current spec. What ambiguity do you see?

String literals in XQueryX should just encode the string in XML, not the XML encoding of the XQuery encoding of the string. So the string of length 1
consisting of an ampersand is encoded as &amp;, not as &amp;amp;.
The implementation in xqueryx.xsl also requires this. It is easy to check, by
running xqueryx.xsl on any of the XQueryX files listed in comment #9, that the resulting XQuery is not equivalent to the original XQuery file in the test suite.

David
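
The rule described above could be sketched as follows, assuming Python and only its standard library. The helper names literal_value and xqx_value are hypothetical and are not part of xq2xqx or of the test suite generator. The sketch first computes the string that an XQuery string literal denotes (one pass over the references), then XML-escapes that string for use as xqx:value content. Doubled quote characters inside literals are not handled here.

```python
import re
from xml.sax.saxutils import escape

# The five predefined entity references shared by XML and XQuery.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# A hex character reference, a decimal character reference, or a named reference.
REFERENCE = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z]+));")

def literal_value(literal_body):
    """Return the string denoted by the body of an XQuery string literal.

    re.sub scans left to right and never rescans replaced text, so each
    reference is expanded exactly once.
    """
    def expand(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        return PREDEFINED[name]  # KeyError for names outside the five
    return REFERENCE.sub(expand, literal_body)

def xqx_value(literal_body):
    """Hypothetical helper: XML element content for xqx:value."""
    # escape() replaces &, < and > in the *value*, not in the XQuery source text.
    return escape(literal_value(literal_body))

print(xqx_value("&amp;"))        # &amp;
print(xqx_value("&amp;#1234;"))  # &amp;#1234;
print(xqx_value("&#x7d;"))       # }
```

Escaping the XQuery source text instead of the value is what yields &amp;amp; for a one-character ampersand string.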
Comment 13 Maxim Orgiyan 2006-09-28 17:27:29 UTC
(In reply to comment #12)
> >  since it's not clear what the behaviour should be.
> 
> I do not see any ambiguity in the current spec, what ambiguity do you see?
> 
> string literals in XQueryX should just encode the string in XML not the XML
> encoding of  the XQuery encoding of the string. so the string of length 1
> consisting of an ampersand is encoded as &amp; not as &amp;amp;
> The implementation in xqueryx.xsl also requires this, it is easy to check by 
> running xqueryx.xsl on any of the xqueryx files listed in comment #9 that the
> resulting XQuery is not equivalent to the original XQuery file in the test
> suite.
> 
> David
> 

Well, it seems that the resolution of entity references and character
references is not clearly defined...

XQueryX spec 3.1.1 says:

"Each predefined entity reference is replaced by the character it represents when the string literal is processed."

For example, take surrogates01.xq that you mention as one of the
files having the problem.

The XQueryX we currently generate for this includes
the escaped & character:

                      <xqx:stringConstantExpr>
                        <xqx:value>abc&amp;#x1D156;def</xqx:value>
                      </xqx:stringConstantExpr>

The stylesheet converts the XQueryX to:

string-length("abc&amp;#x1D156;def")

Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def") equivalent? That would depend on the rules for resolving entity refs and character refs.
At least one XQuery processor I tried resolves these two strings to the same string value,
and returns the same answer for both queries: 7.

So what are the exact rules for the resolution of entity refs and character
refs?
Comment 14 Maxim Orgiyan 2006-09-28 17:43:51 UTC
Also, you say "string literals in XQueryX should just encode the string in XML not the XML encoding of  the XQuery encoding of the string. "

So, for example, string-length("<"), should in fact be encoded as:

<xqx:stringConstantExpr>
   <xqx:value>&lt;</xqx:value>
</xqx:stringConstantExpr>

So which characters should be replaced by entity refs when producing XQueryX? 
Seems that the stylesheet assumes ",',<,> should be replaced but not &?


Comment 15 David Carlisle 2006-09-28 21:47:02 UTC
> Well, it seems that the resolution of entity references and character
> references is not clearly defined...

I honestly am struggling to see any ambiguity in the current specification
and so I'm not sure I can really answer your questions in a helpful way but
I'll try.

> "Each predefined entity reference is replaced by the character it represents
> when the string literal is processed."

There are five predefined entities, including amp, and that means that &amp; gets replaced by an ampersand character. The whole point of writing &amp; rather than & is to _stop_ it being used as markup, so it is absolutely clear that in XQuery, as in XML,
&amp;#1234; is the 7 characters & # 1 2 3 4 ; and not a reference to the character
with codepoint 1234. It would be absolutely bizarre if XQuery were defined otherwise, as it would be using XML syntax with completely different semantics.

> Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def")?
> at least one XQuery processor I tried resolves these two strings to the same
> string value,
Bugs happen. Report it as a bug to that system's maintainers; that is unquestionably a bug.

> So which characters should be replaced by entity refs when producing XQueryX? 
> Seems that the stylesheet assumes ",',<,> should be replaced but not &?
As always when writing XML (or XML-like) syntax, you just need to quote those characters that have special significance in XML, which includes &, and this is what the stylesheet does. See the template name="quote", which contains:
                          <xsl:with-param name="toBeReplaced">&amp;</xsl:with-param>
                          <xsl:with-param name="replacement">&amp;amp;</xsl:with-param>


David
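
A rough Python sketch of the quoting step described above, under stated assumptions. The function name to_xquery_string_literal is hypothetical and this is not the stylesheet's actual code. Only the replacement of & by &amp; is confirmed by the template excerpt; the handling of < follows the stylesheet output quoted elsewhere in this bug, and the handling of the double quote is an assumption.

```python
def to_xquery_string_literal(value):
    """Hypothetical stand-in for the stylesheet's "quote" step."""
    # The ampersand must be replaced first, otherwise the ampersands
    # introduced by the later replacements would be escaped again.
    quoted = value.replace("&", "&amp;")
    quoted = quoted.replace("<", "&lt;")
    quoted = quoted.replace('"', "&quot;")  # assumption, not confirmed by the excerpt
    return '"' + quoted + '"'

# Value carried by the test suite's Literals061.xqx: the 7 characters &#8364;
print(to_xquery_string_literal("&#8364;"))  # "&amp;#8364;"

# Value carried by a correct encoding: the euro sign itself
print(to_xquery_string_literal("\u20ac"))
```

Because the quoting is correct for any input string, the non-equivalent query comes from the value stored in the XQueryX file, not from the stylesheet.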
Comment 16 Maxim Orgiyan 2006-09-29 00:41:05 UTC
> > "Each predefined entity reference is replaced by the character it represents
> > when the string literal is processed."
> 
> There are five predefined entities, including amp and that means that &amp;
> gets replaced by an ampersand character. The whole point of writing &amp;
> rather than & is to _stop_ it being used as markup so it is absolutely clear
> that in XQuery as in XML 
> &amp;#1234;is the 7 characters 7 # 1 2 3 4 ; not a reference to the character
> with codepoint 1234. It would be absolutely bizare if Xquery were defined
> otherwise, as it would be using XML syntax with completely different semantics.
> 
> > Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def")?
> > at least one XQuery processor I tried resolves these two strings to the same
> > string value,
> bugs happen, report it as a bug to that system's maintainers, That is
> unquestionably a bug.

>>> Ok. Well, I would interpret this in the
>>> same way. My point, however, is that
>>> this is not stated anywhere in the XQuery spec. That is
>>> what I mean by "ambiguity".
>>>
>>> I am validating with Jim whether this is the intended meaning.
Comment 17 Michael Kay 2006-09-29 07:37:31 UTC
I'm puzzled that you don't find the XQuery spec clear on the subject of how predefined entity references are handled. It seems eminently clear to me. 

There are three places they can occur: in string literals, in attribute content, and in element content.

For string literals, section 3.1.1 spells out the rules and seems entirely clear.

For attribute content, rule 1 says "Attribute value normalization is then applied to normalize whitespace and expand character references and predefined entity references. " This spells out the rules by reference to the XML specification (which describes the interaction of entity expansion and whitespace normalization): the rules are complicated, but I think they are unambiguous.

For element content, section 3.7.1.3 rule 1b gives the rules by reference to the rules in 3.1.1 for string literals.

So what exactly is it that you think isn't stated clearly in the XQuery specification?

(You alleged that one implementation did double-expansion of entity references, turning &amp;&lt; into a less-than-sign. I think it's quite clear in the XQuery spec that processors mustn't do that. If you're in element content, for example, no possible reading of section 3.7.1.3 would allow that interpretation. In any case, as David Carlisle points out, common sense should give you the same answer: if an ampersand written as &amp; were treated in the same way as one written as &, why would the specification bother to provide a way of escaping the character in the first place?)

Michael Kay
Comment 18 Maxim Orgiyan 2006-09-29 10:02:15 UTC
(In reply to comment #17)

Michael,

Section 3.7.1.3 states:

"Predefined entity references and character references are expanded into their referenced strings, as described in 3.1.1 Literals."

And section 3.1.1 states:

"Each predefined entity reference is replaced by the character it represents when the string literal is processed."

It doesn't say anything about how character refs are processed (as
far as I can see), but it does give some examples of string values with character refs.

Given these descriptions, one possible algorithm, for example, is to process a string
by first applying all entity ref replacements, and then all the character
reference replacements on the resulting string. That is what at least
one processor I tried appears to do.

But yes, I agree with the common-sense interpretation David gives.

> I'm puzzled that you don't find the XQuery spec clear on the subject of how
> predefined entity references are handled. It seems eminently clear to me. 
> 
> There are three places they can occur: in string literals, in attribute
> content, and in element content.
> 
> For string literals, section 3.1.1 spells out the rules and seems entirely
> clear.
> 
> For attribute content, rule 1 says "Attribute value normalization is then
> applied to normalize whitespace and expand character references and predefined
> entity references. " This spells out the rules by reference to the XML
> specification (which describes the interaction of entity expansion and
> whitespace normalization): the rules are complicated, but I think they are
> unambiguous.
> 
> For element content, section 3.7.1.3 rule 1b gives the rules by reference to
> the rules in 3.1.1 for string literals.
> 
> So what exactly is it that you think isn't stated clearly in the XQuery
> specification?
> 
> (You alleged that one implementation did double-expansion of entity references,
> turning &amp;&lt; into a less-than-sign. I think it's quite clear in the XQuery
> spec that processors mustn't do that. If you're in element content, for
> example, no possible reading of section 3.7.1.3 would allow that
> interpretation. In any case, as David Carlisle points out, common sense should
> give you the same answer: if an ampersand written as &amp; were treated in the
> same way as one written as &, why would the specification bother to provide a
> way of escaping the character in the first place?)
> 
> Michael Kay
> 
Comment 19 Maxim Orgiyan 2006-09-29 10:14:19 UTC
(In reply to comment #8)
> I'm not sure why this bug has remained open for 9 months I wouldn't have
> thought it that hard to fix the xqueryx files in the distribution (running
> sed -e 's/&amp;/\&/g'
> over them all would do the job)
> 
> However in case fixing the generator being used does prove difficult Perhaps I
> should repeat the standing offer that the xq2xml distribution contains a set of
> xqueryx versions of the test files, and is all distributed under the w3c
> software licence so you are welcome to use any of them, I was planning to wait
> until the XQTS 1.0 release before updating but have just updated today so there
> are currently 15038 xqueryx files available in
> http://monet.nag.co.uk/xq2xml/xqxtest-20060811.zip
> which include files that fix this problem as well as files that fix bug #3521
> and fill in the gaps where the current test suite has no xqueryx file at all
> for some reason. If you want to drop some of these files into a 1.0 Test suite
> release feel free.
> 
> David
> 


David, it looks like in certain cases the XQueryX implementation
should escape &amp;.

For example:

<!--<?&-&lt;&#x20;><![CDATA[x]]>-->

Is currently correctly encoded as:

 <xqx:value>&lt;?&amp;-&lt;&amp;#x20;&gt;&lt;![CDATA[x]]&gt;</xqx:value>

So, it doesn't seem to be a blind replace of &amp; with & as suggested above
(I am, btw, actually fixing the generator rather than modifying
the queries)... right?
Comment 20 David Carlisle 2006-09-29 10:36:36 UTC
(In reply to comment #18)

> Given these descriptions, one possible algorithm, for example, is to process a
> string
> by first applying all entity ref replacements, and then all the character
> reference replacements on the resulting string. Which is what at least
> one processors I tried appears to do.


The spec does not explicitly say that this algorithm is not used, but it cannot list all possible unused algorithms. If such double parsing were to be used, the string "&amp;" would, like the string "&", be a syntax error (an unterminated reference), element content of &lt;a/&gt; would generate an element node, etc.
There is no way that the spec can be interpreted in that way.

David
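
The consequences of double parsing can be illustrated with a short sketch, assuming Python's xml.etree.ElementTree as a stand-in: an XML parser expands references in element content exactly once, which is the behaviour argued for here. The helper name parse_content is hypothetical. Feeding the result of one pass back into the parser simulates the two-pass algorithm.

```python
import xml.etree.ElementTree as ET

def parse_content(text):
    """Hypothetical helper: parse text as the content of a wrapper element."""
    return ET.fromstring("<s>" + text + "</s>")

# One pass: the escaped ampersand becomes a plain ampersand and nothing more.
once = parse_content("&amp;#x20;").text
print(repr(once))                      # '&#x20;'

# A second pass turns the remaining text into a space character.
print(repr(parse_content(once).text))  # ' '

# Under two passes, "&amp;" behaves like a bare "&": not well-formed.
try:
    parse_content(parse_content("&amp;").text)
except ET.ParseError as error:
    print("second pass fails:", error)

# Under two passes, the escaped tag becomes a real child element.
reparsed = parse_content(parse_content("&lt;a/&gt;").text)
print([child.tag for child in reparsed])  # ['a']
```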
Comment 21 David Carlisle 2006-09-29 10:44:47 UTC
(In reply to comment #19)

> David, it looks like in certain cases the XQueryX implementation
> should escape &amp;.
> ..
> So, it doesn't seem to be a blind replace of &amp with & as suggested above

I'd assumed that your converter was always double escaping, so removing one level would fix it (it certainly fixes most). If your converter is sometimes double escaping and sometimes not, then clearly you only need to remove the double escaping at those places where it was added.


> 
> For example:
> 
> <!--<?&-&lt;&#x20;><![CDATA[x]]>-->
> 
> Is currently correctly encoded as:
> 
>  <xqx:value>&lt;?&amp;-&lt;&amp;#x20;&gt;&lt;![CDATA[x]]&gt;</xqx:value>
> 

That encoding is incorrect.

Given the XQuery

<!--<?&-&lt;&#x20;><![CDATA[x]]>-->

xq2xqx produces


<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX">
   <xqx:mainModule>
      <xqx:queryBody>
         <xqx:computedCommentConstructor>
            <xqx:argExpr>
               <xqx:stringConstantExpr>
                  <xqx:value>&lt;?&amp;-&amp;lt;&amp;#x20;&gt;&lt;![CDATA[x]]&gt;</xqx:value>
               </xqx:stringConstantExpr>
            </xqx:argExpr>
         </xqx:computedCommentConstructor>
      </xqx:queryBody>
   </xqx:mainModule>
</xqx:module>

which when processed with the standard stylesheet produces

 comment{"&lt;?&amp;-&amp;lt;&amp;#x20;>&lt;![CDATA[x]]>"}

which is an equivalent query, both produce the XML

<!--<?&-&lt;&#x20;><![CDATA[x]]>-->

If, however, the xqx:value element is replaced by the element that you suggested, then the standard XQueryX stylesheet produces

 comment{"&lt;?&amp;-&lt;&amp;#x20;>&lt;![CDATA[x]]>"}

which is not an equivalent query. When executed, it produces
<!--<?&-<&#x20;><![CDATA[x]]>-->


which is an entirely different XML comment.



David
Comment 22 David Carlisle 2006-09-29 11:46:20 UTC
(In reply to comment #21)

> > For example:
> > 
> > <!--<?&-&lt;&#x20;><![CDATA[x]]>-->

Note also that this thread is about character and entity references, but that example does not have any character or entity references (just as it does not have any PI constructor or CDATA section) so is not really an example of anything discussed here. 
Comment 23 Maxim Orgiyan 2006-09-29 17:27:57 UTC
(In reply to comment #21)
> (In reply to comment #19)
> 
> > David, it looks like in certain cases the XQueryX implementation
> > should escape &amp;.
> > ..
> > So, it doesn't seem to be a blind replace of &amp with & as suggested above
> 
> I'd assumed that your convertor was always double escaping and so removing one
> level would fix it (it certainly fixes most) if your convertor is sometimes
> double escaping and sometimes not, them clearly you only need to remove the
> double escaping at those places where it was added.
> 
>>> I see. Well, the converter is not "sometimes double escaping and
>>> sometimes not".
>>> It is always replacing "&" with "&amp;" (which, I agree,
>>> is likely not correct
>>> given the common-sense interpretation of how entity/character refs
>>> should be resolved).
>>>
>>> As for the query, that's a bug: I copied the text
>>> from the wrong query yesterday. Obviously the "&" before "lt;" is
>>> "&amp;" in the current encoding, because all "&" are replaced with "&amp;":
>>>
>>> &lt;?&amp;-&amp;lt;&amp;#x20;&gt;&lt;![CDATA[x]]&gt;
Comment 24 Maxim Orgiyan 2006-09-29 17:34:49 UTC
(In reply to comment #20)
> (In reply to comment #18)
> 
> > Given these descriptions, one possible algorithm, for example, is to process a
> > string
> > by first applying all entity ref replacements, and then all the character
> > reference replacements on the resulting string. Which is what at least
> > one processors I tried appears to do.
> 
> 
> The spec does not explictly say that algorithm is not used, but it can not list
> all possible non-used algorithm. if such double parsing were to be used the
> string "&amp;" would, like the string "&" be a syntax error (unterminated
> reference, element content of &lt;a/&gt; would generate an element node, etc.
> There is no way that the spec can be interepreted in that way.
> 
> David
>>> Not necessarily. I think you're assuming the
>>> XQuery rules would apply after the second pass of such an algorithm,
>>> but that doesn't have to be the case.
>>>
>>> I am not asking XQuery to list all possible unused algorithms.
>>> It could, very easily and precisely, however, give the *one*
>>> algorithm to be used, which would eliminate potential ambiguities.
> 
Comment 25 Maxim Orgiyan 2006-09-29 17:51:42 UTC
(In reply to comment #22)
> (In reply to comment #21)
> 
> > > For example:
> > > 
> > > <!--<?&-&lt;&#x20;><![CDATA[x]]>-->
> 
> Note also that this thread is about character and entity references, but that
> example does not have any character or entity references (just as it does not
> have any PI constructor or CDATA section) so is not really an example of
> anything discussed here. 
> 

I think this is definitely relevant. This is about encoding "&"
correctly, with the entity reference &amp;.

Moreover, I was commenting on the suggestion to 
" fix the xqueryx files in the distribution (running
sed -e 's/&amp;/\&/g'
over them all would do the job)", which would not be correct
given what the current generator outputs.

In any case, thanks for the feedback... I am making a tentative
modification to the generator
based on the common-sense interpretation of entity/character
ref processing, but will await Jim's confirmation before committing.
Comment 26 David Carlisle 2006-10-01 20:58:48 UTC
> In any case, thanks for the feedback... I am making a tentative
> modification to the generator
> based on the common-sense interpretation of entity/character
> ref processing, but will await Jim's confirmation before committing.
> 


I trust that this will be done _before_ any update to XQTS is made.
It would be unreasonable to ask any implementors to do any CR testing of
an XQueryX implementation before this is fixed.