This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3776 - fn:codepoints-to-string should allow XML 1.1 characters
Summary: fn:codepoints-to-string should allow XML 1.1 characters
Status: CLOSED FIXED
Alias: None
Product: XML Query Test Suite
Classification: Unclassified
Component: XML Query Test Suite
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Frans Englich
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-29 06:49 UTC by Per Bothner
Modified: 2007-01-16 01:42 UTC
CC List: 0 users

See Also:



Description Per Bothner 2006-09-29 06:49:04 UTC
Some of the K-CodepointToStringFunc-NN tests are very strict about which characters are allowed: they test for characters that are not XML 1.0 characters. I think it is reasonable for an implementation to allow XML 1.1 characters, but K-CodepointToStringFunc-11, for example, does not allow codepoints-to-string(11).

(More generally, I think that if there is a use case for codepoints-to-string at all, then there is probably a use case for it allowing *all* code points, including 0.)
Comment 1 Michael Kay 2006-09-29 07:44:21 UTC
>Perhaps, I think if there is a usecase for codepoints-to-string at all then
probably there is a usecase for it allowing *all* code points, including 0.

I think I argued at one time that 0 should be allowed in the string data type even though it is not allowed in text and attribute nodes. But apparently there are some people still coding in an old 1960s language called C which has trouble with the concept of 0 as a character, and they objected.
Comment 2 Frans Englich 2006-09-29 08:54:06 UTC
Obviously XML 1.1 characters should be allowed; I'll look into this.

It's a pity the XQTS isn't fine-grained enough to test all features: it's not possible to write separate XML 1.0 and XML 1.1 tests. (I *think* the XSLT catalog is better in this regard.)

Frans
Comment 3 David Carlisle 2006-09-29 09:42:02 UTC
> Obviously XML 1.1 characters should be allowed

Actually that is not obvious.
The F&O spec says:

If any of the code points in $arg is not a legal XML character, an error is raised

The phrase "legal XML character" isn't, as far as I know, a defined term, and in the case of XML 1.1 there is some ambiguity as to whether it should include #x1 (which is not allowed _as a character_ in XML 1.1, only as a character reference).
F&O should probably refer instead to the characters allowed by the XML Infoset (or xsd:string), which in the case of XML 1.1 would then include the control characters.


David
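To make the character-versus-character-reference distinction concrete: XML 1.1's Char production is [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], but its RestrictedChar subset ([#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]) may only appear in a document as character references. The sketch below is a hypothetical helper, `escape_xml11_text`, not part of any XQuery processor or the F&O spec, that writes element content that way, so a string holding #x1 can still be serialized while #x0 cannot.

```python
def is_xml11_char(cp: int) -> bool:
    """True if cp matches the XML 1.1 Char production."""
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def is_xml11_restricted(cp: int) -> bool:
    """True if cp is in XML 1.1's RestrictedChar production."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def escape_xml11_text(s: str) -> str:
    """Serialize s as XML 1.1 element content (hypothetical helper).

    - '&', '<', '>' become entity references.
    - RestrictedChar code points become hex character references.
    - #xD, #x85 and #x2028 are also written as references, because an
      XML 1.1 parser normalizes them to #xA when they appear literally.
    - Code points outside Char (e.g. #x0) raise ValueError.
    """
    out = []
    for ch in s:
        cp = ord(ch)
        if not is_xml11_char(cp):
            raise ValueError(f"code point {cp:#x} cannot be represented in XML 1.1")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif is_xml11_restricted(cp) or cp in (0xD, 0x85, 0x2028):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `escape_xml11_text("a\x01b")` yields `a&#x1;b`, while `escape_xml11_text("\x00")` raises, which is exactly the gap between "allowed in the data" and "allowed as a literal character" discussed above.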
Comment 4 Frans Englich 2006-09-29 10:09:02 UTC
I meant that if XML 1.0 is in use, XML 1.1-specific characters are considered invalid, while if XML 1.1 is in use, the full range of XML 1.0 and 1.1 characters is valid. Since the XQTS doesn't distinguish between XML 1.0 and 1.1, this effectively means that every test must work for both 1.0 and 1.1 (unfortunately). The best one can do is to have tests with code points that are invalid in XML 1.0 but valid in 1.1, and accept both a result and an error as correct outcomes.

Of course, feel free to open an editorial bug on F&O. I agree it isn't as clear here as it usually is in areas specific to the XML version.
Comment 5 Per Bothner 2006-09-29 16:34:52 UTC
(In reply to comment #1)
> I think I argued at one time that 0 should be allowed in the string data type
> even though it is not allowed in text and attribute nodes. But apparently there
> are some people still coding in an old 1960s language called C which has
> trouble with the concept of 0 as a character, and they objected.

For the same reason, we should remove the xs:decimal type and only support 32-bit integer arithmetic. Frankly, I have a hard time understanding this concern: 0 is a perfectly valid character in C, and in Java, which is commonly implemented using C. This is only an issue if you want to represent XQuery strings as zero-terminated C strings, which doesn't seem like a great idea.
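The point about zero-terminated strings can be illustrated outside XQuery: a NUL only causes trouble when a string is read with C's "stop at the first 0 byte" convention. The sketch below uses Python's standard `ctypes` module purely as an illustration; it makes no claim about how any particular XQuery processor stores strings.

```python
import ctypes

# A length-counted string type (Python's str here; Java's String behaves
# the same way) holds U+0000 like any other character.
s = "a\x00b"
print(len(s))

# Copy the same UTF-8 bytes into a C char buffer. create_string_buffer
# appends one extra NUL after the initializer.
buf = ctypes.create_string_buffer(s.encode("utf-8"))

# .raw exposes the whole buffer, embedded NUL included.
print(buf.raw)

# .value reads the buffer as a zero-terminated C string, so it stops at
# the embedded NUL and silently drops the trailing "b".
print(buf.value)
```

Both views share the same memory; only the zero-terminated reading loses data, which is the representation choice the comment above argues against.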

Comment 6 Per Bothner 2006-09-29 16:51:09 UTC
(In reply to comment #4)
> I meant if XML 1.0 is in use, XML 1.1-specific character are considered
> invalid, while if XML 1.1 is in use, the full range of XML 1.0+1.1 characters
> are valid.

Why? That assumes XQuery is only useful for data that comes from or is destined for XML text files, and that there is no use for any kind of string that can't be written portably into an XML file. But XQuery is more generally useful for many kinds of data that have the nested structure of XML, not just its physical structure.

The infoset specification says about characters:
  [character code] The ISO 10646 character code (in the range 0 to #x10FFFF,
  though not every value in this range is a legal XML character code) of the
  character.

By reference, the XQuery Data Model would also seem to allow all character codes in that same range. After all, the Data Model explicitly allows other values that cannot be represented as valid XML.
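The three positions in this thread (XML 1.0 Char only, XML 1.1 Char, or every Infoset character code from 0 to #x10FFFF) can be modelled as interchangeable validity checks. The sketch below is a hypothetical Python model of `fn:codepoints-to-string` with a `policy` parameter; the parameter and policy names are illustrative assumptions, not anything F&O or the XQTS defines. FOCH0001 is the error code F&O assigns to an invalid code point here.

```python
XML10 = "xml1.0"
XML11 = "xml1.1"
INFOSET = "infoset"


def allowed(cp: int, policy: str) -> bool:
    """Return True if code point cp is accepted under the given policy."""
    if policy == XML10:
        # XML 1.0 Char production
        return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
    if policy == XML11:
        # XML 1.1 Char production (includes the RestrictedChar controls)
        return (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF)
    if policy == INFOSET:
        # The Infoset [character code] range quoted above, taken literally
        return 0 <= cp <= 0x10FFFF
    raise ValueError(f"unknown policy: {policy}")


def codepoints_to_string(codes, policy=XML10):
    """Hypothetical model of fn:codepoints-to-string under a chosen policy."""
    for cp in codes:
        if not allowed(cp, policy):
            raise ValueError(f"FOCH0001: code point {cp:#x} not valid under {policy}")
    return "".join(chr(cp) for cp in codes)
```

Under this model, `codepoints_to_string([11], XML11)` (the call from the original report) returns `"\x0b"` while the XML 1.0 policy raises FOCH0001, and only the Infoset policy accepts code point 0.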
Comment 7 Frans Englich 2007-01-12 18:52:34 UTC
An attempted fix has been committed to CVS, and should be part of XQTS_current.zip. Feel free to verify that the fix is acceptable, and if so, change status to CLOSED. If the attempted fix is not acceptable, reopen this report.

If no opinion about this resolution is expressed within two weeks, it will be closed.

Fixes for other reports were committed along with the fix for this report. A significant number of new tests were also added to cover missing areas and changes in the specifications.
Comment 8 Per Bothner 2007-01-13 22:13:02 UTC
Almost.  However, there is a typo in K-CodepointToStringFunc-8.txt:
  &#x9,
First, there is a comma rather than a semicolon.
Second, the 9 should be an 8.
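Both corrections matter. Without the semicolon the text is not a character reference at all, and #x9 (tab) is a legal XML 1.0 character, so, presumably, it cannot exercise the XML 1.0/1.1 difference this bug is about, whereas #x8 is outside XML 1.0's Char production but inside XML 1.1's. A small hypothetical Python check (helper names are illustrative; the reference strings are the ones quoted in this comment) makes that visible:

```python
import re

# A hexadecimal character reference: "&#x", hex digits, then a mandatory ";".
HEX_CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def parse_hex_char_ref(ref: str):
    """Return the code point of a hex character reference, or None if malformed."""
    m = HEX_CHAR_REF.fullmatch(ref)
    return int(m.group(1), 16) if m else None


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char production."""
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


for ref in ("&#x9,", "&#x9;", "&#x8;"):
    cp = parse_hex_char_ref(ref)
    if cp is None:
        print(f"{ref!r}: not a character reference")
    else:
        print(f"{ref!r}: U+{cp:04X}, XML 1.0: {is_xml10_char(cp)}, XML 1.1: {is_xml11_char(cp)}")
```

Only the corrected `&#x8;` is both well-formed and a character that XML 1.0 rejects but XML 1.1 accepts.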
Comment 9 Frans Englich 2007-01-15 16:46:06 UTC
An attempted fix has been committed to CVS, and should be part of
XQTS_current.zip. Feel free to verify that the fix is acceptable, and if so,
change status to CLOSED. If the attempted fix is not acceptable, reopen this
report.

If no opinion about this resolution is expressed within two weeks, it will be
closed.
Comment 10 Per Bothner 2007-01-16 01:42:29 UTC
Thanks. It works for me.