This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 20547 - [QT3TS] fn-unparsed-text-lines-040
Summary: [QT3TS] fn-unparsed-text-lines-040
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Tim Mills
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-02 10:36 UTC by Michael Kay
Modified: 2013-01-04 09:10 UTC (History)

Description Michael Kay 2013-01-02 10:36:28 UTC
The input file for this test is in UTF-16 encoding, and the expected test results can only be achieved if the processor is able to infer that the encoding is UTF-16. There is nothing in the spec that guarantees the processor will be able to make this inference. If the processor doesn't spot that the file is in UTF-16, it will assume UTF-8, and fail with a decoding error.
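The failure mode described above can be reproduced outside any XQuery processor. The following is a minimal Python sketch, assuming a processor that infers nothing and falls back to UTF-8; the sample text and the helper name `decode_with_utf8_fallback` are invented for illustration and are not taken from the test suite.

```python
# Illustrative only: the sample text is made up, not the test suite's input file.
sample = "line one\nline two\n"

# Python's "utf-16" codec prepends a byte order mark (BOM) when encoding.
utf16_bytes = sample.encode("utf-16")


def decode_with_utf8_fallback(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, as a processor would when it
    has not inferred any other encoding."""
    return data.decode("utf-8")


try:
    decode_with_utf8_fallback(utf16_bytes)
    outcome = "decoded"
except UnicodeDecodeError:
    # The BOM bytes 0xFF and 0xFE can never appear in valid UTF-8.
    outcome = "decoding error"

print(outcome)
```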
Comment 1 Michael Kay 2013-01-02 10:43:12 UTC
Also affects test cases -047, -051, -052 in the same test set.

(Note that while the processor MAY take account of a BOM in inferring the encoding (under the "implementation-defined heuristics" provision), it is not required to do so.)
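One possible BOM-based heuristic is sketched below in Python. This is an assumption about what an implementation-defined heuristic might look like, not something the spec prescribes; `infer_encoding` and `read_lines` are hypothetical names, and `splitlines()` is only an approximation of how a processor splits text into lines.

```python
import codecs
from typing import List


def infer_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical heuristic: guess the encoding from a leading BOM.

    UTF-32 is deliberately ignored to keep the sketch short.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    return fallback  # nothing recognised: fall back to UTF-8


def read_lines(data: bytes) -> List[str]:
    """Decode with the inferred encoding and split into lines."""
    return data.decode(infer_encoding(data)).splitlines()
```

A processor that skips this step is still conformant, which is why the test cannot rely on it.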
Comment 2 O'Neil Delpratt 2013-01-02 12:10:06 UTC
We could get around this by including the encoding as an argument in the function call. The spec allows this as an alternative. I can make the change if that is OK with Mike and Tim.
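The effect of naming the encoding explicitly can be sketched in Python as follows. This is an analogy only: in the test itself the change would be to pass the encoding as the second argument of `fn:unparsed-text-lines`. The helper name and sample data are made up.

```python
from typing import List


def read_lines_with_encoding(data: bytes, encoding: str) -> List[str]:
    """Hypothetical helper: decode with a caller-supplied encoding,
    so no inference is needed."""
    return data.decode(encoding).splitlines()


# Made-up sample without a BOM, so no heuristic could detect the encoding.
bom_less = "alpha\nbeta\n".encode("utf-16-le")

lines = read_lines_with_encoding(bom_less, "utf-16-le")
print(lines)
```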
Comment 3 Michael Kay 2013-01-02 20:13:36 UTC
Yes, we could include the encoding as an explicit argument. The only question is whether that would spoil the intent of the test. The alternative might be to have two alternate results, one for processors that correctly infer the encoding using implementation-defined heuristics, and one for processors that fall back to UTF-8. (I'm not sure if the spec allows for the possibility that implementation-defined heuristics will be invoked and will give the wrong answer...)
Comment 4 Tim Mills 2013-01-03 07:21:49 UTC
I suspect that I should have included encoding attributes for each of the resource elements. Would that make the reported problem go away?
Comment 5 Michael Kay 2013-01-03 09:18:46 UTC
Adding encoding and/or media-type attributes might provide a solution. It's not clear however that implementations are obliged to provide an API that allows the application (in this case the test driver) to supply the media-type or encoding of a resource in this way, so it's not the whole answer. I would be inclined to add this information, and still add an alternative result for implementations that use the fallback encoding.
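The "alternative result" idea can be sketched as a check that accepts either outcome. The Python below is a rough illustration under stated assumptions: the `Lines` and `Error` types, the `"decoding error"` label, and the functions `run` and `passes` are all hypothetical and do not reflect the test suite's actual catalog format or error codes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Lines:
    values: List[str]


@dataclass
class Error:
    reason: str


Outcome = Union[Lines, Error]


def run(data: bytes, encoding: str) -> Outcome:
    """Stand-in for a processor that ends up using `encoding`."""
    try:
        return Lines(data.decode(encoding).splitlines())
    except UnicodeDecodeError:
        return Error("decoding error")


def passes(outcome: Outcome, expected: List[str]) -> bool:
    """Accept the expected lines OR a decoding error, nothing else."""
    if isinstance(outcome, Error):
        return outcome.reason == "decoding error"
    return outcome.values == expected


data = "x\ny\n".encode("utf-16")  # made-up input with a BOM
print(passes(run(data, "utf-16"), ["x", "y"]))   # encoding inferred correctly
print(passes(run(data, "utf-8"), ["x", "y"]))    # fallback to UTF-8 fails to decode
print(passes(run(data, "latin-1"), ["x", "y"]))  # wrong guess that decodes to garbage
```

The third call shows the case raised earlier in the thread: a heuristic that guesses wrongly but still decodes would satisfy neither alternative.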
Comment 6 Tim Mills 2013-01-03 11:02:42 UTC
Agreed. I was about to quote the text from the XML spec, but of course this is plain text. There might be an argument for requiring similar behaviour for plain text as with XML regarding UTF-8 and UTF-16, but as the spec stands, what you propose is correct.
Comment 7 Tim Mills 2013-01-04 09:10:36 UTC
I've attempted a fix. Please mark as CLOSED if you agree with the resolution. Otherwise, REOPEN.