This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 3788 - Byte Order Mark in ExpectedTestResults files
Summary: Byte Order Mark in ExpectedTestResults files
Status: RESOLVED FIXED
Alias: None
Product: XML Query Test Suite
Classification: Unclassified
Component: XML Query Test Suite
Version: unspecified
Hardware: PC Linux
Importance: P2 minor
Target Milestone: ---
Assignee: Andrew Eisenberg
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-03 17:08 UTC by Per Bothner
Modified: 2006-10-05 19:15 UTC
CC List: 0 users

See Also:


Attachments

Description Per Bothner 2006-10-03 17:08:41 UTC
A few files in ExpectedTestResults start with a Byte Order Mark (BOM), U+FEFF.
For example: ExpectedTestResults/Expressions/FLWORExpr/ForExprType/ForExprType056.xml
I'm no Unicode expert, but this seems wrong. A Byte Order Mark only makes sense in UTF-16, where it distinguishes big-endian from little-endian. It does not belong in UTF-8, which is what these files are.

Of course, it's easy to just ignore the initial Byte Order Mark, but I wanted to at least put this bug report on the record. The affected files are:
ForExprType036.xml
ForExprType055.xml
ForExprType056.xml
ForExprType058.xml
ForExprType059.txt
ForExprType060.txt
ForExprType062.xml
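A minimal Python sketch of how files like those listed above could be found: it walks a directory tree and reports every file whose first three bytes are EF BB BF, the UTF-8 encoding of U+FEFF. The function name `files_with_utf8_bom` is hypothetical and is not part of the test suite's own tooling.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8


def files_with_utf8_bom(root: str) -> list[str]:
    """Return paths (relative to root, '/'-separated) of files that start with a UTF-8 BOM."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                # Only the first three bytes matter for BOM detection.
                if f.read(3) == UTF8_BOM:
                    hits.append(path.relative_to(base).as_posix())
    return hits
```

Pointed at ExpectedTestResults/Expressions/FLWORExpr/ForExprType in a checkout from before the fix, it should report the files named in the report; the sketch has not been run against the actual test suite.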
Comment 1 Michael Kay 2006-10-03 18:46:47 UTC
The XML 1.0 specification (section 4.3.3) states that entities encoded in UTF-8 may begin with a byte order mark and that XML parsers must handle this. However, this was a late change to the spec and many older parsers don't like it. I'd suggest you find a more recent parser.
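To illustrate the rule cited in this comment, here is a small Python sketch, assuming the standard library's `xml.etree.ElementTree` (which is built on Expat): a UTF-8 document prefixed with a BOM parses as if the BOM were absent, and the `utf-8-sig` codec applies the same rule when decoding plain text.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8

# A UTF-8 document entity that begins with a BOM (allowed by XML 1.0, section 4.3.3).
doc = UTF8_BOM + b"<result>ok</result>"

# Parsing the raw bytes lets Expat's encoding detection consume the BOM.
root = ET.fromstring(doc)
print(root.tag, root.text)  # result ok

# The "utf-8-sig" codec follows the same rule for plain text:
# it removes one leading BOM if present and otherwise decodes as UTF-8.
print(repr(doc.decode("utf-8-sig")))  # '<result>ok</result>'
print(repr(doc.decode("utf-8")[0]))   # '\ufeff'
```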
Comment 2 Per Bothner 2006-10-03 19:56:13 UTC
(In reply to comment #1)
> The XML 1.0 specification (section 4.3.3) states that entities encoded in UTF-8
> may begin with a byte order mark and that XML parsers must handle this.
> However, this was a late change to the spec and many older parsers don't like
> it. I'd suggest you find a more recent parser.

These output files are compared as "Fragment", so they're not even supposed to be well-formed XML documents. For example, ForExprType059.txt is a lone processing instruction. So the framework has to treat the expected file as an external parsed entity, which is another useless complication.

This is the same argument I made for bug #3756: if the testsuite can use a raw textual comparison between the actual output and the expected output, then the testsuite is more robust. Anything that requires parsing or massaging the expected output file is another opportunity for not catching a bug.

This is not a big deal, since it's trivial to work around, but I think it is a blemish (at least) in the testsuite.
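A sketch of the raw textual comparison argued for in this comment, assuming the actual and expected outputs are both available as UTF-8 bytes. The `raw_match` helper and its `ignore_bom` flag are hypothetical: with the flag off, a stray BOM in the expected file is reported as a mismatch; with it on, the helper applies the trivial workaround of dropping a single leading BOM.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8


def raw_match(actual: bytes, expected: bytes, ignore_bom: bool = False) -> bool:
    """Byte-for-byte comparison of actual output against an expected-result file.

    With ignore_bom=True, one leading UTF-8 BOM is dropped from the expected
    bytes first; otherwise the BOM counts as a difference.
    """
    if ignore_bom and expected.startswith(UTF8_BOM):
        expected = expected[len(UTF8_BOM):]
    return actual == expected


actual = b"<?pi content?>"               # illustrative serialized query result
expected = UTF8_BOM + b"<?pi content?>"  # illustrative expected file with a stray BOM

print(raw_match(actual, expected))                   # False
print(raw_match(actual, expected, ignore_bom=True))  # True
```

Keeping the strict comparison as the default means a BOM in an expected-result file surfaces as a test failure instead of being silently absorbed, which is the robustness property the comment argues for.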
Comment 3 David Carlisle 2006-10-03 22:08:54 UTC
I think Per is correct that the BOM here is incompatible with the fragment comparison description, which requires that the expected result files be inserted into element content to make them well-formed before they are parsed.

Actually, I notice that the guidelines say:

> For XML fragments, the same root node must be created for both

which isn't very clear, but given the context I think an element node is meant rather than a document node. Certainly that's what I do.


I hadn't noticed these BOMs because I read the file using XSLT's unparsed-text function, which uses XML's encoding-detection algorithm and so swallows the BOM, before adding a start and end tag and parsing the result. However, I don't think that approach is required by the guidelines as written, so the BOMs should be removed from expected results that are read using the fragment comparison.

David
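The fragment comparison described in this comment can be sketched in Python under a few assumptions: the expected result is decoded, wrapped in the start and end tag of a hypothetical `fragment` element, and parsed with `xml.etree.ElementTree` (Python 3.8+ for `insert_pis`). Decoding with plain `utf-8` leaves U+FEFF as character data inside the wrapper, while `utf-8-sig` drops it first, roughly as the comment describes for `unparsed-text`.

```python
import xml.etree.ElementTree as ET


def parse_fragment(data: bytes, encoding: str) -> ET.Element:
    """Wrap an expected-result fragment in a hypothetical <fragment> element and parse it."""
    text = data.decode(encoding)
    # insert_pis=True keeps processing instructions as child nodes (Python 3.8+),
    # so a fragment consisting of a lone PI is not silently dropped.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    parser.feed("<fragment>" + text + "</fragment>")
    return parser.close()


expected = b"\xef\xbb\xbf<?pi content?>"  # illustrative fragment file with a leading BOM

# Plain UTF-8 keeps U+FEFF, which becomes character data inside the wrapper.
naive = parse_fragment(expected, "utf-8")
print(repr(naive.text))  # '\ufeff'

# "utf-8-sig" drops one leading BOM before wrapping, so no stray text appears.
clean = parse_fragment(expected, "utf-8-sig")
print(repr(clean.text))  # None
print(clean[0].text)     # pi content
```

The extra text node in the naive case is exactly the kind of difference that would make a fragment comparison fail against an actual result that has no BOM.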
Comment 4 Frans Englich 2006-10-05 14:40:13 UTC
I lean towards Per's suggestion. If it were a query file or an XML file, it would be a different matter, but in this case it's a fairly exotic file that gets special treatment. So I would prefer to see the BOMs removed.
Comment 5 Carmelo Montanez 2006-10-05 19:15:49 UTC
All:

Removed the offending "BOM" characters.

Thanks,
Carmelo