This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 802 - double utf-8 bom yields in encoding errors in Validation result
Summary: double utf-8 bom yields in encoding errors in Validation result
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.6.7
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: Terje Bless
QA Contact: qa-dev tracking
URL: http://validator.w3.org/check?uri=htt...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-06-16 13:54 UTC by Bj
Modified: 2005-08-18 03:19 UTC

Description Bj 2004-06-16 13:54:38 UTC
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.websitedev.de%2Fmarkup%2Fvalidator%2Ftests%2Fdouble-utf-8-bom.html

is reported as invalid, and validating that result page itself,

http://validator.w3.org/check?uri=http%3A%2F%2Fvalidator.w3.org%2Fcheck%3Furi%3Dhttp%253A%252F%252Fwww.websitedev.de%252Fmarkup%252Fvalidator%252Ftests%252Fdouble-utf-8-bom.html

fails with:

  Sorry, I am unable to validate this document because on line 172 it
  contained one or more bytes that I cannot interpret as utf-8 (in other
  words, the bytes found are not valid values in the specified Character 
  Encoding). Please check both the content of the file and the character
  encoding indication.
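
For illustration only, the kind of input involved can be sketched in Python (not the validator's own Perl code). The sample markup and the helper name count_leading_boms are assumptions made for this example. It shows that a decoder which strips a single leading BOM leaves the second one in the text as a U+FEFF character.

```python
import codecs

# Hypothetical test input: two UTF-8 BOMs (EF BB BF) followed by some markup.
data = codecs.BOM_UTF8 * 2 + b"<!DOCTYPE html>"


def count_leading_boms(raw: bytes) -> int:
    """Count how many UTF-8 BOMs appear back to back at the start of raw."""
    count = 0
    while raw.startswith(codecs.BOM_UTF8, count * len(codecs.BOM_UTF8)):
        count += 1
    return count


# The "utf-8-sig" codec removes at most one leading BOM, so the second BOM
# survives as a U+FEFF character at the start of the decoded text.
text = data.decode("utf-8-sig")
```

Whether the validator at the time handled the second BOM the same way is not stated in this report.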
Comment 1 Bj 2004-09-06 21:13:12 UTC
This is probably a duplicate of the bug that deals with using the UTF-8 flag 
for truncate_line() etc.
Comment 2 Ville Skyttä 2004-09-06 21:33:14 UTC
Off-topic, but BOM-related: this might be interesting at some point:
http://search.cpan.org/dist/File-BOM/
Comment 3 Terje Bless 2004-09-11 13:15:29 UTC
One way to deal with this is to pass our complete output data through the UTF-8 checker (charlint),
possibly modified to tag illegal byte sequences and continue instead of croaking.
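
A minimal sketch of the "tag and continue" idea, written in Python rather than Perl and not based on charlint's actual code: the function name decode_tagging_errors and the choice of U+FFFD as the tag are assumptions for illustration.

```python
from typing import List, Tuple


def decode_tagging_errors(raw: bytes) -> Tuple[str, List[int]]:
    """Decode raw as UTF-8, tagging each invalid sequence with U+FFFD.

    Returns the decoded text and the byte offsets where invalid
    sequences started, instead of stopping at the first error.
    """
    pieces: List[str] = []
    bad_offsets: List[int] = []
    pos = 0
    while pos < len(raw):
        try:
            pieces.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice raw[pos:].
            pieces.append(raw[pos:pos + err.start].decode("utf-8"))
            pieces.append("\ufffd")
            bad_offsets.append(pos + err.start)
            pos += err.end
    return "".join(pieces), bad_offsets
```

Recording the offsets would let the output report where the illegal bytes were found while still producing a complete, valid UTF-8 page.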

BTW, cf. Comment #1, I can't seem to find the bug you're referring to; care to provide a bug number?
Comment 4 Bj 2004-09-11 23:08:39 UTC
See the relevant comment in the source about Perl 5.8.x --- and I am not sure 
how your suggestion would help. The problem here is that the string is 
considered a byte string and thus substr etc. do not work as expected.
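
The byte-string problem can be illustrated with a short Python sketch, where bytes and str stand in for Perl strings without and with the UTF-8 flag. The sample text and the truncate helper are hypothetical and are not the validator's truncate_line().

```python
# Hypothetical line of text containing non-ASCII characters.
line_chars = "Grüße aus Köln"
line_bytes = line_chars.encode("utf-8")


def truncate(s, limit):
    """Naive truncation by index: counts bytes for bytes, characters for str."""
    return s[:limit]


# Treated as characters, the cut falls on a character boundary.
chars_cut = truncate(line_chars, 3)

# Treated as bytes, index 3 falls inside the two-byte sequence for "ü",
# leaving a lone lead byte that is not valid UTF-8 on its own.
bytes_cut = truncate(line_bytes, 3)
```

Output built from a cut like bytes_cut would contain a byte sequence that a UTF-8 checker rejects, which matches the kind of error quoted in the description.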
Comment 5 Bj 2005-08-18 03:19:59 UTC
Fixed in HEAD.