This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 802 - double utf-8 bom results in encoding errors in Validation result
Summary: double utf-8 bom results in encoding errors in Validation result
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.6.7
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: Terje Bless
QA Contact: qa-dev tracking
Depends on:
Reported: 2004-06-16 13:54 UTC by Bj
Modified: 2005-08-18 03:19 UTC
0 users

See Also:


Description Bj 2004-06-16 13:54:38 UTC

Is invalid,

  Sorry, I am unable to validate this document because on line 172 it
  contained one or more bytes that I cannot interpret as utf-8 (in other
  words, the bytes found are not valid values in the specified Character 
  Encoding). Please check both the content of the file and the character
  encoding indication.
Comment 1 Bj 2004-09-06 21:13:12 UTC
This is probably a duplicate of the bug that deals with using the UTF-8 flag 
for truncate_line() etc.
Comment 2 Ville Skyttä 2004-09-06 21:33:14 UTC
Offtopic, but BOM related: this might be interesting sometime:
Comment 3 Terje Bless 2004-09-11 13:15:29 UTC
One way to deal with this is to pass our complete output data through the UTF-8 checker (charlint),
possibly modified to tag illegal byte sequences and continue instead of croaking.

BTW, cf. Comment #1, I can't seem to find this bug you're referring to; care to provide a bug number?
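
A minimal sketch of the "tag illegal byte sequences and continue" idea, written in Python rather than the validator's Perl and not based on charlint's actual code. The handler name `tag_illegal` and the function `check_utf8` are made up for illustration; only `codecs.register_error` and `bytes.decode` are standard library calls.

```python
import codecs

def tag_illegal_bytes(error):
    """Decode error handler: emit a visible tag for each bad byte and resume."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    tag = "".join(f"[0x{b:02X}]" for b in bad)
    # Returning (replacement, resume_position) tells the decoder to carry on
    # after the offending bytes instead of raising.
    return tag, error.end

# "tag_illegal" is a hypothetical handler name chosen for this sketch.
codecs.register_error("tag_illegal", tag_illegal_bytes)

def check_utf8(data: bytes) -> str:
    """Decode as UTF-8, tagging undecodable bytes rather than failing."""
    return data.decode("utf-8", errors="tag_illegal")

print(check_utf8(b"ok \xff\xfe end"))
```

With a handler like this the checker could report every offending byte in the output rather than stopping at the first one.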
Comment 4 Bj 2004-09-11 23:08:39 UTC
See the relevant comment in the source about Perl 5.8.x --- and I am not sure 
how your suggestion would help. The problem here is that the string is 
considered a byte string and thus substr etc. do not work as expected.
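
The byte-string pitfall can be illustrated with a small analogy in Python (the validator itself is Perl, and this is not its code). The `truncate_line` below is a hypothetical stand-in that only borrows the name from Comment 1; the sample text is also invented.

```python
def truncate_line(line, limit):
    """Hypothetical stand-in: keep at most `limit` units of `line`."""
    return line[:limit]

text = "na\u00efve caf\u00e9"      # character string; U+00EF is two bytes in UTF-8
raw = text.encode("utf-8")         # the same data treated as a byte string

# On a character string the cut falls on a character boundary.
chars = truncate_line(text, 3)

# On a byte string the same limit counts bytes, so the cut lands in the
# middle of the two-byte sequence for U+00EF and leaves a dangling lead byte.
cut = truncate_line(raw, 3)

try:
    cut.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(repr(chars), cut, decoded_ok)
```

The truncated byte string is no longer valid UTF-8, which is the kind of result that would trigger an "unable to interpret as utf-8" message like the one quoted in the description.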
Comment 5 Bj 2005-08-18 03:19:59 UTC
Fixed in HEAD.