This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
A UTF-8 BOM in XHTML breaks the CSS validator; see the "Valid CSS" link at the bottom of the URL provided.
Indeed (in fact, that's probably a known issue, but I am not sure whether someone filed a bug already). We might be able to fix this by upgrading to a more recent version of Xerces.
The Xerces version is currently 2.6.2. Can you check again?
Okay, it seems this happens when the document is served as Content-Type: text/html with no charset parameter and starts with a BOM. So this is probably the result of how the HTML parser, with its XHTML sniffing, interacts with Xerces. The Validator might be transcoding to UTF-8 before it passes the document to Xerces, and in a character stream a BOM may indeed not appear. It seems to work for application/xhtml+xml, and for text/html with a charset parameter in the HTTP header.
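To illustrate the point about a BOM in a character stream, here is a minimal Python sketch. It is not the validator's code (the validator uses Xerces, a Java parser); `decode_without_bom` is a hypothetical helper name. It only shows that a plain UTF-8 decode keeps the BOM as a leading U+FEFF character, while Python's "utf-8-sig" codec drops it before the text reaches a parser.

```python
import codecs


def decode_without_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" behaves like "utf-8" but skips a leading EF BB BF.
    return data.decode("utf-8-sig")


# A document that starts with the UTF-8 BOM (bytes EF BB BF).
raw = codecs.BOM_UTF8 + b'<?xml version="1.0"?><html/>'

# Plain UTF-8 decoding keeps the BOM as U+FEFF at the start of the text.
plain = raw.decode("utf-8")

# Decoding with "utf-8-sig" yields text that starts with the XML declaration.
clean = decode_without_bom(raw)
```

A parser that is handed `plain` sees a stray character before the XML declaration, which may be what trips it up here.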
That did it: I declared it as UTF-8 in the HTTP header and it now works.
(In reply to comment #3)
> Okay, it seems this happens if Content-Type:text/html with no charset parameter
> and a BOM. So this is probably the result of how the HTML parser with its XHTML
> sniffing interact with xerces. The Validator might be transcoding to UTF-8
> before it passes the document to Xerces and in a character stream a bom may
> indeed not appear. It seems to work for application/xhtml+xml and text/html
> with a charset parameter in the HTTP header.

The current code does this: if the MIME type has a charset parameter, use it; if not, and the MIME type is text/html, use iso-8859-1.
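The fallback rule described in this comment can be sketched as follows. This is a Python illustration under stated assumptions, not the validator's actual (Java) code; `pick_charset` is a hypothetical name, and returning None for other MIME types is an assumption, since the comment does not say what happens in that case.

```python
from typing import Optional


def pick_charset(content_type: str) -> Optional[str]:
    """Pick a charset from a Content-Type header value.

    Hypothetical helper mirroring the rule in the comment above.
    """
    parts = [p.strip() for p in content_type.split(";")]
    mime_type = parts[0].lower()

    # Rule 1: an explicit charset parameter wins.
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower()

    # Rule 2: text/html without a charset falls back to iso-8859-1.
    if mime_type == "text/html":
        return "iso-8859-1"

    # Assumption: other types are left for the parser to detect.
    return None


# The three UTF-8 BOM bytes, read as ISO-8859-1, become three visible
# characters instead of a single U+FEFF.
bom_as_latin1 = b"\xef\xbb\xbf".decode(pick_charset("text/html"))
```

Under this rule, a BOM-prefixed document served as plain text/html is read as ISO-8859-1, so the BOM turns into the characters "ï»¿" at the start of the text. That would be consistent with the report that adding a charset parameter to the HTTP header makes the problem go away.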
Changing the URL to be the test case on the i18n web site.
Switching to the TagSoup library as the HTML parser has made this issue moot. (There are still issues with BOM-toting CSS files, but I will open another bug for them.)