This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
In addition to plain charset conversion, transcoding should also rewrite the encoding declared in the XML declaration, in <meta http-equiv>, and in <meta charset> (HTML5), preferably the same way the doctype override does (it leaves the existing declaration in place, inside a comment). Without these replacements, problems arise when the transcoded content is passed to other validators that honor the encoding declared in one or more of the above. There's already a hack in place for XML::LibXML (bug 4867), and html5_validate() attempts some workarounds for the HTML5 validator, but those are not enough when a charset or doctype override is in effect. I think it would be better to do this centrally (as part of the transcoding process?) and get rid of the parser-specific hacks and workarounds.
More info:
- http://www.w3.org/mid/200902122336.17233.ville.skytta%40iki.fi
- http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/check.diff?r1=1.626&r2=1.627