Bug handling utf-16 in w3ctestlib

Hi Peter,

I discovered yesterday that there's a file name conflict between
css-backgrounds-3/border-image-slice-001.xht and
css-backgrounds-3/border-image-slice-001.htm that isn't caught by the
build system.

It turned out that border-image-slice-001.htm (which is encoded in
utf-16-le) was being parsed as windows-1252, so no elements were
recognized and the file was dropped as "not a test". HTMLSource.parse
failed to detect the file as utf-16-le because of the manual encoding
handling it performs.

HTMLBinaryInputStream.__init__ already calls detectEncoding(), which
consumes the UTF-16 BOM, so the BOM is no longer in the stream when
HTMLSource.parse calls detectEncoding() manually. That second call
therefore finds no BOM and falls back to windows-1252. Attached is a
patch that removes the manual handling and instead relies on
HTMLParser.parse to handle the encoding detection itself.
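
For anyone who wants to see the failure mode in isolation, here is a
minimal sketch that uses only the standard library. It is not the
html5lib or w3ctestlib code: detect_encoding() below is a hypothetical
stand-in that, by assumption, behaves like a BOM sniffer which skips
past the BOM once it has found one. Calling it twice on the same
stream reproduces the wrong fallback.

```python
import codecs
import io

FALLBACK = "windows-1252"  # assumed default when no BOM is found


def detect_encoding(stream):
    """Hypothetical BOM sniffer, not the real html5lib function.

    If a UTF-16 BOM is found, the stream is left positioned just
    after it; otherwise the stream is rewound to where it started.
    """
    start = stream.tell()
    head = stream.read(2)
    if head == codecs.BOM_UTF16_LE:
        return "utf-16-le"
    if head == codecs.BOM_UTF16_BE:
        return "utf-16-be"
    stream.seek(start)
    return FALLBACK


markup = "<html><body>test</body></html>"
data = codecs.BOM_UTF16_LE + markup.encode("utf-16-le")
stream = io.BytesIO(data)

first = detect_encoding(stream)   # sees the BOM and skips past it
second = detect_encoding(stream)  # BOM already consumed, falls back

# Decoding the remaining bytes with each result shows the effect.
rest = data[2:]
good = rest.decode(first)
bad = rest.decode(second)

print(first, second)
print("<html>" in good, "<html>" in bad)
```

With the second result, every character is followed by a NUL, so no
tag names match, which fits the "no elements were recognized"
symptom.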

Could you apply the patch to <https://hg.csswg.org/dev/w3ctestlib>? I
don't believe I have push access myself.

Thanks
Ms2ger

Received on Tuesday, 22 September 2015 11:14:08 UTC