Character encoding errors (detailed review of parsing algorithm) from Henri Sivonen on 2007-07-18 (public-html@w3.org from July 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 18 Jul 2007 12:02:41 +0300
To: "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <85BE6153-3AF3-49AF-8A5E-89E804A173CC@iki.fi>

(This is part of my detailed review of the parsing algorithm.)

The spec says:
> Bytes or sequences of bytes in the original byte stream that could  
> not be converted to Unicode characters must be converted to U+FFFD  
> REPLACEMENT CHARACTER code points.

The spec should probably say explicitly that such byte sequences are  
parse errors.
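
To illustrate the distinction, here is a minimal Python sketch of a decoder that both substitutes U+FFFD and records each unconvertible byte sequence, so that a caller could report it as a parse error. This is not text from the spec: the function name decode_with_parse_errors and the (offset, bytes) error list are hypothetical, and the loop assumes a stateless encoding such as UTF-8.

```python
def decode_with_parse_errors(data: bytes, encoding: str = "utf-8"):
    """Decode `data`, replacing unconvertible bytes with U+FFFD.

    Returns (text, errors), where `errors` is a list of
    (byte_offset, offending_bytes) tuples. Treating each entry as a
    parse error is the assumption this sketch illustrates.
    """
    pieces = []
    errors = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises on the first unconvertible sequence.
            pieces.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice data[pos:].
            bad_start = pos + exc.start
            bad_end = pos + exc.end
            # Keep everything that decoded cleanly before the bad bytes.
            pieces.append(data[pos:bad_start].decode(encoding))
            # Record the error, then emit one replacement character for it.
            errors.append((bad_start, data[bad_start:bad_end]))
            pieces.append("\ufffd")
            pos = bad_end
    return "".join(pieces), errors


text, errors = decode_with_parse_errors(b"caf\xc3\xa9 \xff ok")
for offset, raw in errors:
    print(f"parse error at byte {offset}: {raw!r}")
```

A conforming consumer that does not report errors could discard the second return value; the resulting text is the same either way.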

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 18 July 2007 09:02:50 UTC