ISSUE-495: note about legacy encodings such as windows-1252 is invalid

Raised by:
Addison Phillips
Opened on:

In the introduction we find this note:

"This specification defines the behavior for documents using a Unicode character encoding, such as UTF-8. Behavior for documents using legacy character encoding, such as windows-1252, may be anomolous."

Since the document processing model for Web pages and other parts of the Open Web stack is based entirely on Unicode, the character encoding used to transmit or serialize a page being searched is not germane to finding text.

How the document is converted to Unicode may matter: CharMod recommends that a "normalizing transcoder" be used. However, the specification is not about searching byte streams. It is about searching the converted Unicode character stream. There will be no anomalous search behavior unless something is very wrong with the APIs in this document. This note invites developers and implementers to question something that they really shouldn't be concerned about.
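The point can be illustrated with a minimal sketch. It is not taken from the specification: the helper `find_in_document`, the sample text, and the search string are hypothetical, chosen only to show that a search over the decoded character stream gives the same result whatever the transport encoding was, and that the Unicode normalization form produced by the transcoder is the step that can change a result.

```python
import unicodedata

# Hypothetical sample data, written with escapes so the normalization
# form is explicit: precomposed (NFC) e-acute is U+00E9.
TEXT = "caf\u00e9 r\u00e9sum\u00e9"
NEEDLE = "r\u00e9sum\u00e9"


def find_in_document(raw: bytes, encoding: str, needle: str) -> int:
    """Decode the serialized document to Unicode, then search the text.

    Illustrative helper only: the search never sees the bytes, so the
    transport encoding cannot influence the match position.
    """
    document = raw.decode(encoding)
    return document.find(needle)


# The same text serialized two ways produces different byte streams...
utf8_bytes = TEXT.encode("utf-8")
cp1252_bytes = TEXT.encode("windows-1252")

# ...but the decoded character streams, and so the search results, agree.
hit_utf8 = find_in_document(utf8_bytes, "utf-8", NEEDLE)
hit_cp1252 = find_in_document(cp1252_bytes, "windows-1252", NEEDLE)

# What can matter is the normalization form of the decoded text: a
# decomposed (NFD) document does not contain the precomposed needle
# until it is normalized back to NFC.
decomposed = unicodedata.normalize("NFD", TEXT)
hit_decomposed = decomposed.find(NEEDLE)
hit_renormalized = unicodedata.normalize("NFC", decomposed).find(NEEDLE)
```

In this sketch both encodings yield the same match position, while the decomposed text fails to match until it is renormalized, which is the concern that the CharMod recommendation about a normalizing transcoder addresses.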

(editorial nit: "anomalous" is misspelled as "anomolous")
Related Action Items:
No related actions
Related emails:
  1. I18N-ISSUE-495: note about windows-1252 is invalid [find-text] (from on 2015-10-15)

Related notes:

Richard Ishida, 17 Mar 2016, 12:29:25

