ISSUE-495: note about legacy encodings such as windows-1252 is invalid
- State:
- CLOSED
- Product:
- find-text
- Raised by:
- Addison Phillips
- Opened on:
- 2015-10-15
- Description:
- http://www.w3.org/TR/2015/WD-findtext-20151015/#introduction
In the introduction we find this note:
--
This specification defines the behavior for documents using a Unicode character encoding, such as UTF-8. Behavior for documents using legacy character encoding, such as windows-1252, may be anomolous.
--
Since the document processing model for Web pages and other parts of the Open Web stack is based entirely on Unicode, the character encoding used to transmit or serialize a page being searched is not germane to finding text.
How the document is converted to Unicode may matter: CharMod recommends that a "normalizing transcoder" be used. However, the specification is not about searching byte streams. It is about searching the converted Unicode character stream. There will be no anomalous search behavior unless something is very wrong with the APIs in this document. This note invites developers and implementers to question something that they really shouldn't be concerned about.
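A minimal Python sketch of the point above, assuming a simplified model in which a page is decoded once and the search then runs only on the resulting Unicode string. The helper names `decode_for_search` and `find_text` are hypothetical. Python's `str.find` stands in for whatever matching the find-text API actually performs, and NFC normalization after decoding approximates CharMod's "normalizing transcoder". The legacy encoding only matters at the decode step: once the windows-1252 and UTF-8 byte streams are decoded, the search sees identical text.

```python
import unicodedata


def decode_for_search(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: transcode bytes to Unicode, then normalize to NFC.

    This roughly models a "normalizing transcoder". The source encoding is
    consulted here and nowhere else.
    """
    return unicodedata.normalize("NFC", raw.decode(encoding))


def find_text(haystack: str, needle: str) -> int:
    """Hypothetical stand-in for a find-text search over Unicode text.

    Returns the index of the first match, or -1 if there is none.
    """
    return haystack.find(unicodedata.normalize("NFC", needle))


page = "Café menu: crème brûlée"

# The same page serialized two ways: the byte streams differ...
legacy_bytes = page.encode("windows-1252")
utf8_bytes = page.encode("utf-8")
assert legacy_bytes != utf8_bytes

# ...but after decoding, the text being searched is identical.
from_legacy = decode_for_search(legacy_bytes, "windows-1252")
from_utf8 = decode_for_search(utf8_bytes, "utf-8")
assert from_legacy == from_utf8

# The search behaves the same regardless of the original encoding,
# including for a needle typed in decomposed form (e + U+0300).
legacy_hit = find_text(from_legacy, "crème")
utf8_hit = find_text(from_utf8, "cre\u0300me")
```

In this sketch, both searches return the same index. Any difference would have to come from the decoding step, not from the search itself.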
(editorial nit: anomalous is misspelled)
- Related Action Items:
- No related actions
- Related emails:
- I18N-ISSUE-495: note about windows-1252 is invalid [find-text] (from sysbot+tracker@w3.org on 2015-10-15)
Related notes:
https://github.com/w3c/i18n-activity/issues/101
Richard Ishida, 17 Mar 2016, 12:29:25