W3C

Edit comment LC-2023 for Mobile Web Best Practices Working Group

Comment LC-2023
Commenter: casays <casays@yahoo.com>

5) Section 4.3.6.1

I find no discussion of, or reference to, the issue of character
encodings anywhere in the document.

Transforming content across different charsets is a minefield and
affects a number of aspects:
a) Content may rely upon widely different character encodings,
depending on the targeted devices and markets. In particular, the
China, Japan and Korea (CJK) markets continue to rely on a number of
encodings (such as Shift_JIS, Big5, etc.) whose handling is a complex
matter; for instance, there are not necessarily bijective mappings
between these encodings and others, including UTF-8.
b) A document may involve several encodings at once. Different
encodings may be associated with external entities through the charset
attribute (see HTML 4.01). How transformation proxies should deal with
such a situation is left undefined.
c) Similarly, the draft does not explain what happens when a server
associates an accept-charset attribute with a form, nor whether
proxies respect or manipulate that information.
d) In i-Mode, and at least in the Softbank environment (Japan),
unreserved code points in the character encoding space are used to
represent pictograms. Any attempt to convert these characters directly
will fail; they should therefore not be transformed but preserved,
bearing in mind that the code points in question differ between
Unicode and Shift_JIS, and that DoCoMo and Softbank do not use the
same code points for the same pictograms.
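
The risk raised in point (a), that an encoding cannot be mapped to another and back without loss, can be checked mechanically. The following is a minimal Python sketch and not something taken from the draft under review: the helper names round_trips and representable are hypothetical, and the use of UTF-8 as the proxy's internal (pivot) encoding is an assumption.

```python
def round_trips(data: bytes, source_encoding: str,
                pivot_encoding: str = "utf-8") -> bool:
    """Return True if data survives source -> pivot -> source unchanged.

    Hypothetical helper: pivot_encoding stands for whatever encoding
    the proxy is assumed to use internally.
    """
    try:
        text = data.decode(source_encoding)        # strict decoding
        pivot = text.encode(pivot_encoding)        # what the proxy would hold
        restored = pivot.decode(pivot_encoding).encode(source_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some byte or character has no mapping in one direction.
        return False
    # Byte-for-byte equality is required; a lossy or many-to-one
    # mapping shows up here as a mismatch.
    return restored == data


def representable(text: str, target_encoding: str) -> bool:
    """Return True if every character of text exists in target_encoding."""
    try:
        text.encode(target_encoding)
    except UnicodeEncodeError:
        return False
    return True
```

A proxy could run such a check on a response body and pass the content through untouched when it fails. A successful round trip only shows that the codec tables in use are consistent for that particular input; it says nothing about vendor-specific pictogram assignments such as those described in point (d).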

A consequence of all this is that if a proxy does not operate natively
with the character encoding of the content returned by the server, or
is not able to ensure a bijective mapping between this encoding and
the other encodings it deals with, recurrent and irrecoverable
problems will creep in.
A simple measure that would go some way towards alleviating this risk
is to forbid any transformation whenever the server announces (via the
charset parameter of the HTTP Content-Type header field, the XML
declaration, or a meta tag) an encoding other than ASCII or perhaps
UTF-8.
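
The rule proposed above could be sketched as follows in Python. This is only an illustration under stated assumptions: the function names announced_charsets and may_transform are hypothetical, only the first 1024 bytes of the body are scanned, the regular expressions are deliberately simplistic rather than a full HTML or XML parser, and a response that announces no encoding at all is treated as transformable, which is one literal reading of the proposal.

```python
import codecs
import re
from email.message import Message
from typing import List, Optional

# Canonical Python codec names for the encodings the proposal treats as safe.
SAFE_CODECS = {"ascii", "utf-8"}

# Simplistic patterns (assumptions, not a complete parser).
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']')
_META = re.compile(
    rb'<meta[^>]+?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def announced_charsets(content_type: Optional[str], body: bytes) -> List[str]:
    """Collect every charset announced by the header, XML declaration or meta tag."""
    found = []
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()   # lower-cased name, or None
        if charset:
            found.append(charset)
    head = body[:1024]                        # assumed scan window
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            found.append(match.group(1).decode("ascii"))
    return found


def may_transform(content_type: Optional[str], body: bytes) -> bool:
    """Allow transformation only if every announced charset is ASCII or UTF-8."""
    for name in announced_charsets(content_type, body):
        try:
            canonical = codecs.lookup(name).name   # resolves aliases such as US-ASCII
        except LookupError:
            return False                           # unknown encoding: do not touch
        if canonical not in SAFE_CODECS:
            return False
    return True
```

Checking all three sources, rather than stopping at the first one found, means that a single conflicting announcement is enough to block transformation, which errs on the side of leaving the content untouched.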

Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: 2023.html,v 1.1 2017/08/11 06:43:16 dom Exp $
Please send bug reports and requests for enhancements to w3t-sys.org