This is an archive of an inactive wiki and cannot be modified.

Selecting a Character Encoding

In general, how the supported character encodings are determined depends on the adaptation strategy being used.

Strategies for determining supported character encodings

The simplest strategy is to evaluate the Accept-Charset request header. Note, however, that not all phones send this header, so applications should be prepared for it to be absent.
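As an illustration, the following Python sketch parses an Accept-Charset value (including q-values) and picks the best match from the encodings a server can produce. It falls back to UTF-8 when the header is missing or nothing matches. SERVER_CHARSETS, DEFAULT_CHARSET and the function names are hypothetical, and real code would also need to handle charset aliases such as "utf8" versus "utf-8".

```python
# Hypothetical: the encodings this server is able to produce, in order of preference.
SERVER_CHARSETS = ["utf-8", "shift_jis", "iso-8859-1"]
# Hypothetical fallback used when the header is absent or nothing matches.
DEFAULT_CHARSET = "utf-8"


def parse_accept_charset(header):
    """Return a {charset: q} mapping from an Accept-Charset header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name] = q
    return prefs


def choose_charset(header):
    """Pick the server charset with the highest client q-value."""
    if header is None or not header.strip():
        return DEFAULT_CHARSET  # the phone sent no Accept-Charset header
    prefs = parse_accept_charset(header)
    wildcard = prefs.get("*", 0.0)  # "*" covers charsets not named explicitly
    best, best_q = None, 0.0
    for cs in SERVER_CHARSETS:
        q = prefs.get(cs, wildcard)
        if q > best_q:  # ties keep the earlier (server-preferred) charset
            best, best_q = cs, q
    return best or DEFAULT_CHARSET
```

For example, choose_charset("Shift_JIS, utf-8;q=0.7") returns "shift_jis", while choose_charset(None) returns "utf-8".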

A more advanced strategy is to obtain the supported character encodings from a device profile. Typically the profile is retrieved from a repository of device profiles, using the User-Agent and/or UAProf header to look up the profile for a specific device. (How the information in the device profile is compiled is out of scope here, but typically it comes from the Accept-Charset header, a UAProf device profile or device information provided directly by the manufacturer. See this mail for a general description of the process.)
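A minimal sketch of a profile lookup, assuming a hypothetical in-memory repository keyed by User-Agent prefix. A real repository would typically also resolve the UAProf URL sent by the device and would be populated from the sources described above. When no profile is found, the sketch falls back to the charsets named in Accept-Charset, ignoring q-values for brevity. All names and profile data here are illustrative.

```python
# Hypothetical device-profile repository; entries are invented examples.
DEVICE_PROFILES = {
    "ExamplePhone/1.0": {"charsets": ["utf-8", "iso-8859-1"]},
    "ExampleJPPhone/2.0": {"charsets": ["shift_jis"]},
}


def lookup_profile(user_agent, profiles=DEVICE_PROFILES):
    """Return the profile whose key is a prefix of the User-Agent, or None."""
    if not user_agent:
        return None
    for key, profile in profiles.items():
        if user_agent.startswith(key):
            return profile
    return None


def supported_charsets(user_agent, accept_charset=None):
    """Prefer the device profile; otherwise use the Accept-Charset names."""
    profile = lookup_profile(user_agent)
    if profile is not None:
        return list(profile["charsets"])
    if accept_charset:
        # Keep only the charset names; q-values are ignored in this sketch.
        return [item.split(";")[0].strip().lower()
                for item in accept_charset.split(",") if item.strip()]
    return []  # nothing known about this device
```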

Supported code points

The chosen character encoding must contain all the code points required by the content: for example, content consisting of Hangul characters cannot be represented in the US-ASCII encoding. Unicode encodings naturally provide the widest coverage of code points; however, because mobile devices have limited memory, the number of characters actually contained in their fonts may be limited.
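One way to check coverage in Python is to try encoding the content with each candidate encoding and treat a UnicodeEncodeError as a missing code point. This is only a sketch with illustrative function names: it tests the encoding, not whether the device font actually contains the glyphs.

```python
def can_represent(text, charset):
    """True if every character of text has a code point in charset."""
    try:
        text.encode(charset)  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


def usable_charsets(text, candidates):
    """Filter candidate charsets down to those that can encode text."""
    return [cs for cs in candidates if can_represent(text, cs)]
```

For example, usable_charsets("한국어", ["us-ascii", "euc_kr", "utf-8"]) returns ["euc_kr", "utf-8"].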

Efficiency of character encoding

Consider the efficiency of the character encoding when selecting it. For content based on Latin characters this is generally irrelevant, since the commonly used encodings (such as ISO-8859-1 and UTF-8) are of roughly equal efficiency, but encodings for East Asian scripts can vary substantially. For example, most Japanese characters take two bytes in Shift-JIS but three in UTF-8, so for typical Japanese text Shift-JIS output will usually be about 30% smaller than UTF-8 output.
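The difference can be measured by encoding a sample and comparing byte counts. In this short pure-Japanese sample every character takes 2 bytes in Shift-JIS and 3 in UTF-8, so the Shift-JIS output is a third smaller; pages that mix in ASCII markup will show a smaller saving. The helper name is hypothetical.

```python
def encoded_sizes(text, charsets):
    """Map each charset name to the number of bytes needed to encode text."""
    return {cs: len(text.encode(cs)) for cs in charsets}


sizes = encoded_sizes("日本語のテキスト", ["utf-8", "shift_jis"])
print(sizes)  # {'utf-8': 24, 'shift_jis': 16}
```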

Currency symbols

Take particular care with currency symbols, since they often do not render as expected and several known issues complicate matters. For example, the Euro sign, because of its relatively recent creation, is not widely supported on older phones, and legacy Japanese and Korean encodings use the byte 0x5C, which is the backslash in US-ASCII, for the Yen and Won signs respectively.
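Part of the Euro problem is visible at the encoding level: ISO-8859-1 has no Euro sign at all, whereas ISO-8859-15 and Windows-1252 place it at different byte values. The Python sketch below (with a hypothetical helper name) shows this, plus the numeric character reference produced by Python's "xmlcharrefreplace" error handler as a fallback for markup. A character reference only transmits the code point; the phone still needs a font with a Euro glyph to display it.

```python
EURO = "\u20ac"  # U+20AC EURO SIGN


def encode_euro(charset):
    """Return the Euro sign encoded in charset, or None if charset lacks it."""
    try:
        return EURO.encode(charset)
    except UnicodeEncodeError:
        return None


for cs in ["iso-8859-1", "iso-8859-15", "cp1252"]:
    print(cs, encode_euro(cs))
# iso-8859-1 None
# iso-8859-15 b'\xa4'
# cp1252 b'\x80'

# Fallback for markup: emit a numeric character reference instead of failing.
print(EURO.encode("iso-8859-1", errors="xmlcharrefreplace"))  # b'&#8364;'
```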

Internationalization

In general, to help improve the multilingual Web, and all else being equal, using a Unicode encoding is recommended.

Static Content/Default Encoding

For static content, i.e. where no attempt is made to optimize content for individual devices, the UTF-8 encoding is recommended.
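For static pages the chosen encoding should also be declared to the client. A minimal sketch, assuming a hypothetical helper that builds the response for an HTML page: the body is encoded as UTF-8 and the same encoding is named in the charset parameter of the Content-Type header. The media type used here is an assumption; mobile markup may be served with a different one.

```python
def static_utf8_response(html_text):
    """Encode a page as UTF-8 and declare that encoding in the headers."""
    body = html_text.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",  # media type is an assumption
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body
```

For example, static_utf8_response("<p>€</p>") produces a 10-byte body, because "€" takes three bytes in UTF-8.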

Contributions to this wiki are governed by the W3C policies for Contribution to W3C' wiki on Mobile Web.