Upgrading from language-specific legacy encoding to Unicode encoding
You have heard that using Unicode is a good idea and that it brings benefits such as standards compatibility, multilingual display on a single page, and pan-organisation applications.
Numerous large organizations have switched to Unicode. This document will point you to resources that will assist your migration to Unicode.
A migration to Unicode involves more than just converting your HTML or XHTML templates to an appropriate Unicode encoding. You will need to plan your migration. Internal data sources will need to be converted to Unicode, and external data sources may need to be transcoded before integration into your site. You also need to know the state of Unicode support in the software components you rely on.
Detailed information is available in the article on Unicode migration.
How well is Unicode supported for my end users?
This depends on:
- browser support
- suitable fonts
- rendering software
Modern browsers support Unicode:
- Internet Explorer
Although many mobile phones support UTF-8, some do not. Those that rely on a legacy encoding may use different encodings from one device to another. Investigate device support if you are targeting a large mobile phone market.
What do I need to do?
Convert X/HTML, XML and CSS files to UTF-8
Unicode has three main encodings: UTF-8, UTF-16 and UTF-32. UTF-8 is the Unicode encoding consistently used for web pages. It provides:
- Better compatibility with legacy data that uses ASCII, because the 128 ASCII characters are encoded in UTF-8 as the same single bytes they have in ASCII.
- No byte order problems.
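Both properties can be checked directly. The following is a minimal sketch in Python that uses only the built-in codecs; the sample strings and variable names are illustrative assumptions, not part of any migration tool.

```python
# ASCII text produces identical bytes whether encoded as ASCII or as UTF-8.
ascii_text = "Hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A non-ASCII character (U+00E9, e with acute accent) for comparison.
sample = "\u00e9"

# UTF-8 has exactly one byte sequence for the character.
utf8_bytes = sample.encode("utf-8")

# UTF-16 has two possible serializations, depending on byte order.
utf16_le = sample.encode("utf-16-le")
utf16_be = sample.encode("utf-16-be")

print(same_bytes)   # True
print(utf8_bytes)   # b'\xc3\xa9'
print(utf16_le)     # b'\xe9\x00'
print(utf16_be)     # b'\x00\xe9'
```

Because UTF-16 data can arrive in either byte order, a consumer has to be told or has to detect which one is in use. UTF-8 avoids that question entirely.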
It is necessary to:
- Convert data to an appropriate Unicode encoding.
- Declare the encoding in HTML, XHTML, XML and CSS files.
- Ensure that your server declares the right encoding; check the charset parameter of the Content-Type header in the HTTP response.
- Test your web site.
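The conversion and server-check steps above can be sketched in Python. This is an illustration under stated assumptions, not a complete migration tool: the helper names `transcode_to_utf8` and `charset_from_content_type` are hypothetical, and the source encoding (ISO-8859-1 here) must already be known, since it cannot be reliably guessed from the bytes alone.

```python
from email.message import Message


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # decode() raises UnicodeDecodeError if the bytes are invalid
    # in the source encoding, which surfaces mislabelled data early.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Assumed sample data: "café" stored in ISO-8859-1.
legacy = "caf\u00e9".encode("iso-8859-1")
converted = transcode_to_utf8(legacy, "iso-8859-1")

print(legacy)      # b'caf\xe9'
print(converted)   # b'caf\xc3\xa9'
print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

A `None` result from the header check means the server is not declaring an encoding at all, so the browser has to fall back on the in-document declaration or on guessing.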
You will need to specify the encoding of your documents. There is a tutorial on Character sets & encodings in XHTML, HTML and CSS. Basic principles are:
- For HTML and XHTML served as text/html: always use a <meta> element.
- For XHTML served as text/html: where practical, use an XML declaration with an encoding attribute.
- For XML files and XHTML served as XML: always use an XML declaration with an encoding attribute.
- For CSS style sheets: use the @charset rule.
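When testing a converted site, it can help to check which encoding each file actually declares. The sketch below, in Python, looks for the three kinds of declaration listed above in the first bytes of a file. The function name `declared_encoding` and the regular expressions are assumptions made for illustration; they are deliberately simplified and do not implement the full encoding detection rules that browsers follow.

```python
import re

# Simplified patterns for the three declaration styles (illustrative only).
XML_DECL = re.compile(
    rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._\-]+)', re.IGNORECASE)
CSS_CHARSET = re.compile(rb'^@charset "([A-Za-z0-9._\-]+)";')


def declared_encoding(head: bytes):
    """Return the lowercased encoding name declared in the bytes, or None."""
    for pattern in (XML_DECL, META_CHARSET, CSS_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


print(declared_encoding(b'<meta charset="utf-8">'))
# utf-8
print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?>'))
# utf-8
print(declared_encoding(b'@charset "utf-8";'))
# utf-8
print(declared_encoding(b'<p>No declaration here</p>'))
# None
```

A declaration only states the intended encoding. The file's bytes must still be saved in that encoding, so a check like this complements the conversion step rather than replacing it.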
Other useful documents include:
Convert scripts and database tables to Unicode
Unicode offers three encoding forms: UTF-8, UTF-16, and UTF-32. The software and databases you use may require specific Unicode encodings. Although UTF-8 is used as the encoding for web pages, UTF-16 is often used in the back-end.
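Where the web tier uses UTF-8 and a back-end component expects UTF-16, text has to be transcoded at the boundary. The following Python sketch shows one way to do that. The function names are hypothetical, and the choice of little-endian UTF-16 without a byte order mark is an assumption; check what your database or library actually requires.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16 (little-endian, no BOM; an assumption)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 little-endian bytes back to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Three CJK characters (U+65E5 U+672C U+8A9E) as they would arrive from a web page.
page_bytes = "\u65e5\u672c\u8a9e".encode("utf-8")
backend_bytes = utf8_to_utf16(page_bytes)
round_trip = utf16_to_utf8(backend_bytes)

print(len(page_bytes))           # 9
print(len(backend_bytes))        # 6
print(round_trip == page_bytes)  # True
```

The conversion is lossless in both directions because both encodings cover the same set of Unicode characters. Only the byte representation, and therefore the storage size, differs.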
Marking up the primary language of a document, and any change of language within it, is good internationalization practice. It can also be critical for culturally appropriate rendering of CJK data.