Changing (X)HTML page encoding to UTF-8

Intended audience: newcomers to internationalization who want to change the encoding of their (X)HTML pages.

Question

How do I change the encoding of my (X)HTML pages to UTF-8?

Background

So you've heard that it's useful to encode your pages in UTF-8 rather than a legacy encoding such as Windows 1252 or ISO 8859-1, and you've heard that others are doing it, but you're not sure how to do it. This page will help.

Answer

This article summarises the information you need. Follow the embedded links to other articles on the site if you need detailed information about any step.

Step 1: Save the data as UTF-8

It is not sufficient to just change the declarations inside your pages to say that the page is encoded in UTF-8. You must ensure that your data is actually encoded, i.e. saved, in UTF-8. If you are working with hand-edited files, use your editor to save the file in UTF-8 rather than the encoding you were using. If you are building files from scripts and databases, ensure that the data is converted as necessary and that the correct parameters are set in your scripting environment.

Note that you may have to ensure that the data does not include a UTF-8 signature, also known as a byte-order mark (BOM).
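As an illustration of Step 1 for scripted workflows, here is a minimal Python sketch. It assumes the existing data is in Windows-1252. The function names `to_utf8` and `strip_utf8_bom` are hypothetical, chosen only for this example. It re-encodes bytes as UTF-8 and, separately, removes a UTF-8 signature from data that already has one.

```python
import codecs


def to_utf8(raw: bytes, source_encoding: str = "windows-1252") -> bytes:
    """Re-encode bytes from a legacy encoding to UTF-8.

    Python's "utf-8" codec does not write a signature (BOM); the
    "utf-8-sig" codec would.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 signature (bytes EF BB BF) if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

To convert a file, you would read it in binary mode, pass the bytes through `to_utf8`, and write the result back in binary mode. Decoding raises an error if the data contains bytes that are undefined in the assumed source encoding, which is a useful sign that the assumption is wrong.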

Step 2: Declare the encoding in your page

You should change the character encoding declaration in your page (or add one if you don't already declare it).
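For pages that already carry a declaration in a `meta` element, the change can be scripted. The following Python sketch is one possible approach, not a complete solution. The name `declare_utf8` is hypothetical, the regular expression only handles the common `meta` forms, and it does not add a declaration where none exists or touch an XML declaration.

```python
import re

# Matches the charset value inside a <meta> element, in either the
# http-equiv form (content="text/html; charset=...") or the shorter
# charset="..." attribute form.
_META_CHARSET = re.compile(
    r"(<meta\b[^>]*?charset\s*=\s*[\"']?)([\w.:-]+)",
    re.IGNORECASE,
)


def declare_utf8(html: str) -> str:
    """Rewrite the first meta charset declaration to say utf-8."""
    return _META_CHARSET.sub(lambda m: m.group(1) + "utf-8", html, count=1)
```

Remember that this only changes what the page claims about itself. The data must still be saved as UTF-8, as described in Step 1.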

Step 3: Ensure that your server does the right thing

Although your data is in UTF-8 and you have declared it in the page, your server may still be serving the page with an accompanying HTTP header that says it is something else. The declaration in the HTTP header will override information inside the page.

First check whether this is actually a problem; if it is, take steps to rectify it.
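One way to check is to look at the `Content-Type` header the server sends with the page. The Python sketch below extracts the declared encoding from a header value using only the standard library. The function name `charset_from_content_type` is hypothetical, and obtaining the header value itself (for example with a browser's developer tools or an HTTP client) is left out.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type
    header value, or None if the header declares no charset."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()
```

If the function returns something other than `utf-8` for a page you have saved and declared as UTF-8, the server is overriding your in-page declaration. If it returns `None`, the header does not specify an encoding.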

Changing the encoding sent in the HTTP header usually requires server admin privileges, though you may be able to do it yourself, even if you are serving files via an ISP. If in doubt, consult your server administrator. See the explanation of one way to do this for an Apache server.

By: Richard Ishida, W3C.

Content first published 2005-08-26. Last substantive update 2005-08-26 12:51 GMT. This version 2010-08-27 20:50 GMT.

For the history of document changes, search for qa-changing-encoding in the i18n blog.