Changing an HTML page encoding to UTF-8

Intended audience: newcomers to internationalization who want to change the encoding of their HTML pages.



How do I change the encoding of my HTML pages to UTF-8?

So you've heard that it's useful to encode your pages in UTF-8 rather than a legacy encoding such as Windows 1252 or ISO 8859-1, and you've heard that others are doing it, but you're not sure how to do it. This page will help.

Quick answer

This article summarises the information you need. Follow the embedded links to other articles on the site if you need detailed information about any step.

Step 1: Save the data as UTF-8

It is not sufficient to just change the declarations inside your pages to say that the page is encoded in UTF-8. You must ensure that your data is actually encoded, i.e. saved, in UTF-8. If you are working with hand-edited files, use your editor to save the file in UTF-8 rather than the encoding you were using. If you are building files from scripts and databases, ensure that the data is converted as necessary and that the correct parameters are set in your scripting environment.
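If you have many existing files to convert, a small script may be easier than re-saving each one in an editor. The following is a minimal Python sketch, not a definitive tool. The function name `convert_to_utf8` is hypothetical, and the default source encoding of Windows-1252 is an assumption: you need to know which encoding each file was really saved in, because a wrong guess produces garbled characters.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="windows-1252"):
    """Re-save a text file as UTF-8, decoding it from source_encoding first.

    source_encoding is an assumption: pass the encoding the file was
    actually saved in.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()
    # Decode using the legacy encoding; an unmappable byte raises
    # UnicodeDecodeError rather than being silently dropped.
    text = raw.decode(source_encoding)
    # Write the same characters back as UTF-8 bytes (no byte-order mark).
    file_path.write_bytes(text.encode("utf-8"))
    return text
```

Working with bytes rather than text-mode files leaves line endings untouched, so only the character encoding changes. Keep a backup of the originals before running anything like this over a whole site.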

Note that you may have to ensure that the data does not include a UTF-8 signature, also known as a byte-order mark (BOM).
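The UTF-8 signature is the three-byte sequence EF BB BF at the very start of the file. As a rough illustration, the Python sketch below removes it from raw file data if it is present; the function name `strip_utf8_bom` is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No signature at the start: return the bytes unchanged.
    return data
```

For example, `strip_utf8_bom(b"\xef\xbb\xbf<html>")` returns `b"<html>"`, while data that has no signature passes through untouched.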

Step 2: Declare the encoding in your page

You should change the character encoding declaration in your page (or add one if you don't already declare it).
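In HTML the in-page declaration is usually a `meta` element near the top of the `head`, for example `<meta charset="utf-8">`. If you need to update many pages, the step can be sketched in Python as follows. This is a simplified illustration that uses regular expressions rather than a real HTML parser, so it assumes conventionally written markup; the function name `declare_utf8` is hypothetical.

```python
import re

# Matches up to and including "charset=" (and an optional opening quote)
# inside a meta element, followed by the declared encoding name.
_CHARSET = re.compile(
    r"""(<meta\b[^>]*?\bcharset\s*=\s*["']?)[\w.:-]+""", re.IGNORECASE
)
# Matches the opening <head> tag, with or without attributes.
_HEAD = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def declare_utf8(html: str) -> str:
    """Point the page's encoding declaration at UTF-8, adding one if missing."""
    if _CHARSET.search(html):
        # Rewrite only the first declaration found, keeping the rest of the tag.
        return _CHARSET.sub(lambda m: m.group(1) + "utf-8", html, count=1)
    head = _HEAD.search(html)
    if head is None:
        # No <head> tag to anchor on: leave the page unchanged.
        return html
    return html[:head.end()] + '\n<meta charset="utf-8">' + html[head.end():]
```

The sketch handles both the short `meta charset` form and the older `http-equiv="Content-Type"` form. Remember that it only changes the label: the file itself must still be saved in UTF-8, as described in Step 1.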

Step 3: Ensure that your server does the right thing

Although your data is in UTF-8 and you have declared it in the page, your server may still be serving the page with an accompanying HTTP header that says it is something else. The declaration in the HTTP header will override information inside the page.

First check whether your server is actually sending an encoding declaration that conflicts with your page. If it is, take steps to correct it.
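One way to check is to look at the `Content-Type` header the server sends with the page and see whether it carries a `charset` parameter. The Python sketch below shows the idea using only the standard library. The function names are hypothetical, and it assumes the server answers `HEAD` requests, which not every server does.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(value):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the name and returns None if absent.
    return msg.get_content_charset()


def served_charset(url):
    """Request only the headers for url and report the declared charset."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get_content_charset()
```

For a header value such as `text/html; charset=ISO-8859-1`, `charset_from_content_type` returns `"iso-8859-1"`, which would override a UTF-8 declaration inside the page. A result of `None` means the header names no encoding.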

Changing the encoding sent in the HTTP header normally requires server administrator privileges, though you may be able to do it yourself even if you are serving files via an ISP. If in doubt, consult your server administrator. See the explanation of one way to do this for an Apache server.