Setting encoding in web authoring applications

Intended audience: users, XHTML/HTML coders (using editors or scripting), content developers, and anyone who wants some tips on saving documents from their editor in a particular encoding.

Updated 2005-10-12 12:30

Question

How do I set character encoding in my web authoring application?

Background

Content on the web can be authored using a variety of software applications. Even within a single site, the content may have been created using multiple authoring tools. For example, a web site that was created using Macromedia Dreamweaver might also include a page created using Microsoft Access' data access page feature, as well as a dynamic Flash movie that allows for language selection. For all of these files to display the correct text, each of them must be properly encoded.

The purpose of this article is to identify where some of the key functionality for encoding exists within some of the more popular web authoring applications.

Answer

Specific options for setting character encodings often vary depending on the user's version, and so these are not discussed in detail for each application. For more detailed information, refer to the specific application's help content or user manuals. Common index and search keywords include Character Encoding, Internationalization, Multilingual, Unicode, and UTF.

There are two main points to remember when creating properly encoded files:

  1. The markup within the document must properly declare the encoding (such as charset=iso-8859-1 in an XHTML/HTML meta tag, or encoding="UTF-8" in an XML declaration).
  2. The file itself must be saved in that same encoding (such as UTF-8). If the declared encoding and the saved encoding differ, browsers may display the wrong characters.
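
The two points above can be illustrated with a minimal Python sketch. It writes a page whose meta declaration names the same encoding used for the bytes on disk, and then checks whether an existing file decodes cleanly in the encoding it declares. The function names (write_html, declared_charset, encoding_matches) are hypothetical and chosen for illustration only; they are not part of any authoring application discussed here.

```python
import re
from pathlib import Path


def write_html(path, text, encoding="utf-8"):
    """Write a page whose meta declaration matches the bytes saved on disk."""
    page = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={encoding}">\n'
        "<title>Encoding demo</title>\n"
        "</head><body>\n"
        f"<p>{text}</p>\n"
        "</body></html>\n"
    )
    # Point 1 (declaration) and point 2 (saved bytes) use the same encoding.
    Path(path).write_bytes(page.encode(encoding))


def declared_charset(path):
    """Return the charset named in the first 1024 bytes, or None."""
    head = Path(path).read_bytes()[:1024].decode("ascii", errors="replace")
    match = re.search(r'charset=["\']?([A-Za-z0-9._-]+)', head)
    return match.group(1).lower() if match else None


def encoding_matches(path):
    """True if the file's bytes decode without error in the declared charset."""
    charset = declared_charset(path)
    if charset is None:
        return False
    try:
        Path(path).read_bytes().decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

This check is only a rough heuristic: it catches a file declared as UTF-8 but saved in a legacy encoding, yet it cannot catch the reverse case, because every byte sequence decodes without error as iso-8859-1.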

We recommend using a Unicode encoding such as UTF-8, because it greatly simplifies the effort of creating multilingual web sites.

The following are a few points to consider when using specific applications:

Adobe FrameMaker 7.0

Encoding options exist within the HTML Options Table.

Adobe GoLive 6 and CS (Mac & Win)

To specify the character encoding for new pages, go to: Edit > Preferences > Encodings category. In GoLive CS (Mac), Preferences is located in the main application menu, in line with Apple's interface conventions.

To change the encoding of a page, go to File > Document Encoding. In GoLive CS you can also use Edit > Document Content > Change Encoding...

Apple TextEdit (Mac OS X 10.2)

You will need to enter the proper encoding markup into the XHTML/HTML file. Files are natively saved as UTF-8.

Helios TextPad (Windows)

The proper markup for encoding will need to be entered into the file. When saving the document, the proper file format can be selected here: File > Save As > Encoding dropdown menu.

Macromedia ColdFusion (Windows)

To properly configure a ColdFusion application, become familiar with the various encoding-related commands and functions (a few of which include setEncoding, cfcontent, and the form attribute enctype).

Macromedia Dreamweaver MX (Mac & Windows)

To specify the character encoding for your pages, go to Modify > Page Properties. Select the proper encoding from the Document Encoding dropdown menu.

You might also need to specify the character encoding for viewing pages while editing. Go to Edit > Preferences > Fonts category (Dreamweaver > Preferences > Fonts category on Mac).

Macromedia Flash MX (Mac & Windows)

Efficiently designed multilingual Flash movies often store the text for each language in separate include files (#include). Because only the data for the selected language is sent, the Flash movie takes less time to download. UTF-8 text can be stored in an include file. The include file should start with //!-- UTF8 and must be saved in UTF-8 format.

Unicode characters can also be written as escape sequences in Flash's ActionScript environment. For example, U+0065 (the letter e) would be written using the escape sequence \u0065 within the ActionScript code.
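
As a rough illustration of the \uXXXX notation, the following Python sketch converts text into escape sequences of that form. The helper name to_u_escape is hypothetical. The sketch covers only code points up to U+FFFF, since the article does not say how characters beyond that range should be written.

```python
def to_u_escape(text):
    """Render each character as a four-digit \\uXXXX escape sequence."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            # Handling of characters outside the Basic Multilingual Plane
            # is not described in the article, so it is not attempted here.
            raise ValueError(f"U+{code_point:04X} does not fit in four hex digits")
        parts.append(f"\\u{code_point:04x}")
    return "".join(parts)


if __name__ == "__main__":
    print(to_u_escape("e"))
```

Running the script prints \u0065, which matches the example given for U+0065.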

Another setting worth noting is the encoding setting for the end user's Flash Player. It defaults to false (system.useCodepage = false;), in which case UTF-8 is used. If the setting has been changed for some special purpose, it must be set back to “false” before displaying UTF-8 text again, by placing the appropriate ActionScript in the timeline before any new text is called.

Macromedia HomeSite+

You need to enter the encoding markup into the file. When saving the file, select: File > Save As and select the proper encoding using the Encoding dropdown menu.

There is also an HTML Tidy feature that validates your code as you type. When using this feature, be sure to set it to the same encoding as the file. Go to: Options > Settings > CodeSweeper category > HTML Tidy CodeSweeper subcategory > Macromedia HTML subcategory > Char encoding dropdown menu.

Microsoft FrontPage 2000-2003 (Windows)

The encoding options are under Language (character set). Go to: Tools > Page Options > Default Font tab (or Unicode (UTF-8) tab). You will notice an option that says “Multilingual (UTF-8).”

Microsoft Notepad 2000/XP (Windows)

Notepad on Windows 2000/XP offers four encoding choices: 'ANSI' (the code page corresponding to the default system locale), 'Unicode' (meaning UTF-16LE on x86), 'Unicode Big endian', and 'UTF-8'.

You will need to specify the character encoding and language when you write the markup code. When you save the document, select File > Save as and select the proper encoding from the Encoding dropdown menu.

Be aware that Notepad adds a signature (byte order mark) to the beginning of the file when saving as UTF-8. This can lead to issues when viewing the page in older browsers.
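
If the signature causes problems, it can be detected and removed after saving. The following is a minimal Python sketch, with hypothetical helper names, that checks for the three-byte UTF-8 signature (EF BB BF) and strips it.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 signature, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To apply this to a file, read it in binary mode, pass the bytes through strip_utf8_bom, and write the result back in binary mode so that no further encoding conversion takes place.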

Microsoft WordPad 2000/XP (Windows)

You will need to specify the character encoding and language when you write the markup code. When you save the document, select File > Save as and select the proper encoding from the Save as type dropdown menu.

Note that WordPad does not allow you to save as UTF-8, only as UTF-16 LE.
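
A file saved as UTF-16 LE can be converted to UTF-8 afterwards. The sketch below shows one way to do this in Python. The function name is hypothetical, and the code assumes the input really is little-endian UTF-16, with or without a leading byte order mark.

```python
import codecs


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 LE bytes to UTF-8 bytes without a signature."""
    # Drop the UTF-16 LE byte order mark (FF FE) if the editor wrote one.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").encode("utf-8")
```

After converting, remember that the encoding declared in the markup must say UTF-8 as well.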

Mozilla/Netscape Composer (Windows, Mac OS, Unix/Linux, OS/2, VMS, BeOS)

Character encoding for a document can be set here: View > Character Coding menu. A file can be saved using a different character encoding here: File > Save As Charset.

Vim (Windows, Mac OS, Unix/Linux, Amiga, MS-DOS, OS/2 etc.)

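Encoding can be set from Vim's command line with the command :set encoding=utf-8. “utf-8” can be replaced by any supported character encoding. Note that the encoding option controls the encoding Vim uses internally; the related fileencoding option (for example, :set fileencoding=utf-8) controls the encoding in which the current file is written.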

W3C Amaya (Mac, Unix, Windows)

When saving the file, go to File > Save as. Amaya will make sure that the encoding is correct in the XML declaration (for XHTML) and the meta statement. Amaya also uses the appropriate encoding (charset) in the HTTP headers when it saves a document remotely using PUT. Amaya also understands several other encodings when loading a document, but is not able to save in any of these.

XyEnterprise XML Professional Publisher (XPP)

XPP can receive Unicode files. Character encoding options exist within preprocessing and postprocessing controls. Specific character encoding for XHTML/HTML output is usually performed by XyChange or the HTML Toolkit.

By the way

Another key element in the markup is the language indicator. Many of the applications listed here combine the encoding and language in the user-selectable options. If the language is not included by the application, it is good practice to also include that in the markup manually. Also, some applications may acquire the regional settings of your operating system to create a locale tag.

Keep in mind that the end user can select both the encoding to use and the font to use for each encoding. Another user-selectable option is “Always send URLs as UTF-8.” In Microsoft Internet Explorer, for example, this can be found here: Tools > Internet Options > Advanced tab > Browsing category. If your site requires settings that might not be standard, it may help to publish viewing requirements for the site that direct the user to the encoding and font settings needed to view it as intended.
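
To see why the encoding used for URLs matters, the following Python sketch percent-encodes the same path in two different encodings using the standard library's urllib.parse.quote. The wrapper name percent_encode_path is hypothetical, and the sketch illustrates the general idea rather than what any particular browser sends.

```python
from urllib.parse import quote


def percent_encode_path(path, encoding="utf-8"):
    """Percent-encode the non-ASCII characters of a URL path in an encoding."""
    return quote(path, safe="/", encoding=encoding)


if __name__ == "__main__":
    # The same character yields different bytes, and so different URLs.
    print(percent_encode_path("/caf\u00e9"))                # /caf%C3%A9
    print(percent_encode_path("/caf\u00e9", "iso-8859-1"))  # /caf%E9
```

A server that expects one form and receives the other may not find the requested resource, which is why a consistent choice such as UTF-8 is helpful.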

When content is ready to be published, it is always good practice to validate it using the W3C validation tool.

By: Phil Arko, Siemens. Changed by: Richard Ishida, W3C.

Content first published 2003-11-06. Last substantive update 2005-10-12 12:30 GMT. This version 2011-05-03 18:00 GMT

For the history of document changes, search for qa-setting-encoding-in-applications in the i18n blog.