GeoQuickTips
GEO WG Collaborative editing page
Status: Working Draft, ie. wordsmithing and refining at this stage.
Authors: Andrew Cunningham, Richard Ishida
The first section is kept concise to fit on the cards. The second section is aimed at the Web site and adds explanations and links.
Quick Internationalization Tips for the Web
Shortened version for the cards.
- Encoding. Use Unicode wherever possible for content, databases, etc. Always declare the encoding of content.
- Escapes. Only use escapes (ie. numeric character references or entities) in specific circumstances.
- Language. Declare the text-processing language of documents and indicate any internal language changes.
- Presentation vs. content. Use stylesheets for presentational information. Restrict markup to semantics.
- Images, animations & examples. Check for translatability and inappropriate cultural bias.
- Forms. Use an appropriate encoding on both form and server. Support local formats of names/addresses, times/dates, etc.
- Text authoring. Use simple, concise text. Do not compose sentences from multiple strings using scripting.
- Navigation. On each page include clearly visible navigation to localized pages or sites. Use the target language.
- Right-to-left text. For XHTML, add dir="rtl" to the html tag. Only use it again where you need to change directionality.
- Check your work. Validate! Use techniques, tutorials, and articles at http://www.w3.org/International/
Proposed longer version for the web site
- Encoding. Use Unicode wherever possible for content, databases, etc. Always declare the encoding of content.
The character encoding you choose determines how bytes are mapped to characters in your text. Most non-Unicode character encodings limit you to a particular script or set of languages, whereas Unicode lets you handle almost all scripts and languages in use around the world. In this way Unicode simplifies the handling of content in multiple languages, whether within a single page or across one or more sites. Unicode is particularly useful in forms, scripts and databases, where you often need to support multiple languages. It also makes it very straightforward to add new languages to your content.
Unless you declare which character encoding you are using, your users may be unable to read your content, because the application interpreting your text may make incorrect assumptions about how the bytes map to characters.
The W3C internationalization resources provide guidance on how to declare the character encoding of (X)HTML and CSS content. Note that this is not always as straightforward as it may appear.
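To illustrate why the declaration matters, the following Python sketch encodes a string as UTF-8 and then decodes the same bytes under a wrong assumption (ISO-8859-1), which produces garbled text. The `declare_utf8` helper is hypothetical; it simply lists the HTTP header and in-document declarations commonly used for (X)HTML, and is not a substitute for the W3C guidance mentioned above.

```python
# Minimal sketch: the same bytes read under two different encoding assumptions.
text = "café"
data = text.encode("utf-8")         # é is stored as the two bytes 0xC3 0xA9

correct = data.decode("utf-8")      # reader assumes the encoding actually used
wrong = data.decode("iso-8859-1")   # reader guesses wrongly: one character per byte

print(correct)  # café
print(wrong)    # cafÃ©


def declare_utf8() -> dict:
    """Hypothetical helper listing common ways to declare UTF-8 for (X)HTML."""
    return {
        # Sent by the server; generally takes precedence over in-document declarations.
        "http_header": "Content-Type: text/html; charset=utf-8",
        # HTML5-style in-document declaration, placed early in <head>.
        "html_meta": '<meta charset="utf-8">',
        # XML declaration at the very start of XHTML served as XML.
        "xml_declaration": '<?xml version="1.0" encoding="UTF-8"?>',
    }
```

Nothing in the bytes themselves says which encoding produced them, which is why the reader depends on the declaration.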
- Escapes. Only use escapes (ie. numeric character references or entities) in specific circumstances.
Numeric character references (NCRs) and character entities are ways of representing Unicode characters in X/HTML using only ASCII characters. For example, you can represent the character á in X/HTML as &#xE1; or &#225; or &aacute;.
Such escapes are useful for clearly representing ambiguous or invisible characters, and for preventing problems with syntax characters such as ampersands and angle brackets. They may also occasionally be useful for representing characters not supported by your character encoding or unavailable from your keyboard. Otherwise, you should always use characters rather than escapes.
The W3C internationalization resources provide additional information about the use of escapes in markup languages. In particular, entities (such as &aacute;) should be used with caution.
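The policy described above, escaping only syntax characters and hard-to-see characters and otherwise writing real characters, can be sketched in Python. The standard `html.unescape` function confirms that the hexadecimal NCR, decimal NCR and entity forms of á all decode to the same character. `escape_selectively` and the `INVISIBLE` set are hypothetical and illustrative; a real site would choose its own list of characters to escape.

```python
import html

# The hexadecimal NCR, decimal NCR and entity for á all mean the same character.
forms = ["&#xE1;", "&#225;", "&aacute;"]
assert all(html.unescape(f) == "\u00e1" for f in forms)

# Illustrative selection of invisible characters that are clearer as escapes:
# no-break space, zero width space, left-to-right mark, right-to-left mark.
INVISIBLE = {"\u00a0", "\u200b", "\u200e", "\u200f"}


def escape_selectively(text: str) -> str:
    """Escape markup syntax and a few invisible characters; keep all others as characters.

    Intended for element content; attribute values would also need quotes escaped.
    """
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ch in INVISIBLE:
            out.append(f"&#x{ord(ch):X};")  # a numeric reference, not an entity
        else:
            out.append(ch)                  # e.g. á stays as á
    return "".join(out)


print(escape_selectively("Tom & Jerry <b>\u00a0café"))
```

The last line prints `Tom &amp; Jerry &lt;b&gt;&#xA0;café`: the ampersand, angle brackets and no-break space are escaped, while é is left as a character.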
- Language declaration. Declare the text-processing language of documents and indicate any internal language changes.
Information about the (human) language of content is already important for accessibility, styling, searching, editing, and other reasons. As more content is tagged, and tagged correctly, applications that can detect language information will become more useful and more pervasive.
The W3C internationalization resources provide information about how to declare the language of a document as a whole or fragments of text in a different language. It is also essential to understand the difference between the concepts of text-processing language vs. primary language metadata.
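As a rough sketch of the two cases described above, the hypothetical Python helpers below produce markup that declares the default language on the root element and marks an inline change of language with a `lang` attribute. For XHTML they emit both `lang` and `xml:lang`, which is the usual practice for XHTML 1.0; the language tags shown (`fr`, `de`, `ar`) are ordinary BCP 47 tags.

```python
from html import escape


def html_start_tag(lang: str, xhtml: bool = False) -> str:
    """Hypothetical helper: declare the default text-processing language on <html>."""
    if xhtml:
        # XML parsers read xml:lang; lang is kept for HTML user agents.
        return (f'<html xmlns="http://www.w3.org/1999/xhtml" '
                f'lang="{lang}" xml:lang="{lang}">')
    return f'<html lang="{lang}">'


def foreign_phrase(text: str, lang: str) -> str:
    """Hypothetical helper: mark an internal change of language on an inline span."""
    return f'<span lang="{lang}">{escape(text)}</span>'


print(html_start_tag("en"))
print("<p>They greeted us with " + foreign_phrase("Grüße", "de") + ".</p>")
```

The inner `lang` overrides the document default only for the span, so spell-checkers, voice browsers and styling can treat that phrase as German.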
- Presentation vs. content. Use stylesheets for presentational information. Restrict markup to semantics.
It is an important principle of Web design to keep the way content is styled or presented separate from the actual text itself. This makes it simple to apply alternative styling for the same text, for example in order to display the same content on both a conventional browser and a small hand-held device.
This principle is particularly useful for localization, since different scripts have different typographic needs. For example, due to the complexity of Japanese characters, it may be preferable to show emphasis in Japanese X/HTML pages by means other than bolding or italicisation. It is much easier to apply such changes if the presentation is described using CSS, and markup is much cleaner and more manageable if text is correctly and unambiguously labelled as 'emphasised' rather than just 'bold'.
It can also save considerable time and effort during localization to work with CSS files rather than have to change the markup, because any needed changes can be made in a single location for all pages, and the translator can focus on the content rather than the presentation.
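As a sketch of keeping presentation in one place, the Python snippet below generates a small stylesheet: the markup only says `<em>` (semantics), and per-language rules decide how emphasis looks. The Japanese rule uses the CSS `text-emphasis` property (emphasis marks) as an assumed alternative to italics; the particular styling choices and the `build_stylesheet` helper are illustrative, not a recommendation from this page.

```python
# Hypothetical per-language presentation for the semantic <em> element.
# The key None means "all languages"; other keys are language tags.
EMPHASIS_STYLES = {
    None: {"font-style": "italic"},
    "ja": {
        "font-style": "normal",            # avoid slanted Japanese glyphs
        "text-emphasis": "filled sesame",  # emphasis marks instead
    },
}


def build_stylesheet(styles: dict) -> str:
    """Return CSS rules; editing this one file restyles every page that links it."""
    rules = []
    for lang, decls in styles.items():
        selector = "em" if lang is None else f"em:lang({lang})"
        body = " ".join(f"{prop}: {value};" for prop, value in decls.items())
        rules.append(f"{selector} {{ {body} }}")
    return "\n".join(rules)


print(build_stylesheet(EMPHASIS_STYLES))
```

This prints two rules, `em { font-style: italic; }` and `em:lang(ja) { font-style: normal; text-emphasis: filled sesame; }`. The later, more specific `:lang(ja)` rule overrides the default only for Japanese text, so translators never need to touch the markup.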
- Images, animations & examples. Check for translatability and inappropriate cultural bias.
If you want your content to really communicate with people, you need to speak their language, not only through the text, but also through local imagery, color, objects and concerns. It is easy to overlook the culture-specific nature of symbolism, behaviour, concepts, body language, humor, etc. You should get feedback on the suitability and relevance of your images, video-clips, and examples from in-country users.
You should also take care with text embedded in graphics if the content will be translated. Text on complex backgrounds or in restricted spaces can cause considerable trouble for the translator. You should provide graphics to the localization group with the text on a separate layer, and you should bear in mind that text translated from languages such as English and Chinese will almost certainly expand.
- Forms. Use an appropriate encoding on both form and server. Support local formats of names/addresses, times/dates, etc.
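A minimal Python illustration of why the form and the server must agree on the encoding: browsers normally submit form data in the encoding of the page containing the form, so a UTF-8 page sends percent-encoded UTF-8 bytes. Parsing that body with the standard `urllib.parse.parse_qs` under the right and the wrong encoding shows the difference. The field name and value are made up for the example.

```python
from urllib.parse import parse_qs, quote

# What a UTF-8 form page would submit for a field "name" containing "José".
body = "name=" + quote("José", encoding="utf-8")   # "name=Jos%C3%A9"

ok = parse_qs(body, encoding="utf-8")         # server assumes the encoding actually used
bad = parse_qs(body, encoding="iso-8859-1")   # server assumes a different encoding

print(ok["name"][0])   # José
print(bad["name"][0])  # JosÃ©
```

The mismatch is silent: the server receives a plausible-looking string, and the damage only shows up later, for example when the name is stored in a database or displayed back to the user.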
- Text authoring. Use simple, concise text. Do not compose sentences from multiple strings using scripting.
Simple, concise text is easier to translate. It is also easier for people to read when no translation is available.
You should take considerable care when composing messages from multiple substrings, or when inserting variable text into strings. For example, suppose your site uses JSP scripting, and you decide to compose certain messages on the fly. You may create messages by concatenating separate substrings, such as 'Only' or 'Don't', ' return results in ', and 'any format' or 'HTML'. Because the order of text in sentences of other languages can be very different, translating this may present major difficulties.
Similarly, it is important to avoid fixing the positions of variables in text such as "Page 1 of 10.". The syntax of other languages may require the numbers to be reversed to make sense. If you use PHP, this would mean using a formatting string such as "Page %1\$d of %2\$d.", rather than the simpler "Page %d of %d.". The latter cannot be translated correctly into languages that need the numbers in a different order.
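Python's `str.format` supports the same idea as PHP's positional specifiers: numbered fields such as `{0}` and `{1}` let a translator reorder the values without any change to the code. In the sketch below, the `MESSAGES` dict stands in for a real message catalogue, and the "reordered" template is a made-up translation used only to show the numbers appearing in reverse order.

```python
# Hypothetical message catalogue: one complete sentence per message,
# with numbered placeholders that translators may reorder.
MESSAGES = {
    "en": "Page {0} of {1}.",
    # Made-up template for a language whose word order puts the total first.
    "reordered": "Of {1} pages, this is page {0}.",
}


def page_message(lang: str, current: int, total: int) -> str:
    """Format a whole sentence; the template, not the code, decides the order."""
    return MESSAGES[lang].format(current, total)


def page_message_fragmented(current: int, total: int) -> str:
    """To avoid: concatenating fragments hard-codes English word order."""
    return "Page " + str(current) + " of " + str(total) + "."


print(page_message("en", 1, 10))         # Page 1 of 10.
print(page_message("reordered", 1, 10))  # Of 10 pages, this is page 1.
```

Both functions produce the same English output, but only the first gives a translator the whole sentence and the freedom to move the numbers.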
Read more about text fragmentation and re-use.
- Navigation. On each page include clearly visible navigation to localized pages or sites. Use the target language.
Where you have versions of a page or site in a different language, or for a different country or region, you should provide a way for the user to view the version they prefer. This should be possible from any page on your site where an alternative exists.
When providing links to pages in other languages, use the name of the target language in the native language and script. Don't assume that the user can read English. For example, in a link to a French page, 'French' would be written 'français'. This also applies if you are guiding the user to a country- or region-specific page or site, eg. 'Germany' would be 'Deutschland'.
The W3C internationalization resources provide information about [http://www.w3.org/TR/i18n-html-tech-bidi/ using <select> to link to localized content]. Additional articles on navigation will appear in the near future.
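The guidance above, naming each language in that language and script, can be sketched as a small Python link builder. `AUTONYMS`, `RTL_LANGS` and `language_links` are hypothetical names; the `hreflang` attribute describes the language of the linked page, `lang` marks the language of the link text itself, and `dir="rtl"` is added for right-to-left names.

```python
from html import escape

# Language names written in their own language and script.
AUTONYMS = {
    "en": "English",
    "fr": "français",
    "de": "Deutsch",
    "ja": "日本語",
    "ar": "العربية",
}
RTL_LANGS = {"ar", "he", "fa", "ur"}  # illustrative, not exhaustive


def language_links(current: str, urls: dict) -> str:
    """Build links to the other language versions of the current page."""
    items = []
    for code, url in urls.items():
        if code == current:
            continue  # no link to the version the user is already reading
        name = escape(AUTONYMS[code])
        dir_attr = ' dir="rtl"' if code in RTL_LANGS else ""
        items.append(
            f'<a href="{escape(url)}" hreflang="{code}" lang="{code}"{dir_attr}>{name}</a>'
        )
    return "\n".join(items)


print(language_links("en", {"en": "/en/", "fr": "/fr/", "ar": "/ar/"}))
```

A French reader sees "français" even on the English page, and nothing assumes that the user can read English.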
- Right-to-left text. For XHTML, add dir="rtl" to the html tag. Only use it again where you need to change directionality.
Text in languages such as Arabic, Hebrew, Persian and Urdu is read from right to left. This reading order typically leads to right-aligned text and mirror-imaging of things like page and table layout. You can set the default alignment and ordering of page content to right to left by simply including dir="rtl" in the html tag.
The direction set in the html tag cascades down through all the elements on the page. It is not necessary to repeat the attribute on lower level elements unless you want to explicitly change the directional flow.
Embedded text in, for example, Latin script still runs left to right within the overall right to left flow. So do numbers. If you are working with right to left languages, you should become familiar with the basics of the Unicode bidirectional algorithm. This algorithm takes care of much of this bidirectional text without the need for intervention from the author. There are some circumstances, however, where markup or Unicode control characters are needed to ensure the correct effect.
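To see the character classes the Unicode bidirectional algorithm works from, Python's standard `unicodedata.bidirectional` returns each character's bidi class: Latin letters are strong left-to-right (`L`), Hebrew letters strong right-to-left (`R`), Arabic letters `AL`, and European digits weak (`EN`). The `first_strong_direction` function below is a simplified sketch of the algorithm's rules P2 and P3, which choose a paragraph's base direction from its first strong character. Unlike the full algorithm, it does not skip text inside directional isolates, so treat it as an approximation.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Simplified bidi rules P2-P3: direction of the first strong character.

    Unlike the full algorithm, this does not skip characters inside
    directional isolates (between LRI/RLI/FSI and the matching PDI).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return None  # no strong character: the direction must come from context


# Latin letter, digit, Hebrew alef, Arabic beh.
print([unicodedata.bidirectional(c) for c in "a1\u05d0\u0628"])  # ['L', 'EN', 'R', 'AL']
print(first_strong_direction("123 \u05e9\u05dc\u05d5\u05dd"))    # rtl
```

The leading digits in the second example do not decide the direction because they are weak; the Hebrew word that follows does.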
Amongst other things, the W3C internationalization resources provide information about how to work with right to left scripts, and a gentle introduction to the basics of handling inline bidirectional text.
- Check your work. Validate! Use techniques, tutorials, and articles at http://www.w3.org/International/
Quick Internationalization Tips for the Web
Original longer version:
- Encoding: Use Unicode wherever possible for content, forms, scripts, databases, etc.; always declare the encoding in the page, and, where appropriate, using the HTTP header.
- Escapes: Only use escapes for characters in exceptional circumstances; use numeric character references rather than character entities if possible.
- Language declaration: Declare the default text-processing language of the document in the document element (eg. the html tag), and use attributes to declare language changes in your document.
- Separate presentation and content: Use stylesheets for presentational information, and use markup in a semantically meaningful way.
- Images, animations & examples: Check these for inappropriate cultural bias and translatability.
- Forms: Take care to support appropriate encodings on both the form and the server, and localized formats for names and addresses, times and dates, etc.
- Text authoring: Use simple, concise text for international sites; avoid scripting techniques that compose sentences from multiple strings.
- Navigation: On each page include clearly visible navigation to any localized pages or sites, and use the target language for navigation prompts.
- Right-to-left text: For XHTML, add dir="rtl" to the html tag and only re-use it where necessary to change directionality; use logical rather than visual ordering.
- Check your work: Validate! Use techniques, tutorials, and FAQs at [http://www.w3.org/International]