
GEO WG Collaborative editing page

Follow the conventions for editing this page.

Status: Initial Draft, i.e. please focus on technical content rather than wordsmithing at this stage.

See the [GEO home page].

Author: Richard Ishida

Getting Started

Finding information on this site

Ways of finding information you need on this site include:

  • Topic Index: describe....
  • Techniques Index: describe ...
  • Site search: At the top right of the Activity home page and the second level navigation pages you can find a small field that allows you to search for text in documents within the /International area of the W3C site.
  • Task Index: If you have in mind one of the specific tasks we have created tags for, you can follow a link to a list of documents that carry that tag. The current list is:
    • browsing content
    • creating XHTML/HTML code (either using an editor or via scripting)
    • developing scripts (such as PHP, JSP, etc.)
    • developing CSS code
    • developing document schemas (DTDs, XML Schema, RelaxNG, etc.)
    • developing XSLT applications
    • managing Web projects
    • etc etc (to be completed)

Overview of information on the site (pretty lame title, needs improving)

This section (or page) provides an overview of the information you will find in the W3C Internationalization Activity site, and introduces you to concepts. It also helps you decide what information is relevant to you, and sets you on the path to finding it.

Character sets and encodings

You should read this if you: create XHTML/HTML code, develop scripts, develop CSS code, develop XSLT applications,...

A character encoding is the key that maps the bytes stored in a computer to the characters needed to display content. There are many possible encodings, so content must be correctly labelled if you want people to be able to read it.
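
As a small illustration of why the label matters, the sketch below reads the same bytes under two different encoding labels. The sample word and the variable names are made up for this example.

```python
# The same sequence of bytes, interpreted under two different labels.
data = "café".encode("utf-8")  # four characters stored as five bytes

# Label matches the bytes: the original text comes back.
correct = data.decode("utf-8")

# Wrong label: ISO-8859-1 maps every byte to some character, so no error
# is raised, but the two bytes of "é" show up as two unrelated characters.
mislabelled = data.decode("iso-8859-1")

print(correct)
print(mislabelled)
```

Note that the mislabelled case fails silently, which is why garbled text is often the first sign of a wrong declaration.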

Everyone developing content, whether content authors or programmers, must decide what character encoding to use. UTF-8 is a popular recommendation these days, but there may still be things you should consider before using it.

Once an encoding has been chosen, content developers and programmers must ensure that it is declared in the right way. (This is not always straightforward with a technology such as XHTML.) They must also ensure that the data is actually saved in the chosen encoding.
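
The two requirements (declare the encoding, and save in that same encoding) can be sketched as follows. This is only an illustration: the function names `build_page`, `save_page` and `demo` are hypothetical, the page uses an HTML `meta charset` declaration rather than the XHTML mechanisms, and the body text is assumed to contain no markup characters.

```python
import os
import tempfile

ENCODING = "utf-8"  # assumption: the encoding chosen for the page


def build_page(body: str, encoding: str = ENCODING) -> str:
    """Return a minimal HTML page whose in-document declaration names `encoding`."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{encoding}"><title>Example</title></head>\n'
        f"<body><p>{body}</p></body></html>\n"
    )


def save_page(path: str, body: str, encoding: str = ENCODING) -> None:
    """Write the page to disk using the same encoding that it declares."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(build_page(body, encoding))


def demo() -> bytes:
    """Save a page to a temporary file and return the raw bytes that were written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")
        save_page(path, "Größe")
        with open(path, "rb") as f:
            return f.read()
```

Passing a single `encoding` value to both the declaration and the `open` call is one way to keep the two from drifting apart.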

Content developers and webmasters may also need to ensure that the server serves content with the correct character encoding declarations.
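
On the server side, the declaration usually travels in the `charset` parameter of the HTTP `Content-Type` header. The sketch below builds such a header value and reads the parameter back using Python's standard `email.message.Message` class; the helper names are hypothetical, and how the header is actually configured depends on your server.

```python
from email.message import Message


def content_type_header(media_type: str, encoding: str) -> str:
    """Build a Content-Type value that carries an explicit charset parameter."""
    return f"{media_type}; charset={encoding}"


def charset_from_header(value: str):
    """Return the charset named in a Content-Type value, or None if it has none."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the parameter value.
    return msg.get_content_charset()


print(content_type_header("text/html", "utf-8"))
print(charset_from_header("text/html; charset=UTF-8"))
print(charset_from_header("text/html"))
```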

Escapes provide a way of representing characters that are not available in the character encoding you are using, or of avoiding the direct use of a character for other reasons (such as when it may conflict with syntax). Articles on the site explore when and how these escapes should be used.
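
Both uses of escapes can be illustrated with Python's standard library. The sample strings and variable names are invented for this example, and it does not attempt to show when escapes are advisable.

```python
import html

# Case 1: a character that the target encoding cannot represent.
# The euro sign is not in ASCII, so it is written as a numeric character reference.
ascii_safe = "5 €".encode("ascii", errors="xmlcharrefreplace")

# Case 2: characters that would conflict with markup syntax.
escaped = html.escape("a < b & c")

# A parser that understands the escapes recovers the original characters.
restored = html.unescape(ascii_safe.decode("ascii"))

print(ascii_safe)
print(escaped)
print(restored)
```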

Character encoding decisions and an understanding of how character encodings work are also important when dealing with forms and Web addresses.
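
To see why the encoding matters for Web addresses and form data, consider percent-encoding: the same character produces different escapes depending on the encoding applied first. The sketch below uses Python's `urllib.parse`; the query parameter name `q` is just an example.

```python
from urllib.parse import quote, unquote, urlencode

# The same character, percent-encoded from two different byte representations.
as_utf8 = quote("é", encoding="utf-8")
as_latin1 = quote("é", encoding="iso-8859-1")

# Form data submitted as a query string, encoded as UTF-8.
query = urlencode({"q": "café"}, encoding="utf-8")

# The receiver must assume the same encoding to recover the character.
decoded = unquote(as_utf8, encoding="utf-8")

print(as_utf8, as_latin1, query, decoded)
```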

There is additional material on the site to help you with such things as:

  • understanding byte order marks and how to handle them
  • knowing when to use Unicode characters and when to use markup
  • handling control characters
  • checking that your file is in the encoding you think it is, and checking the encoding of documents sent to your user agent.
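
As a rough sketch of two of the checks listed above, the following looks for a byte order mark at the start of some data and tests whether the data decodes cleanly in a given encoding. The function names are hypothetical. A successful decode is only a hint, since some encodings (ISO-8859-1, for example) accept any byte sequence.

```python
import codecs

# UTF-32 marks are listed before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

For example, `decodes_as(raw, "utf-8")` returning False for a file you believed to be UTF-8 suggests it was saved in some other encoding.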

To find more detailed information about character sets and encodings, see the [topic index].


DRC 24 Aug 05 - Since this is a getting started document, we should make it clear that this refers to human language, not programming language. If content is labelled with language information, this can be used to affect

More along the lines of the previous section is needed for this and the following sections, which echo the topic index, i.e.:

  • Fonts
  • Bidirectional text
  • Text formatting
  • Resource identifiers
  • Locale specific data
  • Forms
  • Document structure and metadata
  • Navigation
  • Miscellaneous