

DRAFT! FAQ: What happens when characters get input in a form with a different encoding

Question

What happens to characters typed into a form when the encoding of the form page does not include those characters?

For example: Assume the form page is encoded in iso-8859-1 (Latin-1), and somebody tries to input a Japanese character, or the other way round. Does that work?

How can I ensure that my form can handle all the characters input by the user?

Answer

It may look as if it works, but it does not. For example, the character typed in may show up correctly in the form page, or even sometimes in the page showing the results of your input. But sooner or later, there will be problems. The character may not show up correctly in a results page, it may appear garbled later, e.g. in an email that gets sent to you to confirm your input, and it may end up in a database in a weird form that can't be matched again. It may also just get lost. What happens depends on your browser, and will not work the same way across browsers.
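The failure modes described above can be illustrated with a small sketch. It is not a model of any particular browser: Python's codec error handlers merely stand in for strategies a browser might plausibly use when a character does not fit the page encoding (refusing, substituting a placeholder, or sending a numeric character reference). The helper name `try_encode` is made up for this example.

```python
# Illustration only: Python error handlers stand in for possible
# browser strategies; actual browser behaviour varies.
text = "日本"  # two Japanese characters, not representable in iso-8859-1


def try_encode(value, errors):
    """Encode value as iso-8859-1 using the given error strategy."""
    try:
        return value.encode("iso-8859-1", errors=errors)
    except UnicodeEncodeError:
        return None  # the characters cannot be represented at all


strict = try_encode(text, "strict")              # encoding is refused
lossy = try_encode(text, "replace")              # characters are lost
escaped = try_encode(text, "xmlcharrefreplace")  # numeric character references

print(strict)   # None
print(lossy)    # b'??'
print(escaped)  # b'&#26085;&#26412;'
```

In the last case the server receives the literal text `&#26085;&#26412;`, which may display correctly when echoed into an HTML page but will not match the real characters in a database search or a plain-text email.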

What is the right thing to do as an end user?

Sad though it may be, don't try to go too far. If the page is in English only, for example, it is probably safer to stick to the letters of the basic Latin alphabet (i.e. US-ASCII) for input. If the page is in Japanese, don't try to enter Chinese, Korean, accented Latin letters, and so on, unless the page suggests that this is okay. For example, you may prefer to decide for yourself how your name is represented with a limited set of characters, rather than have the system mangle your name in an arbitrary way.

Even if the form itself works (for example by using UTF-8), there is unfortunately no guarantee that all pieces of the system work equally well. For example, everything might work up to the last step, where the printer's fonts turn out not to include your character. And even if the printer can handle it, the relevant people (e.g. the postman) may not be able to read the text.

What is the right thing to do as an author/programmer?

First, make sure you know what set of characters your overall system can handle. Second, if possible use UTF-8 as the encoding of your form page, to make sure there is no loss of information between browser and server, and to avoid having to change encodings when your overall system gets better and can handle more characters. Third, make the expectations clear to the users of your forms, either in advance near the relevant form fields, or in a reply after submission. Fourth, make sure you check what gets sent from the client back to the server.
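The fourth step, checking what comes back from the client, might look roughly like the following sketch. It assumes the form page is served as UTF-8 and submitted as a urlencoded body. The `supported` policy (printable US-ASCII only) and the function name `check_submission` are hypothetical stand-ins for whatever your overall system can really handle.

```python
from urllib.parse import parse_qs


def supported(ch):
    """Hypothetical policy: the back end handles printable US-ASCII only."""
    return " " <= ch <= "~"


def check_submission(body):
    """Return (fields, problems) for a urlencoded body expected to be UTF-8."""
    try:
        # errors="strict" rejects byte sequences that are not valid UTF-8
        # instead of silently replacing them.
        fields = parse_qs(body.decode("ascii"), keep_blank_values=True,
                          encoding="utf-8", errors="strict")
    except UnicodeError:
        return None, ["body is not valid UTF-8"]
    problems = []
    for name, values in fields.items():
        for value in values:
            bad = sorted({ch for ch in value if not supported(ch)})
            if bad:
                problems.append((name, bad))
    return fields, problems


print(check_submission(b"name=Jose"))        # accepted, no problems
print(check_submission(b"name=Jos%C3%A9"))   # valid UTF-8, but 'é' is flagged
print(check_submission(b"name=Jos%E9"))      # Latin-1 byte, rejected outright
```

Reporting the flagged characters back to the user, rather than silently dropping them, fits the third step: it tells people what the system expects and lets them choose their own replacement.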

How does the charset attribute on the form element influence this?

What's xx-urlencoded have to do with this?

By the way

Background

Useful links


Contributed by Martin Dürst, W3C.


First published 14 January, 2004.
Version: $Id qa-apache-lang-neg.html,v 1.13 2004/01/14 12:57:32 rishida Exp $