
FAQ: Non-English tags

Question

Can I write HTML and XML element and attribute tag names in languages and scripts other than English?

Answer

HTML and XHTML element and attribute names are all predefined (in English) and must remain that way if they are to be recognized correctly by user agents (e.g., browsers).

In XML you can define your own tag names, in any language and script supported by Unicode. (More specifically, XML 1.0 supports selected characters from version 2.0 of the Unicode Standard, while XML 1.1 supports nearly all characters defined by versions 3.0 and above.)

Although all XML processors must support Unicode, some caution is sensible here. Someone who has to work with a tag set in, say, Chinese, Arabic or Hindi may find it difficult if they don't read those languages, or if their system lacks the right fonts and rendering software. English tag names have an advantage for DTDs used by multinational groups, because people in many countries are likely to be able to view them easily and understand what they mean.

On the other hand, non-English tag names can be useful in educational materials. For example, they are common in Japanese XML primers.

By the way

Note also that, because numeric character references (NCRs) are not allowed in tag names, using non-ASCII tag names requires a character encoding that can represent the characters you need. A Unicode encoding such as UTF-8 is usually the best choice.
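The sketch below demonstrates both halves of that point with Python's standard `xml.etree.ElementTree` (an assumed choice of parser). A literal tag name stored as UTF-8 parses fine; ASCII cannot hold it at all; and the NCR `&#x672C;` (U+672C, 本) is accepted in text content but rejected when used as a tag name. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Literal non-ASCII tag name, stored as UTF-8 bytes with a matching declaration.
doc = '<?xml version="1.0" encoding="UTF-8"?><本>text</本>'
root = ET.fromstring(doc.encode("utf-8"))
print(root.tag)  # 本

# An encoding that lacks the character cannot store the tag name at all.
try:
    doc.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can hold the tag name:", ascii_ok)  # False

# &#x672C; is the NCR for U+672C (本); it is allowed in text content...
text_root = ET.fromstring("<book>&#x672C;</book>")
print(text_root.text)  # 本

# ...but not in a tag name, so the parser reports a well-formedness error.
try:
    ET.fromstring("<&#x672C;>text</&#x672C;>")
    ncr_tag_ok = True
except ET.ParseError:
    ncr_tag_ok = False
print("NCR accepted in tag name:", ncr_tag_ok)  # False
```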

If you are using XML 1.1, almost any character is allowed in names, but not every character is sensible. For a set of recommendations about which characters to use, see Appendix B of the XML 1.1 specification.
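To make "almost any character is allowed" concrete, here is a minimal checker sketch. The code-point ranges are a transcription of the NameStartChar and NameChar productions in section 2.3 of XML 1.1; the function name `is_xml11_name` is hypothetical, and the check covers only which characters a name may contain, not the usage recommendations mentioned above or namespace rules.

```python
# Code-point ranges transcribed from the XML 1.1 NameStartChar production.
NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra characters NameChar allows after the first position:
# "-", ".", digits, U+00B7, combining marks, and U+203F-U+2040.
NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(cp: int, ranges: list[tuple[int, int]]) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_name(name: str) -> bool:
    """Hypothetical helper: does `name` match the XML 1.1 Name production?"""
    if not name:
        return False
    if not in_ranges(ord(name[0]), NAME_START):
        return False
    return all(in_ranges(ord(c), NAME_START + NAME_EXTRA) for c in name[1:])


for candidate in ["本", "題名", "1abc", "€", "×"]:
    print(candidate, is_xml11_name(candidate))
# 本 True, 題名 True, 1abc False, € True, × False
```

Note that the euro sign passes even though it is an odd choice for an element name, which is exactly the gap between "allowed" and "sensible".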

For specific information about which characters are allowed in XML tags, see the further reading listed below.


Further reading

Author: Richard Ishida, W3C.


Content first published 9 June, 2003. Last substantive update 2004-06-28 09:57 GMT. This version 2005-08-22 14:35 GMT

For a summary of significant changes, search for qa-non-eng-tags in the change log.