Can I write HTML and XML element and attribute tag names in languages and scripts other than English?
HTML and XHTML tag names are all predefined (in English) and must remain that way if they are to be correctly recognized by user agents (e.g. browsers).
In XML it is possible to define your own tag names. You can do this in any language and script supported by Unicode. (More specifically, XML 1.0 supports selected characters from the Unicode Standard version 2.0, while XML 1.1 supports nearly all characters defined by the Unicode Standard versions 3.0 and above.)
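As a rough illustration of the point above, the following sketch parses a small document whose element and attribute names are written in Japanese. The document itself and its names are invented for this example, and Python's standard xml.etree.ElementTree module is used only as a convenient XML 1.0 parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names are Japanese
# (roughly "book", "language" and "title"); they are made up for this sketch.
doc = '<書籍 言語="ja"><題名>吾輩は猫である</題名></書籍>'

root = ET.fromstring(doc)
title = root.find("題名")

# Non-English names are looked up exactly like English ones.
print(root.tag, root.get("言語"), title.text)
```

The parser treats these names no differently from ASCII ones. The practical difficulties are on the human side, as discussed next.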
Although all XML processors must support Unicode, it is sensible to apply some caution here. If a person has to work with a tag set in, say, Chinese, Arabic or Hindi it might prove difficult if they don't speak those languages or don't have the right fonts and rendering software on their system. English tag names have an advantage for DTDs that are used by multinational groups because people from a large number of countries are likely to be able to easily view and understand the meaning of the tags you are using.
On the other hand, non-English tag names can be useful for educational materials. For example, they are common in Japanese XML primers.
Note also that, because numeric character references (NCRs) are not allowed in tag names, using non-ASCII tag names requires you to save the document in a character encoding that supports the characters needed. Using a Unicode encoding such as UTF-8 is usually the best approach.
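The restriction can be checked with a short sketch. The helper names parses and encodable, and the sample documents, are assumptions made for this illustration. Python's standard xml.etree.ElementTree module stands in for any conforming XML parser.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

def encodable(name: str, encoding: str) -> bool:
    """Return True if the name can be written directly in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The character typed directly in the tag name, in a UTF-8 encoded document.
literal = '<?xml version="1.0" encoding="UTF-8"?><名>Tanaka</名>'.encode("utf-8")

# The same character written as an NCR inside the tag name: not well-formed.
with_ncr = b'<?xml version="1.0" encoding="UTF-8"?><&#x540D;>Tanaka</&#x540D;>'

# An NCR in element content is fine, even in an ASCII-only document.
ncr_in_content = b'<?xml version="1.0" encoding="US-ASCII"?><name>&#x540D;</name>'

print(parses(literal), parses(with_ncr), parses(ncr_in_content))
print(encodable("名", "ascii"), encodable("名", "utf-8"))
```

An NCR can rescue a character in content or in an attribute value when the encoding cannot represent it. No such escape exists for names, which is why the document encoding has to cover every character used in them.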
If you are using XML 1.1, almost any character is allowed in a name, but not every character is sensible. For a set of recommendations about which characters to use, see the non-normative appendix "Suggestions for XML Names" (Appendix I) of the XML 1.1 spec.
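To see how permissive XML 1.1 is, here is a minimal sketch of a name check. The ranges are transcribed from the NameStartChar and NameChar productions in section 2.3 of the XML 1.1 spec. The function name is_xml11_name is hypothetical, and the check covers only the grammar, not the recommendations about sensible names.

```python
# Code point ranges from the XML 1.1 NameStartChar production (section 2.3).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character (NameChar):
# "-" and ".", the digits 0-9, U+00B7, U+0300-U+036F, U+203F-U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_xml11_name(name: str) -> bool:
    """Hypothetical helper: does the string match the XML 1.1 Name production?"""
    if not name:
        return False
    if not _in_ranges(name[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in name[1:]
    )

for candidate in ["題名", "chapter-1", "1chapter", "€"]:
    print(candidate, is_xml11_name(candidate))
```

A lone currency sign such as € (U+20AC) falls inside one of the permitted ranges, so the grammar accepts it as a name even though it would be a poor choice for readers. That gap between what is allowed and what is sensible is what the recommendations address.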
For specific information about which characters are allowed in XML tags see the further reading listed below.
XML 1.0 Specification, 2.3 Common Syntactic Constructs http://www.w3.org/TR/REC-xml#sec-common-syn
XML 1.1 Specification, 2.3 Common Syntactic Constructs http://www.w3.org/TR/xml11/#sec2.3
Other W3C I18N resources relating to Character encodings http://www.w3.org/International/resource-index#charset
Content first published 9 June, 2003. Last substantive update 2004-06-28 09:57 GMT. This version 2005-08-22 14:35 GMT
For a summary of significant changes, search for qa-non-eng-tags in the change log.
Copyright © 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved.