Non-English tags

Intended audience: XHTML/HTML coders (using editors or scripting), XML content authors, schema developers (DTDs, XML Schema, RELAX NG, etc.), and anyone who wants to know whether they can use element names in languages other than English.



Can I write HTML and XML element and attribute names in languages and scripts other than English?


HTML and XHTML tags are all predefined (in English) and must remain that way if they are to be correctly recognized by user agents (e.g. browsers).

In XML it is possible to define your own tag names. You can do this in any language and script supported by Unicode.

Although all XML processors must support Unicode, it is sensible to apply some caution here. If a person has to work with a tag set in, say, Chinese, Arabic or Hindi, it might prove difficult if they don't speak those languages or don't have the right fonts and rendering software on their system. English tag names have an advantage for DTDs that are used by multinational groups, because people from a large number of countries are likely to be able to view the tags easily and understand what they mean.

On the other hand, non-English tag names can be useful for educational materials. For example, they are common in Japanese XML primers.

By the way

Note that, because numeric character references (NCRs) are not allowed in tag names, using non-ASCII tag names requires you to use a character encoding that supports the characters needed. Using a Unicode encoding such as UTF-8 is always the best approach.
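The difference can be checked with a short experiment. The sketch below, again assuming Python's standard-library parser, tries three inputs: a tag name written with the literal character 書 (U+66F8) in a UTF-8 encoded document, the same name written as a numeric character reference, and a reference used in element content. The helper name parses and the sample documents are invented for this example.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if data is accepted by the parser as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Literal character in the tag name, serialized as UTF-8.
literal_name = '<?xml version="1.0" encoding="UTF-8"?><書>text</書>'.encode("utf-8")

# The same character written as a numeric character reference in the tag name.
ncr_name = b"<&#x66F8;>text</&#x66F8;>"

# A numeric character reference in element content is fine.
ncr_content = b"<doc>&#x66F8;</doc>"

print(parses(literal_name))   # True
print(parses(ncr_name))       # False
print(ET.fromstring(ncr_content).text == "\u66f8")   # True
```

The second input is rejected because a character reference is only recognized in content and attribute values, not inside a name, so the character itself has to be representable in the document's encoding.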

If you are using XML 1.1, almost any character is allowed, but not every character is sensible. For a set of recommendations about which characters to use, see Appendix B of the XML 1.1 spec.
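To make "almost any character" more concrete, here is a rough sketch of a name checker. It assumes the NameStartChar and NameChar productions as we understand them from the XML 1.1 specification; the function name is_xml_name is invented for this example, and the specification itself remains the authority on the exact ranges. The check only says whether a name is allowed, not whether it is a sensible choice.

```python
import re

# Characters that may start a name (assumed from the NameStartChar production).
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# Characters allowed after the first one (assumed from the NameChar production):
# everything above plus hyphen, full stop, digits and a few combining ranges.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")

def is_xml_name(candidate):
    """Return True if candidate matches the Name pattern sketched above."""
    return NAME_RE.fullmatch(candidate) is not None

print(is_xml_name("書籍"))   # True: CJK ideographs may start a name
print(is_xml_name("章1"))    # True: a digit is allowed after the first character
print(is_xml_name("1章"))    # False: a digit may not start a name
print(is_xml_name("a b"))    # False: spaces are not name characters
```

A name that passes this check can still be a poor choice, for example if it mixes scripts or uses characters that few fonts support.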

For specific information about which characters are allowed in XML tags, see the further reading listed below.