Characters or markup?

Answer

The answer depends on which characters are being considered. For more detail you should read the W3C Note and Unicode Technical Report Unicode in XML & Other Markup Languages. This article will summarize some of that information.

Some Unicode characters are not suitable for use with markup

The following table lists Unicode characters that should not be used in a markup context, according to Unicode in XML & Other Markup Languages. You should use markup instead.

Names/ Description	Short Comment
Line and paragraph separator	use <br>, <p>, or equivalent
BIDI embedding controls (LRE, RLE, LRO, RLO, PDF)	Strongly discouraged where markup exists.
Activate/Inhibit Symmetric swapping	Deprecated in Unicode
Activate/Inhibit Arabic form shaping	Deprecated in Unicode
Activate/Inhibit National digit shapes	Deprecated in Unicode
Interlinear annotation characters	Use ruby markup
Byte order mark / ZWNBSP	Use only as byte order mark. Use U+2060 Word Joiner instead of using U+FEFF as ZWNBSP
Object replacement character	Use markup, e.g. HTML <object> or HTML <img>
Scoping for Musical Notation	Use an appropriate markup language
Language Tag code points	Use lang and/or xml:lang
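
As a rough illustration of how the table above might be applied, the following sketch scans a string for the listed characters. The function name `find_discouraged`, the `DISCOURAGED` list, and the labels are hypothetical. The code point ranges are an assumption about which characters each table row covers; in particular, the whole Tags block (U+E0000 to U+E007F) is treated as "language tag code points" here.

```python
# Inclusive code point ranges, one entry per row of the table above.
# The ranges and labels are assumptions made for this sketch.
DISCOURAGED = [
    (0x2028, 0x2029, "line/paragraph separator"),
    (0x202A, 0x202E, "bidi embedding control"),
    (0x206A, 0x206B, "symmetric swapping (deprecated)"),
    (0x206C, 0x206D, "Arabic form shaping (deprecated)"),
    (0x206E, 0x206F, "national digit shapes (deprecated)"),
    (0xFFF9, 0xFFFB, "interlinear annotation"),
    (0xFEFF, 0xFEFF, "byte order mark / ZWNBSP"),
    (0xFFFC, 0xFFFC, "object replacement character"),
    (0x1D173, 0x1D17A, "musical notation scoping"),
    (0xE0000, 0xE007F, "language tag code point"),
]


def find_discouraged(text):
    """Return (index, 'U+XXXX', label) for each discouraged character."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        # A leading U+FEFF is a legitimate byte order mark, so skip it.
        if i == 0 and cp == 0xFEFF:
            continue
        for lo, hi, label in DISCOURAGED:
            if lo <= cp <= hi:
                hits.append((i, f"U+{cp:04X}", label))
                break
    return hits


if __name__ == "__main__":
    for hit in find_discouraged("first line\u2028second line"):
        print(hit)
```

A check like this only reports where such characters occur. Deciding which markup should replace them still depends on the markup language in use.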

The bidirectional text embedding controls, in particular, often cause confusion. In some places they have to be used to produce correctly ordered bidirectional text in languages that use right-to-left scripts, such as Arabic, Hebrew, and Thaana. These are places where an element doesn't allow embedded markup, such as the title element. Where markup is available, however, you should use it. For more information about this, see Unicode controls vs. markup for bidi support. For guidance on how to use the embedding controls in situations where markup cannot be used, see Using Unicode controls for bidi text.
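
To make the distinction concrete, here is a minimal sketch of wrapping a run of text in embedding controls for a context such as the title element, where markup is not allowed. The helper name `embed` is hypothetical. The code points are those of RLE (U+202B), LRE (U+202A), and PDF (U+202C).

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(text, direction):
    """Wrap text in an embedding control pair for markup-free contexts."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = RLE if direction == "rtl" else LRE
    # Every opening control needs a matching PDF to close the embedding.
    return opener + text + PDF


if __name__ == "__main__":
    # Plain-text title: no markup available, so use the controls.
    title = "Review: " + embed("\u05E9\u05DC\u05D5\u05DD!", "rtl")
    print([f"U+{ord(c):04X}" for c in title if c in (LRE, RLE, PDF)])
```

In body content, where markup is available, the same effect would normally be expressed with markup such as `<span dir="rtl">…</span>` rather than with these characters.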

Other Unicode characters are OK

The following table is not exhaustive. It gives some examples of Unicode characters that are appropriate to use alongside markup to provide information about the text.

Names/ Description	Short Comment
Various	No-break space, Soft Hyphen, Combining Grapheme Joiner, Non-breaking Hyphen, Word Joiner, etc.
Zero-width Joiners (ZWJ and ZWNJ)	e.g. required for Persian
Implicit directional marks (LRM and RLM)
Subtending marks	common feature in the Arabic and Syriac scripts
Variation Selectors	e.g. required for Mongolian
Ideographic Description Characters	indicate the composition of ideographs

'Compatibility characters' vary in appropriateness

This is taken from Unicode in XML & Other Markup Languages:

The Unicode Standard provides compatibility mappings for a number of characters. Compatibility mappings indicate a relationship to another character, but the exact nature of the relationship varies. In some cases the relationship means "is based on", in some other cases it denotes a property. When plain text is marked up, it may make sense to map some of these characters to their compatibility equivalents and suitable markup. It is important to understand the nature of the distinctions between characters and their compatibility equivalents and the context in which these distinctions matter. It is never advisable to apply compatibility mappings indiscriminately.

The following table gives a non-exhaustive list of examples.

Names/ Description	Examples	Verdict
Circled letters and digits used for list item markers	① ② ③ Ⓐ Ⓑ Ⓒ ㊂㊃㊄㊓㊔㊕㋝㋞㋟	OK
Parenthesized or dotted number used as list item markers	⑴ ⑵ ⑶	use list item marker style
Arabic Presentation forms	ﻉ ﻊ ﻋ ﻌ	normalize
Half-width and full-width characters	ﾔﾕﾖﾗａｂｃｄ	OK
Superscripted and subscripted characters	¹ ² ³ ₁ ₂ ₃	use <sup> or <sub> markup
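
The difference between selective and indiscriminate mapping can be sketched with Python's standard `unicodedata` module. The function name `sup_sub_to_markup` is hypothetical, and limiting the conversion to superscript and subscript digits is an assumption made for this example, so that other characters with a `<super>` mapping, such as phonetic modifier letters, are left alone.

```python
import unicodedata


def sup_sub_to_markup(text):
    """Replace superscript/subscript digits with <sup>/<sub> markup."""
    out = []
    for ch in text:
        # Compatibility mappings look like '<super> 0032' (for U+00B2).
        decomp = unicodedata.decomposition(ch)
        tag, _, rest = decomp.partition(" ")
        if tag in ("<super>", "<sub>"):
            base = "".join(chr(int(cp, 16)) for cp in rest.split())
            # Assumption for this sketch: convert only ASCII digits.
            if base.isascii() and base.isdigit():
                element = "sup" if tag == "<super>" else "sub"
                out.append(f"<{element}>{base}</{element}>")
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "x\u00B2 + a\u2081, item \u2460"
    # Selective: digits become markup, the circled digit is kept.
    print(sup_sub_to_markup(sample))
    # Indiscriminate: NFKC flattens everything, losing the distinctions.
    print(unicodedata.normalize("NFKC", sample))
```

Applying NFKC to the whole string turns both the superscript two and the circled one into plain digits, which is the kind of information loss the quoted passage warns about.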
