Common HTML entities used for typography

From W3C Wiki

Introduction

This part of the Web Standards Curriculum looks at the codes that can be used to represent text characters when they need to be escaped. A number of HTML entities come in handy when first-rate typesetting is called for. Many of those listed in Table 1 are useful only in foreign-language copy (and copy written in specific dialects of English), so take context into account before choosing to use them.

For the sake of portability, numeric (Unicode) character references should be reserved for documents that are certain to be encoded as UTF-8 or UTF-16. In all other cases, the named (alphanumeric) references should be used.
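The rule above can be sketched in Python as a small helper that rewrites every non-ASCII character as a reference, preferring the named form and falling back to the numeric one. The function name `to_portable_references` is hypothetical, and the sketch assumes that Python's `html.entities.codepoint2name` map (the HTML 4.01 entity set) is an acceptable source of names. It does not escape the markup characters &, < and >.

```python
from html.entities import codepoint2name


def to_portable_references(text: str) -> str:
    """Replace each non-ASCII character with a character reference.

    A named reference is used when the HTML 4.01 entity table has one;
    otherwise the decimal numeric reference is used.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through untouched (including & < >).
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


# U+00A9 copyright, U+2013 en dash, U+2603 snowman (no HTML 4.01 name).
print(to_portable_references("\u00a9 2008 \u2013 2009 \u2603"))
# prints: &copy; 2008 &ndash; 2009 &#9731;
```

The output is pure ASCII, so it survives being saved or served in any ASCII-compatible encoding.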

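| Character(s) | Literal(s) | Named (alphanumeric) reference(s) | Numeric (Unicode) reference(s) | Preferable to |
|---|---|---|---|---|
| Cent (currency) | ¢ | &cent; | &#162; | |
| Pound (currency) | £ | &pound; | &#163; | |
| Section [1] | § | &sect; | &#167; | |
| Copyright | © | &copy; | &#169; | (c) |
| Guillemets [2] | « » | &laquo; &raquo; | &#171; &#187; | " |
| Registered trademark | ® | &reg; | &#174; | (R) |
| Degree(s) | ° | &deg; | &#176; | |
| Plus/minus | ± | &plusmn; | &#177; | +/- |
| Pilcrow (paragraph) [3] | ¶ | &para; | &#182; | |
| Middle dot [4] | · | &middot; | &#183; | |
| Fractional half [5] | ½ | &frac12; | &#189; | 1/2 |
| En dash [6, 7] | – | &ndash; | &#8211; | - for ranges |
| Em (long) dash [7, 8] | — | &mdash; | &#8212; | - enclosed by spaces, or -- |
| Single quotes [9, 10] | ‘ ’ | &lsquo; &rsquo; | &#8216; &#8217; | ' or ` |
| Single low quote [11] | ‚ | &sbquo; | &#8218; | ' or comma |
| Double quotes [9] | “ ” | &ldquo; &rdquo; | &#8220; &#8221; | ", '' or `` |
| Double low quote [11] | „ | &bdquo; | &#8222; | " or ,, |
| Single & double daggers | † ‡ | &dagger; &Dagger; | &#8224; &#8225; | * and ** |
| Bullet | • | &bull; | &#8226; | * |
| Ellipsis [12] | … | &hellip; | &#8230; | ... |
| Prime & double prime [13] | ′ ″ | &prime; &Prime; | &#8242; &#8243; | ', ", or minutes:seconds elapsed |
| Euro sign | € | &euro; | &#8364; | |
| Trademark | ™ | &trade; | &#8482; | (tm) |
| Almost equal to | ≈ | &asymp; | &#8776; | ~ |
| Not equal to | ≠ | &ne; | &#8800; | != |
| Less/greater than or equal to | ≤ ≥ | &le; &ge; | &#8804; &#8805; | <= or >= |
| Less/greater than | < > | &lt; &gt; | &#060; &#062; | |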

Table 1: HTML entities useful for proper typesetting, listed in order by decimal Unicode position.
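Because Table 1 pairs each named reference with a numeric one, a transcription slip in either column is easy to make. The following is a minimal Python sketch of a consistency check, assuming the standard library's `html.unescape` is a fair stand-in for how a browser decodes references. The `PAIRS` sample and the `mismatches` helper are illustrative names, and only a few rows of the table are included.

```python
from html import unescape

# A few named/numeric pairs, keyed by the named form.
PAIRS = {
    "&copy;": "&#169;",
    "&frac12;": "&#189;",
    "&ndash;": "&#8211;",
    "&mdash;": "&#8212;",
    "&hellip;": "&#8230;",
    "&le;": "&#8804;",
    "&lt;": "&#060;",
    "&gt;": "&#062;",
}


def mismatches(pairs):
    """Return the pairs whose two references decode to different characters."""
    return [(named, numeric) for named, numeric in pairs.items()
            if unescape(named) != unescape(numeric)]


print(mismatches(PAIRS))  # an empty list means every pair agrees

# Hyphen, en dash and minus sign are three distinct code points.
for ref in ("&#45;", "&ndash;", "&minus;"):
    print(ref, f"U+{ord(unescape(ref)):04X}")
```

A check like this would flag, for example, a row that pairs &frac12; with &#188;, since the latter decodes to the one-quarter fraction.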


Note that guillemets are used as quotation marks in certain European languages (such as French and Norwegian); in these situations, mark up the quotation with the q element rather than typing literal guillemets.


HTML entity usage notes

  1. Citations of statute law, e.g., “29 USC § 794 (d),” are where this character is most likely to appear.
  2. Guillemets often enclose the names of stories, songs, films, public accommodations (e.g., «Rick’s Café Americain»), and popular toponyms in European languages, particularly those of the Romance sub-family. They are also used as quotation marks in certain European languages (such as French and Norwegian); in these situations, mark up the quotation with the q element rather than typing literal guillemets.
  3. The pilcrow, used to mark the beginning of paragraphs that might otherwise be ambiguous, is useful when setting teaser copy. The print distribution of Rolling Stone magazine has often used such an approach. In technical writing, it might also be useful for marking an orphaned first line of a paragraph. ¶ Paragraphs marked with this symbol will most often be assigned a display value of inline, which will be explained in the introduction to the CSS layout model.
  4. The middle dot is an anachronistic analogue to the decimal point, still used by some designers to enumerate amounts of decimalized currency.
  5. HTML also provides references to the code positions for the one-quarter (&frac14;/&#188;) and three-quarters (&frac34;/&#190;) fractions.
  6. The en dash is used between two quantities or dates to suggest a range. In many typefaces it is hard to tell apart from a proper minus sign (&minus;/&#8722;), although the two are distinct characters. It should always be distinguished from a hyphen (&#45;), which is used to separate the parts of an ad hoc compound word.
  7. Browsers create soft line breaks after hyphens (see above). Whether they also do so after en dashes or em dashes varies by browser, so it is best not to rely on either behavior.
  8. The exclusive use of the em dash in English is to mark one or both ends of a dependent clause in lieu of parentheses, and to indicate that if spoken aloud the clause should be preceded and followed by uninflected pauses. In several other languages — particularly those of the Slavic sub-family — em dashes indicate dialogue from the beginning of a paragraph. Tradition dictates that this character not be enclosed itself by spaces, but the thoughtful user of markup may wish to do just that in order to avoid an especially ragged line.
  9. These are the members of the automated “Smart Quotes” set of characters incorporated into most popular word processing platforms. They are often encoded at vendor-specific code positions rather than Unicode or ISO Latin code positions, which can cause problems when they are copied into a Web document.
  10. The single close quote character is also used in English as the apostrophe.
  11. Low quotes are used in several Central and Eastern European languages in preference to the analogous English opening quote characters.
  12. Since the ellipsis is a single character, the tracking of its constituent glyphs will not be affected by any value set for the letter-spacing or text-align properties.
  13. Primes are used to denote minutes (of both time elapsed and arc) and feet as units of measurement; the double prime in its turn denotes seconds and inches. The use of these characters in relation to units of time elapsed has decreased in popularity in recent years, a decrease that correlates strongly with the increased availability of word processing systems (and their common use by non-specialist operators). Many fonts use prime and double prime characters indistinguishable from single and double close quotes, but for reasons of portability these entities should still be used when called for, notwithstanding the characteristics of the intended display face.
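Note 9 can be illustrated with a short Python sketch. It assumes the pasted bytes come from a word processor using Windows-1252 (`cp1252`), a common vendor-specific encoding for "Smart Quotes". The helper name `smart_bytes_to_references` is hypothetical. The bytes are decoded with the encoding they were written in, then every non-ASCII character is re-emitted as a numeric reference using Python's built-in `xmlcharrefreplace` error handler.

```python
def smart_bytes_to_references(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Decode vendor-encoded text and emit ASCII with numeric references."""
    text = raw.decode(source_encoding)
    # xmlcharrefreplace turns each unencodable character into &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# 0x93/0x94 are curly double quotes, 0x92 the apostrophe, 0x97 an em dash
# and 0x85 an ellipsis in Windows-1252.
raw = b"\x93It\x92s here\x94 \x97 maybe\x85"

print(smart_bytes_to_references(raw))
# prints: &#8220;It&#8217;s here&#8221; &#8212; maybe&#8230;

# Read as ISO Latin 1 instead, the same bytes become invisible C1 control
# characters, which is the copy-and-paste problem the note describes.
print(repr(raw.decode("latin-1")[0]))
```

Decoding with the wrong encoding loses the quotes silently, so the source encoding should be confirmed rather than guessed where possible.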

Note: This material was originally published as part of the Opera Web Standards Curriculum, available as Supplementary: Common HTML entities used for typography, written by Ben Henick. Like the original, it is published under the Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.