Mathematical Markup Language (MathML) Version 2.0
6 Entities, Characters and Fonts
6.1 Introduction
6.1.1 The Intent of Entity Names
6.1.2 The STIX Project
6.1.3 Entity Listings
6.1.4 Non-Marking Entities
6.1.5 Printing Entity Listings
6.1.6 Special Constants
6.1.7 Alphabetical Lists
6.1.8 ISO Entity Set Groupings
Notation has proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have been developed for use in mathematical notation, and mathematicians have not held back from adopting symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding, and difficult to read mathematics if glyphs are not available for presentation on specific display devices.
This situation poses a problem for the W3C Math Working Group. It does not fall naturally within the purview of a mathematics-for-HTML specification and DTD production to worry about more than the entities allowed in the DTD. Moreover, as experience has shown, a long list of entities with no means to display them is of little use, and a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs or characters without a standard way to refer to them is not of much use either.
The W3C Math Working Group has therefore taken on directly the specification of part of the full mechanism for proceeding from notation to final presentation, and is collaborating with organizations undertaking the specification of the rest.
For instance, we try to use entity names that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as mathematics is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. We hope to be able to agree with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they are presently undertaking.
The STIX project of the STIPUB group of scientific and technical publishers has also been working toward a common collection of mathematical symbols and names. The W3C Math Working Group expects to issue further updates on the matter of character entities as a consequence of this project's useful work. For the latest character tables and font information, see the W3C Math Working Group home page.
The STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project, set up by the STIPUB group of publishers, aims to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use is being produced by collaborating publishing organizations. The team will propose to the Unicode consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. Finally the STIX project will commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication.
This chapter of the MathML Specification contains a listing of entities for use in MathML.
To provide more background on the characters used in mathematics, we have used a larger comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science and to Wolfram Research (makers of Mathematica®) for making available to us so much useful data.
Some character entities, although important for the quality of print rendering, do not directly correspond to glyph marks. They are called here non-marking entities. Below is a table of those adopted for the purposes of MathML. Their roles are discussed in chapter 3 [Presentation Markup] and chapter 4 [Content Markup]. The values of the spaces given are recommendations. Some of these characters do not already have Unicode values; they have been assigned arbitrary values in the E8 range of the Private Zone. The correspondence between the spacing values mentioned below and those in the Unicode descriptions is not exact, but the values are good matches.
Entity name  Unicode  Description
&Tab;  0009  tabulator stop; horizontal tabulation
&NewLine;  000A  force a line break; line feed
&IndentingNewLine;  E891  force a line break and indent appropriately on next line
&NoBreak;  E892  never break line here
&GoodBreak;  E893  if a linebreak is needed, here is a good spot
&BadBreak;  E894  if a linebreak is needed, try to avoid breaking here
&Space;  0020  one em of space in the current font
&NonBreakingSpace;  00A0  space that is not a legal breakpoint
&ZeroWidthSpace;  200B  space of no width at all
&VeryThinSpace;  200A  space of width 1/18 em
&ThinSpace;  2009  space of width 3/18 em
&MediumSpace;  2005  space of width 4/18 em
&ThickSpace;  E897  space of width 5/18 em
&NegativeVeryThinSpace;  E898  space of width -1/18 em
&NegativeThinSpace;  E899  space of width -3/18 em
&NegativeMediumSpace;  E89A  space of width -4/18 em
&NegativeThickSpace;  E89B  space of width -5/18 em
&InvisibleComma;  E89C  used as a separator, e.g. in indices (section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
&ic;  E89C  short form of &InvisibleComma;
&InvisibleTimes;  E89E  marks multiplication when it is understood without a mark (section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
&it;  E89E  short form of &InvisibleTimes;
&ApplyFunction;  E8A0  character showing function application in presentation tagging (section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
&af;  E8A0  short form of &ApplyFunction;
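The recommended space widths in the table above are stated in eighteenths of an em, so they can be converted to an absolute length once a font size is chosen. The following is a minimal sketch in Python, not part of the specification: the names `WIDTH_IN_EM` and `width_in_points` are illustrative, the rows are keyed by the Unicode hex codes listed above, and it assumes the usual typographic convention that one em equals the current font size.

```python
from fractions import Fraction

# Recommended widths copied from the table above, keyed by the listed
# Unicode hex code and stored as exact fractions of an em.
WIDTH_IN_EM = {
    "200B": Fraction(0),      # space of no width at all
    "200A": Fraction(1, 18),  # space of width 1/18 em
    "2009": Fraction(3, 18),  # space of width 3/18 em
    "2005": Fraction(4, 18),  # space of width 4/18 em
    "E897": Fraction(5, 18),  # space of width 5/18 em
}


def width_in_points(hex_code: str, font_size_pt: float) -> float:
    """Return the width of a space in points.

    Assumption (not from the specification): one em is taken to be
    equal to the font size given in points.
    """
    return float(WIDTH_IN_EM[hex_code] * Fraction(font_size_pt))


if __name__ == "__main__":
    for code in sorted(WIDTH_IN_EM):
        print(code, width_in_points(code, 12))
```

Since the widths are only recommendations, a renderer may well use different values; the sketch shows only the arithmetic.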
Since the situation concerning availability of character codes from Unicode and under ISO 9573-13 is not yet fully clear at the time of writing, we have decided to proceed conservatively.
We have taken the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and have added a number of aliases based on the practice of the mathematical typesetting community. Thus the main influence outside ISO has been the names to be found in the TeX community.
To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.
We have entities listed by name and sample glyphs for all of them. Each entity name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646. Those codes beginning with the hex digit E, e.g. E321, indicate assignments to the private zone of Unicode. This indicates that the character in question is not at present an official Unicode character. It is highly recommended that authors use entity names instead of Unicode values, especially for those characters in the Unicode private zone, as those values may change. It is hoped that most of these characters will become officially endorsed by Unicode and ISO under its 10646 standard in due course. In any case we expect fonts for these characters to become publicly available as the use of MathML develops. If the entity name is an alias then a reference back to the ISO form is given if there is one, and to a preferred form if not. The ISO or preferred forms have references to their alternates where they exist.
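Whether a listed code falls in the private zone can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: the helper `is_private_use` and the `SAMPLE` dictionary are illustrative names, the rows are copied from the tables in this chapter, and the range U+E000 to U+F8FF is the Private Use Area of the Unicode Basic Multilingual Plane. A range test is used rather than a test on the leading hex digit because some private assignments in this chapter begin with F (for example F74B).

```python
import unicodedata

# Private Use Area of the Basic Multilingual Plane (U+E000..U+F8FF).
PUA_START, PUA_END = 0xE000, 0xF8FF


def is_private_use(hex_code: str) -> bool:
    """Return True if a hex code from the listings lies in the private zone."""
    return PUA_START <= int(hex_code, 16) <= PUA_END


# Sample rows copied from the tables in this chapter:
# entity name -> Unicode hex code.
SAMPLE = {
    "IndentingNewLine": "E891",
    "GoodBreak": "E893",
    "BadBreak": "E894",
    "Space": "0020",
    "false": "E8A7",
    "NotANumber": "E8AA",
    "true": "E8AB",
}

private = sorted(n for n, c in SAMPLE.items() if is_private_use(c))
official = sorted(n for n, c in SAMPLE.items() if not is_private_use(c))

# Cross-check against Python's Unicode database, where the general
# category "Co" marks private-use code points.
assert all(
    unicodedata.category(chr(int(SAMPLE[n], 16))) == "Co" for n in private
)

if __name__ == "__main__":
    print("private zone:", private)
    print("official Unicode:", official)
```

A check of this kind can flag the entities whose numeric values may change, which are the ones where using the entity name matters most.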
Newly Revised. The entity listings by alphabetical and Unicode order in section 6.1.7 [Alphabetical Lists] have now been brought more into line with the corresponding ISO character sets, in that if some part of a set is included then the entire set is included. Also, ISOCHEM has been dropped. These changes have also been reflected in the entity declarations in the DTD in appendix A [Parsing MathML].
The tables of character sets with glyphs given in section 6.1.8 [ISO Entity Set Groupings] have not been revised from the original tables. Where information in section 6.1.7 [Alphabetical Lists] conflicts with that in section 6.1.8 [ISO Entity Set Groupings], the tables in section 6.1.6 [Special Constants] and the DTD should be considered normative.
To begin, we list separately a few special characters that MathML has been somewhat radical in introducing. There are two for special constants and one for calculus. They too must have private Unicode values.
Entity name  Unicode  Description
&CapitalDifferentialD;  F74B  D for use in differentials, e.g. within integrals
&DD;  F74B  short form of &CapitalDifferentialD;
&DifferentialD;  F74C  d for use in differentials, e.g. within integrals
&dd;  F74C  short form of &DifferentialD;
&ExponentialE;  F74D  e for use for the exponential base of the natural logarithms
&ee;  F74D  short form of &ExponentialE;
&false;  E8A7  logical constant false
&ImaginaryI;  F74E  i for use as a square root of -1
&ii;  F74E  short form of &ImaginaryI;
&NotANumber;  E8AA  used in section 4.3.2.9 [type]
&true;  E8AB  logical constant true
The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with uppercase preceding lowercase as in ASCII order. The Unicode numbers beginning with E are arbitrary assignments in the Private Area where there is presently no Unicode character available. When there is no Unicode offered at all it is because the characters listed can be thought of as font variations of common Roman alphabetic characters.
There is also an ASCII listing of printing entities ordered by Unicode number. Next we have collections of the entities in entity sets which are similar to the groupings in the corresponding ISO documents.
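The two orderings described above can be reproduced with ordinary sorting. The following is a minimal sketch in Python, not part of the specification: `ROWS`, `by_name`, and `by_code` are illustrative names, and the rows are a small sample copied from the tables in this chapter. Python compares strings by code point, which for ASCII names places uppercase letters before lowercase ones.

```python
# Sample rows copied from the tables in this chapter:
# (entity name, Unicode hex code).
ROWS = [
    ("true", "E8AB"),
    ("Space", "0020"),
    ("false", "E8A7"),
    ("GoodBreak", "E893"),
    ("NotANumber", "E8AA"),
    ("BadBreak", "E894"),
    ("IndentingNewLine", "E891"),
]


def by_name(rows):
    """Sort alphabetically in ASCII order, uppercase before lowercase."""
    return sorted(rows, key=lambda row: row[0])


def by_code(rows):
    """Sort by the numeric value of the Unicode hex code."""
    return sorted(rows, key=lambda row: int(row[1], 16))


if __name__ == "__main__":
    print("Ordered by name:")
    for name, code in by_name(ROWS):
        print(f"  {name:<18}{code}")
    print("Ordered by Unicode number:")
    for name, code in by_code(ROWS):
        print(f"  {code}  {name}")
```

In the name ordering, `Space` comes before `false` even though `f` precedes `s` alphabetically, because every uppercase ASCII letter sorts before every lowercase one.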
In addition, we list the above material in the groupings used by ISO 9573-13, with an additional grouping for the aliases introduced. This table makes the entity groupings explicit and provides links to ASCII listings of the groups, as well as to HTML tabular listings that display the glyphs insofar as they are available.
The symbols for mathematics that ISO has considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. For each ISO 9573-13 group we give first an Extended version in ASCII listing which includes aliases, then a similar listing with sample glyphs, then the Basic ISO 9573-13 entity set and its version with included glyphs. The entries are organized alphabetically by entity name.
It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72 dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context, and the MathML Working Group is engaged in ensuring that fonts will be readily publicly available. In addition, it is important to note that the Unicode numbers assigned in the private zone, beginning with hex digits E2 and above, are arbitrary and are used here only to ensure that sample glyphs are available for display. They do not constitute suggested assignments of codes.
This first block of entity sets includes mostly non-letter symbols, along with a few letters loaded with mathematical semantics. At the end of the block we have included the table MMLALIAS of the aliases introduced by MathML, which mostly come from the TeX community, and MMLEXTRA with the additional character entities added by MathML. Note that some of the blocks are placeholders for a possible future expansion of the tables.
Group  Descriptive Name
ISOAMSA  Added Math Symbols: Arrows
ISOAMSB  Added Math Symbols: Binary Operators
ISOAMSC  Added Math Symbols: Delimiters
ISOAMSN  Added Math Symbols: Negated Relations
ISOAMSO  Added Math Symbols: Ordinary
ISOAMSR  Added Math Symbols: Relations
ISOTECH  General Technical
ISOPUB  Publishing
ISODIA  Diacritical Marks
ISONUM  Numeric and Special Graphic
ISOBOX  Box and Line Drawing
MMLALIAS  MathML Aliases
MMLEXTRA  MathML Additions

Mathematical literature makes common use of particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts.
Group  Descriptive Name
ISOGRK3  Greek Symbols
ISOMSCR  Math Alphabet Script
ISOMOPF  Math Alphabet Open Face
ISOMFRK  Math Alphabet Fraktur

For reference we provide a list of two additional ISO font entity sets that are normally used for text.
Group  Descriptive Name
ISOCYR1  Russian Cyrillic
ISOCYR2  Non-Russian Cyrillic
