6 Entities, Characters and Fonts

Mathematical Markup Language (MathML) Version 2.0
6.1 Introduction
   6.1.1 The Intent of Entity Names
   6.1.2 The STIX Project
   6.1.3 Entity Listings
   6.1.4 Non-Marking Entities
   6.1.5 Printing Entity Listings
   6.1.6 Special Constants
   6.1.7 Alphabetical Lists
   6.1.8 ISO Entity Set Groupings
   6.1.9 Additional Entity Set Grouping

6.1 Introduction

6.1.1 The Intent of Entity Names

Notation has proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have evolved for use in mathematical notation, and mathematicians have not held back from making use of many symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding, and difficult to read it if glyphs are not available for presentation on specific display devices.

This situation poses a problem for the W3C Math Working Group. Strictly speaking, a specification for mathematics in HTML, and the production of its DTD, need only be concerned with which entities the DTD allows. However, experience has shown that a long list of entities with no means to display them is of little use, and a frequent source of frustration for those trying to use a standard. Conversely, a large collection of glyphs or characters without a standard way to refer to them is not of much use either.

The W3C Math Working Group has therefore taken on the direct specification of part of the full mechanism leading from notation to final presentation, and is collaborating with the organizations that are specifying the rest.

For instance, we try to use entity names that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as mathematics is concerned. Considerations of mathematical usage occasionally militate against this, and the TR 9573 lists need supplementing. We hope to agree with the TR 9573 WG on suitable extensions during the revision of their document that they are currently undertaking.

The STIX project of the STIPUB group of scientific and technical publishers has also been working toward a common collection of mathematical symbols and names. The W3C Math Working Group expects to issue further updates on the matter of character entities as a consequence of this project's useful work. For the latest character tables and fonts information, see the W3C Math Working Group home page.

6.1.2 The STIX Project

The STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project, set up by the STIPUB group of publishers, aims to formulate a collection of the characters needed in scientific and technical publishing. A database of characters in common use is being produced by collaborating publishing organizations. The team will propose to the Unicode Consortium, together with appropriate character codes, the additions to the next revision of the Unicode character set that this process shows to be needed. Finally, the STIX project will commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication.

6.1.3 Entity Listings

This chapter of the MathML Specification contains a listing of entities for use in MathML.

To provide more background on the characters used by mathematics we have used a larger comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science and to Wolfram Research (makers of Mathematica®) for making available to us so much useful data.

6.1.4 Non-Marking Entities

Some character entities, although important for the quality of print rendering, have no directly corresponding glyph marks. We call them non-marking entities here. The table below lists those adopted for the purposes of MathML; their roles are discussed in chapter 3 [Presentation Markup] and chapter 4 [Content Markup]. The space widths given are recommendations. Some of these characters do not yet have Unicode values, and arbitrary values in the E8 range of the Private Zone have been assigned to them. The correspondence between the spacing values given below and those in the Unicode character descriptions is not exact, but the matches are good.

Entity name Unicode Description
&Tab; 0009 tabulator stop; horizontal tabulation
&NewLine; 000A force a line break; line feed
&IndentingNewLine; E891 force a line break and indent appropriately on next line
&NoBreak; E892 never break line here
&GoodBreak; E893 if a linebreak is needed, here is a good spot
&BadBreak; E894 if a linebreak is needed, try to avoid breaking here
&Space; 0020 one em of space in the current font
&NonBreakingSpace; 00A0 space that is not a legal breakpoint
&ZeroWidthSpace; 200B space of no width at all
&VeryThinSpace; 200A space of width 1/18 em
&ThinSpace; 2009 space of width 3/18 em
&MediumSpace; 2005 space of width 4/18 em
&ThickSpace; E897 space of width 5/18 em
&NegativeVeryThinSpace; E898 space of width -1/18 em
&NegativeThinSpace; E899 space of width -3/18 em
&NegativeMediumSpace; E89A space of width -4/18 em
&NegativeThickSpace; E89B space of width -5/18 em
&InvisibleComma; E89C used as a separator, e.g. in indices (section 3.2.4 [Operator, Fence, Separator or Accent])
&ic; E89C short form of &InvisibleComma;
&InvisibleTimes; E89E marks multiplication when it is understood without a mark (section 3.2.4 [Operator, Fence, Separator or Accent])
&it; E89E short form of &InvisibleTimes;
&ApplyFunction; E8A0 character showing function application in presentation tagging (section 3.2.4 [Operator, Fence, Separator or Accent])
&af; E8A0 short form of &ApplyFunction;
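The spacing widths in the table above are given in eighteenths of an em, so a renderer turns them into absolute lengths by multiplying by the em size of the current font. The Python sketch below shows that conversion under stated assumptions: the dictionary and the function name `space_width` are hypothetical illustrations rather than part of MathML, and the widths are only the recommended values from the table.

```python
from fractions import Fraction

# Recommended widths (in em) of the spacing entities, keyed by entity name.
# The trailing comments give the Unicode values from the table.
SPACE_WIDTHS_EM = {
    "ZeroWidthSpace": Fraction(0),               # 200B
    "VeryThinSpace": Fraction(1, 18),            # 200A
    "ThinSpace": Fraction(3, 18),                # 2009
    "MediumSpace": Fraction(4, 18),              # 2005
    "ThickSpace": Fraction(5, 18),               # E897
    "NegativeVeryThinSpace": Fraction(-1, 18),   # E898
    "NegativeThinSpace": Fraction(-3, 18),       # E899
    "NegativeMediumSpace": Fraction(-4, 18),     # E89A
    "NegativeThickSpace": Fraction(-5, 18),      # E89B
}


def space_width(entity: str, em_size: float) -> float:
    """Width of a spacing entity, in the same unit as em_size (e.g. pixels).

    Negative results mean the space pulls neighbouring glyphs closer together.
    """
    # Multiply exactly with Fraction, then convert once at the end.
    return float(SPACE_WIDTHS_EM[entity] * Fraction(em_size))
```

For an 18 px font, `space_width("ThinSpace", 18)` gives 3.0. Exact `Fraction` arithmetic is used so that multiples of 1/18 em convert without accumulating floating-point rounding.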

6.1.5 Printing Entity Listings

Since the situation concerning availability of character codes from Unicode and under ISO 9573-13 is not yet fully clear at the time of writing, we have decided to proceed conservatively.

We have taken the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and added a number of aliases based on the practice of the mathematical typesetting community. The main influence from outside ISO has thus been the names used in the TeX community.

To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.

We have entities listed by name and sample glyphs for all of them. Each entity name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646. Those codes beginning with the hex digit E, e.g. E321, indicate assignments to the private zone of Unicode. This indicates that the character in question is not at present an official Unicode character. It is highly recommended that authors use entity names instead of Unicode values, especially for those characters in the Unicode private zone, as those values may change. It is hoped that most of these characters will become officially endorsed by Unicode and ISO under its 10646 standard in due course. In any case we expect fonts for these characters to become publicly available as the use of MathML develops. If the entity name is an alias then a reference back to the ISO form is given if there is one, and to a preferred form if not. The ISO or preferred forms have references to their alternates where they exist.
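The recommendation to prefer entity names over private-zone values can be checked mechanically. In the Python sketch below, the helper names and the small sample table are illustrative assumptions, not part of any MathML tool. The standard `unicodedata` module reports general category `Co` for code points in Unicode's Private Use Area (U+E000 to U+F8FF in the BMP), which covers the values beginning with E, and also those beginning with F8 or lower such as F74C, in this chapter's tables.

```python
import unicodedata


def is_private_use(code_point: int) -> bool:
    """True if code_point is a Private Use character (general category 'Co')."""
    return unicodedata.category(chr(code_point)) == "Co"


# A few sample entries from this chapter: entity name -> Unicode value.
ENTITY_CODES = {
    "ThinSpace": 0x2009,
    "InvisibleTimes": 0xE89E,
    "ApplyFunction": 0xE8A0,
    "DifferentialD": 0xF74C,
    "NonBreakingSpace": 0x00A0,
}


def entities_needing_names(table: dict[str, int]) -> list[str]:
    """Names whose values are provisional private-zone assignments.

    For these, documents should keep the entity reference rather than the
    raw code, since the private value may change.
    """
    return sorted(name for name, cp in table.items() if is_private_use(cp))
```

Here `entities_needing_names(ENTITY_CODES)` flags `ApplyFunction`, `DifferentialD` and `InvisibleTimes`, while `ThinSpace` and `NonBreakingSpace`, which have official Unicode values, are left alone.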

Newly Revised. The entity listings by alphabetical and Unicode order in section 6.1.7 [Alphabetical Lists] have now been brought more into line with the corresponding ISO character sets, in that if some part of a set is included then the entire set is included. Also, ISOCHEM has been dropped. These changes have also been reflected in the entity declarations in the DTD in appendix A [Parsing MathML].

The tables of character sets with glyphs given in section 6.1.8 [ISO Entity Set Groupings] have not been revised from the original tables. In cases where information from section 6.1.7 [Alphabetical Lists] and section 6.1.8 [ISO Entity Set Groupings] conflicts, the tables in section 6.1.7 [Alphabetical Lists] and the DTD should be considered normative.

6.1.6 Special Constants

To begin, we list separately a few special characters that MathML has been somewhat radical in introducing: entities for special constants (the exponential e, the imaginary unit i, the logical constants true and false, and NotANumber) and for calculus (the differential d and D). They too must have private Unicode values.

Entity name Unicode Description
&CapitalDifferentialD; F74B D for use in differentials, e.g. within integrals
&DD; F74B short form of &CapitalDifferentialD;
&DifferentialD; F74C d for use in differentials, e.g. within integrals
&dd; F74C short form of &DifferentialD;
&ExponentialE; F74D e for use for the exponential base of the natural logarithms
&ee; F74D short form of &ExponentialE;
&false; E8A7 logical constant false
&ImaginaryI; F74E i for use as a square root of -1
&ii; F74E short form of &ImaginaryI;
&NotANumber; E8AA used in
&true; E8AB logical constant true
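Each short form in the special-constants table refers to the same character as its long name (for example, dd and DifferentialD both map to F74C). The following is a minimal Python sketch of resolving such references. It assumes a hypothetical lookup table built from the entries above; the names `resolve` and `expand` are illustrative only, and a real system would rely on the entity declarations in the DTD instead.

```python
import re

# Long entity names from the special-constants table -> Unicode value.
SPECIAL_CONSTANTS = {
    "CapitalDifferentialD": 0xF74B,
    "DifferentialD": 0xF74C,
    "ExponentialE": 0xF74D,
    "ImaginaryI": 0xF74E,
}

# Short forms from the same table, mapped to their long names.
SHORT_FORMS = {
    "DD": "CapitalDifferentialD",
    "dd": "DifferentialD",
    "ee": "ExponentialE",
    "ii": "ImaginaryI",
}


def resolve(name: str) -> tuple[str, int]:
    """Return (long name, code point) for a long or short entity name."""
    long_name = SHORT_FORMS.get(name, name)
    return long_name, SPECIAL_CONSTANTS[long_name]


def expand(text: str) -> str:
    """Replace &name; references to special constants; leave others untouched."""
    def repl(match: re.Match) -> str:
        long_name = SHORT_FORMS.get(match.group(1), match.group(1))
        code = SPECIAL_CONSTANTS.get(long_name)
        return chr(code) if code is not None else match.group(0)

    return re.sub(r"&([A-Za-z]+);", repl, text)
```

Because both forms resolve to one long name first, a later change to the private-zone value only has to be made in one place.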

6.1.7 Alphabetical Lists

The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with upper-case letters preceding lower-case ones as in ASCII order. Unicode numbers beginning with E are arbitrary assignments in the Private Area, used where no Unicode character is presently available. Where no Unicode value is given at all, the characters listed can be thought of as font variations of common Roman alphabetic characters.

There is also an ASCII listing of printing entities ordered by Unicode number. Next we have collections of the entities in entity sets which are similar to the groupings in the corresponding ISO documents.
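Both orderings are straightforward to reproduce. Python compares strings by code point, which for these ASCII entity names is exactly ASCII order, so names starting with an upper-case letter sort before those starting with a lower-case one. In the sketch below, `by_name` and `by_code` are illustrative helper names, the sample entries come from this chapter's tables, and `Aopf` (an open-face letter with `None` as its code) is a hypothetical stand-in for an entry that has no Unicode value.

```python
from typing import Optional

Entry = tuple[str, Optional[int]]


def by_name(entries: list[Entry]) -> list[Entry]:
    """Alphabetical listing in ASCII order: 'A'..'Z' (65-90) precede 'a'..'z' (97-122)."""
    return sorted(entries, key=lambda e: e[0])


def by_code(entries: list[Entry]) -> list[Entry]:
    """Listing by Unicode number; entries without a code go last, ties broken by name."""
    return sorted(
        entries,
        key=lambda e: (e[1] is None, e[1] if e[1] is not None else 0, e[0]),
    )


entries = [("dd", 0xF74C), ("DD", 0xF74B), ("af", 0xE8A0),
           ("ApplyFunction", 0xE8A0), ("ThinSpace", 0x2009), ("Aopf", None)]

for name, code in by_code(entries):
    print(name, f"{code:04X}" if code is not None else "-")
```

Sorting by name places `ThinSpace` before `af`, because `T` (84) precedes `a` (97) in ASCII.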

6.1.8 ISO Entity Set Groupings

In addition, we list the above material in the groupings used by ISO 9573-13, with an additional grouping of aliases. This table makes the entity groupings explicit and provides links to ASCII listings of the groups and to HTML tabular listings that also display the glyphs, insofar as they are available.

ISO Symbol Entity Sets

The symbols for mathematics that ISO has considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. For each ISO 9573-13 group we give first an Extended version in ASCII listing, which includes aliases, then a similar listing with sample glyphs, then the Basic ISO 9573-13 entity set and its version with glyphs included. The entries are organized alphabetically by entity name.

It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor screen at 72 dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context, and the MathML Working Group is engaged in ensuring that fonts will be readily and publicly available. It is also important to note that the Unicode numbers assigned in the private zone, beginning with hex digits E2 and above, are arbitrary and are used here only to ensure that sample glyphs are available for display. They do not constitute suggested code assignments.

This first block of entity sets includes mostly non-letter symbols, along with a few letters loaded with mathematical semantics. At the end of the block we have included the table MMALIAS of the aliases introduced by MathML, which mostly come from the TeX community, and MMEXTRA with the additional character entities added by MathML. Note that some of the blocks are placeholders for a possible future expansion of the tables.

Group Descriptive Name    
ISOAMSA Added Math Symbols: Arrows Extended Glyphs Basic Glyphs
ISOAMSB Added Math Symbols: Binary Operators Extended Glyphs Basic Glyphs
ISOAMSC Added Math Symbols: Delimiters Extended Glyphs Basic Glyphs
ISOAMSN Added Math Symbols: Negated Relations Extended Glyphs Basic Glyphs
ISOAMSO Added Math Symbols: Ordinary Extended Glyphs Basic Glyphs
ISOAMSR Added Math Symbols: Relations Extended Glyphs Basic Glyphs
ISOTECH General Technical Extended Glyphs Basic Glyphs
ISOPUB Publishing Extended Glyphs Basic Glyphs
ISODIA Diacritical Marks Extended Glyphs Basic Glyphs
ISONUM Numeric and Special Graphic Extended Glyphs Basic Glyphs
ISOBOX Box and Line Drawing Basic Glyphs
MMALIAS MathML Aliases Basic Glyphs
MMEXTRA MathML Additions Basic Glyphs

ISO Entity Sets for Mathematics Alphabets

Mathematical literature displays the common use of particular font styles. Characters representing given letters that differ only in glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take mere font differences into account. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. We therefore include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts.

Group Descriptive Name
ISOGRK3 Greek Symbols ASCII Glyphs
ISOMSCR Math Alphabet Script ASCII Glyphs
ISOMOPF Math Alphabet Open Face ASCII Glyphs
ISOMFRK Math Alphabet Fraktur ASCII Glyphs

Other ISO Font Entity Sets

For reference, we list the names of several other ISO font entity sets, which are normally used for text. ISOGRK4 is actually a collection of emboldened forms of the Greek letters.

Group Descriptive Name
ISOGRK1 Greek Letters
ISOGRK2 Monotoniko Greek
ISOGRK4 Alternative Greek Symbols
ISOCYR1 Russian Cyrillic
ISOCYR2 Non-Russian Cyrillic

6.1.9 Additional Entity Set Grouping

In addition to the sets listed above, and for the sake of completeness, we provide a table of other entities not in the ISO lists that are referred to somewhere in this specification. Not all of these characters, though of mathematical significance, are certain to be incorporated into Unicode. The W3C Math WG continues to wrestle with the problems posed by the characters of mathematics.

&LeftSkeleton; E850 start of missing information
&RightSkeleton; E851 end of missing information
&LeftBracketingBar; F603 left vertical delimiter
&RightBracketingBar; F604 right vertical delimiter
&LeftDoubleBracketingBar; F605 left double vertical delimiter
&RightDoubleBracketingBar; F606 right double vertical delimiter
&HorizontalLine; E859 short horizontal line
&VerticalLine; E85A short vertical line
&Assign; E85B assignment operator
&VerticalSeparator; E85C vertical separating operator
&DoubleLeftTee; E30F alias for &Dashv;
&RoundImplies; F524 right double arrow with rounded head (looks like thin superset)
&NotSquareSubset; E604 negated set-like partial order operator
&NotSquareSuperset; E615 negated set-like partial order operator
&NotSubsetEqual; 2288 alias of &nsube;
&NotSupersetEqual; 2289 alias of &nsupe;
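&DownLeftRightVector; F50B left-down-right-down harpoon
&DownLeftTeeVector; F50E left-down harpoon from bar
&DownLeftVectorBar; F50C left-down harpoon to bar
&DownRightTeeVector; F50F right-down harpoon from bar
&DownRightVectorBar; F50D right-down harpoon to bar
&LeftArrowBar; 21E4 alias for &larrb;
&LeftRightVector; F505 left-up-right-up harpoon
&LeftTeeArrow; 21A4 alias for &mapstoleft;
&LeftTeeVector; F509 left-up harpoon from bar
&LeftVectorBar; F507 left-up harpoon to bar
&RightArrowBar; 21E5 alias for &rarrb;
&RightTeeVector; F50A right-up harpoon from bar
&RightVectorBar; F508 right-up harpoon to bar
&Equal; F431 two consecutive equal signs
&GreaterGreater; E2F7 alias for &Gt;
&LeftTriangleBar; F410 left triangle, vertical bar
&LessLess; E2FB alias for &Lt;
&NotCupCap; 226D alias for &nasymp;
&NotEqualTilde; E84E alias for &nesim;
&NotHumpDownHump; E616 alias for &nbump;
&NotHumpEqual; E84D alias for &nbumpe;
&NotLeftTriangleBar; F412 not left triangle, vertical bar
&NotNestedGreaterGreater; F428 not double greater-than sign
&NotNestedLessLess; F423 not double less-than sign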
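&NotPrecedesTilde; E5DC alias for ⪯̸
&NotRightTriangleBar; E870 not vertical bar, right triangle
&NotSucceedsTilde; E837 not succeeds or similar
&RightTriangleBar; F411 vertical bar, right triangle
&Product; 220F alias for &prod;
&Diamond; 22C4 alias for &diam;
&Cross; E619 cross or vector product
&Square; 25A1 alias for &square;
&DownArrowBar; F504 down arrow to bar
&DownTeeArrow; 21A7 alias for &mapstodown;
&LeftDownTeeVector; F519 down-left harpoon from bar
&LeftDownVectorBar; F517 down-left harpoon to bar
&LeftUpDownVector; F515 up-left-down-left harpoon
&LeftUpTeeVector; F518 up-left harpoon from bar
&LeftUpVectorBar; F516 up-left harpoon to bar
&RightDownTeeVector; F514 down-right harpoon from bar
&RightDownVectorBar; F512 down-right harpoon to bar
&RightUpDownVector; F510 up-right-down-right harpoon
&RightUpTeeVector; F513 up-right harpoon from bar
&RightUpVectorBar; F511 up-right harpoon to bar
&ShortDownArrow; E87F short down arrow
&ShortUpArrow; E880 short up arrow
&UpArrowBar; F503 up arrow to bar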
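&UpTeeArrow; 21A5 &mapstoup;
&DownBreve; 0311 breve, inverted (non-spacing)
&OverBar; 00AF over bar
&OverBrace; F612 over brace
&OverBracket; F614 over bracket
&OverParenthesis; F610 over parenthesis
&UnderBar; 0332 combining low line
&UnderBrace; F613 under brace
&UnderBracket; F615 under bracket
&UnderParenthesis; F611 under parenthesis
&EmptyVerySmallSquare; F530 empty very small square
&FilledVerySmallSquare; F529 filled very small square
&EmptySmallSquare; F527 empty small square
&FilledSmallSquare; F528 filled small square
&RuleDelayed; F51F rule-delayed (colon right arrow)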