Overview: Mathematical Markup Language (MathML) Version 2.0
Previous: 5 Combining Presentation and Content Markup
Next: 7 The MathML Interface
6 Characters, Entities and Fonts
6.1 Introduction
6.1.1 The Intent of Character Names
6.1.2 The STIX Project
6.1.3 Character Listings
6.1.4 NonMarking Characters
6.1.5 Printing Character Symbol Listings
6.1.6 Special Constants
6.1.7 Alphabetical Lists
6.1.8 ISO Character Set Groupings
Notation and symbols have proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have evolved for use in mathematical notation, and mathematicians have not held back from making use of many symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding. It is difficult to read mathematics if corresponding glyphs are not available for presentation on specific display devices.
This situation posed a problem for the first W3C Math Working Group when it was brought into existence. Developing a specification that enables mathematics to be used with HTML, and producing a DTD for it, did not naturally require the group to worry about more than the entities allowed in the DTD. However, as experience has shown, a long list of entities with no means to display them is of little use, and a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs and fonts of characters without a standard way to refer to them is not of much use either.
The W3C Math Working Group has therefore taken on directly the task of specifying part of the full mechanism needed to proceed from notation to final presentation, and has started collaboration with organizations undertaking specification of the rest.
For instance, in MathML 1 we tried to use entity names for the many character signs that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as mathematics is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. There was the hope of agreeing with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they were undertaking. That has not actually happened, and the expected TR 9573 revision has not appeared either.
The STIX project of the STIPUB group of scientific and technical publishers has also been working since 1997 toward a common collection of mathematical symbols and names. The W3C Math Working Group itself has collaborated with that project and expects to have to issue further updates on the matter of character entities as a consequence of useful work of this project and others. For the latest character tables and fonts information, see the W3C Math Working Group home page.
The first STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project was set up by the STIPUB group of publishers, which includes the American Chemical Society (ACS), the American Institute of Physics (AIP), the American Mathematical Society (AMS), the American Physical Society (APS), Elsevier Science Publishers, and the Institute of Electrical and Electronics Engineers (IEEE). An initial aim was to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use has been produced by collaborating publishing organizations, including information from the TeX world, Springer-Verlag (Heidelberg), Design Science Inc., Wolfram Research Inc., and the Association for Computing Machinery (ACM), in addition to the above-mentioned. The coordination and the major portion of the work on this have been carried out by Barbara Beeton of the AMS.
The STIX team has proposed to the Unicode Technical Committee (UTC) of the Unicode Consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. This has been the subject of ongoing negotiation for some time. In March 2000 a honed proposal supported by the UTC went on to the ISO WG2 meeting in Beijing, which deals with incorporation of new material into the standard ISO 10646. The final results of that deliberation, which it is hoped will confirm the assignment of code points put forward by the UTC, will be incorporated into the information made public by the Math WG.
Finally, the STIX project's intention has always been to commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication. At the start of the year 2000 the process of commissioning the making of fonts is underway, and their widespread availability is hoped for within one or two years.
This chapter of the MathML Specification contains a listing of character names for use in MathML.
To provide more background on the characters used by mathematics we have used a large comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science, to Wolfram Research (makers of Mathematica®) and to Design Science (makers of MathType®) for making available to us so much useful data.
In MathML 1 the characters of the mathematical sciences were listed as entities. This is coherent with thinking in terms of SGML markup and the use of DTDs. In the XML world, however, well-formedness is meant to be sufficient for the examination of a particular document, and checking it does not require validation against a DTD, which is where character entities would be found declared. The next development that is expected to replace the DTD as a specifier of a class of documents is that of Schemas. The specification for Schemas is presently under active development at the W3C. Though the final form of Schemas is not yet clear, it is known that their use precludes effective use of large lists of entities. MathML 2 therefore passes from the use of entities to name mathematical characters, which becomes a deprecated usage, to the use of mchar elements. For this reason the tables below just list the suggested character names, which should be used in the form <mchar name="character_name" />.
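As a rough illustration of the move from entity references to mchar elements, the following Python sketch rewrites references such as &alpha; into the element form given above. The function name entities_to_mchar and the sample names alpha and beta are assumptions made for this example only. The sketch leaves the five entities predefined by XML untouched, ignores numeric character references, and does not attempt to handle references inside attribute values, where an element could not appear.

```python
import re

# The five entities predefined by XML itself are not character names
# in the sense of this chapter, so they are left as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference: '&', a name, ';'. Numeric references such as
# '&#x3B1;' start with '#' and therefore never match this pattern.
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.]*);")


def entities_to_mchar(markup: str) -> str:
    """Rewrite named entity references as <mchar name="..." /> elements."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        return f'<mchar name="{name}" />'

    return _ENTITY_REF.sub(replace, markup)


if __name__ == "__main__":
    # Hypothetical input; 'alpha' and 'beta' are illustrative names.
    print(entities_to_mchar("<mi>&alpha;</mi><mo>&lt;</mo><mi>&beta;</mi>"))
```

A regular expression is enough for a sketch of this kind, but a real conversion tool would more likely work on parsed XML so that attribute values and CDATA sections are treated correctly.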
Some characters, although important for the quality of print rendering, do not directly have glyph marks that correspond to them. They are called here nonmarking characters. Below we have a table of those adopted for the purposes of MathML. Their roles are discussed in Chapter 3 [Presentation Markup] and Chapter 4 [Content Markup]. The values of the spaces given are recommendations. Some of these characters do not have official Unicode values, and some are given as combinations of Unicode characters employing the special mathematics modifier character (U02063). The correspondence between the spacing values mentioned below and those in the Unicode descriptions is not exact, but the matches are good.
In MathML 1.0 a number of further nonmarking character entities were listed here. These were concerned with composition control, such as line breaking. In MathML 2 such control is effected by the use of the proper attributes on the mspace element.
Character name  Unicode  Description
&Tab;  00009  tabulator stop; horizontal tabulation
&NewLine;  0000A  force a line break; line feed
&Space;  00020  one em of space in the current font
&NonBreakingSpace;  000A0  space that is not a legal breakpoint
&ZeroWidthSpace;  0200B  space of no width at all
&VeryThinSpace;  0200A  space of width 1/18 em
&ThinSpace;  02009  space of width 3/18 em
&MediumSpace;  02005  space of width 4/18 em
&ThickSpace;  02005-0200A  space of width 5/18 em
&NegativeVeryThinSpace;  0200A-02063  space of width -1/18 em
&NegativeThinSpace;  02009-02063  space of width -3/18 em
&NegativeMediumSpace;  0205F-02063  space of width -4/18 em
&NegativeThickSpace;  02005-02063  space of width -5/18 em
&InvisibleTimes;  02062  marks multiplication when it is understood without a mark (Section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
&ApplyFunction;  02061  character showing function application in presentation tagging (Section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
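Each value in the Unicode column above is written as five hexadecimal digits per code point, and a combination lists its code points one after another, with or without a separating hyphen. A minimal Python sketch of reading such a value follows; the function name decode_code_points is an assumption made for this example and is not part of MathML.

```python
def decode_code_points(column: str) -> str:
    """Turn a Unicode column value such as '02062' or '0200A-02063' into text.

    Hyphens between code points are optional, so '0200A02063' is read
    the same way as '0200A-02063'.
    """
    digits = column.replace("-", "").strip()
    if not digits or len(digits) % 5 != 0:
        raise ValueError(f"expected groups of five hex digits, got {column!r}")
    # int(..., 16) raises ValueError itself if a group is not hexadecimal.
    return "".join(
        chr(int(digits[i:i + 5], 16)) for i in range(0, len(digits), 5)
    )


if __name__ == "__main__":
    for value in ("02062", "0200A-02063"):
        text = decode_code_points(value)
        print(value, [f"U+{ord(ch):04X}" for ch in text])
```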
Even though the situation concerning availability of character codes from Unicode and under ISO 10646 is not yet fully clear at the time of writing, we have decided to proceed on the assumption that the code points suggested to ISO WG2 by the UTC will be confirmed. As before we can only reiterate that for current developments on details of character standards as far as they influence mathematical formalism the Home Page of the W3C Math WG should be consulted.
The Math WG started from the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and added a number of informative additional aliases based on the practice of the mathematical typesetting community. The main influence outside ISO has been the names to be found in the TeX community, because they inform the practice of the contributors to the STIX character database mentioned above.
To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.
We have characters listed by name and sample glyphs for all of them. Each character name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646, now extended in accordance with the proposal forwarded by the UTC to ISO WG2 in March 2000. We have excluded, with very few exceptions that seemed to us compelling, other characters that may have appeared in the corresponding lists in MathML 1. The characters thus lost are either used very infrequently in the experience of mathematical publishers, or simply unacceptable for inclusion in Unicode. However, MathML 2 does provide the mglyph and csymbol elements to accommodate new characters that authors may wish to introduce.
The character listings by alphabetical and Unicode order in Section 6.1.7 [Alphabetical Lists] have now been brought more into line with the corresponding ISO character sets than was the case in MathML 1.0, in that if some part of a set is included then the entire set is included. In addition, the group ISOCHEM has been dropped as more properly the concern of chemists. These changes have also been reflected in the entity declarations in the DTD in Appendix A [Parsing MathML].
To commence we list separately a few of the special characters which MathML has seen fit to be a little radical in introducing. These have been accorded new Unicode values. There also used to be entries below for &true;, &false; and &NotANumber;, but these do not yet have Unicode points assigned to them and so have been removed. They can be reintroduced by the character extension mechanisms provided by the mchar and csymbol elements.
Entity name  Unicode  Description
&CapitalDifferentialD;  02145  D for use in differentials, e.g. within integrals
&DifferentialD;  02146  d for use in differentials, e.g. within integrals
&ExponentialE;  02147  e for use for the exponential base of the natural logarithms
&ImaginaryI;  02148  i for use as a square root of -1
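Because these special characters have Unicode values, a document can also refer to them through XML numeric character references. The Python sketch below converts a hex value in the style of the Unicode column into such a reference. The helper name numeric_reference and the mrow/mo/mi arrangement of the sample fragment are assumptions chosen for illustration, not markup prescribed by this chapter.

```python
def numeric_reference(code: str) -> str:
    """Convert a hex code such as '02146' into an XML reference '&#x2146;'."""
    value = int(code, 16)  # raises ValueError if the text is not hexadecimal
    return f"&#x{value:X};"


if __name__ == "__main__":
    # 02146 is the value listed above for the d used in differentials.
    differential = numeric_reference("02146")
    # Hypothetical fragment for the 'dx' of an integral.
    print(f"<mrow><mo>{differential}</mo><mi>x</mi></mrow>")
```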
The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with uppercase preceding lowercase as in ASCII order. There is also an ASCII listing of printing characters ordered by Unicode number. The Unicode code points are those of the current proposal, which will, it is expected, eventually be part of the next revision, Unicode 4. Unicode 3 has just been published in February 2000. Next we have collections of the entities in entity sets which correspond to the groupings in the corresponding ISO documents.
In addition, we list the above material in the groupings introduced by ISO 9573-13. This table makes explicit the entity groupings and provides links to ASCII listings of the groups, as well as to HTML tabular listings which display the glyphs.
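The two orderings described above can be reproduced with ordinary sorting. In the Python sketch below the four sample entries are illustrative and are not quoted from the listings themselves. Python compares strings by code point, which for ASCII names is exactly ASCII order, so uppercase names sort before lowercase ones.

```python
# Illustrative (name, Unicode hex) pairs; not taken from the actual listings.
entries = [
    ("alpha", "003B1"),
    ("Beta", "00392"),
    ("Alpha", "00391"),
    ("amp", "00026"),
]

# Alphabetical listing in ASCII order: 'A'..'Z' sort before 'a'..'z'.
by_name = sorted(entries, key=lambda entry: entry[0])

# Listing ordered by Unicode number: compare the hex values numerically.
by_code = sorted(entries, key=lambda entry: int(entry[1], 16))

if __name__ == "__main__":
    print([name for name, _ in by_name])
    print([name for name, _ in by_code])
```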
The symbols for mathematics that ISO has considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. The entries are organized alphabetically by character name.
It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72 dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context. The MathML Working Group is engaged in the effort of ensuring that such fonts will be readily publicly available.
This first block of sets includes mostly nonletter symbols, along with a few letters loaded with mathematical semantics.
Group  Descriptive Name
ISOAMSA  Added Math Symbols: Arrows
ISOAMSB  Added Math Symbols: Binary Operators
ISOAMSC  Added Math Symbols: Delimiters
ISOAMSN  Added Math Symbols: Negated Relations
ISOAMSO  Added Math Symbols: Ordinary
ISOAMSR  Added Math Symbols: Relations
ISOTECH  General Technical
ISOPUB  Publishing
ISODIA  Diacritical Marks
ISONUM  Numeric and Special Graphic
ISOBOX  Box and Line Drawing

Mathematical literature displays the common use of particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we wish to include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts. The UTC has accepted a proposal for the inclusion of alphabetic character runs in Unicode Plane 1 for the express use of mathematics, brought to them by Murray Sargent of Microsoft and supported by the STIX Project as a compromise solution. However, the tenets of the UTC preclude, if at all possible, the duplication of methods for encoding a character which conventionally has essentially one glyphic representation. Thus there are holes at certain points in the alphabetic runs for mathematical use in Plane 1 coding. These holes will, however, be reserved and not used for anything else, and so can be used, internally, in the obvious way by an application handling mathematics.
Group  Descriptive Name
ISOGRK3  Greek Symbols
ISOMSCR  Math Alphabet Script
ISOMOPF  Math Alphabet Open Face
ISOMFRK  Math Alphabet Fraktur
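The holes in the Plane 1 alphabetic runs can be made concrete with a small Python sketch for the capital letters of the script, open face and fraktur alphabets. The code points used here are an assumption in the sense that they are not given in this chapter: they are the Plane 1 run starts and the previously encoded letterlike characters as they appear in the Unicode Standard as later published. The function names plane1_slot and styled_capital are likewise hypothetical.

```python
# Letters whose Plane 1 slot is a reserved hole, mapped to the character
# that was already encoded for them elsewhere.
# Assumption: values follow the Unicode Standard as later published.
SCRIPT_HOLES = {"B": 0x212C, "E": 0x2130, "F": 0x2131, "H": 0x210B,
                "I": 0x2110, "L": 0x2112, "M": 0x2133, "R": 0x211B}
FRAKTUR_HOLES = {"C": 0x212D, "H": 0x210C, "I": 0x2111, "R": 0x211C,
                 "Z": 0x2128}
OPEN_FACE_HOLES = {"C": 0x2102, "H": 0x210D, "N": 0x2115, "P": 0x2119,
                   "Q": 0x211A, "R": 0x211D, "Z": 0x2124}

# Style name -> (code point of capital A in the Plane 1 run, holes).
ALPHABETS = {
    "script": (0x1D49C, SCRIPT_HOLES),
    "fraktur": (0x1D504, FRAKTUR_HOLES),
    "open-face": (0x1D538, OPEN_FACE_HOLES),
}


def plane1_slot(letter: str, style: str) -> int:
    """Position of a capital letter in the run, whether or not it is a hole.

    An application could use this value internally, 'in the obvious way'.
    """
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single capital letter A-Z")
    base, _ = ALPHABETS[style]
    return base + ord(letter) - ord("A")


def styled_capital(letter: str, style: str) -> str:
    """Character to use in interchange: the earlier encoding for a hole."""
    slot = plane1_slot(letter, style)
    _, holes = ALPHABETS[style]
    return chr(holes.get(letter, slot))


if __name__ == "__main__":
    for letter in "ABR":
        print(letter, [f"U+{ord(styled_capital(letter, s)):04X}"
                       for s in ALPHABETS])
```

Keeping the run arithmetic in plane1_slot separate from the hole lookup mirrors the distinction drawn above between what an application may do internally and what is actually encoded.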
