REC-MathML-19980407

6. Entities, Characters and Fonts

6.1 Introduction

6.1.1 The Intent of Entity Names

Notation has proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have been developed for use in mathematical notation, and mathematicians have not held back from using symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding, and difficult to read mathematics if glyphs are not available for presentation on specific display devices.

This situation poses a problem for the W3C Math Working Group. Specifying mathematics markup for HTML and producing its DTD does not naturally extend to worrying about more than the entities allowed in the DTD. Moreover, as experience has shown, a long list of entities with no means to display them is of little use, and a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs or characters without a standard way to refer to them is not of much use either.

The W3C Math Working Group has therefore directly taken on the specification of part of the full mechanism that leads from notation to final presentation, and is collaborating with organizations undertaking the specification of the rest.

For instance, we try to use entity names that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as math is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. We hope to be able to agree with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they are presently undertaking.

The STIX project of the STIPUB group of scientific and technical publishers has also been working toward a common collection of mathematical symbols and names. The W3C Math Working Group expects to issue further updates on the matter of character entities as a consequence of this project's useful work. For the latest character tables and fonts information, see the W3C Math Working Group home page.

6.1.2 The STIX Project

The STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project, set up by the STIPUB group of publishers, aims to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use is being produced by collaborating publishing organizations. The team will propose to the Unicode consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. Finally the STIX project will commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication.

6.2 Entity Listings

This chapter of the MathML proposal contains a listing of entities for use in MathML.

To provide more background on the characters used in mathematics, we have drawn on a larger comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science and to Wolfram Research (makers of Mathematica®) for making so much useful data available to us.

6.2.1 Non-Marking Entities

Some character entities, although important for the quality of print rendering, do not directly correspond to glyph marks. They are called here non-marking entities. The table below lists those adopted for the purposes of MathML. Their roles are discussed in Chapters 3 and 4, on Presentation and Content Markup respectively. The values of the spaces given are recommendations. Some of these characters do not have Unicode values. In that case the ASCII value, if it exists, is given prefixed with an X; otherwise the column entry is --. The correspondence between the spacing values mentioned below and those in the Unicode descriptions is not exact, but they are good matches.

Entity name	Unicode	Description
&Tab;	X09	tabulator stop
&NewLine;	X10	force a line break
&IndentingNewLine;	--	force a line break and indent appropriately on next line
&NoBreak;	--	never break line here
&GoodBreak;	--	if a linebreak is needed, here is a good spot
&BadBreak;	--	if a linebreak is needed, try to avoid breaking here
&Space;	0020	one em of space in the current font
&NonBreakingSpace;	00A0	space that is not a legal breakpoint
&ZeroWidthSpace;	200B	space of no width at all
&VeryThinSpace;		space of width 1/18 em
&ThinSpace;		space of width 3/18 em
&MediumSpace;	2005	space of width 4/18 em
&ThickSpace;		space of width 5/18 em
&NegativeVeryThinSpace;	--	space of width -1/18 em
&NegativeThinSpace;	--	space of width -3/18 em
&NegativeMediumSpace;	--	space of width -4/18 em
&NegativeThickSpace;	--	space of width -5/18 em
&InvisibleComma;	--	used as a separator, e.g., in indices (Section 3.2.4)
&ic;	--	short form of &InvisibleComma;
&InvisibleTimes;	--	marks multiplication when it is understood without a mark (Section 3.2.4)
&it;	--	short form of &InvisibleTimes;
&ApplyFunction;	--	character showing function application in presentation tagging (Section 3.2.4)
&af;	--	short form of &ApplyFunction;
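
The recommended space widths in the table are all multiples of 1/18 em, so they can be added exactly. The following is a minimal sketch in Python, used here only for illustration; the names `SPACE_WIDTHS_18THS` and `total_width_em` are hypothetical and not part of MathML. It assumes that `ThinSpace`, `MediumSpace` and `ThickSpace` are the positive counterparts of the negative-space entities listed above.

```python
from fractions import Fraction

# Recommended widths from the table, in eighteenths of an em.
# Keys are entity names without the leading "&" and trailing ";".
# ThinSpace, MediumSpace and ThickSpace are assumed names (see lead-in).
SPACE_WIDTHS_18THS = {
    "ZeroWidthSpace": 0,
    "VeryThinSpace": 1,
    "ThinSpace": 3,
    "MediumSpace": 4,
    "ThickSpace": 5,
    "NegativeVeryThinSpace": -1,
    "NegativeThinSpace": -3,
    "NegativeMediumSpace": -4,
    "NegativeThickSpace": -5,
}


def total_width_em(entity_names):
    """Return the summed recommended width, in em, of a run of space entities."""
    return sum(
        (Fraction(SPACE_WIDTHS_18THS[name], 18) for name in entity_names),
        Fraction(0),
    )
```

For example, `total_width_em(["ThinSpace", "NegativeThinSpace"])` is zero, since a space and its negative counterpart cancel. These widths are recommendations only, so a renderer may well use different values.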

6.2.2 Printing Entity Listings

Since the situation concerning availability of character codes from Unicode and under ISO 9573-13 is not yet fully clear at the time of writing, we have decided to proceed conservatively.

We have taken the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and have added a number of aliases based on the practice of the mathematical typesetting community. Thus the main influence outside ISO has been the names found in the TeX community.

To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.

We have entities listed by name and sample glyphs for all of them. Each entity name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646. Those codes beginning with the hex digit E, e.g., E321, indicate assignments to the private zone of Unicode. This indicates that the character in question is not at present an official Unicode character. It is highly recommended that authors use entity names instead of Unicode values, especially for those characters in the Unicode private zone, as those values may change. It is hoped that most of these characters will become officially endorsed by Unicode and ISO under its 10646 standard in due course. In any case we expect fonts for these characters to become publicly available as the use of MathML develops. If the entity name is an alias then a reference back to the ISO form is given if there is one, and to a preferred form if not. The ISO or preferred forms have references to their alternates where they exist.
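
A tool that processes the listings might want to flag private-zone codes so that authors can be steered toward entity names instead. The sketch below is one way to do this in Python, used here only for illustration; the function name `is_private_zone` is hypothetical. It tests against the Unicode Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF), which is slightly broader than the "begins with the hex digit E" rule of thumb given above.

```python
# Unicode Private Use Area in the Basic Multilingual Plane.
PRIVATE_USE_START = 0xE000
PRIVATE_USE_END = 0xF8FF


def is_private_zone(hex_code):
    """Return True if a hex code from the listings (e.g. "E321") is a private-zone code."""
    value = int(hex_code, 16)
    return PRIVATE_USE_START <= value <= PRIVATE_USE_END
```

With this check, `is_private_zone("E321")` is true while `is_private_zone("2005")` is false. Because private-zone values may change, such a flag is best treated as a warning to use the entity name.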

6.2.3 Special Constants

We first list separately a few special characters that MathML has been somewhat radical in introducing. They include two for special constants and one for calculus.

Entity name	Unicode	Description
&CapitalDifferentialD;		D for use in differentials, e.g., within integrals
&DD;		short form of &CapitalDifferentialD;
&DifferentialD;		d for use in differentials, e.g., within integrals
&dd;		short form of &DifferentialD;
&ExponentialE;		e for use for the exponential base of the natural logarithms
&ee;		short form of &ExponentialE;
&false;		logical constant false
&ImaginaryI;		i for use as a square root of -1
&ii;		short form of &ImaginaryI;
&NotANumber;		used in 4.3.2.9
&true;		logical constant true

6.2.4 Alphabetical Lists

The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with upper-case preceding lower-case as in ASCII order. The Unicode numbers beginning with E are arbitrary assignments in the Private Area where there is presently no Unicode character available. When there is no Unicode offered at all it is because the characters listed can be thought of as font variations of common Roman alphabetic characters.

There is also an ASCII listing of printing entities ordered by Unicode number. Next we have collections of the entities in entity sets which are similar to the groupings in the corresponding ISO documents.
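
The two orderings described above can be sketched as follows. This is an illustrative Python fragment only; the function names `by_name` and `by_unicode` are hypothetical, and entries are assumed to be (entity name, Unicode hex code) pairs. Python compares strings by code point, which for ASCII names places upper-case letters before lower-case ones.

```python
def by_name(entity_names):
    """Sort entity names in ASCII order (upper-case before lower-case)."""
    return sorted(entity_names)


def by_unicode(entries):
    """Sort (name, hex_code) pairs by the numeric value of the hex code."""
    return sorted(entries, key=lambda entry: int(entry[1], 16))
```

For instance, `by_name(["dd", "DD", "ii", "ImaginaryI"])` places `DD` and `ImaginaryI` before `dd` and `ii`. Entities with no Unicode number would need to be filtered out or handled separately before calling `by_unicode`.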

6.2.5 ISO Entity Set Groupings

In addition, we list the above material in the groupings used by ISO 9573-13 with an additional grouping of aliases introduced. This table makes explicit the entity groupings and provides links to ASCII listings of the groups and HTML tabular listings which display the glyphs, insofar as they are to be had, as well.

6.2.5.1 ISO Symbol Entity Sets

The symbols for mathematics that ISO have considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. For each ISO 9573-13 group we give first an Extended version in ASCII listing which includes aliases, then a similar listing with sample glyphs, then the Basic ISO 9573-13 entity set and its version with included glyphs. The entries are organized alphabetically by entity name.

It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context, and the MathML Working Group is engaged in ensuring that fonts will be readily publicly available. In addition, it is important to note that the Unicode numbers assigned in the private zone, beginning with hex digits E2 and above, are arbitrary and only used here to ensure that sample glyphs are available for display. They do not constitute suggested assignments of codes.

This first block of entity sets includes mostly non-letter symbols, along with a few letters loaded with mathematical semantics. At the end of the block we have included the table MMALIAS of the aliases introduced by MathML, which mostly come from the TeX community, and MMEXTRA with the additional character entities added by MathML. Note that some of the blocks are place-holders for a possible future expansion of the tables.

Group	Descriptive Name
ISOAMSA	Added Math Symbols: Arrows	Extended	Glyphs	Basic	Glyphs
ISOAMSB	Added Math Symbols: Binary Operators	Extended	Glyphs	Basic	Glyphs
ISOAMSC	Added Math Symbols: Delimiters	Extended	Glyphs	Basic	Glyphs
ISOAMSN	Added Math Symbols: Negated Relations	Extended	Glyphs	Basic	Glyphs
ISOAMSO	Added Math Symbols: Ordinary	Extended	Glyphs	Basic	Glyphs
ISOAMSR	Added Math Symbols: Relations	Extended	Glyphs	Basic	Glyphs
ISOTECH	General Technical	Extended	Glyphs	Basic	Glyphs
ISOPUB	Publishing	Extended	Glyphs	Basic	Glyphs
ISODIA	Diacritical Marks	Extended	Glyphs	Basic	Glyphs
ISONUM	Numeric and Special Graphic	Extended	Glyphs	Basic	Glyphs
ISOBOX	Box and Line Drawing	Extended	Glyphs	Basic	Glyphs
MMALIAS	MathML Aliases	Basic	Glyphs
MMEXTRA	MathML Additions	Basic	Glyphs

6.2.5.2 ISO Math Font Entity Sets

Mathematical literature commonly uses particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts.

Group	Descriptive Name
ISOGRK3	Greek Symbols	ASCII	Glyphs
ISOMSCR	Math Script Font	ASCII	Glyphs
ISOMOPF	Math Open Face Font	ASCII	Glyphs
ISOMFRK	Math Fraktur Font	ASCII	Glyphs

6.2.5.3 Other ISO Font Entity Sets

For reference we provide a list of the names of several other ISO font entity sets that are normally used for text. ISOGRK4 is actually a collection of emboldened forms of the Greek letters.

Group	Descriptive Name
ISOGRK1	Greek Letters
ISOGRK2	Monotoniko Greek
ISOGRK4	Alternative Greek Symbols
ISOCYR1	Russian Cyrillic
ISOCYR2	Non-Russian Cyrillic
