Working Draft 6-Jan-98

6. Entities, Characters and Fonts

6.1 Introduction

6.1.1 The Intent of Entity Names

Notation has proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have evolved for use in mathematical notation, and mathematicians have not held back from adopting symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding. It is difficult to read mathematics if glyphs are not available for presentation on specific display devices.

This situation poses a problem for the HTML-Math Working Group. It does not fall naturally within the purview of a math-for-HTML specification and DTD production to worry about more than the entities allowed in the DTD. Moreover, as experience has shown, a long list of entities with no means to display them is of little use, and a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs or characters without a standard way to refer to them is not of much use either.

The HTML-Math Working Group has therefore directly taken on the specification of part of the full mechanism for proceeding from notation to final presentation, and is collaborating with organizations undertaking the specification of the rest.

For instance, we try to use entity names that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as math is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. We hope to be able to agree with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they are presently undertaking.

The STIX project of the STIPUB group of scientific and technical publishers has also been working toward a common collection of mathematical symbols and names. The HTML-Math Working Group expects to issue further updates on the matter of character entities as a consequence of this project's useful work.

6.1.2 The STIX Project

The STIX project team leader, Nico Poppelier, is a member of the HTML-Math Working Group. The STIX project, set up by the STIPUB group of publishers, aims to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use is being produced by collaborating publishing organizations. The team will propose to the Unicode consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. Finally, the STIX project will commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication.

6.2 Entity Listings

This chapter of the MathML proposal contains a listing of entities for use in MathML.

To provide more background on the characters used by mathematics we have used a larger comparative database showing codes and meanings in other common math environments. The HTML-Math Working Group is very grateful to Elsevier Science and to Wolfram Research (makers of Mathematica®) for making available to us so much useful data.

6.2.1 Non-Marking Entities

Some character entities, although important for the quality of print rendering, have no glyph marks that directly correspond to them. They are called here non-marking entities. The table below lists those adopted for the purposes of MathML. Their roles are discussed in Chapters 3 and 4, on Presentation and Content Markup respectively. The values of the spaces given are recommendations.

Entity name               Unicode  Description
&Tab;                              tabulator stop
&NewLine;                          force a line break
&IndentingNewLine;                 force a line break and indent appropriately on next line
&NoBreak;                          never break line here
&GoodBreak;                        if a linebreak is needed, here is a good spot
&BadBreak;                         if a linebreak is needed, try to avoid breaking here
&Space;                            one em of space in the current font
&NonBreakingSpace;                 space that is not a legal breakpoint
&ZeroWidthSpace;                   space of no width at all
&VeryThinSpace;                    space of width 1/18 em
&ThinSpace;                        space of width 3/18 em
&MediumSpace;                      space of width 4/18 em
&ThickSpace;                       space of width 5/18 em
&NegativeVeryThinSpace;            space of width -1/18 em
&NegativeThinSpace;                space of width -3/18 em
&NegativeMediumSpace;              space of width -4/18 em
&NegativeThickSpace;               space of width -5/18 em
&InvisibleComma;                   used as a separator, e.g., in indices (Section 3.2.4)
&ic;                               short form of &InvisibleComma;
&InvisibleTimes;                   marks multiplication when it is understood without a mark (Section 3.2.4)
&it;                               short form of &InvisibleTimes;
&ApplyFunction;                    character showing function application in presentation tagging (Section 3.2.4)
&af;                               short form of &ApplyFunction;

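The recommended space widths above are all multiples of 1/18 em. As a minimal sketch in Python, the widths can be held as exact fractions and converted to points for a given font size. The dictionary `SPACE_WIDTHS_EM` and the helper `space_width_pt` are hypothetical names introduced for illustration, and the keys are the entity names as this sketch understands them; none of this is part of MathML itself.

```python
from fractions import Fraction

# Recommended widths from the table of non-marking entities, in em.
# The keys are assumed entity names (without the surrounding "&" and ";").
SPACE_WIDTHS_EM = {
    "Space": Fraction(1),
    "ZeroWidthSpace": Fraction(0),
    "VeryThinSpace": Fraction(1, 18),
    "ThinSpace": Fraction(3, 18),
    "MediumSpace": Fraction(4, 18),
    "ThickSpace": Fraction(5, 18),
    "NegativeVeryThinSpace": Fraction(-1, 18),
    "NegativeThinSpace": Fraction(-3, 18),
    "NegativeMediumSpace": Fraction(-4, 18),
    "NegativeThickSpace": Fraction(-5, 18),
}


def space_width_pt(entity_name, font_size_pt):
    """Return the recommended width of a space entity in points.

    One em is taken to equal the font size, so the width is simply the
    em fraction multiplied by the font size. Hypothetical helper.
    """
    return SPACE_WIDTHS_EM[entity_name] * font_size_pt
```

For example, `space_width_pt("ThinSpace", 12)` evaluates to 2, that is, a thin space of 2 points in a 12-point font. Since the values are only recommendations, a renderer would be free to substitute its own table.
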
6.2.2 Printing Entity Listings

Since the situation concerning availability of character codes from Unicode and under ISO 9573-13 is not yet fully clear at the time of writing, we have decided to proceed conservatively.

We have taken the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and have added a number of aliases based on the practice of the mathematical typesetting community. Thus the main influence outside ISO has been the names found in the TeX community.

To facilitate comprehension of a fairly large list of names, which should eventually total over 1300 in this case, we offer the same information in more than one form.

Ideally we should have entities listed by name and sample glyphs for all of them. That is not possible with publicly available glyphs at present. Each entity name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646. If the entity name is an alias then a reference back to the ISO form is given if there is one, and to a preferred form if not. The ISO or preferred forms have references to their alternates where they exist.
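
The shape of one listing entry, as just described, can be sketched as a small record type in Python. This is only an illustration under stated assumptions: the class `EntityRecord`, its field names, the helpers `preferred_form` and `alternates`, and the two sample entries (including their group labels) are hypothetical and are not taken from the listings themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityRecord:
    name: str                          # entity name, without "&" and ";"
    group: str                         # code for the character grouping
    description: str                   # short verbal description
    unicode_hex: Optional[str] = None  # hex code, if a sample glyph exists
    alias_of: Optional[str] = None     # ISO or preferred form, for aliases


# Illustrative sample data only; not copied from the actual listings.
ENTITIES = {
    record.name: record
    for record in (
        EntityRecord("rarr", "ISONUM", "rightward arrow", "2192"),
        EntityRecord("RightArrow", "MMALIAS", "alias for rarr", "2192",
                     alias_of="rarr"),
    )
}


def preferred_form(name):
    """Follow alias references back to the ISO or preferred form."""
    record = ENTITIES[name]
    seen = {name}
    while record.alias_of is not None:
        if record.alias_of in seen:
            raise ValueError("alias cycle at " + record.alias_of)
        seen.add(record.alias_of)
        record = ENTITIES[record.alias_of]
    return record


def alternates(name):
    """List the aliases that refer directly to the given form."""
    return sorted(r.name for r in ENTITIES.values() if r.alias_of == name)
```

With the sample data, `preferred_form("RightArrow").name` is `"rarr"` and `alternates("rarr")` is `["RightArrow"]`, mirroring the two directions of cross-reference between aliases and their ISO or preferred forms.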

6.2.3 Special Constants

We begin by listing separately a few special characters that MathML has been somewhat radical in introducing: one for calculus and two for special constants.

Entity name               Unicode  Description
&DifferentialD;                    d for use in differentials, e.g., within integrals
&dd;                               short form of &DifferentialD;
&ExponentialE;                     e for use for the exponential base of the natural logarithms
&ee;                               short form of &ExponentialE;
&ImaginaryI;                       i for use as a square root of -1
&ii;                               short form of &ImaginaryI;

6.2.4 Full Alphabetical Lists

The first table offered is a complete ASCII listing of all printing entity names, ordered alphabetically, with upper-case preceding lower-case as in ASCII order. The Unicode numbers beginning with E are arbitrary assignments in the Private Use Area, made where there is presently no Unicode character available. When no Unicode number is offered at all, it is because the characters listed can be thought of as font variations of common Roman alphabetic characters.
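
The two conventions of this listing, ASCII ordering of names and the meaning of the Unicode column, can be sketched in Python as follows. The function names `ascii_order` and `classify_code` and the sample values are hypothetical; the only outside fact assumed is that the Unicode Private Use Area spans U+E000 to U+F8FF, which contains every four-digit code beginning with E.

```python
def ascii_order(names):
    """Sort entity names in ASCII order.

    Python compares strings by code point, so for ASCII names all
    upper-case letters sort before all lower-case letters.
    """
    return sorted(names)


def classify_code(unicode_hex):
    """Classify the Unicode column of a listing entry.

    None means no code is offered (a font variation of a common letter);
    codes in U+E000..U+F8FF are private-use assignments.
    """
    if unicode_hex is None:
        return "font variant"
    code_point = int(unicode_hex, 16)
    if 0xE000 <= code_point <= 0xF8FF:
        return "private use"
    return "assigned"
```

For instance, `ascii_order(["alpha", "Beta", "Alpha", "beta"])` returns `["Alpha", "Beta", "alpha", "beta"]`, and `classify_code("E123")` returns `"private use"`.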

There is also an ASCII listing of all entities ordered by Unicode number. Next we have collections of the entities in entity sets which are similar to the groupings in the corresponding ISO documents.

6.2.5 ISO Entity Set Groupings

In addition, we list the above material in the groupings used by ISO 9573-13, with an additional grouping for the aliases introduced. This table makes the entity groupings explicit and provides links to ASCII listings of the groups, as well as to full HTML tabular listings that display the glyphs insofar as they are available.

6.2.5.1 ISO Symbol Entity Sets

The symbols for mathematics that ISO have considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. For each ISO 9573-13 group we give first an Extended version in ASCII listing which includes aliases, then a similar listing with sample glyphs, then the Basic ISO 9573-13 entity set and its version with included glyphs. The entries are organized alphabetically by entity name.

It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context. The MathML Working Group is engaged in ensuring that fonts will be readily publicly available.

This first block of entity sets includes mostly non-letter symbols, along with a few letters loaded with mathematical semantics. At the end of the block we have included the table MMALIAS of the aliases introduced by MathML, which mostly come from the TeX community, and MMEXTRA with the additional character entities added by MathML.

Group     Descriptive Name                        Listings
ISOAMSA   Added Math Symbols: Arrows              Extended Glyphs | Basic Glyphs
ISOAMSB   Added Math Symbols: Binary Operators    Extended Glyphs | Basic Glyphs
ISOAMSC   Added Math Symbols: Delimiters          Extended Glyphs | Basic Glyphs
ISOAMSN   Added Math Symbols: Negated Relations   Extended Glyphs | Basic Glyphs
ISOAMSO   Added Math Symbols: Ordinary            Extended Glyphs | Basic Glyphs
ISOAMSR   Added Math Symbols: Relations           Extended Glyphs | Basic Glyphs
ISOTECH   General Technical                       Extended Glyphs | Basic Glyphs
ISOPUB    Publishing                              Extended Glyphs | Basic Glyphs
ISODIA    Diacritical Marks                       Extended Glyphs | Basic Glyphs
ISONUM    Numeric and Special Graphic             Extended Glyphs | Basic Glyphs
ISOBOX    Box and Line Drawing                    Extended Glyphs | Basic Glyphs
MMALIAS   MathML Aliases                          Basic Glyphs
MMEXTRA   MathML Additions                        Basic Glyphs

6.2.5.2 ISO Math Font Entity Sets

Mathematical literature displays the common use of particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts.

Group     Descriptive Name       Listings
ISOGRK3   Greek Symbols          ASCII Glyphs
ISOMSCR   Math Script Font       ASCII Glyphs
ISOMOPF   Math Open Face Font    ASCII Glyphs
ISOMFRK   Math Fraktur Font      ASCII Glyphs

6.2.5.3 Other ISO Font Entity Sets

For reference we list the names of several other ISO font entity sets that are normally used for text. ISOGRK4 is actually a collection of emboldened forms of the Greek letters.

Group     Descriptive Name
ISOGRK1   Greek Letters
ISOGRK2   Monotoniko Greek
ISOGRK4   Alternative Greek Symbols
ISOCYR1   Russian Cyrillic
ISOCYR2   Non-Russian Cyrillic
