6 Characters, Entities and Fonts

Overview: Mathematical Markup Language (MathML) Version 2.0
 
6.1 Introduction
   6.1.1 The Intent of Character Names
   6.1.2 The STIX Project
   6.1.3 Character Listings
   6.1.4 Non-Marking Characters
   6.1.5 Printing Character Symbol Listings
   6.1.6 Special Constants
   6.1.7 Alphabetical Lists
   6.1.8 ISO Character Set Groupings

6.1 Introduction

6.1.1 The Intent of Character Names

Notation and symbols have proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have been devised for use in mathematical notation, and mathematicians have not held back from adopting symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding, and difficult to read mathematics if corresponding glyphs are not available for presentation on specific display devices.

This situation posed a problem for the first W3C Math Working Group when it was brought into existence. Developing a specification that enables mathematics to be used with HTML, and producing a DTD for it, did not naturally require the group to worry about more than the entities allowed in the DTD. However, as experience has shown, a long list of entities with no means to display them is of little use, and a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs and fonts without a standard way to refer to the characters is not of much use either.

The W3C Math Working Group therefore took on directly the task of specifying part of the full mechanism needed to proceed from notation to final presentation, and started collaborating with organizations undertaking the specification of the rest.

For instance, in MathML 1 we tried to use entity names for the many character signs that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as mathematics is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. There was the hope of agreeing with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they were undertaking. That has not actually happened, and the expected TR 9573 revision has not appeared either.

The STIX project of the STIPUB group of scientific and technical publishers has also been working since 1997 toward a common collection of mathematical symbols and names. The W3C Math Working Group itself has collaborated with that project and expects to have to issue further updates on the matter of character entities as a consequence of useful work of this project and others. For the latest character tables and fonts information, see the W3C Math Working Group home page.

6.1.2 The STIX Project

The first STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project, set up by the STIPUB group of publishers, includes the American Chemical Society (ACS), the American Institute of Physics (AIP), the American Mathematical Society (AMS), the American Physical Society (APS), Elsevier Science Publishers, and the Institute of Electrical and Electronics Engineers (IEEE). An initial aim was to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use has been produced by collaborating publishing organizations, including information from the TeX world, Springer Verlag (Heidelberg), Design Science Inc., Wolfram Research Inc., and the Association for Computing Machinery (ACM), in addition to those mentioned above. The coordination and the major portion of the work on this have been carried out by Barbara Beeton of the AMS.

The STIX team has proposed to the Unicode Technical Committee (UTC) of the Unicode Consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. This has been the subject of ongoing negotiation for some time. In March 2000 a refined proposal supported by the UTC went on to the ISO WG2 meeting in Beijing, which deals with the incorporation of new material into the standard ISO 10646. The final results of that deliberation, which it is hoped will confirm the assignment of code points put forward by the UTC, will be incorporated into the information made public by the Math WG.

Finally, the STIX project's intention has always been to commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication. At the start of the year 2000 the process of commissioning the fonts is underway, and their widespread availability is hoped for within one or two years.

6.1.3 Character Listings

This chapter of the MathML Specification contains a listing of character names for use in MathML.

To provide more background on the characters used by mathematics we have used a large comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science, to Wolfram Research (makers of Mathematica®) and to Design Science (makers of MathType®) for making available to us so much useful data.

In MathML 1 the characters of the mathematical sciences were listed as entities. This is coherent with thinking in terms of SGML markup and the use of DTDs. In the XML world, however, well-formedness is meant to be sufficient for processing a particular document, and well-formedness does not require validation against a DTD, where character entities would be found declared. The next development, expected to replace the DTD as the specifier of a class of documents, is that of Schemas. The specification for Schemas is presently under active development at the W3C. Though the final form of Schemas is not yet clear, it is known that their use precludes effective use of large lists of entities. For that reason MathML 2 passes from the use of entities to name mathematical characters, which becomes a deprecated usage, to the use of mchar elements. Accordingly, the tables below just list the suggested character names, which should be used in the form <mchar name="character_name" />.
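The passage from entity references to mchar elements can be illustrated with a small script. The following is only a sketch under stated assumptions, not a prescribed migration tool: the function name entities_to_mchar and the regular expression are hypothetical, and the sketch assumes the references occur in element content rather than in attribute values or CDATA sections, where an element could not be substituted.

```python
import re

# The five entities predefined by XML itself are not mathematical
# character names, so this sketch leaves them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &ApplyFunction; (hypothetical pattern).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.]*);")


def entities_to_mchar(text):
    """Rewrite &name; references as <mchar name="name" /> elements."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # keep &amp;, &lt;, etc. as they are
        return '<mchar name="%s" />' % name

    return ENTITY_REF.sub(replace, text)


print(entities_to_mchar("<mi>f</mi><mo>&ApplyFunction;</mo><mi>x</mi>"))
```

Numeric character references such as &#x2061; are not matched by the pattern and pass through unchanged.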

6.1.4 Non-Marking Characters

Some characters, although important for the quality of print rendering, do not have glyph marks that directly correspond to them. They are called here non-marking characters. Below is a table of those adopted for the purposes of MathML. Their roles are discussed in Chapter 3 [Presentation Markup] and Chapter 4 [Content Markup]. The values of the spaces given are recommendations. Some of these characters do not have official Unicode values, and some are given as combinations of Unicode characters employing the special mathematics modifier character (U02063). The correspondence between the spacing values mentioned below and those in the Unicode descriptions is not exact, but the matches are good.

MathML 1.0 listed a number of further non-marking character entities here. These were concerned with composition control, such as line-breaking. In MathML 2 such control is effected by the use of the proper attributes on the mspace element.

Character name           Unicode      Description
&Tab;                    00009        tabulator stop; horizontal tabulation
&NewLine;                0000A        force a line break; line feed
&Space;                  00020        one em of space in the current font
&NonBreakingSpace;       000A0        space that is not a legal breakpoint
&ZeroWidthSpace;         0200B        space of no width at all
&VeryThinSpace;          0200A        space of width 1/18 em
&ThinSpace;              02009        space of width 3/18 em
&MediumSpace;            02005        space of width 4/18 em
&ThickSpace;             02005-0200A  space of width 5/18 em
&NegativeVeryThinSpace;  0200A-02063  space of width -1/18 em
&NegativeThinSpace;      02009-02063  space of width -3/18 em
&NegativeMediumSpace;    0205F-02063  space of width -4/18 em
&NegativeThickSpace;     02005-02063  space of width -5/18 em
&InvisibleTimes;         02062        marks multiplication when it is understood without a mark (Section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
&ApplyFunction;          02061        character showing function application in presentation tagging (Section 3.2.4 [Operator, Fence, Separator or Accent (mo)])
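As an illustration of how the Unicode column and the recommended widths of the non-marking characters might be read by software, here is a minimal sketch. The dictionary SPACES and the helpers code_points and width_in_points are hypothetical names introduced for this example. Only a few rows are transcribed, and the sketch assumes that a hyphenated entry denotes a sequence of code points.

```python
from fractions import Fraction

# Rows transcribed from the table of non-marking characters:
# name -> (Unicode column, recommended width in em).
SPACES = {
    "VeryThinSpace": ("0200A", Fraction(1, 18)),
    "ThinSpace": ("02009", Fraction(3, 18)),
    "MediumSpace": ("02005", Fraction(4, 18)),
    "ThickSpace": ("02005-0200A", Fraction(5, 18)),
    "NegativeThinSpace": ("02009-02063", Fraction(-3, 18)),
}


def code_points(column):
    """Split a Unicode column entry such as '02005-0200A' into integers."""
    return [int(part, 16) for part in column.split("-")]


def width_in_points(name, font_size_pt):
    """Recommended width of a named space, in points, for a given font size."""
    return float(SPACES[name][1] * font_size_pt)


print([hex(cp) for cp in code_points(SPACES["ThickSpace"][0])])
print(width_in_points("ThinSpace", 12))
```

Read this way, the combined entry for ThickSpace is consistent with the single entries: it pairs the code for MediumSpace (4/18 em) with the code for VeryThinSpace (1/18 em), which together give the listed 5/18 em.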

6.1.5 Printing Character Symbol Listings

Even though the situation concerning availability of character codes from Unicode and under ISO 10646 is not yet fully clear at the time of writing, we have decided to proceed on the assumption that the code points suggested to ISO WG2 by the UTC will be confirmed. As before we can only reiterate that for current developments on details of character standards as far as they influence mathematical formalism the Home Page of the W3C Math WG should be consulted.

The Math WG started from the ISO 9573-13 proposal, as conveyed to us by Anders Berglund, and added a number of informative additional aliases based on the practice of the mathematical typesetting community. The main influence outside ISO has been the names found in the TeX community, because they inform the practice of the contributors to the STIX character database mentioned above.

To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.

We have characters listed by name and sample glyphs for all of them. Each character name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646, now extended in accordance with the proposal forwarded by the UTC to ISO WG2 in March 2000. We have excluded, with very few exceptions that seemed to us compelling, other characters that may have appeared in the corresponding lists in MathML 1. The characters thus lost are either used very infrequently in the experience of mathematical publishers, or are simply completely unacceptable for inclusion in Unicode. However, MathML 2 does provide the mglyph and csymbol elements to accommodate new characters that authors may wish to introduce.

The character listings by alphabetical and Unicode order in Section 6.1.7 [Alphabetical Lists] have now been brought more into line with the corresponding ISO character sets than was the case in MathML 1.0, in that if some part of a set is included then the entire set is included. In addition, the group ISOCHEM has been dropped as more properly the concern of chemists. These changes have also been reflected in the entity declarations in the DTD in Appendix A [Parsing MathML].

6.1.6 Special Constants

To commence, we list separately a few of the special characters which MathML has been a little radical in introducing. These have been accorded new Unicode values. There also used to be entries below for &true;, &false; and &NotANumber;, but these do not yet have Unicode points assigned to them and so have been removed. They can be reintroduced by the character extension mechanisms provided by the mchar and csymbol elements.

Entity name             Unicode  Description
&CapitalDifferentialD;  02145    D for use in differentials, e.g. within integrals
&DifferentialD;         02146    d for use in differentials, e.g. within integrals
&ExponentialE;          02147    e for use for the exponential base of the natural logarithms
&ImaginaryI;            02148    i for use as a square root of -1
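For readers who wish to check the special constants against a Unicode character database, the following sketch may help. It is an illustration only. It assumes that the hex values in the table denote code points, taking ImaginaryI as U+2148, and it relies on the database bundled with the Python runtime, which reflects Unicode as later published rather than the proposal described in this chapter. The names SPECIAL_CONSTANTS and describe are hypothetical.

```python
import unicodedata

# Code points as read from the table of special constants
# (ImaginaryI is taken to be U+2148; this is an assumption of the sketch).
SPECIAL_CONSTANTS = {
    "CapitalDifferentialD": 0x2145,
    "DifferentialD": 0x2146,
    "ExponentialE": 0x2147,
    "ImaginaryI": 0x2148,
}


def describe(name):
    """Return the code point and the Unicode character name for a constant."""
    cp = SPECIAL_CONSTANTS[name]
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp), "<unnamed>"))


for name in SPECIAL_CONSTANTS:
    print(name, "->", describe(name))
```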

6.1.7 Alphabetical Lists

The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with upper-case preceding lower-case as in ASCII order. There is also an ASCII listing of printing characters ordered by Unicode number. The Unicode code points are those of the current proposal which, it is expected, will eventually be part of the next revision, Unicode 4. Unicode 3 was published in February 2000. Next we have collections of the entities in entity sets which correspond to the groupings in the corresponding ISO documents.
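The two orderings can be reproduced with ordinary string and integer sorting. The sketch below is illustrative only. The sample names and codes are taken from this chapter, with true and false included purely to show where lower-case names fall, and the function names by_name and by_code are hypothetical.

```python
# Sample names from this chapter; "true" and "false" appear only to
# illustrate where lower-case names fall in ASCII order.
NAMES = ["true", "Tab", "false", "ThinSpace", "NotANumber"]

# Sample (name, Unicode hex) rows from the tables in this chapter.
ROWS = [
    ("ThinSpace", "02009"),
    ("DifferentialD", "02146"),
    ("Tab", "00009"),
    ("ApplyFunction", "02061"),
]


def by_name(names):
    """ASCII order: Python compares strings by code point, so A-Z precede a-z."""
    return sorted(names)


def by_code(rows):
    """Order rows by the numeric value of their hexadecimal Unicode column."""
    return sorted(rows, key=lambda row: int(row[1], 16))


print(by_name(NAMES))
print([name for name, _ in by_code(ROWS)])
```

Sorting on the integer value rather than on the hex string keeps the order correct even if the codes were written with differing numbers of leading zeros.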

6.1.8 ISO Character Set Groupings

In addition, we list the above material in the groupings introduced by ISO 9573-13. This table makes explicit the entity groupings and provides links to ASCII listings of the groups, as well as to HTML tabular listings which display the glyphs.

6.1.8.1 ISO Symbol Sets

The symbols for mathematics that ISO has considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. The entries are organized alphabetically by character name.

It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context. The MathML Working Group is engaged in the effort of ensuring that such fonts will be readily publicly available.

This first block of sets includes mostly non-letter symbols, along with a few letters loaded with mathematical semantics.

Group    Descriptive Name
ISOAMSA  Added Math Symbols: Arrows
ISOAMSB  Added Math Symbols: Binary Operators
ISOAMSC  Added Math Symbols: Delimiters
ISOAMSN  Added Math Symbols: Negated Relations
ISOAMSO  Added Math Symbols: Ordinary
ISOAMSR  Added Math Symbols: Relations
ISOTECH  General Technical
ISOPUB   Publishing
ISODIA   Diacritical Marks
ISONUM   Numeric and Special Graphic
ISOBOX   Box and Line Drawing

6.1.8.2 ISO Character Sets for Mathematics Alphabets

Mathematical literature displays the common use of particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we wish to include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts. The UTC has accepted a proposal for the inclusion of alphabetic character runs in Unicode Plane 1 for the express use of mathematics, brought to them by Murray Sargent of Microsoft and supported by the STIX Project as a compromise solution. However, the tenets of the UTC preclude the duplication, if at all possible, of methods for encoding a character which conventionally has essentially one glyphic representation. Thus there are holes at certain points in the alphabetic runs for mathematical use in Plane 1 coding. These holes will, however, be reserved and not used for anything else, and so can be used, internally, in the obvious way by an application handling mathematics.
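The holes in the Plane 1 alphabetic runs can be observed with a character database. The sketch below is an illustration under assumptions that go beyond this chapter: it uses the code points of Unicode as later published (the script capitals beginning at U+1D49C), not values fixed by the proposal discussed here, and it relies on the database bundled with the Python runtime. The function name holes_in_run is hypothetical.

```python
import unicodedata

# MATHEMATICAL SCRIPT CAPITAL A in published Unicode (an assumption of this
# sketch; the value is not given in this chapter).
SCRIPT_CAPITAL_A = 0x1D49C


def holes_in_run(first, count=26):
    """Return the code points in [first, first + count) that carry no name."""
    return [
        cp
        for cp in range(first, first + count)
        if unicodedata.name(chr(cp), None) is None
    ]


for cp in holes_in_run(SCRIPT_CAPITAL_A):
    letter = chr(ord("A") + cp - SCRIPT_CAPITAL_A)
    print("U+%05X is reserved (script capital %s is encoded elsewhere)" % (cp, letter))
```

For example, the position following script capital A is left empty because script capital B already exists as U+212C. An application may, as the text says, use the reserved position internally for that letter.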

Group    Descriptive Name
ISOGRK3  Greek Symbols
ISOMSCR  Math Alphabet Script
ISOMOPF  Math Alphabet Open Face
ISOMFRK  Math Alphabet Fraktur
