It is hoped that the entity sets defined by this specification may form the basis of an update to
[ISO9573-13-1991]. However, pressure of other commitments
has so far prevented this document from being processed by the relevant
ISO committee, so the entity sets are presented with Formal
Public Identifiers of the form -//W3C//...
rather than
. It is hoped that an update to TR 9573-13 may be
made later. (The present version of TR 9573-13 defines the sets of
names, but does not give mappings to Unicode.)
Notation and symbols have proved very important for scientific documents, especially in mathematics. Mathematics has grown in part because its notation continually changes toward being succinct and suggestive. Many new signs have been developed for use in mathematical notation, and mathematicians have not held back from using symbols originally introduced elsewhere. The result is that science in general, and mathematics in particular, makes use of a very large collection of symbols. It is difficult to write science fluently if these characters are not available for use. It is difficult to read science if corresponding glyphs are not available for presentation on specific display devices.
In the majority of cases it is preferable to store characters directly as Unicode character data or as XML numeric character references. However, in some environments it is more convenient to use the ASCII input mechanism provided by XML entity references. Many entity names are in common use, and this specification aims to provide standard mappings to Unicode for each of these names. It introduces no names that have not already been used in earlier specifications.
Specifically, the entity names in the sets whose names start with "iso" were first standardized in SGML ([SGML]) and updated in [ISO9573-13-1991], the entity names in the sets whose names start with "mml" were first standardized in MathML [MathML2], and those in the sets whose names start with "xhtml" were first standardized in HTML [HTML4].
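The three ways of writing a character mentioned above (direct Unicode character data, a numeric character reference, and an entity reference) can be compared with a small Python sketch. The document string and the single `alpha` declaration in its internal DTD subset are illustrative assumptions, not text taken from the entity files; the sketch relies on the standard library parser expanding entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "alpha" is declared in the internal DTD subset,
# written in the same style as a DTD entity declaration.
DOC = """<!DOCTYPE p [
  <!ENTITY alpha "&#x3B1;">
]>
<p><a>&alpha;</a><b>&#x3B1;</b><c>\u03b1</c></p>"""

root = ET.fromstring(DOC)

# Collect the text of <a> (entity reference), <b> (numeric character
# reference) and <c> (direct character data).
forms = [child.text for child in root]

# After parsing, all three forms are the same one-character string.
print(forms, len(set(forms)) == 1)
```

Once parsed, an application can no longer tell which of the three forms the author typed, which is why the choice is mainly a matter of input convenience.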
This specification defines Unicode mappings of many sets of names that have been defined by earlier specifications.
We first present two tables listing the combined sets, one in Unicode order and one in alphabetic order.
These are followed by tables documenting each of the entity sets. Each set has a link to the DTD entity declaration for the corresponding entity set, and also a link to an XSLT 2.0 stylesheet that implements a reverse mapping from characters to entity names (this is, of course, only possible for entity names that map to a single Unicode code point).
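The restriction on reverse mappings can be illustrated with a short Python sketch. This is not the XSLT stylesheet itself: the table is a small illustrative subset, the name `fjlig` with the two-character value "fj" is included as an assumed multi-code-point case, and the tie-break rule (shortest name, then alphabetical) is a hypothetical choice for when several names share a code point.

```python
# Illustrative subset of entity name -> replacement text (an assumption,
# not the full entity set).
ENTITIES = {
    "phi": "\u03c6",
    "phgr": "\u03c6",
    "straightphi": "\u03d5",
    "fjlig": "fj",  # two code points, so it cannot be reversed
}


def build_reverse_map(entities):
    """Map each single code point to one entity name."""
    reverse = {}
    # Hypothetical tie-break: prefer the shortest name, then alphabetical.
    for name in sorted(entities, key=lambda n: (len(n), n)):
        value = entities[name]
        if len(value) != 1:
            continue  # skip names that expand to more than one code point
        reverse.setdefault(value, name)
    return reverse


def encode(text, reverse):
    """Replace every mapped character in text with an entity reference."""
    return "".join(f"&{reverse[ch]};" if ch in reverse else ch for ch in text)


REVERSE = build_reverse_map(ENTITIES)
print(encode("\u03c6 = fj", REVERSE))
```

Because the reverse map works one character at a time, the letters "f" and "j" pass through unchanged; recovering a multi-code-point entity would require matching character sequences instead.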
In addition to the stylesheets and entity files corresponding to each individual entity set, a combined stylesheet is provided, as well as two combined sets of DTD entity declarations. The first is a small file which includes all the other entity files via parameter entity references; the second is a larger file that directly contains a definition of each entity, with all duplicates removed.
Certain characters are of particular relevance to scientific document production. The following tables display Unicode ranges containing the characters that are most used in mathematics.
The differences between the XHTML entity definitions described here and the entity set described in the XHTML 1.0 DTD are listed below.
U+27E8, XHTML 1.0 used U+2329 (which has canonical decomposition to U+3008)
U+27E9, XHTML 1.0 used U+232A (which has canonical decomposition to U+3009)
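The practical effect of the canonical decompositions noted above can be checked with Python's `unicodedata` module. The sketch below assumes only the four code points named in the list; it shows that the older angle brackets do not survive Unicode normalization, whereas the mathematical brackets do.

```python
import unicodedata

# (older XHTML 1.0 assignment, current assignment)
PAIRS = [("\u2329", "\u27e8"), ("\u232a", "\u27e9")]


def nfc_codepoint(ch):
    """Return the code point of ch after NFC normalization, as U+XXXX."""
    normalized = unicodedata.normalize("NFC", ch)
    return "U+" + "".join(f"{ord(c):04X}" for c in normalized)


for old, new in PAIRS:
    print(f"U+{ord(old):04X} normalizes to {nfc_codepoint(old)};",
          f"U+{ord(new):04X} normalizes to {nfc_codepoint(new)}")
```

A document that used U+2329 or U+232A would therefore end up containing the CJK brackets U+3008 or U+3009 after any normalizing step, which is one reason to prefer the mathematical brackets.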
The differences between MathML 2 and the current entity definitions are listed below.
fj, ISOPUB (and MathML 1) defined an fj ligature. Unicode does not have a specific character for it, and the entity was dropped from MathML2. It is reinstated here for maximum compatibility with [SGML].
U+03C6 (decimal 966) GREEK SMALL LETTER PHI (the definition used in HTML4), MathML2 used U+03D5 (decimal 981) GREEK PHI SYMBOL.
It is very difficult for (X)HTML definitions to change since HTML is so widely deployed. Many of the assignments in the current definitions would be different if it were not for HTML compatibility. However, in this case, perhaps this change could be made in an XHTML2/HTML5 time frame. Currently U+03D5 has the entity names straightphi and phis; U+03C6 has the entity names phi, phgr, phiv, and varphi.
It is also worth noting that Unicode has changed (swapped)
the default glyphs for U+03C6 and U+03D5 since the publication
of HTML4. The current recommendation is to use a cursive form
for U+03C6, and a form
with a straight vertical bar for U+03D5. Some newer fonts
use glyphs that correspond to the change made by Unicode, while a number of
older fonts remain unchanged and hence will display the glyphs swapped
relative to the current version of Unicode. There is no way to guarantee
that the intended glyph is displayed without font-specific knowledge.
U+0237, MathML 2 used U+006A (j) as there was no dotless j before Unicode 4.1.
U+23E2 and U+23E7, MathML 2 used U+FFFD (REPLACEMENT CHARACTER) as these characters were added in Unicode 5.0 specifically to support these entities.
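The code points and character names quoted in the list above can be cross-checked against the Unicode character database shipped with Python. This is only a verification sketch; the helper name `describe` is hypothetical, and the names returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(cp):
    """Format a code point as 'U+XXXX (decimal N) NAME'."""
    return f"U+{cp:04X} (decimal {cp}) {unicodedata.name(chr(cp))}"


# The two phi characters and the dotless j discussed in the list.
for cp in (0x03C6, 0x03D5, 0x0237):
    print(describe(cp))
```

Note that this confirms only the character identities. As stated above, which glyph a given font shows for U+03C6 or U+03D5 cannot be determined from the character data.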
The following bracket symbols have been added to the Mathematical symbols block in Unicode versions between 3.1 and 5.1. MathML2 used similar characters intended for CJK punctuation.
U+27E8, XHTML 1.0 used U+2329 (which has canonical decomposition to U+3008)
U+27EA, MathML2 used U+300A
U+2772, MathML2 used U+3014
U+27EC, MathML2 used U+3018
U+27E6, MathML2 used U+301A
U+27E9, XHTML 1.0 used U+232A (which has canonical decomposition to U+3009)
U+27EB, MathML2 used U+300B
U+2773, MathML2 used U+3015
U+27ED, MathML2 used U+3019
U+27E7, MathML2 used U+301B
U+23DE, MathML2 used U+FE37
U+23DC, MathML2 used U+FE35
U+23DF, MathML2 used U+FE38
U+23DD, MathML2 used U+FE36
U+27E6, MathML2 used U+301A
U+27E7, MathML2 used U+301B
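For content that was produced with the older assignments, the replacements listed above can be collected into a translation table. The following Python sketch is one possible way to migrate such text; the names `OLD_TO_CURRENT` and `migrate` are hypothetical. It assumes that every occurrence of the older characters was meant as a mathematical bracket, which would not hold for text containing genuine CJK punctuation.

```python
# Older code point -> current code point, taken from the list above.
OLD_TO_CURRENT = {
    0x2329: 0x27E8, 0x232A: 0x27E9,  # angle brackets
    0x300A: 0x27EA, 0x300B: 0x27EB,  # double angle brackets
    0x3014: 0x2772, 0x3015: 0x2773,  # tortoise shell brackets
    0x3018: 0x27EC, 0x3019: 0x27ED,  # white tortoise shell brackets
    0x301A: 0x27E6, 0x301B: 0x27E7,  # white square brackets
    0xFE37: 0x23DE, 0xFE38: 0x23DF,  # top and bottom curly brackets
    0xFE35: 0x23DC, 0xFE36: 0x23DD,  # top and bottom parentheses
}


def migrate(text):
    """Replace older bracket characters with their current equivalents."""
    return text.translate(OLD_TO_CURRENT)


sample = "\u300Ax\u300B + \u301Ay\u301B"
print(migrate(sample))
```

Characters that are not keys of the table, including ordinary ASCII brackets, are left untouched by `str.translate`.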
MathML3 uses the entity sets defined by this specification, so there will be no differences between MathML and the entities defined here once MathML3 is finalized.
