This document describes and links to an updated set of entity definitions, and corresponding HTML tables and descriptions. It may be considered as an update to Chapter 6 of the MathML2 Recommendation.
The primary reason for this update is the release of Unicode 3.2 (and to a lesser extent, Unicode 4.0) which has finally standardised a large repertoire of mathematical characters as part of the Unicode Character set.
The original SGML entity sets were defined via SDATA entities. These use a feature of SGML that allowed character entity names to be defined without mapping the character to any particular encoding; the processing of the entities was specified in a system-specific manner for any system processing the SGML. XML does not support SDATA entities, so for XML it is necessary to map the entity names to Unicode characters (or other XML constructs).
Until recently Unicode did not have sufficient characters to support mathematics, and previous releases of the MathML DTD used Private Use Area characters as the definition of many of the ISO character entities. However, private use characters are, as the name suggests, intended for private use. With the release of Unicode 3.2, Unicode finally has a large repertoire of mathematical symbols, and so in this release of the DTD we have removed all Private Use characters from the normative MathML 2 DTD (mathml2.dtd). This has caused some changes since the MathML Recommendation, but the fact that these changes were likely to occur once Unicode standardised these characters was explicitly mentioned in the MathML Recommendation. Only a very small percentage of characters is affected: MathML2 was based on a draft of the Unicode 3.1 and 3.2 proposals, and most of the character assignments in the final Unicode standards agreed with those drafts.
Unfortunately, despite adding many new characters for mathematics in Unicode 3.2, ISO/Unicode chose not to specify many characters that are needed to fully support the ISO entity sets for mathematics, from either the original SGML standard (ISO 8879) or the extended set in ISO 9573. This causes some problems for MathML, as well as for any application that traditionally used the ISO entity sets (such as DocBook or TEI). As mentioned above, earlier releases of MathML used Private Use Area characters for characters without official Unicode encodings, but in this release we have chosen to stay strictly within the defined Unicode character set. The effect of this is that some characters that were previously considered distinct are now mapped to the same Unicode character. Both entity names are retained for compatibility, but they are considered aliases: a rendering agent may use either glyph form for either of the entity names. An example of a character that has been "unified" in this way is dotless j, &jnodot; in ISOAMSO. This character is not supported by Unicode, so the entity name is now defined to expand to a standard "j"; the rendering agent may (and should) omit the dot if a j is used in certain constructs involving mathematical operators in the accent position.
Also, the MathML 2 entity set contains some entities that corresponded to negative space characters. These negative spaces were not accepted into Unicode, and so the corresponding entities (&NegativeThinSpace;, etc.) have been kept for compatibility, but are all defined to produce a zero width space. Their use is now deprecated and should be replaced by uses of <mspace> with a suitable width attribute.
A principal source of entity names for characters in MathML was the ISO set. However, MathML is also intended to be used in XHTML documents, and so as far as possible the entity names have been chosen to be compatible with (X)HTML.
The entity name circ is used in both HTML and the ISODIA entity set. The mapping in HTML appears to be in error: whereas acute is mapped to 00B4 (ACUTE ACCENT), circ is not mapped to 005E (CIRCUMFLEX ACCENT) but rather to 02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT).
In these files, circ is mapped to MODIFIER LETTER CIRCUMFLEX ACCENT, and so the entity declarations are compatible with (X)HTML. If the ISODIA entity set is being used in a context where HTML compatibility is not important you may wish to define circ before loading isodia.ent, in which case this definition will take precedence:
<!ENTITY circ "^" >
<!ENTITY % isodir-ent SYSTEM "iso8879/isodia.ent">
%isodir-ent;
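The precedence relied on here is the general XML rule that, when an entity is declared more than once, the first declaration is binding. A minimal sketch in Python, assuming the standard library's expat-based parser; the root element name doc is made up for illustration, and the second declaration merely stands in for the one that isodia.ent would supply (the file itself is not loaded):

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "doc" is an illustrative root element, and the second
# declaration stands in for the definition that isodia.ent would provide.
SOURCE = """<!DOCTYPE doc [
<!ENTITY circ "^">
<!ENTITY circ "&#x2C6;">
]>
<doc>&circ;</doc>"""


def expand_circ(source=SOURCE):
    """Parse the document and return the text that &circ; expanded to."""
    return ET.fromstring(source).text


# Print the code point of the expansion in Unicode notation.
print(f"U+{ord(expand_circ()):04X}")
```

Because the local definition comes first, the later one is ignored rather than treated as an error, which is why the override has to precede the reference to the entity set.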
The entity name asymp is used in both HTML and the ISOAMSR entity set.
The HTML mapping is to a "double tilde", 2248 (ALMOST EQUAL TO).
The ISO mapping is normally to a "cupcap" symbol, 224D (EQUIVALENT TO).
Interestingly, neither of these uses 2243 (ASYMPTOTICALLY EQUAL TO).
In these files, asymp is mapped to ALMOST EQUAL TO, as in HTML. A new entity asympeq is defined (in the mmlextra entity set) with the definition EQUIVALENT TO.
If the ISOAMSR entity set is being used in a context where HTML compatibility is not important you may wish to define asymp before loading isoamsr.ent, in which case this definition will take precedence:
<!ENTITY asymp "≍" >
<!ENTITY % isoamsr-ent SYSTEM "iso8879/isoamsr.ent">
%isoamsr-ent;
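The three code points involved in the asymp question can be checked against a Unicode character database. The following Python sketch uses the database bundled with the interpreter (which may be a later Unicode version than the one discussed here); the labels and the helper name describe are illustrative only:

```python
import unicodedata

# Code points discussed above for the entity name "asymp".
CANDIDATES = {
    "HTML mapping": 0x2248,
    "ISO mapping": 0x224D,
    "used by neither": 0x2243,
}


def describe(code_point):
    """Return 'U+XXXX NAME' using the Unicode database bundled with Python."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for label, code_point in CANDIDATES.items():
    print(f"{label}: {describe(code_point)}")
```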
Unicode 3 has two lowercase phi characters.
03C6 GREEK SMALL LETTER PHI is an "open" curly phi.
03D5 GREEK PHI SYMBOL is a "straight" phi, which is more often used in mathematics.
Previous versions of Unicode also had phi characters in these slots but the roles and sample glyphs were reversed. (The name of the character 03D5 in earlier releases of Unicode was GREEK SMALL LETTER SCRIPT PHI).
In these files:
the textual Greek set (ISOGRK1) uses GREEK SMALL LETTER PHI (entity phgr);
the mathematical Greek set (ISOGRK3) uses GREEK PHI SYMBOL (entity "phi" in ISO 9573-13, and "phis" in ISO 8879).
The "html symbol" entity set defined here defines "phi" to be GREEK PHI SYMBOL, compatible with ISOGRK3. The official HTML4 and XHTML entity set defines phi to be GREEK SMALL LETTER PHI (HTML predates the release of Unicode 3, when these definitions changed).
XHTML also has an entity "phi", which is defined to be character 03C6.
If these entity sets are being used in a context where full HTML compatibility is important, you may wish to define phi before loading isogrk3.ent, in which case this definition will take precedence:
<!ENTITY phi "φ" >
<!ENTITY % isogrk3-ent SYSTEM "iso9573-13/isogrk3.ent">
%isogrk3-ent;
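Since the two definitions of phi are distinct code points, the order of declarations decides which character a document actually receives. The Python sketch below (constant and function names are illustrative) names both characters using the Unicode database bundled with the interpreter, and also shows one consequence that may matter when post-processing text: Unicode gives 03D5 a compatibility decomposition to 03C6, so compatibility normalization (NFKC) removes the distinction.

```python
import unicodedata

TEXT_PHI = "\u03c6"  # GREEK SMALL LETTER PHI, the HTML4/XHTML definition of phi
MATH_PHI = "\u03d5"  # GREEK PHI SYMBOL, the ISOGRK3 definition used in these files


def fold_compat(text):
    """Apply NFKC normalization, which replaces compatibility characters."""
    return unicodedata.normalize("NFKC", text)


for ch in (TEXT_PHI, MATH_PHI):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# After NFKC the mathematical phi becomes the textual phi.
print(fold_compat(MATH_PHI) == TEXT_PHI)
```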
Unicode 3.2 introduced a range of mathematical alphabets into "Plane 1" of Unicode, i.e. at positions 2^16 and above, beyond xFFFF.
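What "above xFFFF" means in practice can be sketched in Python. The example character is 1D504 (MATHEMATICAL FRAKTUR CAPITAL A), and the helper names are illustrative. A Plane 1 character does not fit in a single 16-bit unit, so UTF-16 represents it as a surrogate pair:

```python
def is_plane1(code_point):
    """True for code points in Plane 1 (0x10000 through 0x1FFFF)."""
    return 0x10000 <= code_point <= 0x1FFFF


def utf16_units(ch):
    """Return the 16-bit code units that encode ch in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


FRAKTUR_A = "\U0001D504"  # MATHEMATICAL FRAKTUR CAPITAL A

print(is_plane1(ord(FRAKTUR_A)))
print([f"{unit:04X}" for unit in utf16_units(FRAKTUR_A)])
```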
Unfortunately some existing systems cannot yet handle these characters. In particular, the XML parser built into the original release of Internet Explorer 6 does not accept them (although the stand-alone versions of the MSXML3 and MSXML 4 parsers, and the versions of IE6 that include the IE6 SP1 update available from Microsoft, do), and neither does the SGML parsing SP library (including the original nsgmls parser). Note, however, that onsgmls from the OpenSP project does handle the full Unicode range.
To allow the use of the DTD with these systems the character references to plane 1 characters have been parameterised. If the parameter entity is redefined (from the default definition of "&#38;#x1D") then references to these characters will stay within the basic plane of Unicode.
For example
<!DOCTYPE math SYSTEM "mathml2.dtd" [
<!ENTITY % plane1D "U+1D">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
...
... &Afr; ...
Here the reference &Afr; would now produce the text "U+1D504", which is the Unicode plain text notation for this character. This will not display the character in a MathML renderer, but it is sufficient to allow these systems to be used for validation and other purposes.
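The arithmetic behind these redefinitions is simple enough to sketch in Python. The helper names are illustrative, and the second function anticipates the Private Use Area translation described in the example that follows, in which the leading 1D of the code point is replaced by E:

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Private Use Area of the basic plane


def unicode_notation(code_point):
    """Format a code point in the plain text 'U+XXXX' notation."""
    return f"U+{code_point:04X}"


def to_private_use(code_point):
    """Translate a 1Dxxx code point to Exxx, keeping the last three hex digits."""
    if not 0x1D000 <= code_point <= 0x1DFFF:
        raise ValueError("expected a code point in the 1Dxxx block")
    return 0xE000 + (code_point - 0x1D000)


print(unicode_notation(0x1D504))
print(unicode_notation(to_private_use(0x1D504)))
```

Every code point in the 1Dxxx block lands between E000 and EFFF, which lies inside the Private Use Area, so the translated references never leave the basic plane.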
As a second example, the references can instead be translated into the Private Use Area:
<!DOCTYPE math SYSTEM "mathml2.dtd" [
<!ENTITY % plane1D "&#38;#xE">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
...
... &Afr; ...
Here the reference &Afr; would now produce the character reference &#xE504;. This is a "Private Use Area" character. The entire 1Dxxx block of math alphabets will similarly be translated down to Exxx. As described above, the use of private use characters is not ideal, but the current generation of browsers (Internet Explorer 6, Netscape 7, Mozilla 1, for example) cannot correctly deal with characters outside the basic plane, so this translation is a necessary temporary measure.
Earlier releases of the combined XHTML-MathML DTD distributed from the MathML DTD directory used this technique; however, current releases do not redefine this entity, and so use the standard character definitions. The combined DTD is in fact produced by running James Clark's spam program on the following document and extracting the expanded DTD from the resulting output:
<!DOCTYPE html SYSTEM "xhtml-math11.dtd">
<html>
<head><title/></head>
<body><p/></body>
</html>
The xhtml-math11.dtd file is simply:
<!ENTITY % driver SYSTEM "xhtml-math-svg.dtd">
<!--<!ENTITY % plane1D "&#38;#38;#xE">-->
<!ENTITY % SVG.module "IGNORE" >
<!ENTITY % MATHML.pref.prefixed "INCLUDE" >
<!ENTITY % MATHML.sysid.base "" >
%driver;
<!ATTLIST %a.qname; target CDATA #IMPLIED>
<!ATTLIST %html.qname;
%Schema.xmlns.attrib;
%att-schemalocation;>
MathML includes all of the entity sets listed under ISO 9573-13 except for ISOGRK4.
MathML includes the entity sets isobox, isocyr1, isocyr2, isodia, isolat1, isolat2, isonum and isopub listed under ISO 8879.
The first table lists a large collection of characters (including at least all the characters with MathML entity definitions) ordered by Unicode number. The second table lists all the MathML entity names, in alphabetic order.
The tables in this section each display a block of 256 Unicode character positions. For reference, the Unicode names for blocks of characters are also given, together with links to PDF files available from the Unicode site showing these characters.
All the files linked from this page are generated by XSL from the master XML file, unicode.xml, together with the charlist DTD file referenced by unicode.xml.
Two XSL files are used (each using an extension element, for Saxon, to write multiple output files).