<document>
<head>
<title>Information technology - SGML support facilities - Techniques
for using SGML</title>
<subtitle>Part 13: Public entity sets for mathematics and sciences</subtitle>
<!--<date>2003-07-07</date>-->
<!--<date>2003-11-26</date>-->
<date>2003-12-08</date>
<author>Martin Bryan</author>
<author>David Carlisle</author>

</head>

<section>
<head>Scope</head>
<p>Tens of thousands of graphic characters are used in publishing text, a large
proportion of which have been defined in ISO/IEC 10646. Even where standard
coded representations exist, however, there may be situations in which they
cannot be keyboarded conveniently or accurately, or in which it is not possible
to display the desired visual depiction of the characters.</p>
<p>To help overcome these barriers to the successful interchange of SGML and
related documents, this part of ISO/IEC TR 9573 defines character entity sets for
widely used special graphic characters that regularly occur in the production of
scientific and mathematical documents.</p>
<note>Entity repertoires are necessarily larger and more repetitious
than character sets, as they deal in general with higher-level constructs. For
example, unique entities have been defined for each accented Latin alphabetic
character, while a character set might represent such characters as combinations
of letters and diacritical mark characters.</note>
<p>In many instances upper- and lower-case is used to differentiate the names of
entities. It is assumed that any SGML concrete syntax used in conjunction with
these entity names will be case sensitive.</p>
<note>The reference concrete syntax defined in ISO/IEC 8879 (SGML) is case
sensitive for entity names.</note>
</section>

<section>
<head>Normative references</head>
<p>The following standards contain provisions which, through reference in this
text, constitute provisions of this part of ISO/IEC TR 9573. At the time of
publication, the editions indicated were valid. All standards are subject to
revision, and parties to agreements based on this part of ISO/IEC TR 9573 are
encouraged to investigate the possibility of applying changes made in more
recent editions of referenced standards.</p>
<p>ISO/IEC 8879:1986</p>
<p>ISO/IEC 9541-1:1991</p>
<p>ISO/IEC 10646-1:2000/Amd 1:2002</p>
<p>ISO/IEC 10646-2:2001</p>
</section>
<section>
<head>Definitions</head>
<p>For the purposes of this part of ISO/IEC TR 9573 the definitions
given in ISO/IEC 8879 apply.</p>
</section>

<section>
<head>General considerations</head>
<p>This edition of this part of ISO/IEC TR 9573 has been aligned with the Unicode 3.2
updates to ISO/IEC 10646:2000, as covered by Amendment 1 to that standard. For
backwards compatibility, the name assigned to each character in the original
edition of this part is shown before the name assigned to the character in ISO/IEC
10646. References to characters in this part should, however, use the ISO/IEC
10646 name rather than the name originally assigned by ISO/IEC TR 9573.</p>
<section>
<head>Format of Descriptions</head>
<p>To follow</p>
</section>
<section>
<head>Corresponding Display Entity Sets</head>
<p>Each character has a characteristic visual depiction known as a
&quot;glyph&quot;. Systems using these entity sets need to be able to convert
each entity reference to an appropriate glyph. Where character sets based
on ISO/IEC 10646 are available this is typically done by conversion to a
character reference of the form <code>&amp;#xnnnnn;</code>, where <code>nnnnn</code>
is the five-digit hexadecimal code listed in the column headed Unicode/10646.
The first digit of the code indicates the plane of ISO/IEC 10646 to which the
character has been assigned. The entity name and a descriptive comment are added
to the definition, giving it the form:</p>
<pre>&lt;!ENTITY frac78  &quot;&amp;#x0215E;&quot;&gt;&lt;!-- VULGAR FRACTION SEVEN EIGHTHS--&gt;</pre>
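<p>As an illustration only, the following Python sketch builds a declaration of
this form from an entity name and a code point. The function names
<code>entity_declaration</code> and <code>plane_of</code> are hypothetical and are
not part of this report. The sketch takes the descriptive comment from the
character name in Python's <code>unicodedata</code> module, which reflects
whichever Unicode version that module ships with, and it terminates the character
reference with a semicolon, as XML requires.</p>
<pre><![CDATA[
```python
import unicodedata


def entity_declaration(name: str, code_point: int) -> str:
    """Build one entity declaration with a five-digit hexadecimal reference.

    Hypothetical helper: the comment text is the character name reported
    by the local Unicode database, not a value taken from this report.
    """
    char_name = unicodedata.name(chr(code_point))
    return f'<!ENTITY {name}  "&#x{code_point:05X};"><!-- {char_name}-->'


def plane_of(code_point: int) -> int:
    """Return the ISO/IEC 10646 plane, i.e. the first of the five hex digits."""
    return code_point >> 16


print(entity_declaration("frac78", 0x215E))
print(plane_of(0x215E), plane_of(0x1D6FC))
```
]]></pre>
<p>Padding to five digits keeps the plane digit visible even for characters in
the BMP, where it is always zero.</p>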
</section>
</section>

<section break="yes">
<head>Comparison with other sets of entity definitions</head>
<section break="yes">
<head>Differences between MathML and Stix Data</head>

<p>The Stix consortium maintains a table of information about
mathematical characters, including mappings to ISO/IEC entity sets.
The following is an annotated list of cases where entity definitions
appear to be different in these two collections.</p>

<p>During the review of this draft document these alignment issues will be
reviewed and resolved.</p>

<pre>
<stixdiff/>
</pre>

</section>

<section break="yes">
<head>Differences between MathML and DocBook Data</head>

<p>OASIS distribute a set of entity declarations for use with the
DocBook markup language.
<uri>http://www.oasis-open.org/docbook/specs/wd-docbook-xmlcharent-0.3.html</uri></p>
<p>The following table lists the current differences between this set
and the definitions described in this document.</p>

<p>During the review of this draft document these alignment issues will be
reviewed and resolved.</p>

<pre>
<docbookdiff/>
</pre>

</section>

<section break="yes">
<head>Differences between MathML and XHTML 1.1 Data</head>

<p>W3C distribute a set of entity declarations for use with the
(X)HTML markup language.
<uri>http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#a_xhtml_character_entities</uri></p>
<p>The following table lists the current differences between this set
and the definitions described in this document.</p>

<p>During the review of this draft document these alignment issues will be
reviewed and resolved.</p>

<pre>
<xhtmldiff/>
</pre>

</section>

</section>


<section break="yes">
<head>Character definitions requiring special review</head>

<section break="yes">
<head>Duplicate entities mapped to same code point</head>

<p>This section details cases where two entities with different names
have been mapped to the same code point (and so become indistinguishable
to most XML applications, even if SGML applications could
differentiate the original SGML SDATA entities).</p>

<p>In many cases these unifications are acceptable or intentional. For
example, for reasons of convenience and compatibility the <quote>mathematical
Greek</quote> set ISOGRK3 is mapped to the standard Greek characters in the
BMP (so clashing with the ISOGRK1 definitions) rather than to the Math
Italic alphabet in the 1Dxxx range. Similarly, the same character can
have different logical uses in different scientific disciplines.
In some cases, however, the duplication has occurred because it has not
been possible to retain differences foreseen in ISO/IEC TR 9573 within the
ISO/IEC 10646 character set. During the review of this draft the need for
duplication of these entities will be reviewed and resolved.</p>

<pre>
<duplicates/>
</pre>
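<p>A check of this kind can be sketched in a few lines of Python. The sample
mapping below is illustrative only and is not the list generated for this
report. Its entity names and code points are assumptions chosen to mirror the
ISOGRK1/ISOGRK3 and perp/bottom cases discussed in this document, and the
function name <code>find_duplicates</code> is hypothetical.</p>
<pre><![CDATA[
```python
from collections import defaultdict

# Illustrative sample: (entity set, entity name) -> five-digit hex code point.
# These values are assumptions for the sketch, not data from this report.
SAMPLE = {
    ("ISOGRK1", "agr"): "003B1",
    ("ISOGRK3", "alpha"): "003B1",
    ("ISOTECH", "perp"): "022A5",
    ("ISOTECH", "bottom"): "022A5",
    ("ISONUM", "frac78"): "0215E",
}


def find_duplicates(mapping):
    """Group entity names by code point and keep groups with two or more names."""
    by_code = defaultdict(list)
    for (set_name, entity), code in mapping.items():
        by_code[code].append(f"{entity} ({set_name})")
    return {code: names for code, names in by_code.items() if len(names) > 1}


for code, names in find_duplicates(SAMPLE).items():
    print(code, ", ".join(names))
```
]]></pre>
<p>Keeping the entity set alongside each name makes it easier to see whether a
clash is within one set or between two sets.</p>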

</section>

<section break="yes">
<head>Entity definitions starting with a combining character</head>

<p>It is generally a bad idea to start an entity definition with a
combining character, as it makes normalisation dependent on the order
of entity expansion. In the worst case the combining character can
combine with the markup, resulting in a normalised form that is no
longer well-formed XML.</p>

<!--<p>If non-combining diacritics for these characters are not added to
the ISO/IEC 10646 character set, these entities could be
redefined to be a space
or zero width space followed by the combining diacritic, or
simply leaving things as they are.</p>-->

<p>In the current version of these entities, each definition
consists of a single space character (U+0020) followed
by the combining character.</p>
<pre>
<combining/>
</pre>
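<p>The effect can be demonstrated with a short Python sketch. The entity name
<code>ent</code>, the element name <code>x</code> and the function
<code>expand_and_normalise</code> are hypothetical. U+0338 COMBINING LONG
SOLIDUS OVERLAY is used purely as an illustration, because it composes with
<quote>&gt;</quote> under Normalization Form C, and it is not claimed to be one
of the entities listed above. Entity expansion is simulated by plain string
replacement rather than by an XML parser.</p>
<pre><![CDATA[
```python
import unicodedata

COMBINING_LONG_SOLIDUS = "\u0338"  # illustrative choice of combining character


def expand_and_normalise(template: str, replacement: str) -> str:
    """Simulate expanding &ent; and then normalising the result to NFC."""
    expanded = template.replace("&ent;", replacement)
    return unicodedata.normalize("NFC", expanded)


# Definition starting with the combining character: the ">" that closes the
# start tag composes with U+0338 into U+226F NOT GREATER-THAN.
bare = expand_and_normalise("<x>&ent;</x>", COMBINING_LONG_SOLIDUS)

# Definition starting with U+0020: the combining character attaches to the
# space, so the markup survives normalisation unchanged.
spaced = expand_and_normalise("<x>&ent;</x>", " " + COMBINING_LONG_SOLIDUS)

print(ascii(bare))
print(ascii(spaced))
```
]]></pre>
<p>In the first case the start tag loses its closing delimiter, so the
normalised text is no longer well-formed. In the second case the leading space
absorbs the combining character and the markup is preserved.</p>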


</section>
<section break="yes">
<head>Possible new characters</head>
<p>
There are several new characters planned for Unicode and ISO/IEC 10646
that may affect these definitions. See
<uri>http://www.unicode.org/alloc/Pipeline.html</uri>.</p>

<p>In particular, jmath (ISOAMSO), which is currently defined to be <quote>j</quote>,
could more usefully be defined to be the proposed character
<quote>LATIN SMALL LETTER DOTLESS J</quote> at code point U+0237, and
perp (ISOTECH) could be defined to use <quote>PERPENDICULAR</quote> at
code point U+27C2
(and so allow it to be distinguished from bottom, also in ISOTECH).</p>
</section>
</section>

<section break="yes">
<head>Character listings</head>
<p>Each character set is shown as a four-column table.</p>

<ul>
<li>
<p>The first column gives the entity name. These names are as used
in previous versions of this report, and use the following
abbreviation scheme:</p>
<ul>
<li>
<p>Prefixes<br/>
l = left; r = right; u = up; d = down; h = horizontal; v = vertical;<br/>
b = back, reversed;<br/>
cu = curly;<br/>
cw = clockwise; aw = anticlockwise;<br/>
g = greater than; l = less than;<br/>
n = negated;<br/>
o = in circle;<br/>
s = small, short;<br/>
sq = square shaped;<br/>
thk = thick;<br/>
x = extended, long, big;</p>
</li>
<li>
<p>Bodies<br/>
ap = approx;<br/>
arr = arrow; har = harpoon;<br/>
pr = precedes; sc = succeeds;<br/>
sub = subset; sup = superset;</p>
</li>
<li>
<p>Suffixes<br/>
b = boxed;<br/>
f = filled, black, solid;<br/>
e = single equals; E = double equals;<br/>
hk = hook;<br/>
s = slant;<br/>
t = tail;<br/>
v = variant;<br/>
w = wavy, squiggly;<br/>
2 = two of;</p>
</li>
<li>
<p>Upper-case letter means <quote>doubled</quote> (or sometimes
<quote>two of</quote>).</p>
</li>
</ul>
</li>
<li><p>The second column gives the code points of the corresponding
character as five-digit hexadecimal numbers, separated by
<quote>-</quote>.</p></li>

<li><p>The third column gives a sample glyph representation of the
character.</p></li>

<li><p>The fourth column gives the name of the character in two forms:
firstly, the entity description as used in previous editions of this
report; secondly (in uppercase), the name of the character as given in
Unicode and ISO/IEC 10646. In the case of combinations with combining
characters or variant selectors, the name of the base character is
given in uppercase, followed by an indication (in lower case) of the
variant form.</p></li>
</ul>
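<p>As a minimal sketch, a value from the second column can be converted to and
from the character sequence it denotes as follows. The function names
<code>parse_code_points</code> and <code>format_code_points</code> are
hypothetical, and the sketch assumes that every part of the column value is a
valid hexadecimal code point.</p>
<pre><![CDATA[
```python
def parse_code_points(column: str) -> str:
    """Turn a value such as '00020-00338' into the characters it denotes."""
    return "".join(chr(int(part, 16)) for part in column.split("-"))


def format_code_points(text: str) -> str:
    """Render a character sequence as five-digit hex numbers joined by '-'."""
    return "-".join(f"{ord(ch):05X}" for ch in text)


print(format_code_points(parse_code_points("0215E")))
print(format_code_points(" \u0338"))
```
]]></pre>
<p>A value containing a single code point, with no separator, is handled by the
same functions.</p>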

<charactertables/>
</section>
</document>
