<?xml version="1.0" encoding="utf-8"?>
<!-- $Id: character-set.xml,v 1.27 2008/07/17 13:30:17 dcarlis Exp $ -->
<!DOCTYPE spec [<!ENTITY date "20080721">]>
<spec w3c-doctype="wd">
<header>
<title>XML Entity definitions for Characters</title>
<w3c-designation>xml-entity-names-&date;</w3c-designation>
<w3c-doctype>W3C Working Draft</w3c-doctype>
<pubdate><day>21</day> <month>July</month> <year>2008</year></pubdate>
<publoc>
<loc href="http://www.w3.org/TR/2008/WD-xml-entity-names-&date;/">http://www.w3.org/TR/2008/WD-xml-entity-names-&date;/</loc>
</publoc>
<latestloc>
<loc href="http://www.w3.org/TR/xml-entity-names/">http://www.w3.org/TR/xml-entity-names/</loc>
</latestloc>
<prevlocs>
  <loc href="http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/">http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/</loc>
</prevlocs>
<authlist>
<author>
<name>David Carlisle</name>
<affiliation>NAG</affiliation>
</author>
<author>
<name>Patrick Ion</name>
<affiliation>Mathematical Reviews, American Mathematical Society</affiliation>
</author>
</authlist>
<errataloc href="http://www.w3.org/2003/entities/2007doc/errata.html"/>
<status id="status">


<p><emph> This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A
list of current W3C publications and the latest revision of this
technical report can be found in the <loc
href="http://www.w3.org/TR/">W3C technical reports index</loc> at
http://www.w3.org/TR/.</emph></p> 

<p>
This document is a W3C Public Working Draft  produced by the <loc
href="http://www.w3.org/Math/">W3C Math Working Group</loc> as 
part of the W3C <loc
href="http://www.w3.org/Math/Activity">Math
Activity</loc>.
</p>
<p>Publication as a Working Draft does not imply endorsement by the W3C Membership. 
This is a draft document and may be updated, replaced or obsoleted by other 
documents at any time. It is inappropriate to cite this document as other 
than work in progress.</p>

<p>Public discussion of this document is encouraged on
 <loc
href="mailto:www-math@w3.org">www-math@w3.org</loc>, the public mailing list of the Math Working
Group (<loc
href="http://lists.w3.org/Archives/Public/www-math/">list archives</loc>).
To subscribe, send an email to <loc
href="mailto:www-math-request@w3.org">www-math-request@w3.org</loc>
with the word <code>subscribe</code> in the subject line.
</p>

<p>
Please report errors in this document to <loc
href="mailto:www-math@w3.org">www-math@w3.org</loc>.
</p>

<p> This document was produced by a group operating under 
the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 
February 2004 W3C Patent Policy</loc>. W3C maintains a 
<loc role="disclosure" href="http://www.w3.org/2004/01/pp-impl/35549/status">
public list of any patent disclosures</loc> made in connection with 
the deliverables of the group; that page also includes instructions 
for disclosing a patent. An individual who has actual knowledge of 
a patent which the individual believes contains 
<loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</loc> 
must disclose the information in accordance with 
<loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 
of the W3C Patent Policy</loc>. </p>

<p>It is hoped that the entity sets defined by this specification may form the basis of an update to 
<bibref ref="ISO9573-13-1991"/>. However, other commitments
have so far prevented this document from being processed by the relevant
ISO committee, so the entity sets are presented with Formal
Public Identifiers of the form <code>-//W3C//...</code> rather than
<code>ISO...</code>. It is hoped that TR 9573-13 may be updated
later. (The present version of TR 9573-13 defines the sets of
names, but does not give mappings to Unicode.)</p> 
</status>
<abstract id="abstract">
<p>This document defines several sets of names which are assigned to Unicode characters.
Each of these sets is also implemented as a file of XML entity declarations.</p>
</abstract>
<langusage>
<language id="en">English</language>
</langusage>

<revisiondesc>
<p>First draft, derived from the MathML2 sources.</p>
<p>Second draft, incorporating comments from Karl Tomlinson, Ian Hickson and others.</p>
</revisiondesc>

</header>
<body>

<div1 id="chars_intro"><head>Introduction</head>


<p>
Notation and symbols have proved very important for scientific
documents, especially in mathematics. Mathematics has grown in part
because <phrase>its notation continually changes toward being succinct
and suggestive</phrase>. There have been many new signs
<phrase>developed</phrase> for use in mathematical notation, and
mathematicians have not held back from making use of many symbols
originally <phrase>introduced</phrase> elsewhere. The result is that
science in general, and particularly mathematics, makes use of a very large collection of symbols.  It is
difficult to write science fluently if these characters are not
available for use. It is difficult to read science if
corresponding glyphs are not available for presentation on specific
display devices. In the majority of cases it is preferable to store
characters directly as Unicode character data or as XML numeric
character references.  However, in some environments it is more
convenient to use the ASCII input mechanism provided by XML entity
references. Many entity names are in common use, and this 
specification aims to provide standard mappings to Unicode for each of
these names. It introduces no names that have not already been used in
earlier specifications. Specifically, the entity names in the sets
whose names start with "iso" were first standardized in SGML (<bibref
ref="SGML"/>) and updated in <bibref ref="ISO9573-13-1991"/>; those
in the sets whose names start with "mml" were first standardized in
MathML (<bibref ref="MathML2"/>); and those in the sets whose names
start with "xhtml" were first standardized in HTML (<bibref ref="HTML4"/>).</p>
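<p>As an illustration, the three markup forms below all denote the same character, U+27E8 MATHEMATICAL LEFT ANGLE BRACKET. This is a minimal sketch: the MathML <code>mo</code> element is used only as a familiar container, and the entity form assumes that the document's DTD declares <code>lang</code>, for example by referencing one of the entity files described in <specref ref="sets"/>. The first two forms need no DTD at all.</p>
<eg><![CDATA[<!-- 1. Unicode character data (needs a Unicode-capable editor) -->
<mo>⟨</mo>
<!-- 2. Numeric character reference (ASCII only, no DTD required) -->
<mo>&#x27E8;</mo>
<!-- 3. Entity reference (ASCII only, but the DTD must declare lang) -->
<mo>&lang;</mo>]]></eg>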
</div1>

<div1 id="sets">
<head>Sets of names</head>
<p>This specification defines Unicode mappings of many sets of names
that have been defined by earlier specifications.</p>
<p>Two tables list the combined sets, first in
<loc href="bycodes.html">Unicode order</loc> and then in <loc
href="byalpha.html">alphabetic order</loc>.</p>
<p>These are followed by tables
documenting each of the entity sets. Each set has a link to the DTD
entity declarations for the corresponding entity set, and also a link
to an XSLT 2.0 stylesheet that implements a reverse mapping from
characters to entity names (this is only possible for entity names
that map to a single Unicode code point).</p><p>In addition to the
stylesheets and entity files corresponding to each individual entity
set, a <loc
href="http://www.w3.org/2003/entities/2007/entitynamesmap.xsl">combined
stylesheet is provided</loc>, as well as two combined sets of DTD
entity declarations. The first is a <loc
href="http://www.w3.org/2003/entities/2007/w3centities.ent">small file
which includes all the other entity files via parameter entity
references</loc>; the second is a <loc
href="http://www.w3.org/2003/entities/2007/w3centities-f.ent">larger
file that directly contains a definition of each entity, with all
duplicates removed</loc>.</p>
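<p>The following sketch shows one way a document might make every entity name available, by loading the larger combined file through a parameter entity in its internal DTD subset. The parameter entity name <code>w3centities</code> and the root element are illustrative choices, not names defined by this specification; the system identifier is the URL given above. Note that XML processors are not obliged to read external parameter entities unless they validate, so whether the references are expanded depends on the parser and its settings; a local copy of the file, or a catalog entry resolving the URL to one, also avoids a network fetch at parse time.</p>
<eg><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math [
  <!-- Hypothetical parameter entity name; the system identifier is the
       combined file that defines every entity directly. -->
  <!ENTITY % w3centities SYSTEM
    "http://www.w3.org/2003/entities/2007/w3centities-f.ent">
  %w3centities;
]>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow><mo>&lang;</mo><mi>x</mi><mo>&rang;</mo></mrow>
</math>]]></eg>
<p>The reverse direction relies on XSLT 2.0 character maps. The fragment below is a hand-written sketch of the technique, not an excerpt from the provided stylesheets, and the map name <code>angles</code> is hypothetical. On serialization each listed character is written out as the corresponding entity reference, so the output needs a DTD declaring those names if it is to be parsed again.</p>
<eg><![CDATA[<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Serialize using the (hypothetical) character map defined below. -->
  <xsl:output method="xml" use-character-maps="angles"/>
  <xsl:character-map name="angles">
    <!-- Each mapping starts from exactly one character, which is why only
         single-code-point entity names can be reverse mapped. -->
    <xsl:output-character character="&#x27E8;" string="&amp;lang;"/>
    <xsl:output-character character="&#x27E9;" string="&amp;rang;"/>
  </xsl:character-map>
  <!-- Identity transform: copy the input unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>]]></eg>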
</div1>

<div1 id="blocks">
<head>Unicode Character Blocks for Scientific Documents</head>
<p>Certain characters are of particular relevance to scientific document production. The following tables display 
the Unicode ranges containing the characters most frequently used in mathematics.</p>
</div1>



</body>
<back>
  <div1>
    <head>Changes</head>
    <div2>
      <head>Changes since 2007-12-14</head>
      <p>The following entity definitions have changed in this draft:</p>
      <p>phi, lang, rang, OverParenthesis, UnderParenthesis, OverBrace, UnderBrace,
      lbbrk, rbbrk</p>
    </div2>
  </div1>
  <div1 id="diffs">
    <head>Differences between these entities and earlier W3C DTDs</head>
    <div2 id="diff-xhtml1">
      <head>Differences from XHTML 1.0</head>
      <p>The XHTML entity definitions described here differ from those in the <loc href="http://www.w3.org/TR/xhtml1/dtds.html">XHTML 1.0 DTD</loc> as follows.</p>
      <glist>
	<gitem><label>lang</label><def><p>U+27E8, XHTML 1.0 used U+2329 (which has canonical decomposition to U+3008)</p></def></gitem>
	<gitem><label>rang</label><def><p>U+27E9, XHTML 1.0 used U+232A (which has canonical decomposition to U+3009)</p></def></gitem>
      </glist>
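      <p>In DTD terms the change amounts to a different replacement text for each entity. The declarations below are a simplified sketch, not verbatim copies of either DTD, and the two groups are alternatives rather than one file: XHTML 1.0 used the decimal references 9001 and 9002 (U+2329 and U+232A), whereas the definitions here use U+27E8 and U+27E9.</p>
<eg><![CDATA[<!-- XHTML 1.0 (simplified). These characters have canonical
     decompositions to CJK U+3008 and U+3009, so Unicode
     normalization replaces them. -->
<!ENTITY lang "&#9001;">
<!ENTITY rang "&#9002;">

<!-- This specification (simplified): mathematical angle brackets,
     which have no decomposition and survive normalization. -->
<!ENTITY lang "&#x27E8;">
<!ENTITY rang "&#x27E9;">]]></eg>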
	
    </div2>
    <div2 id="diff-mathml2">
      <head>Differences from MathML 2.0 (second edition)</head>
      <p>The differences between MathML 2 and the current entity
      definitions are listed below.</p>
      <glist>
	<gitem><label>fjlig</label><def><p>fj. ISOPUB (and MathML 1) defined an fj ligature.
   Unicode has no specific character for it, so the entity was dropped from MathML2.
   It is reinstated here, as the two-character sequence fj, for maximum compatibility with <bibref ref="SGML"/>.</p></def></gitem>
	<gitem><label>phi</label><def><p>U+03C6 (decimal 966) GREEK SMALL LETTER PHI (the definition used in HTML4), 
	MathML2 used  U+03D5 (decimal 981) GREEK PHI SYMBOL. </p>
      <note><p>It is very difficult for (X)HTML
      definitions to change since HTML is so widely deployed. Many of
      the assignments in the current definitions would be different if
      it were not for HTML compatibility. However, in this case
      the change could perhaps be made in an XHTML2/HTML5 time frame.
      Currently U+03D5 has the entity names
      straightphi and phis; U+03C6 has the entity names phi, phgr, phiv and varphi.</p>
      <p>It is also worth noting that Unicode has changed (swapped)
      the default glyphs for U+03C6 and U+03D5 since the publication
      of HTML4. The current recommendation is to use a cursive form
      for U+03C6 (<graphic role="glyph" source="U003C6"/>), and a form
      with a straight vertical bar for  U+03D5 (<graphic role="glyph"
      source="U003D5"/>). Some newer fonts  
      use glyphs that correspond to the change made by Unicode, while a number of
      older fonts remain unchanged and hence will display the glyphs swapped
      relative to the current version of Unicode.  There is no way to guarantee
	that the intended glyph is displayed without font-specific knowledge.</p></note></def></gitem>
	<gitem><label>jmath</label><def><p>U+0237, MathML 2 used U+006A (j) as
	there was no dotless j before Unicode 4.1.</p></def></gitem>
	<gitem><label>trpezium, elinters</label><def><p>U+23E2 and U+23E7,
	MathML 2 used U+FFFD (REPLACEMENT CHARACTER) because these characters were only added in Unicode 5.0,
	specifically to support these entities.</p></def></gitem>
      </glist>
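      <p>Because of this change, a document written against MathML 2 that relies on <code>&amp;phi;</code> producing U+03D5 will now get U+03C6. A minimal sketch of how an author might make the intended character explicit, using either a numeric character reference or one of the entity names listed in the note above (the <code>mi</code> wrapper is only illustrative):</p>
<eg><![CDATA[<!-- GREEK PHI SYMBOL, U+03D5 (what MathML 2 gave for &phi;) -->
<mi>&#x3D5;</mi>
<mi>&straightphi;</mi>

<!-- GREEK SMALL LETTER PHI, U+03C6 (what &phi; gives now) -->
<mi>&#x3C6;</mi>
<mi>&phi;</mi>]]></eg>
      <p>As the note explains, the glyph actually displayed for each code point still depends on the font in use.</p>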
      <p>The following bracket symbols were added to Unicode in versions
      between 3.1 and 5.1. MathML2 used
      similar characters intended for CJK punctuation.</p>
      <glist>
	<gitem><label>lang, langle, LeftAngleBracket</label><def><p>U+27E8, XHTML 1.0 used U+2329 (which has canonical decomposition to U+3008)</p></def></gitem>
	<gitem><label>Lang</label><def><p>U+27EA, MathML2 used U+300A</p></def></gitem>
	<gitem><label>lbbrk</label><def><p>U+2772, MathML2 used U+3014</p></def></gitem>
	<gitem><label>loang</label><def><p>U+27EC, MathML2 used U+3018</p></def></gitem>
	<gitem><label>lobrk</label><def><p>U+27E6, MathML2 used U+301A</p></def></gitem>
	<gitem><label>rang, rangle, RightAngleBracket</label><def><p>U+27E9, XHTML 1.0 used U+232A (which has canonical decomposition to U+3009)</p></def></gitem>
	<gitem><label>Rang</label><def><p>U+27EB, MathML2 used U+300B</p></def></gitem>
	<gitem><label>rbbrk</label><def><p>U+2773, MathML2 used U+3015</p></def></gitem>
	<gitem><label>roang</label><def><p>U+27ED, MathML2 used U+3019</p></def></gitem>
	<gitem><label>robrk</label><def><p>U+27E7, MathML2 used U+301B</p></def></gitem>
	<gitem><label>OverBrace</label><def><p>U+23DE, MathML2 used U+FE37</p></def></gitem>
	<gitem><label>OverParenthesis</label><def><p>U+23DC, MathML2 used U+FE35</p></def></gitem>
	<gitem><label>UnderBrace</label><def><p>U+23DF, MathML2 used U+FE38</p></def></gitem>
	<gitem><label>UnderParenthesis</label><def><p>U+23DD, MathML2 used U+FE36</p></def></gitem>
	<gitem><label>LeftDoubleBracket</label><def><p>U+27E6, MathML2 used U+301A</p></def></gitem>
	<gitem><label>RightDoubleBracket</label><def><p>U+27E7, MathML2 used U+301B</p></def></gitem>
      </glist>
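      <p>Where existing content must keep a code point used by MathML 2, a document can override individual entities in its internal DTD subset, because XML treats the first declaration of an entity as binding. In the sketch below the parameter entity name <code>w3centities</code> is hypothetical, the system identifier is the combined entity file listed in <specref ref="sets"/>, and the <code>lbbrk</code> override simply restores the MathML 2 value from the list above.</p>
<eg><![CDATA[<!DOCTYPE math [
  <!-- Declared first, so this binding wins over the one in the file
       below: restore the MathML 2 mapping (U+3014) for lbbrk. -->
  <!ENTITY lbbrk "&#x3014;">
  <!-- Hypothetical parameter entity name for the combined entity file;
       every other entity keeps the definition given by this specification. -->
  <!ENTITY % w3centities SYSTEM
    "http://www.w3.org/2003/entities/2007/w3centities-f.ent">
  %w3centities;
]>]]></eg>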
      <note><p>MathML3 uses the entity sets defined by this specification, so there will be no differences
      between MathML and the entities defined here once MathML3 is finalized.</p></note>
    </div2>

  </div1>
  <div1 id="source">
    <head>Source Files</head>
    <p>All data files used to construct the entity declarations, XSLT character maps, and HTML tables referenced from this document are available from <loc href="http://www.w3.org/2003/entities/2007xml/">http://www.w3.org/2003/entities/2007xml/</loc>.</p>
   <ulist>
     <item><p><loc
		  href="http://www.w3.org/2003/entities/2007xml/unicode.xml">unicode.xml</loc> master file detailing all Unicode characters with names in various entity sets and applications, TeX equivalents and other data. This file has been maintained for many years, originally by Sebastian Rahtz as part of the jadetex distribution and, since around 1999, by David Carlisle as part of the MathML specification sources. The current version encodes data for all characters in Unicode 5.1.
<emph>Note: unicode.xml is over 5MB in size and may not be suitable for direct viewing in a browser; you may prefer to save the file rather than open it by following the link above.</emph></p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/charlist.rnc">charlist.rnc</loc> RELAX NG schema for unicode.xml.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/unicode.xsl">unicode.xsl</loc> XSLT stylesheet that renders unicode.xml as an HTML table.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/character-set.xml">character-set.xml</loc> the source file for this document.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/xmlspec.xsl">xmlspec.xsl</loc> a copy of the  standard xmlspec stylesheet</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/run">run</loc> small script file that builds this collection</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/xhtml1.xml">xhtml1.xml</loc> record of XHTML 1.0 entity definitions</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/mml2.xml">mml2.xml</loc> record of MathML 2.0 (second edition) entity definitions</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/unicodedata.xsl">unicodedata.xsl</loc> stylesheet that generates a new copy of unicode.xml, incorporating data from the Unicode data file; used to update unicode.xml as new versions of Unicode are released.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/entities.xsl">entities.xsl</loc> stylesheet to generate the DTD declarations for the entities.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/charmap.xsl">charmap.xsl</loc> stylesheet to generate the XSLT character maps.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/characters.xsl">characters.xsl</loc> stylesheet to generate this document, including the referenced HTML tables.</p></item>
     <item><p><loc href="http://www.w3.org/2003/entities/2007xml/schemas.xml">schemas.xml</loc> file associating XML documents with the appropriate RELAX NG schemas</p></item>
   </ulist>
     
  </div1>
  <div1 id="references"><head>References</head>
  <blist>
     <bibl id="SGML">ISO/IEC 8879:1986, Information processing &#x2014;  Text and office systems &#x2014;  Standard Generalized Markup Language (SGML)</bibl>
     <bibl id="ISO9573-13-1991">ISO/IEC TR 9573-13:1991, Information
technology &#x2014; SGML support facilities &#x2014;
Techniques for using
SGML &#x2014; Part 13: Public entity sets for
mathematics and science</bibl>
    <bibl id="Unicode">The Unicode Consortium;
    <emph>The Unicode Standard, Version 5.0</emph>, 
    Addison-Wesley Professional; 5th edition (November 3, 2006).
    ISBN 0321480910.
    (<loc
    href="http://www.unicode.org/versions/Unicode5.0.0/">http://www.unicode.org/versions/Unicode5.0.0/</loc>) 
    </bibl>
    
    <bibl id="Unicode25">Barbara Beeton, Asmus Freytag, Murray Sargent III,
    <emph><loc
	      href="http://www.unicode.org/unicode/reports/tr25/">Unicode Support for Mathematics</loc></emph>, 
    Unicode Technical Report #25, 2007-05-07.
    (<loc
    href="http://www.unicode.org/unicode/reports/tr25/">http://www.unicode.org/unicode/reports/tr25/</loc>) 
    </bibl>
    <bibl id="MathML2">David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier,
    <emph><loc href="http://www.w3.org/TR/MathML2/">Mathematical Markup Language (MathML) Version 2.0 (Second Edition)</loc></emph>
    W3C Recommendation 21 October 2003
    (<loc href="http://www.w3.org/TR/2003/REC-MathML2-20031021/">http://www.w3.org/TR/2003/REC-MathML2-20031021/</loc>) 
    </bibl>
    <bibl id="HTML4">Dave Raggett, Arnaud Le Hors, Ian Jacobs,
 <emph><loc href="http://www.w3.org/TR/html4/">HTML 4.01 Specification</loc></emph>
W3C Recommendation 24 December 1999
(<loc href="http://www.w3.org/TR/1999/REC-html401-19991224/">http://www.w3.org/TR/1999/REC-html401-19991224/</loc>)</bibl>
  </blist>
  </div1>
</back>
</spec>
