Character definitions requiring special review
    Duplicate entities mapped to same code point
    Entity definitions starting with a combining character
    Possible new characters

6 Character definitions requiring special review

6.1 Duplicate entities mapped to same code point

This section lists cases where two or more entities with different names have been mapped to the same code point. Such entities become indistinguishable to most XML applications, even if SGML applications could differentiate the original SGML SDATA entities.

In many cases these unifications are acceptable or intentional. For example, for reasons of convenience and compatibility, the "mathematical Greek" set ISOGRK3 is mapped to the standard Greek characters in the BMP (so clashing with the ISOGRK1 definitions) rather than to the Math Italic alphabet in the 1Dxxx range. Similarly, the same character can have different logical uses in different scientific disciplines. In other cases, however, the duplication has occurred because it has not been possible to retain differences foreseen in ISO/IEC TR 9573 within the ISO/IEC 10646 character set. During the review of this draft the need for each of these duplications will be reviewed and resolved.
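As an illustration of the ISOGRK3 choice described above, the following minimal Python sketch compares the BMP Greek alpha with its Math Italic counterpart. It assumes the standard `unicodedata` module; the variable names are illustrative only and are not part of these entity definitions.

```python
import unicodedata

# Both agr (ISOGRK1) and alpha (ISOGRK3) are mapped to this BMP character.
bmp_alpha = "\u03b1"
# The Math Italic counterpart in the 1Dxxx range, which ISOGRK3 does not use.
math_alpha = "\U0001d6fc"

for ch in (bmp_alpha, math_alpha):
    print(f"U+{ord(ch):04X}: {unicodedata.name(ch)}")

# The two are distinct under canonical normalisation (NFC) ...
print(unicodedata.normalize("NFC", math_alpha) == bmp_alpha)
# ... but compatibility normalisation (NFKC) folds the math letter into the BMP one.
print(unicodedata.normalize("NFKC", math_alpha) == bmp_alpha)
```

Mapping ISOGRK3 to the BMP characters keeps documents compatible with ordinary Greek text, at the cost of the clash with ISOGRK1 listed in this section.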


U+002A: ast(9573-2003-isonum): midast(9573-2003-isoamsb)
U+00A8: Dot(9573-2003-isotech): die(9573-2003-isodia): uml(9573-2003-isodia)
U+00AF: macr(9573-2003-isodia): strns(9573-2003-isotech)
U+00BD: frac12(9573-2003-isonum): half(9573-2003-isonum)
U+0131: imath(9573-2003-isoamso): inodot(9573-2003-isolat2)
U+0393: Gamma(9573-2003-isogrk3): Ggr(9573-2003-isogrk1)
U+0394: Delta(9573-2003-isogrk3): Dgr(9573-2003-isogrk1)
U+0398: THgr(9573-2003-isogrk1): Theta(9573-2003-isogrk3)
U+039B: Lambda(9573-2003-isogrk3): Lgr(9573-2003-isogrk1)
U+039E: Xgr(9573-2003-isogrk1): Xi(9573-2003-isogrk3)
U+03A0: Pgr(9573-2003-isogrk1): Pi(9573-2003-isogrk3)
U+03A3: Sgr(9573-2003-isogrk1): Sigma(9573-2003-isogrk3)
U+03A6: PHgr(9573-2003-isogrk1): Phi(9573-2003-isogrk3)
U+03A8: PSgr(9573-2003-isogrk1): Psi(9573-2003-isogrk3)
U+03A9: OHgr(9573-2003-isogrk1): Omega(9573-2003-isogrk3)
U+03B1: agr(9573-2003-isogrk1): alpha(9573-2003-isogrk3)
U+03B2: beta(9573-2003-isogrk3): bgr(9573-2003-isogrk1)
U+03B3: gamma(9573-2003-isogrk3): ggr(9573-2003-isogrk1)
U+03B4: delta(9573-2003-isogrk3): dgr(9573-2003-isogrk1)
U+03B5: egr(9573-2003-isogrk1): epsiv(9573-2003-isogrk3)
U+03B6: zeta(9573-2003-isogrk3): zgr(9573-2003-isogrk1)
U+03B7: eegr(9573-2003-isogrk1): eta(9573-2003-isogrk3)
U+03B8: theta(9573-2003-isogrk3): thgr(9573-2003-isogrk1)
U+03B9: igr(9573-2003-isogrk1): iota(9573-2003-isogrk3)
U+03BA: kappa(9573-2003-isogrk3): kgr(9573-2003-isogrk1)
U+03BB: lambda(9573-2003-isogrk3): lgr(9573-2003-isogrk1)
U+03BC: mgr(9573-2003-isogrk1): mu(9573-2003-isogrk3)
U+03BD: ngr(9573-2003-isogrk1): nu(9573-2003-isogrk3)
U+03BE: xgr(9573-2003-isogrk1): xi(9573-2003-isogrk3)
U+03C0: pgr(9573-2003-isogrk1): pi(9573-2003-isogrk3)
U+03C1: rgr(9573-2003-isogrk1): rho(9573-2003-isogrk3)
U+03C2: sfgr(9573-2003-isogrk1): sigmav(9573-2003-isogrk3)
U+03C3: sgr(9573-2003-isogrk1): sigma(9573-2003-isogrk3)
U+03C4: tau(9573-2003-isogrk3): tgr(9573-2003-isogrk1)
U+03C5: ugr(9573-2003-isogrk1): upsi(9573-2003-isogrk3)
U+03C6: phgr(9573-2003-isogrk1): phiv(9573-2003-isogrk3)
U+03C7: chi(9573-2003-isogrk3): khgr(9573-2003-isogrk1)
U+03C8: psgr(9573-2003-isogrk1): psi(9573-2003-isogrk3)
U+03C9: ohgr(9573-2003-isogrk1): omega(9573-2003-isogrk3)
U+03DC: Gammad(9573-2003-isogrk3): b.Gammad(9573-2003-isogrk4)
U+03DD: gammad(9573-2003-isogrk3): b.gammad(9573-2003-isogrk4)
U+2010: hyphen(9573-2003-isonum): dash(9573-2003-isopub)
U+2019: rsquo(9573-2003-isonum): rsquor(9573-2003-isopub)
U+201D: rdquo(9573-2003-isonum): rdquor(9573-2003-isopub)
U+2026: hellip(9573-2003-isopub): mldr(9573-2003-isopub)
U+210B: hamilt(9573-2003-isotech): Hscr(9573-2003-isomscr)
U+210F: planck(9573-2003-isoamso): plankv(9573-2003-isoamso)
U+2111: image(9573-2003-isoamso): Ifr(9573-2003-isomfrk)
U+2112: Lscr(9573-2003-isomscr): lagran(9573-2003-isotech)
U+211C: real(9573-2003-isoamso): Rfr(9573-2003-isomfrk)
U+212C: bernou(9573-2003-isotech): Bscr(9573-2003-isomscr)
U+2133: phmmat(9573-2003-isotech): Mscr(9573-2003-isomscr)
U+2134: order(9573-2003-isotech): oscr(9573-2003-isomscr)
U+2190: larr(9573-2003-isonum): slarr(9573-2003-isoamsa)
U+2192: rarr(9573-2003-isonum): srarr(9573-2003-isoamsa)
U+21D4: hArr(9573-2003-isoamsa): iff(9573-2003-isotech)
U+2205: empty(9573-2003-isoamso): emptyv(9573-2003-isoamso)
U+2208: isin(9573-2003-isotech): isinv(9573-2003-isotech)
U+2209: notin(9573-2003-isotech): notinva(9573-2003-isotech)
U+220B: niv(9573-2003-isotech): ni(9573-2003-isotech)
U+220C: notni(9573-2003-isotech): notniva(9573-2003-isotech)
U+2216: setmn(9573-2003-isoamsb): ssetmn(9573-2003-isoamsb)
U+221D: prop(9573-2003-isotech): vprop(9573-2003-isoamsr)
U+2223: mid(9573-2003-isoamsr): smid(9573-2003-isoamsr)
U+2224: nmid(9573-2003-isoamsn): nsmid(9573-2003-isoamsn)
U+2225: par(9573-2003-isotech): spar(9573-2003-isoamsr)
U+2226: npar(9573-2003-isoamsn): nspar(9573-2003-isoamsn)
U+223C: sim(9573-2003-isotech): thksim(9573-2003-isoamsr)
U+223E: ac(9573-2003-isoamsb): mstpos(9573-2003-isoamsr)
U+2248: asymp(9573-2003-isoamsr): ap(9573-2003-isotech): thkap(9573-2003-isoamsr)
U+22A5: bottom(9573-2003-isotech): perp(9573-2003-isotech)
U+2322: frown(9573-2003-isoamsr): sfrown(9573-2003-isoamsr)
U+2323: smile(9573-2003-isoamsr): ssmile(9573-2003-isoamsr)
U+25A1: squ(9573-2003-isopub): square(9573-2003-isotech)
U+25AA: squf(9573-2003-isopub): squarf(9573-2003-isotech)
U+FFFD: elinters(9573-2003-isotech): trpezium(9573-2003-isoamso)
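
A list of this kind can be derived mechanically by grouping entity names by code point. The following is a minimal Python sketch, not the tooling actually used to produce the list; the `MAPPINGS` sample is a hand-copied subset of the entries above plus one non-duplicate control (jmath, currently defined as "j"), and the function name is hypothetical.

```python
from collections import defaultdict

# (entity name, entity set, code point): a small hand-copied sample.
MAPPINGS = [
    ("ast", "9573-2003-isonum", 0x002A),
    ("midast", "9573-2003-isoamsb", 0x002A),
    ("frac12", "9573-2003-isonum", 0x00BD),
    ("half", "9573-2003-isonum", 0x00BD),
    ("agr", "9573-2003-isogrk1", 0x03B1),
    ("alpha", "9573-2003-isogrk3", 0x03B1),
    ("jmath", "9573-2003-isoamso", 0x006A),  # control: no duplicate in this sample
]

def find_duplicates(mappings):
    """Return {code point: [name(set), ...]} for code points with more than one entity."""
    by_code_point = defaultdict(list)
    for name, entity_set, code_point in mappings:
        by_code_point[code_point].append(f"{name}({entity_set})")
    return {cp: names for cp, names in by_code_point.items() if len(names) > 1}

# Print in the same "U+XXXX: name(set): name(set)" layout as the list above.
for code_point, names in sorted(find_duplicates(MAPPINGS).items()):
    print(f"U+{code_point:04X}: " + ": ".join(names))
```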

6.2 Entity definitions starting with a combining character

It is generally a bad idea to start an entity definition with a combining character, as it makes normalisation dependent on the order of entity expansion. In the worst case the combining character can combine with the markup, resulting in a normalised form that is no longer well-formed XML.
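
The worst case can be reproduced with a short Python sketch. It assumes NFC normalisation is applied to the serialised text after entity expansion, and it uses U+0338 (COMBINING LONG SOLIDUS OVERLAY) purely for illustration because it has a precomposed form with ">"; the element name `mo` is likewise only an example.

```python
import unicodedata

# Illustrative combining character; not one of the entities in this section.
COMBINING_LONG_SOLIDUS_OVERLAY = "\u0338"

# Text as it might look after expanding an entity whose replacement text
# starts with a combining character immediately after a start tag.
expanded = "<mo>" + COMBINING_LONG_SOLIDUS_OVERLAY + "</mo>"

normalised = unicodedata.normalize("NFC", expanded)

# The ">" that closed the start tag has been composed with U+0338 into
# U+226F (NOT GREATER-THAN), so the start tag is no longer terminated.
print(ascii(expanded))
print(ascii(normalised))
```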

In the current version of these entities, each such entity is defined as a single space character (U+0020) followed by the combining character:

U+20DB: tdot [COMBINING THREE DOTS ABOVE]
U+20DC: DotDot [COMBINING FOUR DOTS ABOVE]
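
The effect of the leading space can be checked with a minimal Python sketch. The `DEFINITIONS` dictionary and the `starts_with_combining` helper are hypothetical names introduced here, and treating any character in a Unicode "M" (mark) category as combining is an assumption of this sketch rather than a rule stated by these definitions.

```python
import unicodedata

# Replacement text as described above: U+0020 followed by the combining character.
DEFINITIONS = {
    "tdot": " \u20db",
    "DotDot": " \u20dc",
}

def starts_with_combining(text):
    """True if the first character of text is a mark (general category M*)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")

for name, value in DEFINITIONS.items():
    bare = value[1:]  # the combining character without its leading space
    # The bare character would trip the check; the space-prefixed form does not.
    print(name, unicodedata.name(bare),
          starts_with_combining(bare), starts_with_combining(value))
    # With the space in front, NFC leaves a preceding ">" untouched.
    assert unicodedata.normalize("NFC", ">" + value) == ">" + value
```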

6.3 Possible new characters

There are several new characters planned for Unicode and ISO/IEC 10646 that may affect these definitions. See http://www.unicode.org/alloc/Pipeline.html.

In particular, jmath (ISOAMSO), which is currently defined to be "j", could more usefully be defined to be the proposed character "LATIN SMALL LETTER DOTLESS J" at code point U+0237, and perp (ISOTECH) could be defined to use "PERPENDICULAR" at code point U+27C2 (and so allow it to be distinguished from bottom, also in ISOTECH).
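
The effect of these two possible redefinitions on the perp/bottom clash from section 6.1 can be sketched in Python as follows. The dictionaries and the `clashes` helper are hypothetical, and the character names printed depend on the Unicode database shipped with the Python build in use, so a fallback label is supplied.

```python
import unicodedata
from collections import defaultdict

# Current definitions, as described in this document.
CURRENT = {"jmath": ord("j"), "perp": 0x22A5, "bottom": 0x22A5}
# Possible redefinitions discussed above.
PROPOSED = {"jmath": 0x0237, "perp": 0x27C2}

def clashes(mapping):
    """Return {code point: sorted entity names} for code points shared by several entities."""
    by_code_point = defaultdict(list)
    for name, code_point in mapping.items():
        by_code_point[code_point].append(name)
    return {cp: sorted(names) for cp, names in by_code_point.items() if len(names) > 1}

updated = {**CURRENT, **PROPOSED}

for name, code_point in sorted(updated.items()):
    label = unicodedata.name(chr(code_point), "<not in this Unicode database>")
    print(f"{name}: U+{code_point:04X} {label}")

print("clashes before:", clashes(CURRENT))
print("clashes after:", clashes(updated))
```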