W3C User Interface Domain

XML Entity Definitions for Characters Last Call Dispositions

Public Version   [Member-Confidential Version]

6 February 2010, Version 1.2

Editors:
David Carlisle, NAG, Editor
Patrick Ion, MR/AMS, Math WG Co-chair


7 comments of which 7 fully resolved, 0 resolved but not yet formally accepted and 0 unresolved.

XML Entity Definitions for Characters Last Call Comments and Responses

The XML Entity Definitions for Characters draft has received several comments. We itemize the comments received and their responses.


Entities
 
Summary: bfsfit glyphs
Submitted: Will Robertson, http://lists.w3.org/Archives/Public/www-math/2009Nov/0056.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0057.html
Will Robertson, http://lists.w3.org/Archives/Public/www-math/2009Nov/0060.html
Discussion:
The glyphs shown as examples for sans serif bold italic look as if  
they're from the regular weight:
   http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/sans-serif-bold-italic.html 
   
Not that there exists a regular weight for Greek sans :)

-- Will

It's 10 years ago and I'm not sure I have the sources, but I did
tweak the metafont parameters taking a creative merge of the bold,
slanted  and sans serif parameters  to come up with a bold sans serif
slanted. Perhaps I wasn't creative enough, I agree it could be
more bold.

I recently discarded my script and bold script glyphs (which were
identical) and replaced them with glyphs derived from the STIX beta.
Perhaps I should do the same here?

David

It wasn't the absolute weight that I was worried about, just that it  
looked like the same glyphs were being used for normal weight slanted  
sans serif as well.

The results would certainly look better; not the end of the world  
either way, of course :)

-- Will

Resolution: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0070.html
Actually it was only 9 years, October 2000 seems to have been the last
time I touched those sources...

They didn't only look similar: cmp confirmed the png files were identical.
So despite there being a start of some MF sources for this, it appears
that I just used the normal weight ones.

Rather than switch to STIX beta here I decided to stick with the
Computer Modern heritage of (almost) all the rest and have tweaked the
metafont a bit more resulting in an updated set of bold slanted sans
glyphs as can be seen in the editors' draft:

http://www.w3.org/2003/entities/2007doc/sans-serif-bold-italic.html

Thanks for reporting this,

David

 
Summary: missing entities in w3centities-f.ent
Submitted: John Cowan, http://lists.w3.org/Archives/Public/www-math/2009Nov/0061.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0062.html
Discussion:
The definitions of the lowercase equivalents of the HTML5-UPPERCASE
set (namely amp, copy, gt, lt, quot, reg, and trade) do not exist in
w3centities-f.ent, even though they still appear in the xhtml1-lat1.ent
and xhtml1-special.ent files.

Please fix.

Resolution:
Yes, sorry, I just noticed that last night. I messed up the arguments to
sort. I wanted to change the sort order to bring AMP and amp together but
I actually coalesced them. 

Corrected file has been committed.
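
The slip described above is easy to reproduce. The following is a minimal Python sketch, not the actual build step: the real sort arguments are not recorded in this thread, so a case-insensitive sort with duplicate removal is assumed as a stand-in, and the function names are hypothetical. The entity names are taken from the comment.

```python
# Hypothetical stand-in for the build's sort step; the real command is not recorded.
names = ["AMP", "amp", "COPY", "copy", "GT", "gt", "LT", "lt"]

def sort_adjacent(items):
    """Case-insensitive ordering that keeps both spellings side by side."""
    return sorted(items, key=lambda s: (s.lower(), s))

def sort_coalesced(items):
    """Case-insensitive ordering that also treats AMP and amp as duplicates."""
    seen = {}
    for s in sorted(items):
        # The first spelling encountered wins; the other is silently dropped.
        seen.setdefault(s.lower(), s)
    return sorted(seen.values(), key=str.lower)

print(sort_adjacent(names))   # intended: AMP and amp next to each other
print(sort_coalesced(names))  # the bug: the lowercase names disappear
```

Only the sort key should fold case; the uniqueness test must still compare the names exactly.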

 
Summary: U02220-020D2 , five hexadecimal digits,...
Submitted: Matthias Mittelstein, http://lists.w3.org/Archives/Public/www-math/2009Nov/0058.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0059.html
Discussion:
(1)
http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/glyphs/022/U02220-020D2.png seems 
to point to an existing file. Nevertheless it shows a placeholder only.

(2)
I prefer hexadecimal Unicode code point numbers to have four or six digits. Maybe that 
is old-fashioned and byte-oriented. But five-digit numbers hurt my eyes, especially in 
columns with the title "BMP".

(3)
http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/U0FE00.html , 
http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/U020D2.html , 
http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/U020D2.html show a variable 
number of Entity Names. What is the reason to show duplicates like 
" oplus, oplus, CirclePlus ".

Resolution:
> http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/glyphs/022/U02220-020D2.png
> seems to point to an existing file. Nevertheless it shows a placeholder only. 

Ah, thanks for that. The png works in Firefox but not in IE. That has
happened occasionally before; previously, when I used ImageMagick convert to
convert the png (to anything and back again), it warned of some
internal inconsistency and fixed it up:
$ convert U02220-020D2.png x.gif
convert: Incorrect tRNS chunk length `U02220-020D2.png'.

David Carlisle@dcarlisle /home/w3c/WWW/2003/entities/2007doc/glyphs/022
$ convert x.gif U02220-020D2.png 

David Carlisle@dcarlisle /home/w3c/WWW/2003/entities/2007doc/glyphs/022
$ cvs commit -m "bad chunk length" U02220-020D2.png
Checking in U02220-020D2.png;
/w3ccvs/WWW/2003/entities/2007doc/glyphs/022/U02220-020D2.png,v  <--  U02220-020D2.png
new revision: 1.2; previous revision: 1.1
done
Yes, it seems to work now; try the editors' draft at


http://www.w3.org/2003/entities/2007doc/U020D2.html


thanks.
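
The warning from convert concerns the length of the PNG tRNS (transparency) chunk. As a rough sketch of how such a file could be checked without a round trip through another format, the Python below walks the chunk list and applies the tRNS length rules from the PNG specification (2 bytes for greyscale, 6 for truecolour, at most one byte per palette entry for indexed images). The function names are hypothetical, and which rule the original file actually broke is not recorded here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, payload):
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

def iter_chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC

def trns_problem(data):
    """Return a message if the tRNS length is inconsistent, else None."""
    colour_type = None
    palette_entries = 0
    for ctype, payload in iter_chunks(data):
        if ctype == b"IHDR":
            colour_type = payload[9]  # byte 9 of IHDR is the colour type
        elif ctype == b"PLTE":
            palette_entries = len(payload) // 3
        elif ctype == b"tRNS":
            n = len(payload)
            if colour_type == 0 and n != 2:
                return "greyscale tRNS must be 2 bytes, got %d" % n
            if colour_type == 2 and n != 6:
                return "truecolour tRNS must be 6 bytes, got %d" % n
            if colour_type == 3 and n > palette_entries:
                return "palette has %d entries but tRNS has %d" % (palette_entries, n)
            if colour_type in (4, 6):
                return "tRNS is not allowed when the image has an alpha channel"
    return None

# Usage: a chunk skeleton (no image data) for a palette image with two
# palette entries but three tRNS bytes.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 3, 0, 0, 0)
bad = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"PLTE", bytes(6))
       + make_chunk(b"tRNS", bytes(3)) + make_chunk(b"IEND", b""))
print(trns_problem(bad))
```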


> I prefer hexadecimal Unicode code point numbers to have four or six
> digits. Maybe that is old-fashioned and byte-oriented. But five-digit
> numbers hurt my eyes, especially in columns with the title "BMP". 

Original versions of these tables (in MathML, going back a decade or so)
used the internal U01234 form pretty much everywhere. This form has
advantages in the internal build, as it's a valid XML ID (unlike the U+ form,
which can't be used as an XML ID value), and consistently using 5 digits
allows things to be sorted naively (until someone pushes some
interesting characters into the 6-digit range ;-). However, in the visible
text of the specification we've almost completely switched to using the
Unicode U+1234 form, keeping the original form only for internal
identifiers and png file names, so I suppose it makes sense to catch
the remaining cases as well. All the tables are generated, so changing
notation isn't a big deal, just a matter of dropping in a suitable
regular expression replace. I'll see what I can do.
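
A regular expression replace of the kind mentioned above might look like the following Python sketch. It is an illustration only: the actual change was made in the stylesheets that generate the tables, and the function name and the choice of a space to separate the members of a character sequence are assumptions.

```python
import re

# Internal IDs look like U02220, or U02220-020D2 for a two-character sequence.
_INTERNAL_ID = re.compile(r"\bU([0-9A-F]{5,6})((?:-[0-9A-F]{5,6})*)\b")

def to_u_plus(text):
    """Rewrite internal U01234 identifiers in text to U+1234 notation."""
    def one(hex_digits):
        # U+ notation uses at least four digits, with no further zero padding.
        return "U+%04X" % int(hex_digits, 16)

    def repl(match):
        parts = [match.group(1)] + match.group(2).split("-")[1:]
        return " ".join(one(p) for p in parts)

    return _INTERNAL_ID.sub(repl, text)

print(to_u_plus("U02220-020D2"))
print(to_u_plus("U00009 and U1D53F"))
```

The pattern would also match identifiers inside file names and link targets, so it should be applied only to visible text, leaving IDs and png names in the internal form.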


> What is the reason to show duplicates like " oplus, oplus, CirclePlus
> ".

Well, they are duplicated because (in the case of oplus) the name is both
in xhtml-symbol and in isoamsb, but since I don't show the set name
there the duplication is not very helpful.
I just checked in the stylesheet with the distinct-values() XPath function inserted,

and the editors' draft now shows each of these just once:

http://www.w3.org/2003/entities/2007doc/U020D2.html
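
For readers unfamiliar with distinct-values(), the effect on the list quoted in the comment can be sketched in Python as below. This is only an analogue: the fix itself is in XSLT, and this sketch keeps first-seen order, whereas the order returned by distinct-values() is implementation-dependent.

```python
def distinct_values(names):
    """Drop repeated names, keeping the first occurrence of each."""
    return list(dict.fromkeys(names))

print(distinct_values(["oplus", "oplus", "CirclePlus"]))
```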

Thanks for your comments.

David
and from the commenter on a private list
From: "Mittelstein, Matthias" <matthias.mittelstein@sap.com>
To: David Carlisle <davidc@nag.co.uk>
Date: Mon, 23 Nov 2009 09:27:09 +0100
Subject: RE: [Entities-last-call] U02220-020D2 , five hexadecimal digits,...

Hello David,

I withdraw my proposal not to use five digits for plane 1 and 2.

Even if I preferred hexadecimal Unicode code point numbers to have four or six digits, I 
will try to use five digits, where suitable.

It is hard enough to explain Unicode to everybody. Using a consistent notation makes 
it a bit easier. Somehow I did not remember that sentence. But it was also in older 
Unicode books. 

Matthias Mittelstein


 
Summary: isotech error with lowast?
Submitted: Bruce Rosenbloom, http://lists.w3.org/Archives/Public/www-math/2009Nov/0043.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0045.html
http://lists.w3.org/Archives/Public/www-math/2009Dec/0001.html
Discussion:
> Shouldn't this be defined as:
> 
> <!ENTITY lowast           "&#x0204E;" ><!--low asterisk -->

One might expect that, but like some others (most notably asymp) the
definitions are somewhat skewed by HTML compatibility. Many of the HTML4
entity definitions are somewhat strange but the HTML4 symbol.ent file
linked from 

http://www.w3.org/TR/html4/sgml/dtd.html

defines lowast to be 

<!ENTITY lowast   CDATA "&#8727;" -- asterisk operator, U+2217 ISOtech -->

The HTML entity sets are baked into the code of multiple browsers on every
desktop and mobile phone on the planet, and nothing that is put in a DTD
entity file will change that, as most HTML systems don't read these
definitions from any kind of declarative file. Thus it seems futile to
publish an HTML DTD with definitions different from the ones actually
implemented, and it would be odd to try to make XHTML incompatible with
HTML with respect to entity definitions.

If the MathML set were incompatible with XHTML then the meaning of lowast
throughout an xhtml+mathml document would depend on the technical
details of how the DTDs were combined, but whichever definition "won" it
would mean the same thing throughout the document; you can't make the
expansion of the entity sensitive to which element the entity is in.

The main aim of taking the entity set definitions out of MathML into
their own spec at

http://www.w3.org/TR/xml-entity-names/

is to get a common set of definitions that can be used across
languages. Where existing usage in different communities is
incompatible, getting a common set means something has to change, and the
above considerations mean that, essentially, if there was a conflict the
HTML definition was taken.

I hope that explains why things are as they are.
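
The point that the HTML definitions are built into software rather than read from a DTD can be seen in any language with a built-in HTML entity table. As one illustration (Python's standard library is simply a convenient example, not something referenced in the original exchange), the html module resolves lowast to the asterisk operator U+2217, not to the low asterisk U+204E:

```python
from html import unescape
from html.entities import name2codepoint

def describe(name):
    """Return the U+ notation for an HTML 4 entity name."""
    return "U+%04X" % name2codepoint[name]

print(describe("lowast"))                # U+2217
print(unescape("&lowast;") == "\u2217")  # True
```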

Resolution:
David,

Thanks for the clear and complete explanation. It's helpful, and I 
appreciate your time to reply.

Best regards,

Bruce

 
Summary: Series of comments from I18N direction
Submitted: Martin Dürst, http://lists.w3.org/Archives/Member/member-i18n-core/2009Nov/0006.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Nov/0054.html
Discussion: Since the comments were offered on a non-public list, but were valuable, it seems correct to reproduce essentially all of them here.
Please proceed on any of my comments either as a quick fix before LC 
publication or as an input for LC.
..
Now for the comments themselves:

Title: "XML Entity definitions for Characters" looks very ambiguous. I 
think something like "XML Entity Definitions for Characters used by 
MathML" or so would help the general public a lot to understand the 
context and coverage of the document.

Abstract: "This document defines several sets of names which are 
assigned to Unicode characters. Each of these sets is also implemented 
as a file of XML entity declarations.":
First, this says that the names are the main stuff, and the XML entities 
are just an implementation detail. This is a contradiction to the title, 
where XML entities are the main thing.
Second, "sets of names which are assigned to Unicode characters" is 
unclear as to whether a set of names is assigned to a Unicode 
character, or something else. The same problem is present elsewhere 
(e.g. first sentence of the Introduction)
Third, all Unicode characters have official names (e.g. LATIN CAPITAL 
LETTER A for U+0041). These are a very important part of nailing down 
the identity of a character. It would be good if either the abstract or 
the Introduction or both would make clear that what you are dealing with 
are short mnemonic names that are different from the official Unicode names.
Fourth, names being *assigned* to Unicode characters doesn't sound 
right. This may be a programmer's viewpoint, but what you are doing, in 
terms of an average programming language, is to assign Unicode 
codepoints/characters to entity names, not the other way round. XML 
entities in this sense are not much different from variables in a 
programming language, so it would help a lot to keep things straight.

Introduction:
"The W3C Math Working Group has been invited to take over the 
maintenance and development of these sets by the original standards 
committee (ISO/IEC JTC1 SC34).": It should say somewhere that this 
document is the result of this "taking over".

There should be a section on Notation, which explains things such as U+ 
and leading slashes (is that TEX?).

Tables:
http://www.w3.org/2003/entities/2007doc/bycodes.html:
- Instead of U00009 and the like, please use the official U+0009 
notation, and do not use a hyphen for character sequences, as this may 
look like a character range.
- Use a <table> so that this displays decently even with 
non-proportional fonts (you can then eliminate the ugly commas). There 
are lots of cases where <table> is misused in Web pages, but this is 
clearly a case where it is "misunused" or "misnonused" or whatever one 
would call the absence of the use of a feature when such use is clearly 
warranted.
- Use proper table headings
- For character sequences, use e.g. "LESS-THAN SIGN with COMBINING LONG 
VERTICAL LINE OVERLAY" rather than "LESS-THAN SIGN with vertical line"

http://www.w3.org/2003/entities/2007doc/byalpha.html:
- Similar comments as for bycodes.html
- I don't understand why this table contains the origins/collections, 
but bycodes.html doesn't.
- I don't understand the lowercase stuff at the end of each line. It 
seems to be some kind of annotations, but in some cases is totally 
useless (e.g. [LATIN SMALL LETTER A WITH CIRCUMFLEX], latin capital 
letter A with circumflex)
- This table puts the official Unicode names in "[" and "]", but 
bycodes.html doesn't. Why? There should be no such gratuitous differences.

http://www.w3.org/2003/entities/2007doc/000.html and similar:
Please add a note to all the pages with lots of small glyphs that it may 
take time to load all the images to see all the glyphs. (one test run 
with Mozilla Firebug took 37 seconds on a broadband connection).

Please use a stable, final location for all these GIFs. It's okay to 
have an occasional "301 Moved Permanently" for a page, but it 
essentially doubles the number of objects your page has to download from 
256 to 512. Even the former isn't pretty, the latter is definitely bad 
and totally unnecessary. (the redirects come from URIs of the form 
http://www.w3.org/2003/entities/glyphs/003/U003FF.png, the actual images 
seem to be at places such as 
http://www.w3.org/2003/entities/2007doc/glyphs/003/U003FF.png)

Codepoints U+0000 through U+0010 (with three exceptions) are shown as 
"Unicode or XML Non-Character". They are valid control characters in 
Unicode. Strangely enough, there are also such cases (red background 
color) in the U+1D4xx and U+1D5xx 'blocks'. A codepoint such as U+1D53F 
is simply <reserved> in Unicode, the Unicode consortium could decide to 
allocate a character there in the future. This is no different at all 
from all the characters that you marked with a yellow background. The 
only codepoints that are actually non-characters in Unicode are cases 
such as U+FFFF and the like, but you don't have any of these. I 
therefore suggest that the red backgrounds in the U+1D4xx and U+1D5xx 
'blocks' have to be turned to yellow, and the text for the red 
background should be changed to "Characters not representable in XML 
1.0" or some such (most of them would be representable in XML 1.1).

For codepoints with a yellow background, the legend says "XML Character 
not currently described in Unicode". The term "XML Character" is really 
strange. XML uses Unicode, there are no "XML Characters". The cells with 
yellow backgrounds represent unassigned (reserved) Unicode codepoints. 
So the best legend would be "reserved Unicode codepoint (no character 
currently assigned)" or something similar.

Putting the "Next" link above the "Previous" link at the top and bottom 
of these tables seems counterintuitive, because the overall flow is from 
top to bottom.

For http://www.w3.org/2003/entities/2007doc/double-struck.html and similar:

Why do some rows have a yellow background? There's no explanation, so 
the reader is left guessing.

Why do some of these characters not have any corresponding entity names 
at all?

Section 3:

Title: An "Unicode Character Block": As you can see from 
http://unicode.org/Public/UNIDATA/Blocks.txt, Unicode blocks are not of 
equal size of 256 characters, and are not all aligned on boundaries 
divisible by 256. But the reader can easily get such an impression. The 
title, or the text below it, should be changed to reflect this, unless 
(which would be more appropriate for the document (see next comment), 
but may be difficult in terms of production costs) actual Unicode blocks 
are used.

I don't understand why Arabic presentation forms are (as indicated by 
the yellow background) available in the STIX fonts, when basic Arabic 
isn't. Turning things around, would a font for Math or Science have to 
support these? The sentence "The following tables display Unicode ranges 
containing the characters that are most used in mathematics." at the 
start of section 3 seems to suggest so.

Turning things around: Are these tables for all the 256-character-sized, 
aligned parts that contain one or more of the characters for which 
entities have been defined in this document? If yes, please say so. If 
no, please say what the differences are.

Section 5, first sentence: "there are some that use multiple character 
combinations": "multiple character combinations" is "multiple 
combinations of characters". However, characters are used in sequences, 
not in combinations. So "a sequence of multiple characters" or so would 
be better.

Editorial:

- Please change 'definitions' to 'Definitions' in the title, or adopt 
any other W3C approved consistent casing convention. That such an 
inconsistency is 'traditional for this document' shouldn't be a reason 
to keep it.

Section 1, first sentence: "especially in scientific documents, 
especially in mathematics": Repetition; unclear about the relationship 
between the two clauses introduced by 'especially'.

Section 1, second sentence: "has grown in part because its notation 
continually changes": I suggest changing "changes" to "changed" to align 
the tenses.

Section 1, first paragraph: "It is difficult to write science fluently" 
-> "It is difficult to write scientific texts fluently"; same later for 
"read science".

Section 3, first sentence: "Certain characters are of of particular 
relevance": "of of" -> "of"

Section 5, first sentence: "character, however" -> "character. However" 
or "character, but" (however starts a new sentence)


Hope this helps,    Martin.

This plethora of points was answered by David Carlisle on the public Math list:
Martin,

Thanks for your comments on the entities draft.
I've changed the CC list and will handle them as LC comments (as the LC
draft publication is imminent)

These are _personal_ first impressions not a formal response to the
comments.
...
Then see below under the Resolution section for details.
Resolution: Note the final response was not sent by the commenter to the public list but in response to a prompt from Chris Lilley at W3C just before the Director's meeting on advancing the specification to a Proposed Recommendation. Relevant extracts are therefore reproduced below:
From: 	  duerst
Subject: 	Re: URGENT: your comments on "XML Entity definitions for Characters"
Date: 	February 4, 2010 5:58:08 AM EST (CA)

Great, thanks!   Regards,   Martin.

On 2010/02/04 18:52, David Carlisle wrote:

| Sorry, I just changed it to use exactly the wording you suggest.
sh-3.2$ cvs commit -m "update unassigned legend for MD" characters.xsl
Checking in characters.xsl;
/w3ccvs/WWW/2003/entities/2007xml/characters.xsl,v <-- characters.xsl
new revision: 1.54; previous revision: 1.53

From:   duerst
Subject: Re: URGENT: your comments on "XML Entity definitions for Characters"
Date: February 4, 2010 3:25:26 AM EST (CA)
To:   chris@w3.org

Sorry to be so late in responding. I don't really feel like I have been able to 
read email the last two/three weeks, very busy time of the year.

I'm quite satisfied with how my comments were addressed. However, what in 
fact originally got me to look closer at the document unfortunately hasn't 
been fixed at all.

It is the legend given to the light yellow cells in the "code range" charts 
(see e.g. http://www.w3.org/2003/entities/2007doc/003.html).
It still says "XML Character not currently described in Unicode".

The previous exchange on this was:

>> For codepoints with a yellow background, the legend says "XML Character
>> not currently described in Unicode". The term "XML Character" is really
>> strange. XML uses Unicode, there are no "XML Characters".
>
>"XML Characters" is intended to mean something matching the XML char
>production, that is, a character usable as character data in XML,
>which is a bit less than full unicode range as you know. However
>the legend has been reworded as noted in the previous comment.

Something seems to be missing, because e.g. for the cells with light 
purple background (e.g. at http://www.w3.org/2003/entities/2007doc/000.html), 
the text now says "Codepoint not allowed as XML 1.0 character data", so 
I would have expected the text for the light yellow cells to say something 
like "Codepoint allowed as XML 1.0 character data; no Unicode character defined" 
or something similar.

Anyway, David's response says it has been changed but those changes seem 
to have been lost. I'm sure that can be fixed quickly, and the document 
can go ahead.

Regards,    Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University

Martin,

thanks again for your comments on the last call draft of 
XML Entity Definitions for Characters


The last call draft is at the URI:

http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/

An Editors' draft showing the changes made in response to LC comments so
far is available at the URI:

http://www.w3.org/2003/entities/2007doc/Overview.html


I hope we have addressed all the points that you have raised.  As you
will know, the W3C process requires that we log the resolution of every
last call comment, so we would appreciate it if you could confirm via an
email to www-math list whether all the points you have raised have been
addressed satisfactorily.

David


> 
>    Now for the comments themselves:
> 
>    Title: "XML Entity definitions for Characters" looks very ambiguous. I 
>    think something like "XML Entity Definitions for Characters used by 
>    MathML" or so would help the general public a lot to understand the 
>    context and coverage of the document.


Although parts of this document were derived from the MathML2 spec
sources, this is explicitly _not_ just for MathML. It includes several
entity sets that are not included in the MathML DTD (isogrk1, isogrk2,
isogrk4, xhtml1-lat1, xhtml1-special, xhtml1-symbol, html5-uppercase) So
as well as being used for MathML it can be used for HTML (HTML5 uses
these definitions for example) and serves as an update for the (now
cancelled) ISO/IEC document 9573-13 defining the ISO entity sets. It
was for example cited in the docbook documentation for use with docbook
(now that docbook5 is RelaxNG defined and does not have its own set of
entity definitions). Thus it is important that the title does not
mention MathML as it is explicitly not just for MathML.


> 
>    abstract: "This document defines several sets of names which are 
>    assigned to Unicode characters. Each of these sets is also implemented 
>    as a file of XML entity declarations.":
>    First, this says that the names are the main stuff, and the XML entities 
>    are just an implementation detail. This is a contradiction to the title, 
>    where XML entities are the main thing.

The statement you quote is factually true, however we have reworded
it to remove the implied relative importance of the different
aspects.

>    Second, "sets of names which are assigned to Unicode characters" is 
>    unclear as to whether a set of names is assigned to a Unicode 
>    character, or something else. The same problem is present elsewhere 
>    (e.g. first sentence of the Introduction)

This has been reworded to clarify this.


>    Third, all Unicode characters have official names (e.g. LATIN CAPITAL 
>    LETTER A for U+0041). These are a very important part of nailing down 
>    the identity of a character. It would be good if either the abstract or 
>    the Introduction or both would make clear that what you are dealing with 
>    are short mnemonic names that are different from the official Unicode names.

A comment pointing this out has been added to the introduction.

>    Fourth, names being *assigned* to Unicode characters doesn't sound 
>    right. This may be a programmer's viewpoint, but what you are doing, in 
>    terms of an average programming language, is to assign Unicode 
>    codepoints/characters to entity names, not the other way round. XML 
>    entities in this sense are not much different from variables in a 
>    programming language, so it would help a lot to keep things straight.
> 

It is of course possible to view this mapping in either direction,
and in fact the mappings are implemented in both directions by the XML
entity files and the XSLT character maps respectively, although, being a
many-many map, these are not exact inverses. However, as you say, it is
probably clearer to use the wording of assigning codepoints to names
rather than the other way round, and the document has been edited
accordingly wherever it used "assigned".
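
Why the two directions are not exact inverses can be shown with a small Python sketch. The table is a tiny illustrative excerpt (oplus and CirclePlus both naming U+2295, and lowast naming U+2217), and the rule for choosing an output name is an arbitrary assumption, not the rule used by the published character maps.

```python
from collections import defaultdict

# Illustrative excerpt only: entity name -> character.
ENTITIES = {
    "oplus": "\u2295",
    "CirclePlus": "\u2295",
    "lowast": "\u2217",
}

def invert(entities):
    """Group entity names by the character they expand to."""
    by_char = defaultdict(list)
    for name, char in entities.items():
        by_char[char].append(name)
    return {char: sorted(names) for char, names in by_char.items()}

def output_map(entities):
    """Pick one name per character, as an output mapping must.

    Taking the first name in sorted order is an arbitrary choice made here.
    """
    return {char: names[0] for char, names in invert(entities).items()}

chosen = output_map(ENTITIES)
for name, char in ENTITIES.items():
    # A name survives the round trip only if it happens to be the chosen one.
    print(name, "->", "U+%04X" % ord(char), "->", chosen[char])
```

Here oplus expands to U+2295 but comes back as CirclePlus, so parsing and then serializing does not necessarily reproduce the original entity name.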

> 
>    Introduction:
>    "The W3C Math Working Group has been invited to take over the 
>    maintenance and development of these sets by the original standards 
>    committee (ISO/IEC JTC1 SC34).": It should say somewhere that this 
>    document is the result of this "taking over".
> 
Well historically the document began before SC34 considered updating
9573-13 and a long time before they decided to cancel that project.
Informally they cancelled the project because this set was being more
actively maintained and although I was editing both documents I couldn't
keep to SC34 timescales as I couldn't get ahead of mathml3 and html5,
however we shouldn't speculate on the reasons behind the SC34
decision in the W3C REC track document.

> 
>    There should be a section on Notation, which explains things such as U+ 
>    and leading slashes (is that TEX?).
> 

It's pseudo TeX used (without explanation) in the original ISO standard.
The original ISO entity definitions only gave those descriptions (and no
unicode mappings) and the job really is to match those to unicode in the
most sane way possible subject to compatibility constraints. So I don't
want to change the entity description texts in any way as they are the
reference point for comparison to the ISO standards.

> 
>    Tables:
>    http://www.w3.org/2003/entities/2007doc/bycodes.html:
>    - Instead of U00009 and the like, please use the official U+0009 
>    notation, and do not use a hyphen for character sequences, as this may 
>    look like a character range.

We have revised the document to use U+ notation consistently. The U12345
ID form is now used just for internal linking and for filenames, not
for referring to codepoints in the text or tables.


>    - Use a <table> so that this displays decently even with 
>    non-proportional fonts (you can then eliminate the ugly commas). There 
>    are lots of cases where <table> is misused in Web pages, but this is 
>    clearly a case where it is "misunused" or "misnonused" or whatever one 
>    would call the absence of the use of a feature when such use is clearly 
>    warranted.
>    - Use proper table headings
>    - For character sequences, use e.g. "LESS-THAN SIGN with COMBINING LONG 
>    VERTICAL LINE OVERLAY" rather than "LESS-THAN SIGN with vertical line"
> 

There were explicit requests from developers (when this table was in
MathML2) for an ASCII file that could easily be tested against code;
the format that developed, with the monospace layout but including some
hyperlinking, is a compromise.


>    http://www.w3.org/2003/entities/2007doc/byalpha.html:
>    - Similar comments as for bycodes.html
>    - I don't understand why this table contains the origins/collections, 
>    but bycodes.html doesn't.
>    - I don't understand the lowercase stuff at the end of each line. It 
>    seems to be some kind of annotations, but in some cases is totally 
>    useless (e.g. [LATIN SMALL LETTER A WITH CIRCUMFLEX], latin capital 
>    letter A with circumflex)

The final field is the original ISO entity description. If it looks the
same as the Unicode formal name then that is good; it isn't superfluous:
it is confirmation that the entity has been paired with the right
Unicode character. We note again that the original ISO entity definitions
_only_ gave those lower case descriptions, not any Unicode mapping.
However the order of the columns has now been changed so that this
entity description now comes after the entity name, with the Unicode
codepoint and formal name being the last two columns. Also information
has been added to the top of the file explaining what is in each
column.



>    - This table puts the official Unicode names in "[" and "]", but 
>    bycodes.html doesn't. Why? There should be no such gratuitous differences.

Accepted as an editorial improvement.  Also the order of the columns has
been changed to put the entity description after the entity name rather
than after the Unicode formal name, and a paragraph describing the
column format has been added at the start of the page.


>    http://www.w3.org/2003/entities/2007doc/000.html and similar:
>    Please add a note to all the pages with lots of small glyphs that it may 
>    take time to load all the images to see all the glyphs. (one test run 
>    with Mozilla Firebug took 37 seconds on a broadband connection).

A suitable warning note has been added.


>    Please use a stable, final location for all these GIFs. It's okay to 
>    have an occasional "301 Moved Permanently" for a page, but it 
>    essentially doubles the number of objects your page has to download from 
>    256 to 512. Even the former isn't pretty, the latter is definitely bad 
>    and totally unnecessary. (the redirects come from URIs of the form 
>    http://www.w3.org/2003/entities/glyphs/003/U003FF.png, the actual images 
>    seem to be at places such as 
>    http://www.w3.org/2003/entities/2007doc/glyphs/003/U003FF.png)

You happened to review the document while it was in transition, and the
redirects were put in place to keep everything working. Current builds
directly reference the new location of the png images, and the redirects
would only be used if someone has linked to the old locations.
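
As an illustration only, the relationship between the old and new image locations can be sketched as a simple URL rewrite. The function names are hypothetical, and the directory layout (first three digits of the five-digit hexadecimal codepoint) is inferred from the single U003FF example quoted above, so it should be read as an assumption rather than a description of the actual build.

```python
# Base URIs taken from the example in the comment above.
OLD_BASE = "http://www.w3.org/2003/entities/glyphs/"
NEW_BASE = "http://www.w3.org/2003/entities/2007doc/glyphs/"


def glyph_path(codepoint: int) -> str:
    """Relative path of a glyph image (layout assumed from U003FF.png)."""
    hex5 = f"{codepoint:05X}"          # e.g. 0x3FF -> "003FF"
    return f"{hex5[:3]}/U{hex5}.png"   # e.g. "003/U003FF.png"


def new_location(url: str) -> str:
    """Rewrite an old glyph URI to the new location; leave other URIs alone."""
    if url.startswith(OLD_BASE):
        return NEW_BASE + url[len(OLD_BASE):]
    return url


if __name__ == "__main__":
    print(new_location(OLD_BASE + glyph_path(0x3FF)))
```

A page that references the new location directly needs one request per image, whereas a page that still uses the old location needs two (the redirect plus the image).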



>    Codepoints U+0000 through U+0010 (with three exceptions) are shown as 
>    "Unicode or XML Non-Character". They are valid control characters in 
>    Unicode.

Yes, they are valid in Unicode but not in XML 1.0, hence "Unicode or XML";
but see below.


>     Strangely enough, there are also such cases (red background 
>    color) in the U+1D4xx and U+1D5xx 'blocks'. A codepoint such as U+1D53F 
>    is simply <reserved> in Unicode, the Unicode consortium could decide to 
>    allocate a character there in the future. This is no different at all 
>    from all the characters that you marked with a yellow background. The 
>    only codepoints that are actually non-characters in Unicode are cases 
>    such as U+FFFF and the like, but you don't have any of these. I 
>    therefore suggest that the red backgrounds in the U+1D4xx and U+1D5xx 
>    'blocks' have to be turned to yellow, and the text for the red 
>    background should be changed to "Characters not representable in XML 
>    1.0" or some such (most of them would be representable in XML 1.1).

All except 0000 would be representable in XML 1.1 as numeric references, I think.
XML 1.1 came out after that text was written...
We don't want to mark the reserved "holes" in the 1Dxxx blocks the
same as completely unallocated codepoints.
The various cases are now separately distinguished (codepoint not usable
in XML 1.0, reserved codepoint in plane 1, unallocated codepoint); these
have been given different CSS classes and colours, and the key on each
table identifies the cases that occur on that page.
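
The three cases can be sketched in Python as follows. The Char ranges are those of the XML 1.0 and XML 1.1 Char productions; the function names, the returned labels, and the rule used to tell a plane 1 "hole" from any other unassigned codepoint are assumptions made for this illustration, not the logic used to build the tables. The result for unassigned codepoints also depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def is_xml10_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """True if cp matches the XML 1.1 Char production (everything but U+0000,
    surrogates, U+FFFE and U+FFFF)."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def classify(cp: int) -> str:
    """Assign a codepoint to one of the cases distinguished in the tables."""
    if not is_xml10_char(cp):
        return "not usable in XML 1.0"
    if unicodedata.category(chr(cp)) == "Cn":   # Cn = unassigned
        if 0x10000 <= cp <= 0x1FFFF:            # assumption: plane 1 test
            return "reserved codepoint in plane 1"
        return "unallocated codepoint"
    return "assigned character"


if __name__ == "__main__":
    for cp in (0x0000, 0x0009, 0x2200, 0x1D53F):
        print(f"U+{cp:04X}: {classify(cp)}")
```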


> 
>    For codepoints with a yellow background, the legend says "XML Character 
>    not currently described in Unicode". The term "XML Character" is really 
>    strange. XML uses Unicode, there are no "XML Characters".

"XML Characters" is intended to mean something matching the XML Char
production, that is, a character usable as character data in XML,
which is a bit less than the full Unicode range, as you know. However,
the legend has been reworded as noted in the previous comment.


>     The cells with 
>    yellow backgrounds represent unassigned (reserved) Unicode codepoints. 
>    So the best legend would be "reserved Unicode codepoint (no character 
>    currently assigned)" or something similar.

Looking at it from a Unicode viewpoint it makes sense to say it's a
codepoint to which no character is currently assigned. But looking at it
from an XML viewpoint it _is_ a character (or more exactly it
corresponds to well-formed character data matching the Char production)
but Unicode has not assigned any interpretation for that character.
As noted above, the tables now distinguish more cases, separating out the
control characters (not usable directly in XML) from the reserved codepoints.



> 
>    Putting the "Next" link above the "Previous" link at the top and bottom 
>    of these tables seems counterintuitive, because the overall flow is from 
>    top to bottom.
> 

The ordering was inconsistent; we have now ordered these links
consistently, as suggested.

> 
>    For http://www.w3.org/2003/entities/2007doc/double-struck.html and similar:
> 
>    Why do some rows have a yellow background? There's no explanation, so 
>    the reader is left guessing.

They highlight the cases that are in the BMP rather than in the
(possibly?) expected runs in the 1Dxxx block. This was explained in a
note at the start of the section (in the overview document); however, we
have added an additional footnote at the bottom of each affected page.


> 
>    Why do some of these characters not have any corresponding entity names 
>    at all?

Because, as stated explicitly in the introduction, this specification
doesn't define any new names; it only allocates Unicode code points to
names previously thought up by ISO or the W3C.

> 
> 
>    Section 3:
> 
>    Title: An "Unicode Character Block": As you can see from 
>    http://unicode.org/Public/UNIDATA/Blocks.txt, Unicode blocks are not of 
>    equal size of 256 characters, and are not all aligned on boundaries 
>    divisible by 256. But the reader can easily get such an impression. The 
>    title, or the text below it, should be changed to reflect this, unless 
>    (which would be more appropriate for the document (see next comment), 
>    but may be difficult in terms of production costs) actual Unicode blocks 
>    are used.
> 

Yes. In the table of contents, all the block names that occur in each
256-character square are listed, with "(continued)" added when a block
runs over. The section title has been changed to use "Ranges" rather than
"Blocks" to avoid any impression that the 256 squares are Blocks.

>    I don't understand why Arabic presentation forms are (as indicated by 
>    the yellow background) available in the STIX fonts, when basic Arabic 
>    isn't. Turning things around, would a font for Math or Science have to 
>    support these? The sentence "The following tables display Unicode ranges 
>    containing the characters that are most used in mathematics." at the 
>    start of section 3 seems to suggest so.

Given the list of blocks most used in science/mathematics (e.g. as listed
in Unicode Technical Report 25), every 256-aligned range that covers those
blocks is listed, which means that some additional characters are shown in
the tables. The exact details of the Arabic support are somewhat in flux, as
there are Unicode proposals to add variant forms (in a similar manner to
the variants for Latin and Greek in 1D4xx and 1D5xx), and as for the
Latin/Greek cases there is some discussion as to whether existing
variant letters in the BMP should be reused.


> 
>    Turning things around: Are these tables for all the 256-character-sized, 
>    aligned parts that contain one or more of the characters for which 
>    entities have been defined in this document? If yes, please say so. If 
>    no, please say what the differences are.
> 
As above: they are tables for all the 256-character-sized, aligned parts
that contain a math/science related block as listed in Unicode Technical Report 25.
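
The derivation of the table ranges from a list of blocks can be sketched as below. The four blocks are a small illustrative subset chosen for this example (not the list from the report), the function names are hypothetical, and the rule for adding "(continued)" is one reading of the table of contents convention described earlier.

```python
# Illustrative subset of Unicode blocks: name -> (first, last) codepoint.
BLOCKS = {
    "Letterlike Symbols": (0x2100, 0x214F),
    "Arrows": (0x2190, 0x21FF),
    "Mathematical Operators": (0x2200, 0x22FF),
    "Mathematical Alphanumeric Symbols": (0x1D400, 0x1D7FF),
}


def aligned_ranges(blocks):
    """Start codepoints of every 256-aligned range that overlaps a block."""
    starts = set()
    for first, last in blocks.values():
        # Round the block start down to a multiple of 256, then step by 256.
        for start in range(first & ~0xFF, last + 1, 0x100):
            starts.add(start)
    return sorted(starts)


def range_labels(start, blocks):
    """Names of the blocks overlapping the range [start, start + 0xFF]."""
    labels = []
    for name, (first, last) in blocks.items():
        if first <= start + 0xFF and last >= start:
            # Assumption: a block that began in an earlier range is "continued".
            labels.append(name + (" (continued)" if first < start else ""))
    return labels


if __name__ == "__main__":
    for start in aligned_ranges(BLOCKS):
        print(f"{start:05X}: {', '.join(range_labels(start, BLOCKS))}")
```

Because Letterlike Symbols and Arrows share the range starting at 02100, any other characters in that range are shown in the same table even though their blocks were not selected.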



> 
>    Section 5, first sentence: "there are some that use multiple character 
>    combinations": "multiple character combinations" is "multiple 
>    combinations of characters". However, characters are used in sequences, 
>    not in combinations. So "a sequence of multiple characters" or so would 
>    be better.
> 

OK, change made.

 
> 
> 
>    Editorial:
> 
>    - Please change 'definitions' to 'Definitions' in the title, or adopt 
>    any other W3C approved consistent casing convention. That such an 
>    inconsistency is 'traditional for this document' shouldn't be a reason 
>    to keep it.

Agreed, d changed to D.

> 
>    Section 1, first sentence: "especially in scientific documents, 
>    especially in mathematics": Repetition; unclear about the relationship 
>    between the two clauses introduced by 'especially'.

Agreed, this has been reworded.


>    Section 1, second sentence: "has grown in part because its notation 
>    continually changes": I suggest changing "changes" to "changed" to align 
>    the tenses.
> 

The tense of "changes" is intentional here. The evolution is still
in progress.



>    Section 1, first paragraph: "It is difficult to write science fluently" 
>    -> "It is difficult to write scientific texts fluently"; same later for 
>    "read science".


This has been reworded.

> 
>    Section 3, first sentence: "Certain characters are of of particular 
>    relevance": "of of" -> "of"

The spurious "of" has been deleted.


> 
>    Section 5, first sentence: "character, however" -> "character. However" 
>    or "character, but" (however starts a new sentence)
> 
I think "however" is being used as a conjunction there rather than starting a
new sentence; however, the phrase has been reworded.

Thanks again for the comments,



 
Summary: htmlmath entities collection
Submitted: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2009Dec/0000.html
Response: ,
Discussion:
This is a last call comment for the public record, noting an addition
made to the Editors' draft of the XML Entity Definitions for Characters
specification. The change has already been made to the Editors' draft, so
it will be assumed resolved; however, if anyone would like to comment on
this addition, please do reply to this message (on the www-math list).

The entities last call draft 

http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/

includes a combined entity file w3centities-f that defines all the
entities defined in the specification.

As noted in a thread a while ago on the public-html list

http://lists.w3.org/Archives/Public/public-html/2009Nov/0305.html

for some uses it would be more useful to have a combined entity set just
including those sets used in mathml and html and omitting the other ISO
entity sets that are not typically used in a web context. This would
correspond closely with an updated version of the entity file that
firefox uses for mathml documents for example.

In the Editors' draft

http://www.w3.org/2003/entities/2007doc/Overview.html#sets

I've now added two files htmlmathml.ent (which references each of the
entity sets used by mathml or html) and htmlmathml-f.ent which directly
contains each definition, sorted into alphabetic order, with duplicates
removed.

http://www.w3.org/2003/entities/2007/htmlmathml.ent
http://www.w3.org/2003/entities/2007/htmlmathml-f.ent
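
The construction of a flattened file from several entity sets can be sketched as follows. This is not the script used to generate htmlmathml-f.ent: the function name, the treatment of conflicting definitions, the output formatting, and the use of plain codepoint ordering for "alphabetic order" are all assumptions. The sample definitions are taken from the mmlalias.ent lines quoted elsewhere in this document.

```python
def flatten_entity_sets(entity_sets):
    """Merge entity sets into sorted <!ENTITY ...> lines, dropping duplicates.

    entity_sets is an iterable of dicts mapping entity name -> replacement text.
    """
    merged = {}
    for definitions in entity_sets:
        for name, value in definitions.items():
            # A repeated name is fine only if it repeats the same definition.
            if merged.setdefault(name, value) != value:
                raise ValueError(f"conflicting definitions for {name}")
    # Assumption: sorted() ordering (upper case before lower case).
    return [f'<!ENTITY {name} "{merged[name]}" >' for name in sorted(merged)]


if __name__ == "__main__":
    isoamsn = {"ngt": "&#x0226F;", "nge": "&#x02271;"}
    mmlalias = {"NotGreater": "&#x0226F;", "nge": "&#x02271;"}
    print("\n".join(flatten_entity_sets([isoamsn, mmlalias])))
```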


There are no changes to any of the existing entity definitions resulting
from this addition.

David

Resolution: Self-closing editorial comment.
 
Summary: Possible bug in the operator dictionary and mmlalias.ent?
Submitted: Bobby Thomale, http://lists.w3.org/Archives/Public/www-math/2010Feb/0005.html
Response: David Carlisle, http://lists.w3.org/Archives/Public/www-math/2010Feb/0007.html
Discussion:
I have been going through the operator dictionary writing text
descriptions of each mathematical operator. While I was doing this, I
noticed what appears to be an error in the mmlalias.ent file, as well
as the entity tables on the mathml standard website.

At the top of the file it says to report bugs on this list.

If you look at the following entities in mmlalias.ent:

<!ENTITY NotGreater       "&#x0226F;" ><!--alias ISOAMSN ngt -->
<!ENTITY NotGreaterEqual  "&#x02271;" ><!--alias ISOAMSN nge -->
<!ENTITY NotGreaterFullEqual "&#x02266;&#x00338;" ><!--alias ISOAMSN nlE -->
<!ENTITY NotGreaterGreater "&#x0226B;&#x00338;" ><!--alias ISOAMSN nGtv -->


NotGreaterFullEqual is, I believe, in error. If you look at the
glyphs, the symbol being negated: ≦ is actually a less than
sign over a full equals.


<!ENTITY LessFullEqual    "&#x02266;" ><!--alias ISOAMSR lE -->


http://www.fileformat.info/info/unicode/char/2266/index.htm

The other "not greater" glyphs are, correctly, variants of greater
than symbols with a negation line through them.

You can also see this problem if you search the following table for
"NotGreaterFullEqual":

http://www.w3.org/TR/MathML/isoamsn.html

It's clearly mapped to "LESS-THAN OVER EQUAL TO with slash." Seems wrong.
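
A check of this kind can be sketched mechanically using the Unicode character names shipped with Python. The entity values are copied from the four mmlalias.ent lines quoted above; the function name and the heuristic (an entity whose name mentions only "Greater" should have a base character whose Unicode name mentions GREATER-THAN) are assumptions made for the illustration.

```python
import unicodedata

# Replacement text of the four entities quoted above, as Python strings.
ENTITIES = {
    "NotGreater": "\u226F",
    "NotGreaterEqual": "\u2271",
    "NotGreaterFullEqual": "\u2266\u0338",
    "NotGreaterGreater": "\u226B\u0338",
}


def suspicious(entities):
    """Return (entity name, Unicode name of first character) for mismatches."""
    flagged = []
    for name, value in entities.items():
        base_name = unicodedata.name(value[0])
        # Heuristic: a "Greater"-only entity should map to a GREATER-THAN glyph.
        if ("Greater" in name and "Less" not in name
                and "GREATER-THAN" not in base_name):
            flagged.append((name, base_name))
    return flagged


if __name__ == "__main__":
    for name, base_name in suspicious(ENTITIES):
        print(f"{name}: base character is {base_name}")
```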

Also odd is the fact that NotGreaterFullEqual is defined here and
listed in the operator dictionary, but NotLessFullEqual isn't. There
are 7 entities beginning with "NotGreater..." defined in the operator
dictionary, and only 6 "NotLess..." ones - NotGreaterFullEqual is the
one that does not have a corresponding NotLess... defined.

http://www.w3.org/TR/MathML2/appendixf.html


"&NotGreater;"                       form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterEqual;"                  form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterFullEqual;"              form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterGreater;"                form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterLess;"                   form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterSlantEqual;"             form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotGreaterTilde;"                  form="infix" lspace="thickmathspace" rspace="thickmathspace"


"&NotLess;"                          form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotLessEqual;"                     form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotLessGreater;"                   form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotLessLess;"                      form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotLessSlantEqual;"                form="infix" lspace="thickmathspace" rspace="thickmathspace"
"&NotLessTilde;"                     form="infix" lspace="thickmathspace" rspace="thickmathspace"
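
The 7-versus-6 count can be reproduced with a short sketch that pairs each name with its mirror image. The two lists are the operator dictionary names shown above; the helper names are hypothetical.

```python
NOT_GREATER = [
    "NotGreater", "NotGreaterEqual", "NotGreaterFullEqual",
    "NotGreaterGreater", "NotGreaterLess", "NotGreaterSlantEqual",
    "NotGreaterTilde",
]
NOT_LESS = [
    "NotLess", "NotLessEqual", "NotLessGreater", "NotLessLess",
    "NotLessSlantEqual", "NotLessTilde",
]


def swap_direction(name):
    """Exchange 'Greater' and 'Less' in a name, using a placeholder character."""
    return (name.replace("Greater", "\0")
                .replace("Less", "Greater")
                .replace("\0", "Less"))


def unpaired(names, others):
    """Mirror-image names of `names` that do not occur in `others`."""
    return sorted(swap_direction(n) for n in names
                  if swap_direction(n) not in others)


if __name__ == "__main__":
    print(unpaired(NOT_GREATER, NOT_LESS))   # names missing on the NotLess side
    print(unpaired(NOT_LESS, NOT_GREATER))   # names missing on the NotGreater side
```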


-- 
Bobby Thomale
Vital Source Technologies
http://www.vitalsource.com

http://lists.w3.org/Archives/Public/www-math/2010Feb/0008.html
Oh, thanks. It would seem that this has been wrong forever, at least 
since MathML 1 in 1998

http://www.w3.org/TR/1998/REC-MathML-19980407/chap6/MMALIAS2.html

We have a working group phone call this afternoon and I'll put this on 
the agenda, but it seems like this is definitely wrong unless there is 
some subtlety I'm missing.

David

http://lists.w3.org/Archives/Public/www-math/2010Feb/0009.html
Yes, that sounds right to me too.

Also you might want to consider adding NotLessFullEqual as the negation of:
<!ENTITY LessFullEqual "&#x02266;" ><!--alias ISOAMSR lE -->
for completeness, since all of the other variants of greater and not
greater have corresponding less than and not less than entities
defined.

Somehow those two symbols probably got mixed up and combined when the
list was initially being compiled.

-- 
Bobby Thomale

http://lists.w3.org/Archives/Public/www-math/2010Feb/0010.html
It's clear that this is "missing" in some sense, but we're taking a 
fairly hard line in not adding any new names.

If we add a name and someone uses it, then any fragments of MathML using 
that name that get moved to a document using an older DTD are not well formed 
and the XML parser would reject the entire document.

This is why, for example, the double struck letters only have entities for 
upper case not lower case, even though there are now defined Unicode 
code points for upper and lower case.

David
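
The failure mode described here can be reproduced with a small sketch using Python's standard XML parser. The internal DTD subset below is a stand-in for "an older DTD" and declares a single entity; the mapping of NotLess to U+226E and the helper name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for an older DTD: it declares NotLess but not NotLessFullEqual.
OLD_DTD = '<!DOCTYPE math [ <!ENTITY NotLess "&#x0226E;"> ]>'


def parses(fragment, doctype=OLD_DTD):
    """True if the fragment is well formed when parsed with the given DTD."""
    try:
        ET.fromstring(doctype + fragment)
        return True
    except ET.ParseError:
        # An undeclared entity reference is a well-formedness error here,
        # so the parser rejects the whole document.
        return False


if __name__ == "__main__":
    print(parses("<math><mo>&NotLess;</mo></math>"))
    print(parses("<math><mo>&NotLessFullEqual;</mo></math>"))
```

The second document fails as a whole, not just at the unknown reference, which is why a newly added name is risky for content that may later be combined with an older DTD.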

http://lists.w3.org/Archives/Public/www-math/2010Feb/0011.html
Interesting.

Ok, that makes sense. Fixing the entity that's named wrong is the
important part anyhow.

-- 
Bobby Thomale

Resolution:
On Feb 4, 2010, at 8:53 PM, David Carlisle wrote:

I fixed the "typos" relating to Unicode and NotGreaterFullEqual.
It affected several tables in the doc and I thought I'd better record
it in the change log in both B.1 and C.2. Altogether the files below changed....

David
Checking in 2007/htmlmathml-f.ent;
/w3ccvs/WWW/2003/entities/2007/htmlmathml-f.ent,v  <--  htmlmathml-f.ent
new revision: 1.4; previous revision: 1.3
done
Checking in 2007/mmlalias.ent;
/w3ccvs/WWW/2003/entities/2007/mmlalias.ent,v  <--  mmlalias.ent
new revision: 1.14; previous revision: 1.13
done
Checking in 2007/mmlaliasmap.xsl;
/w3ccvs/WWW/2003/entities/2007/mmlaliasmap.xsl,v  <--  mmlaliasmap.xsl
new revision: 1.10; previous revision: 1.9
done
Checking in 2007/w3centities-f.ent;
/w3ccvs/WWW/2003/entities/2007/w3centities-f.ent,v  <--  w3centities-f.ent
new revision: 1.13; previous revision: 1.12
done
Checking in 2007doc/Overview.html;
/w3ccvs/WWW/2003/entities/2007doc/Overview.html,v  <--  Overview.html
new revision: 1.25; previous revision: 1.24
done
Checking in 2007doc/U00338.html;
/w3ccvs/WWW/2003/entities/2007doc/U00338.html,v  <--  U00338.html
new revision: 1.10; previous revision: 1.9
done
Checking in 2007doc/byalpha.html;
/w3ccvs/WWW/2003/entities/2007doc/byalpha.html,v  <--  byalpha.html
new revision: 1.26; previous revision: 1.25
done
Checking in 2007doc/bycodes.html;
/w3ccvs/WWW/2003/entities/2007doc/bycodes.html,v  <--  bycodes.html
new revision: 1.23; previous revision: 1.22
done
Checking in 2007doc/isoamsn.html;
/w3ccvs/WWW/2003/entities/2007doc/isoamsn.html,v  <--  isoamsn.html
new revision: 1.17; previous revision: 1.16
done
Checking in 2007doc/mmlalias.html;
/w3ccvs/WWW/2003/entities/2007doc/mmlalias.html,v  <--  mmlalias.html
new revision: 1.20; previous revision: 1.19
done
Checking in 2007xml/character-set.xml;
/w3ccvs/WWW/2003/entities/2007xml/character-set.xml,v  <-- character-set.xml
new revision: 1.85; previous revision: 1.84
done
Checking in 2007xml/unicode.xml;
/w3ccvs/WWW/2003/entities/2007xml/unicode.xml,v  <--  unicode.xml
new revision: 1.47; previous revision: 1.46
done