Comments on Arabic Mathematical Notation

Version reviewed, dated 25 November 2005

Main reviewer

Richard Ishida


These are so far mostly personal comments, NOT on behalf of the I18N WG. Where the WG has expressed agreement this is noted in the table.


ID Location Comment Ed. / Discussion threads

A very nice, clearly written note. Very useful work. Thank you!

Note that I am reading this note without a good understanding of how MathML works. On the whole, I think this may lead to useful comments.

1 General

It may be useful to add the note about available fonts to the document itself, rather than showing it only in a JavaScript alert, especially given that the popup doesn't reappear after you have visited the page once.


"In the same way that European mathematics broadens the set of distinct symbols available by using bold face, Fraktur or other styles, so does Arabic mathematics. The distinctions in Arabic are made by varying strokes, adding tails or other extensions. As Arabic is a calligraphic script, letters within words are typically joined together..."

-> "In the same way that European mathematics broadens the set of distinct symbols available by using bold face, Fraktur or other styles, so does Arabic mathematics but typically by varying strokes, adding tails or other extensions. As Arabic is a calligraphic script, letters within words are typically joined together..."

Otherwise it sounds like Arabic uses Fraktur, bold and other European conventions, and then does some additional stuff.

3 1

"We relegate all localization issues..." This is important enough to begin a new paragraph, so it is not lost from view. It should also link to the appendix.

Ed.: E

"(although current MathML renderers are not doing this)"

Not an accurate statement, I believe. My out-of-the-box Firefox browser does it fine. And even IE and Opera, while not presenting the math information well, still do the glyph shaping fine for these words.

5 2.1

It may be better to show the Moroccan approach later, since it is the more complicated in terms of directionality. This may make it easier for the reader who is unfamiliar with bidi text to understand what's going on.

Ed.: E

6 general

I understood the examples in the note because I'm very familiar with Arabic text, but for others I would strongly suggest that you add a section entitled something like "Features of the Arabic script", and include in it the initial paragraphs in the subsections 3.1, 3.2, 3.4, possibly 3.5, and A.1, using the subheadings 'Text direction', 'Glyph shaping', 'Mirroring', 'Horizontal stretchiness', and 'Number systems'. At the moment these concepts are first explained half-way through the document at the earliest.

Ed.: E

7 general

I would be interested in seeing how this works in Hebrew too.

Ed.: E

I would say something like the following in place of the first paragraph in 3.1 (or in a separate subsection as described in comment #6):

"When a mixture of LTR and RTL characters appears in text (i.e. bidirectional or BiDi text), Unicode's BiDirectional Algorithm [UnicodeBiDi] describes the order in which the characters will be displayed. All adjacent strongly-typed RTL characters (such as in a single Arabic word) will be presented in right-to-left order, and vice versa for strongly-typed LTR characters. A cluster of characters with the same directionality can be called a directional run."

"At directional boundaries, directional runs are then ordered according to the overall directional context applied to a 'paragraph'. The bidi algorithm allows for higher-level protocols that determine essentially what parts of a given document constitute paragraphs, in the BiDi sense. For example, in HTML the directional context is applied to the html tag or changed on lower-level elements using the dir attribute."

"Although the order of characters in words or numbers is fixed, the overall direction of an equation or mathematical expression may be LTR or RTL in Arabic, depending on the preferred style."

"Difficulties can arise where it is not clear whether directionally-neutral Unicode characters, such as plus or equals signs, should take on the directionality of the context (the equation) or of the surrounding characters."

"Further difficulties can arise in mathematical equations, since sequences of digits that compose a number always run LTR, whether European or other digit shapes are used; however, such a number does not constitute a separate directional run when embedded in surrounding RTL text."

"(For more information see What you need to know about the bidi algorithm and inline markup.)"

(which is why you get the 1 - ت / 2 + 4 problem, but only get it when the overall context is LTR)

This kind of explanation will allow you to rewrite the para in 2.1 below the table so that it's clearer. It's the directionality of the expression that is LTR, but Arabic words are still to be rendered RTL.
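
To make the character types behind this concrete, here is a minimal sketch using Python's standard unicodedata module (my choice of tool, not something the note uses). It lists the Unicode bidi class of each character in the 1 - ت / 2 + 4 expression. The helper name bidi_classes is made up for illustration.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Bidi_Class) pairs, skipping whitespace."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]


# The expression from the comment above; U+062A is ARABIC LETTER TEH.
expression = "1 - \u062A / 2 + 4"

for ch, cls in bidi_classes(expression):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

Only the Arabic letter is strongly typed (AL). The digits (EN) and the operators (ES, CS) are weak types, so where they end up depends on the surrounding context.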


"... and assuring that the appropriate symbols are marked as mirrored and that they set RTL as needed."

The mirroring of the characters should be automatic in a RTL environment if the characters have the Unicode mirrored property and the font supports the rendering. They should only need additional markup as a temporary workaround if this is not the case.
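
As a rough check of which symbols carry the mirrored property, the following sketch queries the Unicode Character Database through Python's standard unicodedata module. The sample characters are my own selection, not taken from the note, and is_mirrored is a hypothetical helper name.

```python
import unicodedata


def is_mirrored(ch):
    """True if the character has the Unicode Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1


# Parenthesis, less-than, n-ary summation, square root, plus, equals.
for ch in "(<\u2211\u221A+=":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: mirrored={is_mirrored(ch)}")
```

The property only says that a character is a candidate for mirroring in an RTL run. Whether a suitable mirrored glyph actually appears still depends on the renderer and the font.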


"but indicate that the Unicode BiDi algorithm, along with glyph shaping, should apply individually to token elements"

The meaning of this is not clear to me. In a RTL context this should be just fine.


Possible complications that spring to mind with the use of REVERSE SOLIDUS:

  1. could cause complications for localization, given that the context may be important for deciding how to translate. I suppose it mainly depends on whether the reverse solidus is used for any other purpose, and whether localization is done with human intervention or not.
  2. would it change the meaning of the expression?
  3. it could cause complications for generating presentation from Content MathML, if I understand that process correctly.

I don't think you need to use an NCR for the backslash '\'.
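
A small sketch of why the NCR seems unnecessary: an XML parser yields the same character for the literal backslash and for the numeric character reference. I am assuming here that the operator sits in an mo element; the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Literal backslash versus the numeric character reference for U+005C.
literal = ET.fromstring(r"<mo>\</mo>")
ncr = ET.fromstring("<mo>&#x5C;</mo>")

# Both parse to the single character REVERSE SOLIDUS.
print(repr(literal.text), repr(ncr.text), literal.text == ncr.text)
```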


"The overall mathematical directionality should be determined by a (new) dir attribute on the outermost math element which takes one of the values ltr or rtl; the default is ltr."

I think you should add "or is inherited from an enclosing element".
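
What I have in mind for inheritance could be sketched as follows. This assumes the proposed dir attribute behaves like dir in HTML, that is, an element without the attribute takes the value of its nearest ancestor that has one. The function name effective_dirs and the sample markup are invented for illustration, and namespaces are left out to keep it short.

```python
import xml.etree.ElementTree as ET


def effective_dirs(elem, inherited="ltr"):
    """Yield (tag, direction) for elem and its descendants.

    An element's own dir attribute wins; otherwise the value is
    inherited from the enclosing element (default ltr at the root).
    """
    direction = elem.get("dir", inherited)
    yield elem.tag, direction
    for child in elem:
        yield from effective_dirs(child, direction)


# Hypothetical document: the first math element has no dir of its own.
doc = ET.fromstring(
    '<div dir="rtl">'
    "<math><mrow><mi>\u0633</mi></mrow></math>"
    '<math dir="ltr"><mi>x</mi></math>'
    "</div>"
)

for tag, direction in effective_dirs(doc):
    print(tag, direction)
```

Here the first math element resolves to rtl because of the enclosing div, while the second keeps its explicit ltr.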


"The text content of each Token element should be treated as a separate paragraph (in the Unicode BiDi sense), with an initial directionality determined by the mathematical directionality"

Do you mean, each sub element under math should be treated as a separate directional run, and that each element should be displayed in the order determined by the directional context for the equation?

If so, this seems like a reasonable strategy on the face of it, although we would need to test for problems in edge cases.


I think you also need to specify the ability to temporarily override the directional context set by the math element in order to cope with the Moroccan example of otherwise (with...), which appears to switch base directionality a couple of times.


"alternative shapes are used depending on position"

Actually this depends on how other characters join to it, rather than just the position, given the exceptions you mention later. Maybe better to say 'generally'.
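
A simplified sketch of the point, in Python: the form a letter takes follows from whether its neighbours join to it, not from its index in the word. The set of right-joining letters is limited to the six the note lists, and every other input character is assumed to be a dual-joining Arabic letter; real shaping uses the full Unicode joining data. The name positional_forms is hypothetical.

```python
# Letters that join only to the preceding letter (the six listed in the note).
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def positional_forms(word):
    """Return the contextual form of each letter in word.

    Assumes every character is an Arabic letter that is either
    dual-joining or in RIGHT_JOINING.
    """
    forms = []
    for i, ch in enumerate(word):
        # The previous letter connects forward only if it is dual-joining.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # This letter connects forward only if it is dual-joining itself.
        joins_next = i + 1 < len(word) and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


word = "\u0645\u0636\u0631\u0648\u0628"  # مضروب
print(positional_forms(word))
```

In this word the last letter comes out as isolated rather than final, and the letter before it is isolated in mid-word, because each follows a letter that does not join forward.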


"Authors, of course, should also avoid using the characters (ا د ذ ر ز و) in the middle of words."

Not sure of the reason for this. Surely if they would use them in non-marked-up text, there's no reason to avoid them in MathML.


"Thus, implementors are strongly encouraged to apply shaping to each character sequence within the text content of any token elements."

I think this should say "implementors should apply", and assume that that is the default. The exception should be non-joined sequences. Note, however, that the Unicode non-joiner character can be recommended in some cases. I'm not sure why it is not mentioned here.
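
For reference, the non-joiner I mean is U+200C ZERO WIDTH NON-JOINER. The sketch below, using Python's standard library, shows it surviving as ordinary text content of a token element. The mi element and the choice of letters are only an illustration, not markup taken from the note.

```python
import unicodedata
import xml.etree.ElementTree as ET

ZWNJ = "\u200C"

# Hypothetical token: two BEH letters kept apart by a ZWNJ written as an NCR.
token = ET.fromstring("<mi>\u0628&#x200C;\u0628</mi>")

for ch in token.text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```

The non-joiner is a format character (category Cf), so it takes up no space itself but asks the renderer not to connect the letters on either side.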


"We propose to add an additional allowed value madrwb"

Hmm. There are several possible transcriptions of مضروب, including maDruub or mdrwb. Personally, I prefer madruub, since it is more consistently related to the (approximate) sound and easier for non-Arabic speakers to talk about.


In the table I recommend that you be consistent with the naming: either Unicode or traditional names. I would recommend the Unicode naming: European, Arabic-Indic, and Eastern Arabic-Indic.
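
For readers who want to pin the three digit sets to code points, here is a minimal Python sketch. The dictionary keys follow the names I use above, the formal Unicode character names are printed alongside, and to_digit_set is a hypothetical helper that handles non-negative integers only.

```python
import unicodedata

# Code point of digit zero in each set.
DIGIT_ZERO = {
    "European": 0x0030,
    "Arabic-Indic": 0x0660,
    "Eastern Arabic-Indic": 0x06F0,
}


def to_digit_set(number, digit_set):
    """Write a non-negative integer using the digits of the given set."""
    zero = DIGIT_ZERO[digit_set]
    return "".join(chr(zero + int(d)) for d in str(number))


for name, zero in DIGIT_ZERO.items():
    rendered = to_digit_set(2005, name)
    # int() accepts any Unicode decimal digits, so the value round-trips.
    print(name, unicodedata.name(chr(zero)), rendered, int(rendered))
```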


"such a comma is distinct from the Arabic comma "Arabic Comma" used to separator items in a list"

It would be helpful to list some Unicode code point values here to identify which commas we are talking about. The graphic is not terribly clear.
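
The candidates I would expect to see identified are sketched below, using Python's standard unicodedata module to print the code points and names. Which of these the note actually intends is my assumption; the graphic does not let me tell.

```python
import unicodedata

# Candidate comma-like characters (an assumed list, not taken from the note).
CANDIDATES = ["\u002C", "\u060C", "\u066B", "\u066C"]

for ch in CANDIDATES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```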


separator -> separate


See also additional editorial comments on subsequent draft.

Version: $Id: Overview.html,v 1.6 2006/01/04 11:19:46 rishida Exp $