# 2 MathML Fundamentals

Issue update_fundamentals: The current chapter remains based largely on MathML2, since the MathML language has not been drastically changed. The contents have been settled upon, but some details are still being considered by the Working Group. The chapter has been reformulated and much shortened. Almost everything that devolves from MathML's role as an XML vocabulary is now considered to be adequately described by mentioning that fact. An attempt has been made to keep the text drier than before. To provide a concrete example of actual MathML early on, a treatment of the quadratic formula has been added to the previous chapter.

## 2.1 MathML Syntax and Grammar

### 2.1.1 General Considerations

MathML is an application of [XML], Extensible Markup Language, and as such it is governed by the rules of XML syntax. XML syntax is a notation for rooted, labeled, planar trees. Planarity means that the children of each node have a natural order, and MathML depends on this ordering.

As an XML vocabulary, MathML must use only characters that are legal as specified by the XML recommendation, which draws its character repertoire from [Unicode]. The use of Unicode characters in mathematics is discussed in Chapter 6 Characters, Entities and Fonts.

MathML specifies some syntactic and grammatical rules in addition to the general rules it inherits as an XML application. The grammar of MathML3 is specified using a RELAX NG schema. In other words, the generalities of using tags, attributes, entity references and the like are defined in the XML language specification, while the details of MathML element and attribute names, which elements can be nested inside each other, and their possible relationships are specified in the MathML schema, which is given in Appendix A Parsing MathML.

The grammatical aspects of MathML2 were specified by a DTD (Document Type Definition) and, alternatively, by an XML Schema as specified by the W3C [XMLSchemas]. To maintain continuity as MathML is revised, a new MathML3 XML Schema is provided in Appendix A Parsing MathML, but the normative schema for MathML3 is the one in RELAX NG form [RELAX-NG].

A special aspect of the MathML specification is that there are two main strains of markup, in Chapter 3 Presentation Markup and Chapter 4 Content Markup, which address, separately, the presentational and semantic aspects of formulas. Content markup is specified in particular detail. This specification makes use of a format called Content Dictionaries, which is also an application of XML. This new type of format has been developed in collaboration with the OpenMath Society, and is given in Chapter 8 MathML3 Content Dictionaries.

There are two kinds of grammar and syntax rules added by MathML to those inherited from XML. One kind involves placing additional constraints on attribute values. For example, it is not possible in pure XML to require that an attribute value be a positive integer. The second kind of rule specifies more detailed restrictions on the child elements (for example on ordering) than are given in the DTD or even a schema. For example, it is not possible in pure XML to specify that the first child be interpreted one way, and the second in another.

The following sections discuss features both of XML syntax and grammar in general, and of MathML in particular. Throughout the remainder of the MathML specification, we will usually take care to distinguish between usage required by XML syntax and the MathML Schema and usage required by MathML specific rules. However, we will often allude to "MathML errors" without identifying which part of the specification is being violated.

### 2.1.2 Children versus Arguments

Many MathML elements require a specific number of children or attach additional meanings to child elements in certain positions. As noted above, these kinds of requirements are specific to MathML, and cannot be given entirely using XML syntax and grammar. When the children of a given MathML element are subject to these kinds of additional conditions, we will often refer to them as arguments instead of merely as children, in order to emphasize their MathML specific usage. Note that, especially in Chapter 3 Presentation Markup, the term "argument" is usually used in this technical sense, unless otherwise noted, and therefore refers to a child element.

In the detailed discussions of element syntax given with each element throughout the MathML specification, the number of arguments required and their order are implicitly indicated by giving names for the arguments at various positions. This information is also given for presentation elements in the table of argument requirements in Section 3.1.3 Required Arguments.

A few elements have other requirements on the number or type of arguments. These additional requirements are described together with the individual elements.

### 2.1.3 MathML Attribute Values

A MathML attribute's value, like that of any XML attribute, must be a string of legal characters as specified by the XML recommendation. Attribute names are generally shown in a `monospaced` font within descriptive text in this specification, just as the `monospaced` font is used for examples.

MathML uses a more complicated syntax for attribute values than the generic XML syntax. These additional rules are intended for use by MathML applications, and it is a MathML error to violate them, though they cannot be enforced by processing that does only what is needed to comply with the XML recommendation. The MathML syntax of each attribute value is specified in the table of attributes provided with the description of each element, using a notation described below. Attribute values may contain any MathML characters, as specified in Chapter 6 Characters, Entities and Fonts, that are also permitted by the syntax restrictions for that attribute. Character data can be included directly in attribute values, or by using entity references as described in Section 6.2 Unicode Character Data, which depends on the list of named character entities for XML specified in [Entities]. However, modern practice suggests that it is preferable to use numeric character references rather than named entity references, to avoid the need for a DTD containing the entity definitions. After initial parsing, all character references are resolved to Unicode characters in any case.

In particular, the characters `"` (U+0022), `'` (U+0027), `&` (U+0026) and `<` (U+003C) can be included in MathML attribute values (when permitted by the attribute value syntax) using the entity references `&quot;`, `&apos;`, `&amp;` and `&lt;`, respectively. Because these characters have special roles in XML, XML predefines these entity references, so they can be used without a DTD; numeric character references such as `&#x26;` are, of course, equally valid.
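The escaping described above can be sketched with Python's standard library. This is an illustrative helper only: `escape_attribute` is a hypothetical name, it relies on `xml.sax.saxutils.escape` (which also turns `>` into `&gt;`, harmless here), and `alttext` on `math` is used merely as a convenient string-valued attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def escape_attribute(value):
    """Escape a string for use in a quoted XML attribute value.

    escape() replaces &, < and >; the extra mapping replaces both quote
    characters with the predefined entities &quot; and &apos;.
    """
    return escape(value, {'"': "&quot;", "'": "&apos;"})

raw = 'a<b & "c"'
markup = f'<math alttext="{escape_attribute(raw)}"><mi>x</mi></math>'
print(markup)  # <math alttext="a&lt;b &amp; &quot;c&quot;"><mi>x</mi></math>

# The XML parser resolves the references back to the original characters.
print(ET.fromstring(markup).get("alttext") == raw)  # True
```

No DTD is involved at any point, which is why only the predefined entities (or numeric references) are safe to emit.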

When MathML applications process attribute values, whitespace (as defined by Unicode character classes and made explicit in Section 2.1.5 Collapsing Whitespace in Input) should be ignored except where it separates letter and digit sequences into individual words or numbers. Note, however, that this normalization was not implemented in early MathML processors, so for backward compatibility it is advisable not to add extra whitespace within attribute values.

 Editorial note (Robert Miner, Chris and George): Henri Sivonen notes that trimming of whitespace around enumerated attribute values is not widely implemented. For example, `movablelimits="false"` and `movablelimits=" false "` are not treated in the same way in Firefox. http://lists.w3.org/Archives/Public/www-math/2007Dec/0008.html
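A processor that follows the whitespace rule above could normalize attribute values before comparing them. The sketch below is a hypothetical illustration (the helper names are not from any MathML implementation); it treats only the four XML whitespace characters as separators, so `" false "` and `"false"` compare equal.

```python
import re

# Whitespace as made explicit in Section 2.1.5: U+0020, U+0009, U+000A, U+000D.
_XML_WS = re.compile(r"[ \t\n\r]+")

def attribute_words(value):
    """Split an attribute value into its words or numbers, ignoring extra whitespace."""
    return [word for word in _XML_WS.split(value) if word]

def normalize_attribute(value):
    """Trim and collapse whitespace, so ' false ' compares equal to 'false'."""
    return " ".join(attribute_words(value))

print(normalize_attribute(" false ") == "false")  # True
print(attribute_words("1em  2ex"))                # ['1em', '2ex']
```

Because deployed renderers may not do this, authors are still better off writing `movablelimits="false"` without padding.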

#### 2.1.3.1 Syntax notations used in the MathML specification

To describe the MathML-specific syntax of permissible attribute values, the following conventions and notations are used for most attributes in the present document.

| Notation | What it matches |
|---|---|
| `number` | a decimal integer or rational number (a string of decimal digits from the range U+0030 to U+0039, with up to one decimal point represented by U+002E), optionally starting with '-' (U+002D) |
| `unsigned-number` | a decimal integer or real number, no sign |
| `integer` | a decimal integer, optionally starting with '-' (U+002D) |
| `positive-integer` | a decimal integer, unsigned, not 0 (U+0030) |
| `string` | an arbitrary character string (always the entire attribute value) |
| `character` | a single non-whitespace character, or MathML entity reference; whitespace separation is optional |
| `#rrggbb` | RGB color value; the three pairs of hexadecimal digits in the example `#5599dd` define proportions of red, green and blue on a scale of x00 through xFF, which gives a strong sky blue |
| `h-unit` | a unit of horizontal length (allowable units are listed below) |
| `v-unit` | a unit of vertical length (allowable units are listed below) |
| `css-fontfamily` | explained in the CSS subsection below, Section 2.1.3.3 CSS-compatible attributes |
| `css-color-name` | explained in the CSS subsection below, Section 2.1.3.3 CSS-compatible attributes |
| *other italicized words* | explained in the text for each attribute |
| `form +` | one or more instances of 'form' |
| `form *` | zero or more instances of 'form' |
| `f1 f2 ... fn` | one instance of each form, in sequence, perhaps separated by whitespace |
| `f1 \| f2 \| ... \| fn` | any one of the specified forms |
| `[ form ]` | an optional instance of 'form' |
| `( form )` | same as form |
| word in plain text | that same word, literally present in the attribute value |
| quoted symbol | that same symbol, literally present in the attribute value (e.g. "+" or '+') |

The order of precedence of the syntax notation operators is, from highest to lowest precedence:

• form + or form *
• f1 f2 ... fn (sequence of forms)
• f1 | f2 | ... | fn (alternative forms)
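The notations in the table above, together with this precedence order, behave much like regular-expression building blocks. The Python sketch below is one possible, non-normative reading of a few of them; `SYNTAX`, `one_or_more` and `matches` are hypothetical helpers, and tolerating leading zeros in `positive-integer` is an assumption of this sketch.

```python
import re

# Illustrative regular expressions for a few syntax symbols in the table above.
# This is one possible reading of the prose definitions, not a normative grammar.
_DIGITS = r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)"  # digits with at most one decimal point
SYNTAX = {
    "unsigned-number": _DIGITS,
    "number": "-?" + _DIGITS,
    "integer": r"-?[0-9]+",
    "positive-integer": r"[0-9]*[1-9][0-9]*",  # unsigned, at least one non-zero digit
    "#rrggbb": r"#[0-9A-Fa-f]{6}",
}

def one_or_more(pattern):
    """The 'form +' notation: one or more instances separated by whitespace.

    Wrapping the form in a group mirrors '+' binding more tightly than sequence.
    """
    return rf"(?:{pattern})(?:[ \t\n\r]+(?:{pattern}))*"

def matches(pattern, value):
    """True if the whole attribute value, ignoring surrounding whitespace, fits the pattern."""
    return re.fullmatch(pattern, value.strip(" \t\n\r")) is not None

print(matches(SYNTAX["number"], "-3.5"))                  # True
print(matches(SYNTAX["number"], "+3"))                    # False: '+' is not part of number
print(matches(SYNTAX["positive-integer"], "0"))           # False
print(matches(one_or_more(SYNTAX["integer"]), "1 -2 3"))  # True
```

A check like `positive-integer` is exactly the kind of constraint that, as noted earlier, a DTD alone cannot express.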

A string can contain arbitrary characters which are specifiable within XML CDATA attribute values. See Chapter 6 Characters, Entities and Fonts for a full discussion of MathML characters. No syntax rule in MathML includes a string as only part of an attribute value; a string can only be the entire value.

 Editorial note (P. Ion): It is no longer clear to me why we go to the trouble of repeatedly formulating this distinction between full strings and substrings. The reason should perhaps be given, or the phrasing, which can trouble a naive reader like me, removed.

Adjacent keywords and numbers must be separated by whitespace in the actual attribute values, except for unit identifiers (denoted by the `h-unit` or `v-unit` syntax symbols), which immediately follow numbers. Whitespace is not otherwise required, but is permitted between any of the tokens listed above, except (for compatibility with CSS) immediately before unit identifiers, between the '-' signs and digits of negative numbers, or between `#` and "rrggbb" or "rgb".

Numerical attribute values for dimensions that should depend upon the current font can be given in font-related units, or in named absolute units (described in a separate subsection below). Horizontal dimensions are conventionally given in `em` units, and vertical dimensions in `ex` units, by immediately following a number by one of the unit identifiers "em" or "ex". For example, the horizontal spacing around an operator such as "+" is conventionally given in "em"s, though other units can be used. Using font-related units is usually preferable to using absolute units, since it allows renderings to grow or shrink in proportion to the current font size.

For most numerical attributes, only those in a subset of the expressible values are sensible; values outside this subset are not errors, unless otherwise specified, but rather are rounded up or down (at the discretion of the renderer) to the closest value within the allowed subset. The set of allowed values may depend on the renderer, and is not specified by MathML.

If a numerical value within an attribute value syntax description is declared to allow a minus sign ('-'), e.g. `number` or `integer`, it is not a syntax error when one is provided in cases where a negative value is not sensible. Instead, the value should be handled by the processing application as described in the preceding paragraph. An explicit plus sign ('+') is not allowed as part of a numerical value except when it is specifically listed in the syntax (as a quoted '+' or "+"), and its presence can change the meaning of the attribute value (as documented with each attribute which permits it).

 Editorial note (P. Ion): Whether an explicit + is permitted in attribute values is a place where we should be in accord with HTML's conventions, in particular HTML5's, if at all possible.

The symbols `h-unit`, `v-unit`, `css-fontfamily`, and `css-color-name` are explained in the following subsections.

#### 2.1.3.2 Attributes with units

Some attributes accept horizontal or vertical lengths as numbers followed by a "unit identifier" (often just called a "unit"). The syntax symbols `h-unit` and `v-unit` refer to a unit for horizontal or vertical length, respectively. The possible units and the lengths they refer to are shown in the table below; they are the same for horizontal and vertical lengths, but the syntax symbols are distinguished in attribute syntaxes as a reminder of the direction each is used in.

The unit identifiers and meanings are taken from CSS. However, the syntax of numbers followed by unit identifiers in MathML is not identical to the syntax of length values with units in CSS style sheets, since numbers in CSS cannot end with decimal points, and are allowed to start with '+' signs.

The possible horizontal or vertical units in MathML are:

| Unit identifier | Unit description |
|---|---|
| `em` | em (font-relative unit traditionally used for horizontal lengths) |
| `ex` | ex (font-relative unit traditionally used for vertical lengths) |
| `px` | pixels, or size of a pixel in the current display |
| `in` | inches (1 inch = 2.54 centimeters) |
| `cm` | centimeters |
| `mm` | millimeters |
| `pt` | points (1 point = 1/72 inch) |
| `pc` | picas (1 pica = 12 points) |
| `%` | percentage of the default value |

The typesetting units "em" and "ex" are defined in Appendix D Glossary, and discussed further under "Additional notes" below.

`%` is a "relative unit"; when an attribute value is given as "n%" (for any numerical value "n"), the value being specified is the default value for the property being controlled multiplied by "n" divided by 100. The default value (or the way in which it is obtained, when it is not constant) is listed in the table of attributes for each element, and its meaning is described in the subsequent documentation about that attribute. (The `mpadded` element has its own syntax for `%` and does not allow it as a unit identifier.)

For consistency with lengths in CSS, length units in MathML are rarely optional. When they are, the unit symbol is enclosed in square brackets in the attribute syntax, following the number to which it applies, e.g. `number [ h-unit ]`. The meaning of specifying no unit is given in the description for each attribute; in general, it is that the number given is a multiplier for the default value of the attribute. (In such cases, specifying the number "nnn" without a unit is equivalent to specifying the number "nnn" times 100 followed by `%`. For example, `<mo maxsize="2"> ( </mo>` is equivalent to `<mo maxsize="200%"> ( </mo>`.)

As a special exception (also consistent with CSS), a numerical value equal to 0 need not be followed by a unit identifier even if the syntax specified here requires one. In such cases, the unit identifier (or lack of one) would not matter, since 0 times any unit is 0.

For most attributes, the typical unit which would be used to describe them in typesetting is chosen as the one used in that attribute's default value in this specification; when a specific default value is not given, the typical unit is usually mentioned in the syntax table or in the documentation for that attribute. The most common units are `em` or `ex`. However, any unit can be used, unless otherwise specified for a specific attribute.

Note that some attributes, e.g. `framespacing` on a `<mtable>`, can contain more than one numerical value, each followed by its own unit.

It is conventional to use the font-relative unit `ex` mainly for vertical lengths, and `em` mainly for horizontal lengths, but this is not required. These units are relative to the font and font size which would be used for rendering the element in whose attribute value they are specified, which means they should be interpreted after attributes such as `fontfamily` and `fontsize` are processed, if those occur on the same element, since changing the current font or font size can change the length of one of these units.

The definition of the length of each unit, but not the MathML syntax for length values, is as specified in CSS, except that if a font provides specific values for `em` and `ex` which differ from the values defined by CSS (the font size and "x"-height respectively), those values should be used.
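The rules of this subsection (units that follow a number with no intervening whitespace, `%` relative to the default, unitless multipliers, the 0 exception, and `em`/`ex` taken from the current font) can be combined into a small resolver. This is a simplified sketch: `resolve_length` and its parameters are hypothetical, it accepts unitless numbers everywhere rather than only where the syntax shows `[ h-unit ]`, and the `px_pt=0.75` default assumes a 96-pixels-per-inch display, whereas the table above leaves pixel size to the display.

```python
import re

# Absolute units in points, from the table above: 1in = 72pt = 6pc, 1in = 2.54cm.
POINTS_PER_UNIT = {"pt": 1.0, "pc": 12.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}

# A number immediately followed (no whitespace) by an optional unit identifier.
_LENGTH = re.compile(r"(-?(?:[0-9]+\.?[0-9]*|\.[0-9]+))(em|ex|px|in|cm|mm|pt|pc|%)?")

def resolve_length(value, default_pt, em_pt, ex_pt, px_pt=0.75):
    """Convert a MathML length value to points (simplified sketch).

    default_pt: the attribute's default value in points (used for % and unitless numbers).
    em_pt, ex_pt: em and ex of the element's font, after fontfamily/fontsize are applied.
    px_pt: size of one display pixel in points (assumed; display dependent).
    """
    match = _LENGTH.fullmatch(value.strip(" \t\n\r"))
    if match is None:
        raise ValueError(f"not a MathML length: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit is None:
        # No unit: a multiplier of the default ("2" means "200%"); "0" is always 0.
        return number * default_pt
    if unit == "%":
        return default_pt * number / 100
    if unit == "em":
        return number * em_pt
    if unit == "ex":
        return number * ex_pt
    if unit == "px":
        return number * px_pt
    return number * POINTS_PER_UNIT[unit]

print(resolve_length("2", default_pt=10, em_pt=12, ex_pt=6))      # 20.0
print(resolve_length("200%", default_pt=10, em_pt=12, ex_pt=6))   # 20.0
print(resolve_length("1.5em", default_pt=10, em_pt=12, ex_pt=6))  # 18.0
```

Passing `em_pt` and `ex_pt` in explicitly reflects the ordering requirement above: they are only known once `fontfamily` and `fontsize` on the same element have been applied. A value such as `"2 em"` is rejected, since whitespace is not allowed before a unit identifier.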

#### 2.1.3.3 CSS-compatible attributes

Several MathML attributes, listed below, correspond closely to text rendering properties defined originally in [CSS1]. In MathML 1.01, the names and values of these attributes were aligned with the CSS Recommendation where possible. This was done so that renderers in CSS environments could query the environment for the corresponding property when determining the default values for the attributes.

Allowing style properties to be set both via MathML attributes and CSS style sheets has drawbacks. At a minimum, duplication is confusing, and at worst, it leads to the meaning of equations being inadvertently changed by document-wide CSS changes. For these reasons, these attributes have been deprecated. In their place, MathML 2.0 introduced four new mathematical style attributes. These attributes use logical values to better capture the abstract categories of letter-like symbols used in math, and afford a much cleaner separation between MathML and CSS. See Section 3.2.2 Mathematics style attributes common to token elements for more details.

For reference, a table showing the correspondence of the deprecated MathML 1.01 style attributes with their CSS counterparts is given below:

| MathML attribute | CSS property | syntax symbol | MathML elements | refer to |
|---|---|---|---|---|
| `fontsize` | `font-size` | - | presentation tokens; `mstyle` | Section 3.2.2 Mathematics style attributes common to token elements |
| `fontweight` | `font-weight` | - | presentation tokens; `mstyle` | Section 3.2.2 Mathematics style attributes common to token elements |
| `fontstyle` | `font-style` | - | presentation tokens; `mstyle` | Section 3.2.2 Mathematics style attributes common to token elements |
| `fontfamily` | `font-family` | `css-fontfamily` | presentation tokens; `mstyle` | Section 3.2.2 Mathematics style attributes common to token elements |
| `color` | `color` | `css-color-name` | presentation tokens; `mstyle` | Section 3.3.4 Style Change (mstyle) |
| `background` | `background` | `css-color-name` | `mstyle` | Section 3.3.4 Style Change (mstyle) |

See also Section 2.1.4 Attributes Shared by all MathML Elements below for a discussion of the `class`, `style` and `xml:id` attributes for use with style sheets.

##### 2.1.3.3.1 Order of processing attributes versus style sheets

CSS or analogous style sheets can specify changes to rendering properties of selected MathML elements. Since rendering properties can also be changed by attributes on an element, or be changed automatically by the renderer, it is necessary to specify the order in which changes requested by various sources should occur. An example of automatic adjustment is what happens for `fontsize`, as explained in the discussion on `scriptlevel` in Section 3.3.4 Style Change (mstyle). In the case of "absolute" changes, i.e., setting a new property value independent of the old value (as opposed to "relative" changes, such as increments or multiplications by a factor), the absolute change performed last will be the only absolute change which is effective, so the sources of changes which should have the highest priority must be processed last.

In the case of CSS, the order of processing of changes from various sources which affect one MathML element's rendering properties should be as follows:

(first changes; lowest priority)

• Automatic changes to properties or attributes based on the type of the parent element, and this element's position in the parent, as for the changes to `fontsize` in relation to `scriptlevel` mentioned above; such changes will usually be implemented by the parent element itself before it passes a set of rendering properties to this element

• From a style sheet from the reader: styles which are not declared "important"

• Explicit attribute settings on this MathML element

• From a style sheet from the author: styles which are not declared "important"

• From a style sheet from the author: styles which are declared "important"

• From a style sheet from the reader: styles which are declared "important"

(last changes; highest priority)
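The processing order listed above can be modeled as a loop in which every source is applied in turn, so that the last absolute change wins while relative changes build on the value so far. This Python sketch is illustrative only: the source labels and `effective_value` are hypothetical, and relative changes are represented as functions of the previous value.

```python
# Sources of changes, listed from first processed (lowest priority)
# to last processed (highest priority), following the list above.
PROCESSING_ORDER = [
    "automatic",         # e.g. fontsize adjusted by the parent for scriptlevel
    "reader",            # reader style sheet, not "important"
    "attribute",         # explicit MathML attribute on this element
    "author",            # author style sheet, not "important"
    "author-important",  # author style sheet, "important"
    "reader-important",  # reader style sheet, "important"
]

def effective_value(changes, initial):
    """Apply each source's change in processing order.

    An absolute change is a plain value and replaces whatever came before;
    a relative change is a function of the previous value (e.g. a scale factor).
    """
    value = initial
    for source in PROCESSING_ORDER:
        if source in changes:
            change = changes[source]
            value = change(value) if callable(change) else change
    return value

print(effective_value({"reader": "normal", "attribute": "bold"}, "normal"))            # bold
print(effective_value({"attribute": "bold", "reader-important": "normal"}, "normal"))  # normal
print(effective_value({"attribute": 12, "author": lambda size: size * 2}, 10))         # 24
```

The second call shows the point of the rationale that follows: a reader's "important" style overrides an explicit MathML attribute.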

Note that the order of the changes derived from CSS style sheets is specified by CSS itself (this is the order specified by CSS2). The following rationale is related only to the issue of where in this pre-existing order the changes caused by explicit MathML attribute settings should be inserted.

Rationale: MathML rendering attributes are analogous to HTML rendering attributes such as `align`, which the CSS section on cascading order specifies should be processed with the same priority. Furthermore, this choice of priority permits readers, by declaring certain CSS styles as "important", to decide which of their style preferences should override explicit attribute settings in MathML. Since MathML expressions, whether composed of "presentation" or "content" elements, are primarily intended to convey meaning, with their "graphic design" (if any) intended mainly to aid in that purpose but not to be essential in it, it is likely that readers will often want their own style preferences to have priority; the main exception will be when a rendering attribute is intended to alter the meaning conveyed by an expression, which is generally discouraged in the presentation attributes of MathML.

#### 2.1.3.4 Default values of attributes

Default values for MathML attributes are in general given along with the detailed descriptions of specific elements in the text. Default values shown in plain text in the tables of attributes for an element are literal (unless they are obviously explanatory phrases), but when italicized are descriptions of how default values can be computed.

Default values described as inherited are taken from the rendering environment, as described under `mstyle`, or in some cases (described individually) from the values of other attributes of surrounding elements, or from certain parts of those values. The value used will always be one which could have been specified explicitly, had it been known; it will never depend on the content or attributes of the same element, only on its environment. (What it means when used may, however, depend on those attributes or the content.)
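A simplified way to picture inherited defaults is a chain of lookups: the element's own attributes, then enclosing `mstyle` settings from the innermost outward, then the rendering environment. The sketch below uses Python's `collections.ChainMap` for this; `resolve_attributes` is a hypothetical helper, `mathcolor` is just a sample attribute name, and the model deliberately ignores attributes that `mstyle` changes relatively (such as `scriptlevel`) and defaults taken from other attributes of surrounding elements.

```python
from collections import ChainMap

def resolve_attributes(explicit, enclosing_mstyles, environment):
    """Look up presentation attributes for one element (simplified model).

    explicit: attributes written on the element itself.
    enclosing_mstyles: attribute dicts of enclosing mstyle elements, outermost first.
    environment: values supplied by the rendering environment.
    """
    # ChainMap searches left to right, so the innermost mstyle must come first.
    return ChainMap(explicit, *reversed(enclosing_mstyles), environment)

env = {"mathcolor": "black"}
styles = [{"mathcolor": "blue"}, {"mathcolor": "red"}]  # outer, then inner mstyle
print(resolve_attributes({}, styles, env)["mathcolor"])                      # red
print(resolve_attributes({"mathcolor": "green"}, styles, env)["mathcolor"])  # green
print(resolve_attributes({}, [], env)["mathcolor"])                          # black
```

As the paragraph above requires, the inherited value comes only from the surroundings; the element's own attributes merely take precedence over it when present.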

Default values described as automatic should be computed by a MathML renderer in a way which will produce a high-quality rendering; how to do this is not usually specified by the MathML specification. The value computed will always be one which could have been specified explicitly, had it been known, but it will usually depend on the element content and possibly on the rendering environment.

Other italicized descriptions of default values which appear in the tables of attributes are explained for each attribute individually.

The single or double quotes which are required around attribute values in an XML start tag are not shown in the tables of attribute value syntax for each element, but are shown around example attribute values in the text.

Note that, in general, there is no value which can be given explicitly for a MathML attribute which will simulate the effect of not specifying the attribute at all for attributes which are inherited or automatic. Giving the words "inherited" or "automatic" explicitly will not work, and is not generally allowed. Furthermore, even for presentation attributes for which a specific default value is documented here, the `mstyle` element (Section 3.3.4 Style Change (mstyle)) can be used to change this for the elements it contains. Therefore, the MathML DTD declares most presentation attribute default values as #IMPLIED, which prevents XML preprocessors from adding them with any specific default value. This point of view is carried through to the MathML schema.

#### 2.1.3.5 Attribute values in the MathML DTD

In an XML DTD, allowed attribute values can be declared as general strings, or they can be constrained in various ways, either by enumerating the possible values, or by declaring them to be certain special data types. The choice of an XML attribute type affects the extent to which validity checks can be performed using a DTD.

The MathML DTD specifies formal XML attribute types for all MathML attributes, including enumerations of legitimate values in some cases. In general, however, the MathML DTD is relatively permissive, frequently declaring attribute values as strings; this is done to provide for interoperability with SGML parsers while allowing multiple attributes on one MathML element to accept the same values (such as "true" and "false"), and also to allow extension to the lists of predefined values.

At the same time, even though an attribute value may be declared as a string in the DTD, only certain values are legitimate in MathML, as described above and in the rest of this specification. For example, many attributes expect numerical values. In the sections which follow, the allowed attribute values are described for each element. To determine when these constraints are actually enforced in the MathML DTD, consult Appendix A Parsing MathML. However, lack of enforcement of a requirement in the DTD does not imply that the requirement is not part of the MathML language itself, or that it will not be enforced by a particular MathML renderer. (See Section 2.3.2 Handling of Errors for a description of how MathML renderers should respond to MathML errors.)

Furthermore, the MathML DTD is provided for convenience; although it is intended to be fully compatible with the text of the specification, the text should be taken as definitive if there is a contradiction. (Any contradictions which may exist between various chapters of the text should be resolved by favoring Chapter 6 Characters, Entities and Fonts first, then Chapter 3 Presentation Markup, Chapter 4 Content Markup, then Section 2.1 MathML Syntax and Grammar, and then other parts of the text.) For the MathML schema the situation will be the same: the published Recommendation text takes precedence. Though this is what is intended to happen, there is a practical difficulty. If the system processing the MathML uses a validating parser, whether it be based on a DTD or on a schema, the process will probably simply stop when it hits something held to be incorrect syntax, whether or not further MathML processing in full harmony with the specification would have processed the piece correctly.

### 2.1.4 Attributes Shared by all MathML Elements

In order to facilitate use with style sheet mechanisms such as [XSLT] and [CSS2] all MathML elements accept `class`, `style`, and `xml:id` attributes in addition to the attributes described specifically for each element. MathML renderers not supporting CSS may ignore these attributes. MathML specifies these attribute values as general strings, even if style sheet mechanisms have more restrictive syntaxes for them. That is, any value for them is valid in MathML.

In order to facilitate compatibility with linking mechanisms, all MathML elements accept the `xlink:href` attribute.

All MathML elements also accept the `xref` attribute for use in parallel markup (Section 5.4 Parallel Markup). The `xml:id` is also used in this context.

Every MathML element, because of a legacy from MathML 1.0, also accepts the deprecated attribute `other` (Section 2.3.3 Attributes for unspecified data) which was conceived for passing non-standard attributes without violating the MathML DTD. MathML renderers are only required to process this attribute if they respond to any attributes which are not standard in MathML. However, the use of `other` is strongly discouraged when there are already other ways within MathML of passing specific information.

See also Section 3.2.2 Mathematics style attributes common to token elements for a list of MathML attributes which can be used on most presentation token elements.

### 2.1.5 Collapsing Whitespace in Input

In MathML, as in XML, "whitespace" means simple spaces, tabs, newlines, or carriage returns, i.e., characters with hexadecimal Unicode codes U+0020, U+0009, U+000A, or U+000D, respectively.

MathML ignores whitespace occurring outside token elements. Non-whitespace characters are not allowed there. Whitespace occurring within the content of token elements is "trimmed" from the ends, i.e., all whitespace at the beginning and end of the content is removed. Whitespace internal to content of MathML elements is "collapsed" canonically, i.e., each sequence of 1 or more whitespace characters is replaced with one space character (U+0020, sometimes called a blank character).

For example, `<mo> ( </mo>` is equivalent to `<mo>(</mo>`, and

```
<mtext>
   Theorem
   1:
</mtext>
```

is equivalent to `<mtext>Theorem 1:</mtext>`.

Authors wishing to encode whitespace characters at the start or end of the content of a token, or in sequences other than a single space, without having them ignored, must use `&nbsp;` or other "whitespace" non-marking entities as described in Section 6.6 Non-Marking Characters. For example, compare

```
<mtext>
   Theorem
   1:
</mtext>
```

with

```
<mtext>
   &nbsp;Theorem &nbsp;1:
</mtext>
```

When the first example is rendered, there is no whitespace before "Theorem", one space between "Theorem" and "1:", and no whitespace after "1:". In the second example, a single space is rendered before "Theorem", two spaces are rendered before "1:", and there is no whitespace after the "1:".
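The trimming and collapsing rules can be sketched with Python's standard XML parser. This is a hypothetical helper, not a reference implementation; it uses the numeric reference `&#xA0;` instead of `&nbsp;` because no DTD is available to define the named entity, and it deliberately avoids `str.split()`, which would also treat U+00A0 as whitespace.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace only: U+0020, U+0009, U+000A, U+000D.
# str.split() would be wrong here, because it also treats U+00A0 as whitespace.
_XML_WS = re.compile(r"[ \t\n\r]+")

def token_content(token):
    """Return a token element's text with whitespace collapsed and trimmed."""
    text = "".join(token.itertext())
    return _XML_WS.sub(" ", text).strip(" ")

first = ET.fromstring("<mtext>\n   Theorem\n   1:\n</mtext>")
second = ET.fromstring("<mtext>\n   &#xA0;Theorem &#xA0;1:\n</mtext>")

print(repr(token_content(first)))   # 'Theorem 1:'
print(repr(token_content(second)))  # '\xa0Theorem \xa01:'
```

The non-breaking spaces survive collapsing, which is what produces the leading space and the double space before "1:" in the second rendering.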

Note that the `xml:space` attribute does not apply in this situation since XML processors pass whitespace in tokens to a MathML processor; it is the MathML processing rules which specify that whitespace is trimmed and collapsed.

For whitespace occurring outside the content of the token elements `mi`, `mn`, `mo`, `ms`, `mtext`, `ci`, `cn` and `annotation`, an `mspace` element should be used, as opposed to an `mtext` element containing only "whitespace" entities.

## 2.2 Interfacing MathML with other contexts

Issue update_interface: The current section needs to be continued and updated further in later drafts.

To be effective, MathML must work well with a wide variety of renderers, processors, translators and editors. This section raises some of the interface issues involved in generating and rendering MathML. Since MathML exists primarily to encode mathematics in Web documents, perhaps the most important interface issues are related to embedding MathML in [HTML4] and [XHTML], and in any newer HTML when it appears.

There are three kinds of interface issues that arise in embedding MathML in other XML documents. First, MathML must be semantically integrated: MathML markup must be recognized as valid embedded XML content, and not as an error. This could be seen as primarily a question of managing namespaces in XML [Namespaces]. However, XML namespaces and their management have not been well supported by recent commercial software, so other ways have grown up of dealing with 'foreign content' in an XML document of a particular type. The Compound Document Formats Working Group (CDF WG) of the W3C has grappled with the question of combining XML vocabularies and has defined ways to do so for particular combinations. Its initial success has been in specifying profiles for combining XHTML and SVG, with special attention paid to the needs of mobile phone technology. The W3C Math WG continues to work toward defining profiles for full scientific documents involving XHTML for text, MathML for equations and SVG for diagrams and images.
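When namespaces are supported, recognizing embedded MathML reduces to looking for elements in the MathML namespace, `http://www.w3.org/1998/Math/MathML`. The Python sketch below, using the standard `xml.etree.ElementTree` module, is only an illustration of namespace-based recognition in an XHTML document; the sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>The root is
      <math xmlns="{MATHML_NS}"><msqrt><mn>2</mn></msqrt></math>.
    </p>
  </body>
</html>"""

root = ET.fromstring(doc)
# Namespace-qualified search: only elements in the MathML namespace match.
math_elements = root.findall(f".//{{{MATHML_NS}}}math")
print(len(math_elements))       # 1
print(math_elements[0][0].tag)  # {http://www.w3.org/1998/Math/MathML}msqrt
```

An unqualified search such as `root.findall(".//math")` finds nothing here, which illustrates why software that ignores namespaces may fail to recognize the embedded content.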

Second, in the case of HTML/XHTML, MathML rendering must be integrated with browser software. Some browsers already implement MathML rendering natively, and more browsers can be expected to do so in the future. At the same time, other browsers have developed infrastructure to facilitate the rendering of MathML and other embedded XML content by third-party software or other built-in technology. Examples of such built-in technology are the sophisticated CSS rendering engines now available and the powerful implementations of ECMAScript (or JavaScript) that are becoming common. Using these browser-specific mechanisms generally requires additional interface markup of some sort to activate them. In the case of CSS, there is a special restricted form of MathML3 tailored for use with present-day CSS, up to CSS2.1, which is specified in "A MathML for CSS profile" [MathMLforCSS]. This does not offer the full expressiveness of MathML3, but it provides a simpler, portable form that modern CSS engines can render acceptably on screen.

Third, other tools for generating and processing MathML must be able to communicate with one another. A number of MathML tools have been or are being developed, including editors, translators, computer algebra systems, and other scientific software. However, since MathML expressions tend to be lengthy and are prone to error when entered by hand, special emphasis must be given to ensuring that MathML can be easily generated by user-friendly conversion and authoring tools, and that these tools work together in a dependable, platform- and vendor-independent way. At present this specification can do no more than state this fairly obvious goal.

## 2.3 Conformance

Information is nowadays commonly generated, processed and rendered by software tools. The exponential growth of the Web is fueling the development of advanced systems for automatically searching, categorizing, and interconnecting information. In addition, there are increasing numbers of Web services, some of which offer technically based materials and activities. Thus, although MathML can be written by hand and read by humans, whether machine-aided or just with much concentration, the future of MathML is largely tied to the ability to process it with software tools.

There are many different kinds of MathML processors: editors for authoring MathML expressions, translators for converting to and from other encodings, validators for checking MathML expressions, computation engines that evaluate, manipulate or compare MathML expressions, and rendering engines that produce visual, aural or tactile representations of mathematical notation. What it means to support MathML varies widely between applications. For example, the issues that arise with a validating parser are very different from those for an equation editor.

In this section, guidelines are given for describing different types of MathML support, and for making clear the extent of MathML support in a given application. Developers, users and reviewers are encouraged to use these guidelines in characterizing products. The intention behind these guidelines is to facilitate reuse by and interoperability of MathML applications by accurately setting out their capabilities in quantifiable terms.

The W3C Math Working Group maintains MathML Conformance Guidelines. Consult this document for future updates on conformance activities and resources.

Editorial note (P. Ion): The Conformance Document mentioned above is still that for MathML2 and requires updating.

### 2.3.1 MathML Conformance

A valid MathML expression is an XML construct determined by the MathML Relax_NG Schema together with the additional requirements given in this specification.

Editorial note (P. Ion): The Relax_NG Schema is dominant now, not the DTD or the XML Schema.

We shall use the phrase "a MathML processor" to mean any application that can accept, produce, or "roundtrip" a valid MathML expression. Perhaps the simplest example of an application that round-trips a MathML expression is an editor that writes out a new file even though no modifications have been made.

Three forms of MathML conformance are specified:

1. A MathML-input-conformant processor must accept all valid MathML expressions, and faithfully translate all MathML expressions into application-specific form allowing native application operations to be performed.

2. A MathML-output-conformant processor must generate valid MathML, faithfully representing all application-specific data.

3. A MathML-roundtrip-conformant processor must preserve MathML equivalence. Two MathML expressions are "equivalent" if and only if both expressions have the same interpretation (as stated by the MathML Schema and specification) under any circumstances, by any MathML processor. Equivalence on an element-by-element basis is discussed elsewhere in this document.
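A processor can check conservatively that it round-trips an expression by comparing the trees before and after a parse-and-serialize cycle. The Python sketch below uses the standard `xml.etree.ElementTree` module. It treats two expressions as equivalent only if they have the same element names, the same attributes and the same normalized token text. Identical trees are certainly equivalent, but the converse does not hold, so this is a sufficient rather than a necessary test of MathML equivalence. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

_WS = re.compile(r"[ \t\n\r]+")


def _norm(text):
    # Trim and collapse whitespace, as MathML does for token content.
    return _WS.sub(" ", text or "").strip(" ")


def same_tree(a, b):
    """Structural comparison: names, attributes, normalized text, children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    return len(a) == len(b) and all(same_tree(x, y) for x, y in zip(a, b))


def equivalent(source_a, source_b):
    return same_tree(ET.fromstring(source_a), ET.fromstring(source_b))


def roundtrips(source):
    """Parse, re-serialize unchanged (like an editor saving an unmodified
    file), and compare the result with the original."""
    written = ET.tostring(ET.fromstring(source), encoding="unicode")
    return equivalent(source, written)
```

Serializing may change namespace prefixes (ElementTree writes `ns0:` prefixes, for example). The comparison ignores this because it works on expanded names.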

Beyond the above definitions, the MathML specification makes no demands of individual processors. In order to guide developers, the MathML specification includes advisory material; for example, there are many suggested rendering rules throughout Chapter 3 Presentation Markup. However, in general, developers are given wide latitude in interpreting what kind of MathML implementation is meaningful for their own particular application.

To clarify the difference between conformance and interpretation of what is meaningful, consider some examples:

1. In order to be MathML-input-conformant, a validating parser needs only to accept expressions, and return "true" for expressions that are valid MathML. In particular, it need not render or interpret the MathML expressions at all.

2. A MathML computer-algebra interface based on content markup might choose to ignore all presentation markup. Provided the interface accepts all valid MathML expressions including those containing presentation markup, it would be technically correct to characterize the application as MathML-input-conformant.

3. An equation editor might have an internal data representation that makes it easy to export some equations as MathML but not others. If the editor exports the simple equations as valid MathML, and merely displays an error message to the effect that conversion failed for the others, it is still technically MathML-output-conformant.

#### 2.3.1.1 MathML Test Suite and Validator

As the previous examples show, to be useful, the concept of MathML conformance frequently involves a judgment about what parts of the language are meaningfully implemented, as opposed to parts that are merely processed in a technically correct way with respect to the definitions of conformance. This requires some mechanism for giving a quantitative statement about which parts of MathML are meaningfully implemented by a given application. To this end, the W3C Math Working Group has provided a test suite.

The test suite consists of a large number of MathML expressions categorized by markup category and dominant MathML element being tested. The existence of this test suite makes it possible, for example, to characterize quantitatively the hypothetical computer algebra interface mentioned above by saying that it is a MathML-input-conformant processor which meaningfully implements MathML content markup, including all of the expressions in the content markup section of the test suite.

Developers who choose not to implement parts of the MathML specification in a meaningful way are encouraged to itemize the parts they leave out by referring to specific categories in the test suite.

For MathML-output-conformant processors, there is also a MathML validator accessible over the Web. Developers of MathML-output-conformant processors are encouraged to verify their output using this validator.

Customers of MathML applications who wish to verify claims as to which parts of the MathML specification are implemented by an application are encouraged to use the test suites as a part of their decision processes.

#### 2.3.1.2 Deprecated MathML 1.x and MathML 2.x Features

MathML 3.0 contains a number of features of earlier MathML which are now deprecated. The following points define what it means for a feature to be deprecated, and clarify the relation between deprecated features and current MathML conformance.

1. In order to be MathML-output-conformant, authoring tools may not generate MathML markup containing deprecated features.

2. In order to be MathML-input-conformant, rendering/reading tools must support deprecated features if they are to be in conformance with MathML 1.x or MathML 2.x. They do not have to support deprecated features to be considered in conformance with MathML 3.0. However, all tools are encouraged to support the old forms as much as possible.

3. In order to be MathML-roundtrip-conformant, a processor need only preserve MathML equivalence on expressions containing no deprecated features.
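An authoring tool aiming at output conformance might scan its output for deprecated features before writing it. The sketch below flags only the two deprecated attributes named in this chapter: `other` (on any element) and `mode` (on `math`). It is not an exhaustive list of deprecated MathML features, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Deprecated attributes mentioned in this chapter only (NOT exhaustive).
# Value: the element the attribute is deprecated on, or None for any element.
DEPRECATED_ATTRIBUTES = {"other": None, "mode": "math"}


def deprecated_uses(source):
    """List 'element/@attribute' for each deprecated attribute found."""
    found = []
    prefix = "{%s}" % MATHML_NS
    for elem in ET.fromstring(source).iter():
        if not elem.tag.startswith(prefix):
            continue  # only inspect elements in the MathML namespace
        name = elem.tag[len(prefix):]
        for attr in elem.attrib:  # namespaced attributes look like '{uri}name'
            if attr in DEPRECATED_ATTRIBUTES:
                applies_to = DEPRECATED_ATTRIBUTES[attr]
                if applies_to is None or applies_to == name:
                    found.append("%s/@%s" % (name, attr))
    return found
```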

#### 2.3.1.3 MathML Extension Mechanisms and Conformance

MathML 2.0 defined three basic extension mechanisms: The `mglyph` element provides a way of displaying glyphs for non-Unicode characters, and glyph variants for existing Unicode characters; the `maction` element uses attributes from other namespaces to obtain implementation-specific parameters; and content markup makes use of the `definitionURL` attribute to point to external definitions of mathematical semantics.

These extension mechanisms are important because they provide a way of encoding concepts that are beyond the scope of MathML, which allows MathML to be used for exploring new ideas not yet susceptible to standardization. However, as new ideas take hold, they may become part of future standards. For example, an emerging character that must be represented by an `mglyph` element today may be assigned a Unicode codepoint in the future. At that time, representing the character directly by its Unicode codepoint would be preferable. This transition into Unicode has already taken place for hundreds of characters used for mathematics.

Because the possibility of future obsolescence is inherent in the use of extension mechanisms to facilitate the discussion of new ideas, MathML can reasonably make no conformance requirements concerning the use of extension mechanisms, even when alternative standard markup is available. For example, using an `mglyph` element to represent an 'x' is permitted. However, authors and implementors are strongly encouraged to use standard markup whenever possible. Similarly, maintainers of documents employing MathML 3.0 extension mechanisms are encouraged to monitor relevant standards activity (e.g. Unicode, OpenMath, etc.) and to update documents as more standardized markup becomes available.

### 2.3.2 Handling of Errors

If a MathML-input-conformant application receives input containing one or more elements with an illegal number or type of attributes or child schemata, it should nonetheless attempt to render all the input in an intelligible way, i.e. to render normally those parts of the input that were valid, and to render error messages (rendered as if enclosed in an `merror` element) in place of invalid expressions.

MathML-output-conformant applications such as editors and translators may choose to generate `merror` expressions to signal errors in their input. This is usually preferable to generating valid, but possibly erroneous, MathML.
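The recommended error handling can be sketched as a tree rewrite. Each invalid subexpression is replaced by an `merror` element carrying a message, and everything else is left to render normally. This Python sketch checks for only one kind of error: a wrong number of children. It applies that check to a small, hand-picked set of presentation elements with fixed argument counts. The table, the message wording and the function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1998/Math/MathML"


def q(name):
    """Qualified ElementTree name for a MathML element."""
    return "{%s}%s" % (NS, name)


# Illustrative subset of elements that take a fixed number of arguments.
FIXED_ARITY = {"mfrac": 2, "mroot": 2, "msub": 2, "msup": 2, "msubsup": 3}


def make_merror(message):
    err = ET.Element(q("merror"))
    ET.SubElement(err, q("mtext")).text = message
    return err


def replace_invalid(parent):
    """Replace children with a wrong argument count by merror, in place.

    Valid children are descended into; replaced ones are not.
    Returns the number of replacements made."""
    count = 0
    for i, child in enumerate(list(parent)):
        name = child.tag.rsplit("}", 1)[-1]
        expected = FIXED_ARITY.get(name)
        if expected is not None and len(child) != expected:
            err = make_merror("%s expects %d arguments, got %d"
                              % (name, expected, len(child)))
            err.tail = child.tail
            parent[i] = err
            count += 1
        else:
            count += replace_invalid(child)
    return count
```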

### 2.3.3 Attributes for unspecified data

The MathML attributes described in the MathML specification are necessary for presentation and content markup. Ideally, the MathML attributes should be an open-ended list so that users can add specific attributes for specific renderers. However, this cannot be done within the confines of a single XML DTD or Schema. Although it can be done using extensions of the standard DTD, for example, some authors will wish to use non-standard attributes to take advantage of renderer-specific capabilities while remaining strictly in conformance with the standard DTD.

To allow this, the MathML 1.0 specification [MathML1] allowed the attribute `other` on all elements, for use as a hook to pass on renderer-specific information. In particular, it was intended as a hook for passing information to audio renderers, computer algebra systems, and for pattern matching in future macro/extension mechanisms. The motivation for this approach to the problem was historical, looking to PostScript, for example, where comments are widely used to pass information that is not part of PostScript.

As MathML evolved further, the development of a general XML namespace mechanism seemed to make the `other` attribute obsolete. In MathML 2.0, the `other` attribute is deprecated in favor of the use of namespace prefixes to identify non-MathML attributes. The `other` attribute remains deprecated in MathML 3.0.

For example, in MathML 1.0, it was recommended that if additional information was used in a renderer-specific implementation for the `maction` element (Section 3.6.1 Bind Action to Sub-Expression (maction)), that information should be passed in using the `other` attribute:

```
<maction actiontype="highlight" other="color='#ff0000'"> expression </maction>
```

From MathML 2.0 onwards, a `color` attribute from another namespace would be used:

```
<body xmlns:my="http://www.example.com/MathML/extensions">
...
<maction actiontype="highlight" my:color="#ff0000"> expression </maction>
...
</body>
```

Note that the intent of allowing non-standard attributes is not to encourage software developers to use this as a loophole for circumventing the core conventions for MathML markup. Authors and applications should use non-standard attributes judiciously.
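With namespaces, a renderer sees non-MathML attributes under their expanded names. It can pick out the ones it understands and ignore the rest. The Python sketch below uses the standard `xml.etree.ElementTree` module to read the `my:color` hint from the example above. `renderer_hints` is a hypothetical helper, and the extension namespace is the example URI used in the text, not a real vocabulary.

```python
import xml.etree.ElementTree as ET

# Example extension namespace from the text above (not a real vocabulary).
MY_NS = "http://www.example.com/MathML/extensions"

# As in the fragment above, no default namespace is declared, so the
# maction element here is in no namespace; only my:color is namespaced.
doc = ET.fromstring(
    '<body xmlns:my="http://www.example.com/MathML/extensions">'
    '<maction actiontype="highlight" my:color="#ff0000"><mi>x</mi></maction>'
    "</body>"
)


def renderer_hints(elem, namespace):
    """Return the attributes of elem in the given namespace, by local name."""
    prefix = "{%s}" % namespace
    return {key[len(prefix):]: value
            for key, value in elem.attrib.items()
            if key.startswith(prefix)}


maction = doc.find("maction")
print(maction.get("actiontype"))        # highlight
print(renderer_hints(maction, MY_NS))   # {'color': '#ff0000'}
```

A renderer that does not know the extension namespace simply never asks for it, which is the behavior the namespace approach relies on.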

## 2.4 Future Extensions

If MathML is to remain useful, it will need to be extended and revised in various ways. Some of these extensions can be easily foreseen; for example, as work on behavioral extensions to CSS proceeds, MathML will likely need to be extended as well, or a description of the new possible interactions provided.

Similarly, there are several kinds of functionality that are fairly obvious candidates for future MathML extensions. These include macros, style sheets, and perhaps a general facility for "labeled diagrams" and equation numbering. However, there will no doubt be other desirable extensions to MathML that will only emerge as MathML is widely used. For these extensions, the W3C Math Working Group relies on the extensible architecture of XML, and the common sense of the larger Web community.

### 2.4.1 Style Sheets

The previous version, MathML 2.0, discussed the use of XSLT and macro capabilities. Since then, this sort of extension seems to have become less interesting; readers concerned with it should consult the MathML 2.0 specification.

#### 2.4.1.2 CSS3

The CSS working group continues to extend and refine the mechanism of cascading style sheets, and as it does so, the CSS available for use with MathML changes. In this revision cycle the Math WG has prepared, to accompany MathML 3.0, a special profile documenting how best to use MathML with CSS 2.1 [MathMLforCSS]. Naturally, this does not cover all possible deployments of MathML 3.0.

### 2.4.2 XML Extensions to MathML

The elements and attributes specified in the MathML specification are necessary for rendering common mathematical expressions. It is recognized that not all mathematical notation is covered by this set of elements, that new notations are continually invented, and that sub-communities within mathematics often have specialized notations; and furthermore that the explicit extension of a standard is a necessarily slow and conservative process. This implies that the MathML specification can never explicitly cover all the presentational forms used by every sub-community of authors and readers of mathematics, much less encode all mathematical content and its semantics.

In order to facilitate the use of MathML by the widest possible audience, and to enable its smooth evolution to encompass more notational forms and more mathematical content (perhaps eventually covered by explicit extensions to the standard), the set of tags and attributes is open-ended, in the sense described in this section.

#### 2.4.2.1 OpenMath

An important mechanism for extending MathML beyond what is specified in version 3.0, as will be necessary, comes from the Math WG's collaboration with the OpenMath Society. The Content Markup aspect of MathML is now specified using Content Dictionaries, as demonstrated in Chapter 4 of this document and specified in Chapter 8. Thus, in addition to the extensibility built into the `semantics` and `annotation` elements, it is now possible to define a new content dictionary in the format adopted both in this specification and by the OpenMath Society.

### 2.4.4 XML Extensions to MathML

MathML is described by an XML DTD, which necessarily limits the elements and attributes to those occurring in the DTD. Renderers desiring to accept non-standard elements or attributes, and authors desiring to include these in documents, should accept or produce documents that conform to an appropriately extended XML DTD that has the standard MathML DTD as a subset.

MathML renderers are allowed, but not required, to accept non-standard elements and attributes, and to render them in any way. If a renderer does not accept some or all non-standard tags, it is encouraged either to handle them as errors as described above for elements with the wrong number of arguments, or to render their arguments as if they were arguments to an `mrow`, in either case rendering all standard parts of the input in the normal way.
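The second fallback is to render the arguments of an unrecognized element as if they were the arguments of an `mrow`. In effect, this amounts to renaming the element. The Python sketch below assumes a hypothetical renderer whose supported vocabulary is the small, deliberately incomplete set `SUPPORTED`. Any other element, including elements from other namespaces, is turned into an `mrow`. Its attributes are dropped and its children are kept.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical, deliberately incomplete vocabulary of a simple renderer.
SUPPORTED = {"{%s}%s" % (NS, n)
             for n in ("math", "mrow", "mi", "mn", "mo", "mfrac", "msup", "msqrt")}


def degrade_unknown(elem):
    """Rename unsupported descendants to mrow in place, keeping their children."""
    for child in elem:
        if child.tag not in SUPPORTED:
            child.tag = "{%s}mrow" % NS
            child.attrib.clear()  # an mrow would not understand the old attributes
        degrade_unknown(child)
```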

## 2.5 Embedding MathML in other Documents

While MathML can be used in isolation as a language for exchanging mathematical expressions between MathML-aware applications, the primary anticipated use of MathML is to encode mathematical expressions within larger documents. MathML is ideal for embedding math expressions in other applications of XML.

In particular, the focus here is on the mechanics of embedding MathML in [XHTML]. XHTML is a W3C Recommendation formulating a family of current and future XML-based document types and modules that reproduce, subset, and extend HTML. While [HTML4] is the dominant language of the Web at the time of this writing, one may anticipate a shift from HTML to XHTML. Indeed, XHTML can already be made to render properly in most HTML user agents.

Since MathML and XHTML share a common XML framework, namespaces provide a standard mechanism for embedding MathML in XHTML. While some popular user agents also support inclusion of MathML directly in HTML as "XML data islands," this is a transitional strategy. Consult the documentation of a particular user agent for specific information on its support for embedding XML in HTML.

### 2.5.1 MathML and Namespaces

Embedding MathML in XML-based documents in general, and XHTML in particular, is a matter of managing namespaces. See the W3C Recommendation "Namespaces in XML" [Namespaces] for full details.

An XML namespace is a collection of names identified by a URI. The URI for the MathML namespace is:

```
http://www.w3.org/1998/Math/MathML
```

Using namespaces, embedding a MathML expression in a larger XML document is merely a matter of identifying the MathML markup as residing in the MathML namespace. This can be accomplished by either explicitly identifying each MathML element name by attaching a namespace prefix, or by declaring a default namespace on an enclosing element.

To declare a namespace, one uses an `xmlns` attribute, or an attribute with an `xmlns` prefix. When the `xmlns` attribute is used alone, it sets the default namespace for the element on which it appears, and for any child elements.

Example:

```
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>...</mrow>
</math>
```

When the `xmlns` attribute is used as a prefix, it declares a prefix which can then be used to explicitly associate other elements and attributes with a particular namespace.

Example:

```
<body xmlns:m="http://www.w3.org/1998/Math/MathML">
...
<m:math><m:mrow>...</m:mrow></m:math>
...
</body>
```

These two methods of namespace declaration can be used together. For example, by using both an explicit document-wide namespace prefix, and default namespace declarations on individual mathematical elements, it is possible to localize namespace related markup to the top-level `math` element.

Example:

```
<body xmlns:m="http://www.w3.org/1998/Math/MathML">
...
<m:math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>...</mrow>
</m:math>
...
</body>
```
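Whichever declaration style is used, a namespace-aware parser resolves every MathML element to the same expanded name: the namespace URI plus the local name. Processing code therefore need not care which prefix an author chose. The Python sketch below checks this for the three declaration styles described in this section, with `...` replaced by a small `mi` placeholder. `math_tags` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"

default_ns = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
              '<mrow><mi>x</mi></mrow></math>')
prefixed = ('<body xmlns:m="http://www.w3.org/1998/Math/MathML">'
            '<m:math><m:mrow><m:mi>x</m:mi></m:mrow></m:math></body>')
mixed = ('<body xmlns:m="http://www.w3.org/1998/Math/MathML">'
         '<m:math xmlns="http://www.w3.org/1998/Math/MathML">'
         '<mrow><mi>x</mi></mrow></m:math></body>')


def math_tags(source):
    """Expanded names of the math element and its descendants, in order."""
    root = ET.fromstring(source)
    math = root if root.tag == MATHML + "math" else root.find(MATHML + "math")
    return [elem.tag for elem in math.iter()]


print(math_tags(default_ns) == math_tags(prefixed) == math_tags(mixed))  # True
print(math_tags(prefixed)[0])  # {http://www.w3.org/1998/Math/MathML}math
```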

#### 2.5.1.1 Document Validation Issues

The use of namespace prefixes creates an issue for DTD validation of documents embedding MathML. DTD validation requires knowing the literal (possibly prefixed) element names used in the document. However, the Namespaces in XML Recommendation [Namespaces] allows the prefix to be changed at arbitrary points in the document, since namespace prefixes may be declared on any element.

The 'historical' method of bridging this gap was to write a DTD with a fixed prefix, or in the case of XHTML and MathML, with no prefix, and mandate that the specified form must be used throughout the document. However, this is somewhat restrictive for a modular DTD that is intended for use in conjunction with another DTD, which is exactly the situation with MathML in XHTML. In essence, the MathML DTD would have to allocate a prefix for itself and hope no other module uses the same prefix to avoid name clashes, thus losing one of the main benefits of XML namespaces.

One strategy for addressing this problem is to make every element name in the DTD be accessed by an entity reference. This means that by declaring a couple of entities to specify the prefix before the DTD is loaded, the prefix can be chosen by a document author, and compound DTDs that include several modules can, without changing the module DTDs, specify unique prefixes for each module to avoid clashes. The MathML DTD has been designed in this fashion. See Section A.3 Using the MathML DTD and [Modularization] for details.

An extra issue arises in the case where explicit prefixes are used on the top-level `math` element, but a default namespace is used for other MathML elements. In this case, one wants the MathML module to be included into XHTML with the prefix set to empty. However, the 'driver' DTD file that sets up the inclusion of the MathML module would then need to define a new element called m:math. This would allow the top-level `math` element to use an explicit prefix, for attaching rendering behaviors in current browsers, while the contents would not need an explicit prefix, for ease of interoperability between authoring tools, etc.

#### 2.5.1.2 Compatibility Suggestions

While the use of namespaces to embed MathML in other XML applications is completely described by the relevant W3C Recommendations, a certain degree of pragmatism is still called for at present. Support for XML, namespaces and rendering behaviors in popular user agents is not always fully in alignment with W3C Recommendations. In some cases, the software predates the relevant standards, and in other cases, the relevant standards are not yet complete.

During the transitional period, in which some software may not be fully namespace-aware, a few conventional practices will ease compatibility problems:

1. When using namespace prefixes with MathML markup, use m: as a conventional prefix for the MathML namespace. Using an explicit prefix is probably safer for compatibility in current user agents.

2. When using namespace prefixes, pick one and use it consistently within a document.

3. Explicitly declare the MathML namespace on all `math` elements.

Examples:

```
<body>
...
<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
<m:mrow>...</m:mrow>
</m:math>
...
</body>
```

Or

```
<body>
...
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>...</mrow>
</math>
...
</body>
```

Note that these suggestions alone may not be sufficient for creating functional Web pages containing MathML markup. It will generally be the case that some additional document-wide markup will be required. Additional work may also be required to make all MathML instances in a document compatible with document-wide declarations. This is particularly true when documents are created by cutting and pasting MathML expressions, since current tools will probably not be able to query global namespace information.
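Suggestion 3 can be checked mechanically. ElementTree resolves namespaces during parsing, so the declarations themselves are visible only as `start-ns` events. The parser reports these just before the `start` event of the element that carries them. The Python sketch below uses `xml.etree.ElementTree.XMLPullParser` to list the `math` elements that rely on a declaration inherited from an ancestor instead of declaring the MathML namespace themselves. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_URI = "http://www.w3.org/1998/Math/MathML"


def undeclared_math_elements(source):
    """Return the document-order indices of math elements that do not
    declare the MathML namespace themselves (default or prefixed)."""
    parser = ET.XMLPullParser(events=("start-ns", "start"))
    parser.feed(source)
    parser.close()  # events not yet read remain available after close()

    declared_here = []  # URIs declared on the element about to start
    missing = []
    math_index = 0
    for event, data in parser.read_events():
        if event == "start-ns":
            _, uri = data  # data is a (prefix, uri) pair
            declared_here.append(uri)
        else:  # "start": data is the element that was just opened
            if data.tag == "{%s}math" % MATHML_URI:
                if MATHML_URI not in declared_here:
                    missing.append(math_index)
                math_index += 1
            declared_here = []
    return missing
```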

Consult the W3C Math Working Group home page for compatibility and implementation suggestions for current browsers and other MathML-aware tools.

### 2.5.2 The Top-Level `math` Element

MathML specifies a single top-level or root `math` element, which encapsulates each instance of MathML markup within a document. All other MathML content must be contained in a `math` element; equivalently, every valid, complete MathML expression must be contained in `<math>` tags. The `math` element must always be the outermost element in a MathML expression; it is an error for one `math` element to contain another.

Applications that return sub-expressions of other MathML expressions, for example, as the result of a cut-and-paste operation, should always wrap them in `<math>` tags. Ideally, the presence of enclosing `<math>` tags should be a very good heuristic test for MathML material. Similarly, applications which insert MathML expressions in other MathML expressions must take care to remove the `<math>` tags from the inner expressions.

The `math` element can contain an arbitrary number of children schemata. The children schemata render by default as if they were contained in an `mrow` element.
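The cut-and-paste rules can be sketched as two small tree operations: wrapping on copy and unwrapping on paste. A third rule also applies: the children of `math` render as if inside an `mrow`. In this Python sketch (function names hypothetical), a copied subexpression is wrapped in a fresh `math` element. On pasting, the inner `math` is removed. A single child is inserted directly, while several children are grouped in an `mrow` so that they keep rendering as one unit.

```python
import copy
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1998/Math/MathML"
MATH, MROW = "{%s}math" % NS, "{%s}mrow" % NS


def copy_out(sub):
    """'Copy' a subexpression: return it wrapped in its own math element."""
    clone = copy.deepcopy(sub)
    clone.tail = None  # surrounding text belongs to the source document
    wrapper = ET.Element(MATH)
    wrapper.append(clone)
    return wrapper


def paste_in(target, index, fragment):
    """'Paste' a math fragment into target at index, dropping the inner math."""
    if fragment.tag != MATH:
        raise ValueError("expected a complete MathML expression")
    children = [copy.deepcopy(child) for child in fragment]
    if len(children) == 1:
        node = children[0]
    else:
        # Several children of math render like an mrow, so group them in one.
        node = ET.Element(MROW)
        node.extend(children)
    target.insert(index, node)
    return node
```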

The attributes of the `math` element are:

class, id, style

Provided for use with stylesheets.

xref

Provided along with `xml:id` for use in parallel markup (Section 5.4 Parallel Markup).

macros

This attribute provides a way of pointing to external macro definition files. Macros are not part of the MathML specification, and much of the functionality provided by macros in MathML can be accommodated by XSL transformations [XSLT]. However, the `macros` attribute is provided to make possible future development of more streamlined, MathML-specific macro mechanisms. The value of this attribute is a sequence of URLs or URIs, separated by whitespace.

mode

The `mode` attribute specifies whether the enclosed MathML expression should be rendered in a display style or an in-line style. Allowed values are "display" and "inline" (default). This attribute is deprecated in favor of the new `display` attribute, or the CSS2 'display' property with the analogous block and inline values.

display

The `display` attribute replaces the deprecated `mode` attribute. It specifies whether the enclosed MathML expression should be rendered in a display style or an in-line style. Allowed values are "block" and "inline" (default).

dir

The `dir` attribute specifies the overall directionality of layout. Allowed values are "ltr" (default) or "rtl". This attribute, in addition to the directionality of the text content of token elements, is used for presentation of mathematics in Right-to-Left scripts. See Section 3.1.5 Directionality for further discussion.
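Since `mode` is deprecated in favor of `display`, a tool upgrading older documents can rewrite the attribute mechanically. The Python sketch below assumes the correspondence stated above: `mode="display"` becomes `display="block"`, and `mode="inline"` becomes `display="inline"`. It keeps any `display` attribute already present and treats unrecognized `mode` values as the default, inline. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MATH = "{http://www.w3.org/1998/Math/MathML}math"

# Deprecated mode value -> equivalent display value.
MODE_TO_DISPLAY = {"display": "block", "inline": "inline"}


def upgrade_mode(root):
    """Replace mode with display on every math element in the tree, in place."""
    for math in root.iter(MATH):
        mode = math.attrib.pop("mode", None)
        if mode is None:
            continue
        # An explicit display attribute already present takes precedence;
        # unrecognized mode values fall back to the default, inline.
        math.attrib.setdefault("display", MODE_TO_DISPLAY.get(mode, "inline"))
```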

The attributes of the `math` element affect the entire enclosed expression. They are, in a sense, "inward looking". However, to render MathML properly in a browser, and to integrate it properly into an XHTML document, a second collection of "outward looking" attributes is also useful.

While general mechanisms for attaching rendering behaviors to elements in XML documents are under development, wide variations in strategy and level of implementation remain between various existing user agents. Consequently, the remainder of this section describes attributes and functionality that are desirable for integrating third-party rendering modules with user agents:

linebreak (default)

The expression will be broken across several lines. The line breaking algorithm is not specified, although one is suggested. All automatic linebreaking algorithms should make use of the attributes and values that are related to linebreaking and indentation following a linebreak.

maxwidth

This attribute specifies the maximum width to be used for linebreaking. The value of this attribute is an `h-unit`. If a percentage is used, it is a percentage of the maximum width available in the surrounding environment. If that value cannot be determined, the renderer should assume an infinite rendering width.

overflow

In cases where size negotiation is not possible or fails (for example in the case of an expression that is too long to fit in the allowed width), this attribute is provided to suggest a processing method to the renderer. Allowed values are:

linebreak

(Default) The expression will be broken across several lines. The line breaking algorithm is not specified, but it is recommended that line breaking should try to keep meaningful subexpressions together and indent lines in a manner that aids in understanding the expression.

scroll

The window provides a viewport into the larger complete display of the mathematical expression. Horizontal or vertical scrollbars are added to the window as necessary to allow the viewport to be moved to a different position.

elide

The display is abbreviated by removing enough of it so that the remainder fits into the window. For example, a large polynomial might have the first and last terms displayed with "+ ... +" between them. Advanced renderers may provide a facility to zoom in on elided areas.

truncate

The display is abbreviated by simply truncating it at the right and bottom borders. It is recommended that some indication of truncation be given to the viewer.

scale

The fonts used to display the mathematical expression are chosen so that the full expression fits in the window. Note that this only happens if the expression is too large. In the case of a window larger than necessary, the expression is shown at its normal size within the larger window.

altimg

This attribute provides a graceful fall-back for browsers that do not support embedded elements. The value of the attribute is a URL.

alttext

This attribute provides a graceful fall-back for browsers that do not support embedded elements or images. The value of the attribute is a text string.

altimg-width

This attribute provides a width for the `altimg` (if any). The value of this attribute is an `h-unit`. This value is useful for high resolution images which, if displayed at their full resolution, would be too large. If neither `altimg-width` nor `altimg-height` is given, renderers that use an image should use the image's natural size. If only the width is given, the renderer should scale the height so as to preserve the aspect ratio of the image.

altimg-height

This attribute provides a total height for the `altimg` (if any). The value of this attribute is a `v-unit`. This value is useful for high resolution images which, if displayed at their full resolution, would be too large. If neither `altimg-width` nor `altimg-height` is given, renderers that use an image should use the image's natural size. If only the height is given, the renderer should scale the width so as to preserve the aspect ratio of the image.

altimg-valign

By default, the bottom of the image aligns to the current baseline. The `altimg-valign` attribute specifies the alignment point within the image. The value of this attribute is a `v-unit`. A positive value of `altimg-valign` shifts the bottom of the image below the current baseline, while a negative value raises it above the baseline.
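Two of the sizing rules above are simple enough to state as code. The first resolves a percentage `maxwidth` against the available width, falling back to an infinite width when that is unknown. The second scales `altimg` to a requested width or height while preserving its aspect ratio. This Python sketch assumes all lengths have already been converted to pixels. It handles only `%` and `px` values for `maxwidth`, although real `h-unit` values allow more units. The function names are hypothetical.

```python
import math


def resolve_maxwidth(value, available=None):
    """Resolve a maxwidth value to pixels (only '%' and 'px' in this sketch)."""
    if value.endswith("%"):
        if available is None:
            return math.inf  # surrounding width unknown: assume infinite width
        return available * float(value[:-1]) / 100
    if value.endswith("px"):
        return float(value[:-2])
    raise ValueError("unit not handled in this sketch: %r" % value)


def altimg_size(natural_w, natural_h, width=None, height=None):
    """Display size of an altimg given optional altimg-width/altimg-height."""
    if width is None and height is None:
        return natural_w, natural_h  # natural size
    if height is None:
        return width, natural_h * width / natural_w  # keep the aspect ratio
    if width is None:
        return natural_w * height / natural_h, height  # keep the aspect ratio
    return width, height  # both given: use them as they are
```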

Issue linebreak-control: Should there be a way to specify some sort of control over how line breaks are chosen (e.g., before or after an infix operator, or if the infix operator is duplicated)?

Issue indent-control: Should there be a way to specify some sort of indenting style?