2 MathML Fundamentals

Overview: Mathematical Markup Language (MathML) Version 3.0
Previous: 1 Introduction
Next: 3 Presentation Markup

2 MathML Fundamentals
    2.1 MathML Syntax and Grammar
        2.1.1 General Considerations
        2.1.2 MathML and Namespaces
        2.1.3 Children versus Arguments
        2.1.4 MathML and Rendering
        2.1.5 MathML Attribute Values
            2.1.5.1 Syntax notation used in the MathML specification
            2.1.5.2 Length Valued Attributes
            2.1.5.3 Color Valued Attributes
            2.1.5.4 Default values of attributes
        2.1.6 Attributes Shared by all MathML Elements
        2.1.7 Collapsing Whitespace in Input
    2.2 The Top-Level math Element
        2.2.1 Attributes
        2.2.2 Deprecated Attributes
    2.3 Conformance
        2.3.1 MathML Conformance
            2.3.1.1 MathML Test Suite and Validator
            2.3.1.2 Deprecated MathML 1.x and MathML 2.x Features
            2.3.1.3 MathML Extension Mechanisms and Conformance
        2.3.2 Handling of Errors
        2.3.3 Attributes for unspecified data

2.1 MathML Syntax and Grammar

2.1.1 General Considerations

MathML is an application of [XML], Extensible Markup Language, and as such it is governed by the rules of XML syntax. XML syntax is a notation for rooted labeled planar trees. Planarity means that the children of a node may be viewed as given a natural order and MathML depends on this order.

The basic ‘syntax’ of MathML is thus defined by XML. Upon this, we layer a ‘grammar’, being the rules for allowed elements, the order in which they can appear, and how they may be contained within each other, as well as additional syntactic rules for the values of attributes. These rules are defined by this specification, and formalized by a RelaxNG schema [RELAX-NG]. The RelaxNG Schema is normative, but a DTD (Document Type Definition) and an XML Schema [XMLSchemas] are provided for continuity (they were normative for MathML2). See Appendix A Parsing MathML.

As an XML vocabulary, MathML's character set must consist of legal characters as specified by Unicode [Unicode]. The use of Unicode characters for mathematics is discussed in Chapter 7 Characters, Entities and Fonts.

The following sections discuss the general aspects of the MathML grammar as well as describe the syntaxes used for attribute values.

2.1.2 MathML and Namespaces

An XML namespace [Namespaces] is a collection of names identified by a URI. The URI for the MathML namespace is:

http://www.w3.org/1998/Math/MathML

To declare a namespace, one uses an xmlns attribute, or an attribute with an xmlns prefix. When the xmlns attribute is used alone, it sets the default namespace for the element on which it appears, and for any child elements. For example:

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>...</mrow>
</math>

When the xmlns attribute is used as a prefix, it declares a prefix which can then be used to explicitly associate other elements and attributes with a particular namespace. When embedding MathML within XHTML, one might use:

<body xmlns:m="http://www.w3.org/1998/Math/MathML">
...
<m:math><m:mrow>...</m:mrow></m:math>
...
</body>

2.1.3 Children versus Arguments

Most MathML elements act as ‘containers’; such an element's children are not distinguished from each other except as individual members of the list of children. Commonly there is no limit imposed on the number of children an element may have. This is the case for most presentation elements and some content elements such as set. But many MathML elements require a specific number of children, or attach a particular meaning to children in certain positions. Such elements are best considered to represent constructors of mathematical objects, and hence thought of as functions of their children. Therefore children of such a MathML element will often be referred to as its arguments instead of merely as children. Examples of this can be found, say, in Section 3.1.3 Required Arguments.

There are presentation elements that conceptually accept only a single argument, but which for convenience have been written to accept any number of children; then we infer an mrow containing those children which acts as the argument to the element in question; see Section 3.1.3.1 Inferred <mrow>s.

In the detailed discussions of element syntax given with each element throughout the MathML specification, the correspondence of children with arguments, the number of arguments required and their order, as well as other constraints on the content, are specified. This information is also tabulated for the presentation elements in Section 3.1.3 Required Arguments.

2.1.4 MathML and Rendering

MathML presentation elements only recommend (i.e., do not require) specific ways of rendering; this is in order to allow for medium-dependent rendering and for individual preferences of style.

Nevertheless, some parts of this specification describe these recommended visual rendering rules in detail; in those descriptions it is often assumed that the model of rendering used supports the concepts of a well-defined 'current rendering environment' which, in particular, specifies a 'current font', a 'current display' (for pixel size) and a 'current baseline'. The 'current font' provides certain metric properties and an encoding of glyphs.

2.1.5 MathML Attribute Values

MathML elements take attributes with values that further specialize the meaning or effect of the element. Attribute names are shown in a monospaced font throughout this document. The meanings of attributes and their allowed values are described within the specification of each element. The syntax notation explained in this section is used in specifying allowed values.

Except when explicitly forbidden by the specification for an attribute, MathML attribute values may contain any legal characters specified by the XML recommendation. See Chapter 7 Characters, Entities and Fonts for further clarification.

2.1.5.1 Syntax notation used in the MathML specification

To describe the MathML-specific syntax of attribute values, the following conventions and notations are used for most attributes in the present document. We use below the notation beginning with U+ that is recommended by Unicode for referring to Unicode characters [see [Unicode], page xxviii].

Notation What it matches
decimal-digit a decimal digit from the range U+0030 to U+0039
hexadecimal-digit a hexadecimal (base 16) digit from the ranges U+0030 to U+0039, U+0041 to U+0046 and U+0061 to U+0066
unsigned-integer a string of decimal-digits, representing a non-negative integer
positive-integer a string of decimal-digits, but not consisting solely of "0"s (U+0030), representing a positive integer
integer an optional "-" (U+002D), followed by a string of decimal digits, and representing an integer
unsigned-number a string of decimal digits with up to one decimal point (U+002E), representing a non-negative terminating decimal number (a type of rational number)
number an optional prefix of "-" (U+002D), followed by an unsigned number, representing a terminating decimal number (a type of rational number)
character a single non-whitespace character
string an arbitrary, nonempty and finite, string of characters
length a length, as explained below, Section 2.1.5.2 Length Valued Attributes
unit a unit, typically used as part of a length, as explained below, Section 2.1.5.2 Length Valued Attributes
namedlength a named length, as explained below, Section 2.1.5.2 Length Valued Attributes
color a color, as explained below, Section 2.1.5.3 Color Valued Attributes
id an identifier, unique within the document; must satisfy the NAME syntax of the XML recommendation [XML]
idref an identifier referring to another element within the document; must satisfy the NAME syntax of the XML recommendation [XML]
URI a Uniform Resource Identifier [RFC3986]. Note that the attribute value is typed in the schema as anyURI which allows any sequence of XML characters. Systems needing to use this string as a URI must encode the bytes of the UTF-8 encoding of any characters not allowed in URI using %HH encoding where HH are the byte value in hexadecimal. This ensures that such an attribute value may be interpreted as an IRI, or more generally a LEIRI, see [IRI].
italicized word values as explained in the text for each attribute; see Section 2.1.5.4 Default values of attributes
"literal" quoted symbol, literally present in the attribute value (e.g. "+" or '+')

The ‘types’ described above, except for string, may be combined into composite patterns using the following operators. The whole attribute value must be delimited by single (') or double (") quotation marks in the marked up document. Note that double quotation marks are often used in this specification to mark up literal expressions; an example is the "-" in line 5 of the table above.

In the table below a form f means an instance of a type described in the table above. The combining operators are shown in order of precedence from highest to lowest:

Notation What it matches
( f ) same as f
f? an optional instance of f
f * zero or more instances of f, with separating whitespace characters
f + one or more instances of f, with separating whitespace characters
f1 f2 ... fn one instance of each form fi, in sequence, with no separating whitespace
f1, f2, ..., fn one instance of each form fi, in sequence, with separating whitespace characters (but no commas)
f1 | f2 | ... | fn any one of the specified forms fi

The notation we have chosen here is in the style of the syntactical notation of the RelaxNG used for MathML's basic schema, Appendix A Parsing MathML.

Since some applications are inconsistent about normalization of whitespace, for maximum interoperability it is advisable to use only a single whitespace character for separating parts of a value. Moreover, leading and trailing whitespace in attribute values should be avoided.

For most numerical attributes, only those in a subset of the expressible values are sensible; values outside this subset are not errors, unless otherwise specified, but rather are rounded up or down (at the discretion of the renderer) to the closest value within the allowed subset. The set of allowed values may depend on the renderer, and is not specified by MathML.

If a numerical value within an attribute value syntax description is declared to allow a minus sign ('-'), e.g., number or integer, it is not a syntax error when one is provided in cases where a negative value is not sensible. Instead, the value should be handled by the processing application as described in the preceding paragraph. An explicit plus sign ('+') is not allowed as part of a numerical value except when it is specifically listed in the syntax (as a quoted '+' or "+"), and its presence can change the meaning of the attribute value (as documented with each attribute which permits it).

2.1.5.2 Length Valued Attributes

Most presentation elements have attributes that accept values representing lengths to be used for size, spacing or similar properties. The syntax of a length is specified as

Type Syntax
length number | number unit | namedspace

There should be no space between the number and the unit of a length.

The possible units and namedspaces, along with their interpretations, are shown below. Note that although the units and their meanings are taken from CSS, the syntax of lengths is not identical. A few MathML elements have length attributes that accept additional keywords; these are termed pseudo-units and specified in the description of those particular elements; see, for instance, Section 3.3.6 Adjust Space Around Content <mpadded>.

A trailing "%" represents a percent of the default value. The default value, or how it is obtained, is listed in the table of attributes for each element. (See also Section 2.1.5.4 Default values of attributes.) A number without a unit is intepreted as a multiple of the default value. This form is primarily for backward compatibility and should be avoided, prefering explicit units for clarity.

In some cases, the range of acceptable values for a particular attribute may be restricted; implementations are free to round up or down to the closest allowable value.

The possible units in MathML are:

Unit Description
em an em (font-relative unit traditionally used for horizontal lengths)
ex an ex (font-relative unit traditionally used for vertical lengths)
px pixels, or size of a pixel in the current display
in inches (1 inch = 2.54 centimeters)
cm centimeters
mm millimeters
pt points (1 point = 1/72 inch)
pc picas (1 pica = 12 points)
% percentage of the default value

Some additional aspects of units are discussed further below, in Section 2.1.5.2.1 Additional notes about units.

The following constants, namedspaces, may also be used where a length is needed; they are typically used for spacing or padding between tokens. Recommended default values for these constants are shown; the actual spacing used is implementation specific.

namedspace Recommended default
veryverythinmathspace 1/18em
verythinmathspace 2/18em
thinmathspace 3/18em
mediummathspace 4/18em
thickmathspace 5/18em
verythickmathspace 6/18em
veryverythickmathspace 7/18em
negativeveryverythinmathspace -1/18em
negativeverythinmathspace -2/18em
negativethinmathspace -3/18em
negativemediummathspace -4/18em
negativethickmathspace -5/18em
negativeverythickmathspace -6/18em
negativeveryverythickmathspace -7/18em
2.1.5.2.1 Additional notes about units

Lengths are only used in MathML for presentation, and presentation will ultimately involve rendering in or on some medium. For visual media, the display context is assumed to have certain properties available to the rendering agent. A px corresponds to a pixel on the display, to the extent that is meaningful. The resolution of the display device will affect the correspondence of pixels to the units in, cm, mm, pt and pc.

Moreover, the display context will also provide a default for the font size; the parameters of this font determine the initial values used to interpret the units em and ex, and thus indirectly the sizes of namedspaces. Since these units track the display context, and in particular, the user's preferences for display, the relative units em and ex are generally to be preferred over absolute units such as px or cm.

Two additional aspects of relative units must be clarified, however. First, some elements such as Section 3.4 Script and Limit Schemata or mfrac, implicitly switch to smaller font sizes for some of their arguments. Similarly, mstyle can be used to explicitly change the current font size. In such cases, the effective values of an em or ex inside those contexts will be different than outside. The second point is that the effective value of an em or ex used for an attribute value can be affected by changes to the current font size. Thus, attributes that affect the current font size, such as mathsize and scriptlevel, must be processed before evaluating other length valued attributes.

If, and how, lengths might affect non-visual media is implementation specific.

2.1.5.3 Color Valued Attributes

The color, or background color, of presentation elements may be specified as a color using the following syntax:

Type Syntax
color #RGB | #RRGGBB | html-color-name

A color is specified either by "#" followed by hexadecimal values for the red, green, and blue components, with no intervening whitespace, or by an html-color-name. The color components can be either 1-digit or 2-digit, but must all have the same number of digits; the component ranges from 0 (component not present) to FF (component fully present). Note that, for example, by the digit-doubling rule specified under Colors in [CSS21] #123 is a short form for #112233.

Color values can also be specified as an html-color-name, one of the color-name keywords defined in [HTML4] ("aqua", "black", "blue", "fuchsia", "gray", "green", "lime", "maroon", "navy", "olive", "purple", "red", "silver", "teal", "white", and "yellow"). Note that the color name keywords are not case-sensitive, unlike most keywords in MathML attribute values, for compatibility with CSS and HTML.

When a color is applied to an element, it is the color in which the content of tokens is rendered. Additionally, when inherited from a surrounding element or from the environment in which the complete MathML expression is embedded, it controls the color of all other drawing due to MathML elements, including the lines or radical signs that can be drawn in rendering mfrac, mtable, or msqrt.

When used to specify a background color, the keyword "transparent" is also allowed. The recommended MathML visual rendering rules do not define the precise extent of the region whose background is affected by using the background attribute on an element, except that, when the element's content does not have negative dimensions and its drawing region is not overlapped by other drawing due to surrounding negative spacing, this region should lie behind all the drawing done to render the content of the element, but should not lie behind any of the drawing done to render surrounding expressions. The effect of overlap of drawing regions caused by negative spacing on the extent of the region affected by the background attribute is not defined by these rules.

2.1.5.4 Default values of attributes

Default values for MathML attributes are, in general, given along with the detailed descriptions of specific elements in the text. Default values shown in plain text in the tables of attributes for an element are literal, but when italicized are descriptions of how default values can be computed.

Default values described as inherited are taken from the rendering environment, as described in Section 3.3.4 Style Change <mstyle>, or in some cases (which are described individually) taken from the values of other attributes of surrounding elements, or from certain parts of those values. The value used will always be one which could have been specified explicitly, had it been known; it will never depend on the content or attributes of the same element, only on its environment. (What it means when used may, however, depend on those attributes or the content.)

Default values described as automatic should be computed by a MathML renderer in a way which will produce a high-quality rendering; how to do this is not usually specified by the MathML specification. The value computed will always be one which could have been specified explicitly, had it been known, but it will usually depend on the element content and possibly on the context in which the element is rendered.

Other italicized descriptions of default values which appear in the tables of attributes are explained individually for each attribute.

The single or double quotes which are required around attribute values in an XML start tag are not shown in the tables of attribute value syntax for each element, but are around attribute values in examples in the text, so that the pieces of code shown are correct.

Note that, in general, there is no mechanism in MathML to simulate the effect of not specifying attributes which are inherited or automatic. Giving the words "inherited" or "automatic" explicitly will not work, and is not generally allowed. Furthermore, the mstyle element (Section 3.3.4 Style Change <mstyle>) can even be used to change the default values of presentation attributes for its children.

Note also that these defaults describe the behavior of MathML applications when an attribute is not supplied; they do not indicate a value that will be filled in by an XML parser, as is sometimes mandated by DTD-based specifications.

2.1.6 Attributes Shared by all MathML Elements

In addition to the attributes described specifically for each element, the attributes in the following table are allowed on every MathML element. Also allowed are attributes from the xml namespace, such as xml:lang, and attributes from namespaces other than MathML, which are ignored by default.

Name values default
id id none
Establishes a unique identifier associated with the element to support linking, cross-references and parallel markup. See xref and Section 5.4 Parallel Markup.
xref idref none
References another element within the document. See id and Section 5.4 Parallel Markup.
class string none
Associates the element with a set of style classes for use with [XSLT] and [CSS21]. Typically this would be a space separated sequence of words, but this is not specified by MathML. See Section 6.5 Using CSS with MathML for discussion of the interaction of MathML and CSS.
style string none
Associates style information with the element for use with [XSLT] and [CSS21]. This typically would be an inline CSS style, but this is not specified by MathML. See Section 6.5 Using CSS with MathML for discussion of the interaction of MathML and CSS.
href URI none
Can be used to establish the element as a hyperlink to the specfied URI.

Note that MathML 2 had no direct support for linking, and instead followed the W3C Recommendation "XML Linking Language" [XLink] in defining links using the xlink:href attribute. This has changed, and MathML 3 now uses an href attribute. However, particular compound document formats may specify the use of XML linking with MathML elements, so user agents that support XML linking should continue to support the use of the xlink:href attribute with MathML 3 as well.

See also Section 3.2.2 Mathematics style attributes common to token elements for a list of MathML attributes which can be used on most presentation token elements.

The attribute other, is deprecated (Section 2.3.3 Attributes for unspecified data) in favor of the use of attributes from other namespaces.

Name values default
other string none
DEPRECATED but in MathML 1.0.

2.1.7 Collapsing Whitespace in Input

In MathML, as in XML, "whitespace" means simple spaces, tabs, newlines, or carriage returns, i.e., characters with hexadecimal Unicode codes U+0020, U+0009, U+000A, or U+000D, respectively; see also the discussion of whitespace in Section 2.3 of [XML].

MathML ignores whitespace occurring outside token elements. Non-whitespace characters are not allowed there. Whitespace occurring within the content of token elements , except for <cs>, is normalized as follows. All whitespace at the beginning and end of the content is removed, and whitespace internal to content of the element is collapsed canonically, i.e., each sequence of 1 or more whitespace characters is replaced with one space character (U+0020, sometimes called a blank character).

For example, <mo> ( </mo> is equivalent to <mo>(</mo>, and

<mtext>
  Theorem
  1:
</mtext>

is equivalent to <mtext>Theorem 1:</mtext> or <mtext>Theorem&#x20;1:</mtext>.

Authors wishing to encode white space characters at the start or end of the content of a token, or in sequences other than a single space, without having them ignored, must use &nbsp; (U+00A0) or other non-marking characters that are not trimmed. For example, compare the above use of an mtext element with

<mtext>
&#xA0;<!--NO-BREAK SPACE-->Theorem &#xA0;<!--NO-BREAK SPACE-->1: 
</mtext> 

When the first example is rendered, there is nothing before "Theorem", one Unicode space character between "Theorem" and "1:", and nothing after "1:". In the second example, a single space character is to be rendered before "Theorem"; two spaces, one a Unicode space character and one a Unicode no-break space character, are to be rendered before "1:"; and there is nothing after the "1:".

Note that the value of the xml:space attribute is not relevant in this situation since XML processors pass whitespace in tokens to a MathML processor; it is the requirements of MathML processing which specify that whitespace is trimmed and collapsed.

For whitespace occurring outside the content of the token elements mi, mn, mo, ms, mtext, ci, cn, cs, csymbol and annotation, an mspace element should be used, as opposed to an mtext element containing only whitespace entities.

2.2 The Top-Level math Element

MathML specifies a single top-level or root math element, which encapsulates each instance of MathML markup within a document. All other MathML content must be contained in a math element; in other words, every valid MathML expression is wrapped in outer <math> tags. The math element must always be the outermost element in a MathML expression; it is an error for one math element to contain another. These considerations also apply when sub-expressions are passed between applications, such as for cut-and-paste operations; See Section 6.3 Transferring MathML.

The math element can contain an arbitrary number of child elements. They render by default as if they were contained in an mrow element.

2.2.1 Attributes

The math element accepts any of the attributes that can be set on Section 3.3.4 Style Change <mstyle>, including the common attributes specified in Section 2.1.6 Attributes Shared by all MathML Elements. In particular, it accepts the dir attribute for setting the overall directionality; the math element is usually the most useful place to specify the directionality (See Section 3.1.5 Directionality for further discussion). Note that the dir attribute defaults to "ltr" on the math element (but inherits on all other elements which accept the dir attribute); this provides for backward compatibility with MathML 2.0 which had no notion of directionality. Also, it accepts the mathbackground attribute in the same sense as mstyle and other presentation elements to set the background color of the bounding box, rather than specifying a default for the attribute (see Section 3.1.10 Mathematics style attributes common to presentation elements)

In addition to those attributes, the math element accepts:

Name values default
display "block" | "inline" inline
specifies whether the enclosed MathML expression should be rendered as a separate vertical block (in display style) or inline, aligned with adjacent text. When display="block", displaystyle is initialized to "true", whereas when display="inline", displaystyle is initialized to "false"; in both cases scriptlevel is initialized to 0 (See Section 3.1.6 Displaystyle and Scriptlevel). Moreover, when the math element is embedded in a larger document, a block math element should be treated as a block element as appropriate for the document type (typically as a new vertical block), whereas an inline math element should be treated as inline (typically exactly as if it were a sequence of words in normal text). In particular, this applies to spacing and linebreaking: for instance, there should not be spaces or line breaks inserted between inline math and any immediately following punctuation. When the display attribute is missing, a rendering agent is free to initialize as appropriate to the context.
maxwidth length available width
specifies the maximum width to be used for linebreaking. The default is the maximum width available in the surrounding environment. If that value cannot be determined, the renderer should assume an infinite rendering width.
overflow "linebreak" | "scroll" | "elide" | "truncate" | "scale" linebreak
specifies the preferred handing in cases where an expression is too long to fit in the allowed width. See the discussion below.
altimg URI none
provides a URI referring to an image to display as a fall-back for user agents that do not support embedded MathML.
altimg-width length width of altimg
specifies the width to display altimg, scaling the image if necessary; See altimg-height.
altimg-height length height of altimg
specifies the height to display altimg, scaling the image if necessary; if only one of the attributes altimg-width and altimg-height are given, the scaling should preserve the image's aspect ratio; if neither attribute is given, the image should be shown at its natural size.
altimg-valign length | "top" | "middle" | "bottom" 0ex
specifies the vertical alignment of the image with respect to adjacent inline material. A positive value of altimg-valign shifts the bottom of the image above the current baseline, while a negative value lowers it. The keyword "top" aligns the top of the image with the top of adjacent inline material; "center" aligns the middle of the image to the middle of adjacent material; "bottom" aligns the bottom of the image to the bottom of adjacent material (not necessarily the baseline). This attribute only has effect when display="inline". By default, the bottom of the image aligns to the baseline.
alttext string none
provides a textual alternative as a fall-back for user agents that do not support embedded MathML or images.
cdgroup URI none
specifies a CD group file that acts as a catalogue of CD bases for locating OpenMath content dictionaries of csymbol, annotation, and annotation-xml elements in this math element; see Section 4.2.3 Content Symbols <csymbol>. When no cdgroup attribute is explicitly specified, the document format embedding this math element may provide a method for determining CD bases. Otherwise the system must determine a CD base; in the absence of specific information http://www.openmath.org/cd is assumed as the CD base for all csymbol, annotation, and annotation-xml elements. This is the CD base for the collection of standard CDs maintained by the OpenMath Society.

In cases where size negotiation is not possible or fails (for example in the case of an expression that is too long to fit in the allowed width), the overflow attribute is provided to suggest a processing method to the renderer. Allowed values are:

Value Meaning
"linebreak" The expression will be broken across several lines. See Section 3.1.7 Linebreaking of Expressions for further discussion.
"scroll" The window provides a viewport into the larger complete display of the mathematical expression. Horizontal or vertical scroll bars are added to the window as necessary to allow the viewport to be moved to a different position.
"elide" The display is abbreviated by removing enough of it so that the remainder fits into the window. For example, a large polynomial might have the first and last terms displayed with "+ ... +" between them. Advanced renderers may provide a facility to zoom in on elided areas.
"truncate" The display is abbreviated by simply truncating it at the right and bottom borders. It is recommended that some indication of truncation is made to the viewer.
"scale" The fonts used to display the mathematical expression are chosen so that the full expression fits in the window. Note that this only happens if the expression is too large. In the case of a window larger than necessary, the expression is shown at its normal size within the larger window.

2.2.2 Deprecated Attributes

The following attributes of math are deprecated:

Name values default
macros URI * none
intended to provide a way of pointing to external macro definition files. Macros are not part of the MathML specification.
mode "display" | "inline" inline
specified whether the enclosed MathML expression should be rendered in a display style or an inline style. This attribute is deprecated in favor of the display attribute.

2.3 Conformance

Information nowadays is commonly generated, processed and rendered by software tools. The exponential growth of the Web is fueling the development of advanced systems for automatically searching, categorizing, and interconnecting information. In addition, there are increasing numbers of Web services, some of which offer technically based materials and activities. Thus, although MathML can be written by hand and read by humans, whether machine-aided or just with much concentration, the future of MathML is largely tied to the ability to process it with software tools.

There are many different kinds of MathML processors: editors for authoring MathML expressions, translators for converting to and from other encodings, validators for checking MathML expressions, computation engines that evaluate, manipulate, or compare MathML expressions, and rendering engines that produce visual, aural, or tactile representations of mathematical notation. What it means to support MathML varies widely between applications. For example, the issues that arise with a validating parser are very different from those for an equation editor.

This section gives guidelines that describe different types of MathML support and make clear the extent of MathML support in a given application. Developers, users, and reviewers are encouraged to use these guidelines in characterizing products. The intention behind these guidelines is to facilitate reuse by and interoperability of MathML applications by accurately setting out their capabilities in quantifiable terms.

The W3C Math Working Group maintains MathML Compliance Guidelines. Consult this document for future updates on conformance activities and resources.

2.3.1 MathML Conformance

A valid MathML expression is an XML construct determined by the MathML RelaxNG Schema together with the additional requirements given in this specification.

We shall use the phrase "a MathML processor" to mean any application that can accept or produce a valid MathML expression. A MathML processor that both accepts and produces valid MathML expressions may be able to "round-trip" MathML. Perhaps the simplest example of an application that might round-trip a MathML expression would be an editor that writes it to a new file without modifications.

Three forms of MathML conformance are specified:

  1. A MathML-input-conformant processor must accept all valid MathML expressions; it should appropriately translate all MathML expressions into application-specific form allowing native application operations to be performed.

  2. A MathML-output-conformant processor must generate valid MathML, appropriately representing all application-specific data.

  3. A MathML-round-trip-conformant processor must preserve MathML equivalence. Two MathML expressions are "equivalent" if and only if both expressions have the same interpretation (as stated by the MathML Schema and specification) under any relevant circumstances, by any MathML processor. Equivalence on an element-by-element basis is discussed elsewhere in this document.

Beyond the above definitions, the MathML specification makes no demands of individual processors. In order to guide developers, the MathML specification includes advisory material; for example, there are many recommended rendering rules throughout Chapter 3 Presentation Markup. However, in general, developers are given wide latitude to interpret what kind of MathML implementation is meaningful for their own particular application.

To clarify the difference between conformance and interpretation of what is meaningful, consider some examples:

  1. In order to be MathML-input-conformant, a validating parser needs only to accept expressions, and return "true" for expressions that are valid MathML. In particular, it need not render or interpret the MathML expressions at all.

  2. A MathML computer-algebra interface based on content markup might choose to ignore all presentation markup. Provided the interface accepts all valid MathML expressions including those containing presentation markup, it would be technically correct to characterize the application as MathML-input-conformant.

  3. An equation editor might have an internal data representation that makes it easy to export some equations as MathML but not others. If the editor exports the simple equations as valid MathML, and merely displays an error message to the effect that conversion failed for the others, it is still technically MathML-output-conformant.

2.3.1.1 MathML Test Suite and Validator

As the previous examples show, to be useful, the concept of MathML conformance frequently involves a judgment about what parts of the language are meaningfully implemented, as opposed to parts that are merely processed in a technically correct way with respect to the definitions of conformance. This requires some mechanism for giving a quantitative statement about which parts of MathML are meaningfully implemented by a given application. To this end, the W3C Math Working Group has provided a test suite.

The test suite consists of a large number of MathML expressions categorized by markup category and dominant MathML element being tested. The existence of this test suite makes it possible, for example, to characterize quantitatively the hypothetical computer algebra interface mentioned above by saying that it is a MathML-input-conformant processor which meaningfully implements MathML content markup, including all of the expressions in the content markup section of the test suite.

Developers who choose not to implement parts of the MathML specification in a meaningful way are encouraged to itemize the parts they leave out by referring to specific categories in the test suite.

For MathML-output-conformant processors, information about currently available tools to validate MathML is maintained at the W3C MathML Validator. Developers of MathML-output-conformant processors are encouraged to verify their output using this validator.

Customers of MathML applications who wish to verify claims as to which parts of the MathML specification are implemented by an application are encouraged to use the test suites as a part of their decision processes.

2.3.1.2 Deprecated MathML 1.x and MathML 2.x Features

MathML 3.0 contains a number of features of earlier MathML which are now deprecated. The following points define what it means for a feature to be deprecated, and clarify the relation between deprecated features and current MathML conformance.

  1. In order to be MathML-output-conformant, authoring tools may not generate MathML markup containing deprecated features.

  2. In order to be MathML-input-conformant, rendering and reading tools must support deprecated features if they are to be in conformance with MathML 1.x or MathML 2.x. They do not have to support deprecated features to be considered in conformance with MathML 3.0. However, all tools are encouraged to support the old forms as much as possible.

  3. In order to be MathML-round-trip-conformant, a processor need only preserve MathML equivalence on expressions containing no deprecated features.

2.3.1.3 MathML Extension Mechanisms and Conformance

MathML 3.0 defines three basic extension mechanisms: the mglyph element provides a way of displaying glyphs for non-Unicode characters, and glyph variants for existing Unicode characters; the maction element uses attributes from other namespaces to obtain implementation-specific parameters; and content markup makes use of the definitionURL attribute, as well as Content Dictionaries and the cd attribute, to point to external definitions of mathematical semantics.

These extension mechanisms are important because they provide a way of encoding concepts that are beyond the scope of MathML 3.0 as presently explicitly specified, which allows MathML to be used for exploring new ideas not yet susceptible to standardization. However, as new ideas take hold, they may become part of future standards. For example, an emerging character that must be represented by an mglyph element today may be assigned a Unicode code point in the future. At that time, representing the character directly by its Unicode code point would be preferable. This transition into Unicode has already taken place for hundreds of characters used for mathematics.

Because the possibility of future obsolescence is inherent in the use of extension mechanisms to facilitate the discussion of new ideas, MathML can reasonably make no conformance requirements concerning the use of extension mechanisms, even when alternative standard markup is available. For example, using an mglyph element to represent an 'x' is permitted. However, authors and implementers are strongly encouraged to use standard markup whenever possible. Similarly, maintainers of documents employing MathML 3.0 extension mechanisms are encouraged to monitor relevant standards activity (e.g., Unicode, OpenMath, etc.) and to update documents as more standardized markup becomes available.

2.3.2 Handling of Errors

If a MathML-input-conformant application receives input containing one or more elements with an illegal number or type of attributes or child schemata, it should nonetheless attempt to render all the input in an intelligible way, i.e., to render normally those parts of the input that were valid, and to render error messages (rendered as if enclosed in an merror element) in place of invalid expressions.

MathML-output-conformant applications such as editors and translators may choose to generate merror expressions to signal errors in their input. This is usually preferable to generating valid, but possibly erroneous, MathML.

2.3.3 Attributes for unspecified data

The MathML attributes described in the MathML specification are intended to allow for good presentation and content markup. However it is never possible to cover all users' needs for markup. Ideally, the MathML attributes should be an open-ended list so that users can add specific attributes for specific renderers. However, this cannot be done within the confines of a single XML DTD or in a Schema. Although it can be done using extensions of the standard DTD, say, some authors will wish to use non-standard attributes to take advantage of renderer-specific capabilities while remaining strictly in conformance with the standard DTD.

To allow this, the MathML 1.0 specification [MathML1] allowed the attribute other on all elements, for use as a hook to pass on renderer-specific information. In particular, it was intended as a hook for passing information to audio renderers, computer algebra systems, and for pattern matching in future macro/extension mechanisms. The motivation for this approach to the problem was historical, looking to PostScript, for example, where comments are widely used to pass information that is not part of PostScript.

In the next period of evolution of MathML the development of a general XML namespace mechanism seemed to make the use of the other attribute obsolete. In MathML 2.0, the other attribute is deprecated in favor of the use of namespace prefixes to identify non-MathML attributes. The other attribute remains deprecated in MathML 3.0.

For example, in MathML 1.0, it was recommended that if additional information was used in a renderer-specific implementation for the maction element (Section 3.7.1 Bind Action to Sub-Expression <maction>), that information should be passed in using the other attribute:

<maction actiontype="highlight" other="color='#ff0000'"> expression </maction>

From MathML 2.0 onwards, a color attribute from another namespace would be used:

<body xmlns:my="http://www.example.com/MathML/extensions">
...
<maction actiontype="highlight" my:color="#ff0000"> expression </maction>
...
</body>

Note that the intent of allowing non-standard attributes is not to encourage software developers to use this as a loophole for circumventing the core conventions for MathML markup. Authors and applications should use non-standard attributes judiciously.

Overview: Mathematical Markup Language (MathML) Version 3.0
Previous: 1 Introduction
Next: 3 Presentation Markup