Working Draft 15-May-1997

7. The MathML Core Interface



Embedded MathML notation must interact with a variety of renderers and processors. The MathML core specification given in this document is designed to be a "fullform" markup, capable of encoding very complex presentational and semantic structure. Therefore, much of the future potential for MathML lies in its ability to smoothly interface with more specialized, more user-friendly input and output models.

In this section, we describe the basic interface mechanisms that are part of the MathML core specification. We also describe some of the future extensions planned by the HTML Math working group.

7.1 Embedding MathML in HTML

The most fundamental interface mechanism provided by the MathML core specification governs the embedding of math markup in an HTML document. MathML proposes two sets of top-level tags, which surround all other MathML markup in an HTML document.

Since MathML is a verbose language, designed more to be explicit and complete for automatic processing, many authors will likely choose to use terser notations which may be filtered into MathML markup. Indeed, many renderers will probably transparently accept several input syntaxes, while returning MathML when queried for cut and paste, or searching operations.

In order to accommodate alternative syntaxes, MathML provides one set of top-level tags for pure MathML XML markup, and one for other input syntaxes intended for processing by a specific renderer. More specifically, the content model is explicitly declared to be XML MathML markup for one tag set, while the content model for the second tag set is CDATA. One consequence is that alternative input syntaxes must either prohibit the character sequence '</' or provide an escape mechanism for it, since that sequence is not allowed in CDATA content.
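As an illustration of the '</' restriction, the following Python sketch checks a piece of alternative-syntax source for the forbidden sequence and applies one possible escape convention. The convention shown (writing '<\/' for '</') and all function names are assumptions made for this example; the specification does not define an escape mechanism.

```python
def violates_cdata(source: str) -> bool:
    """Return True if the text contains '</', which CDATA content may not hold."""
    return "</" in source


def escape_for_cdata(source: str) -> str:
    """Hypothetical escape convention: write '<\\/' wherever '</' is meant."""
    return source.replace("</", "<\\/")


def unescape_from_cdata(escaped: str) -> str:
    """Inverse of escape_for_cdata, applied by the renderer before parsing."""
    return escaped.replace("<\\/", "</")
```

A renderer adopting such a convention would call unescape_from_cdata on the tag content before parsing it. Note that this simple scheme cannot represent a literal '<\/' in the input, so a real syntax would need a more careful design.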

Each tag set contains a tag for block level notation, and another for character level notation. In typesetting language, these are typically called display and in-line notation respectively.

7.1.1 Top-Level Tags

The proposed tag names are:

Tag name    Content model    Level
MATH        XML              character level (inline)
MATHDISP    XML              block level (display)
F           CDATA            character level (inline)
FD          CDATA            block level (display)

Note: The tag names F and FD are meant to be mnemonic for "formula" and "formula display" in analogy with math and math display. Since we anticipate that the XML full form markup will typically be written with the aid of authoring tools, and that authors writing markup by hand are more likely to use an alternative input syntax, we have elected to propose the short tag names for the CDATA content model.

Note: Since F and FD are valid MathML tags, it is syntactically permitted to nest F tags within MATH tags. However, renderers may not be able to display the embedded alternative markup.

7.1.2 Attributes of the Top-Level Tags

The MATH, MATHDISP, F and FD tags all have the same attributes. They are:

TYPE="mime type"
The type attribute assigns a MIME type to the tag content. This attribute is used to invoke an embedded element, such as a Java applet, plug-in or ActiveX control, to render the tag content as described in the next section.

NAME="value"
Provided for scripting.

CLASS="value"
STYLE="value"
Provided for Cascading Style Sheet compatibility.

HEIGHT=nn
WIDTH=nn
BASELINE=nn
It is our hope and belief that embedded elements will soon be able to dynamically negotiate height, width and baseline alignment with browsers. However, these optional attributes are suggested as an interim solution for browser makers that want to support Math, but are unable to provide dynamic resizing and alignment.

OVERFLOW="pan|scroll|elide|truncate|scale"
In cases where size negotiation is not possible or fails (for example in the case of an extremely long equation) this attribute is provided to suggest an alternative processing method to a renderer.

ALTIMG=URL
ALTTEXT="value"
These attributes provide graceful fallbacks for browsers that do not support embedded elements or images, respectively.

MACROS="URL; URL; .."
This attribute provides a way of pointing to external macro definition files. Macros are not part of the MathML core specification, but we anticipate defining a macro mechanism in a subsequent proposal. Macros may also be utilized by alternative input syntaxes.
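To make the role of the ALTIMG and ALTTEXT fallbacks concrete, here is a minimal Python sketch of how a browser might choose among an embedded renderer, the alternative image and the alternative text. The function name, the return format and the order of the checks are assumptions for illustration; the specification only states which attribute serves as the fallback in which case.

```python
def choose_presentation(attrs, supports_embedded, supports_images):
    """Pick a presentation for a top-level math tag.

    attrs is a dict of the tag's attributes, keyed by upper-case name.
    Returns a (kind, value) pair; the kinds are hypothetical labels.
    """
    if supports_embedded:
        # Hand the content to the embedded element selected by TYPE.
        return ("embedded", attrs.get("TYPE"))
    if supports_images and "ALTIMG" in attrs:
        # No embedded elements: fall back to the alternative image.
        return ("image", attrs["ALTIMG"])
    if "ALTTEXT" in attrs:
        # No images either: fall back to the alternative text.
        return ("text", attrs["ALTTEXT"])
    return ("none", None)
```

For example, a browser with image support but no embedded elements would display the ALTIMG resource, while a text-only browser would show the ALTTEXT string.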

7.1.3 Invoking Embedded Elements as Renderers

Ideally, Math will be supported natively by browsers. However, until that happens, we anticipate that Math will be implemented via embedded elements which process and render the math markup.

For this to be an effective solution, it must be possible to automatically invoke an embedded element on the content of both the MATH/MATHDISP and F/FD tags. Moreover, in the case of the F/FD tags, it is necessary to convey to the renderer the nature of the particular input syntax used.

The mechanism we propose for accomplishing these objectives is to assign MIME types to tag contents. There are two main reasons for this:

One drawback of this scheme is that it requires embedded element renderer vendors and authors to use consistent MIME types. In the long run, we anticipate that widely used input syntaxes will be assigned official MIME types. In the short run, however, authors using alternative input syntaxes will have to assign MIME types with specific renderers in mind.

On the other hand, this problem exists with other schemes, as well, such as a "notation type" attribute; if the attribute value is defined as a text string, it shares the same problem as MIME types, and if it is defined to take one of a list of predefined values, it is not sufficiently flexible.
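The MIME type mechanism can be pictured as a lookup from the TYPE attribute to a renderer. The Python sketch below is one way a browser might organize this; the registry, the function names and the MIME type used in the test are hypothetical and are not defined by this specification.

```python
# Registry mapping a MIME type to a callable that renders tag content.
RENDERERS = {}


def register_renderer(mime_type, renderer):
    """Associate a renderer with a MIME type (MIME type names are case-insensitive)."""
    RENDERERS[mime_type.strip().lower()] = renderer


def dispatch(attrs, content):
    """Render tag content with the renderer registered for its TYPE attribute.

    Returns None when no renderer is registered, so the caller can fall
    back to another presentation.
    """
    mime_type = attrs.get("TYPE", "").strip().lower()
    renderer = RENDERERS.get(mime_type)
    if renderer is None:
        return None
    return renderer(content)
```

The sketch also shows the drawback noted above: dispatch succeeds only if the author and the renderer vendor agree on the same MIME type string.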

7.1.4 HTML Tags permitted in MathML

Ideally, any HTML tag which makes sense in a math context should be permitted within the scope of a MathML expression. However, given what is currently possible by way of interaction between browsers and embedded elements, the burden of supporting arbitrary HTML tags in a MathML renderer is unrealistically heavy.

Moreover, the problem of supporting HTML tags in XML applications is not specific to MathML. Since other groups are working on solutions to this problem, the MathML core standard does not specify which HTML tags must be permitted within a MathML expression.

At the same time, there are a number of HTML tags that have an obvious, natural interpretation within a MathML expression. MathML renderers are strongly encouraged to support these tags if possible.

The list of recommended tags includes:

7.2 Interacting with Renderers

Eventually, one hopes many different MathML renderers will be implemented. In addition to supporting the MathML core language, it is reasonable to assume that some of these renderers will provide additional specialized capabilities. Consequently, the MathML core specification provides mechanisms for passing additional information directly to renderers that can take advantage of it.

At the same time, it is important to clearly specify what it means to be a MathML compliant renderer, so that authors can be assured that their documents will be generally accessible if they refrain from using proprietary extensions.

Definition: A MathML compliant renderer must:

Beyond this, the MathML core specification makes no demands of individual renderers. However, in order to guide developers, the specification includes advisory material; for example, there are suggested rendering rules included in section 3. In addition, the remainder of this section makes suggestions about a number of interface issues a renderer should address in some fashion.

7.2.1 Handling of Errors

If a MathML compliant renderer receives input containing one or more elements with an illegal number or type of arguments, it should nonetheless attempt to render all of the input in an intelligible way. That is, it should render normally those parts of the input which are well-formed, and render an error message (as if enclosed in an <MERROR> element) explaining what was wrong with the incorrect expression. The error message should appear in the same place as a correct rendering would have appeared, and may include normal renderings of the correctly formed argument elements.

Note that the <MERROR> element is provided for the convenience of other mathematical software applications which generate MathML expressions from possibly invalid input.
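The suggested error behavior can be sketched with a toy text renderer in Python. The element names MFRAC and MI, the argument-count table and the bracketed error format are assumptions for this example only; the sketch checks the number of arguments but not their types.

```python
import xml.etree.ElementTree as ET

# Assumed argument counts, for illustration only.
EXPECTED_ARGS = {"MFRAC": 2}


def render(node):
    """Render a node as plain text, reporting argument-count errors in place."""
    parts = [render(child) for child in node]
    expected = EXPECTED_ARGS.get(node.tag)
    if expected is not None and len(parts) != expected:
        message = f"{node.tag} expects {expected} arguments, got {len(parts)}"
        # Keep the renderings of the well-formed arguments next to the message.
        detail = f": {' '.join(parts)}" if parts else ""
        return f"[MERROR: {message}{detail}]"
    if node.tag == "MFRAC":
        return f"({parts[0]})/({parts[1]})"
    if not parts:
        # Leaf element: render its character content.
        return (node.text or "").strip()
    return " ".join(parts)
```

The well-formed parts of the input render normally, and the error message takes the place where the malformed element would have appeared.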

7.2.2 XML Extensions to MathML

The set of elements and attributes specified in the MathML core proposal is necessary for rendering common math expressions. It is recognized that not all mathematical notation is covered by this set of elements, that new notations are continually invented, and that sub-communities within mathematics often have specialized notations. Furthermore, the explicit extension of a standard is a necessarily slow and conservative process. Consequently, the MathML standard could never explicitly cover all the presentational forms used by every sub-community of authors and readers of mathematics.

In order to facilitate the use of MathML by the widest possible audience, and to enable its smooth evolution to encompass more notational forms (perhaps eventually covered by explicit extensions to the standard), the set of tags and attributes is open-ended, in the sense described in this section.

The MathML tag set is described by an XML-compliant DTD, which necessarily limits the tags and attributes to those which occur in the DTD. Renderers desiring to accept nonstandard elements or attributes, and authors desiring to include these in documents, should accept or produce documents which conform to an appropriately extended XML-compliant DTD which has the standard MathML DTD as a subset.

MathML compliant renderers are allowed, but not required, to accept nonstandard tags and attributes, and to render them in any way. If a renderer does not accept some or all nonstandard tags or attributes, it is encouraged to either handle them as errors as described above for elements with the wrong number of arguments, or to render their arguments as if they were arguments to an <MROW>, in either case rendering all standard parts of the input normally.
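The second suggested fallback, rendering the arguments of an unrecognized element as if they were arguments to an <MROW>, can be sketched as a preprocessing pass in Python. The set of standard tags below is a deliberately partial, illustrative list, and discarding the attributes of the unrecognized element is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative list of standard tags; the DTD defines the real set.
STANDARD_TAGS = {"MROW", "MTEXT", "MERROR", "MACTION"}


def treat_unknown_as_mrow(root):
    """Rewrite every nonstandard element into an MROW holding the same arguments."""
    for node in root.iter():
        if node.tag not in STANDARD_TAGS:
            node.tag = "MROW"
            # Attributes of the unknown element have no meaning for MROW.
            node.attrib.clear()
    return root
```

After this pass, the tree contains only standard elements and can be handed to the normal rendering code unchanged.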

7.2.3 An Attribute for Unspecified Data

The MathML attributes described in the MathML core proposal are necessary for display and content markup. Ideally, the MathML attributes should be an open-ended list so that users could add specific attributes for specific renderers. However, this can't be done within the confines of a single XML DTD. Although it can be done using extensions of the standard DTD, as described earlier, some authors will wish to use nonstandard attributes while remaining strictly in compliance with the standard DTD.

To allow this, the proposal also provides the attribute other="..." on all elements, as a hook for passing renderer-specific information. In particular, it can be used to pass information to audio renderers and computer algebra systems, and for pattern matching in any future macro/extension mechanism. This idea is used in other languages. For example, PostScript comments are widely used to pass information that is not part of PostScript.

At the same time, the intent of the "other" attribute is not to encourage software developers to use this as a loophole for circumventing the MathML core markup conventions. We trust both renderers and authors will use the "other" attribute judiciously.

The value of the other attribute should be a string containing an attribute list in valid XML format (i.e., attr1="val1" attr2="val2" ..., with appropriate escaping of the double quotes). Renderers which accept nonstandard attributes directly should also accept them when they occur within the string value of the other attribute. This is not required for attributes specifically documented by the MathML standard.
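Because the value of the other attribute is itself an XML attribute list, a renderer can recover the individual attributes with an ordinary XML parser. The following Python sketch wraps the value in a dummy element to do so; the function names are hypothetical, and letting directly written attributes take precedence over those found in other is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def parse_other(value):
    """Parse the value of an other attribute into a dict of attributes.

    The value is assumed to have been unescaped already by the outer parser.
    A malformed attribute list yields an empty dict.
    """
    try:
        return dict(ET.fromstring(f"<x {value}/>").attrib)
    except ET.ParseError:
        return {}


def effective_attributes(attrs):
    """Merge attributes carried inside other with those written directly."""
    merged = parse_other(attrs.get("other", ""))
    merged.update({k: v for k, v in attrs.items() if k != "other"})
    return merged
```

With this approach, a renderer that understands a nonstandard color attribute treats other="color='#ff0000'" the same way as a directly written color attribute.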

7.2.4 The <MACTION> Tag

Authors can make links from MathML subexpressions by using the standard HTML <a href="..."> construction, as described in section 7.1.4. However, links are only one of many ways in which specific renderers might let authors make math notation active. For example, in lengthy mathematical expressions, the ability to "fold" expressions might be provided, i.e. a renderer might allow a reader to toggle between an ellipsis and a much longer expression which it represents.

To provide a mechanism for binding actions to expressions, MathML proposes the <MACTION> tag. The <MACTION> tag accepts any number of subexpressions as arguments, and a single actiontype attribute with a string value. By default, renderers which do not recognize the specified action type should render the first argument (if any are present).

It is worth noting that in order to be fully MathML compliant, a renderer need only implement the default behavior.
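The default behavior is simple enough to state in a few lines of Python. In this sketch, the handlers mapping and the function name are hypothetical; only the fallback to the first argument comes from the text above.

```python
def render_maction(actiontype, arguments, handlers=None):
    """Choose what to render for an MACTION element.

    handlers maps each action type the renderer recognizes to a function
    that takes the list of arguments and returns the rendering.
    """
    handlers = handlers or {}
    handler = handlers.get(actiontype)
    if handler is not None:
        return handler(arguments)
    # Default behavior: render the first argument, if any are present.
    return arguments[0] if arguments else ""
```

A renderer that passes no handlers at all implements only the default behavior, which is sufficient for compliance.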

A suggested list of action types and usages might include:

<MACTION actiontype="toggle"> (first expression) (second expression)... </MACTION>

For this action type, a renderer would alternately display the given expressions, cycling through them when a reader clicked on the active expression. Typical uses would be for exercises in education, ellipses in long computer algebra output, or to illustrate alternate notations.

<MACTION actiontype="statusline"> (expression) (message) </MACTION>

In this case, renderers would display the expression in context on the screen. When a reader clicked on the expression or moved the mouse over it, the renderer would send a rendering of the message to the browser statusline. Presumably authors would use plain text in an <MTEXT> construct for the message in most circumstances.

<MACTION actiontype="highlight" other="color='#ff0000'"> expression </MACTION>

In this case, a renderer might highlight the enclosed expression on a "mouse-over" event. In the example given above, use is being made of the "other" attribute to pass a color to a specific renderer as well.

<MACTION actiontype="popup"> (menu item 1) (menu item 2) ... </MACTION>

A more elaborate renderer might implement a pop-up menu feature, to provide one-to-many linking capability.
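As an example of one of these action types, the cycling behavior suggested for "toggle" amounts to keeping an index into the list of expressions. The Python class below is a minimal sketch under that reading; the class and method names are hypothetical, and the expressions are represented as plain strings.

```python
class ToggleAction:
    """State for an MACTION with actiontype="toggle"."""

    def __init__(self, expressions):
        if not expressions:
            raise ValueError("toggle needs at least one expression")
        self.expressions = list(expressions)
        self.index = 0  # the first expression is shown initially

    def current(self):
        """Return the expression currently displayed."""
        return self.expressions[self.index]

    def click(self):
        """Advance to the next expression, wrapping around at the end."""
        self.index = (self.index + 1) % len(self.expressions)
        return self.current()
```

Showing the first expression initially keeps the toggle consistent with the default behavior of renderers that do not recognize the action type.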

7.3 Future Extensions

The MathML core specification is designed to encode complex notational and semantic structure in an explicit and flexible way. The price for such expressive power is that MathML is verbose compared to markup languages like TeX.

To some extent, authoring tools can address this issue; MathML is designed to be easy to generate and process. At the same time, electronic documents are written by authors, not machines, and most authors of technical documents are already comfortable and efficient with other editors, authoring tools and markup languages.

In order to help authors bridge the gap over the coming years, it is vital for MathML to accommodate the present conventions for human authoring of technical material. This is also important for the conversion of existing legacy documents.

The HTML Math working group charter specifies that the group will make a proposal on extension protocols for MathML to help accommodate hand authoring, among other things, in May, 1998. The proposal will address at least two such extensions: macros, and alternative input syntaxes.

7.3.1 Macros

Macros can play a very useful role in encoding mathematical content and meaning, since symbolic manipulation is so pervasive in mathematics. At the same time, it is difficult to devise a coherent, general macro system for MathML, because there are so many distinct applications for them:

7.3.2 Alternative Input Syntaxes

Authors who write electronic documents by hand need terse and easy-to-type input syntaxes. Many will want to continue to use TeX, for example. Others, who are not already users of TeX, or who wish to take advantage of the ability of MathML to carry semantic meaning, might be expected to learn new languages which require only a minimal amount of additional information, and which are then parsed by correspondingly more sophisticated software.

The HTML Math working group will investigate several alternative input syntaxes. In particular, the group will study augmented operator precedence based languages, such as that proposed by Wolfram Research, variants of TeX, and very simple syntaxes for easy input of common notations.

The group will also consider the issues surrounding the implementation of filters from existing markup languages like TeX and ISO 12083.

It is worth noting that the <F> and <FD> tags in the MathML core specification provide the mechanism for using an alternative input syntax. Developers interested in implementing renderers which process an alternative input syntax can do so independently of the HTML Math working group activity.