Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This Note discusses the facilities that are available in the MathML 2.0 Recommendation to facilitate the capturing of mathematical type information. It demonstrates how a combination of these features can be systematically used to provide support for general mathematical types.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This Note is a self-contained discussion of mathematical types in Content MathML. It contains non-normative interpretations of the MathML 2 Recommendation (2nd Edition) [MathML22e] and provides guidelines for the handling of advanced mathematical types using Content MathML. Please direct comments and report errors in this document to www-math@w3.org.
This document has been produced by the W3C Math Working Group as part of the W3C Math Activity (Activity statement). The goals of the Working Group are discussed in the Working Group Charter. A list of participants in the W3C Math Working Group is available.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Patent disclosures relevant to this Note may be found on the Math Working Group's patent disclosure page.
1 Introduction
1.1 Goals
1.2 Overview
2 The type attribute
3 Advanced Typing
3.1 The semantics Element
3.2 Some Encodings for Structured Type Objects
3.3 Associating Structured Types with MathML Objects.
3.4 Structured Types and Bound Variables.
4
Related Work: Types in OpenMath
4.1 Representing and Associating Types in OpenMath
4.2 OpenMath's Small Type System
5 Types as used in appendix C of MathML 2
6 An Example with Complex Types
The goals of this Note are to demonstrate how the existing structures in MathML can be used to attach detailed mathematical type information to various parts of an expression. This document attempts to:
clarify the role of the type
attribute in MathML,
demonstrate in detail how to attach more advanced type information to a MathML object,
illustrate this technique with examples using various type systems, including the types discussed in Appendix C (Content Element Definitions) of the MathML 2.0 Recommendation, the type system used by OpenMath [OpenMath], and an example of a more complicated type system.
MathML is an XML application for describing mathematical notation which allows the encoding of both structure and content of mathematics. While based on a mathematical vocabulary derived largely from school and early university levels, it is extensible in a number of ways. The primary purpose of this Note is to show how these extension mechanisms can be used to associate detailed mathematical type information with individual expressions or parts of expressions.
MathML has many roles. Depending on the goal of an author they may choose to provide a detailed visual presentation of a mathematical object, a more abstract mathematical description of the object, or both. More than one presentation of MathML content is possible, visual, aural or other, and authors are able to attach a wide variety of additional information to their mathematical equations of formulas.
When the author's goal is to provide a detailed representation of a mathematical object, it is necessary to be able to record the mathematical types of the various parts of this representation. For example, the terms in a product may be scalars or vectors, or the elements of a vector may be integers, complex numbers or symbols representing them.
Content MathML, as specified in Chapter
4 of the MathML 2 Recommendation defines an
attribute called type
, which provides limited
support for mathematical types by allowing authors to
specify simple values such as "real",
"integer" or
"complex-cartesian" as the content of the
attribute. Such a simple construct is hardly sufficient
for specifying complex type information, especially as the
use of the attribute is not allowed on all the elements
where one might want to use it. Furthermore, another
problem is illustrated by the example below:
<math> <apply> <tendsto type="above"/> <ci>x</ci> <cn>0</cn> </apply> </math>
In this example, the type
attribute does not indicate
the domain of the tendsto
element, but rather the fact that
the limit is reached from above. The attribute can therefore not be
used to specify the domain of the tendsto
operator.
Another situation where the current approach is
problematic is where one needs to specify more complex
types, for instance the type of functions on complex
numbers yielding pairs of real numbers. Even though the
content of the type
attribute is open-ended, a
specification of all possible object types would
necessitate defining an attribute value syntax, e.g.
"complex → pair(real,real)". However,
this would add considerable complexity to the design of
MathML implementations, as this syntax would require
writing an specific parser.
Nevertheless there is a pressing need to express complex structured types in MathML, as modern treatments of formal and computational aspects of mathematics increasingly make use of elaborate type systems. However the observation that structured objects such as the example type above can be expressed very naturally in Content MathML leads to a better mechanism, detailed below.
The next section discusses the type
attribute in detail. The following sections describe and
discuss methods for incorporating richer type systems in
Content MathML, followed by a number of examples showing
possible scenarios.
In Content MathML, The type
attribute is used to specify the basic type of a limited number of primitive objects. These include:
The cn
element is used to specify actual
numerical quantities. The attribute type
indicates what type of number is being represented
such as integer, rational, real or complex,
The following values are used in the Recommendation with a specific meaning: "real", "integer", "rational", "complex-cartesian", "complex-polar" and "constant". Other values are permitted for extensibility.
The ci
element is used to name mathematical objects. It
uses the type
attribute primarily to indicate the kind of
objects that are being represented.
The declare
element can be used to
establish a default type value for associating a
type with other MathML elements such as the
ci
element. Either the type
attribute or the contents of the declare
element can be used to specify a type.
The set
element is the container element
that constructs a set of elements. The
type
attribute is used on the set
element to clarify the kind of set that is
represented. For example, a set may be a
multiset.
The tendsto
element is used to express
the relation that a quantity is tending to a
specified value. The type
attribute is
used on the tendsto
element to indicate the
direction of limits.
The MathML specification does not restrict the values of the type attribute in any way. Indeed the attribute content is defined as CDATA in the MathML Document Type Definition, which is sufficiently open to allow encoding elaborate type structures. However, this approach is not appropriate in many cases, for several reasons:
There are MathML elements that may require detailed type
specifications but cannot have a type
attribute.
The type
attribute cannot easily be used
for two purposes at once. For example, one might want
to indicate that a set is a multiset and also that its
elements are integers, or that a limit is reached from
above and that it is over the domain of integers. To
deal with these cases, it would be necessary, at least,
to develop a notation for encoding such type
combinations in the attribute value.
Embedding detailed type information in the value of
the type
attribute means that a complex
syntax must be defined. Because it would reside in an
attribute value, this syntax could not be XML, and thus
would put extra implementation burden on MathML-aware
applications.
Mathematical types are mathematical objects themselves and thus require a rich structure model, such as that of Content MathML, to describe them
In order to deal with more sophisticated typing systems, two key issues must be solved:
Specify a way to encode non-trivial mathematical type objects.
Determine a method to associate the type objects with the corresponding MathML object.
One simple way deal with the second issue is to use
MathML's semantics
element, which is defined to
generally associate additional information with mathematical
objects. Additionally this element supports both XML and
non-XML based encodings through using annotation-xml
and annotation
elements.
Regarding the encoding problem, it is a well-known fact
that developing a mathematical typing system is a difficult
task. However, several such systems exist and each represents
a major piece of development effort. Examples include the
Simple Type System developed by the OpenMath Consortium
[STS]. A significant part of Appendix C of
MathML itself deals with the issues of types and how they are
related. Fortunately, the semantics
element is
designed to support alternative notations, and thus allows
using the typing systems mentioned.
The following section discusses in detail how the
semantics
element can be used to describe types.
semantics
ElementThe semantics
element is designed to group
together various kinds of information related to a
particular mathematical object. For example,
semantics
is often used to provide both a Content MathML
and a Presentation MathML representation of an object.
The semantics
element is the container element
for the MathML expression, together with its semantic
mappings. semantics
contains a variable number of
child elements, the first of which is the object (which
may itself be a complex element structure) to which
additional semantic information is attached. The second
and subsequent children elements, when present, are
annotation
or annotation-xml
elements.
Each of these elements can contain a
definitionURL
and a encoding
attribute. Those attributes can be used to clarify the
meaning of the content of a particular
annotation-xml
element, and the manner in which
it is encoded, respectively.
The annotation
element is a container
for arbitrary data. This data may be in the form of
raw text, computer algebra encodings, C programs, or
any syntax a processing application might expect.
The annotation
element can contain an attribute
called "encoding", defining the format in use.
Note that the content model of annotation
is PCDATA, so care must be taken that the
particular encoding does not conflict with XML
parsing rules.
The annotation-xml
element is a
container for semantic information in XML
syntax. For example, an XML form of the OpenMath
semantics could be contained in the element. Another
possible use is to embed the Presentation MathML
form of a construct given in Content MathML format
in the first child element of semantics
(or
vice versa). annotation-xml
can contain an
encoding
attribute specifying the
format used.
By replacing a simple mathematical expression by a
semantics
element it is possible to attach
additional information to a mathematical object. Some of
that information may be multiple views of the object (in
different formats), and one may be type information about
the object. For example:
<semantics> <apply><divide/><cn>123</cn><cn>456</cn></apply> <annotation encoding="Mathematica">N[123/456, 39]</annotation> <annotation encoding="TeX"> $0.269736842105263157894736842105263157894\ldots$ </annotation> <annotation encoding="Maple">evalf(123/456, 39);</annotation> <annotation-xml encoding="MathML-Presentation"> <mrow> <mn> 0.269736842105263157894 </mn> <mover accent='true'> <mn> 736842105263157894 </mn> <mo> ‾ </mo> </mover> </mrow> </annotation-xml> <annotation-xml encoding="OpenMath"> <OMA xmlns="http://www.openmath.org/OpenMath"> <OMS cd="arith1" name="divide"/> <OMI>123</OMI> <OMI>456</OMI> </OMA> </annotation-xml> </semantics>
As the example shows, the encoding attribute is used similarly to a media type, as it indicates how to parse the content of the element. The attribute might indicate the use of a particular XML application such as an OpenMath dictionary or a particular entry of such a dictionary
The next section discusses some of the possible type encodings supported by this approach.
Some computer algebra systems provide elaborate support for arbitrarily complex type systems. Such systems make it possible, for example, to associate with an identifier information such as the fact that it represents an operator that takes unary real-valued functions as input and returns similar functions as values (one example is the differentiation operator). This Note takes the approach that types are structured mathematical objects like any other object.
For example, the function type indicating a function that maps integers to natural numbers (Z → R) is sometimes written as:
(integer) --> (naturalnumber)
(This syntax is used for formatting signatures in appendix C of the MathML 2.0 Recommendation.) This function type might be written as a Content MathML object as
<apply> <csymbol definitionURL="types.html#funtype"/> <csymbol definitionURL="types.html#int_type"/> <csymbol definitionURL="types.html#nat_type"/> </apply>
where we assume a document called types.html
,
which specifies the type system employed and provides definitions
for the symbols Z, →, and N.
The type object
Z → R
is obtained by applying the function type constructor
→
to the set
Z
of integers and the set
N
of natural numbers.
An alternative representation of this function type that makes more direct use of existing content MathML elements is:
<apply> <csymbol definitionURL="types.html#funtype"/> <integers/> <naturalnumbers/> </apply>
The last two representations above explicitly encode
the structure of the type as a Content MathML object. This
representation of types greatly extends the set of
available types in MathML, compared to the use of the
type
attribute (barring the use of URIs in the
attribute).
The semantics
element can be used to directly associate
structured types (as introduced in the previous section) with a particular MathML object.
For example:
<semantics> <ci>F</ci> <annotation encoding="ASCII" definitionURL="http://www.example.org/MathML-2.0-appendixC"> (integer) --> (naturalnumber) <annotation> <annotation-xml encoding="MathMLType" definitionURL="types.html"> <apply> <csymbol definitionURL="types.html#funtype"/> <integers/> <naturalnumbers/> </apply> <annotation-xml> </semantics>
The second child element of the semantics
element is a annotation
element that
contains type information in textual format. The
third child element is an annotation-xml
element that contains the XML representation of the
type, using Content MathML markup.
The definitionURL
attribute on the
annotation-xml
and annotation
elements
are used here to specify the nature of the information
provided within the element (in this case, type
information). The encoding
attribute on the
annotation-xml
element specifies the format of
the XML data. MathML applications can use the value of
these attributes to determine whether they can deal with
the type information, i.e. whether they support that
particular type system.
In summary the MathML specification allows using the
semantics
element as a general annotation device
for MathML objects, where the definitionURL
attribute specifies the relation of the annotation to the
first child of the semantics
element. In
particular, this mechanism can be used to attach
structured type objects to mathematical objects. Below is
a more complex example showing such a type annotation in
OpenMath markup of the variable XZ
represented using Content MathML markup.
(see 4
Related Work: Types in OpenMath for more details).
<semantics> <semantics id="typed-var"> <ci>X</ci> <annotation-xml definitionURL="type.html" encoding="MathMLType"> <integers/> </annotation-xml> </semantics> <annotation-xml encoding="OpenMath"> <OMOBJ xmlns="http://www.openmath.org/OpenMath"> <OMATTR xlink:href="typed-var"> <OMATP> <OMS cd="sts" name="type"/> <OMS cd="setname1" name="integers"/> </OMATP> <OMV name="X"/> </OMATTR> </OMOBJ> </annotation-xml> </semantics>
To make the correspondence clear, the parallel markup uses cross-references, as described in section 5.3.4 of the MathML2 specification.
A natural place to annotate types in formulas is in the declaration of bound variables. For example:
<math> <apply> <forall/> <bvar id="bvF"> <semantics> <ci>F</ci> <annotation-xml encoding="content-MathML" definitionURL="types.html"> <apply> <csymbol definitionURL="types.html#funtype"/> <csymbol definitionURL="types.html#int_type"/> <csymbol definitionURL="types.html#nat_type"/> </apply> </annotation-xml> <annotation encoding="ASCII" definitionURL="type"> integer --> nat </annotation> </semantics> </bvar> <bvar id="bvx"> <semantics> <ci>x</ci> <annotation-xml encoding="content-MathML"> <csymbol definitionURL="types.html#int_type"/> </annotation-xml> <annotation encoding="ASCII" definitionURL="type"> integer </annotation> </semantics> </bvar> <apply><geq/> <apply><ci definitionURL="bvF">F</ci> <ci definitionURL="bvx">x</ci> </apply> <cn>0</cn> </apply> </apply> </math>
Here the bound variables
F
and
x
are annotated with type information by using the
semantics
element. The correspondence between
the declaring occurrence of the variable and the bound
occurrences are made explicit by the use of the
definitionURL
attribute, which points to the
respective bvar
element (see [MathMLBVar] for a discussion of this
technique).
Note that in this example, only the type information makes the formula true (or even meaningful) as in general it is not the case that applying arbitrary functions to arbitrary values yields results that are greater than or equal to zero.
A related way to express the information in the example above
that does not use the semantics
element in
bvar
is:
<math> <apply> <forall/> <bvar><ci>F</ci></bvar> <condition> <apply> <csymbol definitionURL="types.html#type_of"/> <ci>F</ci> <apply> <csymbol definitionURL="types.html#funtype"/> <csymbol definitionURL="types.html#int_type"/> <csymbol definitionURL="types.html#nat_type"/> </apply> </apply> </condition> <bvar><ci>x</ci></bvar> <condition> <apply> <csymbol definitionURL="types.html#type_of"/> <ci>x</ci> <csymbol definitionURL="types.html#int_type"/> </apply> </condition> <apply><geq/><apply><ci>F</ci><ci>x</ci></apply><cn>0</cn></apply> </apply> </math>
Here, the condition
element is used to express the type condition "such
that
F
has type
Z
→
R".
The transformation from the first to the second representation is a
well-known technique in typed logics called
relativization[Oberschelp], which transforms
formulas in a typed logic into untyped formulas without changing their
meaning. Although the two representations have similar semantics (given
reasonable assumptions), both have dramatically different
computational properties in most formal systems. Therefore, they
should not be conflated in Content MathML.
OpenMath is an XML encoding of mathematical objects,
designed for extensibility. OpenMath and Content MathML are
largely equivalent in expressive power, the main difference
being that OpenMath uses a csymbol
-like mechanism
for representing symbols.
In this section we briefly compare the representation of structured types in content MathML and in OpenMath.
The following representation, which is the counterpart of the example above, shows how types are represented in OpenMath:
<OMOBJ> <OMBIND> <OMS cd="quant1" name="forall"/> <OMBVAR> <OMATTR> <OMATP> <OMS cd="types" name="type"/> <OMA> <OMS cd="types" name="funtype"/> <OMS cd="types" name="int_type"/> <OMS cd="types" name="nat_type"/> </OMA> </OMATP> <OMV name="F"/> </OMATTR> <OMATTR> <OMATP> <OMS cd="types" name="type"/> <OMS cd="types" name="int_type"/> </OMATP> <OMV name="x"/> </OMATTR> </OMBVAR> <OMA> <OMS cd="relation1" name="geq"/> <OMA><OMV name="F"/><OMV name="x"/></OMA> <OMI>0</OMI> </OMA> </OMBIND> </OMOBJ>
OpenMath uses the OMBIND
element for binding
objects while MathML uses the apply
element with
bvar
child (which corresponds to the OpenMath
element OMBVAR
, except that the latter can hold
more than one variable). OpenMath uses the OMA
element for applications, the OMS
element instead
of MathML's csymbol
, and the OMV
element
for variables, instead of ci
. This markup
corresponds directly to Content MathML.
The main difference to the Content MathML example above
is the treatment of annotations. Instead of the MathML
semantics
element, OpenMath uses the
OMATTR
element. Its first child is an
OMATP
element, which contains "attribute
value pairs": the children at odd positions are
symbols that specify the role of their following siblings
(they are sometimes called "features"). In
this case, the first OMS
element whose
name
attribute has the value type
specifies that the element following it is a type. The
second child of the OMATTR
element is the object
to which the information in the first one is attached.
The representation of structured types proposed in
this Note makes Content MathML as expressive as OpenMath for
types. The feature symbols in OpenMath OMATP
elements directly correspond to the value of the
definitionURL
attribute on the
annotation-xml
and annotation
elements
in exactly the same way the definitionURL
attribute on a csymbol
element corresponds to an
OpenMath OMS
element. MathML even slightly
generalizes OpenMath, which only allows OpenMath
representations as feature values. MathML allows arbitrary
XML representations in annotation-xml
elements,
whose format can be specified in the encoding
attribute. The equivalent of the annotation
element can be used by the OpenMath OMSTR
element
as a feature value.
OpenMath comes with a dedicated type system, the "small type system (STS)" [STS]). Since all the core OpenMath symbols (which include counterparts to all MathML symbols and containers) have STS types, this makes it a good candidate for using structured types in content MathML. The simple type system can be used to check arities of functions, and expresses roughly the same information as the content validation grammar in Appendix B of the MathML 2 specification.
The symbols supplied by the STS system can be found in the
sts
OpenMath Content Dictionary; the symbols corresponding to the values
used in the type
attribute discussed in
2 The type attribute can be found in the
mathmltypes Content Dictionary. Using these
Content Dictionaries as targets for the definitionURL
attribute
makes the following three representations equivalent:
<ci type="integer">φ</ci> <semantics> <ci>φ</ci> <annotation-xml encoding="Content-MathML" definitionURL="http://www.openmath.org/standard/sts.pdf"> <csymbol definitionURL="http://www.openmath.org/cd/mathmltypes.html#complex_cartesian_type"/> </annotation-xml> </semantics> <semantics> <ci>φ</ci> <annotation-xml encoding="OpenMath" definitionURL="http://www.openmath.org/standard/sts.pdf"> <OMS xmlns="http://www.openmath.org/OpenMath" cd="mathmltypes" name="complex_cartesian_type"/> </annotation-xml> </semantics>
The primary place where mathematical types arise in MathML is in the notation of the signature of the various mathematical objects defined in appendix C:
The signature is a systematic representation that associates the types of different possible combinations of attributes and function arguments to type of mathematical object that is constructed. The possible combinations of parameter and argument types (the left-hand side) each result in an object of some type (the right-hand side). In effect, it describes how to resolve operator overloading.
For constructors, the left-hand side of the signature describes the
types of the child elements and the right-hand side describes the type of
object that is constructed. For functions, the left-hand side of the
signature indicates the types of the parameters and arguments that would be
expected when it is applied, or used to construct a relation, and the
right-hand side represents the mathematical type of the object constructed
by the apply
. Modifiers modify the attributes of an
existing object. For example, a symbol might become a
symbol of type vector.
The signature must be able to record specific attribute values and argument types on the left, and parameterized types on the right.. The syntax used for signatures is of the general form:
[<attribute name>=<attribute-value>]( <list of argument types> ) --> <mathematical result type>(<mathematical subtype>)
The MMLattributes, if any, appear in the form
<name>=<value>
. They are separated notationally
from the rest of the arguments by square brackets. The possible values are
usually taken from an enumerated list, and the signature is usually
affected by selection of a specific value.
For the actual function arguments and named parameters on the left,
the focus is on the mathematical types involved. The function argument
types are presented in a syntax similar to that used for a DTD, with the
one main exception. The types of the named parameters appear in the
signature as
<elementname>=<type>
in a manner analogous for that used for attribute values. For example,
if the argument is named (e.g. bvar
) then it is
represented in the signature by an equation as in:
[<attribute name>=<attributevalue>]( bvar=symbol,<argument list> ) --> <mathematical result type>(<mathematical subtype>)
There is no formal type system in MathML. The type values that are used
in the signatures are common mathematical types such as integer, rational,
real, complex (such as found in the description of cn
),
or a name such as string or the name of a MathML constructor.
Various collections of types such as anything, matrixtype
are used from time to time. The type name mmlpresentation
is used to represent any valid MathML presentation object and the name
MathMLtype is used to describe the collection of all MathML types.
The type algebraic is used for expressions constructed
from one or more symbols through arithmetic operations and interval-type
refers to the valid types of intervals as defined in chapter 4.
The collection of types is not closed. Users writing their own definitions
of new constructs may introduce new types.
The type of the resulting expression depends on the types of the operands, and the values of the MathML attributes.
For example, the definition of csymbol
provides a
signature which takes the form:
[definitionURL=definition]({string|mmlpresentation}) -> defined_symbol
This signature describes how a combination of various arguments produces an object which is a defined symbol.
The signature of more traditional mathematical functions such as exp
is given as:
real -> real
complex -> complex
These signatures relate the mathematical type of applying the given function to arguments of a particular mathematical type.
A more complicated example is the definition of limit, which has the signature:
(algebraic,algebraic) -> tendsto
[ type=direction ](algebraic,algebraic) -> tendsto(direction)
This example shows a typed version of the "inner product" operator. Conceptually, this operator takes an n-vector as the first argument, an n×m-matrix and yields an m-vector. The type in the example below can be described as "For all αtype, nZ, mZ. vector(n,α) × matrix(n,m,α) → vector(m,α)". It is a so-called "universal type", i.e. for all types and all natural numbers n and m, the inner product operator has the appropriate function type. Note that in this type system, types include typed variables themselves, for instance n and m are typed to be integers. Furthermore, note that types have a rich set of "type constructors" like the vector type constructor that takes a natural number n and a type α to construct the type of n-vectors with elements of type α .
<math> <semantics> <ci>inner-product</ci> <annotation-xml definitionURL="deptypes-system.html" encoding="content-MathML"> <apply> <csymbol definitionURL="deptypes-system.html#all_type"/> <bvar> <semantics> <ci>&alpha;</ci> <annotation-xml definitionURL="deptypes-system.html" encoding="content-MathML"> <csymbol definitionURL="deptypes-system.html#type_type"/> </annotation-xml> </semantics> </bvar> <bvar> <semantics> <ci>n</ci> <annotation-xml definitionURL="deptypes-system.html" encoding="content-MathML"> <csymbol definitionURL="deptypes-system.html#int_type"/> </annotation-xml> </semantics> </bvar> <bvar> <semantics> <ci>m</ci> <annotation-xml definitionURL="deptypes-system.html" encoding="content-MathML"> <csymbol definitionURL="deptypes-system.html#int_type"/> </annotation-xml> </semantics> </bvar> <apply> <csymbol definitionURL="deptypes-system.html#fun_type"/> <apply> <csymbol definitionURL="deptypes-system.html#vector_type"/> <cn>n</cn> <cn>α</cn> </apply> <apply> <csymbol definitionURL="deptypes-system.html#matrix_type"/> <cn>n</cn> <cn>m</cn> <cn>&alpha;</cn> </apply> <apply> <csymbol definitionURL="deptypes-system.html#vector_type"/> <cn>m</cn> <cn>alpha</cn> </apply> </apply> </apply> </annotation-xml> </semantics> </math>