W3C

Bound Variables in MathML

W3C Working Group Note 10 November 2003

This version:
http://www.w3.org/TR/2003/NOTE-mathml-bvar-20031110/
Latest version:
http://www.w3.org/TR/mathml-bvar/
Previous version:
This is the first version of this Note.
Editors:
Stan Devitt - Invited Expert, StratumTek
Michael Kohlhase, DFKI

Abstract

This Note examines the treatment of bound variables in Content MathML. Bound variables are central representational primitives in mathematical languages. They allow one to express functions, quantification, and operators with qualifiers. The first edition of the MathML 2.0 Recommendation [MathML2] was somewhat vague about the identity conditions on bound variables, and as a consequence Content MathML applications were left to guess the exact meaning. This Note provides some of the rationale behind how this has been clarified in the second edition [MathML22e].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This Note provides a self-contained explanation of bound variables in Content MathML. It contains non-normative interpretations of the MathML 2.0 Recommendation (Second Edition) [MathML22e] and it provides guidelines for representing mathematical objects in Content MathML. Please direct comments and report errors in this document to www-math@w3.org.

This document has been produced by the W3C Math Working Group as part of the W3C Math Activity (Activity statement). The goals of the Working Group are discussed in the Working Group Charter. A list of participants in the W3C Math Working Group is available.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Patent disclosures relevant to this Note may be found on the Math Working Group's patent disclosure page.

Table of Contents

1 Introduction
2 The problem
3 Background on bound variables in logics and mathematical foundations
4 Analysis
5 Using definitionURL for Bound Variable Identification.
    5.1 Why definitionURL?
    5.2 Clarifying the Role of definitionURL in MathML
6 Conclusion

Appendices

A Bibliography
B Definitions of Equality


1 Introduction

Bound variables are central representational primitives in mathematical languages. They allow one to express functions, quantification, and operators with qualifiers. The first edition of the MathML 2.0 Recommendation [MathML2] was vague about the identity conditions on bound variables. For example, all examples in that Recommendation use simple ci elements, for which there is never any doubt when an instance of the bound variable is the same as the defining occurrence. In fact, the question of identity only seriously arises when one starts to use more elaborate presentations inside of the ci elements. Content MathML applications were left trying to decide when two presentations are equal, possibly in the presence of heavy use of semantics tags and/or styling attributes. How extensively could a ci element be modified (even for presentational purposes) without changing it mathematically?

The safe answer to this question was "not at all", but there were legitimate examples where at very least one of the presentations needed to be changed for expository purposes while retaining the mathematical identity. Furthermore, the first edition of the MathML 2.0 Recommendation [MathML2] did not elaborate on this point.

This Note investigates the recognition of instances of bound variables and shows how existing mechanisms for dealing with semantics can be used to make such identifications explicit without depending on the direct comparison of the presentational markup. This is the approach that has been used to clarify the notion of identity of bound variables in the second edition of the MathML 2.0 Recommendation [MathML22e].

2 The problem

When complex Presentation MathML annotations are used, the association of a particular instance of a ci with its defining occurrence in the bvar element can be difficult to establish, and the first edition of the MathML 2.0 Recommendation [MathML2] did not specify a way to do so. Note that a strict definition of identity based solely on the XML structure (possibly based on the XML Information Set) would be too restrictive.

In most purely Content MathML cases (including all the examples in the the first edition of the MathML 2.0 Recommendation [MathML2]) there is no problem; the content of the bvar element is a single ci, whose content is just a simple character string. There is little doubt that any ci in the body of the apply element that have the same name after whitespace normalizations is bound by the bvar declaration.

<apply>
  <int/>
  <bvar><ci>x</ci></bvar>
  <interval><cn>0</cn><cn>1</cn></interval>
  <apply>
    </plus>  
    <apply><power/><ci>x</ci><cn>2</cn></apply>
    <apply><times/><cn>2</cn><ci>x</ci></apply>
    <cn>5</cn>
  </apply>
</apply>

However, the Recommendation does allow arbitrary Presentation MathML inside a ci, element and this feature is often used for variables with sub/superscripts, or variables in different font families. In these cases, it becomes less clear which variables are bound by the bvar declaration. For example a simple color change might be used to draw attention to one of the instances of a bound variable as in the following variant of the the above example.

<apply>
  <int/>
  <bvar>
    <ci>
      <mstyle color="red">
	<msub><mi>x</mi><mn>2</mn></msub>
      </mstyle>
    </ci>
  </bvar>
  <interval><cn>0</cn><cn>1</cn></interval>
  <apply>
    </plus>  
    <apply><power/><ci><msub><mi>x</mi><mn>2</mn></msub></ci><cn>2</cn></apply>
    <apply><times/><cn>2</cn><ci><msub><mi>x</mi><mn>2</mn></msub></ci></apply>
    <cn>5</cn>
  </apply>
</apply>

The goal is to clarify how to use presentational information on bound variables without losing track of their essential role as bound variables.

3 Background on bound variables in logics and mathematical foundations

The problem of handling bound variables has been studied extensively in logic and the foundations of mathematics in the last one hundred years. In fact the handling of bound variables was the main contribution of Frege's treatment of first-order logic that is the basis for the set-theoretic foundation of mathematics. This section defines the terminology that will be used in the rest of this document.

A variable is called a [Definition: bound variable] in an expression of the form O x.B[x] , if O is a binding operator. We use B[x] for an expression (called the [Definition: body]) that may contain the identifier x. The occurrence of x directly after the operator O is called the [Definition: declaring occurence] and those in the body are called the [Definition: bound occurrences].

In first-order logic (and all other foundations of mathematics), bound variables have two crucial properties:

Both bound variable identity (BVI) and bound variable names (BVN) are essential aspects of bound variables.

Bound variable identity provides the semantical essence of bound variables. There are formulations of mathematics without (explicit) bound variables (Category theory, combinatory logic, De Bruijn Indices), but they are usually hard for humans to understand.

Bound variable names help humans understand the formulae involved. In fact, un-intuitive renaming can render complex formulae unintelligible. This is also the main reason why the variable-free logics mentioned above have remained theoretical tools rather than communication devices.

4 Analysis

In the first edition of MathML 2.0 [MathML2] Content MathML only explicitly supported a BVN-based representation of bound variables (only the Presentation MathML ci content could be used for identification). As a consequence, it was difficult to decide the identity question, a resolution of which was needed for formal treatment of Content MathML formulae/objects.

This is not a problem in most cases, where names are simple. However, when names involve complex (Presentation MathML) objects, the BVN-based specification potentially leaves open the issue of when a bound occurrence refers to a declaring occurrence -- the central question to be resolved in order to decode the meaning of the expression.

There are really only a few sensible choices for identifying a bound variable with its defining occurrence. They include:

The XML-Infoset-equality approach is impractical for complicated presentations where the whole issue of equivalence of presentations becomes as complicated as the issue of equivalence of images. This BVN-based identification schema leads at best to computationally expensive equality notions.

However, explicit links back to the defining occurrence of a bound variable can provide an optional alternative identification schema especially for use in the more difficult cases. These links make the bound variable identification (BVI) explicit and the implemenation makes natural and intuitive use of existing machinery in MathML. The advantages of such an approach include:

specification:

By design, presentation is permitted in Content MathML in only two rather controlled ways. It may occur (much like an image) inside a ci element, or it may be tied in via the semantics element. The troublesome case is the content of ci. The view taken when this was introduced was that such presentation was analogous to embedding an image. It was to be an indecomposable token. Anything more complicated was to be done using semantics tags. By analogy with images, two ci elements are equal just if they are essentially identical which may formally defined by equality of the src attributes. Expecting ci elements with two completely different presentation contents (albeit visually indistinguishable) to be equivalent would be the same as expecting images in two different files and/or formats to be identical. Short of "decomposing" the objects and doing some potentially complicated analysis, equivalence could not possibly be determined.

computation:

When there is a need to transform Content MathML representation into other content formats, the binding relation of the bound variables must be identified. In a BVN-based approach, this entails the need to compute the equivalence of variable names as Presentation MathML representations. The sophisticated equality tests that may be needed here are not very well-supported by XML tools - some sort of specialized XML diff program appears to be implied. Thus, there are too many factors such This is a serious problem the adoption of Content MathML in more formal settings.

compatibility:

OpenMath [OpenMath] is based on a BVI-based approach; OMV elements are empty, they do not have names.

5 Using definitionURL for Bound Variable Identification.

Compatibility with existing applications is important. Thus the solution must augment the existing BVN-based treatment of bound variables with a BVI-based treatment that allows the user to make the identity condition of bound variables clear, where it is not obvious from the name alone.

This can be accomplished through the use of the definitionURL attribute on the bound ci element to point to the declaring instance as identified by the bvar element, as in the following example

<lambda>
  <bvar>
    <ci id="the-boundvar">complex presentation</ci>
  </bvar>
  <apply>
    <plus/>
    <ci definitionURL="#the-boundvar">complex presentation</ci>
    <ci definitionURL="#the-boundvar">complex presentation</ci>
  </apply>
</lambda>

Instead of computing whether "complex presentation" is identical in each occurrence, it is only necessary to check whether the definitionURL point to the same declared ci. The content of the definitionURL is a URI, which has well-understood identity conditions (URIs are the basis of the Web).

Of course, use of this BVI-based approach to bound variables is a strictly optional way of clarifying bound variable semantics.

5.1 Why definitionURL?

The definitionURL attribute is used in Content MathML to point to the (external) definition of an operator. The relevant part of the Recommendation is 4.3.2.3, where it says

" definitionURL [...] points to an external definition of the semantics of the symbol or construct being declared. The value is a URL or URI that should point to some kind of definition. This definition overrides the MathML default semantics. [...] "

The main statement here is that the value of the definitionURL attribute is a URI points to a definition of the semantics of the element. In the foundation of mathematics it is an accepted fact that bound variables behave just like symbols/constants, only that the scope (and definition) of bound variables are more local than symbols/constants. In fact most foundational systems treat the symbols/constants as bound at the theory level. There is a very clear understanding that the declaring instance (in MathML the bvar element) behaves like the (local) definition of the symbol. In fact many modern foundational systems make this a key design decision.

Using definitionURL is conformant to the current DTD and content grammar.

5.2 Clarifying the Role of definitionURL in MathML

One sometimes sees Content MathML representations like the following one.

<declare definitionURL=".../formalpowerseries"><ci>F</ci></declare>
<apply>
  <set/>
  <bvar><ci>F</ci></bvar>
  <condition>
    <apply>
      <in/>
      <ci>F</ci>
      <csymbol definitionURL="XXX">My Special Formal PSeries</csymbol>
    </apply>
  </condition>
  <ci>x</ci>
</apply>

Here the declare element is used to give the bound variable ci element a definitionURL to attribute a property to a MathML element (in this case being a formal power series), possibly to allow presentation algorithms may make use of this extra knowledge to display the cis in a different manner.

This of course conflicts with the use of definitionURL as recommended above, since then one cannot possibly use definitionURL to refer back to the defining instance of the bvar, and at the same time to the external definition of the special formal power series.

This example is somewhat contrived in that the definitionURL is being used in the spirit of a type declaration. In some sense, this example actually reflects a misunderstanding of the definitionURL attribute. The definitionURL is supposedly reserved for "Definitions", i.e. for statements that fix the meaning of an element (possibly up to isomorphism).

No matter how the points are argued, the example represents a legitimate authoring need and must be accommodated in some manner. The only conflict comes when one tries to use the BVI style of linking bound variables. Fortunately, in that case, it is a prime candidate for the typing through the use of the new general typing mechanism for Content MathML [MathMLTypes], so no functionality is lost: the above example can be re-represented explicitly by

<declare>
  <ci definitionURL="#the-bvar">F</ci>
  <semantics>
    <ci definitionURL="#the-bvar">F</ci>
    <annotation-xml definitionURL="types.html#type_of">
      <csymbol definitionURL=".../formalpowerseries"/>
    </annotation-xml>
  </semantics>
</declare>
<apply>
  <set/>
  <bvar><ci id="the-bvar">F</ci></bvar>
  <condition>
    <apply>
      <in/>
      <ci definitionURL="#the-bvar">F</ci>
      <csymbol definitionURL="XXX">My Special Formal PSeries</csymbol>
    </apply>
  </condition>
  <ci>x</ci>
</apply>

The declare merely associates the ci element with an appropriate semantics tag which provides the type and all is well. Note that in this case, the ci element that is the first child of the declare element also links back to the ci element in the bvar element. Clearly, the definitionURL attribute on the ci element in the first child of the declare element restricts the substitution to those ci that carry a matching definitionURL attribute. Without this, the substitution would have affected other <ci> F </ci> elements outside the scope of the top apply element shown in the example, but inside the math element that governs the scope of the declare.

6 Conclusion

This Note provides guidelines for representing mathematical objects containing bound variables in Content MathML. The Note represents the current consensus in the Math Working Group of the World Wide Web Consortium (W3C), but is not normative.

A Bibliography

MathMLTypes
Stan Devitt, Michael Kohlhase; Structured Types in MathML 2.0W3C Note 2003. (http://www.w3.org/TR/2003/NOTE-mathml-types-20031110/)
MathML2
David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier, Mathematical Markup Language (MathML) Version 2.0 W3C Recommendation 21 February, 2001 (http://www.w3.org/TR/2002/WD-MathML2-20021219/)
MathML22e
David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier, Mathematical Markup Language (MathML) Version 2.0 (2nd Edition) W3C Recommendation 21 October, 2003 (http://www.w3.org/TR/2003/REC-MathML2-20031021/)
OpenMath
O. Caprotti, D. P. Carlisle, A. Cohen (editors); The OpenMath Standard, February, 2000 (http://www.openmath.org/standard)

B Definitions of Equality

The definition of equality of expressions for purposes of bvar and declare is not discussed in version 2.0 of the Recommendation beyond noting that the content of ci elements is treated much like an image of the formula and then systematically basing all examples on a very simple notion of equality of the element content as ASCII strings. While this is a very simple example of XML information set equality, there are a number of different definitions that strictly speaking could not be ruled out. This could be one of many plausible criteria, including:

For purposes of this Note, the notion of equality has generally been based on XML information sets after white space normalization. Roughly speaking, this means that if all attributes that are present have the same value as strings, and recursively, the element content is the same the same, then the objects are identical.