Version: 11 July 2010
The MathML 3.0 Candidate Recommendation set out as an exit criterion that each feature of the specification be demonstrated by two conforming implementations. That goal has been met. This document describes how it was achieved, detailing the implementation work and the testing process, and discussing the results.
The Math Working Group created a MathML Test Suite for MathML 2.0 and has since updated and extended it to cover new functionality in MathML 3.0. The test suite now consists of 1680 individual test cases. It can be navigated via a variety of "views" so that implementors can choose test cases exercising only MathML 2.0 functionality, new MathML 3.0 functionality, or all MathML 3.0 functionality. The test suite also offers a view containing only test cases lying in the MathML for CSS profile.
The test suite is primarily aimed at testing the visual rendering of MathML expressions, providing a reference image for each test. This necessarily introduces an element of judgment in determining when an implementation passes a test, since the MathML specification only suggests renderings rather than mandating them. This is because MathML is used in a wide range of application areas, so renderings must be allowed to be, to some extent, context- and implementation-specific. To help implementors judge whether a rendering should be deemed acceptable, a description of the purpose of each test is given, along with other test case metadata. In general, the leeway for presentation rendering tests is small, while more liberty may be taken with rendering Content MathML. In a number of cases where questions arose, specific results were discussed in the Math WG and consensus decisions were made, sometimes leading to clarifications in the text of the MathML 3.0 specification.
To facilitate CR testing, an automated testing framework was developed where implementors could step through each test in the suite, comparing their rendering against the reference rendering. Each test could be marked as "passed", "some", "failed", "not tested" or "broken". The last status was used to help debug and improve the test suite, and no tests reported as broken remain in the test suite. Results were then submitted via a web form, and automatically tabulated. The results are publicly available, and discussed in detail below.
Some features of MathML 3.0 are not amenable to testing by comparing renderings. For example, MathML 3.0 makes recommendations about how MathML expressions should be treated for data exchange, e.g. via OS clipboard services. In these cases, the test suite still contains tests, but merely describes the expected results. The largest group of such tests concerns the notion of strict Content markup, introduced in MathML 3.0. The testing of this aspect of MathML 3 is discussed separately below.
Six of the implementations are from organizations that participate in the Math Working Group, and two are not. Of the six from member organizations, two were largely carried out by teams not directly involved with the Math Working Group. Thus about half of the implementation work was done by WG members and half by people working from the specification alone.
All implementations are currently in development, but ctop and pmml2tex are continuously publicly available as an Open Source project, as is MathJax. An official MathJax release is slated for the coming weeks, and several other implementations are not far behind, with release dates later in 2010.
At least four other projects are known to be in the midst of implementing at least some features from MathML 3.0: Mozilla, jEuclid, Gemse, and fmath. In addition, two specialized implementations provided validation of the algorithm for representing all Content MathML expressions in MathML 3 strict format. These are c2s and cmml2om, described below.
It is important to note that the MathML 3.0 CR exit criterion was that each feature individually be implemented by two conforming implementations, not that there be two conforming implementations of the entirety of MathML 3.0. This choice reflects the nature of MathML, in that it provides vocabulary for a wide range of math communication goals. Most notably, MathML provides vocabularies for both presentational and semantic purposes, and it is typical for an application to use one or the other, but not both.
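The difference between a per-feature criterion and a whole-specification criterion can be made concrete with a small tabulation sketch. It assumes a hypothetical data layout (test name mapped to each implementation's status) and hypothetical function names; only the five status values come from the testing framework described above. As a conservative assumption, a "some" result does not count as a pass here.

```python
# Status values that the CR testing framework allowed for each test.
STATUSES = {"passed", "some", "failed", "not tested", "broken"}


def passing_counts(results):
    """Map each test to the number of implementations that reported 'passed'.

    `results` is a hypothetical structure: {test_name: {implementation: status}}.
    """
    counts = {}
    for test, by_impl in results.items():
        for status in by_impl.values():
            if status not in STATUSES:
                raise ValueError(f"unknown status {status!r} for {test}")
        # Only a full pass counts toward the criterion; 'some' does not.
        counts[test] = sum(1 for s in by_impl.values() if s == "passed")
    return counts


def criterion_met(results, required=2):
    """Per-test exit criterion: at least `required` passing implementations."""
    return {test: n >= required for test, n in passing_counts(results).items()}


def coverage(results, required=2):
    """Fraction of tests that meet the per-test criterion."""
    met = criterion_met(results, required)
    return sum(met.values()) / len(met) if met else 0.0


def pass_rate(results, impl):
    """Fraction of all tests one implementation passed; missing entries count as untested."""
    statuses = [by_impl.get(impl, "not tested") for by_impl in results.values()]
    return statuses.count("passed") / len(statuses) if statuses else 0.0


# Hypothetical data: three tests, three implementations A, B and C.
results = {
    "test-1": {"A": "passed", "B": "passed", "C": "failed"},
    "test-2": {"A": "passed", "B": "not tested", "C": "passed"},
    "test-3": {"A": "passed", "B": "some", "C": "not tested"},
}

print(criterion_met(results))  # {'test-1': True, 'test-2': True, 'test-3': False}
print(pass_rate(results, "A"))  # 1.0
```

In this made-up data, implementation A passes every test, yet test-3 still falls short of the criterion because no second implementation fully passes it. Whether a "some" result should be accepted was, in practice, a case-by-case WG judgment, as in the RichEdit LineBreak case.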
However, in order to validate that MathML 3 is consistent, one of the implementations (MathPlayer) did attempt to implement all of MathML 3.0, passing 97% of all tests. In this sense, it can be considered a reference implementation. A second implementation (ctop) also implements the great majority of MathML, passing 74% of all tests.
The other five tested implementations did not attempt full implementations within the CR period and did not test all implemented features. In most cases, this is merely a matter of time and resources, and full implementations are planned (possibly restricted to either the content or presentation vocabularies). In these cases, features were selectively implemented and tested, in order to demonstrate coverage by two conforming implementations for all features.
Current results from CR testing are publicly available. The following discussion refers to the linked page.
The MathML 3 test suite is organized hierarchically by feature area. At the top level, there are eight broad areas of functionality. Five of these areas have 100% coverage by at least two conforming implementations for each test: General, Characters, Presentation, Content, and ErrorHandling. Note that these are the areas of functionality already present in MathML 2.0, so one would expect a solid base of coverage. However, MathML 3 does add functionality in these areas, particularly in Presentation, so the 100% coverage figure does represent real implementation effort.
The remaining three areas, Topics, Strict Content, and Torture Tests, do not have 100% coverage, but closer examination of the results shows that the CR exit criteria are adequately met. Each area is discussed in detail below.
The Topics area of the test suite contains tests covering advanced or complex functionality. Many of these tests have been contributed over time in areas where implementation has proven to be difficult or error-prone. There are nine sub-topics in the Topics area, three of which are new functionality in MathML 3, while the remainder represent functionality present to some degree in MathML 2.0. Implementation coverage in the six MathML 2.0 sub-topics is 100%, as is the coverage of MathML 3.0 BiDi functionality. That leaves just two MathML 3.0 sub-topics, Elementary Math Examples and LineBreak, without 100% coverage.
The LineBreak section falls short by a single test (linebreaking005-linebreakstyle.xml), which has only one passing implementation. However, note that RichEdit is listed as partially passing this test. This is essentially due to poor test design: the test combines three different styles of duplicating operators on a linebreak in one test. In practice, only one style is used throughout a document, and since RichEdit is a document editor, it allows the linebreak style to be set at the document level but not at the equation level. Consequently, although RichEdit implements each of the three break styles, there was no easy way to test a single equation with three styles. It was therefore deemed that RichEdit passes the intent of the test, which gives 100% coverage of the LineBreak area.
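The mismatch between the test and a document-level setting can be illustrated with a toy Python model that works on plain strings rather than real layout. It assumes the three styles correspond to the MathML 3.0 `linebreakstyle` values `before`, `after` and `duplicate`; the function names are hypothetical.

```python
# Assumed: the three MathML 3.0 line-break styles for an operator at a break point.
STYLES = ("before", "after", "duplicate")


def break_at_operator(left, op, right, style):
    """Split `left op right` into two lines according to `style`."""
    if style == "before":     # operator begins the continuation line
        return [left, f"{op} {right}"]
    if style == "after":      # operator ends the first line
        return [f"{left} {op}", right]
    if style == "duplicate":  # operator appears in both places
        return [f"{left} {op}", f"{op} {right}"]
    raise ValueError(f"unknown style {style!r}")


def render_document(breaks, style):
    """Apply one document-wide style to every break, as a document editor might."""
    return [break_at_operator(left, op, right, style) for left, op, right in breaks]


# One equation exercising all three styles, as the test did...
mixed = [break_at_operator("a + b", "+", "c", s) for s in STYLES]
# ...versus a document where a single style is chosen once.
doc = render_document([("a + b", "+", "c"), ("x", "=", "y")], "duplicate")
print(doc)  # [['a + b +', '+ c'], ['x =', '= y']]
```

A tool that exposes only `render_document`-style control can produce each of the three outputs in `mixed`, but only across separate documents, which is why RichEdit could not reproduce the test as written.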
The Elementary Math sub-topic is a demanding new feature in MathML 3.0. There is huge variation in styles of elementary math presentation worldwide, and MathML 3.0 supports a large number of them (including this one).
In the MathPlayer implementation, 5 tests are currently marked as partial or failed. This was essentially a time and resources decision, as the "east" and "west" locations for "carries" are comparatively rare, as is the "righttop" long division style. The MathPlayer implementor (Neil Soiffer, Math WG member) was convinced that implementing these options presents no new technical difficulties, but left them until the end due to other demands on his time. With an additional day of development, these tests could be passed at any point, but it was decided that he should instead focus on another implementation task, namely speech rendering.
Since a major impetus behind adding support for elementary math is accessibility, the Math WG felt it was important to validate that the new constructs can be rendered accessibly. Consequently, Soiffer has implemented speech rendering for these constructs in the MathFlow codebase. This work is ongoing, and results are being updated as they come in. However, as the results show, coverage is already quite good, well over 80%. Some tests check visual alignment, which has no meaningful counterpart in speech; these were not tested.
Work is also ongoing on implementing elementary math in the pmml2tex package maintained by David Carlisle (also a Math WG member). His original intent was to implement the MathML 3.0 elementary math features in his ctop package, and he has completed the most difficult aspects, which involve parsing out the construct and computing column locations for carries, etc. However, using the MathML 2.0 table model to implement the rendering is complex and of dubious value, so Carlisle is instead porting the code to pmml2tex, where the low-level TeX positioning code can be used. The current coverage of this effort can be inspected on the results page.
Consequently, while not all elementary math options have been fully implemented, all constructs have been, and rounding out the options presents no new technical issues. The implementors are confident that 100% coverage will be completed within a couple of weeks. In light of the fact that one implementation is essentially complete, and two additional implementations, one a speech renderer, are within a few days of being complete, the Math WG felt that this was adequate to demonstrate the CR goal of two conforming implementations for the Elementary Math feature.
The Strict Content section is an atypical category in that the MathML 3 test suite was essentially created to be a rendering test suite, whereas rendering is somewhat orthogonal to the primary function of strict Content markup. Consequently, providing two implementations that render each strict content test case in traditional math notation is not strictly a requirement to demonstrate MathML 3 has met its CR requirements.
In theory, an XSL stylesheet that merely wrote out strict Content markup in prefix functional form would qualify as a conforming rendering, and the Math WG considered testing such a stylesheet. However, since it would have demonstrated only a complete visual rendering of strict Content markup, not a genuinely useful one, this was not done. Note that despite the nature of strict Content markup, the section has 91% coverage. This is mostly because most strict content can be rendered unambiguously in traditional math notation at minimal incremental cost, so both MathPlayer and ctop did this for most strict constructions.
If rendering is not the appropriate test of strict Content markup, this raises the question of what is. To a certain extent, strict Content requires no direct implementation, since it exists in large part to provide a Content MathML representation isomorphic to the OpenMath semantic representation of mathematics, which can be checked by inspection. However, there are two related, testable assertions.
In addition, it is desirable that the correspondence capture the likely semantic intent of Content MathML 2.0 expressions. But since MathML 2.0 did not provide a rigorous definition of semantics, this is a matter of judgment, and not a testable assertion.
To validate assertions 1 and 2, two implementations of the algorithm in Chapter 4 were produced, and both were applied to all the expressions in the Content section of the test suite. The two conversions were compared and manually checked. The first, the c2s XSL stylesheet developed by Robert Miner (Math WG co-chair) and David Carlisle (Math WG member), directly implements the algorithm from Chapter 4 of the MathML 3.0 specification. It provided the desired correspondence in all cases, and was used to generate the reference markup in the Strict Content section of the test suite.
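To give a feel for the kind of rewrite such a conversion performs, the Python sketch below applies a single step: replacing empty operator elements such as `<plus/>` with an equivalent `csymbol` that names an OpenMath content dictionary. This is not the c2s stylesheet. It assumes un-namespaced input, and the operator table is a small hand-picked subset; the full Chapter 4 algorithm also handles qualifiers, bound variables, non-strict attributes and much more.

```python
import xml.etree.ElementTree as ET

# Hand-picked subset of operator element -> OpenMath content dictionary.
CD_FOR_OPERATOR = {
    "plus": "arith1",
    "times": "arith1",
    "divide": "arith1",
    "sin": "transc1",
    "cos": "transc1",
    "eq": "relation1",
    "lt": "relation1",
}


def to_strict(elem):
    """Recursively replace empty operator elements (e.g. <plus/>) with csymbol."""
    for i, child in enumerate(list(elem)):
        if child.tag in CD_FOR_OPERATOR and len(child) == 0:
            sym = ET.Element("csymbol", {"cd": CD_FOR_OPERATOR[child.tag]})
            sym.text = child.tag     # the symbol name inside the dictionary
            sym.tail = child.tail    # preserve any surrounding whitespace
            elem[i] = sym
        else:
            to_strict(child)
    return elem


src = "<apply><plus/><ci>x</ci><apply><sin/><ci>y</ci></apply></apply>"
strict = ET.tostring(to_strict(ET.fromstring(src)), encoding="unicode")
print(strict)
```

The printed result is `<apply><csymbol cd="arith1">plus</csymbol><ci>x</ci><apply><csymbol cd="transc1">sin</csymbol><ci>y</ci></apply></apply>`, i.e. the same application structure with each operator expressed as an explicit OpenMath symbol.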
The second implementation is an adaptation of an older XSL script, cmml2om, which converts Content MathML directly to OpenMath and then back-translates it to strict Content markup. The point was to verify that the algorithm in the specification produces the same interpretation of author intent in ambiguous cases as the long-established cmml2om script, which had become something of a de facto standard in the OpenMath community. The outputs of the cmml2om and c2s scripts were verified to correspond in mathematical structure. The cmml2om script provides only partial coverage of the test suite cases because, owing to its history, it strips out some non-strict attributes that should be converted to annotations, among other limitations. Since the purpose of validating the semantics of the algorithm in Section 4.6 could be achieved without full coverage, extending cmml2om was judged not worthwhile at this time.
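One plausible way to check that two conversions correspond in mathematical structure is sketched below in Python. This is an assumption about the method, not a description of the comparison actually performed, and the function names are hypothetical. Each output is reduced to its element tree while ignoring annotation elements and every attribute except `cd`, and a `semantics` wrapper is looked through, so markup that differs only in stripped attributes or annotations still compares equal.

```python
import xml.etree.ElementTree as ET

# Elements that carry annotations rather than mathematical structure.
IGNORED_TAGS = {"annotation", "annotation-xml"}


def structure(elem):
    """Reduce an element to a comparable tuple: (tag, cd attribute, text, children)."""
    kids = [c for c in elem if c.tag not in IGNORED_TAGS]
    if elem.tag == "semantics" and kids:
        # With annotations removed, a semantics wrapper adds nothing; compare its first child.
        return structure(kids[0])
    text = (elem.text or "").strip()
    return (elem.tag, elem.get("cd"), text, tuple(structure(c) for c in kids))


def same_structure(xml_a, xml_b):
    """True if two serialized expressions have the same mathematical structure."""
    return structure(ET.fromstring(xml_a)) == structure(ET.fromstring(xml_b))


plain = '<apply><csymbol cd="arith1">plus</csymbol><ci type="real">x</ci><cn>1</cn></apply>'
annotated = (
    '<semantics><apply><csymbol cd="arith1">plus</csymbol><ci>x</ci><cn>1</cn></apply>'
    '<annotation-xml encoding="MathML-Content"><ci>z</ci></annotation-xml></semantics>'
)
different = '<apply><csymbol cd="arith1">times</csymbol><ci>x</ci><cn>1</cn></apply>'

print(same_structure(plain, annotated))  # True
print(same_structure(plain, different))  # False
```

Ignoring attributes other than `cd` mirrors the limitation noted above: an output that drops non-strict attributes can still be judged structurally equivalent, while a different symbol or content dictionary is always reported as a mismatch.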
The Torture Tests section of the test suite is aimed primarily at benchmarking non-functional aspects of implementations. As such, two conforming implementations are not strictly required to demonstrate that MathML 3.0 has met its CR requirements. Two of the subsections deal with very large expressions and with large numbers of expressions on a single page. The other three are more specialized. The LineBreak Extreme tests exercise problematic constructs with no good break point, where the specification requires no specific behavior. The Varying Token Extreme tests require access to math fonts containing glyphs and character mappings that are not generally available.
The BiDi-Elementary tests are somewhat different in nature. They were classified as torture tests because they require both MathML 3.0 Elementary Math and BiDi functionality. Because of the division of labor between CR implementors, two implementations that covered both of these complex areas simultaneously are not yet available, though note that the reference implementation does provide at least one implementation for most tests.