Parsing MathML

A.1 Use of MathML as Well-Formed XML

DTD and W3C XML Schema
Issue update_schema	`wiki (member only)`
DTD and W3C XML Schema need updating to MathML3
Resolution	None recorded

A MathML document must be a well-formed XML document using elements in the MathML namespace as defined by this specification. However, it is not required that the document refer to any specific Document Type Definition (DTD) or schema that specifies MathML. It is sometimes advantageous not to specify such a language definition: these files are large, often much larger than the MathML expression itself, and unless they have been previously cached by the MathML application, the time taken to fetch the DTD or schema may have an appreciable effect on the processing of the MathML document.
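
As a rough illustration of the first requirement, the following Python sketch parses an expression that carries no DOCTYPE declaration and checks that its root element is `math` in the MathML namespace. The helper name `is_mathml_root` and the sample strings are invented for this illustration; only the standard library module `xml.etree.ElementTree` is assumed.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def is_mathml_root(source: str) -> bool:
    """Return True if source is well-formed XML whose root is math in the MathML namespace."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be a MathML document.
        return False
    # ElementTree spells namespaced names as "{namespace-uri}local-name".
    return root.tag == "{%s}math" % MATHML_NS


sample = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
no_namespace = "<math><mi>x</mi></math>"
not_well_formed = "<math><mi>x</math>"

print(is_mathml_root(sample))
print(is_mathml_root(no_namespace))
print(is_mathml_root(not_well_formed))
```

This checks well-formedness and the namespace of the root only. It says nothing about validity against any of the schemas discussed in this appendix.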

Note also that if no DTD is specified with a DOCTYPE declaration, entity references (for example, to refer to MathML characters by name) may not be used. The document should be encoded in an encoding (for example UTF-8) in which all needed characters may be encoded as character data, or characters may be referenced using numeric character references, for example &#x222B; rather than &int;.
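
The difference can be observed with any non-validating XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module as one such parser (an assumption of this example, not something this specification prescribes); the helper `parses` and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

NS_DECL = 'xmlns="http://www.w3.org/1998/Math/MathML"'

# The same integral sign, once as a named entity and once as a
# numeric character reference. Neither document has a DOCTYPE.
with_entity = "<math %s><mo>&int;</mo></math>" % NS_DECL
with_ncr = "<math %s><mo>&#x222B;</mo></math>" % NS_DECL


def parses(source: str) -> bool:
    """Return True if the parser accepts source as well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(parses(with_entity))  # the named entity is undefined without a DTD
print(parses(with_ncr))     # numeric references need no declaration

mo = ET.fromstring(with_ncr)[0]
print(mo.text == "\u222b")
```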

If a MathML fragment is parsed without a DTD, in other words as a well-formed XML fragment, it is the responsibility of the processing application to treat the white space characters occurring outside of token elements as not significant.
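
One possible way for an application to meet this responsibility is to discard whitespace-only character data outside token elements after parsing. The following Python sketch does this on an `xml.etree.ElementTree` tree. The function name and the set of token elements are assumptions made for the example: the set is taken from the `Pres-token` pattern of the presentation grammar plus `cn`, `ci` and `csymbol`, and an application may need a different list.

```python
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/1998/Math/MathML}"
# Assumed token set for this sketch (see the lead-in above).
TOKENS = {M + name for name in ("mi", "mo", "mn", "mtext", "ms", "cn", "ci", "csymbol")}
XML_WS = " \t\r\n"  # the four XML white space characters


def drop_insignificant_space(elem, inside_token=False):
    """Remove whitespace-only character data that lies outside token elements."""
    is_token = inside_token or elem.tag in TOKENS
    if not is_token and elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None
    for child in elem:
        drop_insignificant_space(child, is_token)
        # child.tail is character data of elem, so elem's status decides.
        if not is_token and child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None


source = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>x</mi>
    <mo> + </mo>
    <mn>1</mn>
  </mrow>
</math>"""

root = ET.fromstring(source)
drop_insignificant_space(root)
mrow = root[0]
print(ET.tostring(root, encoding="unicode"))
```

White space inside the `mo` token is left untouched here; how it is treated inside token elements is a separate matter from the rule above.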

However, in many circumstances, especially while producing or editing MathML, it is useful to use a language definition, to constrain the editing process or to check the correctness of generated files. The following section, Section A.2 Using the RelaxNG Schema for MathML3, discusses the RelaxNG Schema for MathML3 [RelaxNG], which forms a normative part of the specification. Following that, Section A.3 Using the MathML DTD and Section A.4 Using the MathML XML Schema discuss alternative language definitions using document type definitions (DTD) and the W3C XML Schema language [XMLSchemas], both of which are derived automatically from the normative RelaxNG schema. Note that the schema definition of the language is currently stricter than the DTD version. That is, a schema-validating processor will declare invalid some documents that are declared valid by a (DTD) validating XML parser. This is partly because the XML Schema language can express additional constraints not expressible in the DTD, and partly because, for reasons of compatibility with earlier releases, the DTD is intentionally forgiving in some places and does not enforce constraints that are specified in the text of this specification.

A.2 Using the RelaxNG Schema for MathML3

MathML documents should be validated using the RelaxNG Schema for MathML, either in the XML encoding (http://www.w3.org/Math/RelaxNG/mathml3/mathml3.rng) or in the compact notation (http://www.w3.org/Math/RelaxNG/mathml3/mathml3.rnc), which is also shown below.

In contrast to DTDs, there is no in-document method to associate a RelaxNG schema with a document.

We provide five RelaxNG schemata for sub-languages of MathML3:

The grammar for full MathML
The grammar for Presentation MathML without content elements mixed in
The grammar for strict Content MathML3
The grammar for pragmatic Content MathML3 without presentation MathML in token elements
The grammar for the deprecated parts of MathML

Editorial note: MiKo
I think that this is no longer correct

We present them in detail in the following sections. As the compact notation for RelaxNG grammars is more readable, we use this format here.

Note that the RelaxNG grammars here are considerably more strict than the MathML2 DTDs (even in strict mode).

A.2.1 Full MathML

The RelaxNG schema for full MathML builds on the schemas describing the various parts of the language, which are given in the following sections. It can be found at http://www.w3.org/Math/RelaxNG/mathml3/mathml3.rnc.

#     This is the Mathematical Markup Language (MathML) 3.0, an XML
#     application for describing mathematical notation and capturing
#     both its structure and content.
#
#     Copyright 1998-2008 W3C (MIT, ERCIM, Keio)
# 
#     Use and distribution of this code are permitted under the terms
#     W3C Software Notice and License
#     http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
#
#
#     Revision:    mathml3.rnc,v 1.7 2008/11/09 00:24:40 dcarlis Exp $
#
#    Update to MathML3 and Relax NG: David Carlisle and Michael Kohlhase

default namespace m = "http://www.w3.org/1998/Math/MathML"


## the core, strict Content MathML
include "mathml3-strict.rnc"

## Content Expressions now allow pMathML in ci and csymbol
include "mathml3-pragmatic.rnc" {
 
}
 

## Presentation Expressions allow Content Expressions mixed in everywhere
include "mathml3-presentation.rnc"

## include the relevant content dictionaries
include "mathml3-cds-pragmatic.rnc"


## deprecated constructs
include "mathml3-deprecated.rnc"
 {
 
}

 
ContInPres |= ContExp

A.2.2 The Grammar for Presentation MathML

#     This is the Mathematical Markup Language (MathML) 3.0, an XML
#     application for describing mathematical notation and capturing
#     both its structure and content.
#
#     Copyright 1998-2008 W3C (MIT, ERCIM, Keio)
# 
#     Use and distribution of this code are permitted under the terms
#     W3C Software Notice and License
#     http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
#
#
#     Revision:    mathml3-presentation.rnc,v 1.8 2008/11/09 11:15:50 mkohlhas2 Exp $
#
#    Update to MathML3 and Relax NG: David Carlisle and Michael Kohlhase

default namespace m = "http://www.w3.org/1998/Math/MathML"


math.content |= ContInPres*

MathML.Common.attrib |= attribute class {xsd:NMTOKENS}?,attribute style {xsd:string}?


Browser-interface.attrib = attribute baseline {xsd:string}?,
                           attribute overflow {"scroll" | "elide" | "truncate" | "scale" | "linebreak"}?,
                           attribute altimg {xsd:anyURI}?,
                           attribute alttext {xsd:string}?,
       			   attribute type {xsd:string}?,
			   attribute name {xsd:string}?,	    
			   attribute height {xsd:string}?,
			   attribute width {xsd:string}?

math.attlist |= Browser-interface.attrib,attribute display {"block" | "inline"}?,
                attribute dir {"ltr" | "rtl"}?,
                linebreak.attrib

simple-size = "small" | "normal" | "big"

centering.values = "left" | "center" | "right"

named-space = "veryverythinmathspace" | "verythinmathspace" | "thinmathspace" | 
              "mediummathspace" | 
              "thickmathspace" | "verythickmathspace" | "veryverythickmathspace"
thickness = "thin" | "medium" | "thick"

# number with units used to specify lengths

length-with-unit = 
    xsd:string #{pattern="(-?([0-9]+|[0-9]*\.[0-9]+)(em|ex|px|in|cm|mm|pt|pc|%))|0"}
length-with-optional-unit = 
   xsd:string #{pattern="-?([0-9]+|[0-9]*\.[0-9]+)(em|ex|px|in|cm|mm|pt|pc|%)?"}

# This is just "infinity" that can be used as a length 
infinity = "infinity"

# colors defined as RGB 
RGB-color = xsd:string {pattern="#(([0-9]|[a-f]){3}|([0-9]|[a-f]){6})"}

# The mathematics style attributes. These attributes are valid on all
#     presentation token elements except "mspace" and "mglyph", and on no
#     other elements except "mstyle". 

Token-style.attrib = attribute mathvariant
		       {"normal" | "bold" | "italic" | "bold-italic" | "double-struck" | 
                        "bold-fraktur" | "script" | "bold-script" | "fraktur" | 
 			"sans-serif" | "bold-sans-serif" | "sans-serif-italic" | 
			"sans-serif-bold-italic" | "monospace" | 
                        "initial" | "tailed" | "looped" | "stretched"}?,
                     attribute mathsize {simple-size | length-with-unit}?,

                     attribute mathcolor {xsd:string}?,
   		     attribute mathbackground {xsd:string}?

truefalse = "true" | "false"

Operator.attrib = 
# this attribute value is normally inferred from the position of
# the operator in its "<mrow>"
   attribute form {"prefix" | "infix" | "postfix"}?,
   # set by dictionary, else it is "thickmathspace"
   attribute lspace {length-with-unit | named-space}?,
   # set by dictionary, else it is "thickmathspace"
   attribute rspace {length-with-unit | named-space}?,
   # set by dictionary, else it is "false"
   attribute fence {truefalse}?,
   # set by dictionary, else it is "false"
   attribute separator {truefalse}?,
   # set by dictionary, else it is "false"
   attribute stretchy {truefalse}?,
   # set by dictionary, else it is "true"
   attribute symmetric {truefalse}?,
   # set by dictionary, else it is "false"
   attribute movablelimits {truefalse}?,
   # set by dictionary, else it is "false"
   attribute accent {truefalse}?,
   # set by dictionary, else it is "false"
   attribute largeop {truefalse}?,
   attribute minsize {length-with-unit | named-space}?,
   attribute maxsize {length-with-unit | named-space | infinity | xsd:float}?


mglyph = element mglyph {MathML.Common.attrib,
                     attribute alt {xsd:string}?,
                     (attribute src {xsd:anyURI}| attribute fontfamily {xsd:string}),
		     attribute width {xsd:string}?,
		     attribute height {xsd:string}?,
		     attribute baseline {xsd:string}?,
		     attribute index {xsd:positiveInteger}?}


linethickness.attrib = attribute linethickness {length-with-optional-unit|thickness}
mline = element mline {MathML.Common.attrib,
      linethickness.attrib?,
      attribute spacing {xsd:string}?,
      attribute length {length-with-unit | named-space}?}

Glyph-alignmark = malignmark|mglyph

mi = element mi {MathML.Common.attrib,Token-style.attrib,(Glyph-alignmark|text)*}

mo = element mo {MathML.Common.attrib,Operator.attrib,Token-style.attrib,
                  linebreak.attrib,
                 (text|Glyph-alignmark)*}

mn = element mn {MathML.Common.attrib,Token-style.attrib,(text|Glyph-alignmark)*}

mtext = element mtext {MathML.Common.attrib,Token-style.attrib,(text|Glyph-alignmark)*}

ms = element ms {MathML.Common.attrib,Token-style.attrib,
                 attribute lquote {xsd:string}?,
		 attribute rquote {xsd:string}?,
		 (text|Glyph-alignmark)*}

# And the group of any token 
Pres-token = mi | mo | mn | mtext | ms

msub = element msub {MathML.Common.attrib,
                  attribute subscriptshift {length-with-unit}?,
                  ContInPres,ContInPres}

msup = element msup {MathML.Common.attrib,
                  attribute supscriptshift {length-with-unit}?, 
                  ContInPres,ContInPres}

msubsup = element msubsup {MathML.Common.attrib,
                     attribute subscriptshift {length-with-unit}?, 
                     attribute supscriptshift {length-with-unit}?, 
                     ContInPres,ContInPres,ContInPres}

munder = element munder {MathML.Common.attrib,
                         attribute accentunder {truefalse}?, 
                         ContInPres,ContInPres}

mover = element mover {MathML.Common.attrib,
                       attribute accent {truefalse}?, 
                       ContInPres,ContInPres}

munderover = element munderover {MathML.Common.attrib,
                                 attribute accentunder {truefalse}?, 
                                 attribute accent {truefalse}?, 
                                 ContInPres,ContInPres,ContInPres}

PresExp-or-none = ContInPres | none
mmultiscripts = element mmultiscripts{MathML.Common.attrib,
	                              ContInPres, 
				      (PresExp-or-none,PresExp-or-none)*,
				      (mprescripts,(PresExp-or-none,PresExp-or-none)*)?}
none = element none {empty}
mprescripts = element mprescripts {empty}

Pres-script = msub|msup|msubsup|munder|mover|munderover|mmultiscripts
linebreak-values = "auto" | "newline" | "indentingnewline" | "nobreak" | "goodbreak" | "badbreak"
mspace = element mspace {MathML.Common.attrib,
                         attribute width {length-with-unit | named-space}?,
	           	 attribute height {length-with-unit}?,
	           	 attribute depth {length-with-unit}?,
                         attribute spacing {text}?,
                   	 linebreak.attrib}

mrow = element mrow {MathML.Common.attrib,ContInPres*}

mfrac = element mfrac {MathML.Common.attrib,
                       attribute bevelled {truefalse}?,
                       attribute denomalign {centering.values}?,
		       attribute numalign {centering.values}?,
		       linethickness.attrib?,
		       ContInPres,ContInPres}
msqrt = element msqrt {MathML.Common.attrib,ContInPres*}

mroot = element mroot {MathML.Common.attrib,ContInPres,ContInPres}

mpadded-space = xsd:string {pattern="(\+|-)?([0-9]+|[0-9]*\.[0-9]+)(((%?)*(width|lspace|height|depth))|(em|ex|px|in|cm|mm|pt|pc))"}



mpadded-width-space = xsd:string {pattern="((\+|-)?([0-9]+|[0-9]*\.[0-9]+)(((%?) *(width|lspace|height|depth)?)|(width|lspace|height|depth)|(em|ex|px|in|cm|mm|pt|pc)))|((veryverythin|verythin|thin|medium|thick|verythick|veryverythick)mathspace)|0"}

mpadded = element mpadded {MathML.Common.attrib,
	                   attribute width {mpadded-width-space}?,
  			   attribute lspace {mpadded-space}?,
  			   attribute height {mpadded-space}?,
  			   attribute depth {mpadded-space}?,
  			   ContInPres*}

mphantom = element mphantom {MathML.Common.attrib,ContInPres*}

mfenced = element mfenced {MathML.Common.attrib,
                           attribute open {xsd:string}?,
  	                   attribute close {xsd:string}?,
  			   attribute separators {xsd:string}?,
			   ContInPres*}

notation-values = "actuarial"|"longdiv"|"radical"| 
                              "box"|"roundedbox"|"circle"| 
                              "left"|"right"|"top"|"bottom"|
                              "updiagonalstrike"|"downdiagonalstrike"| 
                              "verticalstrike"|"horizontalstrike" | "madruwb"
menclose = element menclose {MathML.Common.attrib,
                          attribute notation {list{notation-values*}}?,
			  ContInPres*}

# And the group of everything 
Pres-layout = mrow|mfrac|msqrt|mroot|mpadded|mphantom|mfenced|menclose

Table-alignment.attrib = attribute rowalign 
             {xsd:string {pattern="(top|bottom|center|baseline|axis)( (top|bottom|center|baseline|axis))*"}}?,
        attribute columnalign {xsd:string {pattern="(left|center|right)( (left|center|right))*"}}?,
        attribute groupalign {xsd:string}?

mtr.content = mtd
mtr = element mtr {Table-alignment.attrib, MathML.Common.attrib,(mtr.content)+}

mlabeledtr = element mlabeledtr {Table-alignment.attrib,MathML.Common.attrib,(mtr.content)*}

mtd = element mtd {MathML.Common.attrib,
                   Table-alignment.attrib,
                   attribute columnspan {xsd:positiveInteger}?,
  		   attribute rowspan {xsd:positiveInteger}?,
		   ContInPres*}

mtable.content = mtr|mlabeledtr
mtable = element mtable {Table-alignment.attrib,
                         attribute align {xsd:string}?,
			 attribute alignmentscope {xsd:string {pattern="(true|false)( true| false)*"}}?,
			 attribute columnwidth {xsd:string}?,
  			 attribute width {xsd:string}?,
  			 attribute rowspacing {xsd:string}?,
  			 attribute columnspacing {xsd:string}?,
  			 attribute rowlines {xsd:string}?,
  			 attribute columnlines {xsd:string}?,
  			 attribute frame {"none" | "solid" | "dashed"}?,
  			 attribute framespacing {xsd:string}?,
  			 attribute equalrows {truefalse}?,
  			 attribute equalcolumns {truefalse}?,
  			 attribute displaystyle {truefalse}?,
			 attribute side {"left"|"right"|"leftoverlap"|"rightoverlap"}?,
  			 attribute minlabelspacing {length-with-unit}?,
  			 MathML.Common.attrib,
			 (mtable.content)*}

maligngroup = element maligngroup {MathML.Common.attrib,
     attribute groupalign {"left" | "center" | "right" | "decimalpoint"}?}

malignmark = element malignmark {MathML.Common.attrib,attribute edge {"left" | "right"}?}

Pres-table = mtable|maligngroup|malignmark

mcolumn = element mcolumn {MathML.Common.attrib,
     attribute align {"left" | "right"}?,ContInPres*}

mstyle = element mstyle {MathML.Common.attrib,
                         linebreak.attrib,
                         attribute scriptlevel {xsd:integer}?,
                         attribute displaystyle {truefalse}?,
			 attribute scriptsizemultiplier {xsd:decimal}?,
  			 attribute scriptminsize {length-with-unit}?,
  			 attribute background {xsd:string}?,
  			 attribute veryverythinmathspace {length-with-unit}?,
  			 attribute verythinmathspace {length-with-unit}?,
			 attribute thinmathspace {length-with-unit}?,
                         attribute mediummathspace {length-with-unit}?,
                         attribute thickmathspace {length-with-unit}?,
                         attribute verythickmathspace {length-with-unit}?,
                         attribute veryverythickmathspace {length-with-unit}?,
                         linethickness.attrib?,
  			 Operator.attrib,Token-style.attrib,
			 ContInPres*}

merror = element merror {MathML.Common.attrib,ContInPres*}

maction = element maction {MathML.Common.attrib,
			   attribute actiontype {xsd:string}?,
  	                   attribute selection {xsd:positiveInteger}?,
  			   ContInPres*}

semantics-pmml = element semantics {semantics.attribs,PresExp, semantics-annotation*}

PresExp = Pres-token | Pres-layout | Pres-script | Pres-table 
	      |  mspace | mline | mcolumn |  maction | merror | mstyle
	      | semantics-pmml

ContInPres |= PresExp


Issue ednote_rnc_browserinterface_	`wiki (member only)`
rnc:browserinterface
Resolution	None recorded


Issue ednote_rnc_units-patterns_	`wiki (member only)`
rnc:units-patterns
Resolution	None recorded


Issue ednote_rnc_mathvariant_	`wiki (member only)`
rnc:mathvariant
Resolution	None recorded


Issue ednote_mglyph_alt_	`wiki (member only)`
mglyph_alt
Resolution	None recorded


Issue ednote_rnc_leftover-max_	`wiki (member only)`
rnc:leftover-max
Resolution	None recorded

Issue permissive_units wiki (member only)

more permissive lengths/widths

David wrote in an e-mail: length-with-unit doesn't allow white space (anywhere) which (if any) of the following do we want to allow " 2em ", "2 em", "- 2 em". Also it insists on starting with a digit or -, but do we want to allow ".5em" "-.5em"

However we do claim css compatibility here which may suggest some answers to the above http://www.w3.org/TR/CSS21/syndata.html#length-units.

css allows an optional leading + as well +2em css requires number to "immediately" follow any sign and the unit to "immediately" follow the number, which I think means no intervening white space. css <number> are allowed to start with a . so .5em is allowed. css insists on a digit following a . so 5.em is not allowed.

Once we have firm answers to the above it should be easy to drop the regexp back in, and make the text match.

I think we should not allow white space except at beginning and end but allow a leading + (a change from mathml2) and allow no digits before the ., but insist on digits after a . which would be [\-\+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)(em|ex|px|in|cm|mm|pt|pc|%))|0 as written this doesn't allow " 2em " but I think we can set white space trim properties to apply before the regex is checked (I'll check)

Resolution None recorded
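
To make the proposal in this issue easier to evaluate, the following Python sketch tries it against the candidate values mentioned in the e-mail. Two assumptions are made: the expression quoted above has an unbalanced closing parenthesis, so an opening parenthesis is added at the start to group the signed number with its unit; and `re.fullmatch` is used to imitate the implicit anchoring of XSD patterns. The names `PROPOSED` and `is_length` are hypothetical, and the optional trimming step only approximates the white space handling that is still to be checked.

```python
import re

# Assumed reading of the proposed pattern (see the lead-in above).
PROPOSED = re.compile(
    r"([\-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)(em|ex|px|in|cm|mm|pt|pc|%))|0"
)


def is_length(value: str, trim: bool = False) -> bool:
    """Check value against the proposed pattern, optionally trimming XML white space first."""
    if trim:
        value = value.strip(" \t\r\n")
    return PROPOSED.fullmatch(value) is not None


candidates = ["2em", "+2em", ".5em", "-.5em", "0", "5.em", "2 em", "- 2 em", " 2em "]
for candidate in candidates:
    print(repr(candidate), is_length(candidate), is_length(candidate, trim=True))
```

Under these assumptions, interior white space and `5.em` are rejected in both modes, while ` 2em ` is accepted only when trimming is applied before matching.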

A.2.3 The Grammar for Strict Content MathML3

The grammar for Strict Content MathML3 can be found at http://www.w3.org/Math/RelaxNG/mathml3/mathml3-strict.rnc.

#     This is the Mathematical Markup Language (MathML) 3.0, an XML
#     application for describing mathematical notation and capturing
#     both its structure and content.
#
#     Copyright 1998-2008 W3C (MIT, ERCIM, Keio)
# 
#     Use and distribution of this code are permitted under the terms
#     W3C Software Notice and License
#     http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
#
#
#     Revision:    mathml3-strict.rnc,v 1.8 2008/11/09 11:15:50 mkohlhas2 Exp $
#
#    Update to MathML3 and Relax NG: David Carlisle and Michael Kohlhase
#
#  This is the RelaxNG schema module for the strict content part of MathML.

default namespace m = "http://www.w3.org/1998/Math/MathML"

include "mathml3-common.rnc"

math.content |= ContExp


opel.content = text

# we want to extend this in pragmatic CMathML, so we introduce abbrevs here.

cn.content = text |(cn,cn)
cn.type.vals  = "integer"|"real"|"double"

cn = element cn {attribute base {text}?,
                 attribute type {cn.type.vals}?,
  		 Definition.attrib,
  		 MathML.Common.attrib,	
		 (cn.content)*}

ci = element ci {attribute type {xsd:string}?,
                 attribute nargs {xsd:string}?,
		 attribute occurrence {xsd:string}?,		
                 Definition.attrib,	
  		 MathML.Common.attrib,
		 opel.content,
		 name.attrib?}

cdname.attrib = attribute cd {xsd:NCName}

csymbol       = element csymbol {MathML.Common.attrib,
	                         Definition.attrib,cdname.attrib?,cdbase.attrib?, 
				 opel.content}

# the content of the apply element, leave it empty and extend it later
apply = element apply {MathML.Common.attrib,cdbase.attrib?,apply.content}
apply-head = apply|bind|ci|csymbol|semantics-apply
apply.content = apply-head,ContExp*
semantics-apply = element semantics {semantics.attribs,apply-head, semantics-annotation*}

qualifier = notAllowed

# the content of the bind element, leave it empty and extend it later
bind = element bind {MathML.Common.attrib,cdbase.attrib?,bind.content}
bind-head = apply|csymbol|semantics-bind
bind.content = bind-head,bvar*,qualifier?,ContExp
semantics-bind   = element semantics {semantics.attribs,bind-head, semantics-annotation*}

bvar = element bvar {MathML.Common.attrib,cdbase.attrib?,bvar-head}
bvar-head = ci|semantics-bvar
semantics-bvar   = element semantics {semantics.attribs,bvar-head, semantics-annotation*}

share = element share {MathML.Common.attrib,attribute href {xsd:anyURI}}

# the content of the cerror element, leave it empty and extend it later
cerror = element cerror {MathML.Common.attrib,cdbase.attrib?,cerror.content}
cerror-head = csymbol|apply|semantics-cerror
cerror.content = cerror-head,ContExp*
semantics-cerror = element semantics {semantics.attribs,cerror-head, semantics-annotation*}

semantics-cmml = element semantics {semantics.attribs,ContExp, semantics-annotation*}

ContExp = cn| ci | csymbol | apply | bind | share | cerror | semantics-cmml


Issue ednote_rnc_opel-content_	`wiki (member only)`
rnc:opel-content
Resolution	None recorded


Issue ednote_rnc_cn-content_	`wiki (member only)`
rnc:cn-content
Resolution	None recorded

A.2.4 The Grammar for Pragmatic MathML

The grammar for pragmatic MathML3 can be found at http://www.w3.org/Math/RelaxNG/mathml3/mathml3-pragmatic.rnc.

#     This is the Mathematical Markup Language (MathML) 3.0, an XML
#     application for describing mathematical notation and capturing
#     both its structure and content.
#
#     Copyright 1998-2008 W3C (MIT, ERCIM, Keio)
# 
#     Use and distribution of this code are permitted under the terms
#     W3C Software Notice and License
#     http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
#
#
#     Revision:    mathml3-pragmatic.rnc,v 1.10 2008/11/09 17:55:28 dcarlis Exp $
#
#    Update to MathML3 and Relax NG: David Carlisle and Michael Kohlhase
#
#     This is the RelaxNG schema module for the pragmatic content part of 
#     MathML (but without the presentation in token elements).

default namespace m = "http://www.w3.org/1998/Math/MathML"


## the content of "cn" may have <sep> elements in it
sep = element sep {empty}
cn.content |= (sep|text|Glyph-alignmark)*
cn.type.vals |= "e-notation"|"rational"|"complex-cartesian"|"complex-polar"|"constant" 

## allow degree in bvar
degree = element degree {MathML.Common.attrib,ContExp}
logbase = element logbase {MathML.Common.attrib,ContExp}
momentabout = element momentabout {MathML.Common.attrib,ContExp}
bvar-head |= (degree?,ci)|(ci,degree?)

## allow degree to modify <root/>
apply.content |= root_arith1_elt,degree,ContExp*
apply.content |= moment_s_data1_elt,(degree? & momentabout?),ContInPres*
apply.content |= log_transc1_elt,logbase,ContExp*

##allow apply to act as a binder
apply.content |= bind.content

domainofapplication = element domainofapplication {Definition.attrib,MathML.Common.attrib,cdbase.attrib?,ContExp}

lowlimit = element lowlimit {Definition.attrib,MathML.Common.attrib,cdbase.attrib?,ContExp+}
uplimit = element uplimit {Definition.attrib,MathML.Common.attrib,cdbase.attrib?,ContExp+}

condition = element condition {Definition.attrib,cdbase.attrib?,ContExp}

## allow the non-strict qualifiers
qualifier |= domainofapplication|(uplimit,lowlimit?)|(lowlimit,uplimit?)|degree|condition

## we collect the operator elements by role
opel.constant = notAllowed
opel.binder = notAllowed
opel.application = notAllowed
opel.semantic-attribution = notAllowed
opel.attribution = notAllowed
opel.error = notAllowed

opels = opel.constant | opel.binder | opel.application | 
        opel.semantic-attribution | opel.attribution |
	opel.error
container = notAllowed

## the values of the MathML type attributes;  
MathMLType |= "real" | "complex" | "function" | "algebraic" | "integer"


## we instantiate the strict content model by structure checking
apply-binder-head = semantics-apply-binder|opel.binder
apply.content |= apply-binder-head,bvar*,qualifier?,ContExp*
semantics-apply-binder = element semantics {semantics.attribs,apply-binder-head, semantics-annotation*}

apply-head |= opel.application
bind-head |= opel.binder
cerror-head |= opel.error

## allow all functions, constants, and containers to be content expressions on their own
ContExp |= opel.constant|opel.application|container 


# allow no body
bind.content |= bind-head,bvar*,qualifier?

# not sure what a sequence of things is supposed to map to in strict/OM
# but is definitely allowed in pragmatic
# see Content/SequencesAndSeries/product/rec-product3
math.content |= ContExp*

opel.content |= PresExp|Glyph-alignmark

This grammar focuses on the pragmatic extensions in , , , and .

Editorial note: MiKo
check this again

The pragmatic extensions in , , and rely on information that is specified in the MathML content dictionaries. This is handled in the schema http://www.w3.org/Math/RelaxNG/mathml3/mathml3-cds-pragmatic.rnc.

Editorial note: MiKo
The generated grammar allows `type` attributes for the operator elements, this is incorrect

Finally, the pragmatic extensions given in are not covered in this schema, but will be left for full MathML in the next section.

A.2.5 Deprecated Features

The grammar for the deprecated features in MathML3 can be found at http://www.w3.org/Math/RelaxNG/mathml3/mathml3-deprecated.rnc.

#     This is the Mathematical Markup Language (MathML) 3.0, an XML
#     application for describing mathematical notation and capturing
#     both its structure and content.
#
#     Copyright 1998-2008 W3C (MIT, ERCIM, Keio)
# 
#     Use and distribution of this code are permitted under the terms
#     W3C Software Notice and License
#     http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
#
#
#     Revision:    mathml3-deprecated.rnc,v 1.9 2008/12/17 09:10:34 mkohlhas2 Exp $
#
#    Update to MathML3 and Relax NG: David Carlisle and Michael Kohlhase

default namespace m = "http://www.w3.org/1998/Math/MathML"


Token-style.attrib &=
  attribute fontsize {xsd:string}? ,
  attribute fontstyle {xsd:string}? ,
  attribute fontweight {xsd:string}? ,
  attribute color {xsd:string}? ,
  attribute fontfamily {xsd:string}?

#Deprecated Content Elements
dep-content = 
  element reln {ContExp*}|
  element fn {ContExp}

ContExp |= dep-content

apply-head |= dep-content

declare = element declare {attribute type {xsd:string}?,
                           attribute scope {xsd:string}?,
                           attribute nargs {xsd:nonNegativeInteger}?,
                           attribute occurrence {"prefix"|"infix"|"function-model"}?,
                           Definition.attrib,cdbase.attrib?, 
                           ContExp+}
ContExp |= declare

mtr.content |= ContInPres

A.2.6 MathML as a module in a RelaxNG Schema

Normally, a MathML expression does not constitute an entire XML document. MathML is designed to be used as the mathematics fragment of larger markup languages. In particular it is designed to be used as a module in documents marked up with the XHTML family of markup languages. As RelaxNG directly supports modular development, this is usually very easy: an XHTML+MathML schema can be specified as simply as

# A RelaxNG Schema for  XHTML+MathML
include "xhtml.rnc"
math = external "mathml3.rnc"
Inline.class |= math
Block.class |= math

assuming that we have access to a modular RelaxNG schema for XHTML that uses Inline.class and Block.class to collect the content models for inline and block-level elements.

Editorial note: Miko
check this and reference an external schema

Specializing the MathML3 schema so that we can check the content of annotation-xml elements is similarly simple:

# A RelaxNG Schema for MathML with OpenMath3 annotations
omobj = external "openmath3.rnc" 
include "mathml3.rnc" {annotation-xml.model = omobj}

For details about RelaxNG grammars and modularization see [RelaxNG] or [RelaxNGBook].

Editorial note: Miko
check this and reference an external schema; I think we can even tie the OpenMath model to the value `OpenMath` in the `encoding` attribute.

A.3 Using the MathML DTD

Editorial note: David
DTD to be generated from Relax NG

Editorial note: Bruce
I've moved DTD related material from Chapter 2 to here. It most likely needs to be pruned somewhat

A.3.1 Document Validation Issues

The use of namespace prefixes creates an issue for DTD validation of documents embedding MathML. DTD validation requires knowing the literal (possibly prefixed) element names used in the document. However, the Namespaces in XML Recommendation [Namespaces] allows the prefix to be changed at arbitrary points in the document, since namespace prefixes may be declared on any element.
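
As an illustration of this mismatch, the following Python sketch (not part of this specification) parses three spellings of the same expression with the standard library module xml.etree.ElementTree. The sample documents and the helper names literal_names and expanded_names are assumptions made for this example. A namespace-aware parser reports the same expanded names for all three documents, whereas the literal names that a DTD would have to declare differ from one document to the next.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Three spellings of the same expression (sample data for this sketch):
# a default namespace, a fixed prefix, and a prefix changed on an inner element.
DOCS = {
    "default": '<math xmlns="%s"><mi>x</mi></math>' % MATHML_NS,
    "prefixed": '<m:math xmlns:m="%s"><m:mi>x</m:mi></m:math>' % MATHML_NS,
    "mixed": '<m:math xmlns:m="%s"><n:mi xmlns:n="%s">x</n:mi></m:math>'
             % (MATHML_NS, MATHML_NS),
}


def literal_names(xml_text):
    """Element names exactly as written in start tags (what a DTD must declare)."""
    return re.findall(r"<([A-Za-z_][\w.:-]*)", xml_text)


def expanded_names(xml_text):
    """Namespace-qualified names, in ElementTree's '{namespace}local' form."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


for label, text in DOCS.items():
    print(label, literal_names(text), expanded_names(text))
```

A DTD written for the literal names of one of these documents rejects the other two, although all three are equivalent under [Namespaces].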

The 'historical' method of bridging this gap was to write a DTD with a fixed prefix, or in the case of XHTML and MathML, with no prefix, and mandate that the specified form must be used throughout the document. However, this is somewhat restricting for a modular DTD that is intended for use in conjunction with another DTD, which is exactly the situation with MathML in XHTML. In essence, the MathML DTD would have to allocate a prefix for itself and hope no other module uses the same prefix to avoid name clashes, thus losing one of the main benefits of XML namespaces.

One strategy for addressing this problem is to make every element name in the DTD be accessed by an entity reference. This means that by declaring a couple of entities to specify the prefix before the DTD is loaded, the prefix can be chosen by a document author, and compound DTDs that include several modules can, without changing the module DTDs, specify unique prefixes for each module to avoid clashes. The MathML DTD has been designed in this fashion. See Section A.3 Using the MathML DTD and [Modularization] for details.
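
The effect of this parameterization can be imitated outside a DTD. The Python sketch below is only a simulation of the idea: the function qname stands in for an entity that expands to a possibly prefixed element name, and the table MODELS holds hypothetical, heavily simplified content models that are not the real MathML ones. Changing one prefix value changes every generated declaration consistently, which is what declaring the prefix entities before loading the DTD achieves.

```python
def qname(prefix, local_name):
    """Stand-in for a name entity: prepend 'prefix:' only when a prefix is set."""
    return f"{prefix}:{local_name}" if prefix else local_name


# Hypothetical, heavily simplified content models (NOT the real MathML ones).
# None means character data only; a list names the allowed child elements.
MODELS = {
    "math": ["mrow", "mi"],
    "mrow": ["mi"],
    "mi": None,
}


def element_declarations(prefix):
    """Generate <!ELEMENT ...> declarations whose names all use the same prefix."""
    declarations = []
    for name, children in MODELS.items():
        if children is None:
            model = "(#PCDATA)"
        else:
            model = "(" + " | ".join(qname(prefix, c) for c in children) + ")*"
        declarations.append(f"<!ELEMENT {qname(prefix, name)} {model}>")
    return declarations


# The same module text serves an unprefixed and a prefixed document.
for line in element_declarations("") + element_declarations("m"):
    print(line)
```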

An extra issue arises in the case where explicit prefixes are used on the top-level math element, but a default namespace is used for other MathML elements. In this case, one wants the MathML module to be included into XHTML with the prefix set to empty. However, the 'driver' DTD file that sets up the inclusion of the MathML module would then need to define a new element called m:math. This would allow the top-level math element to use an explicit prefix, for attaching rendering behaviors in current browsers, while the contents would not need an explicit prefix, for ease of interoperability between authoring tools, etc.

A.3.2 Attribute values in the MathML DTD

In an XML DTD, allowed attribute values can be declared as general strings, or they can be constrained in various ways, either by enumerating the possible values, or by declaring them to be certain special data types. The choice of an XML attribute type affects the extent to which validity checks can be performed using a DTD.

The MathML DTD specifies formal XML attribute types for all MathML attributes, including enumerations of legitimate values in some cases. In general, however, the MathML DTD is relatively permissive, frequently declaring attribute values as strings; this is done to provide for interoperability with SGML parsers while allowing multiple attributes on one MathML element to accept the same values (such as "true" and "false"), and also to allow extension to the lists of predefined values.

At the same time, even though an attribute value may be declared as a string in the DTD, only certain values are legitimate in MathML, as described above and in the rest of this specification. For example, many attributes expect numerical values. In the sections which follow, the allowed attribute values are described for each element. To determine when these constraints are actually enforced in the MathML DTD, consult Appendix A Parsing MathML. However, lack of enforcement of a requirement in the DTD does not imply that the requirement is not part of the MathML language itself, or that it will not be enforced by a particular MathML renderer. (See Section 2.3.2 Handling of Errors for a description of how MathML renderers should respond to MathML errors.)
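
A processor that relies on DTD validation may therefore wish to apply such checks itself. The following Python sketch shows one way this might look for the nargs attribute of the deprecated declare element, which the RelaxNG grammar in Section A.2.5 types as xsd:nonNegativeInteger. It assumes that the DTD declares nargs as a general string; the function name invalid_nargs and the sample document are assumptions of this example, and the lexical check is deliberately simplified (decimal digits with an optional leading plus sign).

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Simplified lexical form of a non-negative integer: optional '+', then digits.
# (This sketch does not accept the rarely used form "-0".)
NON_NEGATIVE_INTEGER = re.compile(r"\+?[0-9]+")


def invalid_nargs(xml_text):
    """Return the nargs values on declare elements that are not non-negative integers."""
    root = ET.fromstring(xml_text)
    bad_values = []
    for element in root.iter("{%s}declare" % MATHML_NS):
        value = element.get("nargs")
        if value is not None and not NON_NEGATIVE_INTEGER.fullmatch(value.strip()):
            bad_values.append(value)
    return bad_values


SAMPLE = (
    '<math xmlns="%s">'
    '<declare nargs="2"><ci>f</ci></declare>'
    '<declare nargs="two"><ci>g</ci></declare>'
    "</math>" % MATHML_NS
)

print(invalid_nargs(SAMPLE))
```

A string-typed DTD declaration accepts both declare elements in the sample; the check above flags only the second.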

Furthermore, the MathML DTD is provided for convenience; although it is intended to be fully compatible with the text of the specification, the text should be taken as definitive if there is a contradiction. (Any contradictions which may exist between various chapters of the text should be resolved by favoring Chapter 7 Characters, Entities and Fonts first, then Chapter 3 Presentation Markup, Chapter 4 Content Markup, then Section 2.1 MathML Syntax and Grammar, and then other parts of the text.) For the MathML schema the situation will be the same: the published Recommendation text takes precedence. Though this is what is intended to happen, there is a practical difficulty. If the system processing the MathML uses a validating parser, whether it be based on a DTD or on a schema, the process will probably simply stop when it hits something held to be incorrect syntax, whether or not further MathML processing in full harmony with the specification would have processed the piece correctly.

A.4 Using the MathML XML Schema

Editorial note: David
XSD schema to be generated from Relax NG