Content Markup Validation Grammar

Informal EBNF grammar for Content Markup structure validation

This defines the valid expression trees in content markup. It does not define attribute validation; that has to be done on top.
Presentation_tags is a placeholder for a valid presentation element start tag or end tag
#PCDATA is the XML parsed character data
symbols beginning with '_', for example _mmlarg, are internal symbols (recursive grammar usually required for recognition)
all-lowercase symbols, for example 'ci', are terminal symbols representing MathML content elements
symbols beginning with an uppercase letter are terminals representing other tokens
revised sb 3.nov.97, 16.nov.97 and 22.dec.1997
revised sb 6.jan.98, 6.Feb.1998 and 4.april.1998
revised sb 27.nov.2000 for MathML2.0

whitespace definitions including Presentation_tags

[1]	`Presentation_tags`	::=	`"presentation"`	/ placeholder /
[2]	`Space`	::=	`#x09 \| #x0A \| #x0D \| #x20`	/ tab, lf, cr, space characters /
[3]	`S`	::=	`(Space \| Presentation_tags)*`	/ treat presentation as space /

Characters (only those needed for content validation)

[4] Char ::= Space | [#x21 - #xFFFD] | [#x00010000 - #x7FFFFFFFF] /* valid XML chars */

start(\%x) returns a valid start tag for the element \%x
end(\%x) returns a valid end tag for the element \%x
empty(\%x) returns a valid empty tag for the element \%x

 start(ci)    ::= "<ci>"
 end(cn)      ::= "</cn>"
 empty(plus)  ::= "<plus/>"

The reason for doing this is to avoid writing a grammar for all the attributes. The model below is not complete for all possible attribute values.

start and end tag functions

[5]	`_start(\%x)`	::=	`"<\%x" (Char - '>')* ">"`	/ returns a valid start tag for the element \%x /
[6]	`_end(\%x)`	::=	`"</\%x" Space* ">"`	/ returns a valid end tag for the element \%x /
[7]	`_empty(\%x)`	::=	`"<\%x" (Char - '>')* "/>"`	/ returns a valid empty tag for the element \%x /
[8]	`_sg(\%x)`	::=	`S _start(\%x)`	/ start tag preceded by optional whitespace /
[9]	`_eg(\%x)`	::=	`_end(\%x) S`	/ end tag followed by optional whitespace /
[10]	`_ey(\%x)`	::=	`S _empty(\%x) S`	/ empty tag preceded and followed by optional whitespace /
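
As an illustration only, rules [5] to [7] can be approximated with regular expressions. The sketch below assumes that `Char - '>'` is close enough to the character class `[^>]`, and the end-tag pattern begins with `</`, following the `end(cn)` example. The function names `start_tag`, `end_tag` and `empty_tag` are hypothetical and not part of the grammar.

```python
import re

# Assumption: "Char - '>'" is approximated by the regex class [^>].
SPACE = r"[\t\n\r ]"  # rule [2]: tab, lf, cr, space

def start_tag(name):
    """Rule [5]: '<name', any characters except '>', then '>'."""
    return re.compile(r"<" + re.escape(name) + r"[^>]*>")

def end_tag(name):
    """Rule [6]: '</name', optional Space characters, then '>'."""
    return re.compile(r"</" + re.escape(name) + SPACE + r"*>")

def empty_tag(name):
    """Rule [7]: '<name', any characters except '>', then '/>'."""
    return re.compile(r"<" + re.escape(name) + r"[^>]*/>")
```

Like the rules themselves, these patterns are deliberately loose: `start_tag("ci")` also accepts `<ci/>` and `<cinema>`, because the attribute part is not modelled.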

mathml content constructs

[11]	`_mmlall`	::=	`_container \| _relation \| _operator \| _qualifier \| _other`
[12]	`_mmlarg`	::=	`_container`
[13]	`_container`	::=	`_token \| _special \| _constructor`
[14]	`_token`	::=	`ci \| cn \| csymbol \| _constantsym`
[15]	`_special`	::=	`apply \| lambda \| reln \| fn`
[16]	`_constructor`	::=	`interval \| list \| matrix \| matrixrow \| set \| vector \| piecewise \| piece \| otherwise`
[17]	`_other`	::=	`condition \| declare \| sep`
[18]	`_qualifier`	::=	`lowlimit \| uplimit \| bvar \| degree \| logbase \| domainofapplication \| momentabout`
[19]	`_constantsym`	::=	`integers \| rationals \| reals \| naturalnumbers \| complexes \| primes \| exponentiale \| imaginaryi \| notanumber \| true \| false \| pi \| eulergamma \| infinity`

relations

[20]	`_relation`	::=	`_genrel \| _setrel \| _seqrel2ary`
[21]	`_genrel`	::=	`_genrel2ary \| _genrelnary`
[22]	`_genrel2ary`	::=	`ne`
[23]	`_genrelnary`	::=	`eq \| leq \| lt \| geq \| gt`
[24]	`_setrel`	::=	`_setrel2ary \| _setrelnary`
[25]	`_setrel2ary`	::=	`in \| notin \| notsubset \| notprsubset`
[26]	`_setrelnary`	::=	`subset \| prsubset`
[27]	`_seqrel2ary`	::=	`tendsto`

operators

functional operators

[29]	`_funcop`	::=	`_funcop1ary \| _funcopnary`
[30]	`_funcop1ary`	::=	`inverse \| ident \| domain \| codomain \| image`
[31]	`_funcopnary`	::=	`fn \| compose`	/ general user-defined function is n-ary /

(note minus is both 1ary and 2ary)

arithmetic operators

[32]	`_arithop`	::=	`_arithop1ary \| _arithop2ary \| _arithopnary \| root`
[33]	`_arithop1ary`	::=	`abs \| conjugate \| factorial \| minus \| arg \| real \| imaginary \| floor \| ceiling`
[34]	`_arithop2ary`	::=	`quotient \| divide \| minus \| power \| rem`
[35]	`_arithopnary`	::=	`plus \| times \| max \| min \| gcd \| lcm`
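
The arity classes in rules [33] to [35] can be pictured as plain sets, which makes the double role of minus easy to see. This is a minimal sketch; the names `ARITHOP_1ARY`, `ARITHOP_2ARY`, `ARITHOP_NARY` and `allowed_arities` are hypothetical.

```python
# Operator names copied from rules [33], [34] and [35].
ARITHOP_1ARY = {"abs", "conjugate", "factorial", "minus", "arg",
                "real", "imaginary", "floor", "ceiling"}
ARITHOP_2ARY = {"quotient", "divide", "minus", "power", "rem"}
ARITHOP_NARY = {"plus", "times", "max", "min", "gcd", "lcm"}

def allowed_arities(op):
    """Return the arity labels under which an arithmetic operator appears."""
    arities = set()
    if op in ARITHOP_1ARY:
        arities.add("unary")
    if op in ARITHOP_2ARY:
        arities.add("binary")
    if op in ARITHOP_NARY:
        arities.add("nary")
    return arities
```

An operator outside these three rules, such as root or sin, yields an empty set here.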

calculus and vector calculus

[36]	`_calcop`	::=	`int \| diff \| partialdiff`
[37]	`_vcalcop`	::=	`divergence \| grad \| curl \| laplacian`

sequences and series

[38] _seqop ::= sum | product | limit

elementary classical functions and trigonometry

[39]	`_classop`	::=	`exp \| ln \| log`
[40]	`_trigop`	::=	`sin \| cos \| tan \| sec \| csc \| cot \| sinh \| cosh \| tanh \| sech \| csch \| coth \| arcsin \| arccos \| arctan`

statistics operators

[41]	`_statop`	::=	`_statopnary \| moment`
[42]	`_statopnary`	::=	`mean \| sdev \| variance \| median \| mode`

linear algebra operators

[43]	`_lalgop`	::=	`_lalgop1ary \| _lalgop2ary \| _lalgopnary`
[44]	`_lalgop1ary`	::=	`determinant \| transpose`
[45]	`_lalgop2ary`	::=	`vectorproduct \| scalarproduct \| outerproduct`
[46]	`_lalgopnary`	::=	`selector`

logical operators

[47]	`_logicop`	::=	`_logicop1ary \| _logicopnary \| _logicop2ary \| _logicopquant`
[48]	`_logicop1ary`	::=	`not`
[49]	`_logicop2ary`	::=	`implies \| equivalent \| approx \| factorof`
[50]	`_logicopnary`	::=	`and \| or \| xor`
[51]	`_logicopquant`	::=	`forall \| exists`

set theoretic operators

[52]	`_setop`	::=	`_setop1ary \| _setop2ary \| _setopnary`
[53]	`_setop1ary`	::=	`card`
[54]	`_setop2ary`	::=	`setdiff`
[55]	`_setopnary`	::=	`union \| intersect \| cartesianproduct`

operator groups

[56]	`_unaryop`	::=	`_funcop1ary \| _arithop1ary \| _trigop \| _classop \| _calcop \| _vcalcop \| _logicop1ary \| _lalgop1ary \| _setop1ary`
[57]	`_binaryop`	::=	`_arithop2ary \| _setop2ary \| _logicop2ary \| _lalgop2ary`
[58]	`_naryop`	::=	`_arithopnary \| _statopnary \| _logicopnary \| _lalgopnary \| _setopnary \| _funcopnary`
[59]	`_ispop`	::=	`int \| sum \| product`
[60]	`_diffop`	::=	`diff \| partialdiff`
[61]	`_binaryrel`	::=	`_genrel2ary \| _setrel2ary \| _seqrel2ary`
[62]	`_naryrel`	::=	`_genrelnary \| _setrelnary`

separator

[63] sep ::= _ey(sep)

leaf tokens and data content of leaf elements

[64]	`_mdatai`	::=	`(#PCDATA \| Presentation_tags)*`	/ note _mdatai includes Presentation constructs here. /
[65]	`_mdatan`	::=	`(#PCDATA \| sep \| Presentation_tags)*`	/ note _mdatan includes Presentation constructs here. /
[66]	`ci`	::=	`_sg(ci) _mdatai _eg(ci)`
[67]	`cn`	::=	`_sg(cn) _mdatan _eg(cn)`
[68]	`csymbol`	::=	`_sg(csymbol) _mdatai _eg(csymbol)`
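
To show how sep divides the data of a cn element under rule [65], the following sketch splits the character data at each sep child using Python's `xml.etree.ElementTree`. It assumes un-namespaced input and, unlike `_mdatan`, rejects presentation markup inside cn. The helper name `cn_parts` is hypothetical.

```python
import xml.etree.ElementTree as ET

def cn_parts(cn):
    """Split the character data of a cn element at each <sep/> child."""
    if cn.tag != "cn":
        raise ValueError("expected a cn element")
    parts = [cn.text or ""]
    for child in cn:
        if child.tag != "sep":
            # Assumption: presentation markup inside cn is rejected here,
            # although _mdatan would allow it.
            raise ValueError("unexpected child: " + child.tag)
        # Text following <sep/> is stored as the tail of the sep element.
        parts.append(child.tail or "")
    return [p.strip() for p in parts]
```

For example, `cn_parts(ET.fromstring('<cn type="rational">1<sep/>2</cn>'))` yields the two parts of the rational number as separate strings.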

condition - constraints. Contains either a single reln (relation), or an apply holding a logical combination of relations, or a set (over which the operator should be applied).

condition

[69] condition ::= _sg(condition) (reln | apply | set) _eg(condition)

domains for integral, sum, product

[70] _ispdomain ::= (lowlimit uplimit?) | uplimit | interval | condition
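
Rule [70] admits only five child sequences, so it can be checked by enumeration. The sketch below works on a list of element names; the names `ISPDOMAIN_FORMS` and `is_ispdomain` are hypothetical.

```python
# Every tag sequence generated by
# (lowlimit uplimit?) | uplimit | interval | condition
ISPDOMAIN_FORMS = {
    ("lowlimit",),
    ("lowlimit", "uplimit"),
    ("uplimit",),
    ("interval",),
    ("condition",),
}

def is_ispdomain(tags):
    """Return True if the tag sequence is a valid _ispdomain."""
    return tuple(tags) in ISPDOMAIN_FORMS
```

Order matters: a lowlimit may be followed by an uplimit, but not the other way round.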

Note that apply is used in place of the deprecated reln in MathML2.0 for relational operators as well as arithmetic, algebraic etc.

apply construct

[71]	`apply`	::=	`_sg(apply) (_applybody \| _relnbody) _eg(apply)`
[72]	`_applybody`	::=	`( _unaryop _mmlarg )`	/ 1-ary ops /
			`\| (_binaryop _mmlarg _mmlarg)`	/ 2-ary ops /
			`\| (_naryop _mmlarg*)`	/ n-ary ops, enumerated arguments /
			`\| (_naryop bvar* condition _mmlarg)`	/ n-ary ops, condition defines argument list /
			`\| (_ispop bvar? _ispdomain? _mmlarg)`	/ integral, sum, product /
			`\| (_ispop domainofapplication? _mmlarg)`	/ integral, sum, product /
			`\| (_diffop bvar* _mmlarg)`	/ differential ops /
			`\| (log logbase? _mmlarg)`	/ logs /
			`\| (moment degree? momentabout? _mmlarg*)`	/ statistical moment /
			`\| (root degree? _mmlarg)`	/ radicals - default is square-root /
			`\| (limit bvar* lowlimit? condition? _mmlarg)`	/ limits /
			`\| (_logicopquant bvar+ condition? (reln \| apply))`	/ quantifier with explicit bound variables /
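
The first three alternatives of rule [72] can be sketched as an argument-count check on a parsed tree. This is a partial illustration, not a full validator: it assumes un-namespaced input, uses only a sample of each operator class, ignores qualifiers such as bvar and condition, and does not recurse into the arguments. The names `UNARY`, `BINARY`, `NARY`, `MMLARG` and `check_apply` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Samples only; the full sets are given by rules [56], [57] and [58].
UNARY = {"abs", "factorial", "minus", "sin", "not"}
BINARY = {"divide", "minus", "power", "setdiff"}
NARY = {"plus", "times", "and", "union"}

# Tags accepted as _mmlarg (rules [12] to [16]); constant symbols omitted.
MMLARG = {"ci", "cn", "csymbol", "apply", "lambda", "reln", "fn",
          "interval", "list", "matrix", "matrixrow", "set", "vector",
          "piecewise", "piece", "otherwise"}

def check_apply(elem):
    """Check operator arity for the 1-ary, 2-ary and enumerated n-ary forms."""
    if elem.tag != "apply" or len(elem) == 0:
        return False
    op = elem[0].tag
    args = list(elem)[1:]
    if not all(arg.tag in MMLARG for arg in args):
        return False
    n = len(args)
    return ((op in UNARY and n == 1)
            or (op in BINARY and n == 2)
            or op in NARY)
```

Because minus is in both `UNARY` and `BINARY`, an apply of minus is accepted with one or two arguments, while divide is accepted only with two.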

equations and relations - reln uses lisp-like syntax (like apply). The bvar and condition are used to construct a "such that" or "where" constraint on the relation. Note that reln is deprecated but still valid in MathML2.0.

equations and relations

[73]	`reln`	::=	`_sg(reln) _relnbody _eg(reln)`
[74]	`_relnbody`	::=	`( _binaryrel bvar* condition? _mmlarg _mmlarg ) \| ( _naryrel bvar* condition? _mmlarg* )`

fn construct. Note that fn is deprecated but still valid in MathML2.0.

[75]	`fn`	::=	`_sg(fn) _fnbody _eg(fn)`
[76]	`_fnbody`	::=	`Presentation_tags \| _container`

lambda construct - note at least 1 bvar must be present

[77]	`lambda`	::=	`_sg(lambda) _lambdabody _eg(lambda)`
[78]	`_lambdabody`	::=	`bvar+ _container`	/ multivariate lambda calculus /
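
The requirement of at least one bvar followed by a single container can be checked on the child tags alone. The sketch below assumes un-namespaced input, looks only at direct children, and leaves the constant symbols of rule [19] out of the container set. The names `CONTAINER` and `check_lambda` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tags accepted as _container (rules [13] to [16]); constant symbols omitted.
CONTAINER = {"ci", "cn", "csymbol", "apply", "lambda", "reln", "fn",
             "interval", "list", "matrix", "matrixrow", "set", "vector",
             "piecewise", "piece", "otherwise"}

def check_lambda(elem):
    """Rule [78]: one or more bvar children, then exactly one container."""
    tags = [child.tag for child in elem]
    if elem.tag != "lambda" or len(tags) < 2:
        return False
    *bvars, body = tags
    return all(tag == "bvar" for tag in bvars) and body in CONTAINER
```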

declare construct

[79]	`declare`	::=	`_sg(declare) _declarebody _eg(declare)`
[80]	`_declarebody`	::=	`ci (fn \| _constructor)?`

constructors

[81]	`interval`	::=	`_sg(interval) _mmlarg _mmlarg _eg(interval)`	/ start, end define interval /
[82]	`set`	::=	`_sg(set) _lsbody _eg(set)`
[83]	`list`	::=	`_sg(list) _lsbody _eg(list)`
[84]	`_lsbody`	::=	`_mmlarg*`	/ enumerated arguments /
			`\| (bvar* condition _mmlarg)`	/ condition constructs arguments /
[85]	`matrix`	::=	`_sg(matrix) matrixrow* _eg(matrix)`
[86]	`matrixrow`	::=	`_sg(matrixrow) _mmlall* _eg(matrixrow)`	/ allows matrix of operators /
[87]	`vector`	::=	`_sg(vector) _mmlarg* _eg(vector)`
[88]	`piecewise`	::=	`_sg(piecewise) piece* otherwise? _eg(piecewise)`
[89]	`piece`	::=	`_sg(piece) _mmlall _eg(piece)`	/ allows piecewise construct of operators /
[90]	`otherwise`	::=	`_sg(otherwise) _mmlall _eg(otherwise)`	/ allows piecewise construct of operators /

qualifiers - note the contained _mmlarg could be a reln

[91]	`lowlimit`	::=	`_sg(lowlimit) _mmlarg _eg(lowlimit)`
[92]	`uplimit`	::=	`_sg(uplimit) _mmlarg _eg(uplimit)`
[93]	`bvar`	::=	`_sg(bvar) ci degree? _eg(bvar)`
[94]	`degree`	::=	`_sg(degree) _mmlarg _eg(degree)`
[95]	`logbase`	::=	`_sg(logbase) _mmlarg _eg(logbase)`
[96]	`domainofapplication`	::=	`_sg(domainofapplication) _mmlarg _eg(domainofapplication)`
[97]	`momentabout`	::=	`_sg(momentabout) _mmlarg _eg(momentabout)`

the top level math element - declare is allowed only at the head of a math element.

math

[101] math ::= _sg(math) declare* _mmlall* _eg(math)
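
The stated intent, declare only at the head of a math element, can be checked by skipping the leading declare children and confirming that none follow. Read literally, the rule may still admit a later declare through `_mmlall` and `_other`, so this sketch checks the prose constraint rather than the rule text. It assumes un-namespaced input, and the name `declares_only_at_head` is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import dropwhile

def declares_only_at_head(math):
    """Return True if every declare child precedes all other children."""
    if math.tag != "math":
        return False
    tags = [child.tag for child in math]
    # Drop the leading run of declare elements, then look for stragglers.
    after_head = list(dropwhile(lambda tag: tag == "declare", tags))
    return "declare" not in after_head
```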
