W3C Math Working Group XML Query Language Requirements
Draft for Working Group Review, November 18, 1998
Editors:
- Stephen Hunt [Lead], Wolfram Research
- Roger Hunter, MacKichan Software, Inc.
Abstract
This document details the W3C Math Working Group's requirements for the XML Query Language effort. Querying MathML requires a pattern language that can match both element structure and content, and that returns a subtree of the MathML expression tree. Such a pattern language could be derived from XSL patterns by allowing multiple branches and positions to be specified relative to any node, and by adding regular expression matching on content.
Summary of Requirements
- Pattern language: to identify both element subtree structure and content.
Pattern language
To illustrate the MathML requirement, draft XSL patterns are extended
in a manner consistent with the XQL proposal. The extension is used merely to provide an example of the MathML
requirement and is not an explicit recommendation in itself.
Draft XSL patterns use a directory metaphor to identify a pattern
relative to some node in the expression tree: parent/child, ancestor//descendant.
Extending this specification to include multiple branches and positions
at arbitrary depth relative to any given node would allow the structure
of a MathML subexpression to be identified.
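For comparison, Python's standard xml.etree.ElementTree module supports a limited XPath subset that follows the same directory metaphor. The sketch below is not an implementation of XSL patterns; it only shows parent/child steps and an ancestor//descendant step applied to the b^2 - 4 a c markup from the compound example. Each expression selects along a single path and cannot require an element to have several branches in a given order, which is the gap the extensions below address.

```python
import xml.etree.ElementTree as ET

# b^2 - 4 a c from the compound example; &#x2062; is invisible times (U+2062)
expr = ET.fromstring(
    "<mrow>"
    "<msup><mi>b</mi><mn>2</mn></msup>"
    "<mo>-</mo>"
    "<mrow><mn>4</mn><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>c</mi></mrow>"
    "</mrow>")

# parent/child: mi children of an msup child of the root
print([e.text for e in expr.findall("msup/mi")])  # ['b']

# parent/child: mi children of an mrow child of the root
print([e.text for e in expr.findall("mrow/mi")])  # ['a', 'c']

# ancestor//descendant: mi elements at any depth below the root
print([e.text for e in expr.findall(".//mi")])    # ['b', 'a', 'c']
```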
Extension 1: Multiple branches
The qualifier notation can be used to constrain a node to contain branch
patterns. Ordered branch sequences are delimited by a semicolon, and
unordered branch patterns are combined with the 'and' operator.
A MathML pattern is a hierarchical structure. The pattern is matched by
recursively descending the expression tree and testing each subtree against
it; the return value is the matching subtree. The pattern may span an
arbitrary number of levels. Once a match is found, the search can either stop
and return that match, or continue and return the first n matches or all
matches from all levels.
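The search strategy above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal sketch under stated assumptions: find_matches is a hypothetical helper, a pattern is modeled as an arbitrary test on an element rather than parsed from the pattern syntax, and the limit parameter (also hypothetical) chooses between stopping at the first match, returning the first n, or returning all matches from all levels.

```python
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Callable, Iterator, List, Optional

# Hypothetical simplification: a pattern is any yes/no test on an element.
Pattern = Callable[[ET.Element], bool]

def find_matches(root: ET.Element, pattern: Pattern,
                 limit: Optional[int] = None) -> List[ET.Element]:
    """Return subtrees matching `pattern`, in document order.

    limit=1 stops at the first match, limit=n returns the first n matches,
    and limit=None returns all matches from all levels.
    """
    def walk(node: ET.Element) -> Iterator[ET.Element]:
        if pattern(node):
            yield node                # the whole subtree is the return value
        for child in node:
            yield from walk(child)    # recurse into every child subtree

    # islice stops the lazy walk once `limit` matches have been produced
    return list(islice(walk(root), limit))

# b^2 - 4 a c with the invisible-times operators omitted for brevity
expr = ET.fromstring(
    "<mrow><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo>"
    "<mrow><mn>4</mn><mi>a</mi><mi>c</mi></mrow></mrow>")
is_mi = lambda e: e.tag == "mi"
print([e.text for e in find_matches(expr, is_mi, limit=1)])  # ['b']
print([e.text for e in find_matches(expr, is_mi)])           # ['b', 'a', 'c']
```

Because the walk is a generator, a small limit stops the traversal early instead of visiting the whole tree.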
Example: b^2
Presentation markup:
<msup>
<mi>b</mi>
<mn>2</mn>
</msup>
Pattern:
msup[mi;mn] - msup element with children mi followed by mn
msup[mi and mn] - msup element with children mi and mn, in either order
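A minimal Python sketch of the two qualifiers, assuming that ';' requires the branches to occupy consecutive children in the given order and that 'and' only requires each branch to occur somewhere among the children. The function names are hypothetical, branches are plain element names, and checking that the node itself is an msup is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

def ordered_children(node: ET.Element, tags: Sequence[str]) -> bool:
    """msup[mi;mn]: some run of consecutive children carries exactly
    these element names, in this order."""
    child_tags = [child.tag for child in node]
    n = len(tags)
    return any(child_tags[i:i + n] == list(tags)
               for i in range(len(child_tags) - n + 1))

def unordered_children(node: ET.Element, tags: Sequence[str]) -> bool:
    """msup[mi and mn]: each element name occurs among the children, in any order."""
    child_tags = [child.tag for child in node]
    return all(tag in child_tags for tag in tags)

b_squared = ET.fromstring("<msup><mi>b</mi><mn>2</mn></msup>")
swapped = ET.fromstring("<msup><mn>2</mn><mi>b</mi></msup>")

print(ordered_children(b_squared, ["mi", "mn"]))   # True
print(ordered_children(swapped, ["mi", "mn"]))     # False
print(unordered_children(swapped, ["mi", "mn"]))   # True
```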
Example: b^(2a)
<msup>
<mi>b</mi>
<mrow>
<mn>2</mn>
<mo>⁢</mo>
<mi>a</mi>
</mrow>
</msup>
Pattern:
msup[mi;mrow/mi] - mrow/mi matches the second branch above (an mrow with an mi child)
msup[mi and mrow/mi]
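The sketch below lets a branch be a parent/child path such as mrow/mi, matched against one child of the msup. It is illustrative only, assuming ';' means consecutive children in the given order and 'and' means each branch matches some child; the helper names are hypothetical and only single-slash paths are handled (no ancestor//descendant).

```python
import xml.etree.ElementTree as ET
from typing import Sequence

def matches_path(node: ET.Element, path: str) -> bool:
    """Does `node` match a single-slash path such as 'mrow/mi'?
    The node must carry the first name and some child must match the rest."""
    head, _, rest = path.partition("/")
    if node.tag != head:
        return False
    return not rest or any(matches_path(child, rest) for child in node)

def ordered_branches(node: ET.Element, branches: Sequence[str]) -> bool:
    """msup[mi;mrow/mi]: consecutive children match the branches in order."""
    kids = list(node)
    n = len(branches)
    return any(all(matches_path(kids[i + k], branches[k]) for k in range(n))
               for i in range(len(kids) - n + 1))

def unordered_branches(node: ET.Element, branches: Sequence[str]) -> bool:
    """msup[mi and mrow/mi]: every branch matches some child, in any order."""
    return all(any(matches_path(kid, b) for kid in node) for b in branches)

# b^(2a); &#x2062; is invisible times (U+2062)
b_to_2a = ET.fromstring(
    "<msup><mi>b</mi>"
    "<mrow><mn>2</mn><mo>&#x2062;</mo><mi>a</mi></mrow>"
    "</msup>")

print(ordered_branches(b_to_2a, ["mi", "mrow/mi"]))     # True
print(ordered_branches(b_to_2a, ["mrow/mi", "mi"]))     # False
print(unordered_branches(b_to_2a, ["mrow/mi", "mi"]))   # True
```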
Extension 2: Position
The ';' sequence operator implicitly identifies position. It can be
extended to express sequences of elements or content at a given level:
Operator | Meaning
p;q | p immediately precedes q
p;;q | p precedes q, not necessarily immediately
p;;;q | p precedes q; either p or q may be empty
Example: 4 a c
Presentation markup:
<mrow>
<mn>4</mn>
<mo>⁢</mo>
<mi>a</mi>
<mo>⁢</mo>
<mi>c</mi>
</mrow>
Pattern:
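mrow[*;;mi;;*] - matches only the first of the two mi elements, since the second has no following sibling
mrow[*;;mi;;;*] - matches both the first and the second mi, since ;;; allows the trailing * to be empty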
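The sketch below implements one possible reading of the three operators over the children of a single element, and reproduces the two results above. Assumptions not stated in this document: a term is an element name or *, each non-empty term matches exactly one child, a pattern need not start at the first child or end at the last, and a term next to ;;; may match nothing. All helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

def parse_sequence(pattern: str) -> Tuple[List[str], List[str]]:
    """Split 'p;;q;;;r' into terms ['p', 'q', 'r'] and operators [';;', ';;;']."""
    parts = re.split(r"(;+)", pattern)
    terms, ops = parts[0::2], parts[1::2]
    if any(op not in (";", ";;", ";;;") for op in ops):
        raise ValueError(f"unknown operator in {pattern!r}")
    return terms, ops

def term_matches(term: str, node: ET.Element) -> bool:
    """'*' matches any element; any other term names an element."""
    return term == "*" or node.tag == term

def placements(children: List[ET.Element], terms: List[str],
               ops: List[str]) -> Iterator[List[Optional[int]]]:
    """Yield every assignment of child indices to terms.
    None marks an empty term, which only a neighbouring ';;;' allows."""
    def may_be_empty(k: int) -> bool:
        return ((k > 0 and ops[k - 1] == ";;;")
                or (k < len(ops) and ops[k] == ";;;"))

    def place(k: int, chosen: List[Optional[int]]) -> Iterator[List[Optional[int]]]:
        if k == len(terms):
            yield list(chosen)
            return
        # every non-empty term must come after the last non-empty one
        last = max((p for p in chosen if p is not None), default=-1)
        if may_be_empty(k):
            chosen.append(None)
            yield from place(k + 1, chosen)
            chosen.pop()
        for j in range(last + 1, len(children)):
            if not term_matches(terms[k], children[j]):
                continue
            # ';' also requires the previous term to sit at j - 1
            if k > 0 and ops[k - 1] == ";" and chosen[k - 1] != j - 1:
                continue
            chosen.append(j)
            yield from place(k + 1, chosen)
            chosen.pop()

    yield from place(0, [])

def matched_children(node: ET.Element, pattern: str, focus: int) -> List[ET.Element]:
    """Children that term number `focus` can occupy in at least one match."""
    terms, ops = parse_sequence(pattern)
    children = list(node)
    hits = {p[focus] for p in placements(children, terms, ops)
            if p[focus] is not None}
    return [children[j] for j in sorted(hits)]

# 4 a c; &#x2062; is invisible times (U+2062)
four_a_c = ET.fromstring(
    "<mrow><mn>4</mn><mo>&#x2062;</mo><mi>a</mi>"
    "<mo>&#x2062;</mo><mi>c</mi></mrow>")

# term 1 is the mi in both patterns
print([e.text for e in matched_children(four_a_c, "*;;mi;;*", 1)])   # ['a']
print([e.text for e in matched_children(four_a_c, "*;;mi;;;*", 1)])  # ['a', 'c']
```

The enumeration is exhaustive and therefore exponential in the worst case; that is acceptable for an illustration but not for a real query engine.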
Extension 3: Content
Content can be queried by regarding it as data with no element head.
Pattern | Meaning
[content] | identifies the content of an element
[regular expression] | identifies content matching the regular expression pattern
Example:
<mn>4</mn>
Pattern:
mn[[4]]
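A minimal sketch of content matching, assuming Python re syntax for the regular expression and a full match against the element's text with surrounding whitespace removed. Under these assumptions mn[[4]] is simply the regular expression 4, which matches only the literal content 4; content containing regular expression metacharacters would need escaping (for example with re.escape). content_matches is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

def content_matches(node: ET.Element, tag: str, content_regex: str) -> bool:
    """tag[[regex]]: the element name matches ('*' matches any name) and the
    element's text, with surrounding whitespace stripped, matches the
    regular expression in full."""
    if tag != "*" and node.tag != tag:
        return False
    text = (node.text or "").strip()
    return re.fullmatch(content_regex, text) is not None

four = ET.fromstring("<mn>4</mn>")
print(content_matches(four, "mn", "4"))        # True:  mn[[4]]
print(content_matches(four, "mn", "[0-9]+"))   # True:  any unsigned integer
print(content_matches(four, "mn", "5"))        # False: different content
print(content_matches(four, "mi", "4"))        # False: different element
```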
Putting it all together: A compound example.
Example: b^2 - 4 a c
<mrow>
<msup>
<mi>b</mi>
<mn>2</mn>
</msup>
<mo>-</mo>
<mrow>
<mn>4</mn>
<mo>⁢</mo>
<mi>a</mi>
<mo>⁢</mo>
<mi>c</mi>
</mrow>
</mrow>
Query: search for an element containing a superscript followed by a product (an mrow that contains an invisible-times operator).
Pattern:
*[msup;*;mrow[*;;mo[[⁢]];;*]]
or
*[msup;*;mrow[*;;mo[[*Times*]];;*]]
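The first pattern can be checked with a hand-specialized sketch. Rather than a general pattern evaluator, it hard-codes the pattern as two hypothetical predicates, is_product for mrow[*;;mo[[...]];;*] and is_compound for *[msup;*;mrow[...]], assuming * matches exactly one element, ; means "immediately precedes", and ;; allows intervening siblings. The invisible-times content is written as the escape \u2062 so it is visible in source; the second, wildcard-style pattern is not covered.

```python
import xml.etree.ElementTree as ET
from typing import List

INVISIBLE_TIMES = "\u2062"  # MathML &InvisibleTimes;

def is_product(node: ET.Element) -> bool:
    """mrow[*;;mo[[U+2062]];;*]: an mrow with an invisible-times mo that has
    at least one sibling before it and at least one after it."""
    kids = list(node)
    return node.tag == "mrow" and any(
        kid.tag == "mo" and kid.text == INVISIBLE_TIMES for kid in kids[1:-1])

def is_compound(node: ET.Element) -> bool:
    """*[msup;*;mrow[...]]: an msup, then any one element, then a product,
    as three consecutive children."""
    kids = list(node)
    return any(kids[i].tag == "msup" and is_product(kids[i + 2])
               for i in range(len(kids) - 2))

def find_compound(root: ET.Element) -> List[ET.Element]:
    """Test the root and every descendant, in document order."""
    return [node for node in root.iter() if is_compound(node)]

expr = ET.fromstring(
    "<mrow>"
    "<msup><mi>b</mi><mn>2</mn></msup>"
    "<mo>-</mo>"
    "<mrow><mn>4</mn><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>c</mi></mrow>"
    "</mrow>")

hits = find_compound(expr)
print(len(hits), hits[0] is expr)   # 1 True
```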
References
- Mathematical Markup Language (MathML) 1.0 Specification. http://www.w3.org/TR/REC-MathML
- The W3C Query Languages Workshop Call for Participation. http://www.w3.org/TandS/QL/QL98/cfp
- XML Query Language: A Proposal. http://www.w3.org/Style/XSL/Group/1998/09/XQL-proposal.html
Maintained by:
Stephen Hunt (Math Working Group XML-QL Requirements Lead).
Angel Diaz (co-chair for the Math Working Group).
Patrick Ion (co-chair for the Math Working Group).
W3C contact for math: Dave Raggett.
Last revised: 1998/11/17 by aldiaz