Macro processing

[Note: I did not have time to update this document to address certain subtle issues. I decided it was worth sending it in this form, which is good enough to convey the basic feel of my proposal, in time for the phone discussion on Thursday, 7 March. I'll revise it soon to deal with these remaining issues.]

(This document is essentially a subsection of "Extensibility" -- for now, please refer to that document for an introduction to macros, a rationale for them, and the specification of how the set of macro rules which apply to a given HTML-Math expression is to be found.)

Subsection: When Macro Processing Occurs

The HTML-Math parser receives as input a sequence of HTML-Math input characters, SGML markup tags defined by HTML-Math, and SGML elements defined by HTML and includable in HTML-Math. It tokenizes these into HTML-Math tokens, and then parses them into subexpressions (i.e. fully expands them), as described in the "Syntax" document and according to the applicable Math Syntax Model. The result is a subexpression tree, which could be represented directly as fully expanded HTML-Math.

After that, the HTML-Math renderer transforms the subexpression tree using the applicable Macro Rules, as described in the rest of this document.

Finally, the resulting macro-expanded subexpression tree is rendered, according to the applicable Rendering Style, as described in the "Rendering and layout primitives" document.

Subsection: What a Macro Rule Can Express, and How It Works

[Everything in this subsection is a proposal which has not yet been discussed. For convenience in writing, I'll give it outside of []s and use language as if it were already agreed upon. The parts that I am not yet proposing in a specific form will be inside []s.]

An HTML-Math Macro Rule is a transformation rule (or rewrite rule) for an HTML-Math subexpression tree.

It is specified by giving a pattern, which is simply an HTML-Math subexpression with certain elements designated as named formal parameters, and a result, which is another HTML-Math subexpression which might contain one or more copies of the named formal parameters.

[I'm not yet proposing a specific syntax in which to give a macro rule. I'll give one possible example syntax below.]

The named formal parameters are considered by the syntax to be "terms" rather than "operators" (just as identifiers, number literals, and compound subexpressions are), whether they occur in the macro pattern or in the macro result. [Therefore this proposal does not allow for "second-order macros", i.e. macros whose formal parameters stand for operators. Whether this restriction is reasonable, and if not, whether it is avoidable, remains to be discussed.]

The macro rule matches any HTML-Math subexpression (called the "input expression") which matches the pattern, in the sense that there are subexpressions which could be substituted for the formal parameters such that the input expression would be equivalent to the substituted pattern (i.e. have the same fully expanded form).

If a formal parameter with a given name is used multiple times in a macro pattern, the same subexpression must be substituted for each occurrence during the matching. That is, the pattern will only match when the corresponding subexpressions of the input expression are the same.

The result of applying the macro rule to an HTML-Math subexpression which it matches is the macro rule's result, with the same substitutions for named formal parameters that were used in matching the pattern. (Any named formal parameter that occurs in the result but not in the pattern is replaced with the "missing term" (see "Syntax").)

The result of applying the macro rule to an HTML-Math subexpression which it does not match is the original expression.
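The matching and application rules above can be sketched in Python. This is an illustration under stated assumptions, not a proposed implementation: the classes Param and Node, the functions match, substitute and apply_rule, and the MISSING placeholder for the "missing term" are all hypothetical names, and a subexpression tree is simplified to labelled nodes whose leaf terms are plain strings. Because formal parameters are terms, a Param can only stand for a whole subtree, never for a node's operator label.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Param:
    """A named formal parameter (written <mp>name in the example syntax)."""
    name: str


@dataclass(frozen=True)
class Node:
    """A compound subexpression: a label (e.g. an operator or call head) and its children."""
    label: str
    children: tuple = ()


# A leaf term (identifier, number literal) is represented as a plain string.
Expr = Union[str, Node, Param]

# Hypothetical stand-in for the "missing term" described in "Syntax".
MISSING = Node("missing-term")


def match(pattern: Expr, expr: Expr, bindings: Optional[dict] = None) -> Optional[dict]:
    """Return parameter bindings if expr matches pattern, otherwise None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Param):
        # A parameter stands for a whole term (subtree), never for an operator label.
        if pattern.name in bindings:
            # A repeated parameter must bind the same subexpression each time.
            return bindings if bindings[pattern.name] == expr else None
        return {**bindings, pattern.name: expr}
    if isinstance(pattern, Node) and isinstance(expr, Node):
        if pattern.label != expr.label or len(pattern.children) != len(expr.children):
            return None
        for p, e in zip(pattern.children, expr.children):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    # Leaves (or a leaf compared with a compound) must be identical.
    return bindings if pattern == expr else None


def substitute(template: Expr, bindings: dict) -> Expr:
    """Copy template, replacing each parameter with its binding (or MISSING if unbound)."""
    if isinstance(template, Param):
        return bindings.get(template.name, MISSING)
    if isinstance(template, Node):
        return Node(template.label, tuple(substitute(c, bindings) for c in template.children))
    return template


def apply_rule(pattern: Expr, result: Expr, expr: Expr) -> Expr:
    """Apply one macro rule to expr; a non-matching expr is returned unchanged."""
    bindings = match(pattern, expr)
    if bindings is None:
        return expr
    return substitute(result, bindings)


if __name__ == "__main__":
    # The rule <mdef> m( <mp>a , <mp>b ) <as> <mp>a + <mp>b </mdef>, as a simplified tree.
    pattern = Node("m", (Param("a"), Param("b")))
    result = Node("+", (Param("a"), Param("b")))
    print(apply_rule(pattern, result, Node("m", ("a1", "b1"))))
    # prints Node(label='+', children=('a1', 'b1'))
    print(apply_rule(pattern, result, Node("n", ("a1", "b1"))))
    # prints Node(label='n', children=('a1', 'b1'))
```

The sketch applies a single rule once, at the root of the input expression. How a whole list of rules is applied to every subexpression, and in what order, is deliberately not modelled, since those questions are still open.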

Note that fully expanding macro patterns, macro results, or input expressions (or any combination of these) before macro processing does not change the full expansion of the final result. In a typical implementation, all of these expressions will be represented as subexpression trees, and the tree for any HTML-Math expression is identical to the tree for its fully expanded form.

Subsection: Scope of macro rules

[I propose that we provide a general scoping form, e.g.

   <mscope> ... </mscope>

(which, perhaps, can optionally have a NAME attribute), for use in all situations in HTML-Math in which definitions or other "state modifications" ought to have a limited scope. Then any macro definition (or imported set of definitions, or designation of a math syntax model, math rendering-style sheet, etc.) will have as its scope everything from immediately after its occurrence in the input sequence to the end of the smallest MSCOPE element which contains it.

There are other possible uses of an MSCOPE element -- e.g. we might provide an author-optional mechanism for declaring the scope of local math identifiers, and which instances of their use count as defining them. This would make use of the NAME attribute mentioned above. All other uses of MSCOPE, if any, are discussed in the "Markup Tags" document.]
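The proposed scoping rule can be sketched as a stack of definition lists maintained while scanning the flat input sequence. The item encoding ("open", "close", "def", "expr") and the function name visible_definitions are hypothetical, and treating the whole document as an outermost scope is an assumption. The sketch only reports which definitions are visible at each expression, in order of occurrence; it says nothing about how those definitions are then ordered or combined into a rule list (see "Extensibility").

```python
def visible_definitions(items):
    """Return (expression, definitions in scope) for each expression item.

    Hypothetical encoding of the input sequence:
      ("open",)      an <mscope> start tag
      ("close",)     an </mscope> end tag
      ("def", name)  a macro definition (or other scoped declaration)
      ("expr", text) an expression that macro rules could apply to
    """
    scopes = [[]]  # scopes[0] is the document-level scope (an assumption)
    out = []
    for item in items:
        kind = item[0]
        if kind == "open":
            scopes.append([])
        elif kind == "close":
            if len(scopes) == 1:
                raise ValueError("</mscope> without a matching <mscope>")
            scopes.pop()  # definitions made inside this MSCOPE go out of scope here
        elif kind == "def":
            # The definition becomes visible immediately after this point.
            scopes[-1].append(item[1])
        elif kind == "expr":
            out.append((item[1], [d for scope in scopes for d in scope]))
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return out


if __name__ == "__main__":
    doc = [("expr", "e0"), ("def", "A"), ("expr", "e1"),
           ("open",), ("def", "B"), ("expr", "e2"), ("close",),
           ("expr", "e3")]
    for expr, defs in visible_definitions(doc):
        print(expr, defs)
```

Here e0 sees no definitions (A occurs after it), e2 sees both A and B, and e3 sees only A, because B's scope ended with its enclosing MSCOPE.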

Subsection: Order of processing of multiple macros; avoiding infinite loops

Macro rule sets are equivalent to ordered lists of macro rules. How these lists are assembled from multiple sources is discussed in "Extensibility".

[The following issues are among the ones I did not yet have time to address:

]

Subsection: Discussion of the power provided by the above macro proposal

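[I am not yet proposing a syntax for macro rules. But for examples, I'll use the following possible syntax, in which <mp> X designates X as a named formal parameter, and <mdef> P <as> R </mdef> defines a macro rule with pattern P and result R:

     <mp> X
     <mdef> P <as> R </mdef>
]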

Note that macros, as defined here, can only transform entire, valid HTML-Math subexpressions, and they always generate entire, valid expressions as their result. In particular, all SGML begin and end tags are guaranteed to match in the result, even though the macro processing mechanism does not know which tokens are such tags, because the tags are already matched in the macro pattern, in the macro result template, and in every expression substituted for a formal parameter.
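The tag-matching argument can be illustrated with a token-level sketch. The names Param, substitute_tokens and tags_balanced, and the tag names <x> and <y>, are hypothetical placeholders, and a token is assumed to be a tag only if it has the shape <name> or </name>. substitute_tokens copies tokens without ever checking whether they are tags, yet its output stays balanced whenever the template and every bound token sequence are balanced, since each parameter occupies a single non-tag position in the template.

```python
import re
from dataclasses import dataclass

# A token counts as a tag only if it looks like <name> or </name> (an assumption).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


@dataclass(frozen=True)
class Param:
    """A named formal parameter occurring in a result template."""
    name: str


def substitute_tokens(template, bindings):
    """Replace each Param with its bound token sequence; copy other tokens as-is.

    Nothing here inspects whether a token is an SGML tag.
    """
    out = []
    for tok in template:
        if isinstance(tok, Param):
            out.extend(bindings[tok.name])
        else:
            out.append(tok)
    return out


def tags_balanced(tokens):
    """Check that every <name> start tag is closed by a matching </name> end tag."""
    stack = []
    for tok in tokens:
        m = TAG.fullmatch(tok) if isinstance(tok, str) else None
        if m is None:
            continue  # not a tag: identifier, number, operator, ...
        is_end, name = m.group(1) == "/", m.group(2)
        if not is_end:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


if __name__ == "__main__":
    template = ["<x>", Param("a"), "+", Param("b"), "</x>"]
    out = substitute_tokens(template, {"a": ["<y>", "1", "</y>"], "b": ["2"]})
    print(out, tags_balanced(out))
```

Binding a parameter to an unbalanced fragment such as ["<y>"] would break the guarantee, which is why macro arguments are restricted to entire subexpressions.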

Note also that this proposal is not restricted to any single syntax for a macro call, such as m(a,b) with a name m and formal parameters a, b. That kind of call could be defined, for example, as

    <mdef> m( <mp>a , <mp>b ) <as> <mp>a + <mp>b </mdef>

which defines m(a1,b1) as expanding to a1+b1. But any other syntax can also be treated as a macro call; indeed, given an appropriate macro definition, any HTML-Math subexpression could in theory be a macro call, with the macro "named" by any portion of its substructure. This has the benefits that:

(end of "Macro processing")