Serving XHTML with math: a recipe for Apache


It is slowly becoming easier to include mathematics in Web pages. Ten years ago, the only way was still to write in LaTeX and apply some software that turned the result into HTML with images of math. Later, some plug-ins and JavaScript-based tricks helped a bit, but now several browsers and other HTML viewers have started to understand MathML directly. Especially if you limit yourself to the subset of MathML called the “MathML for CSS Profile,” it works quite nicely. And with MathML-compatible mark-up an integral part of the next (fifth) version of HTML, that is only going to improve.

W3C's Math working group maintains a page with information about the various ways to put math on the Web. This article is about only one aspect: configuring an Apache server to serve XHTML. That won't help people with older user agents to see the math, but it is necessary for newer software.

Until HTML version 5 is ready, the only way to put math in HTML is to use XHTML instead. But not all user agents accept XHTML, which gives us at least three choices: offer them the XHTML file anyway (they might not be able to do anything more with it than save it to disk), give them an alternative without math instead (the best solution, but it requires writing a second file), or give them the same file and pretend it is HTML. That last solution isn't very nice: we're serving invalid HTML to those clients. But the XHTML specification has some guidelines for how to limit the damage with a reasonable chance that the result is readable. The math won't show correctly, but everything else probably will. So that is the solution we assume in this article.

Aside: Some browsers don't distinguish HTML and XHTML. They assume that an author who used a <math> tag in HTML just forgot to use XHTML (or is trying to use HTML5 already). But Opera (the current version is 11) is an example of a browser that does not do that. It only treats a document as XHTML if you say that it is XHTML. Which is probably the better approach. At least it complies with the standards.

User agents send a so-called Accept header in each request to a Web server with the list of formats they understand. Each format is identified by its Internet Media Type. The list typically includes things like PNG (“image/png”), JPEG (“image/jpeg”) and, of course, HTML (“text/html”). User agents often also indicate which formats they prefer over others, but we'll ignore that here, for simplicity. Specifically, we'll check the list for XHTML (“application/xhtml+xml”).

Microsoft's Internet Explorer version 8 is an example of a client that does not accept XHTML, unless the browser has been extended with a plug-in that handles such files, such as MathPlayer.

To summarize: If a request to our Web server indicates that the client accepts XHTML, we'll respond with our XHTML file and label it correctly, i.e., as application/xhtml+xml; if not, we'll send the XHTML file anyway, but tell the client that it is HTML, i.e., text/html.

Different Web servers have different ways to make that happen. Here is a recipe for the Apache server. It relies on Apache's mod_rewrite module. That is an optional module, so make sure that it is enabled. (It usually is.)

We're assuming that our XHTML files have names that end in the extension “.xhtml”. If the extension is different, you'll have to modify the first and third lines in the code below.

In the same directory as those files, we create a file called “.htaccess” with the following content. (If the file already exists, you'll have to find some way to integrate these lines with what is already there. You may need to read the Apache documentation…)

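AddType application/xhtml+xml .xhtml
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} \.xhtml$
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
RewriteRule .* - [type=text/html]

The backslashes matter: the patterns on the third and fourth lines are regular expressions, in which an unescaped “.” matches any character and an unescaped “+” means “one or more.” Without the backslash, the pattern on the fourth line would not match the literal text “application/xhtml+xml” at all.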

It may be necessary to add a RewriteBase directive as well, depending on how the Web server is set up. If in doubt, add

RewriteBase /path/to/my/files

where “/path/to/my/files” is the part of the URL after the name of the server and before the actual file name. (With a slash at the start and not at the end.)

The first line above tells the Apache server that all files ending in “.xhtml” are XHTML files. The other lines define the exception: when the name of the file that was requested ends in “.xhtml” and the client does not accept “application/xhtml+xml”, we return the file with a media type of “text/html” instead. Note that the “!” on the fourth line means “not.”
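
If you cannot be sure that mod_rewrite is loaded, one possible variant wraps the rewrite lines in an <IfModule> block. This is only a sketch and not part of the recipe above; the comment lines are added for explanation. Without the wrapper, Apache typically answers requests in the directory with a server error when it meets a Rewrite directive it does not know.

```apache
# Label all .xhtml files as XHTML. This line needs only mod_mime.
AddType application/xhtml+xml .xhtml

# Apply the exception only if mod_rewrite is actually loaded.
<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} \.xhtml$
  RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
  RewriteRule .* - [type=text/html]
</IfModule>
```

The trade-off is that a server without mod_rewrite silently skips the exception, so every client then receives the files as application/xhtml+xml, including clients that cannot handle that type.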

For an example of this recipe in action, try the Math on the Web page. With most recent browsers, the three examples in the middle of the page will show up as actual mathematical formulas. User agents that do not understand the math will show just a string of letters there, but should still show the rest of the page correctly. If your browser offers something like a “View document information” or “Page information” menu, you can check what type the document has: text/html or application/xhtml+xml.

For a page such as this one, where the correct display of the formulas is desired, but not essential, this is good enough.


And if the degraded display of the math is not good enough? Then you will have to do the extra work and write an HTML version with the math replaced by something else: images (such as those made with LaTeX2HTML, the kind of converter mentioned at the start), some JavaScript to simulate math with HTML (such as MathJax), or a textual description of the formulas. The choice depends on which Web clients you want the HTML to work in.

No need for the “Rewrite…” rules in that case. Just give the XHTML version the file extension “.xhtml” and the HTML one “.html” and let Apache do the rest.
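
One way to let Apache make that choice is its content negotiation feature, MultiViews. The following is a sketch under assumptions that this article does not state: that mod_negotiation is loaded, that the server allows the Options directive in “.htaccess” files, and that the two versions share a base name (the hypothetical “page.xhtml” and “page.html”) and are linked to without an extension, as “page”.

```apache
# Label each variant with its real media type.
AddType application/xhtml+xml .xhtml
AddType text/html .html

# For a request such as /page (no extension), let Apache pick
# page.xhtml or page.html based on the client's Accept header.
Options +MultiViews
```

A request that names a file with its extension, such as “page.xhtml”, bypasses the negotiation and returns that file. How Apache decides for clients that claim to accept everything depends on its negotiation algorithm, so it is worth testing with the clients you care about.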
