Document Oriented Experience Report

Submitted by Nadia Swaby, Pratt & Whitney Canada

I heard that you were looking for some experience reports related to XML Schema in the publishing world. Here is my report.

I work for a fairly large company in the Knowledge Management department. Basically, we manage most of the engineering documents. We started out using DTDs, but I am planning to implement XML Schema in the next year or so. Here is the logic behind this decision:

1. Support for namespaces

Our documents have lots of equations, but the original creator of the DTD decided not to use MathML because there was no tool support (we were using XMetal 3.1 at the time, but have since upgraded to 4.5/4.6 and have purchased Design Science Mathflow). We are now finding that MathML is essential. I know you can take the MathML DTD and include it in our existing one, but I find it easier just to import it in a schema. In addition, our DTD contains two elements that have the same names as elements in the MathML DTD. Because DTDs are not namespace-aware, some tools complain about this naming conflict (I can't recall if XMetal does, but I know XMLSpy points it out).
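To illustrate the import approach, here is a minimal sketch of a schema that pulls in MathML by namespace. The "equation" wrapper element and the schemaLocation path are assumptions made up for this example; they are not taken from our actual DTD, and the path would have to point at whichever copy of a MathML schema is in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mml="http://www.w3.org/1998/Math/MathML">

  <!-- Assumption: a local copy of a MathML schema; the path is a placeholder. -->
  <xs:import namespace="http://www.w3.org/1998/Math/MathML"
             schemaLocation="mathml/mathml2.xsd"/>

  <!-- Hypothetical wrapper element that holds exactly one MathML math element. -->
  <xs:element name="equation">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mml:math"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Since the MathML elements live in their own namespace, an element of ours that happens to share a local name with a MathML element is a different element as far as the schema is concerned, so the clash should not arise.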

2. Restrict what text content appears inside elements

All of our documents have dates in them, usually issue date, revision date, and approval date. These elements all contain year, month, and day elements in their content model. With DTDs, the content of these elements is just character data, so people can (and do) put anything in them (for example "Mar" in the month element instead of "03", which is the standard here). We do have guidelines and the people in my department tend to follow them, but we plan to allow end users to create their own XML documents. I know from experience that they don't always 'play by the rules'. Using schema validation is a good way to enforce content consistency.
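As a rough sketch of what such a restriction could look like, the fragment below constrains the date parts with patterns. The element name "issueDate", the type name "dateParts", and the four-digit year are assumptions for illustration only; the two-digit month follows the "03" convention mentioned above, and I have assumed the same convention for the day.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical shared type for a date made of year, month, and day. -->
  <xs:complexType name="dateParts">
    <xs:sequence>
      <xs:element name="year">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <!-- Assumption: four-digit years. -->
            <xs:pattern value="[0-9]{4}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="month">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <!-- Two digits, 01 through 12. -->
            <xs:pattern value="0[1-9]|1[0-2]"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="day">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <!-- Two digits, 01 through 31; month length is not checked. -->
            <xs:pattern value="0[1-9]|[12][0-9]|3[01]"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical element name; revision and approval dates could reuse the type. -->
  <xs:element name="issueDate" type="dateParts"/>

</xs:schema>
```

With a type like this, a month of "03" validates, while "Mar" or "3" is reported as an error. Impossible combinations such as day 31 in month 02 would still get through, so this only covers the formatting part of the guidelines.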

Those are my two biggest reasons for using schemas. However, there is one big drawback: no entity support. A schema cannot declare entities the way a DTD can. We have a lot of standard text in documents, such as legal statements, addresses, and other recurring text. Legal statements can (and probably should) be standardized at the stylesheet level, but things like an address that may or may not appear in the document can't be. I am not yet sure how I will tackle this problem.

In summary, schemas can be very useful in publishing, but the lack of entity support is a huge drawback. I hope that you find my experience useful at this conference.