Statement to W3C Compound Document Committee

Eliot Kimber (ekimber@innodata-isogen.com) and
Michael Priestley (mpriestl@ca.ibm.com)
on behalf of
The OASIS Darwin Information Typing Architecture Technical Committee
(http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita)

The subject of compound documents is of primary interest to the DITA Technical Committee. The Darwin Information Typing Architecture defines a set of techniques for using XML to enable the effective and efficient development of reusable information components, primarily in the context of technical documentation, informative Web sites, and similar types of structured, topically focused information for consumption by humans. This activity naturally involves the combination, whether syntactic or semantic, of elements from different namespaces that are governed by different schemas.

The DITA architecture defines several features and techniques that are directly relevant to the subject of compound documents:

As we understand the scope of the Compound Document workshop, it is focused primarily on XML documents that contain elements from different namespaces (and thus, implicitly, from different schemas) and on the implications of such mixing.

Within this scope there are a number of important use cases that must be considered, including the implications for processors that must make sense of compound documents, how communities of interest can define and impose constraints on what combinations are allowed, and how to do controlled specialization of element types in a way that does not, for example, require the creation of overarching XSD schemas that define the specializations as part of the base element type definitions.

DITA has a part to play in each of these areas:

DITA works well for the creation of compound document types where each of the modules involved was defined within the DITA architecture. It does not address how to incorporate markup that was defined outside the DITA architecture.

The DITA TC offers the DITA architecture as an example of a simple, practical, and proven way to combine markup from various domains and information types into cohesive document types that can be reused, related, and published in a controlled and predictable way. The DITA specialization architecture has been in use for several years in a number of enterprises and non-commercial communities. We are hopeful that the Workshop may find some value in the DITA architecture as a way to address at least some of the key use cases involving compound documents.

A note on the term "compound document"

In addition, the DITA TC would like to comment on the use of the term "compound document." We find the definition used in the Workshop announcement a sensible one, but we observe that the term has other common meanings. We think this is an appropriate time for the W3C to clarify the various forms of "compound document" that are becoming an important focus of W3C and related activities.

As far as we know, no existing XML-related standard, in any recognized standardization domain, codifies the term "compound document." Nevertheless, a number of important specifications address various aspects of compound documents.

In particular, we observe that the term "compound document" is often used to refer not to single instances that combine elements from different namespaces but to systems of independent documents linked together to define a single unit of processing, delivery, or management (i.e., hyperdocuments explicitly created and processed as a single unit, as opposed to ad hoc hyperdocuments that arise from uncoordinated linking actions). Both the XLink and XInclude specifications define mechanisms for creating this type of compound document, as does the current DITA specification (through its map mechanism).

This sense of compound document is largely orthogonal to the question of combining elements from different schemas: most existing systems that create this type of compound document do so in the context of a single document type. [However, because this type of compound document is created by semantic links (and not syntactic inclusion) it is also quite likely that such a compound document might be composed of documents that are, individually, governed by different schemas.]
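
The two senses can be made concrete with a small sketch. The Python below is illustrative only: the namespace URIs, file names, and helper functions are hypothetical, and the map-like document is loosely modeled on the idea of a DITA map rather than being a valid DITA map. The first function reports the namespaces combined inside one instance; the second follows the references in a map-like document to assemble a unit from independent documents.

```python
import xml.etree.ElementTree as ET

# Sense 1: a single instance that mixes elements from two namespaces.
# Both namespace URIs are made up for this illustration.
SINGLE_INSTANCE = """\
<doc xmlns="http://example.org/ns/doc" xmlns:m="http://example.org/ns/math">
  <para>Area: <m:expr>pi * r * r</m:expr></para>
</doc>"""

# Sense 2: independent documents tied together by a map-like document.
# An in-memory dict stands in for a file system (hypothetical file names).
DOCUMENTS = {
    "map.xml": '<map><topicref href="intro.xml"/><topicref href="task.xml"/></map>',
    "intro.xml": '<concept id="intro"><title>Introduction</title></concept>',
    "task.xml": '<task id="install"><title>Installing</title></task>',
}


def namespaces_used(xml_text):
    """Return the set of namespace URIs of all elements in one instance."""
    found = set()
    for element in ET.fromstring(xml_text).iter():
        # ElementTree writes qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


def resolve_map(map_name, documents):
    """Follow each reference in the map and report (href, root element name)."""
    resolved = []
    for ref in ET.fromstring(documents[map_name]).iter("topicref"):
        href = ref.get("href")
        resolved.append((href, ET.fromstring(documents[href]).tag))
    return resolved


print(sorted(namespaces_used(SINGLE_INSTANCE)))
print(resolve_map("map.xml", DOCUMENTS))
```

In this sketch the referenced documents have different root element types, which mirrors the observation above that a linked compound document may be composed of documents governed by different schemas.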

Therefore, we urge the W3C to clarify its use of the term "compound document" so that it distinguishes at least these two senses, establishing an unambiguous standard vocabulary with which we, as a community, can communicate efficiently and effectively on these important and challenging subjects. We don't have a strong opinion on what the terms should be, although within our community the term "compound document" is much more often used in the multi-instance, hyperdocument sense. We are more concerned with having a vocabulary than with the particular terms chosen.

In our individual practices as technical documenters and developers of systems that support technical documentation activities, we find that basic issues of document representation, processing, and rendering are now largely solved (or at least well provided for by established standards and implementing tools). As a result, we are seeing both multi-namespace single-instance documents and multi-instance compound documents come to the fore as the critical issues to be addressed, both in the standards domain and in systems being implemented. Having a clear vocabulary with which to discuss the issues and business objects involved will help tremendously as the community goes forward to find solid and standard solutions to these challenges.