Possible Implementations of ITS Requirements

Overview

The purpose of this document is to introduce alternative, and sometimes complementary, implementations of ITS requirements, which should be discussed before the actual specification is written. The file template.xml (XHTML version: template.html) serves as a template document. All implementations use the requirement for annotation markup as a sample requirement.

Note that this document is concerned only with the tag set implementation, i.e. the recommendation we are going to produce. It is not concerned with the working group note we will create.

Separation of Aspects of ITS

The alternative implementations differ in how they deal with the following aspects, which should be considered in the specification:

1. the formal definition of the implementation (schema language independent, possibly based upon "data categories")
2. the prose description
3. the schema language specific implementation (ideally, (1) could be the basis for the automatic creation of (3))
4. the implementation as modularizations for existing schemas (e.g. XHTML, DocBook, OpenOffice)
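The idea that the formal definition could drive the automatic creation of schema language specific implementations can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the element name "annotation" and its "type" attribute are hypothetical and are not taken from any ITS draft.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical, schema-language-independent description of one data category.
# The names "annotation" and "type" are assumptions made for this sketch.
ANNOTATION = {
    "name": "annotation",
    "attributes": ["type"],
}

def to_xml_schema(category):
    """Derive an XML Schema element declaration from the description."""
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=category["name"])
    # mixed="true" with no child elements allows character data only
    ctype = ET.SubElement(element, f"{{{XS}}}complexType", mixed="true")
    for attr in category["attributes"]:
        ET.SubElement(ctype, f"{{{XS}}}attribute", name=attr, type="xs:string")
    return schema

def to_dtd(category):
    """Derive the corresponding DTD declarations from the same description."""
    lines = [f"<!ELEMENT {category['name']} (#PCDATA)>"]
    for attr in category["attributes"]:
        lines.append(f"<!ATTLIST {category['name']} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(ET.tostring(to_xml_schema(ANNOTATION), encoding="unicode"))
    print(to_dtd(ANNOTATION))
```

Both outputs are generated from the same description, so a change to the data category would only have to be made in one place.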

Why these separations? In the requirements document, we describe a lot of potential audiences for ITS:

1. Designers of content-related formats
2. Developers of schemas in various formats
3. Developers of XML authoring tools
4. Authors of XML content
5. Developers of localization tools
6. Localizers involved with XML
7. Developers of Internet specifications at the World Wide Web Consortium and related bodies

Audiences 4 and 6 do not care about schema languages, modularizations, etc. They want a clear description of what markup exists and how to use it. Audiences 1 and 2 have to care about where to integrate the ITS markup into their schemas, e.g. whether ITS markup should be allowed to contain markup from the host schema or not. Audiences 3 and 5 need to know how to process ITS information. Audience 7 possibly needs to know everything, depending on the specification into which ITS should be integrated. Without a clear separation of the various aspects of ITS, the ITS spec will be very hard to read for some or all of these audiences.

On fixed modularizations

The description of fixed modularizations seems to be necessary for two reasons. First, ITS markup might itself contain tags from the "host" schema. For example, in the case of ruby, the content of the <rb> element is not just character data, but inline XHTML markup. Without a fixed modularization, there is the danger that every schema author describes his own modularizations for the same vocabulary. Another problem with existing schemas is that they differ with respect to schema design. E.g. some consist only of global element declarations (i.e. the DTD way, the so-called "salami slice" approach), some have global type declarations and local element declarations (the so-called "venetian blind" approach), and so on. See the XML Schema tutorial, p. 119 ff., for various design approaches.

The second reason is trivial but important: fixed modularizations in popular formats like XHTML, DocBook and OpenOffice are likely to lead to widespread adoption of ITS. Hence, it will be easier for the tool developers (audiences 3, 5 and possibly 7) to create ITS processing software.
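To make the difference between the schema design approaches mentioned in this section more concrete, the following sketch inspects two small XML Schema documents and reports which declarations are global and which are local. Both schemas and the names "doc", "note", "docType" and "noteType" are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical "salami slice" schema: every element is declared globally.
SALAMI = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="note" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="note" type="xs:string"/>
</xs:schema>
"""

# Hypothetical "venetian blind" schema: global named types, local elements.
VENETIAN = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="docType">
    <xs:sequence>
      <xs:element name="note" type="noteType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="noteType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:element name="doc" type="docType"/>
</xs:schema>
"""

def summarize(schema_text):
    """List global elements, named global types and local element declarations."""
    root = ET.fromstring(schema_text.strip())
    top_level = {id(child) for child in root}
    type_tags = (f"{{{XS}}}complexType", f"{{{XS}}}simpleType")
    return {
        "global_elements": [e.get("name") for e in root.findall(f"{{{XS}}}element")],
        "named_types": [t.get("name") for t in root if t.tag in type_tags],
        # Local declarations carry a name but are not children of xs:schema;
        # references (ref="...") have no name and are skipped.
        "local_elements": [
            e.get("name")
            for e in root.iter(f"{{{XS}}}element")
            if e.get("name") and id(e) not in top_level
        ],
    }

if __name__ == "__main__":
    print(summarize(SALAMI))
    print(summarize(VENETIAN))
```

A modularization written for one design cannot simply be reused for the other: in the first schema ITS markup would have to be hooked into global element declarations, in the second into named types.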

Implementation of "everything": Data categories, Schema Language specific Implementations and fixed Modularizations

Example: HTML version, xml spec version.

This version implements every aspect of ITS described above, i.e. aspects 1-4. Just for illustration, there are multiple implementations for one schema language (XML Schema): example 2 is an implementation as an XML Schema without named types, and example 3 is an implementation as an XML Schema with named types and global elements. The implementations are linked to the data category descriptions. The same should happen for the modularizations, see sec. 3.1.3.

Implementation of "Everything" in Separated Sections

Example: HTML version, xml spec version.

This version is just a reordered version of the previous version. Data categories are separated from their implementation in different sections.

Implementation only of "Prose"

Example: HTML version, xml spec version.

This version contains only prose descriptions of the data categories. Implementations might be given as examples, but they do not belong to the normative parts of the specification.

Implementation relying on Namespace Sectioning

Example: HTML version, xml spec version.

Here the namespace sectioning mechanisms provided by NRL (Namespace Routing Language) or NVDL (Namespace-based Validation Dispatching Language) are used instead of fixed modularizations. Advantage: there is no need to change the existing schemas. Disadvantage: additional software is necessary (i.e. in addition to a schema validator).
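The basic idea of namespace sectioning can be sketched as follows. This is a strongly simplified illustration, not an NRL or NVDL implementation: it only records where a new section would start (roughly, wherever an element's namespace differs from that of its parent) and performs no validation. The ITS namespace URI and the element name "note" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical host document: XHTML with embedded ITS-like markup.
# The namespace "http://example.org/its" is an assumption for this sketch.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:its="http://example.org/its">
  <body>
    <p>Some <its:note><em>annotated</em> text</its:note></p>
  </body>
</html>
"""

def split_tag(tag):
    """Split ElementTree's '{namespace}local' notation into its two parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag

def section_roots(element, parent_namespace=None, found=None):
    """Collect (namespace, local name) for each element that starts a section."""
    if found is None:
        found = []
    namespace, local = split_tag(element.tag)
    if namespace != parent_namespace:
        found.append((namespace, local))
    for child in element:
        section_roots(child, namespace, found)
    return found

if __name__ == "__main__":
    for namespace, local in section_roots(ET.fromstring(DOC.strip())):
        print(namespace, local)
```

In this sample the document falls into three sections: the XHTML host starting at html, the ITS section starting at note, and a nested XHTML section starting at em. A dispatcher could hand each section to the schema for its namespace, which is why the existing schemas do not have to be changed.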

Formal Description and Interrelation of Data Categories

To ease the task of reusing and combining ITS in various scenarios, it would be useful to have more than just a "list" of data categories. Describing how the data categories are ordered and interrelated, e.g. with RDF-based statements or parts of the TEI ODD model, seems to be highly valuable. There are no examples for this possible implementation yet.

Version: $Id: Overview.html,v 1.1 2005/09/19 10:03:36 fsasaki Exp $