W3C

NOTE-SOX-19980930

Schema for Object-oriented XML

Submitted to W3C 19980915

This version:
http://www.w3.org/TR/1998/NOTE-SOX-19980930
Latest version:
http://www.w3.org/TR/NOTE-SOX
Authors:
Matt Fuchs (Veo Systems) <matt@veosystems.com>
Murray Maloney (Muzmo Communication) <murray@muzmo.com>
Alex Milowski (Veo Systems) <alex@veosystems.com>

Status of this document

This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.

This document is a submission to W3C from Veo Systems Inc. Please see Acknowledged Submissions to W3C regarding its disposition.

Comments on this document should be sent to schema@veosystems.com.


Abstract

This document proposes a schema facility, Schema for Object-oriented XML (SOX), for defining the structure, content and semantics of XML documents to enable XML validation and higher levels of automated content checking. The SOX proposal is informed by the XML 1.0 [XML] specification as well as the XML-Data submission [XML-Data], the Document Content Description submission [DCD] and the EXPRESS language reference manual [ISO-10303-11].

SOX provides an alternative to XML DTDs for modeling markup relationships to enable more efficient software development processes for distributed applications. SOX also provides basic intrinsic datatypes, an extensible datatyping mechanism, content model and attribute interface inheritance, a powerful namespace mechanism, and embedded documentation. As compared to XML DTDs, SOX dramatically decreases the complexity of supporting interoperation among heterogeneous applications by facilitating software mapping of XML data structures, expressing domain abstractions and common relationships directly and explicitly, enabling reuse at the document design and the application programming levels, and supporting the generation of common application components.

A SOX document, or schema, is an XML document instance that is valid according to the SOX DTD and that represents a complete DTD-like structure. It has a document root element and expresses, through ordinary XML markup, the syntax that one would expect from a complete DTD.



Introduction

"In SGML the 'DTD' defines, for an SGML element, what possible other elements may be nested inside it.  For example, in an invoice, it may specify that the signing authority must be either Tom or Joe. It may specify that an item can be any part number or any accessory number or any book number. Checking the SGML validity of a document is a process which can be done automatically from the DTD. This is a check at a certain low level in that it does not verify semantic correctness, only structural correctness.  But the structural constraints alone are useful in many ways. For example, a  user interface for constructing a document can be generated automatically from the structural constraints.

"We plan to introduce more powerful languages for describing not only the structure of a document, but the semantics to an extent that not only can checking be automated to a higher level, but also so can the processing of a document and reasoning about its contents be automated. ..." From Web Architecture: Extensible Languages [WEBARCH-EXTLANG], Tim Berners-Lee and Dan Connolly

Automated processing of business documents in large-scale electronic commerce environments requires rigorous definition of the document structure, content and semantics to enable efficient software development processes for distributed applications. XML offers the Document Type Definition (DTD) as a formalism for defining the syntax and structure of XML documents. However, experience has shown that XML DTDs are not sufficient to specify content or semantics. Moreover, the fact that XML DTD syntax is incompatible with XML document syntax increases the complexity of supporting interoperation among heterogeneous applications. Therefore, a schema facility is required to enable XML validation and higher levels of automated content checking by facilitating software mapping of XML data structures, supporting the generation of common application components, and enabling reuse at the document design and the application programming levels.

Schema for Object-oriented XML (SOX) is now being proposed not only as an XML instance replacement syntax for SGML [8859-1] and XML [XML] document type definitions, but also as a modelling language for information modeling itself. Information modeling is the domain, and therefore the domain-specific constructs provided are those which aid in that task. SOX provides intrinsic datatypes, an extensible datatyping mechanism, content model and attribute interface inheritance, a powerful namespace mechanism, and embedded documentation.

SOX documents can be operated on by a SOX processor to produce many different types of output targets. Transformation of SOX documents will yield XML DTDs and object-oriented language classes to facilitate the development of intelligent applications, such as those needed for electronic commerce. Other output targets of a schema include documentation derived from the documentation-based elements in SOX itself, and user interface components. Further output targets are yet to be defined, but the inherent flexibility of this schema language allows for many other options.

Origins

This submission is a collaborative work based on implementation experience at Veo Systems Inc., bringing together experts from the complementary disciplines of electronic commerce, markup languages, formal language theory, SGML systems development, and distributed software systems development. The development of this schema language was begun by Murray Maloney in December, 1997 to satisfy the need for a single XML-based language capable of expressing sufficient information to define simple or complex data definitions, structures and formats, universally usable names or identifiers, and documentation. In early 1998, Matt Fuchs began implementing a processor to derive DTDs, documentation, and programming language interfaces from Common Business Library schemas defined by Terry Allen. Based on experience gained building a processor that generates Java beans from SOX documents, and also his earlier work at Disney and New York University, Matt Fuchs invented many object-oriented extensions that make the SOX inheritance features possible. Alex Milowski suggested the concept of parameterized element types and a syntax for encoding this concept that led to a further refinement of the object-oriented extensions. Terry Allen's practical experience creating schemas fed back into an ongoing refinement. The software development team at Veo, inspired by CTO Bart Meltzer, provided critical feedback.

The result, Schema for Object-oriented XML (SOX), is now offered to the World Wide Web Consortium (W3C) as a formal submission. We trust that you will deem it worthy of consideration and deliberation in the XML Activity's upcoming round of working drafts on schemas, namespaces, data models, and datatypes.

Goals

The goals of SOX are:

  1. Schema language declaration constructs should be useful for the purpose of modeling markup relationships.
  2. SOX documents, as compared with XML DTDs, should enable more efficient software development processes for distributed applications and dramatically decrease the complexity of supporting interoperation among heterogeneous applications.
    • SOX should enable software mapping from SOX documents into data structures in relational databases, common programming languages, and interface definition languages (such as Java, IDL, COM, C and C++), resulting in usable code.
    • SOX should enable reuse at the document design and the application programming levels.
    • SOX should be able to express domain abstractions and common relationships among them directly and explicitly (e.g., subtype/supertype).
    • SOX should support the generation of common application components (marshal/unmarshal, programming data structures) directly from SOX documents.

Requirements

The requirements for SOX are:

  1. SOX shall use XML syntax and be expressed in valid instances according to a valid XML DTD.
  2. SOX and SOX documents shall be interoperable with XML software and conventions.
  3. SOX shall enable a software mapping from SOX documents into an XML DTD, and from an XML DTD into a SOX document without losing the grammatical structure of the original DTD.
  4. SOX shall provide an extensible datatyping mechanism.
  5. SOX shall comply with and be compatible with applicable W3C recommendations, IETF RFCs and ISO Standards, and Proposed Standards.
  6. SOX documents shall provide support for embedded documentation.
  7. SOX documents shall be human-readable.

Features

SOX is more expressive than XML DTDs in the following critical areas:

base element types

SOX provides for parameterized base element types that can be used to build a foundation of regular patterns in your SOX documents. It allows you to create fully parameterized and complex types to describe the storage patterns that best suit your information-based applications. You can define patterns such as tuples and triples, tabular and columnar data, business documents, indexes and bibliographies. Parameterization allows you to reuse the structure with different content model atoms in another document. Extending base element types allows you to add attributes or to further specialize an attribute's datatype, enumeration and presence. Extending base element types yields much higher code reuse than defining each element type independently.

datatypes

SOX offers an extensive and extensible set of datatypes that may be applied to data content elements and attribute types. The purpose of datatypes is to provide a contract between parties as to the constraints that are applicable to data content in elements and attribute values. These constraints may be used by a content validation engine before an XML document is dispatched or after it is received, or by user interface methods.

There are three varieties of datatypes in SOX documents: scalar datatypes, enumerated datatypes and format datatypes. Scalar datatypes are derived from the basic number datatype, and support specification of the number of digits and decimal places, minimum and maximum value range, and a mask. An enumerated datatype may be derived from any of the intrinsic datatypes, and may specify an enumeration of valid values. A format datatype may be derived from any of the intrinsic datatypes, and must specify a mask.
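To make the scalar constraints concrete, here is a minimal Python sketch of how a content validation engine might check a lexical value against a scalar datatype. The class name ScalarDatatype and its field names are hypothetical, the digit and decimal-place counts are computed from the lexical form of the value, and masks are deliberately not handled here.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass
class ScalarDatatype:
    """Hypothetical model of a user-defined scalar datatype (mask not handled)."""
    digits: Optional[int] = None        # maximum total number of digits
    decimals: Optional[int] = None      # maximum number of decimal places
    minimum: Optional[Decimal] = None   # inclusive lower bound
    maximum: Optional[Decimal] = None   # inclusive upper bound

    def is_valid(self, text: str) -> bool:
        try:
            value = Decimal(text)
        except InvalidOperation:
            return False                # not a number at all
        if not value.is_finite():
            return False                # reject NaN and Infinity
        _, digit_tuple, exponent = value.as_tuple()
        places = max(-exponent, 0)
        integer_digits = max(len(digit_tuple) + exponent, 0)
        if self.decimals is not None and places > self.decimals:
            return False
        if self.digits is not None and integer_digits + places > self.digits:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Example: a price with up to 7 digits, 2 of them after the decimal point.
price = ScalarDatatype(digits=7, decimals=2,
                       minimum=Decimal("0"), maximum=Decimal("99999.99"))
```

Keeping the checks independent means a processor could report exactly which constraint a value violated.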

SOX provides an extended list of intrinsic datatypes for attributes. Datatype extensibility is built upon a basic set of datatypes (binary, boolean, char, date, number, string, time) commonly used in many programming environments. The extended list of intrinsic datatypes includes derivations of these basic datatypes, such as specializations of numbers, dates and strings. User-defined datatypes may be defined by specifying a base datatype, scale parameters or an enumeration of values, and a lexical format.

See Datatypes, Datatype masks and Datatype library for complete details.

documentation

Definitions provide for accompanying documentation through the intro and explain elements. Permitted within these two element types is a collection of familiar and easy-to-use HTML [HTML-4] element types. Anybody who writes HTML today will be able to write SOX documentation. Moreover, applying W3C's page authoring guidelines [WAI-PAGEAUTH] to this HTML facilitates accessibility. The importance of the embedded documentation technique, or Literate Programming, must not be overlooked. In the right hands, this technique can be used for design, implementation and testing in both rapid prototyping and large-scale development projects.

entity simplification

SOX offers a reduced and restricted subset of the types of entity functionality that is offered by XML. The requirements for XML-style parameter entities have been addressed, either through specializations of some XML entity capability in distinct element types, or by introduction of new language features (such as element and attribute inheritance, enumerations and datatypes), that obviate the need for XML-style parameter entities. In particular, the parameter definitions are constrained to contain only a content model atom. The parameter and paramref element types can be used to define and reference a content model fragment, in much the same way that a parameter entity might be used.

Parsed and unparsed entities may be defined in SOX documents. Entity support is provided to enable simple mapping to/from XML DTDs. There is, however, some question as to the value and life expectancy of the XML unparsed entity approach, which uses a baroque and indirect definition and reference mechanism.

enumerations

In SOX, any attribute or datatype may provide an enumerated list of values for selection. XML provides enumerated lists only for enumerated (name token) attributes and NOTATION attributes.

hypertext

In SOX, URIs, URLs, and URNs are provided as a matter of course. Support for HTML anchors and XML Linking is facilitated through inheritance and specialization of hypertext attributes that can be conveniently arranged in neat attribute interfaces.

inheritance

In SOX, element types may inherit their content models and attribute definitions directly from another named element type. An element type may also inherit and extend an attribute list. Specialization of attribute definitions allows refinement and restriction of attribute datatype, enumeration list and default value. Additionally, an attribute value may be defined to be inherited from the identically named attribute on a parent or more distant ancestor element. Thus, for example, namespaces can be inherited from superordinate elements.

namespace support

The SOX namespace is fully and precisely defined. Objects from any identifiable namespace may be used in building a SOX document. That is, any element, attribute, datatype, enumeration, entity, interface, notation, parameter, or processing instruction may be imported from any namespace.

XML syntax and validation

A SOX document is a valid XML document, according to the SOX DTD. The designer of a schema, or schemographer, is free to employ the same XML tools used for traditional XML documents. This means that a SOX document can be processed by a validating XML parser, formatted according to an XSL stylesheet, and managed by any DOM-compliant or SAX-compliant application.

SOX also introduces new levels of syntax checking and verification.

Future work

Pending future developments in the evolution of XML and its related specifications, this submission does not include several important features that were deemed desirable.

The & connector
Object-oriented software and relational databases tend to favor unordered collections of data in class members or table columns. Moreover, many data-intensive applications have less need to specify the sequence of data than publishing applications do. However, previous experience with the SGML & connector has shown that parsers, which require deterministic content models, find it difficult to manage the combinations that large "and" groups necessitate. As a result, the & connector was not included in this submission, although further discussion is warranted.
Validation and well-formedness
Validation is an absolute requirement for business documents used in large e-commerce applications. At times, however, we found a need to escape temporarily from validity checking to allow well-formed content. It seems useful to be able to specify that the content of a given element is well-formed, and to allow a validating processor to switch to well-formedness mode, returning to validation mode when the well-formed element has been processed. Because this would be incompatible with [XML], a wfxml content specification, akin to empty or any, was not included in this submission. Further discussion may, however, lead to a proposal for an amendment to XML 1.0.
XML linking
Consideration was given to including base support for XML linking [XLink]. However, this specification was deemed too immature. As a result, it was not included in this submission.
XML pointers
Consideration was given to including support for XML pointers [XPointer] as an intrinsic datatype. However, this specification was deemed too immature. As a result, it was not included in this submission.

Terminology

The terminology used to describe SOX documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a SOX processor:

may
Conforming SOX documents and processors are permitted to, but need not, behave as described.
must
Conforming SOX documents and processors are required to behave as described; otherwise they are in error.
error
A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.
fatal error
An error which a conforming SOX processor must detect and report to the application.
match
(Of strings or names:) Two strings or names being compared must be identical.
for XML compatibility
A feature included solely to ensure that SOX remains compatible with XML.

Structure of a SOX document

This section describes the basics of SOX. Before getting into the technical details, we examine a SOX document example. This is a high-level view of what a SOX document looks like and what the building blocks are. Some terminology and definitions will be introduced here.

The example presented here is a memorandum. This example was chosen because most people are familiar with the components of a memorandum. Most readers will be better able to understand the concepts that are being presented if they are not burdened by having to also try to understand the document type that is being modelled.

A version of this example without annotations is available in a non-normative appendix.

<schema name="memo" namespace="http://www.veosystems.com/schemas/memo.xml">
<h1>Memo Document Type</h1>

Every SOX document begins with the root element schema, and a top-level heading that provides a title for the SOX document. The schema element may be used to establish the namespace identifier for a SOX document.

<h2>Definitions</h2>
<intro>
<p>...</p>
<ul>
<li>...</li>
<li>...</li>
</ul>
</intro>

Lower-order headings with intro elements may be interspersed among the SOX document's defining elements to provide bridging titles and an introduction. Aside from a handful of custom elements, SOX documentation uses familiar HTML element types for convenience in cut/paste operations, training of average designers and engineers, and straightforward conversion from SOX documents to HTML documents.

The rules for including documentation in SOX documents are fairly simple:

<h3>Memo element type</h3>

<elementtype name="memo">

<explain>
<title>Memo Document</title>
<synopsis>A simple, useful memo.</synopsis>
<help>
<p>Fill in attributes, enter paragraphs, lists and images, and press SEND.</p>
</help>
<p>A memo consists of six required fields and a body.</p>
</explain>

Here, we are defining an element type whose name is "memo".

    <model>
        <sequence>
            <element name="to"/>
            <element name="from"/>
            <element name="cc"/>
            <element name="subject"/>
            <element name="file"/>
            <element name="date"/>
            <element name="body"/>
        </sequence>
    </model>
</elementtype>

Our memo's content model is a sequence of seven subordinate elements. As you can see in the following fragment, most of these element types simply contain text strings.

<h3>Memo fields</h3>

<elementtype name="to">      <model><string/></model>    </elementtype>
<elementtype name="from">    <model><string/></model>    </elementtype>
<elementtype name="cc">      <model><string/></model>    </elementtype>
<elementtype name="subject"> <model><string/></model>    </elementtype>

But notice here that the string content of the file and date elements is specified slightly differently. The file element's string content sets its datatype attribute to number, which means that the value must be a number. The date element's string content has the datatype date, which means that the content must match the datatype definition for calendar dates.

<elementtype name="file">
    <model><string datatype="number"/></model>
</elementtype>

<elementtype name="date">
    <model><string datatype="date"/></model>
</elementtype>

The body of the memo is a bit richer, allowing a choice of paragraphs, lists and images. The value of the occurs attribute specifies the minimum and maximum occurrence, or number of times, that the choice group may be used.

<elementtype name="body">
    <model>
        <choice occurs="1,*">
            <element name="p"/>
            <element name="list"/>
            <element name="image"/>
        </choice>
    </model>
</elementtype>

The content model of a paragraph is simply a string.

<elementtype name="p">
    <model>
        <string/>
    </model>
</elementtype>

Getting a bit more creative, and a bit more prescriptive, we require that a list have at least three items and no more than nine. (This is a fairly typical editorial style rule in many organizations. We are just using it here to demonstrate.)

<elementtype name="list">
    <model>
        <element name="item" occurs="3,9"/>
    </model>
</elementtype>

Now, we want an instance of an item to be just like an instance of a paragraph, so we say so with the instanceof element.

<elementtype name="item">
    <instanceof name="p"/>
</elementtype>

The image is an empty element, and the required value of its src attribute must be a URI according to the datatype.

<elementtype name="image">
    <empty/>
    <attdef name="src" datatype="URI">
        <required/>
    </attdef>
</elementtype>
</schema>

But wait, there's more!

Even if you can already see that simple things are fairly simple to do, and you thought that you had seen enough to sell you on the virtues of writing SOX documents, there is in fact much more in the Schema for Object-oriented XML. Read on!


Element type definitions

In SOX documents, element type definitions reproduce the expressiveness of XML element type declarations using explicit element and attribute markup. An element type may be defined, as shown in this example, by using the elementtype element with the required name attribute and a subordinate model, instanceof or extends element:

<elementtype name="inline">
    <model>
        <string/>
    </model>
</elementtype>

A mechanism for attaching attributes to an element type is described later in Attribute definitions.

Element type name

The name of an element type may be any valid unqualified XML element type name. The name must be unique among the names of element types defined in the current SOX document.

An element type may be referenced by the element, extends and instanceof elements. Provision for namespace qualification of element type references is discussed in Names and namespaces. The local part of an element type name is specified when defining an element type.

It is a fatal error to re-assign an element name, or to reference an element that has not been defined.
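Both fatal errors can be detected with a simple two-pass check: first collect every defined name, then verify each reference, so that forward references such as those in the memo example are allowed. This minimal Python sketch uses the standard xml.etree.ElementTree module, assumes unqualified names, and ignores the namespace qualification described in Names and namespaces; SchemaError and check_element_types are illustrative names, not part of SOX.

```python
import xml.etree.ElementTree as ET


class SchemaError(Exception):
    """Raised for conditions this specification calls fatal errors."""


# Elements whose name attribute refers to an element type.
REFERENCE_TAGS = ("element", "instanceof", "extends")


def check_element_types(sox_text: str) -> set:
    """Return the set of defined element type names, or raise SchemaError."""
    root = ET.fromstring(sox_text)
    defined = set()
    # Pass 1: every elementtype name must be unique.
    for definition in root.iter("elementtype"):
        name = definition.get("name")
        if name in defined:
            raise SchemaError(f"element type {name!r} is defined more than once")
        defined.add(name)
    # Pass 2: every reference must point at a defined element type.
    for tag in REFERENCE_TAGS:
        for ref in root.iter(tag):
            if ref.get("name") not in defined:
                raise SchemaError(
                    f"reference to undefined element type {ref.get('name')!r}")
    return defined
```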

Content model

The content model of an element type defines the structure and composition of an element of that type in an XML instance. The definition of a content model in SOX documents extends the expressiveness of that in XML DTD by providing greater specificity of the minimum and maximum number of times some content model atom may be repeated. This allows a schema designer more precise control than that offered by XML's *, ? and + occurrence indicators.
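The occurs values used in this document ("3,9", "1,*" and "*") can be read as a (minimum, maximum) pair. The following Python sketch parses those forms, checks a repetition count against them, and shows that only four pairs have a direct XML occurrence indicator. Treating an omitted occurs attribute as exactly once is an assumption of this sketch, and any other occurs form is rejected rather than guessed.

```python
from typing import Optional, Tuple

Occurs = Tuple[int, Optional[int]]  # (minimum, maximum); None means unbounded


def parse_occurs(value: Optional[str]) -> Occurs:
    """Parse the occurs forms used in this document: "m,n", "m,*" and "*"."""
    if value is None:
        return (1, 1)  # assumption: no occurs attribute means exactly once
    value = value.strip()
    if value == "*":
        return (0, None)
    low, sep, high = value.partition(",")
    if not sep:
        raise ValueError(f"unsupported occurs value: {value!r}")
    minimum = int(low)
    maximum = None if high.strip() == "*" else int(high)
    if minimum < 0 or (maximum is not None and maximum < minimum):
        raise ValueError(f"inconsistent occurs value: {value!r}")
    return (minimum, maximum)


def allows(occurs: Occurs, count: int) -> bool:
    """True if an atom may repeat count times under the given occurs."""
    minimum, maximum = occurs
    return count >= minimum and (maximum is None or count <= maximum)


# The only (minimum, maximum) pairs that XML's indicators express directly.
DTD_INDICATORS = {(1, 1): "", (0, 1): "?", (0, None): "*", (1, None): "+"}


def dtd_indicator(occurs: Occurs) -> Optional[str]:
    """Return the XML indicator for occurs, or None if there is none."""
    return DTD_INDICATORS.get(occurs)
```

A range such as (3, 9) has no single-character equivalent, which is why a DTD generated from such a model cannot carry the full constraint through an indicator alone.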

Element content model atom

In the following example, the definition of the content model for a list element type specifies that it contains a minimum of 3 and a maximum of 9 item elements.

<elementtype name="list">
    <model>
        <element name="item" occurs="3,9"/>
    </model>
</elementtype>

String content model atom

In the following example, the b element type's content model is simply string content.

<elementtype name="b">
    <model>
        <string/>
    </model>
</elementtype>

In this example, the size element type's content model is string content that is constrained to be an int.

<elementtype name="size">
    <model>
        <string datatype="int" />
    </model>
</elementtype>

In this example, the postcode element type's content model is string content that is constrained to match the mask (e.g., L1W 3K6).

<namespace name="canada" namespace="www.canadapost.ca/schemas/postcodes.xml" />

<elementtype name="postcode">
  <model>
    <string>
        <mask>A#A #A#</mask>
    </string>
  </model>
</elementtype>
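The mask semantics are defined normatively in Datatype masks, which is not reproduced in this section. Assuming, for this sketch only, that A matches a single letter, # matches a single digit, and every other character must appear literally, a mask can be compiled into a regular expression with Python's re module as follows.

```python
import re

# Assumed mask characters for this sketch only; see Datatype masks for the real set.
MASK_CLASSES = {"A": "[A-Za-z]", "#": "[0-9]"}


def mask_to_regex(mask: str) -> str:
    """Translate a mask into an equivalent regular expression string."""
    # Unknown mask characters are treated as literals and escaped.
    return "".join(MASK_CLASSES.get(ch, re.escape(ch)) for ch in mask)


def matches_mask(mask: str, text: str) -> bool:
    """True if the whole of text matches the mask."""
    return re.fullmatch(mask_to_regex(mask), text) is not None
```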

In this example, the conference element type's content model is string content with a default value of "XML Developers' Days". A SOX processor must provide support for default, inherited, and fixed presence elements when modelling a string. This feature is useful for data entry applications such as forms-entry programs or text editors. Such an application could insert the default, inherited or fixed value into a form field, or into the element when it is inserted within a document.

<namespace name="gca" namespace="www.gca.org/schemas/xmldevdays.xml" />

<elementtype name="conference">
  <model>
    <string>
        <default>XML Developers' Days</default>
    </string>
  </model>
</elementtype>

Mixed content model atom

In this example, the p element type's content model is mixed content.

<elementtype name="p">
    <model>
        <mixed>
            <element name="a"/>
            <element name="b"/>
            <element name="i"/>
        </mixed>
    </model>
</elementtype>

Note: Even though mixed content consists of string and element content, the string element is not mentioned in the mixed content model. This is partly an optimization and largely a constraint to prevent inadvertent specification of a datatype, mask and presence for string content within mixed content. The implications of such a combination are unclear, so it is best avoided.

Choice content model atom

In this example, the dl element type's content model specifies that dt or dd elements are allowed any number of times.

<elementtype name="dl">
    <model>
        <choice occurs="*">
            <element name="dt"/>
            <element name="dd"/>
        </choice>
    </model>
</elementtype>

Sequence content model atom

In this example, the dl element type's content model specifies that dt followed by dd is allowed any number of times.

<elementtype name="dl">
    <model>
        <sequence occurs="*">
            <element name="dt"/>
            <element name="dd"/>
        </sequence>
    </model>
</elementtype>

Combining content model atoms

In this example, the dl element type's content model specifies that a dh is followed by two or more dt or dd elements.

<elementtype name="dl">
    <model>
        <sequence>
            <element name="dh"/>
            <choice occurs="2,*">
                <element name="dt"/>
                <element name="dd"/>
            </choice>
        </sequence>
    </model>
</elementtype>
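One way a processor could validate element content is to translate a content model into a regular expression over the sequence of child element names. This Python sketch uses a hypothetical tuple representation (kind, body, (minimum, maximum)) for element, sequence and choice atoms, with None as an unbounded maximum, and encodes the dh, dt and dd example above. It relies on Python's backtracking re module, so it does not enforce XML's requirement that content models be deterministic.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical in-memory form of a content model atom:
# ("element", name, occurs) or ("sequence" | "choice", [atoms], occurs),
# where occurs is (minimum, maximum) and a maximum of None means unbounded.
Atom = Tuple[str, object, Tuple[int, Optional[int]]]


def quantifier(occurs: Tuple[int, Optional[int]]) -> str:
    low, high = occurs
    if (low, high) == (1, 1):
        return ""
    return "{%d,}" % low if high is None else "{%d,%d}" % (low, high)


def to_regex(atom: Atom) -> str:
    kind, body, occurs = atom
    if kind == "element":
        # Each child name is followed by a single space in the encoded input.
        core = re.escape(body + " ")
    elif kind == "sequence":
        core = "".join(to_regex(child) for child in body)
    elif kind == "choice":
        core = "|".join(to_regex(child) for child in body)
    else:
        raise ValueError(f"unknown atom kind: {kind!r}")
    return "(?:" + core + ")" + quantifier(occurs)


def accepts(model: Atom, children: List[str]) -> bool:
    """True if the child element names satisfy the content model."""
    encoded = "".join(name + " " for name in children)
    return re.fullmatch(to_regex(model), encoded) is not None


ONCE = (1, 1)
# dh followed by two or more dt or dd elements.
DL_MODEL: Atom = ("sequence", [
    ("element", "dh", ONCE),
    ("choice", [("element", "dt", ONCE), ("element", "dd", ONCE)], (2, None)),
], ONCE)
```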

Content specifications

A content specification of any or empty may be used, rather than a content model, in an element definition. In that case, the model element is not required.

Any content specification

In the following example, the any content specification indicates that the HTML element may contain any combination of string content and any element that is defined in the schema.

<elementtype name="HTML">
    <any/>
</elementtype>

Empty content specification

In the following example, the empty content specification indicates that the BR element may not contain any content.

<elementtype name="BR">
    <empty/>
</elementtype>

Element inheritance

In the following example, first the inline element is defined, then the emphasis and strong elements inherit their definitions from inline.

<elementtype name="inline">
    <model><string/></model>
</elementtype>

<elementtype name="emphasis">
     <instanceof name="inline"/>
</elementtype>

<elementtype name="strong">
     <instanceof name="inline"/>
</elementtype>

Extending an element

In the following example, the a element extends the previously defined inline element type with an attribute definition.

<elementtype name="a">
    <extends name="inline">
        <attdef name="href" datatype="uri">
            <required/>
        </attdef>
    </extends>
</elementtype>

Parameterizing content models

Parameters may be scoped to a base element type or to the namespace.

Parameterized element types

In this example, the p1 parameter is defined as an element content model atom. The p element contains a parameter reference to p1 in its content model. The effect of this is that the element atom from the parameter is substituted for the parameter reference, and the content model includes the a element.

<parameter name="p1">
    <element name="a" />
</parameter>

<elementtype name="p">
    <model>
      <mixed>
        <element name="emphasis"/>
        <element name="strong"/>
        <paramref scope="namespace" name="p1"/>
      </mixed>
    </model>
</elementtype>

Parameterized base element types

A parameterized base element type is an element type whose content model contains element-scoped parameter references. Such an element type cannot be instantiated and must be extended to be useful. Also note that an element that is based on a base element type must define all of its parameters; failure to do so is a fatal error.

In this example, block is a base element type. Its mixed content model may contain emphasis and strong elements. When the base element type is extended, the defined value of the element-scoped parameter, p1, replaces the parameter reference.

<elementtype name="block">
    <model>
        <mixed>
            <element name="emphasis"/>
            <element name="strong"/>
            <paramref name="p1" scope="element"/>
        </mixed>
    </model>
 </elementtype>

<elementtype name="p">
    <extends name="block">
        <parameter name="p1">
            <element name="a" />
        </parameter>
    </extends>
</elementtype>
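The substitution described above can be sketched in Python with xml.etree.ElementTree: copy the base element type's model, then replace each paramref with the atom held by the matching parameter, taking element-scoped parameters from the extends element and namespace-scoped parameters from the top level of the schema. Raising SchemaError for an undefined parameter follows the fatal-error rule in the text; the function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict


class SchemaError(Exception):
    """Raised for conditions this specification calls fatal errors."""


def parameters(container: ET.Element) -> Dict[str, ET.Element]:
    """Map each parameter name to the single content model atom it holds."""
    return {p.get("name"): p[0] for p in container.findall("parameter")}


def expand_params(model: ET.Element, element_params: Dict[str, ET.Element],
                  namespace_params: Dict[str, ET.Element]) -> ET.Element:
    """Return a copy of model with every paramref replaced by its atom."""
    result = copy.deepcopy(model)  # leave the base element type untouched
    for parent in list(result.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "paramref":
                continue
            scope, name = child.get("scope"), child.get("name")
            table = {"element": element_params,
                     "namespace": namespace_params}.get(scope)
            if table is None or name not in table:
                raise SchemaError(f"parameter {name!r} ({scope}) is not defined")
            parent.remove(child)
            parent.insert(index, copy.deepcopy(table[name]))
    return result


def expanded_model(schema: ET.Element, name: str) -> ET.Element:
    """Expand the model of an element type that extends a base element type."""
    definition = schema.find(f"elementtype[@name='{name}']")
    extends = definition.find("extends")
    base = schema.find(f"elementtype[@name='{extends.get('name')}']")
    return expand_params(base.find("model"), parameters(extends),
                         parameters(schema))


SCHEMA = """<schema name="demo">
  <elementtype name="block"><model><mixed>
    <element name="emphasis"/><element name="strong"/>
    <paramref name="p1" scope="element"/>
  </mixed></model></elementtype>
  <elementtype name="p"><extends name="block">
    <parameter name="p1"><element name="a"/></parameter>
  </extends></elementtype>
  <elementtype name="q"><extends name="block"/></elementtype>
</schema>"""
```

Here p supplies p1 and expands to a model allowing emphasis, strong and a, while q extends block without defining p1 and so triggers the fatal error.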

Attribute definitions

In SOX documents, attributes may be defined as part of an element type definition. An attribute definition has a name and a datatype, and must include a presence element.

<elementtype name="image"> 
    <empty/> 
    <attdef name="id" datatype="ID">
     <implied/>
    </attdef>
 </elementtype>

Attribute name

An attribute's name must be unique among the attributes of its host element type or interface. It is a fatal error to attempt to re-assign an attribute name within its respective scope, except when specializing the attribute.

Attribute datatypes

The attribute's datatype may be any valid XML attribute type (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION), any extended attribute type (ATTRIBUTE, DATATYPE, ELEMENT, INTERFACE, NAME, NAMESPACE), or any other intrinsic or user-defined datatype.

Attribute enumerated value lists

In SOX documents, unlike XML DTDs, enumerations may be specified for any attribute type. This information will be lost when an XML DTD is generated from a SOX document, except for attributes of type NMTOKEN and NOTATION. However, it may be used by an application to provide an ancillary level of validation, or by a user-interface mechanism to provide appropriate I/O methods.

Any attribute definition may specify an enumerated list of values, even strings. These enumerations are modelled after the HTML form's select element type, which effectively provides a menu.

In the following example, the size attribute offers a choice among NMTOKEN values, and the topping attribute offers a selection of STRING values.

<elementtype name="pizza">
    <empty/>

    <attdef name="size" datatype="NMTOKEN">
        <enumeration>
        <option>small</option>
        <option>medium</option>
        <option>large</option>
        <option>party</option>
        </enumeration>
        <required/>
    </attdef>

    <attdef name="topping" datatype="STRING">
        <enumeration multiple="true">
        <option>green pepper</option>
        <option>mushroom</option>
        <option>onion</option>
        <option>pepperoni</option>
        <option>pineapple</option>
        </enumeration>
        <implied/>
    </attdef>
 </elementtype>

Attribute value presence

An attribute value's presence in an instance may be specified as default, fixed, implied, or required as in [XML], or inherited.

Default attribute value

<elementtype name="pizza">
    <empty/>
    <attdef name="size" datatype="NMTOKEN">
        <enumeration>
        <option>small</option>
        <option>medium</option>
        <option>large</option>
        <option>party</option>
        </enumeration>
        <default>small</default>    
    </attdef>
</elementtype>

Fixed attribute value

<elementtype name="glossary">
    <model>....</model>
    <attdef name="id" datatype="ID">
        <fixed>glossary</fixed>
    </attdef>
</elementtype>

Implied attribute value

<elementtype name="A">
    <model>....</model>
    <attdef name="A" datatype="ID">
        <implied/>
    </attdef>
    <attdef name="href" datatype="uri">
        <implied/>
    </attdef>
</elementtype>

Inherited attribute value

This presence type becomes #IMPLIED in the generated DTD, but an application may use it to signal that, if the attribute is not specified, its value is taken from the nearest ancestor of the host element that carries an attribute with the same name and a specified value. If no element in the host element's ancestry carries such an attribute, the attribute has no value. That is, the value of an inherited attribute is scoped to the element on which it occurs.

For example:

<elementtype name="child">
    <model>
        <element name="child" occurs="*" />
    </model>
    <attdef name="family" datatype="STRING"><inherit/></attdef>
    <attdef name="given" datatype="STRING"><required/></attdef>
</elementtype>
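
An application might resolve an inherited value by walking up the instance tree. The Python sketch below assumes the instance has been parsed with xml.etree.ElementTree and that the schema has already established that the attribute uses inherited presence; inherited_value is a hypothetical helper and the sample instance is invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def inherited_value(root: ET.Element, element: ET.Element, attr: str) -> Optional[str]:
    """Return the element's own value for attr, else the value on the nearest
    ancestor that specifies it, else None (no value)."""
    # ElementTree elements do not point to their parents, so build a map once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node: Optional[ET.Element] = element
    while node is not None:
        value = node.get(attr)
        if value is not None:
            return value
        node = parent_of.get(node)  # the root has no entry, which ends the walk
    return None


doc = ET.fromstring(
    '<child family="Smith" given="Ann">'
    '<child given="Bob"><child family="Jones" given="Cy"/></child>'
    "</child>"
)
bob = doc[0]
cy = bob[0]
print(inherited_value(doc, bob, "family"))  # Smith
print(inherited_value(doc, cy, "family"))   # Jones
```

Bob specifies no family attribute, so his value comes from Ann; Cy specifies his own, which takes precedence over any ancestor.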

Required attribute value

<elementtype name="xref">
    <model>
        <string/>
    </model>
    <attdef name="xref" datatype="URI">
        <required/>
    </attdef>
</elementtype>

Interfaces

An attribute interface is similar to one of the uses of an XML parameter entity, but is far more powerful. An attribute interface is a named object that contains one or more attribute definitions.

Interface names

The local part of an attribute interface name is assigned by defining an attribute interface. An attribute interface name may be referenced by the implements element.

It is a fatal error to re-assign an interface name, or to reference an attribute interface that has not been defined.

Defining an attribute interface

<interface name="anchor">
    <attdef name="href" datatype="uri"><implied/></attdef>
    <attdef name="name" datatype="ID"><implied/></attdef>
</interface>

Implementing an attribute interface

Given the interface defined in the previous example, we can implement that interface in a specific element type definition. In this example, the A element type specifies that it implements the attributes defined in the anchor attribute interface.

<elementtype name="A">
    <model><string/></model>
    <implements name="anchor"/>
</elementtype>

Specializing an attribute definition

We can specialize the attributes in an interface. In this example, the LINK element type specifies that the href attribute is now required, and the name attribute is fixed as a null value.

<elementtype name="LINK">
    <model><string/></model>
    <implements name="anchor">
        <attdef name="href" datatype="URI">
            <required/>
        </attdef>
        <attdef name="name" datatype="ID">
            <fixed></fixed>
        </attdef>
    </implements>
</elementtype>

We can also specialize an attribute when extending an element type. In the following example, the para element's label attribute is implied. The note element extends para and specializes the label attribute by specifying a default value. The warning element extends para and specializes the label attribute by specifying a fixed value.

<elementtype name="para">
    <model><string/></model>
    <attdef name="label"><implied/></attdef>
</elementtype>

<elementtype name="note">
     <extends name="para">
        <attdef name="label"><default>Note: </default></attdef>
    </extends>
</elementtype>

<elementtype name="warning">
     <extends name="para">
        <attdef name="label"><fixed>Warning: </fixed></attdef>
    </extends>
</elementtype>

The following rules apply when specializing an attribute definition inside an extends or implements element.

datatype
The datatype may be specialized as a true subtype of the base attribute's datatype. For example, if the base attribute datatype is number, the specialization may be any datatype that is derived from number, such as int.
enumeration
An enumeration may be specialized as a restricted version of the base attribute's enumeration. For example, if the base enumeration is (a|b|c|d), the specialization may contain some, but not all, of a, b, c and d.
presence
Presence may be specialized to be more restrictive. The base attribute's presence may be specialized as follows:
IMPLIED
specialization may be INHERITED, DEFAULT, REQUIRED, FIXED
INHERITED
specialization may be DEFAULT, REQUIRED, FIXED
DEFAULT
specialization may be REQUIRED, FIXED
REQUIRED
specialization may be FIXED
FIXED
no specialization possible
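
The enumeration and presence rules are mechanical enough to check directly. This Python sketch transcribes them as given; the function names are hypothetical, datatype subtyping is not modelled, and restating the base presence unchanged is treated as not being a specialization, since the table above does not list it.

```python
from typing import Sequence

# Presence specializations permitted by the table above (base -> allowed).
PRESENCE_SPECIALIZATIONS = {
    "IMPLIED": {"INHERITED", "DEFAULT", "REQUIRED", "FIXED"},
    "INHERITED": {"DEFAULT", "REQUIRED", "FIXED"},
    "DEFAULT": {"REQUIRED", "FIXED"},
    "REQUIRED": {"FIXED"},
    "FIXED": set(),
}


def presence_ok(base: str, specialized: str) -> bool:
    """True if `specialized` is a more restrictive presence than `base`."""
    return specialized in PRESENCE_SPECIALIZATIONS[base]


def enumeration_ok(base: Sequence[str], specialized: Sequence[str]) -> bool:
    """True if `specialized` keeps some, but not all, of the base options."""
    chosen = set(specialized)
    return bool(chosen) and chosen < set(base)  # non-empty proper subset


# The note and warning examples: IMPLIED specialized to DEFAULT and FIXED.
print(presence_ok("IMPLIED", "DEFAULT"), presence_ok("IMPLIED", "FIXED"))
print(enumeration_ok("abcd", "ac"), enumeration_ok("abcd", "abcd"))
```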

Datatypes

Intrinsic datatypes

The intrinsic datatypes define the domains of the atomic data units in SOX documents.

binary
A sequence of bits, each represented by 0 or 1.
Format: [01]*
boolean
The values true or false.
char
A character
Format: X
date
ISO 8601 (5.2.1.1) extended calendar date format [ISO-8601]
Format: YYYY-MM-DD
number
A numeric value. Used when a more specific numeric representation is not required or practical.
There are no constraints on the minimum or maximum values, number of digits, or number of decimal places.
string
A sequence of characters.
Format: X*
time
ISO 8601 (5.3.1.1) extended local time format [ISO-8601]
Format: hh:mm:ss
uri
Uniform Resource Identifier
Format: U*
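
A SOX processor is expected to check values against these formats. As a sketch only, the Python table below encodes the formats that are stated precisely above (binary, boolean, char, date and time), using date.fromisoformat and time.fromisoformat (Python 3.7 or later) to reject impossible dates and times. number, string and uri are omitted because no exact lexical format is given for them here, and reading char as "exactly one character" and boolean as the literal strings true and false are interpretations, not rules taken from this document.

```python
import re
from datetime import date, time
from typing import Callable, Dict


def _layout_then_parse(pattern: str, parse: Callable[[str], object]) -> Callable[[str], bool]:
    """Build a checker: the value must match the layout and be a real date/time."""
    def check(value: str) -> bool:
        if re.fullmatch(pattern, value) is None:
            return False
        try:
            parse(value)
        except ValueError:
            return False
        return True
    return check


VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "binary": lambda v: re.fullmatch(r"[01]*", v) is not None,
    "boolean": lambda v: v in ("true", "false"),
    "char": lambda v: len(v) == 1,
    # YYYY-MM-DD, ISO 8601 extended calendar date
    "date": _layout_then_parse(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", date.fromisoformat),
    # hh:mm:ss, ISO 8601 extended local time
    "time": _layout_then_parse(r"[0-9]{2}:[0-9]{2}:[0-9]{2}", time.fromisoformat),
}

print(VALIDATORS["date"]("1998-09-30"), VALIDATORS["date"]("1998-02-30"))
```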

User-defined datatypes

SOX documents provide a mechanism for defining datatypes that can be used to specify the datatype of an attribute or element string content. User-defined datatypes may only be derived from the intrinsic datatypes. A SOX processor must be capable of generating code to perform validation on the values of user-defined datatypes.

The local part of a datatype name is specified in the name attribute of the datatype element. A datatype name may be referenced in the datatype attribute of the attdef, enumeration, format, scalar, and string elements.

It is a fatal error to re-assign a datatype name or to reference a datatype that has not been defined.

User-defined scalar datatypes

User-defined scalar datatypes are derived from the intrinsic number datatype. A derived datatype must specify the number of digits and decimal places (the digits and decimals attributes), and the minimum and maximum values permitted (the minvalue and maxvalue attributes). An optional mask describes the required format of values that conform to the datatype. The minimum and maximum permitted values may be further constrained by setting the boolean minexclusive and maxexclusive attributes to "1", which excludes the bound itself from the permitted range. A SOX processor must be able to generate code that will validate a value against the datatype definition.

<datatype name="inch">
    <scalar datatype="number" digits="4" decimals="2" minvalue="0" maxvalue="12">
      <mask>Z#.##</mask>
    </scalar>
</datatype>
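
The checks that generated validation code might perform for a scalar datatype such as inch can be sketched in Python. scalar_ok is a hypothetical name. It assumes digits counts all digits and decimals counts those after the decimal point (modelled on SQL's NUMERIC(precision, scale), not stated in this document), treats minexclusive and maxexclusive as excluding the bound, and does not interpret the mask, whose syntax is not defined in this section.

```python
from decimal import Decimal, InvalidOperation


def scalar_ok(text: str, digits: int, decimals: int, minvalue: str, maxvalue: str,
              minexclusive: bool = False, maxexclusive: bool = False) -> bool:
    """Check a value against digits/decimals/minvalue/maxvalue constraints."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False  # not a number at all
    if not value.is_finite():
        return False  # reject NaN and Infinity
    _, digit_tuple, exponent = value.as_tuple()
    places = max(0, -exponent)                            # digits after the point
    integer_digits = max(0, len(digit_tuple) + exponent)  # digits before it
    if places > decimals or integer_digits > digits - decimals:
        return False
    low, high = Decimal(minvalue), Decimal(maxvalue)
    above_min = value > low if minexclusive else value >= low
    below_max = value < high if maxexclusive else value <= high
    return above_min and below_max


INCH = dict(digits=4, decimals=2, minvalue="0", maxvalue="12")
print(scalar_ok("11.75", **INCH))  # True
print(scalar_ok("3.125", **INCH))  # False: three decimal places
print(scalar_ok("12.50", **INCH))  # False: above maxvalue
```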

User-defined enumeration datatypes

User-defined enumeration datatypes may be derived from any of the intrinsic datatypes. Each of the values specified in an enumerated datatype must conform to the specified type. A SOX processor must be able to generate code that will validate the value against the datatype definition.

<datatype name="postalcodes.ca">
    <enumeration datatype="nmtoken">
        <option>AB</option>
        <option>BC</option>
        <option>MB</option>
        <option>NB</option>
        <option>NF</option>
        <option>NT</option>
        <option>NS</option>
        <option>ON</option>
        <option>PE</option>
        <option>QC</option>
        <option>SK</option>
        <option>YT</option>
    </enumeration>
</datatype>

User-defined format datatypes

User-defined format datatypes may be derived from any of the intrinsic datatypes, but will most commonly be used to specialize string values. A required mask describes the required format of values that conform to the datatype. A SOX processor must be able to generate code that will validate the value against the datatype definition.

<datatype name="part-number">
    <format datatype="string">
        <mask>AAA-###.##-aa</mask>
    </format>
</datatype>

Attribute types

To accommodate SOX itself, these attribute types are available for use as valid attribute datatypes when specified in the value of the datatype attribute of an attdef element.

ATTRIBUTE
Reference to an attribute
DATATYPE
Reference to a datatype.
ELEMENT
Reference to an element type
ENTITY
Reference to an external unparsed entity
ENTITIES
Reference to one or more external unparsed entities, separated by spaces
ID
A unique identifier
IDREF
Reference to a unique identifier
IDREFS
Reference to one or more unique identifiers, separated by spaces
INTERFACE
Reference to an attribute interface
NAME
An XML name.
NAMESPACE
Reference to a namespace.
NMTOKEN
A name token
NMTOKENS
Name tokens, separated by spaces
NOTATION
Reference to a notation

Occurs type

The occurs datatype is available to accommodate SOX itself. It is used by the occurs attribute of the model, element, string, mixed, choice, and sequence elements. It is not intended to be used as an intrinsic datatype.

*
Zero or more occurrences
?
Zero or one occurrence
+
One or more occurrences
n,m
A minimum of n and a maximum of m occurrences
n must be a positive integer or zero
m may be an integer greater than n
m may be the character "*", indicating that the maximum is unbounded
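
Before it can check content, a processor has to turn these forms into minimum and maximum counts. The Python sketch below follows the rules above literally, including the requirement that m be greater than n, so "3,3" is rejected. The function names are hypothetical, whitespace inside the value is not accepted, and the behaviour when the occurs attribute is omitted is not covered.

```python
import re
from typing import Optional, Tuple

# The single-character forms listed above as (minimum, maximum);
# a maximum of None means unbounded.
SHORTHANDS = {"*": (0, None), "?": (0, 1), "+": (1, None)}


def parse_occurs(occurs: str) -> Tuple[int, Optional[int]]:
    """Parse an occurs value into (minimum, maximum)."""
    if occurs in SHORTHANDS:
        return SHORTHANDS[occurs]
    match = re.fullmatch(r"([0-9]+),([0-9]+|\*)", occurs)
    if match is None:
        raise ValueError(f"malformed occurs value: {occurs!r}")
    n = int(match.group(1))  # zero or a positive integer
    if match.group(2) == "*":
        return (n, None)
    m = int(match.group(2))
    if m <= n:
        raise ValueError(f"maximum must be greater than minimum: {occurs!r}")
    return (n, m)


def count_allowed(occurs: str, count: int) -> bool:
    """True if `count` occurrences satisfy the occurs constraint."""
    low, high = parse_occurs(occurs)
    return count >= low and (high is None or count <= high)


print(parse_occurs("2,5"), parse_occurs("0,*"), count_allowed("+", 0))
```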

Included Modules

Collections of definitions known as modules may be directly included into a SOX document by using an include element. Like XML's combination of external parameter entity definition and reference, an inclusion is used to effectively copy the contents of an external resource into the SOX document where the include element is encountered. Unlike external parameter entities, there is no requirement to define a name and then reference that name to invoke the inclusion of the external resource.

For example, to include a module containing definitions that are commonly used for addresses:

<include system="http://www.veosystems.com/schemas/address.xml" />

Entities

In XML, there are two types of entity: parsed and unparsed. Parsed entities are available as internal and external entities, while unparsed entities are only available as external entities.

Entity names

The local part of an entity name is specified when defining a parsed or unparsed entity, and may be re-assigned. A parsed entity name may only be referenced in an XML document instance, not in a SOX document. An unparsed entity name may be referenced in an attribute of type ENTITY or ENTITIES.

It is an error to reference an unparsed entity name that has not been defined.

Internal parsed entities

Internal parsed entities are a feature of XML that enables reuse of text fragments by direct reference. In SOX documents, internal parsed entities may be defined by using the textentity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.

External parsed entities

External parsed entities are a feature of XML that offers a baroque method for including well-formed XML document fragments, including text and markup, by direct reference to the storage object of the parsed entity. In SOX documents, external parsed entities may be defined by using the extentity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.

External parsed entities are included in SOX documents for XML compatibility. External parsed entities are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.

External unparsed entities

External unparsed entities are a feature of XML that offers a baroque method for including binary data by indirect reference to both the storage object and the notation type of the unparsed entity. In SOX documents, external unparsed entities may be defined by using the entity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.

External unparsed entities are included in SOX documents for XML compatibility. External unparsed entities are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.


Comments

The availability of SOX documentation elements should eliminate any need to use traditional XML comments in the body of a SOX document. Comments are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss. A SOX processor may, as appropriate to the application design, emit an XML comment into an XML DTD when a SOX comment element is encountered. Otherwise, there are no prescribed processing semantics associated with SOX comments.

<comment>A comment that belongs in an XML DTD</comment>

Notation definitions

A notation may be defined by specifying a name and an identifier for the notation. A notation may be referenced by name as part of an external entity declaration. The external entity name may, in turn, be referenced as the value of an attribute of type ENTITY. In that case, a processor that understands the notation is expected to process the content of the entity.

The local part of a notation name is specified in the name attribute of the notation element. A notation name may be referenced in the notation attribute of the entity element, or by any attribute of type NOTATION.

It is a fatal error to re-assign a notation name, or to reference a notation that has not been defined.

Notations are included in SOX documents for XML compatibility. Notations are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.


Processing instructions

Processing instructions are a feature of XML that provides a mechanism for bypassing the normal operation of an XML processor and delivering instructions directly to a downstream process whose responsibility it is to interpret the instruction and act accordingly.

In SOX documents, processing instructions may be defined by using the pi element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD. The use of XML processing instructions in SOX documents is discouraged, as they are not interpretable by a SOX processor.

Processing instructions are included in SOX documents for XML compatibility. Processing instructions are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.


Names and namespaces

Names of schema objects

The names of SOX elements, attributes, interfaces, datatypes, notations, entities, and namespace identifiers are required to be valid XML names, except that the colon (:) character is not allowed. That is, every name must begin with a letter, followed by any combination of letters, digits, combining characters, extender characters, periods (.), hyphens (-), and underscores (_). However, to maximize interoperability with programming language interfaces, the use of punctuation characters is discouraged. Names are case-sensitive.

Uniqueness of names

The names of objects of a given type must be unique for that object type, and the names of objects of one type do not share the same namespace as objects of other types. That is, a SOX processor is expected to maintain a separate lookup table, or index, for the names of each class of schema object.

The purpose of this section is to define the methods by which the names of objects may be assigned, what a name is deemed to be when considered in the context of imported namespaces and element scopes, how names may be referenced, and the rules governing potential reassignment of an object name.

Name parts

In SOX, the fully-qualified name of any of these objects, except namespace identifiers, is considered to be composed of multiple parts, including:

namespace part (or prefix)
The value of the namespace attribute of that object's defining element, or the current namespace.
namespace URI part
The value of the URI associated with the value of the namespace attribute in that object's defining element
object type part
attribute, datatype, element, entity, interface, namespace, notation, or parameter.
scope part
For attributes, element, or interface.
For parameters, namespace or element.
Not applicable to other object types.
context part
For attributes, the name of the host element type or interface definition.
For element-scoped parameters, the name of the host element type definition.
For namespace-scoped parameters, the value is not required as it is redundant.
Not applicable to other object types.
local name part
The object's name attribute value.
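
One way to honour both the per-type uniqueness rule and the name parts above is to use the parts themselves as the lookup key, so that names of different object types, or attributes with different host contexts, cannot collide. This is a Python sketch only: SoxName and NameTable are hypothetical, identity is based on the namespace URI rather than the prefix (an assumption), and entities, whose names may be re-assigned, are outside its scope.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SoxName:
    """Hypothetical key built from the name parts listed above."""
    namespace_uri: Optional[str]  # URI bound to the namespace part
    object_type: str              # "attribute", "datatype", "element", ...
    scope: Optional[str]          # attributes and parameters only
    context: Optional[str]        # host element type or interface, if any
    local_name: str


class NameTable:
    """Rejects re-assignment and references to undefined names."""

    def __init__(self) -> None:
        self._defs: Dict[SoxName, Any] = {}

    def define(self, name: SoxName, definition: Any) -> None:
        if name in self._defs:
            raise ValueError(f"fatal error: {name.object_type} {name.local_name!r} re-assigned")
        self._defs[name] = definition

    def resolve(self, name: SoxName) -> Any:
        if name not in self._defs:
            raise ValueError(f"fatal error: {name.object_type} {name.local_name!r} not defined")
        return self._defs[name]


MEMO = "http://www.veosystems.com/schemas/memo.xml"
table = NameTable()
table.define(SoxName(MEMO, "element", None, None, "date"), "element type")
table.define(SoxName(MEMO, "datatype", None, None, "date"), "datatype")  # other type: allowed
table.define(SoxName(MEMO, "attribute", "element", "image", "id"), "attdef")
table.define(SoxName(MEMO, "attribute", "element", "glossary", "id"), "attdef")  # other context
```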

This description of the names is intentionally incompatible with that in [XML-Namespaces].

Establishing the current namespace

The namespace of a SOX document is not required to be specified. However, in cases where multiple namespaces are in use within a SOX document, it may be desirable to establish the current namespace by specifying an identifier in the schema element's name attribute and an associated URI in the namespace attribute.

<schema name="invoice" namespace="http://www.veosystems.com/namespaces/invoice.xml" />

When an included external module's schema element has a namespace specified, that namespace becomes established as the current namespace, and the previous namespace is superordinated.

For any reference to an object in an imported namespace, that namespace becomes established as the current namespace while the reference is being reified. That is, the imported namespace becomes the current namespace while any subordinate or superordinate element definitions, or specializations are realized.

Importing a namespace

As mentioned earlier, importing a namespace makes a resource available for any namespace-qualified name references that may be encountered while processing the current SOX document. This means that a SOX document can refer to global elements, attributes, etc., as if they had been defined locally.

In the following example, we create a new kind of memo, based on the memo that we created in the first example. Note that the basic structure of the HTML memo is identical to the earlier memo. But here, we import the memo and HTML namespaces. Aside from memo, all of the element types identify one of the imported namespaces. The resulting from, subject, date, to and cc element types are the ones defined in the memo namespace. The body element is the one defined in the HTML namespace.

<schema name="HTMLmemo" namespace="http://www.veosystems.com/schemas/HTMLmemo.xml" >

<h1>A memo document with HTML body</h1>

<h2>Imported namespaces</h2>
<namespace name="memo" namespace="http://www.veosystems.com/schemas/memo.xml"/>
<namespace name="HTML" namespace="http://www.w3.org/schemas/html.xml"/>

<h2>Memo element type</h2>
<elementtype name="memo">
    <model>
        <sequence>
            <element namespace="memo" name="from"/>
            <element namespace="memo" name="subject"/>
            <element namespace="memo" name="date"/>
            <element namespace="memo" name="to"/>
            <element namespace="memo" name="cc"/>
            <element namespace="HTML" name="body"/>
        </sequence>
    </model>
</elementtype>

</schema>

The example above would be a complete SOX document for a memo with an HTML body.
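
Resolving a namespace-qualified reference amounts to looking up its prefix among the imported namespace elements. The Python sketch below does this for the element references in a trimmed copy of the example; element_references is hypothetical, and treating an unqualified reference as belonging to the current namespace is an assumption drawn from the description of the namespace part. Because references resolve to URIs, a body element from the memo namespace and one from the HTML namespace would yield different pairs.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# A trimmed copy of the HTMLmemo example (h2 headings and some
# element references removed).
SCHEMA = """<schema name="HTMLmemo" namespace="http://www.veosystems.com/schemas/HTMLmemo.xml">
<h1>A memo document with HTML body</h1>
<namespace name="memo" namespace="http://www.veosystems.com/schemas/memo.xml"/>
<namespace name="HTML" namespace="http://www.w3.org/schemas/html.xml"/>
<elementtype name="memo">
  <model>
    <sequence>
      <element namespace="memo" name="from"/>
      <element namespace="HTML" name="body"/>
    </sequence>
  </model>
</elementtype>
</schema>"""


def element_references(schema: ET.Element) -> List[Tuple[Optional[str], str]]:
    """Resolve every <element> reference to a (namespace URI, local name) pair."""
    current = schema.get("namespace")
    imported = {ns.get("name"): ns.get("namespace") for ns in schema.findall("namespace")}
    refs = []
    for ref in schema.iter("element"):
        prefix = ref.get("namespace")
        if prefix is None:
            uri = current  # unqualified: assume the current namespace
        elif prefix in imported:
            uri = imported[prefix]
        else:
            raise ValueError(f"fatal error: namespace {prefix!r} not defined")
        refs.append((uri, ref.get("name")))
    return refs


for uri, name in element_references(ET.fromstring(SCHEMA)):
    print(uri, name)
```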

If we had used an inclusion to source the contents of these two external resources, there would have been a name collision between memo:body and HTML:body.

Referring to names in imported namespaces

Here is another example that references namespaces:

<elementtype name="section">
    <model>
        <element namespace="HTML" name="p" occurs="*" />
    </model>
    <attdef namespace="CALS" name="security" /> 
    <attdef namespace="HTML" scope="element" context="img" name="src" />
</elementtype>

Namespace identifiers

The local part of a namespace identifier is specified when defining a namespace. A namespace identifier may be referenced in an attribute named namespace whose datatype is NAMESPACE. Namespace identifiers may not be namespace-qualified.

It is a fatal error to re-assign a namespace identifier, or to reference a namespace that has not been defined.


Normative Appendixes


Appendix A: An XML DTD for SOX

The XML DTD for SOX comprises the core SOX DTD, the referenced HTML Text definitions, and the textual description found in this document. The definitions of SOX are presented here in two parts.

Core SOX DTD

<!-- ************************************************************* -->
<!-- SOX DTD     -->
<!-- PUBLIC "-//Veo Systems Inc.//DTD SOX 1.0//EN" -->
<!-- SYSTEM "schema.dtd" -->
<!-- Copyright:      Veo Systems Inc., 1997, 1998
     Written by:     Murray Maloney 
     Date created:   17 Dec 1997
     Date revised:   30 Sep 1998
     Version:        1.0 -->
<!-- ************************************************************* -->

<!-- ************************************************************* -->
<!-- Schema  ***************************************************** -->
<!-- ************************************************************* -->
     
<!ELEMENT schema
               (h1, (h2 | h3 | intro 
                | datatype | elementtype | interface 
                | include | namespace
                | comment | pi 
                | entity | extentity | notation | textentity | parameter )*) >
<!ATTLIST schema
                name       NMTOKEN   #IMPLIED
                namespace  CDATA     #IMPLIED
                version    CDATA     #FIXED "1.0" >

<!-- Elements used for documentation components use a limited
     subset of HTML for convenience in cut/paste operations by average
     designers and engineers. Certainly other DTD subsets, such as 
     DocBook, could be used in place of HTML, but learning curve and
     available tools led to this design decision. -->

<!ENTITY % htmltext SYSTEM "htmltext.ent" > %htmltext;

<!-- ************************************************************* -->
<!-- ELEMENTS  *************************************************** -->
<!-- ************************************************************* -->
<!-- An Element Type definition requires a name.
     It is defined to extend a named element, 
     as an instance of a named element,
     as an EMPTY or ANY element with optional attribute definitions,
     or with a content model with optional attribute definitions. -->

<!ELEMENT elementtype 
               (((extends|instanceof) 
                | ((any|empty|model), (attdef | implements)*)), explain?)>
<!ATTLIST elementtype
                name       NMTOKEN   #REQUIRED >

<!ELEMENT extends
               (explain?, (attdef | implements | parameter)+) >
<!ATTLIST extends
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED
                scope      NMTOKEN   #FIXED "element" >

<!ELEMENT instanceof
               (explain?) >
<!ATTLIST instanceof
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED >

<!ELEMENT any  (explain?) >
<!ELEMENT empty
               (explain?) >
<!-- ************************************************************* -->
<!-- MODEL  ****************************************************** -->
<!-- ************************************************************* -->
<!ELEMENT model
               (string|element|mixed|choice|sequence|paramref)>
<!ATTLIST model
                occurs     CDATA     #IMPLIED>

<!ELEMENT element
               (explain?, instanceof?) >
<!ATTLIST element
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED
                occurs     CDATA     #IMPLIED >

<!ELEMENT string
               ((default | fixed | mask)?, explain?)>
<!ATTLIST string
                datatype   NMTOKEN   #IMPLIED >

<!ELEMENT mixed
               ((element | paramref)+, explain?) >
<!ATTLIST mixed
                name       NMTOKEN   #IMPLIED
                occurs     CDATA     #FIXED "*" >

<!ELEMENT choice
               ((element|choice|sequence|paramref),
                (element|choice|sequence|paramref)+, explain?) >
<!ATTLIST choice
                name       NMTOKEN   #IMPLIED
                occurs     CDATA     #IMPLIED >

<!ELEMENT sequence
               ((element|choice|sequence|paramref),
                (element|choice|sequence|paramref)+, explain?) >
<!ATTLIST sequence
                name       NMTOKEN   #IMPLIED
                occurs     CDATA     #IMPLIED >

<!-- ************************************************************* -->
<!-- ATTRIBUTES  ************************************************* -->
<!-- ************************************************************* -->

<!-- An interface to a named collection of attribute definitions. -->
<!ELEMENT interface
               ((attdef | implements)+, explain?) >
<!ATTLIST interface
                name       NMTOKEN   #REQUIRED >

<!-- Transcludes an interface specification -->
<!ELEMENT implements
               ((attdef | implements)*, explain?) >
<!ATTLIST implements
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED
                scope      NMTOKEN   #FIXED "interface" >

<!-- An attribute definition has a name and datatype, and must have 
     a presence element "required|implied|inherit|default|fixed" included.
     It may have a namespace associated with it, or inherits? -->
<!ELEMENT attdef
               (enumeration?, (required|implied|inherit|default|fixed), explain?)>
<!ATTLIST attdef
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED
                datatype   NMTOKEN   "STRING" 
                scope      NMTOKEN   #IMPLIED >

<!ELEMENT default
               (#PCDATA) >
<!ELEMENT fixed
               (#PCDATA) >

<!ELEMENT required
                EMPTY >
<!ELEMENT implied
                EMPTY >
<!ELEMENT inherit
                EMPTY >

<!-- ************************************************************* -->
<!-- DATATYPE  *************************************************** -->
<!-- ************************************************************* -->

<!ELEMENT datatype
               ((enumeration|format|scalar), explain?)+ >
<!ATTLIST datatype
                name       NMTOKEN   #REQUIRED >

<!ELEMENT enumeration
               (option+, explain?) >
<!ATTLIST enumeration
                datatype      NMTOKEN   #IMPLIED
                multiple   (true|false) "false" >

<!ELEMENT option
               (#PCDATA)* > 
<!ATTLIST option
                value      CDATA     #IMPLIED
                label      CDATA     #IMPLIED
                selected  (selected) #IMPLIED
                disabled  (disabled) #IMPLIED > 

<!ELEMENT format
               (mask, explain?) >
<!ATTLIST format
                datatype      NMTOKEN   "string" >

<!ELEMENT scalar
               (mask?, explain?) >
<!ATTLIST scalar
                datatype       NMTOKEN   "number"
                digits         CDATA     #IMPLIED
                decimals       CDATA     #IMPLIED
                minvalue       CDATA     #IMPLIED
                maxvalue       CDATA     #IMPLIED
                minexclusive   CDATA     "0"
                maxexclusive   CDATA     "0" >

<!ELEMENT mask (#PCDATA) >

<!-- ************************************************************* -->
<!-- NAMESPACES  ************************************************* -->
<!-- ************************************************************* -->
<!-- Imports a namespace and provides a shorthand name for a full URN. -->

<!ELEMENT namespace
                (explain?) >
<!ATTLIST namespace
                name          NMTOKEN   #REQUIRED
                namespace     CDATA     #REQUIRED >

<!-- ************************************************************* -->
<!-- ENTITIES  *************************************************** -->
<!-- ************************************************************* -->
<!-- Entities. XML's entity definition and reference mechanisms
     are partially reproduced in SOX. In addition, some SOX-specific
     entity definition and reference mechanisms are also provided. -->

<!ELEMENT include
               (explain?)>
<!ATTLIST include
                datatype   NMTOKEN   #FIXED "schema"
                public     CDATA     #IMPLIED
                system     CDATA     #IMPLIED >

<!ELEMENT parameter
               (element|choice|sequence|paramref) >
<!ATTLIST parameter
                name       NMTOKEN   #REQUIRED >

<!ELEMENT paramref
                (explain?) >
<!ATTLIST paramref
                name       NMTOKEN   #REQUIRED
                namespace  NMTOKEN   #IMPLIED
                scope     (element | namespace)   #REQUIRED >

<!-- Parsed entities. -->
<!ELEMENT textentity
               (#PCDATA)* >
<!ATTLIST textentity
                name       NMTOKEN   #REQUIRED >

<!ELEMENT extentity
               (explain?) >
<!ATTLIST extentity
                name       NMTOKEN   #REQUIRED
                system     CDATA     #REQUIRED  
                public     CDATA     #IMPLIED       
                notation   NMTOKEN   #FIXED "XML" >

<!-- Unparsed entity.  -->
<!ELEMENT entity
               (explain?) >
<!ATTLIST entity
                name       NMTOKEN   #REQUIRED
                system     CDATA     #REQUIRED  
                public     CDATA     #IMPLIED     
                notation   NMTOKEN   #REQUIRED >

<!-- Notation declaration.  -->
<!ELEMENT notation
               (explain?) >
<!ATTLIST notation
                name       NMTOKEN   #REQUIRED
                system     CDATA     #IMPLIED 
                public     CDATA     #IMPLIED >

<!-- ************************************************************* -->
<!-- COMMENT  **************************************************** -->
<!-- ************************************************************* -->

<!ELEMENT comment
               (#PCDATA)>

<!-- ************************************************************* -->
<!-- PROCESSING INSTRUCTIONS  ************************************ -->
<!-- ************************************************************* -->

<!ELEMENT pi   (#PCDATA) >
<!ATTLIST pi
                name       NMTOKEN   #REQUIRED >

HTML Text definitions

<!-- ************************************************************* --> 
<!-- HTML Text: SOX uses HTML element types for convenience.-->
<!-- ************************************************************* -->
<!-- Copyright:     Veo Systems Inc., 1997, 1998
     Written by:    Murray Maloney 
     Date created:  17 Dec 1997
     Date revised:  30 Sep 1998
     Version:       1.0 --> 
<!-- ************************************************************* --> 

<!ENTITY % block  "form | table | p | bq | pre | ol | ul | dl" > 

<!ENTITY % text  "#PCDATA| a | abbr | b | big | br | button 
| checkbox | cite | code | em | fieldset | i | img | label 
| password | q | radio | select | small | span | strike | strong 
| sub | sup | textarea | textfield | tt | u " > 

<!ENTITY % heading  "#PCDATA| a | abbr | b | big | br | cite | code | em 
| i | img | q | small | span | strike | strong | sub | sup | tt | u " > 

<!-- ************************************************************* --> 
<!ELEMENT intro
               (%block;)* > 
<!ELEMENT explain
               (title?, synopsis?, (h4 | h5 | h6 | %block;)*) > 
<!ELEMENT title
               (%heading;)* > 
<!ELEMENT synopsis
               (%heading;)* > 

<!-- ************************************************************* --> 
<!ELEMENT h1   (%heading;)* > 
<!ELEMENT h2   (%heading;)* > 
<!ELEMENT h3   (%heading;)* > 
<!ELEMENT h4   (%heading;)* > 
<!ELEMENT h5   (%heading;)* > 
<!ELEMENT h6   (%heading;)* > 

<!-- ************************************************************* --> 
<!ELEMENT b    (#PCDATA)* > 
<!ELEMENT br   EMPTY > 
<!ELEMENT big  (#PCDATA)* > 
<!ELEMENT i    (#PCDATA)* > 
<!ELEMENT small
               (#PCDATA)* > 
<!ELEMENT sub  (#PCDATA)* > 
<!ELEMENT sup  (#PCDATA)* > 
<!ELEMENT strike
               (#PCDATA)* > 
<!ELEMENT tt   (#PCDATA)* > 
<!ELEMENT u    (#PCDATA)* > 
<!ELEMENT abbr (#PCDATA)* > 
<!ELEMENT cite (#PCDATA)* > 
<!ELEMENT code (#PCDATA)* > 
<!ELEMENT em   (#PCDATA)* > 
<!ELEMENT q    (#PCDATA)* > 
<!ELEMENT span (#PCDATA)* > 
<!ELEMENT strong
               (#PCDATA)* > 

<!-- ************************************************************* --> 
<!ELEMENT a    (%text;)* > 
<!ATTLIST a
                name       CDATA     #IMPLIED
                href       CDATA     #IMPLIED
                title      CDATA     #IMPLIED > 

<!-- ************************************************************* --> 
<!ELEMENT img  (explain?) > 
<!ATTLIST img
                src        CDATA     #REQUIRED 
                alt        CDATA     #REQUIRED 
                longdesc   CDATA     #IMPLIED  
                usemap     CDATA     #IMPLIED > 

<!-- ************************************************************* --> 
<!ELEMENT pre  (%text;)* > 
<!ATTLIST pre
                xml:space (preserve) #REQUIRED > 

<!-- ************************************************************* --> 
<!ELEMENT p    (%text;)* > 
<!ELEMENT bq   (%text;)* > 
<!ELEMENT ol   (lh?, li+) > 
<!ELEMENT ul   (lh?, li+) > 
<!ELEMENT lh   (%heading;)* > 
<!ELEMENT li   ((%block;)*) > 
<!ELEMENT dl   (dh?,(dt,dd)+) > 
<!ELEMENT dh   (%heading;)* > 
<!ELEMENT dt   (%text;)* > 
<!ELEMENT dd   ((%block;)*) > 

<!-- ************************************************************* --> 
<!ELEMENT table
               (thead?, tbody) > 
<!ATTLIST table
                cols       CDATA     #IMPLIED
                width      CDATA     #IMPLIED
                height     CDATA     #IMPLIED
                align      (left|center|right|justify) #IMPLIED
                valign     (top | middle | bottom | baseline) #IMPLIED 
                vspace     CDATA     #IMPLIED
                hspace     CDATA     #IMPLIED
                cellpadding CDATA    #IMPLIED
                cellspacing CDATA    #IMPLIED
                border     CDATA     #IMPLIED
                frame     (box|void|above| below|hsides|vsides|lhs|rhs) #IMPLIED
                rules     (none|groups|rows|cols|all) #IMPLIED > 

<!ELEMENT thead
               (tr)+ > 
<!ATTLIST thead
                align     (left|center|right|justify)  #IMPLIED
                valign    (top|middle|bottom|baseline) #IMPLIED > 

<!ELEMENT tbody
               (tr)+ > 
<!ATTLIST tbody
                align     (left|center|right|justify)  #IMPLIED
                valign    (top|middle|bottom|baseline) #IMPLIED > 

<!ELEMENT tr   (th | td)+ > 
<!ATTLIST tr
                align     (left|center|right|justify)        #IMPLIED
                valign    (top | middle | bottom | baseline) #IMPLIED > 

<!ELEMENT th   (%text;)* > 
<!ATTLIST th
                colspan    CDATA     #IMPLIED
                rowspan    CDATA     #IMPLIED
                width      CDATA     #IMPLIED
                height     CDATA     #IMPLIED
                align     (left|center|right|justify)        #IMPLIED
                valign    (top | middle | bottom | baseline) #IMPLIED > 

<!ELEMENT td   (%text;)* > 
<!ATTLIST td
                colspan    CDATA     #IMPLIED
                rowspan    CDATA     #IMPLIED
                width      CDATA     #IMPLIED
                height     CDATA     #IMPLIED
                align     (left|center|right|justify)        #IMPLIED
                valign    (top | middle | bottom | baseline) #IMPLIED > 
                
<!-- ************************************************************* --> 
<!ELEMENT form (button|checkbox|fieldset|label|password|radio
                |select|textfield|textarea)* > 
<!ATTLIST form
                action    CDATA     #IMPLIED
                method   (get|post) "get" > 

<!ELEMENT fieldset
               (legend?, (label, (button|checkbox|password|radio
                |select|textfield|textarea))*) > 

<!ELEMENT legend
               (%heading;)* > 

<!ELEMENT label
               (%heading;)* > 

<!ELEMENT select
               (option | optgroup)+ > 
<!ATTLIST select
                name       NMTOKEN   #IMPLIED
                multiple  (multiple) #IMPLIED
                disabled  (disabled) #IMPLIED
                size       CDATA     #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT optgroup
               (label?, (option | optgroup)+) > 
<!ATTLIST optgroup
                multiple  (multiple) #IMPLIED
                disabled  (disabled) #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT button
               (#PCDATA)* > 
<!ATTLIST button
                name       NMTOKEN   #REQUIRED
                value      CDATA     #IMPLIED
                icon       CDATA     #IMPLIED
                type      (button | submit | reset) #REQUIRED
                disabled  (disabled) #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT checkbox
               (#PCDATA)* > 
<!ATTLIST checkbox
                name       NMTOKEN   #REQUIRED
                value      CDATA     #IMPLIED
                icon       CDATA     #IMPLIED
                type       NMTOKEN   #FIXED "checkbox"
                checked   (checked) #IMPLIED
                disabled  (disabled) #IMPLIED
                size       CDATA     #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT radio
               (#PCDATA)* > 
<!ATTLIST radio
                name       NMTOKEN   #REQUIRED
                value      CDATA     #REQUIRED
                icon       CDATA     #IMPLIED
                type       NMTOKEN   #FIXED "radio"
                checked   (checked) #IMPLIED
                disabled  (disabled) #IMPLIED
                size       CDATA     #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT textfield
               (#PCDATA)* > 
<!ATTLIST textfield 
                name       NMTOKEN   #REQUIRED
                value      CDATA     #IMPLIED
                icon       CDATA     #IMPLIED
                type       NMTOKEN   "text"
                disabled  (disabled) #IMPLIED
                readonly  (readonly) #IMPLIED
                size       CDATA     #IMPLIED
                maxlength  CDATA     #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT password
               (#PCDATA)* > 
<!ATTLIST password
                name       NMTOKEN   #REQUIRED
                value      CDATA     #IMPLIED
                icon       CDATA     #IMPLIED
                type       NMTOKEN   "password"
                disabled  (disabled) #IMPLIED
                readonly  (readonly) #IMPLIED
                size       CDATA     #IMPLIED
                maxlength  CDATA     #IMPLIED
                tabindex   CDATA     #IMPLIED
                accesskey  CDATA     #IMPLIED > 

<!ELEMENT textarea
               (#PCDATA)* > 
<!ATTLIST textarea
                name       NMTOKEN   #REQUIRED
                icon       CDATA     #IMPLIED
                rows       CDATA     #REQUIRED
                cols       CDATA     #REQUIRED
                disabled  (disabled) #IMPLIED
                readonly  (readonly) #IMPLIED
                tabindex   CDATA     #IMPLIED 
                accesskey  CDATA     #IMPLIED > 


Appendix B: References

[DATETIME]
Date and Time Formats, Misha Wolf and Charles Wicksteed. W3C, 15 September 1997.
See http://www.w3.org/TR/NOTE-datetime-970915
[DCD]
Document Content Description for XML (DCD), Tim Bray et al. W3C, 10 August 1998
See http://www.w3.org/TR/NOTE-dcd
[HTML-4]
HTML 4.0 Specification, Dave Raggett et al. W3C, 1998
See http://www.w3.org/TR/REC-html40
[ISO-31]
ISO 31 -- Quantities and units. International Organization for Standardization
[ISO-639]
ISO 639:1988 -- Codes for the representation of names of languages. International Organization for Standardization
[ISO-3166]
ISO 3166:1993 Countries. International Organization for Standardization
[ISO-8601]
ISO 8601 -- Date and Time. International Organization for Standardization
[ISO-8859-1]
ISO 8859-1:1987 -- Information Processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1
[ISO-10303-11]
ISO 10303-11:1994(E) -- Industrial automation systems and integration -- Product data representation and exchange -- Part 11: Description methods: The EXPRESS language reference manual. International Organization for Standardization
[ISO-10646]
ISO/IEC 10646 -- Information Technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, ISO/IEC 10646-1:1993. The current specification also takes into consideration the first five amendments to ISO/IEC 10646-1:1993.
[RFC-1808]
RFC 1808, Relative Uniform Resource Locators. Internet Engineering Task Force.
See http://ds.internic.net/rfc/rfc1808.txt
[URI]
Internet-Draft, Uniform Resource Identifiers (URI): Generic Syntax and Semantics
See http://www.ics.uci.edu/pub/ietf/uri/draft-fielding-uri-syntax-01.txt
[URL]
RFC 1738, Uniform Resource Locators (URL). Internet Engineering Task Force.
See http://ds.internic.net/rfc/rfc1738.txt
[URN]
RFC 2141, URN Syntax. Internet Engineering Task Force.
See http://ds.internic.net/rfc/rfc2141.txt
[WAI-PAGEAUTH]
WAI Accessibility Guidelines: Page Authoring, Gregg Vanderheiden et al. W3C, 14 April 1998
See http://www.w3.org/TR/WD-WAI-PAGEAUTH
[WEBARCH-EXTLANG]
Web Architecture: Extensible Languages, Tim Berners-Lee and Dan Connolly. W3C, 10 Feb 1998
See http://www.w3.org/TR/NOTE-webarch-extlang
[WEBSGML]
Proposed TC for WebSGML Adaptations for SGML, C. F. Goldfarb, ed., 14 June 1997.
See http://www.sgmlsource.com/8879rev/n1929.htm
[XLink]
XML Linking Language (XLink), Eve Maler and Steve DeRose. W3C, 3 March 1998
See http://www.w3.org/TR/WD-xlink
[XML]
Extensible Markup Language (XML) 1.0, Tim Bray, et al. W3C, 10 February 1998
See http://www.w3.org/TR/REC-xml
[XML-Data]
XML-Data, Andrew Layman, et al. W3C, 05 January 1998.
See http://www.w3.org/TR/1998/NOTE-XML-data-0105
[XML-Namespaces]
Namespaces in XML, Tim Bray et al. W3C, 1998
See http://www.w3.org/TR/WD-xml-names
[XPointer]
XML Pointer Language (XPointer), Eve Maler and Steve DeRose. W3C, 3 March 1998
See http://www.w3.org/TR/WD-xptr

Appendix C: Datatype masks

A mask is a datatype format constraint. A mask consists of symbols, groups of symbols, and patterns, any of which may be modified by occurrence specifiers. Each symbol is a placeholder that stands for a character or a class of characters. Date and time mask tokens are taken from those defined in [ISO-8601].

A or a
A single alphabetic character
B or b
Any one of the boolean characters (0 or 1)
D
A single digit representing the day of the week, in the range 1-7 (Monday-Sunday) (ISO8601-5.1.2)
DD
Two digits representing a day in a month in the Gregorian calendar, in the range 01-31 (ISO8601-5.1.2)
DDD
Three digits representing a day in a year in the Gregorian calendar, in the range 001-366 (ISO8601-5.1.2)
E
The character "E", used to indicate floating point numbers
hh
Two digits representing hours in a day, in the range 00-24 (ISO8601-5.1.2)
MM
Two digits representing a month in the Gregorian calendar, in the range 01-12 (January-December) (ISO8601-5.1.2)
mm
Two digits representing minutes in an hour, in the range 00-59 (ISO8601-5.1.2)
N
Any valid XML name character
n
An integer number consisting of one or more digits.
P
The character "P", used as a "period designator" to indicate the duration of a period of time. (ISO8601-5.1.2)
p
Any one of the punctuation characters (. or : or ; or ,)
Q or q
Any one of the quote characters (" or ' or `)
S
Indicates a signed number. The characters "+" or "-" must appear in this position
ss
Two digits representing seconds in a minute, in the range 00-59 (ISO8601-5.1.2)
s
One or more digits representing a decimal fraction of a second (ISO8601-5.1.2)
T
The character "T", used as a "time designator" to indicate the start of the time-of-day field. (ISO8601-5.1.2)
U or u
Any character that is valid in a [URI], [URL], or [URN]
W
The character "W", used as a "week designator" to indicate the start of the week field. (ISO8601-5.1.2)
ww
Two digits representing the week number in a year, in the range 01-53 (ISO8601-5.1.2)
X or x
Any character
YYYY
Four digits of a year (ISO8601-5.1.2)
Z
A leading numeric character; when the digit in the Z position is the numeral 0, it may be replaced by a space character
space
A single blank character
!, @, #, %, _, =, /, {, }, :, ;, -, . and ,
Represent themselves
#
Any numeric character
$
A currency symbol
0
The single numeric character "0"
1
The single numeric character "1"
2
The single numeric character "2"
3
The single numeric character "3"
4
The single numeric character "4"
5
The single numeric character "5"
6
The single numeric character "6"
7
The single numeric character "7"
8
The single numeric character "8"
9
The single numeric character "9"
(...)
Represents a grouping of the symbols found between the parentheses. Within parentheses, mask symbols keep their usual meaning.
[...]
Represents one of the characters found between the square brackets. Within square brackets, any character except "-" represents itself. The character "-" indicates a range of characters beginning with the character to the left of the "-" and ending with the character to the right.
*
Indicates that the preceding character, or group, may occur zero or more times.
+
Indicates that the preceding character, or group, may occur one or more times.
?
Indicates that the preceding character, or group, may occur zero or one time.
{n,m}
Indicates that the preceding character, or group, must occur at least n times and no more than m times.
n must be a positive integer or zero
m may be an integer greater than n, or
m may be the character "*", indicating that the maximum is unbounded
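As a reading aid, the mask notation can be approximated by translating it into a regular expression. The Python sketch below is illustrative only: `mask_to_regex` is a hypothetical helper, not part of SOX. It treats "#" as "any numeric character" (the list above also names "#" among the symbols that represent themselves), approximates the U, N, Z and $ classes, and assumes "{n,m}" always denotes occurrence.

```python
import re

# Mask tokens from Appendix C. Multi-character tokens come first so that the
# longest token wins ("DDD" before "DD" before "D", "ss" before "s").
# The U, N, Z and $ classes are approximations chosen for this sketch.
_URI_CHAR = r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]"
_TOKENS = [
    ("YYYY", r"\d{4}"), ("DDD", r"\d{3}"), ("DD", r"\d{2}"), ("MM", r"\d{2}"),
    ("hh", r"\d{2}"), ("mm", r"\d{2}"), ("ss", r"\d{2}"), ("ww", r"\d{2}"),
    ("D", r"[1-7]"),             # day of the week
    ("s", r"\d+"),               # decimal fraction of a second
    ("n", r"\d+"),               # integer of one or more digits
    ("A", r"[A-Za-z]"), ("a", r"[A-Za-z]"),
    ("B", r"[01]"), ("b", r"[01]"),
    ("p", r"[.:;,]"),
    ("Q", "[\"'`]"), ("q", "[\"'`]"),
    ("S", r"[+-]"),
    ("X", r"."), ("x", r"."),
    ("#", r"\d"),                # any numeric character
    ("Z", r"[0-9 ]"),            # digit that may be zero-suppressed to a space
    ("N", r"[\w.:-]"),           # rough stand-in for an XML name character
    ("U", _URI_CHAR), ("u", _URI_CHAR),
    ("$", "[$\u00a2-\u00a5\u20ac]"),  # a few common currency symbols
]
_QUANT = re.compile(r"\{(\d+),(\d+|\*)\}")


def mask_to_regex(mask: str) -> str:
    """Translate a SOX-style datatype mask into a Python regular expression."""
    out = []
    i = 0
    while i < len(mask):
        c = mask[i]
        if c == "[":
            # Bracket expression: each character stands for itself except "-",
            # which denotes a range. Raises ValueError if "]" is missing.
            j = mask.index("]", i + 1)
            body = "".join("\\" + ch if ch in "\\^[" else ch for ch in mask[i + 1:j])
            out.append("[" + body + "]")
            i = j + 1
        elif c == "(":
            out.append("(?:")  # non-capturing group
            i += 1
        elif c in ")*+?":
            out.append(c)  # group close and occurrence specifiers map directly
            i += 1
        elif c == "{" and (m := _QUANT.match(mask, i)):
            lo, hi = m.groups()
            out.append("{%s,%s}" % (lo, "" if hi == "*" else hi))  # "*" = unbounded
            i = m.end()
        else:
            for tok, rx in _TOKENS:
                if mask.startswith(tok, i):
                    out.append(rx)
                    i += len(tok)
                    break
            else:
                out.append(re.escape(c))  # symbols that represent themselves
                i += 1
    return "".join(out)
```

For example, `re.fullmatch(mask_to_regex("hh:mm(:ss)?[Z]"), "23:59:30Z")` succeeds, while the same call with "23:59" fails because the bracketed "Z" is required. Longest-match tokenization is the design choice that lets "ss" (seconds), "s" (fraction) and "S" (sign) coexist in one mask.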

Appendix D: Datatype library

Derived scalar datatypes

All derived scalar datatypes have an XML parse type of string.

byte
Single byte. A specialization of int.
Minimum value: -128
Maximum value: 127
Format: S#*
double
A decimal value. A specialization of number.
Minimum value: -1.7976931348623157E308
Maximum value: 1.7976931348623157E308
float
A decimal value. A specialization of number.
Minimum value: -3.40282347E38
Maximum value: 3.40282347E38
int
A signed integer value. A specialization of number.
Minimum value: -2,147,483,648
Maximum value: 2,147,483,647
Format: S#*
long
A signed integer value. A specialization of number.
Minimum value: -9,223,372,036,854,775,808
Maximum value: 9,223,372,036,854,775,807
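The bounds above can be checked mechanically. The Python sketch below is a hypothetical helper (`is_valid_scalar` is not defined by SOX). It reads the "S#*" format as "explicit sign, then digits", requires at least one digit, and applies the same rule to long, which lists no format of its own; those are assumptions. The float and double bounds are the largest finite magnitudes listed above.

```python
import re

# Integer bounds from the derived scalar datatype table.
_INT_BOUNDS = {
    "byte": (-128, 127),
    "int": (-2_147_483_648, 2_147_483_647),
    "long": (-9_223_372_036_854_775_808, 9_223_372_036_854_775_807),
}
# Largest finite magnitudes listed for float and double.
_FLOAT_MAX = {"float": 3.40282347e38, "double": 1.7976931348623157e308}

# Assumed reading of mask S#*: a mandatory "+" or "-" followed by digits.
_SIGNED = re.compile(r"[+-][0-9]+")


def is_valid_scalar(datatype: str, text: str) -> bool:
    """Check a lexical value against one of the derived scalar datatypes."""
    if datatype in _INT_BOUNDS:
        if not _SIGNED.fullmatch(text):
            return False
        lo, hi = _INT_BOUNDS[datatype]
        return lo <= int(text) <= hi
    if datatype in _FLOAT_MAX:
        try:
            value = float(text)
        except ValueError:
            return False
        # NaN compares False and infinities exceed the bound, so both fail.
        return abs(value) <= _FLOAT_MAX[datatype]
    raise KeyError(datatype)
```

Because the integer check parses with Python's arbitrary-precision `int`, values just outside a range (such as "+128" for byte) are rejected by comparison rather than by overflow.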

Date and time datatypes

The following date and time datatypes are derived from ISO 8601 -- Date and Time [ISO-8601] and are informed by Date and time formats [DATETIME].

datetime
ISO 8601 (5.4.1.a) extended calendar date and local time format
Format: YYYY-MM-DDThh:mm:ss
XML parse type: string
datetime.tz
ISO 8601 (5.4.2) extended calendar date and local time format, with time zone designator.
Format: YYYY-MM-DDThh:mm:ssShh(:mm)?
XML parse type: string
time.tz
ISO 8601 (5.3.3.1) extended local time and UTC offset format.
Format: hh:mm:ssShh(:mm)?
XML parse type: string
time.UTC
ISO 8601 (5.3.3) extended UTC time format.
Format: hh:mm(:ss)?[Z]
XML parse type: string
hour
ISO 8601 (5.3.1.2) hour format.
Format: hh
Minimum: 0
Maximum: 24
XML parse type: string
minute
ISO 8601 (5.3.1.4.b) minute format.
Format: -mm
Minimum: 0
Maximum: 59
XML parse type: string
second
ISO 8601 (5.3.1.4.g) second format.
Format: ss(.s)?
Minimum: 0
Maximum: 59.99
XML parse type: string
year
ISO 8601 (5.2.1.2.b) specific year format.
Format: YYYY
Minimum: 0
Maximum: unspecified
XML parse type: string
year-and-day
ISO 8601 (5.2.2.1) extended ordinal date format.
Format: YYYY-DDD
Minimum: 1
Maximum: 366
XML parse type: string
month
ISO 8601 (5.2.1.3.e) month format.
Format: --MM
Minimum: 1
Maximum: 12
XML parse type: string
day-of-a-month
ISO 8601 (5.2.1.3.d) day-of-a-month format.
Format: -MM-DD
Minimum: 1
Maximum: 31
XML parse type: string
week
ISO 8601 (5.2.3.3-Www) week format.
Format: YYYY-Www
Minimum: 1
Maximum: 53
XML parse type: string
day-of-any-week
ISO 8601 (5.2.3.3.g) day-of-any-week format.
Format: ---D
Minimum: 1 (Monday)
Maximum: 7 (Sunday)
XML parse type: string
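To illustrate how a format and its ranges combine, the Python sketch below parses a datetime.tz value (format YYYY-MM-DDThh:mm:ssShh(:mm)?) into a timezone-aware `datetime`. The function name `parse_datetime_tz` is hypothetical. Field ranges are enforced by Python's `datetime` constructor, which also rejects hour 24; ISO 8601 permits that value, so this sketch is stricter than the hour datatype.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY-MM-DDThh:mm:ssShh(:mm)? from the datetime.tz entry.
_DATETIME_TZ = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})([+-])(\d{2})(?::(\d{2}))?"
)


def parse_datetime_tz(text: str) -> datetime:
    """Parse a datetime.tz value into a timezone-aware datetime."""
    m = _DATETIME_TZ.fullmatch(text)
    if m is None:
        raise ValueError(f"not in datetime.tz format: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    sign = -1 if m.group(7) == "-" else 1
    # The minutes of the UTC offset are optional in the format.
    offset = timedelta(hours=int(m.group(8)), minutes=int(m.group(9) or 0))
    # datetime() raises ValueError for out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone(sign * offset))
```

For instance, "1998-09-30T12:00:00-05:00" parses to noon at UTC-5, which converts to 17:00 UTC.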

Enumerated datatypes

countries
ISO 3166 country codes [ISO-3166]
Format: AA
XML parse type: nmtoken
currencies
ISO currency codes
Format: AAA
XML parse type: nmtoken
lang
ISO language codes [ISO-639]
Format: AA
XML parse type: nmtoken
units
ISO 31 unit identifiers [ISO-31]
Format: A+
XML parse type: nmtoken

URI datatypes

URL
Uniform Resource Locator
Format: U*
XML parse type: string
URN
Uniform Resource Name
Format: [u][r][n]:U*
XML parse type: string
email
An email address
Format: [m][a][i][l][t][o]:X@X
XML parse type: string
system
XML system identifier.
Format: X*
XML parse type: string

Non-normative Appendixes

Appendix E: Memorandum schema

<schema name="memo" namespace="http://www.veosystems.com/schemas/memo.xml">

<h1>Memo Document Type</h1>
<h2>Definitions</h2>
<intro>
<p>...</p>
<ul>
<li>...</li>
<li>...</li>
</ul>
</intro>

<h3>Memo element type</h3>

<elementtype name="memo">
<explain>
<title>Memo Document</title>
<synopsis>A simple, useful memo.</synopsis>
<help>
<p>Fill in attributes, enter paragraphs, lists and images, and press SEND.</p>
</help>
<p>A memo consists of six required fields and a body.</p>
</explain>

    <model>
        <sequence>
            <element name="to"/>
            <element name="from"/>
            <element name="cc"/>
            <element name="subject"/>
            <element name="file"/>
            <element name="date"/>
            <element name="body"/>
        </sequence>
    </model>
</elementtype>

<h3>Memo fields</h3>

<elementtype name="to">      <model><string/></model>    </elementtype>
<elementtype name="from">    <model><string/></model>    </elementtype>
<elementtype name="cc">      <model><string/></model>    </elementtype>
<elementtype name="subject"> <model><string/></model>    </elementtype>

<elementtype name="file">
    <model><string datatype="number"/></model>
</elementtype>
<elementtype name="date">
    <model><string datatype="date"/></model>
</elementtype>

<elementtype name="body">
    <model>
        <choice occurs="1,*">
            <element name="p"/>
            <element name="list"/>
            <element name="image"/>
        </choice>
    </model>
</elementtype>

<elementtype name="p">
    <model>
        <string/>
    </model>
</elementtype> 

<elementtype name="list">
    <model>
        <element name="item" occurs="3,9"/>
    </model>
</elementtype>

<elementtype name="item">
    <instanceof name="p"/>
</elementtype>

<elementtype name="image">
    <empty/>
    <attdef name="src" datatype="URI">
        <required/>
    </attdef>
</elementtype>

</schema>
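To show what the memo content models require of an instance, the Python sketch below hand-codes a few of those constraints with the standard library's ElementTree. It is not a SOX processor: `check_memo` is a hypothetical helper, and it checks only element order, the body choice, the 3-to-9 item occurrence on list, and the required src attribute on image. It does not check datatypes such as number or date.

```python
import xml.etree.ElementTree as ET

# Child order required by the memo element type's <sequence>.
MEMO_FIELDS = ["to", "from", "cc", "subject", "file", "date", "body"]


def check_memo(xml_text: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "memo":
        return [f"root element is {root.tag!r}, expected 'memo'"]
    problems = []
    names = [child.tag for child in root]
    if names != MEMO_FIELDS:
        problems.append(f"memo children {names} do not match {MEMO_FIELDS}")
    body = root.find("body")
    if body is not None:
        # <choice occurs="1,*">: at least one p, list or image.
        if len(body) == 0:
            problems.append("body needs at least one p, list or image")
        for child in body:
            if child.tag not in ("p", "list", "image"):
                problems.append(f"unexpected {child.tag!r} in body")
            elif child.tag == "list":
                # <element name="item" occurs="3,9"/>
                items = [c for c in child if c.tag == "item"]
                if len(items) != len(child) or not 3 <= len(items) <= 9:
                    problems.append("list must contain 3 to 9 item elements")
            elif child.tag == "image" and "src" not in child.attrib:
                problems.append("image is missing required src attribute")
    return problems
```

A memo with only two list items, for example, yields the single problem "list must contain 3 to 9 item elements".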


Appendix F: Entities and notations

As in any collaborative work, some of the decisions that found their way into the SOX specification were fraught with technical differences of opinion. In particular, the authors and other collaborators had a hard time coming to terms with entities and notations. In the end we agreed to document support for both, while agreeing to disagree about whether another approach might be more suitable. However, we fully expect that the split that we encountered will be reflected in the outside world, so we are including herewith the minority opinion:

We foresee the spread of XML engendering the creation of large repositories of entities by different organizations. An instance corresponding to a Schema might legitimately choose to reference entities from a variety of these repositories (not all of which were necessarily known when the Schema was created). If we follow the approach of current DTDs, then we have the following alternatives:

  1. Include, directly or indirectly, definitions for all referenceable entities within the Schema itself. This is potentially huge in comparison to the rest of the Schema.
  2. Dynamically modify the Schema to include the entities or entity repositories we are interested in. This requires recipients of an instance to reread the Schema any time there is such a change (which in turn requires some kind of alert mechanism, or else the Schema must be reread every time).
  3. Place additional declarations in the internal subset. This requires maintaining some degree of DTD functionality, and complicates dynamically composing documents, where the start of the document is not necessarily modifiable when the decision is made to reference an entity.

None of these alternatives is entirely acceptable. In addition, entities defined in the various repositories may have clashing names, which is a serious problem if the entities must be defined within the Schema itself.

Given the characteristics of the issue, the clearest solution is to extend namespaces to cover entities as well as element and attribute names. Doing so provides a means to declare a collection of entity names (as a subset of whatever the referenced namespace is) and a way to reference entities without fear of name clashes.

In addition, this mechanism makes it possible to handle all text and unparsed entities outside of the Schema itself. Rather than provide both an inadequate mechanism for compatibility and a more flexible one for future development, [the minority opinion was to] have left entity declarations entirely out of SOX.

Removing entity declarations from the language requires supplying an alternative mechanism for supporting entities. Part of that is accomplished by extending the namespace mechanism to include entity references. The other part is describing the storage objects that hold the entity declarations: the entity repositories.

An entity repository is an XML instance which declares some number of entities. These are either simple text entities (the equivalent of an internal text entity), external parsed entities, or unparsed entities. It is also possible to include other repositories. Each entity has a name attribute of type ID, so the names must all be unique within the repository.

When an XML DTD is converted to a SOX document, a repository is created with all the entities defined in the DTD. This file is also merged with the Schema if a DTD is to be generated from the Schema. The repository then contains the entity portion of the Schema namespace, as referenced by instances.

Both a SOX document and a DTD defining the structure of a repository have been elided.

Within an entity definition file at www.veosystems.com:

<textentity name="astring">this is a text string </textentity>

<extentity  name="anentity" 
            public="urn:veo:text:anentity" 
            system="http://www.veosystems.com/anentity.xml" />

<datatype   name="gif" >
  <notation public="image/gif" />
</datatype>

<entity     name="animage" 
            notation="gif" 
            system="http://www.veosystems.com/animage.gif" />

Within a document

<elem xmlns:ents="http://www.veosystems.com" 
      img="ents:animage" >
&ents:astring;
&ents:anentity;
</elem>
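The minority opinion does not define a processing model, so the following Python sketch is only one hypothetical way to resolve such references. Its assumptions: the repository's declarations are wrapped in a hypothetical `<repository>` root element; every `xmlns:` declaration in the document is treated as in scope everywhere; and only text entities are expanded inline, while external entities are left for a later retrieval step. Expansion happens on the raw text before parsing, because a conforming XML 1.0 parser would reject `&ents:astring;` as an undeclared entity. `load_repository` and `expand_text_entities` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# &prefix:name; : a namespace-qualified entity reference.
_REF = re.compile(r"&([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*);")
_XMLNS = re.compile(r'xmlns:([\w.-]+)="([^"]*)"')


def load_repository(xml_text: str) -> dict:
    """Map each entity name in a repository to (kind, value)."""
    entities = {}
    for el in ET.fromstring(xml_text):
        name = el.get("name")
        if el.tag == "textentity":
            entities[name] = ("text", el.text or "")
        elif el.tag in ("extentity", "entity"):
            entities[name] = (el.tag, el.get("system"))
    return entities


def expand_text_entities(doc: str, repositories: dict) -> str:
    """Expand &prefix:name; references to text entities before parsing.

    repositories maps a namespace URI to a dict built by load_repository.
    References to external entities are returned unchanged.
    """
    # Simplification: ignores element scoping of namespace declarations.
    prefixes = dict(_XMLNS.findall(doc))

    def replace(m):
        entities = repositories[prefixes[m.group(1)]]
        kind, value = entities[m.group(2)]
        return value if kind == "text" else m.group(0)

    return _REF.sub(replace, doc)
```

With the repository above loaded under "http://www.veosystems.com", `&ents:astring;` expands to "this is a text string ", while `&ents:anentity;` is kept for retrieval from its system identifier. The attribute value `ents:animage` is untouched, since it is not an entity reference.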

Appendix G: Grateful acknowledgments

We gratefully acknowledge:

A note is made here in memory of Yuri Rubinsky, who was instrumental in developing and promoting the precursor to XML -- SGML on the Web.


Glossary

attribute
a property of an element with a name and value
base element type
a parameterized element type that enables reuse and supports element inheritance.
choice content model atom
a content model atom which specifies selection among two or more content model atoms
comment
an XML comment using element markup
content model
describes the syntactical structure of an element type
content model atom

an element, string, mixed, choice, sequence or parameter content model atom

choice content model atom
specifies a selection among two or more content model atoms
element content model atom
references a defined element type by name, and optional namespace
mixed content model atom
specifies any mix of string content and named elements
parameter content model atom
specifies a content model atom
paramref content model atom
references a defined parameter definition by name, and optional namespace
sequence content model atom
specifies an ordered sequence of two or more content model atoms
string content model atom
specifies string content that may be specialized by datatype or format specification
content specification
any content specification
specifies that any mix of string content and known elements is permitted
empty content specification
specifies that content is not permitted
extends content specification
specifies inheritance (extends or specializes) of a named element
instanceof content specification
specifies inheritance of a named element
wfxml content specification
specifies well-formed XML content
context
the name of the parent object
current namespace
the namespace, or schema, currently in scope
datatype
constraints that are applicable to data content in elements and attribute values
defining element

The elements used to define SOX objects.

attdef
defines an attribute
specifies attribute datatype, optional enumeration, and presence
comment
an XML comment using element markup
datatype
defines a data type
elementtype
defines an element type
specifies content model and attributes
entity
defines an external unparsed entity
specifies a resource and its notation type
extentity
defines an external parsed entity
specifies a resource
interface
defines an attribute interface
specifies one or more attribute definitions
namespace
defines and imports a namespace
specifies a namespace and resource identifiers
notation
defines a notation and associates it with a resource
pi
an XML processing instruction in element markup
schema
defines a SOX document
specifies namespace and resource identifiers
textentity
defines an internal parsed entity
specifies replacement text
element
a reference to a named element type
element content
a content model which allows only child elements and does not allow string content
element content model atom
see content model atom
element type
definition of an element
empty element
see content specification
enumerated datatype
a datatype which specifies an enumerated list of values
entity
a virtual storage unit
enumeration
a list of valid attribute or datatype values
external unparsed entity
a named non-XML entity stored outside of the document entity
external parsed entity
a named XML entity stored outside of the document entity
false
one of the boolean values which may be interpreted as `0'
fixed
specifies that an attribute value is pre-defined and immutable
format datatypes
datatypes, derived from non-number datatypes, which have an associated mask
implements
specifies attribute inheritance and specialization
imported namespace
a named set of objects which is available as a resource
included module
an external SOX resource which is sourced into a schema
instance of
specifies element inheritance (is-a)
interface
a named collection of attribute definitions
introduction
a container of documentation elements
mixed content
a content model which allows child elements and string content
module
an external SOX resource
name
an object identifier which matches the [XML] Name production, except that colon (:) is not allowed
namespace
a collection of names, identified by a URI
namespace identifier
a shorthand name for a namespace URI
notation
identifies by name the format of unparsed entities
number
an intrinsic datatype
object
an attribute, datatype, element, entity, interface, notation, or parameter
occurrence
minimum and maximum number of uses of a content model atom or datatype mask token
parameter
a named content model atom that may be bound to an element definition
processing instruction
allows documents to contain instructions for applications
required
specifies that an attribute value must be specified in the document instance
root element
document element, no part of which appears in the content of any other element
scalar datatypes
datatypes derived from the intrinsic number datatype
scope
for attributes, element or interface, depending on where the attribute is defined.
for parameters, namespace or element, depending on where the parameter is defined.
sequence content model atom
a content model atom which specifies a sequence of two or more content model atoms
string content
a content model which allows character data and which does not allow child elements
string content model atom
a content model atom which specifies string content
true
one of the boolean values which may be interpreted as `1'