XML Schema
The W3C XML Schema 1.0 Recommendation defines an XML schema language. Its salient characteristics are:
- Unlike the DTD language defined in XML 1.0 (and in ISO 8879, the defining document for SGML), it uses XML syntax rather than a special non-XML syntax. XML Schema documents are thus easier to process using standard XML tools than DTDs are.
- It defines a set of simple types or datatypes for use in attribute values and simple element content (i.e. for elements with only character children); these include all the types most commonly found in programming languages and in database management systems, as well as a few others included for historical or other reasons.
- It also provides a notion of complex types (for use with elements which may contain child elements); complex types can be constrained by attribute declarations and content models.
- It distinguishes systematically between the generic identifiers (element type names, or 'tag names') written in angle brackets in the XML source, on the one hand, and the types assigned to elements, on the other. (This is sometimes referred to as 'the tag/type distinction'.)
- It allows for explicit relations between types. New simple types may be derived by restricting existing simple types. New complex types may be restrictions, or extensions, of existing types.
- It defines explicitly what information generated as a by-product of validation may be made accessible to downstream applications; since this information is described as a set of augmentations to the input XML information set, the result of schema-validity assessment is described as a post-schema-validation infoset or PSVI.
- It provides for wildcards in content models, which can match any element at all, any element in a particular namespace, any element in a namespace other than the target namespace of the schema document, etc. Wildcards may be 'black-box' wildcards (no examination or validation of their contents; processContents="skip"), 'white-box' wildcards (all contents must be declared and valid; processContents="strict"), or 'lax' wildcards (if a child element has a declaration, it will be validated; if not, that is not an error; processContents="lax").
- Instead of treating validity of documents as a simple all-or-nothing Boolean value, it provides discrete validity information for each element and attribute validated.
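Several of the points above can be seen in one small example. The following is a minimal sketch, not taken from the Recommendation: the schema is hypothetical (the names note, NoteType, Priority and the namespace http://example.com/note are invented for illustration), and Python's standard xml.etree.ElementTree module stands in for "standard XML tools". The code only reads the schema document as XML; it does not validate anything.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document; all names and the target namespace are invented.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:n="http://example.com/note"
           targetNamespace="http://example.com/note"
           elementFormDefault="qualified">
  <!-- Simple type derived by restricting the built-in xs:string -->
  <xs:simpleType name="Priority">
    <xs:restriction base="xs:string">
      <xs:enumeration value="low"/>
      <xs:enumeration value="high"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Complex type: a content model plus an attribute declaration -->
  <xs:complexType name="NoteType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <!-- Lax wildcard: elements from other namespaces are validated only if declared -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="priority" type="n:Priority"/>
  </xs:complexType>
  <!-- The tag name "note" is distinct from the type "NoteType" assigned to it -->
  <xs:element name="note" type="n:NoteType"/>
</xs:schema>
"""

def q(local):
    """Build the namespace-qualified tag name ElementTree uses for XSD elements."""
    return "{%s}%s" % (XS, local)

# A schema document is ordinary XML, so a generic parser can read it.
root = ET.fromstring(SCHEMA)

# Top-level element declarations: tag name -> assigned type (the tag/type distinction).
global_elements = {e.get("name"): e.get("type") for e in root.findall(q("element"))}

# Named simple types and the base type each one restricts.
simple_bases = {
    st.get("name"): st.find(q("restriction")).get("base")
    for st in root.findall(q("simpleType"))
}

# processContents setting of every wildcard anywhere in the schema document.
wildcards = [w.get("processContents") for w in root.iter(q("any"))]

print(global_elements)
print(simple_bases)
print(wildcards)
```

Nothing comparable is possible for a DTD with a generic XML parser alone, since DTD declarations are not XML elements.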
Note: 'XML Schema' is the name of the language defined by the W3C Recommendation. 'XML schema' is a common noun in English denoting a schema (in whatever formalism) for an XML vocabulary. To avoid confusion between the two, some people prefer the names 'XSD' or 'WXS' (W3C XML Schema) for the language defined in the Recommendation, or use the full name 'W3C XML Schema' whenever confusion might otherwise arise.
What follows is a sketch of a possible skeleton set of topics related to XML Schema, to help encourage the development of a useful wiki on the subject.
All of these pages need to be drafted. You can help!
XML Schema software
Different kinds of software may be 'schema-aware'. It would be useful to have separate wiki pages with discussions of each type and pointers to specific software of the type.
Among the most obvious classes of schema-aware software are:
- schema-based validators
- schema-aware XML editors and editing tools
- data binding tools (for marshalling and de-marshalling between XML and programming-language data structures)
- form generators
- schema-writing and maintenance tools
- schema conversion tools
- tools for exploring or displaying information about schemas
- schema analyzers
- schema-aware XSLT and XQuery engines
- general toolkits
For more detail, see the page on XML Schema software.
Interoperability issues
Any schema-aware software is likely to be aimed at a particular application type or application domain; language features that don't match up neatly with the assumptions of the particular domain may be omitted or neglected. Among the features which are either unsupported or supported less conveniently than other features are:
- mixed content: since most object-oriented languages lack any good structures for representing mixed content, many data-binding tools either don't support mixed content at all, or support it poorly or grudgingly.
- recursive elements: many users report that their tools have trouble with recursion. (It's not obvious why this should be a problem, but it appears to be.)
- choices: these pose a problem for many data-binding tools; in languages with variant record types there is a natural representation, but in other languages it is less clear how a choice should be represented.
- substitution groups: data binding tools often provide poor support for substitution groups; they can be represented using class/subclass relations, but that entails writing classes not only for all types, but also for all elements, which may be undesirable for other reasons.
- UPA (unique particle attribution constraint): some tools for generating schemas fail to check for this and generate non-deterministic content models; other tools feel compelled to accept such illegal schema documents.
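To make the mixed-content difficulty concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is chosen only for illustration: it is a generic XML API, not a data-binding tool, and the sample paragraph is invented. The point is that character data interleaved with child elements has no field of its own. It is carried in the text and tail attributes of neighbouring nodes, and a generated class with one field per child element would have no obvious place to put it.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element: text interleaved with child elements.
para = ET.fromstring("<p>Use <code>xs:any</code> with <em>care</em>.</p>")

def flatten(elem):
    """Return the content of elem as an ordered list of strings and (tag, text) pairs."""
    parts = []
    if elem.text:                 # character data before the first child
        parts.append(elem.text)
    for child in elem:
        parts.append((child.tag, child.text))
        if child.tail:            # character data following this child
            parts.append(child.tail)
    return parts

content = flatten(para)
print(content)
```

The ordered list is what a binding would have to preserve in order to round-trip the document; a class with only code and em fields would lose the three text runs and their positions.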
Schema technical issues
Perhaps some of these should be discussed in this page; others should probably be in separate pages.
- Co-occurrence constraints
- schema vs schema document
- assembling a schema
- schemas and object-orientation
- schemas and the relational model
- schemas and Web Services
- XML Schema 1.0 and document-oriented vocabularies
- sources of variation among schema processors
- the 'unique particle attribution' (UPA) constraint: history, consequences, rationale
- the 'element declarations consistent' (EDC) constraint
- versioning schema-defined languages
- versioning XML Schema itself
- ...
Other schema languages
Other languages that may be used to constrain data in XML or other forms:
- DTDs
- RELAX NG
- Schematron
- SQL Schemas
Some languages appear to be of mostly historical interest now (some of these may belong in the list above, if they are still actively used and developed):
- XML-Data and XML-Data Reduced (XDR)
- SOX (Schema for Object-Oriented XML)
- DCD (Document Content Description for XML)
- DDML (Document Definition Markup Language)
- TREX