XML Schema

The W3C XML Schema 1.0 Recommendation defines an XML schema language. Its salient characteristics are:

Unlike the DTD language defined in XML 1.0 (and in ISO 8879, the defining document for SGML), it uses XML syntax rather than a special non-XML syntax. XML Schema documents are thus easier to process using standard XML tools than DTDs are.

It defines a set of simple types or datatypes for use in attribute values and simple element content (i.e. for elements with only character children); these include all the types most commonly found in programming languages and in database management systems, as well as a few others included for historical or other reasons.

It also provides a notion of complex types (for use with elements which may contain child elements); complex types can be constrained by attribute declarations and content models.

It distinguishes systematically between the generic identifiers (element type names, or 'tag names') written in angle brackets in the XML source, on the one hand, and the types assigned to elements, on the other. (This is sometimes referred to as 'the tag/type distinction'.)

It allows for explicit relations between types. New simple types may be derived by restricting existing simple types. New complex types may be restrictions, or extensions, of existing types.

It defines explicitly what information generated as a by-product of validation may be made accessible to downstream applications; since this information is described as a set of augmentations to the input XML information set, the result of schema-validity assessment is described as a post-schema-validation infoset or PSVI.

It provides for wildcards in content models, which can match any element at all, any element in a particular namespace, any element in a namespace other than the target namespace of the schema document, etc. Wildcards may be 'black-box' wildcards (no examination or validation of their contents), 'white-box' wildcards (all contents must be declared and valid), or 'lax' wildcards (if a child element has a declaration, it will be validated; if not, it's not an error).

Instead of treating validity of documents as a simple all-or-nothing Boolean value, it provides discrete validity information for each element and attribute validated.
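Several of the characteristics above can be seen by treating a schema document as ordinary XML. The following is a minimal sketch, not a validator: the schema, its namespace `http://example.org/demo`, and the names `billTo`, `shipTo`, `Address`, and `Zip` are all invented for illustration, and the code only inspects the schema document with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document, invented for this example.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:t="http://example.org/demo"
           targetNamespace="http://example.org/demo"
           elementFormDefault="qualified">
  <xs:element name="billTo" type="t:Address"/>
  <xs:element name="shipTo" type="t:Address"/>
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="zip" type="t:Zip"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="Zip">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

# Because the schema document is XML, a generic XML parser can read it.
root = ET.fromstring(SCHEMA)

# Tag/type distinction: map each declared element name to its type name.
element_types = {
    el.get("name"): el.get("type")
    for el in root.iter(f"{{{XS}}}element")
}

# Wildcards: record the namespace constraint and the processContents mode
# ('skip', 'lax', or 'strict'); the fallbacks are the XML Schema 1.0 defaults.
wildcards = [
    (w.get("namespace", "##any"), w.get("processContents", "strict"))
    for w in root.iter(f"{{{XS}}}any")
]

# Explicit relations between types: simple type name -> base type it restricts.
restrictions = {
    st.get("name"): st.find(f"{{{XS}}}restriction").get("base")
    for st in root.iter(f"{{{XS}}}simpleType")
}

print(element_types)
print(wildcards)
print(restrictions)
```

In this sketch `billTo` and `shipTo` are distinct generic identifiers that share the type `Address`, the wildcard is a 'lax' one limited to namespaces other than the target namespace, and `Zip` is derived by restriction from `xs:string`. In schema-document syntax, the 'black-box' and 'white-box' wildcards correspond to `processContents="skip"` and `processContents="strict"` respectively.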

Note: 'XML Schema' is the name of the language defined by the W3C Recommendation. 'XML schema' is a common noun phrase in English denoting a schema (in whatever formalism) for an XML vocabulary. To avoid confusion between the two, some people prefer the names 'XSD' or 'WXS' (W3C XML Schema) for the language defined in the Recommendation, or use the full name 'W3C XML Schema' whenever confusion might otherwise arise.

What follows is a sketch of a possible skeleton set of topics related to XML Schema, to help encourage the development of a useful wiki on the subject.

All of these pages need to be drafted. You can help!

XML Schema software

Different kinds of software may be 'schema-aware'. It would be useful to have separate wiki pages with discussions of each type and pointers to specific software of the type.

Among the most obvious classes of schema-aware software are:

schema-based validators
schema-aware XML editors and editing tools
data binding tools (for marshalling and de-marshalling between XML and programming-language data structures)

form generators
schema-writing and maintenance tools
schema conversion tools
tools for exploring or displaying information about schemas
schema analyzers
schema-aware XSLT and XQuery engines
general toolkits

For more detail, see the page on XML Schema software.

Interoperability issues

Any schema-aware software is likely to be aimed at a particular application type or application domain; language features that don't match up neatly with the assumptions of that domain may be omitted or neglected. Features that are often unsupported, or supported less conveniently than others, include:

mixed content: since most object-oriented languages lack any good structures for representing mixed content, many data-binding tools either don't support mixed content at all, or support it poorly or grudgingly.

recursive elements: many users report that their tools have trouble with recursion. (It's not obvious why this should be a problem, but it appears to be.)

choices: these pose a problem for many data-binding tools; in languages with variant record types, there is a natural representation, but in others?

substitution groups: data binding tools often provide poor support for substitution groups; they can be represented using class/subclass relations, but that entails writing classes not only for all types, but also for all elements, which may be undesirable for other reasons.

UPA (unique particle attribution constraint): some tools for generating schemas fail to check for this and generate non-deterministic content models; other tools feel compelled to accept such illegal schema documents.
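Two of the data-binding difficulties listed above, mixed content and choices, can be made concrete with a short sketch. Everything in it is hypothetical: the `p`, `payment`, `creditCard`, and `bankTransfer` elements, the `CreditCard` and `BankTransfer` classes, and the `flatten` and `bind_payment` helpers are invented for illustration and are not taken from any particular data-binding tool. Python has no variant record type as such, so a `Union` of dataclasses stands in for one.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Union

# --- Mixed content -------------------------------------------------------
# Text and child elements are interleaved, so a plain "one field per child"
# class cannot hold the content without losing the order of the pieces.
para = ET.fromstring("<p>Send it <b>today</b>, not <i>later</i>.</p>")

def flatten(elem):
    """Return elem's content as an ordered list of text runs and child elements."""
    items = []
    if elem.text:
        items.append(elem.text)
    for child in elem:
        items.append(child)
        if child.tail:  # text that follows the child, inside the parent
            items.append(child.tail)
    return items

mixed = [x if isinstance(x, str) else f"<{x.tag}>" for x in flatten(para)]
print(mixed)

# --- Choice ----------------------------------------------------------------
# A content model offering exactly one of two alternatives, bound to a
# tagged-union style type.
@dataclass
class CreditCard:
    number: str

@dataclass
class BankTransfer:
    iban: str

Payment = Union[CreditCard, BankTransfer]

def bind_payment(elem) -> Payment:
    """Bind the single child of a hypothetical <payment> element to one variant."""
    child = elem[0]
    if child.tag == "creditCard":
        return CreditCard(child.text)
    if child.tag == "bankTransfer":
        return BankTransfer(child.text)
    raise ValueError(f"unexpected alternative: {child.tag}")

payment = bind_payment(
    ET.fromstring("<payment><bankTransfer>DE00</bankTransfer></payment>")
)
print(payment)
```

The mixed-content half shows why an ordered list of heterogeneous items is usually needed; the choice half shows the kind of representation that is natural where variant types exist and has to be emulated (for example with nullable fields or a class hierarchy) where they do not.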

Schema technical issues

Perhaps some of these should be discussed in this page; others should probably be in separate pages.

co-occurrence constraints
schema vs schema document
assembling a schema
schemas and object-orientation
schemas and the relational model
schemas and Web Services
XML Schema 1.0 and document-oriented vocabularies
sources of variation among schema processors
the 'unique particle attribution' (UPA) constraint: history, consequences, rationale

the 'element declarations consistent' (EDC) constraint
versioning schema-defined languages
versioning XML Schema itself
...

Other schema languages

Other languages that may be used to constrain data in XML or other forms:

DTDs
RELAX NG
Schematron
SQL Schemas

Some languages appear to be of mostly historical interest now (some of these may belong in the list above, if they are still actively used and developed):

XML-Data and XML-Data Reduced (XDR)
SOX (Schema for Object-Oriented XML)
DCD (Document Content Description for XML)
DDML (Document Definition Markup Language)
TREX


Resources: