This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5732 - Provide a simplified syntax for XSD 1.1
Summary: Provide a simplified syntax for XSD 1.1
Status: NEW
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: Future
Hardware: PC Windows 3.1
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: unclassified
Depends on:
Blocks:
 
Reported: 2008-06-06 14:15 UTC by David Ezell
Modified: 2012-12-04 00:54 UTC


Description David Ezell 2008-06-06 14:15:29 UTC
When XSD 1.0 was approved, the conventional wisdom was that the syntax could be complex, since "tools" would appear to make the job of editing Schema documents easier. While tools have indeed appeared, people have continued to want to edit schema documents by hand (along with other web documents).

Over the years various proposals for a simplified syntax for XSD have appeared. This issue provides a starting point for discussion.
Comment 1 Paolo Marinelli 2008-09-19 23:58:58 UTC
[This message was originally sent by me to the XML Schema Interest Group  (http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Sep/0022.html). But as also suggested during the last telecon, it is better to copy it here in order to make it public. Michael Kay's reply will be copied as an additional comment to this bug (http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Sep/0023.html)]

One of the points of criticism about XSD concerns its syntax. In particular, the normative XML-based syntax of schema documents is often considered too verbose, and it is argued that as such it has a low degree of human-readability. As pointed out by Erik Wilde, development tools provide mechanisms to ease the creation of XSD schema documents. Such mechanisms typically consist of graphical interfaces that hide the underlying syntax and let authors work with visual objects representing XSD components. The problem is that each tool provides its own graphical interface, so schema authors are required to learn a new interface each time they switch to a different tool [1].

A completely different approach is the development of alternative syntaxes for XSD. In the literature there are some proposals going in that direction: XSCS (XML Schema Compact Syntax) [2], DTD++ 2.0 [3] and Extended DTD [4]. While XSCS and DTD++ have been proposed as alternative syntaxes for XML Schema, the aim of Extended DTD is to improve the expressivity of DTD, not to be an alternative notation for XML Schema. For this reason we will not investigate it further here.

Both XSCS and DTD++ are non-XML syntaxes, as the verbosity of the normative XSD syntax is at least in part due to its being XML-based. However, neither XSCS nor DTD++ 2.0 is up to date with version 1.1 of XSD. Indeed, XSCS was designed as a compact syntax for XSD 1.0, while DTD++ 2.0 was designed for SchemaPath (an extension of XSD 1.0 adding constructs for conditional type assignments in element declarations). Consequently, neither XSCS nor DTD++ supports the new features introduced by XSD 1.1, e.g., assertions, type alternatives, open content, versioning, and so on. Thus, as they stand, they cannot be used directly as alternative syntaxes for XSD 1.1.

The aim of this mail is not to propose a new alternative syntax for XML Schema 1.1 but rather to promote a discussion about the requirements that an alternative notation for XSD 1.1 should meet.

So here is a proposed list of requirements, which I wrote after some discussions within our research team here in Bologna, led by Fabio Vitali.

1. The new syntax should be at least partially non-XML. The normative XSD syntax has been designed to grasp every feature of the XSD model. So attempts to find alternative full-XML syntaxes would likely end up with notations very similar to the normative one (unless the new syntax is aimed at covering only a limited subset of the XSD features).

2. The new syntax should be defined in terms of the XSD model. The compact syntax for XSD 1.1 should not be designed merely as a non-XML reformulation of the normative notation, but as a way to reformulate the conceptual model. Nonetheless, the compact syntax should express all the XSD semantics. In order to be considered a reasonable alternative to the normative syntax, a compact syntax for XSD should be able to represent all the features of XSD, either in a completely compact form, or (as happens, for example, with XPath) with a combination of compact and non-compact forms.

3. The new syntax should be as close as possible to DTD. DTDs represented the official syntax for XML schemas for many years (first in SGML, and then in XML) and they are still widely used. Also, when someone wants to write the content model of an XSD complex type by hand on a sheet of paper, they will probably resort to DTD syntax. So we believe that the DTD syntax is widely known in the community of schema authors, and consequently that it is worth designing a notation whose constructs follow, where possible, the conventions adopted by DTDs for similar purposes.

4. The new syntax should be DTD-compatible. Every DTD should be a legal schema in the new syntax. On the one hand, this requirement can be seen as a strengthening of the previous point. But on the other hand, jointly with the second requirement it also has a practical advantage: every DTD can be parsed into an XSD model without the need of intermediate conversion tools.

5. The new syntax should follow a flat structure. XML makes it straightforward to represent nested structures. The XSD normative syntax makes use of this capability in a number of situations, the most obvious of which are anonymous type definitions and local declarations. We say that a notation has a deep structure if it can "natively" represent nested structures. For instance, the RELAX NG compact syntax has a deep structure: each component is delimited by a pair of opening and closing curly brackets. DTDs, on the other hand, do not have a deep structure. Roughly speaking, a DTD is mainly a sequence of element declarations, and it is not possible to nest element declarations within content model definitions. Thus we say that DTD has a flat structure. From requirements 3 and 4 (similarity and compatibility with DTDs), it follows that the new syntax for XSD should adopt a flat structure. But that is not the only reason why we are in favour of a flat structure. Indeed, we believe that a notation designed along the same lines as the RELAX NG compact syntax would amount to a re-encoding of XML without tags.

6. The new syntax may provide an escaping mechanism. The similarity to DTDs is a requirement we place. At the same time, we recognize that representing all the XSD features in a DTD-like notation might end up with an involved syntax. For this reason we believe that it could be useful to provide the possibility to express some XSD features in the normative XML representation. Clearly, the majority of the XSD features should be expressed in compact syntax, and only subtle aspects should require the use of the normative notation. In any case, we believe that the alternative syntax should at least support the following XSD features without resorting to the XML representation:
  a. Element and attribute declarations
  b. Simple and complex type definitions
  c. Attribute and model group definitions
  d. Derivations (in all forms)
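To make requirements 3 and 4 a little more concrete, here is a minimal sketch (not part of any of the cited proposals) of how a flat DTD element declaration might be read directly into XSD-style particles. It assumes a deliberately tiny subset of DTD: a single comma-separated sequence with no nested groups, choices, or #PCDATA. The function name parse_flat_sequence and the tuple layout are hypothetical; the only substantive content is the well-known correspondence between the DTD occurrence indicators ?, * and + and the XSD minOccurs/maxOccurs bounds.

```python
import re

# DTD occurrence indicator -> (minOccurs, maxOccurs); None stands for "unbounded".
OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

NAME = r"[\w.:-]+"  # simplified approximation of an XML name


def parse_flat_sequence(decl):
    """Parse '<!ELEMENT a (b, c?, d*)>' into ('a', [(child, min, max), ...]).

    Only a flat, comma-separated content model is handled; anything else
    (nested groups, choices, #PCDATA, EMPTY, ANY) raises ValueError.
    """
    m = re.fullmatch(rf"\s*<!ELEMENT\s+({NAME})\s+\(([^()]*)\)\s*>\s*", decl)
    if m is None:
        raise ValueError("only flat sequence content models are handled")
    name, body = m.groups()
    particles = []
    for token in body.split(","):
        pm = re.fullmatch(rf"({NAME})([?*+]?)", token.strip())
        if pm is None:
            raise ValueError(f"unsupported particle: {token.strip()!r}")
        child, indicator = pm.groups()
        lo, hi = OCCURRENCE[indicator]
        particles.append((child, lo, hi))
    return name, particles


if __name__ == "__main__":
    print(parse_flat_sequence("<!ELEMENT accounts (owner, account+, note?)>"))
```

Because the DTD declaration is already flat, each declaration maps to one element declaration plus a list of particles, with no intermediate conversion step; richer XSD features would need the extensions discussed in the requirements.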

With those design issues in mind, we are developing a compact syntax for XSD 1.1, based on the work we already did in the context of DTD++ 2.0. Although we don't have an official name yet, here we call the new syntax DTD++ 3.0. The work is still in progress, but we believe DTD++ 3.0 has reached an assessable state.

However we are not interested here in presenting DTD++ 3.0. We think that for the moment the discussion should focus on the requirements listed above.

Regards,
Paolo Marinelli


REFERENCES

[1] Wilde, Erik. A Compact Syntax for W3C XML Schema. XML.com: XML From the Inside Out. August 27, 2003. http://www.xml.com/pub/a/2003/08/27/xscs.html.

[2] A Compact XML Schema Syntax. Wilde, Erik and Stillhard, Kilian. London, UK. 2003. XML Europe 2003.

[3] DTD++ 2.0: adding support for co-constraints. Fiorello, Davide, et al. Montreal, Quebec. 2004. Extreme Markup Languages 2004.

[4] Making DTD a Truly Powerful Schema Language. Wei, Shan and Liu, Mengchi. April 1, 2005, APWeb, pp. 333-338.
Comment 2 Michael Kay 2008-09-20 08:59:02 UTC
I think I would prefer an alternative approach.

Firstly, I would prefer to see an XML syntax. Perhaps with some micro-syntax within the elements and attributes, but within an XML skeleton. I think there are many advantages of this in terms of making schemas readable and writable by software; it also means that we can use the same schema composition mechanism. Also, the experience with XQuery of designing a language that is both unambiguous and extensible is not one I would want to repeat. It should be possible to convert between compact syntax and full syntax using XSLT.

Secondly, I would like to see an XML syntax that is essentially an abbreviated form of the current syntax, with all the abbreviations being optional: so users have full flexibility on how much "compact" and how much "verbose" syntax they use in the same schema document.

One advantage of this approach is that users are never cut off from the full functionality of the language, while at the same time we don't have to guarantee that every possible construct has a compact form.

I believe that it's possible using this approach to come up with a compact syntax that is a lot more usable than the current syntax while still being recognizably XSD.

To take an example, here is an element declaration from the schema for XSLT 2.0:

<xs:element name="analyze-string" substitutionGroup="xsl:instruction">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="xsl:element-only-versioned-element-type">
        <xs:sequence>
          <xs:element ref="xsl:matching-substring" minOccurs="0"/>
          <xs:element ref="xsl:non-matching-substring" minOccurs="0"/>
          <xs:element ref="xsl:fallback" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="select" type="xsl:expression" use="required"/>
        <xs:attribute name="regex" type="xsl:avt" use="required"/>
        <xs:attribute name="flags" type="xsl:avt" default=""/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

I think this could be reduced to:

<xs:element name="analyze-string" substitutionGroup="xsl:instruction"
            extends="xsl:element-only-versioned-element-type">
    <xs:attribute name="select" type="xsl:expression"/>
    <xs:attribute name="regex" type="xsl:avt"/>
    <xs:attribute name="flags" type="xsl:avt" default=""/>
    <xs:element ref="xsl:matching-substring" occurs="?"/>
    <xs:element ref="xsl:non-matching-substring" occurs="?"/>
    <xs:element ref="xsl:fallback" occurs="*"/> 
</xs:element>

I think this is a sufficient level of simplification. A key point is that this is not an alternative syntax, it is the result of applying a number of optional abbreviations: for example allowing xs:sequence to be omitted, xs:extension promoted as an attribute to a containing element, xs:complexContent to be omitted, xs:complexType to be omitted, attributes to appear before elements, etc. I would hope that we can apply these abbreviation rules systematically so that they are easily memorable for someone who knows (or, like me, half-remembers) the current syntax.
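As an illustration only, the abbreviation rules listed above could be expanded mechanically along the following lines. The comment suggests XSLT for the conversion; this sketch uses Python's standard xml.etree.ElementTree instead so that it can be run and tested without an XSLT processor. The function name expand and the OCCURS table are hypothetical. It handles only the abbreviations visible in the example (the extends and occurs attributes, the omitted xs:complexType, xs:complexContent and xs:sequence wrappers, and attributes written before elements). It does not guess at anything the abbreviated form leaves unstated, such as use="required", and it copies only the attributes of each child, not nested content.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical 'occurs' shorthand -> (minOccurs, maxOccurs)
OCCURS = {"?": ("0", "1"), "*": ("0", "unbounded"), "+": ("1", "unbounded")}


def q(local):
    """Return the ElementTree-style qualified name for an xs: element."""
    return f"{{{XS}}}{local}"


def expand(compact):
    """Expand an abbreviated xs:element (with 'extends') into the verbose form."""
    attrs = dict(compact.attrib)
    base = attrs.pop("extends", None)
    if base is None:
        raise ValueError("this sketch only handles the 'extends' abbreviation")
    full = ET.Element(q("element"), attrs)
    ctype = ET.SubElement(full, q("complexType"))
    content = ET.SubElement(ctype, q("complexContent"))
    ext = ET.SubElement(content, q("extension"), {"base": base})
    # The sequence is created first, so attributes appended to the
    # extension later end up after it, as the verbose syntax requires.
    seq = ET.SubElement(ext, q("sequence"))
    for child in compact:
        child_attrs = dict(child.attrib)
        occurs = child_attrs.pop("occurs", None)
        if occurs is not None:
            lo, hi = OCCURS[occurs]
            if lo != "1":  # 1 is the XSD default, so leave it implicit
                child_attrs["minOccurs"] = lo
            if hi != "1":
                child_attrs["maxOccurs"] = hi
        parent = seq if child.tag == q("element") else ext
        ET.SubElement(parent, child.tag, child_attrs)
    return full


if __name__ == "__main__":
    ET.register_namespace("xs", XS)
    src = f"""<xs:element xmlns:xs="{XS}" name="analyze-string"
                 extends="xsl:element-only-versioned-element-type">
      <xs:attribute name="flags" type="xsl:avt" default=""/>
      <xs:element ref="xsl:fallback" occurs="*"/>
    </xs:element>"""
    print(ET.tostring(expand(ET.fromstring(src)), encoding="unicode"))
```

Since each rule is a local rewrite, the reverse direction (verbose to abbreviated) should be equally mechanical wherever the verbose form matches one of these patterns. Note that ElementTree does not track prefixes used inside attribute values such as xsl:avt, so a real converter would have to carry those namespace declarations over itself.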

Of course I haven't tried to work out the detail. I would like to do some things that are not mere syntactic sugar, such as allowing a restriction of a content model to be expressed in terms of the differences from the base type (as is currently allowed for attributes but not for elements).


Michael Kay

Comment 3 Gioele Barabucci 2008-09-22 09:23:43 UTC
Maybe the term "compact syntax" describes the target of the
requirements Paolo Marinelli outlined better than "simplified syntax".

While the role of a simplified syntax is to ease the learning, and maybe
the writing, of XML schemata, a compact syntax aims to shorten the time
needed to write or sketch an XML schema or to modify an existing XML
schema.

These are some use cases I found for a *compact* syntax:

* Sketch on paper: Alice and Bob are asked to write a schema for a
certain class of documents. They sketch the schema using sheets of paper
and a dashboard. They discuss and jot down many different possible
designs.

A compact syntax is required because they are writing code by hand, and
we all know that writing XML by hand on paper is a tedious task.

* Prototyping: Lucas is trying to reverse engineer the schema
of a document. He does so in little steps, using a test-and-fail
approach. This involves rewriting the content model of a certain
element many times before going on to work on the next element.

In this case a compact syntax permits Lucas to write less code while
he goes through all the needed iterations. At the end he will probably
convert his work into a well commented schema, but in the meantime he
saves time by not dealing with the verbosity of XML and the (not so
easy) syntax of XML Schema.

* Extending DTD: Petra uses a 10-year old DTD with many constraints
stated inside the comments of the DTD. Frequently she forgets to satisfy
a constraint that says that the <accounts> element must contain at least
two account numbers, but fewer than ten for legacy reasons. The DTD-based
validator cannot spot this error.

A compact syntax based on DTD makes it possible to add little snippets of XML
Schema into existing DTDs, bringing the functionality of XML Schema
where required (in this case the maxLength facet for a list of numbers)
without the need for a complete rewrite of old DTDs into full-fledged
schemata using the XML syntax of XML Schema.
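To spell out what the length facets would check in Petra's case, here is a small sketch under stated assumptions: the <accounts> element is taken to hold a whitespace-separated list of account numbers, and "at least two, but fewer than ten" is read as a minimum of 2 and a maximum of 9 items. For XSD list types the minLength and maxLength facets count list items, which is what the hypothetical helper list_length_ok imitates. It is not a schema validator.

```python
def list_length_ok(value, min_length=2, max_length=9):
    """Imitate XSD minLength/maxLength on a list type.

    For list types the length is the number of whitespace-separated
    items, not the number of characters.
    """
    items = value.split()
    return min_length <= len(items) <= max_length


if __name__ == "__main__":
    for text in ("1001", "1001 1002", " ".join(str(1000 + i) for i in range(10))):
        print(repr(text), list_length_ok(text))
```

A DTD can only state that <accounts> contains character data, so a check of this kind is exactly what stays hidden in comments until a schema language with facets is brought in.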

As you can see all these use cases are related to manual writing of a
schema. I think there is little interest for a compact non-XML syntax
when it comes to automated generation of schemata.

Finally, a DTD-based syntax makes it possible to embed an XML Schema in an
XML document using the <!DOCTYPE> declaration.


Gioele Barabucci