W3C Architecture Domain

XML

Activity statements provide a managerial overview of W3C's work in each area, covering an introduction to the activity, the goals of W3C work, the accomplishments to date, and future plans. They are designed to be read from beginning to end and to be informative and interesting. The introductory section sets the scene and explains any technical concepts used in later sections; where necessary, the explanation is expanded into a short tutorial. The remaining sections describe the role of W3C, the benefits to the Web community, the accomplishments to date, and what the future holds.

Work on XML is being managed as part of W3C's Architecture domain.

Introduction

XML - the eXtensible Markup Language - is a simple and very flexible language based on SGML. Although originally designed to meet the challenges of large-scale publishing, XML is now playing an increasingly important role in the markup of data on the Web.

Goals of W3C work on XML

To provide a lightweight, flexible markup language that is well suited to the needs of Web applications for marking up documents and data. The following sections explain why XML is needed and how it works.

What is XML?

Many Web users are already familiar with HTML, which uses a number of codes or "tags" to mark up the components of a document. When the document arrives at your desk, the browser uses these codes to decide which parts of the document are which. Common tags include P for "paragraph", H1 for "heading 1" and UL for "unordered list".

Using codes like this to indicate the functional parts of documents is the basis of SGML, the Standard Generalized Markup Language, which is an ISO standard (ISO 8879:1986) for marking up documents. HTML is an application of SGML, and browsers support only the part of SGML that HTML needs: the full SGML specification would be very expensive to implement and goes beyond the needs of the great majority of Web users.

XML has many similarities to HTML, but constitutes a more flexible way of marking up data. Like HTML, XML is based on SGML. The difference between the two, very simply put, is that XML merely describes a syntax for markup: the names of the tags are not set in concrete and authors can "invent" them as appropriate. Whereas HTML provides a fixed repertoire of named tags like P, H2, UL and so on, XML simply spells out the rules for using angle brackets and other notation to specify a markup language of your own design. The tag names and what they actually mean are up to you and your browser!

Another way of looking at XML is that it is not a markup language in the strict sense of the word: it is more a meta-language to let you design your own markup languages for a multitude of functions. Indeed it is this aspect of XML which has attracted so much interest.
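The idea of inventing your own tags can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The vocabulary below (recipe, title, ingredients, item and the serves attribute) is purely hypothetical and chosen only for illustration; the point is that a generic XML parser can read it back without knowing the names in advance.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a recipe; none of these names come from a standard.
recipe = ET.Element("recipe", {"serves": "2"})
ET.SubElement(recipe, "title").text = "Toast"
ingredients = ET.SubElement(recipe, "ingredients")
ET.SubElement(ingredients, "item").text = "bread"
ET.SubElement(ingredients, "item").text = "butter"

# Serialise the tree to XML text.
text = ET.tostring(recipe, encoding="unicode")
print(text)

# A generic parser reads the invented vocabulary back with no prior knowledge of it.
parsed = ET.fromstring(text)
items = [item.text for item in parsed.iter("item")]
print(items)
```

What the names mean is still up to the application that consumes the document; the parser only guarantees that the structure is recovered.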

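There are also many subtle differences between the syntax of HTML and XML which make the latter much easier to process by computer. For example, XML requires every start tag to have a matching end tag (or to be written as an empty-element tag), requires attribute values to be quoted, and treats tag names as case-sensitive.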
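One such difference is that XML requires every element to be closed explicitly, whereas HTML lets some end tags be omitted. The following is a minimal sketch in Python, using the standard xml.etree.ElementTree parser; the helper name is_well_formed and the two sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML permits the </li> end tags to be left out; XML does not.
html_style = "<ul><li>first<li>second</ul>"
# Every element is explicitly closed, so this is well-formed XML.
xml_style = "<ul><li>first</li><li>second</li></ul>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

Because the rule is strict, a program never has to guess where an element ends, which is part of what makes XML simpler to process.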

A simple example of XML

    <customer-details>
        <name>
            Acme Pharmaceuticals Co.
        </name>
        <address>
            <street/>7301 Smokey Boulevard
            <city/>Smallville
            <state/>Indiana
            <zip-code/>94571
        </address>
    </customer-details>

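Elements are either containers with matching start and end tags, for instance customer-details, name and address, or they are "empty" and written as a single tag ending in "/>", such as street, city, state and zip-code. In this example the empty elements act as markers, and the text after each one belongs to the enclosing address element. Because the tag itself shows whether an element is empty, a program can parse the document without any further information about the tag names. This makes XML very easy to process.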
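As a rough illustration of how little code is needed to process the example, here is a sketch in Python using the standard xml.etree.ElementTree module. The variable names are assumptions; note that ElementTree exposes the text following an empty element as that element's "tail".

```python
import xml.etree.ElementTree as ET

DOC = """\
<customer-details>
    <name>
        Acme Pharmaceuticals Co.
    </name>
    <address>
        <street/>7301 Smokey Boulevard
        <city/>Smallville
        <state/>Indiana
        <zip-code/>94571
    </address>
</customer-details>
"""

root = ET.fromstring(DOC)

# A container element holds its text between the start and end tags.
name = root.find("name").text.strip()

# An empty element has no content of its own; the text that follows each
# marker inside <address> is available as the marker's tail.
address = {child.tag: child.tail.strip() for child in root.find("address")}

print(name)
print(address)
```

The same few lines would work for any other well-formed document with this shape, whatever the tag names happen to be.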

What will XML be used for?

XML is an ISO-compliant subset of SGML and, following the philosophy of SGML, clearly separates syntax from other processing behaviours. The formatting of data using XML is therefore quite separate from the programs that process it. XML is a low-level language on top of which other languages, and finally applications, can be built. Let's consider what these applications may be.

The role of W3C in this area

W3C's work on XML has aimed to specify the language and how it is processed by applications. Both aspects are needed to ensure that different applications handle XML in a uniform way.

The XML working group has been involved in:

What the future holds

Work is currently underway or planned on the following:

With the focus on XML shifting from publishing to the representation of data, W3C is considering ways to enlarge the community from which we solicit requirements. For instance, what special needs apply when using XML on mobile or embedded devices, for which memory and processing power are at a premium?

Further Information

A wealth of information relating to the XML Activity can be found on the XML home page. There are links to drafts and specifications, to events and publications, and to XML Working Groups and Discussion Forums.