XML: What's in it for your website?

Steven Pemberton, W3C and CWI, Amsterdam

Chair W3C HTML Working Group

Co-chair W3C Forms Working Group

If we've got HTML, why do we need XML?

Most websites are a terrible mess! Only a tiny percentage validate as correct HTML (try your own site at validator.w3.org).

One of the reasons web designers need to test their sites on different browsers is that browsers 'correct' bad HTML in different ways.

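XML ensures that markup is better than it used to be, because an XML parser must reject a document that is not well-formed rather than guess at a repair. Well-formed is not the same as valid, though, so you still need to validate.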
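To make the point concrete, here is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser. The helper name is_well_formed is illustrative only. It checks well-formedness and nothing more; checking a document against a DTD or schema still needs a real validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An unclosed <br> is tolerated by HTML browsers but rejected by an XML parser.
html_style = "<p>I think that I shall never see<br>A poem lovely as a tree</p>"
xml_style = "<p>I think that I shall never see<br/>A poem lovely as a tree</p>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

Because the parser refuses the first string outright, there is no browser-specific 'correction' for tools to disagree about.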

Writing tools for websites is hell!

Future tools will only work with XML, and will start to create a compelling reason to use XML rather than HTML

If we've got XML, why do we need XHTML?

Indeed.

We asked the same question of a selection of industry representatives:

XHTML

XHTML is a family of XML-based markup languages.

Currently:

And soon:

XHTML 2

We are currently working on the new member of the XHTML family, in our minds the real XHTML.

Our aims are:

In fact as I will show, many of these things are intertwined.

Generic XML

By 'generic XML' we mean: if a facility exists in XML technologies, and it is suitable, use it and not a special-purpose XHTML facility. Try to get missing functionality added to XML.

Examples:

Major missing functionality: Linking (XLink insufficient for XHTML's needs).

Advantages: less variability; more interoperability; much of XHTML 2 works already; opportunity to make a cleaner break.

Less presentation, more structure

Remove all presentation-only markup.

Use stylesheets to define presentation.

Advantages: possible to author once, and display on different devices; better presentation possibilities; device presentation not hardwired; CSS has support for devices; more accessibility.

Power of CSS currently seriously underappreciated.

(Note: doesn't require CSS to be implemented; just uses its model)

Separating Content and Presentation: Author Advantages

Separating Content and Presentation: Webmaster Advantages

Separating Content and Presentation: Reader Advantages

Separating Content and Presentation: Implementor Advantages

More structure

Add more semantically-oriented markup to make documents richer.

Examples: <line> element instead of <br>; <section> and <h> elements instead of <h1>, <h2>, etc.

Not

I think that I shall never see<br>
A poem lovely as a tree

but

<line>I think that I shall never see</line>
<line>A poem lovely as a tree</line>

Advantages: more presentational opportunities (folding, marquee, numbering)
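As a rough illustration of the numbering case, the sketch below (Python, standard library only) numbers the lines of the poem from the markup alone. The enclosing <poem> element and the function name numbered_lines are assumptions made for the example; they are not part of XHTML 2.

```python
import xml.etree.ElementTree as ET

# The <poem> wrapper is hypothetical; it only gives the fragment a single root.
POEM = """<poem>
<line>I think that I shall never see</line>
<line>A poem lovely as a tree</line>
</poem>"""


def numbered_lines(markup: str) -> list:
    """Return the text of each <line>, prefixed with its 1-based position."""
    root = ET.fromstring(markup)
    return [f"{n}. {line.text}" for n, line in enumerate(root.iter("line"), start=1)]


for text in numbered_lines(POEM):
    print(text)
```

With <br> there is no element that contains a line, so the same numbering would have to be reconstructed by splitting text around the breaks.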

More usability

As an example of poor usability, current frames are a disaster!

Usability of frames [continued]

Currently devising XFrames, a replacement for Frames.

Advantages: all of the above.

More accessibility

This largely falls out automatically from other aims.

For instance, <h1>, <h2> etc. are mostly terrible for accessibility, because hardly anyone uses them correctly, and it is hard to work out document structure from so little information.

Structure/Accessibility example

<h2>Chapter 1</h2>
...
<h3>Section 1</h3>
...

is now:

<section>
<h>Chapter 1</h>
...
   <section>
   <h>Section 1</h>
   ...
   </section>
</section>

More structure gives more accessibility. So does device independence.
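One way to see what the nesting buys: a tool can derive the outline directly from the markup, with no need to trust heading numbers. The following is a sketch only, in Python with the standard library. The <body> wrapper, the absence of a namespace, and the function name outline are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET

# The <body> wrapper is an assumption; it only gives the fragment a single root.
DOC = """<body>
<section>
<h>Chapter 1</h>
   <section>
   <h>Section 1</h>
   </section>
</section>
</body>"""


def outline(element, depth=0):
    """Yield (depth, heading text) pairs; depth counts the enclosing sections."""
    for child in element:
        if child.tag == "section":
            yield from outline(child, depth + 1)
        elif child.tag == "h":
            yield depth, child.text


for depth, title in outline(ET.fromstring(DOC)):
    print("  " * (depth - 1) + title)
```

The heading level is never written down by the author, so it cannot be written down wrongly.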

Internationalization

Less scripting

Observe how scripting is currently used.

Identify missing markup/functionality.

Add it where possible; aim to cover the 80% mark.

Examples: menus for navigation; forms data checking; folding presentation.

Advantages: more devices, more presentational variations, better search

More device independence

Less scripting

No hard-wired presentation

Events

New Forms

Events

Current HTML events are a disaster

Problems include:

<a href="..." onclick="location.href='http://www.htmlhelp.com/cgi-bin/validate.cgi?url=' +  document.location.href;">

XML Events

XML markup binding to DOM2 Events

Extensible for new event types

'Abstract' events can replace the old device-dependent ones (e.g. 'activate' instead of 'click')

Independent of scripting language

Can entwine event markup in document, or can separate it out

Advantages: more types of events, other types of scripting (e.g. declarative)

<a onclick="...javascript..." ...>

becomes

<a ev:event="activate" ev:handler="#myhandler" ...>
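Because the binding is ordinary markup rather than an embedded script, any tool can read it without understanding a scripting language. A minimal sketch in Python (standard library only) that lists which elements listen for a given event. The sample document, its href values, and the function name listeners are assumptions for illustration; the namespace URI is the one defined for XML Events.

```python
import xml.etree.ElementTree as ET

EV = "http://www.w3.org/2001/xml-events"  # XML Events namespace

# Hypothetical sample document; only the ev: attributes come from the slide.
DOC = f"""<body xmlns:ev="{EV}">
<a ev:event="activate" ev:handler="#myhandler" href="next.html">Next</a>
<a href="plain.html">No listener</a>
</body>"""


def listeners(markup: str, event_type: str) -> list:
    """Return (element tag, handler URI) for each element observing event_type."""
    root = ET.fromstring(markup)
    found = []
    for element in root.iter():
        if element.get(f"{{{EV}}}event") == event_type:
            found.append((element.tag, element.get(f"{{{EV}}}handler")))
    return found


print(listeners(DOC, "activate"))
```

Nothing in the sketch depends on the handler being JavaScript; the handler is only referenced by URI.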

XForms

HTML forms have proven their worth.

XForms improves this:

Essentially defines two separate parts: the 'real' form (data, data types and submission details), and form controls bound to the data.

A language may define its own set of form controls.

XForms 'native' form controls are device-independent, and accessible.

Example Form Control

<input ref="order/shipTo/street">
  <label>Street</label>
  <hint>Please enter the number and street name</hint>
</input>

A user agent has a default presentation.

If the user agent supports it, a stylesheet can be used to define other presentations.

Another example control

<select ref="icecream/flavors">
  <label>Flavours</label>
  <item><label>Vanilla</label><value>v</value></item>
  <item><label>Strawberry</label><value>s</value></item>
  <item><label>Chocolate</label><value>c</value></item>
</select>

The same markup covers both radio-button-style selection and menu selection: the choice of presentation is not encoded in the control.

The instance data is just XML

<xforms:instance>
    <payment>
      <method/>
      <number/>
      <expiry/>
    </payment>
</xforms:instance>

and later:

<select1 ref="method">
  <label>Select Payment Method</label>
  <item><label>Cash</label><value>cash</value></item>
  <item><label>Credit</label><value>cc</value></item>
</select1>
<input ref="number"><label>Credit Card Number</label>
</input>
<input ref="expiry"><label>Expiration Date</label>
</input>
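The binding between controls and instance data can be sketched outside any XForms processor. The Python below (standard library only) treats each ref as a simple path relative to the instance root and writes a value into the node it selects. This is an assumption-laden illustration, not how an XForms implementation works internally: the function name fill_instance and the entered values are hypothetical, and ElementTree supports only a small subset of XPath.

```python
import xml.etree.ElementTree as ET

INSTANCE = "<payment><method/><number/><expiry/></payment>"

# Stand-in for what a user might enter, keyed by each control's ref (hypothetical data).
ENTERED = {"method": "cc", "number": "0000 0000 0000 0000", "expiry": "2030-01"}


def fill_instance(instance_xml: str, entered: dict) -> str:
    """Write each entered value into the instance node that its ref selects."""
    payment = ET.fromstring(instance_xml)
    for ref, value in entered.items():
        node = payment.find(ref)  # ref is resolved relative to the instance root
        if node is None:
            raise KeyError(f"ref {ref!r} matches no instance node")
        node.text = value
    return ET.tostring(payment, encoding="unicode")


print(fill_instance(INSTANCE, ENTERED))
```

The result is itself a plain XML document, which is what gets submitted; the controls and their presentation play no part in it.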

The instance data is also remotely retrievable

<instance src="http://www.example.com/instance"/>

which means you can edit an XHTML page with an XHTML page!

An advantage of using XML!

What we are trying to achieve

More usability

Division of content and presentation

More accessibility

More thought put into websites

But can we achieve this?

The Kiss of the Spiderbot

"Google is, for all intents, a blind user. A billionaire blind user with tens of millions of friends, all of whom hang on his every word. I suspect Google will have a stronger impact than [laws] in building accessible websites."

...

"In a world where Google likely has a valuation several orders of magnitude higher than any chrome such as flash, graphics, audio, interactivity, or "personalization", I see a heady revision."

Karsten M. Self

Arachnophilia

Things to avoid:

More information

About correct use of standards: www.webstandards.org

About designing websites using standards, structure, stylesheets, etc.: www.alistapart.com

About XHTML: www.w3.org/Markup

Members of W3C can also look at www.w3.org/Markup/Group.

To get involved, join W3C!

Conclusions

HTML was originally designed as a structure description language, not a presentational language.

The design of XHTML is truly 'radical': taking HTML back to its roots.

Device independence and accessibility are surprisingly closely related, as are accessibility and usability.

Using XML through XHTML promises future advantages: start planning now.

Even though website builders may not yet know it, XML, device independence, accessibility and usability have a major economic argument in their favour. Spread the word!