Web Markup Language (WML)

This is for discussion purposes only, and has been produced on a purely personal basis.

This document defines a lightweight subset of XML that is much simpler to implement than XML 1.0. It is designed to meet the needs of applications that require the basic features of tags and attributes, without frills such as document type declarations, marked sections, entity declarations, comments or processing instructions.

WML assumes a layering model. The network transport or file storage may use whatever conventions are appropriate for line breaks, tab size and character encoding (e.g. US-ASCII, UTF-8 or ISO-2022-JP), but these are hidden from the WML parser, which formally operates on a simple stream of Unicode characters in which line breaks are represented by newline characters. Tabs, carriage returns and other control characters never appear in this character stream.
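The layer below the parser could be sketched as follows. This is a minimal illustration, not part of WML itself: the function name to_wml_stream, the default tab size of 8 and the choice to silently drop stray control characters are all assumptions. It keeps newline and escape because the Char production admits them, and treats 128 to 159 as an inclusive range.

```python
def to_wml_stream(data: bytes, encoding: str = "utf-8", tab_size: int = 8) -> str:
    """Hypothetical transport layer: turn stored bytes into the character
    stream a WML parser sees (Unicode, newline-only line breaks, no tabs)."""
    text = data.decode(encoding)
    # Normalise CR LF and lone CR line breaks to a single newline.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Expand tabs to spaces; the tab size is a local convention (assumed 8).
    text = text.expandtabs(tab_size)
    # Drop remaining control characters (below 32, or 128 to 159 inclusive),
    # keeping newline and escape, which the Char production allows.
    return "".join(
        ch for ch in text
        if ch in "\n\x1b" or not (ord(ch) < 32 or 128 <= ord(ch) <= 159)
    )
```

For example, a Latin-1 file would be read with to_wml_stream(raw, "iso-8859-1"); the parser never needs to know which encoding was used.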

Many people continue to edit markup with text editors, so an important goal is to keep this easy, for instance by allowing attributes to be given on successive lines. Tag and attribute names are always in lower case, which removes a source of confusion.
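As an illustration of attributes on successive lines, the hypothetical helper below writes an empty tag with one attribute per line. It relies on the Attribute production in the grammar, where the whitespace before each attribute may include a newline. The helper names, the two-space indent and the choice to reject (rather than lower-case) invalid names are assumptions made for this sketch.

```python
import re

# Lower-case names as allowed by the Name production.
NAME = re.compile(r"[a-z][a-z0-9\-:.]*")

def escape(value: str) -> str:
    # Replace the four characters that may not appear raw in Datachar.
    return (value.replace("&", "&amp;").replace("<", "&lt;")
                 .replace(">", "&gt;").replace('"', "&quot;"))

def empty_tag(name: str, attrs: dict[str, str], indent: str = "  ") -> str:
    """Hypothetical helper: write an EmptyTag with one attribute per line."""
    for n in [name, *attrs]:
        if not NAME.fullmatch(n):
            raise ValueError(f"not a WML name: {n!r}")
    lines = [f"<{name}"]
    lines += [f'{indent}{key}="{escape(val)}"' for key, val in attrs.items()]
    return "\n".join(lines) + "/>"
```

Calling empty_tag("img", {"src": "a.png", "alt": "logo"}) produces three lines: "<img", then each attribute on its own indented line, with "/>" immediately after the last closing quote.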

A WML document conforms to the following grammar:

WML         ::= element

element     ::= EmptyTag | StartTag (Datachar | element)+ EndTag

EmptyTag    ::= '<'  Name Attribute* '/>'
StartTag    ::= '<'  Name Attribute* '>'
EndTag      ::= '</' Name '>'

Attribute   ::= White+ Name  White* '=' White* '"' Datachar* '"'

Datachar    ::= '&amp;' | '&lt;' | '&gt;' | '&quot;' | '&#' Hex+ ';'
                 | (Char excluding ('&' | '<' | '>' | '"'))

Name        ::= Letter (Letter|Digit|'-'|':'|'.')*

Letter      ::= 'a' to 'z'
Digit       ::= '0' to '9'

Hex         ::= Digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f'

White       ::= ' ' | newline

Char        ::= Unicode characters including escape and newline,
                 but otherwise excluding the control characters
                 below 32 and from 128 to 159.
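The grammar is small enough to recognise with a hand-written recursive-descent parser. The sketch below is one way to do it, not a reference implementation: the names Parser, parse and WMLError are hypothetical, and it additionally requires each end tag to match its start tag, a constraint the grammar above does not state explicitly but that XML imposes. It follows the grammar strictly, so whitespace before '>' or '/>', or around the root element, is rejected. Elements are returned as (name, attributes, children) tuples.

```python
import re

class WMLError(ValueError):
    pass

NAME = re.compile(r"[a-z][a-z0-9\-:.]*")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}
REF = re.compile(r"&(?:(amp|lt|gt|quot)|#([0-9a-f]+));")

def is_char(ch):
    # Char: newline and escape allowed; other controls (<32, 128..159) excluded.
    o = ord(ch)
    return ch in "\n\x1b" or not (o < 32 or 128 <= o <= 159)

class Parser:
    def __init__(self, text):
        self.s, self.i = text, 0

    def error(self, msg):
        raise WMLError(f"{msg} at offset {self.i}")

    def expect(self, lit):
        if not self.s.startswith(lit, self.i):
            self.error(f"expected {lit!r}")
        self.i += len(lit)

    def white(self):
        # White*: skip spaces and newlines, returning how many were skipped.
        start = self.i
        while self.i < len(self.s) and self.s[self.i] in " \n":
            self.i += 1
        return self.i - start

    def name(self):
        m = NAME.match(self.s, self.i)
        if not m:
            self.error("expected a name")
        self.i = m.end()
        return m.group()

    def datachars(self, stop):
        # Datachar*: decode references, stop before the character `stop`.
        out = []
        while self.i < len(self.s) and self.s[self.i] != stop:
            ch = self.s[self.i]
            if ch == "&":
                m = REF.match(self.s, self.i)
                if not m:
                    self.error("bad character reference")
                out.append(NAMED[m.group(1)] if m.group(1)
                           else chr(int(m.group(2), 16)))
                self.i = m.end()
            elif ch in '<>"' or not is_char(ch):
                self.error(f"illegal character {ch!r}")
            else:
                out.append(ch)
                self.i += 1
        return "".join(out)

    def element(self):
        self.expect("<")
        tag = self.name()
        attrs = {}
        while self.white():  # Attribute ::= White+ Name White* '=' ...
            key = self.name()
            self.white()
            self.expect("=")
            self.white()
            self.expect('"')
            attrs[key] = self.datachars('"')
            self.expect('"')
        if self.s.startswith("/>", self.i):
            self.i += 2
            return (tag, attrs, [])
        self.expect(">")
        children = []
        while not self.s.startswith("</", self.i):
            if self.i >= len(self.s):
                self.error("unexpected end of input")
            if self.s[self.i] == "<":
                children.append(self.element())
            else:
                children.append(self.datachars("<"))
        if not children:  # (Datachar | element)+ needs at least one item
            self.error("empty content; use an EmptyTag instead")
        self.expect("</")
        if self.name() != tag:
            self.error(f"end tag does not match <{tag}>")
        self.expect(">")
        return (tag, attrs, children)

def parse(text):
    p = Parser(text)
    root = p.element()  # WML ::= element
    if p.i != len(text):
        p.error("trailing content after the root element")
    return root
```

For instance, parse('<p class="x">Caf&#e9; &amp; <br/>tea</p>') yields ('p', {'class': 'x'}, ['Café & ', ('br', {}, []), 'tea']).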

Numeric character references such as &#c9; (upper-case E with acute accent, É) can be used to represent characters that cannot be represented directly. Whitespace handling is left to applications, but to allow for pretty printing with nested indents, it is suggested that applications be liberal in the way they handle whitespace.
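A writer targeting a limited encoding might fall back to numeric references for characters the encoding cannot hold, using lower-case hex with no 'x', as in &#c9; above. The sketch below assumes ASCII as the default target; the function names are hypothetical. The whitespace function shows just one possible liberal policy (collapsing runs of spaces and newlines and trimming the ends); WML itself does not mandate it.

```python
import re

# Escapes for the four characters that may not appear raw in Datachar.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def encode_text(text: str, encoding: str = "ascii") -> str:
    """Hypothetical writer: escape specials and use &#hex; for characters
    the target encoding cannot represent."""
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(NAMED[ch])
            continue
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # WML references use bare lower-case hex after '&#' (no 'x').
            out.append(f"&#{ord(ch):x};")
    return "".join(out)

def collapse_whitespace(text: str) -> str:
    # One liberal policy: any run of WML White (space or newline) becomes
    # a single space, and leading/trailing White is dropped.
    return re.sub(r"[ \n]+", " ", text).strip(" ")
```

With these, encode_text("Élan & co") gives "&#c9;lan &amp; co", and indented content such as "\n  Hello\n   world \n" reads back as "Hello world".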

The name WML was first conceived by James Clark for his canonical XML format. The above grammar evolved from James' work when subjected to the pressures imposed by layering and ease of authoring.

Dave Raggett 30th January 1998.