Initial Status

From MicroXML Community Group

The following grammar (using the same syntax as the XML 1.0 Recommendation) describes an ultra-minimal subset of XML, intended to include only those features that everybody can easily agree belong in MicroXML.

# Documents
document ::= s* element s*
# Elements
element ::= startTag content endTag
content ::= (element | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
endTag ::= '</' name s* '>'
# Attributes
attribute ::= name s* '=' s* attributeValue
attributeValue ::= '"' ((dataChar - '"') | charRef)* '"'
                   | "'" ((dataChar - "'") | charRef)* "'"
# Data characters
dataChar ::= char - ('<' | '&' | '>')
# Character references
charRef ::= hexCharRef | namedCharRef
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
# Names
name ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_"
nameChar ::= nameStartChar | [0-9] | "-" | "."
# White space
s ::= #x9 | #xA | #xD | #x20
# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]
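
The lexical productions above (names, white space, characters and character references) are small enough to check directly. The following is a minimal Python sketch of them, not part of the grammar itself; the function and constant names (`is_name`, `is_char`, `is_data_char`, `is_char_ref`, `NAME_RE`, `CHAR_REF_RE`) are illustrative assumptions.

```python
import re

# name ::= nameStartChar nameChar*  (ASCII letters, digits, "_", "-", "." only)
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

# charRef ::= hexCharRef | namedCharRef  (the grammar has no decimal form)
CHAR_REF_RE = re.compile(r"&(?:#x[0-9a-fA-F]+|amp|lt|gt|quot|apos);")

WHITE_SPACE = "\t\n\r "  # s ::= #x9 | #xA | #xD | #x20


def is_name(text: str) -> bool:
    """True if the whole string matches the `name` production."""
    return NAME_RE.fullmatch(text) is not None


def is_char_ref(text: str) -> bool:
    """True if the whole string matches the `charRef` production."""
    return CHAR_REF_RE.fullmatch(text) is not None


def is_char(ch: str) -> bool:
    """char ::= s | ([#x21-#x10FFFF] - forbiddenChar)"""
    if len(ch) != 1:
        return False
    if ch in WHITE_SPACE:
        return True
    cp = ord(ch)
    if cp < 0x21:
        return False  # remaining C0 controls fall outside [#x21-#x10FFFF]
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogateChar
    return cp not in (0xFFFE, 0xFFFF)


def is_data_char(ch: str) -> bool:
    """dataChar ::= char - ('<' | '&' | '>')"""
    return is_char(ch) and ch not in "<&>"
```

Note that a colon is not a `nameChar`, so `is_name("x:y")` is false, and that `hexCharRef` as written places no restriction on which code point is referenced.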

For possible features that could be added, see the issues list.

The following is a simple data model, designed to be very close to JsonML. It is defined as a grammar over a particular kind of tree. These trees have one atomic type, the character (equivalent to a Unicode code point), and two composite types, arrays and maps. In the following, [...] denotes an array and {...} denotes a map:

document ::= element
element ::= [name, attributes, content]
attributes ::= { (name => attributeValue)* }
attributeValue ::= [ char* ]
content ::= [ (char | element)* ]
name ::= [ nameStartChar, nameChar* ]
char, nameStartChar, nameChar ::= <single character as in the grammar for the syntax>
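
To make the mapping from syntax to data model concrete, here is a minimal recursive-descent sketch in Python that parses a string according to the grammar and returns the corresponding tree. It rests on several assumptions not stated above: names and attribute values are represented as Python strings rather than arrays of characters (so that names can serve as map keys), the end tag name must equal the start tag name, duplicate attribute names are rejected, and hex character references above #x10FFFF are rejected. All function names are hypothetical.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")
SPACE = re.compile(r"[\t\n\r ]*")
CHAR_REF = re.compile(r"&(?:#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


class ParseError(ValueError):
    """Raised when the input does not match the grammar."""


def is_data_char(ch):
    # dataChar ::= char - ('<' | '&' | '>')
    cp = ord(ch)
    if ch in "<&>" or (cp < 0x21 and ch not in "\t\n\r "):
        return False
    return not (0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF))


def skip_space(text, pos):
    return SPACE.match(text, pos).end()


def parse_name(text, pos):
    m = NAME.match(text, pos)
    if m is None:
        raise ParseError(f"expected a name at offset {pos}")
    return m.group(), m.end()


def parse_char_ref(text, pos):
    m = CHAR_REF.match(text, pos)
    if m is None:
        raise ParseError(f"bad character reference at offset {pos}")
    if m.group(2):
        return NAMED[m.group(2)], m.end()
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:  # assumption: the grammar itself sets no upper bound
        raise ParseError(f"code point out of range at offset {pos}")
    return chr(cp), m.end()


def parse_chars(text, pos, stop):
    # Collect (dataChar | charRef)* up to `stop` or the first other character.
    out = []
    while pos < len(text) and text[pos] != stop:
        if text[pos] == "&":
            ch, pos = parse_char_ref(text, pos)
        elif is_data_char(text[pos]):
            ch, pos = text[pos], pos + 1
        else:
            break
        out.append(ch)
    return out, pos


def parse_element(text, pos):
    if not text.startswith("<", pos):
        raise ParseError(f"expected '<' at offset {pos}")
    name, pos = parse_name(text, pos + 1)
    attributes = {}
    while True:  # (s+ attribute)* s* '>'
        after = skip_space(text, pos)
        if text.startswith(">", after):
            pos = after + 1
            break
        if after == pos:
            raise ParseError(f"expected white space or '>' at offset {pos}")
        key, pos = parse_name(text, after)
        pos = skip_space(text, pos)
        if not text.startswith("=", pos):
            raise ParseError(f"expected '=' at offset {pos}")
        pos = skip_space(text, pos + 1)
        quote = text[pos:pos + 1]
        if quote not in ('"', "'"):
            raise ParseError(f"expected a quote at offset {pos}")
        value, pos = parse_chars(text, pos + 1, quote)
        if text[pos:pos + 1] != quote:
            raise ParseError(f"bad attribute value at offset {pos}")
        if key in attributes:  # assumption: map keys must be unique
            raise ParseError(f"duplicate attribute {key!r}")
        attributes[key] = "".join(value)
        pos += 1
    content = []
    while True:  # (element | dataChar | charRef)*
        chars, pos = parse_chars(text, pos, "<")
        content.extend(chars)
        if text.startswith("</", pos):
            break
        if not text.startswith("<", pos):
            raise ParseError(f"unexpected input at offset {pos}")
        child, pos = parse_element(text, pos)
        content.append(child)
    end_name, pos = parse_name(text, pos + 2)
    if end_name != name:  # assumption: tags must match, as in XML
        raise ParseError(f"mismatched end tag {end_name!r}")
    pos = skip_space(text, pos)
    if not text.startswith(">", pos):
        raise ParseError(f"expected '>' at offset {pos}")
    return [name, attributes, content], pos + 1


def parse(text):
    # document ::= s* element s*
    element, pos = parse_element(text, skip_space(text, 0))
    if skip_space(text, pos) != len(text):
        raise ParseError(f"unexpected content at offset {pos}")
    return element
```

For example, `parse('<p class="x">a&#x42;<b></b></p>')` yields `["p", {"class": "x"}, ["a", "B", ["b", {}, []]]]`: each data character and each resolved character reference becomes one entry in the content array. Because the grammar has no empty-element tags, comments or processing instructions, inputs such as `<b/>` or `<!-- c --><a></a>` raise `ParseError`.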