Warning:
This wiki has been archived and is now read-only.

Current Status

From MicroXML Community Group


Syntax

This grammar adds back to the Initial Status the things that appear to have attracted consensus, specifically:

comments;
empty element syntax;
Unicode characters in names.

To match the grammar in the 2012-09-08 Editor's Draft, it also adds back processing instructions (PIs), before the root element only, although there is not yet consensus on PIs. The production numbers will remain stable until there is consensus.

# Documents
[1] document ::= (comment | pi | s)* element (comment | s)*

# Elements
[4] element ::= startTag content endTag
              | emptyElementTag
[5] content ::= (element | comment | pi | dataChar | charRef)*
[6] startTag ::= '<' name (s+ attribute)* s* '>'
[7] emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
[8] endTag ::= '</' name s* '>'

# Attributes
[9] attribute ::= attributeName s* '=' s* attributeValue
[10] attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                      | "'" ((attributeValueChar - "'") | charRef)* "'"
[11] attributeValueChar ::= char - ('<'|'>'|'&')
[12] attributeName ::= name - 'xmlns'

# Data characters
[13] dataChar ::= char - ('<'|'&'|'>')

# Character references
[14] charRef ::= hexCharRef | namedCharRef
[16] hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
[17] namedCharRef ::= '&' charName ';'
[18] charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'

# Comments
[19] comment ::= '<!--' ((char - '-') | ('-' (char - '-')))* '-->'

# Processing Instructions
[22] pi ::= '<?' target (s+ attribute)* s* '?>'
[23] target ::= name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

# Names
[24] name ::= nameStartChar nameChar*
[25] nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                     | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                     | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[26] nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

# White space
[27] s ::= #x9 | #xA | #x20

# Characters
[28] char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
[29] forbiddenChar ::= [#x7F-#x9F] | surrogateCodePoint
                     | [#xFDD0-#xFDEF] | [#xFFFE-#xFFFF] | [#x1FFFE-#x1FFFF]
                     | [#x2FFFE-#x2FFFF] | [#x3FFFE-#x3FFFF] | [#x4FFFE-#x4FFFF]
                     | [#x5FFFE-#x5FFFF] | [#x6FFFE-#x6FFFF] | [#x7FFFE-#x7FFFF]
                     | [#x8FFFE-#x8FFFF] | [#x9FFFE-#x9FFFF] | [#xAFFFE-#xAFFFF]
                     | [#xBFFFE-#xBFFFF] | [#xCFFFE-#xCFFFF] | [#xDFFFE-#xDFFFF]
                     | [#xEFFFE-#xEFFFF] | [#xFFFFE-#xFFFFF] | [#x10FFFE-#x10FFFF]
[30] surrogateCodePoint ::= [#xD800-#xDFFF]
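
The character and name productions above can be read as simple code point range checks. The following Python sketch transcribes productions [24] to [30] literally. The function and constant names are illustrative and do not come from the draft, and the shortcut used for the per-plane noncharacters (the last two code points of each of the 17 planes) is an equivalent rewriting of the ranges listed in production [29], not the draft's own formulation.

```python
# Code point ranges transcribed from productions [25] and [26].
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional ranges allowed only after the first character of a name.
NAME_ONLY_RANGES = [
    (0x30, 0x39), (0x2D, 0x2D), (0x2E, 0x2E), (0xB7, 0xB7),
    (0x0300, 0x036F), (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_s(ch):
    """[27] s: tab, line feed, or space."""
    return ch in ("\t", "\n", " ")


def is_forbidden(cp):
    """[29] forbiddenChar, including [30] surrogateCodePoint."""
    if 0x7F <= cp <= 0x9F:
        return True
    if 0xD800 <= cp <= 0xDFFF:
        return True
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Last two code points of every plane: #xFFFE-#xFFFF ... #x10FFFE-#x10FFFF.
    return (cp & 0xFFFE) == 0xFFFE


def is_char(ch):
    """[28] char: white space, or #x21-#x10FFFF minus forbiddenChar."""
    cp = ord(ch)
    return is_s(ch) or (0x21 <= cp <= 0x10FFFF and not is_forbidden(cp))


def is_name_start_char(ch):
    """[25] nameStartChar."""
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    """[26] nameChar."""
    cp = ord(ch)
    return _in_ranges(cp, NAME_START_RANGES) or _in_ranges(cp, NAME_ONLY_RANGES)


def is_name(text):
    """[24] name: one nameStartChar followed by any number of nameChar."""
    return (len(text) > 0
            and is_name_start_char(text[0])
            and all(is_name_char(c) for c in text[1:]))
```

Under this reading, `is_name("a:b")` is false because the colon appears in neither production [25] nor [26], and `is_char("\r")` is false because carriage return is not part of production [27].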

Data Model

The data model for MicroXML is defined as a grammar over a particular kind of tree. These trees have one atomic type, the character (equivalent to a Unicode code point), and two composite types, arrays and maps. In the following, [...] denotes arrays, and {...} denotes maps:

document ::= [element, pi*]
element ::= [name, attributes, content]
pi ::= [name, attributes]
attributes ::= { (name => attributeValue)* }
attributeValue ::= [ char* ]
content ::= [ (char | element)* ]
name ::= [ nameStartChar, nameChar* ]
char, nameStartChar, nameChar ::= <single character as in the grammar for the syntax>

Note that comments are not in the data model.
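
As an illustration of the data model, the sketch below builds the tree for `<p class="x">Hi <b>there</b></p>` using Python lists for arrays and dicts for maps. This mapping is an assumption made for the example, not part of the draft: names and attribute values are held as Python strings standing in for arrays of characters, and the helpers `element` and `text_of` are hypothetical.

```python
def element(name, attributes=None, content=()):
    """Build [name, attributes, content]; text items are split into characters."""
    items = []
    for item in content:
        if isinstance(item, str):
            items.extend(item)      # one array entry per character
        else:
            items.append(item)      # a nested element
    return [name, dict(attributes or {}), items]


def text_of(elem):
    """Concatenate all characters in an element's content, recursively."""
    _name, _attributes, content = elem
    return "".join(c if isinstance(c, str) else text_of(c) for c in content)


b = element("b", content=["there"])
p = element("p", {"class": "x"}, ["Hi ", b])
document = [p]  # the root element, with no processing instructions
```

Here `p[2]` is the content array: three single characters followed by the nested `b` element. A comment in the source document would leave no trace in this structure.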

Parsing

These points appear to have consensus:

UTF-8 only
Newline normalization as in XML
No attribute value normalization: literal newlines in attribute values are preserved
No requirement for draconian error handling
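
The first two points can be sketched as a pre-processing step applied before tokenizing. The sketch assumes that "as in XML" refers to the end-of-line handling of XML 1.0, in which each CR LF pair and each lone CR becomes a single LF. The function name is hypothetical, and raising an exception on malformed UTF-8 is only this example's choice, since the list above does not require draconian error handling.

```python
def decode_and_normalize(data: bytes) -> str:
    """Decode UTF-8 input and normalize line ends to a single line feed."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    # Replace CR LF first so that the pair does not turn into two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For example, `decode_and_normalize(b"a\r\nb\rc\n")` yields `"a\nb\nc\n"`. Because there is no attribute value normalization, a line feed that remains inside an attribute value after this step would be passed through unchanged.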

Retrieved from "https://www.w3.org/community/microxml/wiki/index.php?title=Current_Status&oldid=47"
