Building a Tokenizer for XPath or XQuery

1 Introduction

The XQuery language is, for the most part, a super-set of the smaller XPath language ^[1]. XPath is designed to be a simple addressing and expression language, while XQuery includes prologue declarations, function, variable, and extension declarations, XML element construction, and advanced iteration constructs. These languages do not contain reserved keywords, which is to say data references in the form of QNames are designed to never overlap with keywords in the language. In order to achieve the combination of expression language, directory-style addressing, prologue declarations, type matching, and XML element construction, the meaning of the tokens are extremely context dependent. For instance, the string div may be an element reference, or may be a division operator. Likewise, the * might be a wildcard element reference, may be a multiplication operator, or may be a kleene operator.

As illustration, the following is a very unlikely, but legal, XQuery example:

declare namespace namespace = "http://example.com"; 
declare union <union>for gibberish {
   for $for in for return <for>***div div</for>
}</union>, 
if(if) then then else else- +-++-**-* instance 
of element(*)* * * **---++div- div -div

1.1 Structure of Parsers

XPath and XQuery parsers may be either top-down or bottom-up. Traditionally, the parsing process is decomposed into subcomponents:

[Definition: Lexical scanner, which assigns symbols known as tokens to units in the input, in essence recognizing the basic words and punctuation of the language. Typically, the lexer uses regular expressions as a specification for how to do this tokenization. The rules that govern the lexical structure of the language, sometimes known as microsyntax, should be simple, regular, and as context-free as possible.]
[Definition: Parser, which arranges the tokens into a hierarchical structure. Typically, the parser uses a BNF or variant description (EBNF in the case of XPath/XQuery) as a specification for this parsing.]

Since the tokenization of XPath/XQuery is so context dependent, this breakdown may not be as simple as with other programming languages. Furthermore, the granularity of tokenization is an issue.

1.2 Lexical Scanning Strategies

We can speculate on an enumeration of some strategies for implementing a lexical scanner as follows:

[Definition: Scan-while-parse scanner. Using this strategy, the parser would evaluate the character stream into tokens, while at the same time branching to the respective hierarchical context. Thus the words can be discovered in context. Complex parsers are typically not built in this fashion, but, given the issue of context sensitivity, this may be a reasonable approach. This would probably be implemented as a hand-constructed recursive-descent parser. ]
[Definition: Fuzzy token scanner. Using this strategy, a typical simple lexical scanner is implemented, but the meaning of the tokens are often left somewhat ambiguous. Thus, the word div might be assigned the symbol QNameOrKeyword. The words can be disambiguated during the context sensitive parse process. This approach would probably be a hand-constructed, Top-down parser.]
[Definition: Two-pass scanner. This is the same as the Fuzzy token scanner, but the ambiguous words can be disambiguated by a second scan, using some rules (probably expressed as a context-free grammar) that reflect the basic structure of the EBNF (call this the three-pass approach). The three-pass approach could conceivably be implemented via a parser constructor program, with the addition of a custom disambiguator in between the lexer and parser. This method could be used for either Top-down or Bottom-up techniques.]
[Definition: State-driven scanner. This approach recognizes sets of words only in certain states, each word or punctuation possibly transitioning to a next state. In other words, it implements lexical recognition as transition tables for a push-down automata.]

1.2.1 Token Granularity

In XPath/XQuery, the meaning of a word may often be determined by the words and punctuation ahead of it. Thus, the word if is only the "if" keyword if it is followed by a left-parenthesis (ignoring whitespace). Likewise, the word declare may be a QName, or may be part of a NamespaceDecl in the prolog, if followed by a namespace word. Inversely, namespace is determined to be a keyword verses a QName by it's appearance after declare. In the XQuery and XPath specifications, in the EBNF appendix, the notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols.

A scanner might take one of two approaches for assigning token units to the character stream:

[Definition: Short tokens. Using this approach, declare namespace would be considered two tokens, the exact symbols assigned based on lexical look-ahead (and look-behind).] In this case, a parser that has at least a look-ahead of two tokens is required (in particular, a two-token look-ahead should be sufficient for a LALR parser).
[Definition: Long tokens. Using this approach, declare namespace would be considered a single token.] In this case, a parser that has a look-ahead of only one token can be implemented.

1.3 Parser Strategies

The general strategies for implementing the parser are as follows:

Recursive-descent parser. For our purposes, this term means a hand-constructed, recursive descent parser.
[Definition: LL Parser (Left to right, Leftmost derivation, hence LL). ] The XPath/XQuery EBNF is expressed as an LL(1) grammar, so an LL parser can use the EBNF from the specifications fairly directly.
[Definition: LALR Parser (Look Ahead Left to right, Rightmost derivation, hence LALR). ] The XPath/XQuery EBNF is expressed as an LL(1) grammar, so an LALR parser must use what is essentially an inverted form of the original EBNF.

2 Lexical Processing

2.1 Lexical States

Many lexical scanner generators will allow specifications of tokens that are recognized based on some condition. It is possible to condition the XQuery and XPath token recognition in a lexical scanner in terms of a series of states and transitions between those states.

[Definition: A lexical state is a condition where a defined set of patterns are recognized.] The tables below define the complete lexical rules for [XPath/XQuery]. Each two column table corresponds to a lexical state and lists in the first column the tokens that are recognized when in that state. Any token not listed should result in a syntax error when in that state. When a given token is recognized in the given state, the transition to the next state and/or a lexical action is given in the right column. [Definition: A lexical action is an action that occurs as an side-effect of a pattern recognition.] Following are a list of lexical actions used in the tables:

pushState(): The current state, before the transition, is pushed onto an abstract stack.
pushState(STATE): A STATE parameter is pushed onto an abstract stack.
popState(): The most recently pushed state is removed from the top of the stack, and becomes the current state.
input_stream.backup(n): This is a special action that backs up the character stream by n characters. See the description of the OCCURRENCEINDICATOR state for more information.
(maintain state): This simply means that the current state is maintained with no modification.

The lexical states have, in many cases, close connection to the parser productions. However, just because a token is recognized in a certain lexical state, does not mean it will be legal in the current EBNF production.

The XQuery EBNF has some places where whitespace has explicit specification, which is to say, whitespace is not ignored in these states: START_TAG, END_TAG, ELEMENT_CONTENT, XML_COMMENT, PROCESSING_INSTRUCTION, PROCESSING_INSTRUCTION_CONTENT, CDATA_SECTION, QUOT_ATTRIBUTE_CONTENT, and APOS_ATTRIBUTE_CONTENT.

Editorial note
The following tables were hand-constructed and have not, at the time of this writing, been exhaustively verified against all possible paths that may be legal in the XQuery and XPath EBNFs, which is to say, it is possible they contain bugs.

2.1.1 XQuery Lexical States

The DEFAULT State

This state is for patterns that occur at the beginning of an expression or subexpression.

Pattern

Transition To State

DecimalLiteral, "..", ".", DoubleLiteral, IntegerLiteral, NotNumber, <NCName ":" "*">, QName, ")", <"*" ":" NCName>, "*", StringLiteral, <"declare" "construction">, <"declare" "default" "order">

OPERATOR

<"declare" "default" "collation">, <"declare" "namespace">, <"module" "namespace">, <"declare" "base-uri">

NAMESPACEDECL

<"declare" "default" "element">, <"declare" "default" "function">, <"import" "schema">, <"import" "module">, <"declare" "copy-namespaces">

NAMESPACEKEYWORD

"$", <"for" "$">, <"let" "$">, <"some" "$">, <"every" "$">

VARNAME

<"declare" "variable" "$">

VARNAME

<")" "as">

ITEMTYPE

<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "(">

pushState(OPERATOR)

KINDTEST

<"processing-instruction" "(">

"<!--"

"<?"

PROCESSING_INSTRUCTION

"<![CDATA["

"<"

<"declare" "boundary-space">

XMLSPACE_DECL

"}"

popState()

<"validate" "{">, <"validate" ValidationMode>

pushState(OPERATOR)

DEFAULT

<"typeswitch" "(">

DEFAULT

<"element" "{">, <"attribute" "{">

pushState(OPERATOR)

DEFAULT

<"attribute" QName "{">, <"element" QName "{">, <"document" "{">, <"text" "{">, <"processing-instruction" "{">, <"processing-instruction" NCName "{">, <"comment" "{">

pushState(OPERATOR)

<"declare" "function">

DEFAULT

"(:"

pushState()

EXPR_COMMENT

"{", <"ordered" "{">, <"unordered" "{">

pushState(OPERATOR)

DEFAULT

<"declare" "ordering">

<"xquery" "version">

"(#"

<"declare" "option">

<"at" URILiteral>

";", ",", "(", <QName "(">, <"if" "(">, "-", "+", "//", "/", <"ancestor-or-self" "::">, <"ancestor" "::">, <"attribute" "::">, <"child" "::">, <"descendant-or-self" "::">, <"descendant" "::">, <"following-sibling" "::">, <"following" "::">, <"parent" "::">, <"preceding-sibling" "::">, <"preceding" "::">, <"self" "::">, "@"

DEFAULT

The DECLAREORDERING State

Special state to recognize declare ordering specific keywords.

Pattern

Transition To State

"ordered", "unordered"

DEFAULT

The OPERATOR State

This state is for patterns that are defined for operators.

Pattern

Transition To State

"{"

pushState(OPERATOR)

DEFAULT

";", "then", "else", "external", "and", "at", ":=", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "<=", "<<", "<", "-", "mod", "*", "!=", <"order" "by">, <"stable" "order" "by">, "or", "+", "return", "satisfies", "//", "/", "to", "union", "|", "where", ("preserve" | "strip")

DEFAULT

<"castable" "as">, <"cast" "as">

SINGLETYPE

<"instance" "of">, <"treat" "as">, "case", "as", <")" "as">

ITEMTYPE

"}"

popState()

"$", <"for" "$">, <"let" "$">

VARNAME

"(:"

pushState()

EXPR_COMMENT

")", "?", <"empty" "greatest">, <"empty" "least">, "ascending", "descending", "default", "]"

OPERATOR

"collation"

URITOOPERATOR

StringLiteral, NotOperatorKeyword

(maintain state)

The XQUERYVERSION State

This state is for recognition of XQuery version specific keywords.

Pattern

Transition To State

";"

DEFAULT

StringLiteral, "encoding"

XQUERYVERSION

The NAMESPACEDECL State

This state occurs inside of a namespace declaration, and is needed to recognize a NCName that is to be used as the prefix, as opposed to allowing a QName to occur. (Otherwise, the difference between NCName and QName are ambiguous.)

Pattern

Transition To State

URILiteral, ",", <"at" URILiteral>

"(:"

"=", NCName

";"

The URITOOPERATOR State

This state is to recognize a URILiteral that transits to the OPERATOR state.

Pattern

Transition To State

URILiteral

OPERATOR

The NAMESPACEKEYWORD State

This state occurs at places where the keyword "namespace" is expected, which would otherwise be ambiguous compared to a QName. QNames can not occur in this state.

Pattern

Transition To State

URILiteral

NAMESPACEDECL

"inherit", "no-inherit"

"namespace"

"(:"

<"default" "element">, "preserve", "no-preserve", ","

NAMESPACEKEYWORD

The XMLSPACE_DECL State

This state occurs at places where the keywords "preserve" and "strip" is expected to support "declare xmlspace". QNames can not occur in this state.

Pattern

Transition To State

"preserve", "strip"

DEFAULT

"(:"

pushState()

EXPR_COMMENT

The SINGLETYPE State

This state distinguishes tokens that can occur only inside the SingleType production.

Pattern

Transition To State

"(:"

The ITEMTYPE State

This state distinguishes tokens that can occur only inside the ItemType production.

Pattern

Transition To State

"$"

VARNAME

<"void" "(" ")">, QName

OPERATOR

"(:"

pushState()

EXPR_COMMENT

<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "(">

pushState(OCCURRENCEINDICATOR)

KINDTEST

<"processing-instruction" "(">

pushState(OCCURRENCEINDICATOR)

KINDTESTFORPI

QName, <"item" "(" ")">

"(#"

";", "then", "else"

<"at" URILiteral>

"external", "and", "at", ":=", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "(", "<=", "<<", "<", "-", "mod", "!=", <"order" "by">, <"stable" "order" "by">, "or", "return", "satisfies", "to", "union", "|", "where"

DEFAULT

<"castable" "as">, <"cast" "as">

SINGLETYPE

<"instance" "of">, <"treat" "as">, "case", "as", <")" "as">

ITEMTYPE

The KINDTEST State

This state is for the psuedo-parameters for the KindTest productions.

Pattern

Transition To State

"{"

")"

"*", QName

<"element" "(">, <"schema-element" "(">

"(:"

The KINDTESTFORPI State

This state is similar to KINDTEST, but recognizes NCNames instead of QNames.

Pattern

Transition To State

")"

popState()

"(:"

pushState()

EXPR_COMMENT

NCName, StringLiteral

KINDTESTFORPI

The CLOSEKINDTEST State

This state is expecting to close a KINDTEST sequence.

Pattern

Transition To State

")"

","

"{"

"?"

"(:"

The OCCURRENCEINDICATOR State

This special state is needed to distinguish occurrence indicators that appear in the SequenceType production. For instance, compare "foo instance of baz*" to "baz*foo". In the first case, the "*" is interpreted as an occurrence indicator, and in the second case, it must be interpreted as a multiplication operator. But, when in the OCCURRENCEINDICATOR state, if anything else other than "?", "*", and "+", those symbols must be interpreted in the OPERATOR state. For instance, this would occur with the expression "foo instance of baz and $x", with the operator "and". This backing up of the lexical characters in order to reset the state, is symbolized by the notation "input_stream.backup(1)". NotOccurrenceIndicator is a special symbol for any character that is not an occurrence indicator.

Pattern

Transition To State

[NotOccurrenceIndicator]

input_stream.backup(1)

"?", "*", "+"

"(:"

The OPTION State

This state is entered in the prolog for an option declaration, and recognizes a QName that transits to a DEFAULT state rather than a OPERATOR state.

Pattern

Transition To State

QName

DEFAULT

The PRAGMA State

This state is entered in a a pragma expression, and recognizes a QName that transits to a PRAGMACONTENTS state rather than a OPERATOR state.

Pattern

Transition To State

QName

PRAGMACONTENTS

The PRAGMACONTENTS State

This state recognizes characters in pragma content, and transits out of this state when a '#)' pattern is recognized.

Pattern

Transition To State

"#)"

OPERATOR

S, Char

(maintain state)

The VARNAME State

This state differentiates variable names from qualified names. This allows only the pattern of a QName to be recognized when otherwise ambiguities could occur.

Pattern

Transition To State

"(:"

The START_TAG State

This state allows attributes in the native XML syntax, and marks the beginning of an element construction. Element constructors also push the current state, popping it at the conclusion of an end tag. In the START_TAG state, the string ">" is recognized as a token which is associated with the transition to the original state.

Pattern

Transition To State

"/>"

popState()

">"

ELEMENT_CONTENT

'"'

QUOT_ATTRIBUTE_CONTENT

"'"

APOS_ATTRIBUTE_CONTENT

"="

START_TAG

S, QName

(maintain state)

The ELEMENT_CONTENT State

This state allows XML-like content, without these characters being misinterpreted as expressions. The character "{" marks a transition to the DEFAULT state, i.e. the start of an embedded expression, and the "}" character pops back to the ELEMENT_CONTENT state. To allow curly braces to be used as character content, a double left or right curly brace is interpreted as a single curly brace character. The string "</" is interpreted as the beginning of an end tag, which is associated with a transition to the END_TAG state.

Pattern

Transition To State

"</"

"{"

"<!--"

"<?"

PROCESSING_INSTRUCTION

"<![CDATA["

"<"

PredefinedEntityRef, CharRef, "{{", "}}"

(maintain state)

The END_TAG State

When the end tag is terminated, the state is popped to the state that was pushed at the start of the corresponding start tag.

Pattern

Transition To State

">"

popState()

S, QName

(maintain state)

The XML_COMMENT State

The "<--" token marks the beginning of an XML Comment, and the "-->" token marks the end. This allows no special interpretation of other characters in this state.

Pattern

Transition To State

"-->"

popState()

(Char - '-'), <'-' (Char - '-')>

XML_COMMENT

The EXPR_COMMENT State

The "(:" token marks the beginning of an expression Comment, and the ":)" token marks the end. This allows no special interpretation of other characters in this state.

Pattern

Transition To State

":)"

"(:"

The PROCESSING_INSTRUCTION State

In this state, only patterns that are legal in a processing instruction name are recognized.

Pattern

Transition To State

PROCESSING_INSTRUCTION_CONTENT

"?>"

popState()

PITarget

PROCESSING_INSTRUCTION

The PROCESSING_INSTRUCTION_CONTENT State

In this state, only characters are that are legal in processing instruction content are recognized.

Pattern

Transition To State

"?>"

popState()

Char

PROCESSING_INSTRUCTION_CONTENT

The CDATA_SECTION State

In this state, only lexemes that are legal in a CDATA section are recognized.

Pattern

Transition To State

"]]>"

popState()

Char

CDATA_SECTION

The QUOT_ATTRIBUTE_CONTENT State

This state allows content legal for attributes. The character "{" marks a transition to the DEFAULT state, i.e. the start of an embedded expression, and the "}" character pops back to the original state. To allow curly braces to be used as character content, a double left or right curly brace is interpreted as a single curly brace character. This state is the same as APOS_ATTRIBUTE_CONTENT, except that apostrophes are allowed without escaping, and an unescaped quote marks the end of the state.

Pattern

Transition To State

'"'

START_TAG

"{"

pushState()

DEFAULT

EscapeQuot, QuotAttrContentChar

QUOT_ATTRIBUTE_CONTENT

PredefinedEntityRef, CharRef, "{{", "}}"

(maintain state)

The APOS_ATTRIBUTE_CONTENT State

This state is the same as QUOT_ATTRIBUTE_CONTENT, except that quotes are allowed, and an unescaped apostrophe marks the end of the state.

Pattern

Transition To State

"'"

START_TAG

"{"

pushState()

DEFAULT

EscapeApos, AposAttrContentChar

APOS_ATTRIBUTE_CONTENT

PredefinedEntityRef, CharRef, "{{", "}}"

(maintain state)

2.1.2 XPath Lexical States

The DEFAULT State

This state is for patterns that occur at the beginning of an expression or subexpression.

Pattern

Transition To State

DecimalLiteral, "..", ".", DoubleLiteral, IntegerLiteral, NotNumber, <NCName ":" "*">, QName, ")", <"*" ":" NCName>, "*", StringLiteral

OPERATOR

"$", <"for" "$">, <"some" "$">, <"every" "$">

VARNAME

<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "(">

pushState(OPERATOR)

KINDTEST

<"processing-instruction" "(">

"(:"

",", "(", <QName "(">, <"if" "(">, "-", "+", "//", "/", <"ancestor-or-self" "::">, <"ancestor" "::">, <"attribute" "::">, <"child" "::">, <"descendant-or-self" "::">, <"descendant" "::">, <"following-sibling" "::">, <"following" "::">, <"namespace" "::">, <"parent" "::">, <"preceding-sibling" "::">, <"preceding" "::">, <"self" "::">, "@"

DEFAULT

The OPERATOR State

This state is for patterns that are defined for operators.

Pattern

Transition To State

"then", "else", "and", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "<=", "<<", "<", "-", "mod", "*", "!=", "or", "+", "return", "satisfies", "//", "/", "to", "union", "|"

DEFAULT

<"castable" "as">, <"cast" "as">

SINGLETYPE

<"instance" "of">, <"treat" "as">

"$", <"for" "$">

"(:"

")", "?", "]"

StringLiteral, NotOperatorKeyword

(maintain state)

The SINGLETYPE State

This state distinguishes tokens that can occur only inside the SingleType production.

Pattern

Transition To State

"(:"

The ITEMTYPE State

This state distinguishes tokens that can occur only inside the ItemType production.

Pattern

Transition To State

<"void" "(" ")">, QName

OPERATOR

"(:"

pushState()

EXPR_COMMENT

<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "(">

pushState(OCCURRENCEINDICATOR)

KINDTEST

<"processing-instruction" "(">

pushState(OCCURRENCEINDICATOR)

KINDTESTFORPI

QName, <"item" "(" ")">

OCCURRENCEINDICATOR

"then", "else"

DEFAULT

"and", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "(", "<=", "<<", "<", "-", "mod", "!=", "or", "return", "satisfies", "to", "union", "|"

DEFAULT

<"castable" "as">, <"cast" "as">

SINGLETYPE

<"instance" "of">, <"treat" "as">

ITEMTYPE

The KINDTEST State

This state is for the psuedo-parameters for the KindTest productions.

Pattern

Transition To State

")"

popState()

"*", QName

CLOSEKINDTEST

<"element" "(">, <"schema-element" "(">

"(:"

The KINDTESTFORPI State

This state is similar to KINDTEST, but recognizes NCNames instead of QNames.

Pattern

Transition To State

")"

popState()

"(:"

pushState()

EXPR_COMMENT

NCName, StringLiteral

KINDTESTFORPI

The CLOSEKINDTEST State

This state is expecting to close a KINDTEST sequence.

Pattern

Transition To State

")"

","

"?"

"(:"

The OCCURRENCEINDICATOR State

Pattern

Transition To State

[NotOccurrenceIndicator]

input_stream.backup(1)

"?", "*", "+"

"(:"

The VARNAME State

This state differentiates variable names from qualified names. This allows only the pattern of a QName to be recognized when otherwise ambiguities could occur.

Pattern

Transition To State

"(:"

The EXPR_COMMENT State

The "(:" token marks the beginning of an expression Comment, and the ":)" token marks the end. This allows no special interpretation of other characters in this state.

Pattern

Transition To State

":)"

"(:"

A XPath/XQuery Grammar

A.1 EBNF

For convenience, the basic EBNF of the XPath and XQuery grammars are reproduced here. For further details please consult the respective specifications.

The following grammars use the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0] with the following minor differences. The notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols. To help readability, this "< ... >" notation is absent in the EBNF in the main body of this document. For further details concerning the EBNF notation used, see Section A.1 EBNF^XQ in [XQuery 1.0: An XML Query Language] or Section A.1 EBNF^XP in [XML Path Language (XPath) 2.0].

Comments on grammar productions are between '/*' and '*/' symbols. A 'gn:' prefix means a 'Grammar Note', and is meant as a clarification for parsing rules, and are explained in Section A.1.1 Grammar Notes^XQ in [XQuery 1.0: An XML Query Language] or Section A.1.1 Grammar Notes^XP in [XML Path Language (XPath) 2.0]. A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in Section A.2.2 Whitespace Rules^XQ in [XQuery 1.0: An XML Query Language] or Section A.2.2 Whitespace Rules^XP in [XML Path Language (XPath) 2.0].

A.1.1 XPath EBNF

[1]	`XPath`	::=	`Expr`
[2]	`Expr`	::=	`ExprSingle ("," ExprSingle)*`
[3]	`ExprSingle`	::=	`ForExpr \| QuantifiedExpr \| IfExpr \| OrExpr`
[4]	`ForExpr`	::=	`SimpleForClause "return" ExprSingle`
[5]	`SimpleForClause`	::=	`<"for" "$"> VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)*`
[6]	`QuantifiedExpr`	::=	`(<"some" "$"> \| <"every" "$">) VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle`
[7]	`IfExpr`	::=	`<"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle`
[8]	`OrExpr`	::=	`AndExpr ( "or" AndExpr )*`
[9]	`AndExpr`	::=	`ComparisonExpr ( "and" ComparisonExpr )*`
[10]	`ComparisonExpr`	::=	`RangeExpr ( (ValueComp \| GeneralComp \| NodeComp) RangeExpr )?`
[11]	`RangeExpr`	::=	`AdditiveExpr ( "to" AdditiveExpr )?`
[12]	`AdditiveExpr`	::=	`MultiplicativeExpr ( ("+" \| "-") MultiplicativeExpr )*`
[13]	`MultiplicativeExpr`	::=	`UnionExpr ( ("" \| "div" \| "idiv" \| "mod") UnionExpr )`
[14]	`UnionExpr`	::=	`IntersectExceptExpr ( ("union" \| "\|") IntersectExceptExpr )*`
[15]	`IntersectExceptExpr`	::=	`InstanceofExpr ( ("intersect" \| "except") InstanceofExpr )*`
[16]	`InstanceofExpr`	::=	`TreatExpr ( <"instance" "of"> SequenceType )?`
[17]	`TreatExpr`	::=	`CastableExpr ( <"treat" "as"> SequenceType )?`
[18]	`CastableExpr`	::=	`CastExpr ( <"castable" "as"> SingleType )?`
[19]	`CastExpr`	::=	`UnaryExpr ( <"cast" "as"> SingleType )?`
[20]	`UnaryExpr`	::=	`("-" \| "+")* ValueExpr`
[21]	`ValueExpr`	::=	`PathExpr`
[22]	`GeneralComp`	::=	`"=" \| "!=" \| "<" \| "<=" \| ">" \| ">="`	/ gn: lt^XP /
[23]	`ValueComp`	::=	`"eq" \| "ne" \| "lt" \| "le" \| "gt" \| "ge"`
[24]	`NodeComp`	::=	`"is" \| "<<" \| ">>"`
[25]	`PathExpr`	::=	`("/" RelativePathExpr?) \| ("//" RelativePathExpr) \| RelativePathExpr`	/ gn: leading-lone-slash^XP /
[26]	`RelativePathExpr`	::=	`StepExpr (("/" \| "//") StepExpr)*`
[27]	`StepExpr`	::=	`AxisStep \| FilterExpr`
[28]	`AxisStep`	::=	`(ForwardStep \| ReverseStep) PredicateList`
[29]	`ForwardStep`	::=	`(ForwardAxis NodeTest) \| AbbrevForwardStep`
[30]	`ForwardAxis`	::=	`<"child" "::"> \| <"descendant" "::"> \| <"attribute" "::"> \| <"self" "::"> \| <"descendant-or-self" "::"> \| <"following-sibling" "::"> \| <"following" "::"> \| <"namespace" "::">`
[31]	`AbbrevForwardStep`	::=	`"@"? NodeTest`
[32]	`ReverseStep`	::=	`(ReverseAxis NodeTest) \| AbbrevReverseStep`
[33]	`ReverseAxis`	::=	`<"parent" "::"> \| <"ancestor" "::"> \| <"preceding-sibling" "::"> \| <"preceding" "::"> \| <"ancestor-or-self" "::">`
[34]	`AbbrevReverseStep`	::=	`".."`
[35]	`NodeTest`	::=	`KindTest \| NameTest`
[36]	`NameTest`	::=	`QName \| Wildcard`
[37]	`Wildcard`	::=	`"" \| <NCName ":" ""> \| <"*" ":" NCName>`	/ ws: explicit^XP /
[38]	`FilterExpr`	::=	`PrimaryExpr PredicateList`
[39]	`PredicateList`	::=	`Predicate*`
[40]	`Predicate`	::=	`"[" Expr "]"`
[41]	`PrimaryExpr`	::=	`Literal \| VarRef \| ParenthesizedExpr \| ContextItemExpr \| FunctionCall`
[42]	`Literal`	::=	`NumericLiteral \| StringLiteral`
[43]	`NumericLiteral`	::=	`IntegerLiteral \| DecimalLiteral \| DoubleLiteral`
[44]	`VarRef`	::=	`"$" VarName`
[45]	`ParenthesizedExpr`	::=	`"(" Expr? ")"`
[46]	`ContextItemExpr`	::=	`"."`
[47]	`FunctionCall`	::=	`<QName "("> (ExprSingle ("," ExprSingle)*)? ")"`	/ gn: parens^XP /
				/ gn: reserved-function-names^XP /
[48]	`SingleType`	::=	`AtomicType "?"?`
[49]	`SequenceType`	::=	`(ItemType OccurrenceIndicator?) \| <"void" "(" ")">`
[50]	`OccurrenceIndicator`	::=	`"?" \| "*" \| "+"`	/ gn: occurrence-indicators^XP /
[51]	`ItemType`	::=	`AtomicType \| KindTest \| <"item" "(" ")">`
[52]	`AtomicType`	::=	`QName`
[53]	`KindTest`	::=	`DocumentTest \| ElementTest \| AttributeTest \| SchemaElementTest \| SchemaAttributeTest \| PITest \| CommentTest \| TextTest \| AnyKindTest`
[54]	`AnyKindTest`	::=	`<"node" "("> ")"`
[55]	`DocumentTest`	::=	`<"document-node" "("> (ElementTest \| SchemaElementTest)? ")"`
[56]	`TextTest`	::=	`<"text" "("> ")"`
[57]	`CommentTest`	::=	`<"comment" "("> ")"`
[58]	`PITest`	::=	`<"processing-instruction" "("> (NCName \| StringLiteral)? ")"`
[59]	`AttributeTest`	::=	`<"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")"`
[60]	`AttribNameOrWildcard`	::=	`AttributeName \| "*"`
[61]	`SchemaAttributeTest`	::=	`<"schema-attribute" "("> AttributeDeclaration ")"`
[62]	`AttributeDeclaration`	::=	`AttributeName`
[63]	`ElementTest`	::=	`<"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"`
[64]	`ElementNameOrWildcard`	::=	`ElementName \| "*"`
[65]	`SchemaElementTest`	::=	`<"schema-element" "("> ElementDeclaration ")"`
[66]	`ElementDeclaration`	::=	`ElementName`
[67]	`AttributeName`	::=	`QName`
[68]	`ElementName`	::=	`QName`
[69]	`TypeName`	::=	`QName`

[70]	`IntegerLiteral`	::=	`Digits`
[71]	`DecimalLiteral`	::=	`("." Digits) \| (Digits "." [0-9]*)`	/ ws: explicit^XP /
[72]	`DoubleLiteral`	::=	`(("." Digits) \| (Digits ("." [0-9]*)?)) [eE] [+-]? Digits`	/ ws: explicit^XP /
[73]	`StringLiteral`	::=	`('"' (('"' '"') \| [^"])* '"') \| ("'" (("'" "'") \| [^'])* "'")`	/ ws: explicit^XP /
[74]	`VarName`	::=	`QName`
[75]	`Digits`	::=	`[0-9]+`
[76]	`Comment`	::=	`"(:" (CommentContents \| Comment)* ":)"`	/ ws: explicit^XP /
				/ gn: comments^XP /
[77]	`CommentContents`	::=	`(Char+ - (Char* ':)' Char*))`
[78]	`QName`	::=	`[http://www.w3.org/TR/REC-xml-names/#NT-QName]^Names`	/ gn: xml-version^XP /
[79]	`NCName`	::=	`[http://www.w3.org/TR/REC-xml-names/#NT-NCName]^Names`	/ gn: xml-version^XP /
[80]	`Char`	::=	`[http://www.w3.org/TR/REC-xml#NT-Char]^XML`	/ gn: xml-version^XP /

A.1.2 XQuery EBNF

[1]	`Module`	::=	`VersionDecl? (MainModule \| LibraryModule)`
[2]	`VersionDecl`	::=	`<"xquery" "version"> StringLiteral ("encoding" StringLiteral)? Separator`
[3]	`MainModule`	::=	`Prolog QueryBody`
[4]	`LibraryModule`	::=	`ModuleDecl Prolog`
[5]	`ModuleDecl`	::=	`<"module" "namespace"> NCName "=" URILiteral Separator`
[6]	`Prolog`	::=	`((Setter \| Import \| NamespaceDecl \| DefaultNamespaceDecl) Separator)* ((VarDecl \| FunctionDecl \| OptionDecl) Separator)*`
[7]	`Setter`	::=	`BoundarySpaceDecl \| DefaultCollationDecl \| BaseURIDecl \| ConstructionDecl \| OrderingModeDecl \| EmptyOrderDecl \| CopyNamespacesDecl`
[8]	`Import`	::=	`SchemaImport \| ModuleImport`
[9]	`Separator`	::=	`";"`
[10]	`NamespaceDecl`	::=	`<"declare" "namespace"> NCName "=" URILiteral`
[11]	`BoundarySpaceDecl`	::=	`<"declare" "boundary-space"> ("preserve" \| "strip")`
[12]	`DefaultNamespaceDecl`	::=	`(<"declare" "default" "element"> \| <"declare" "default" "function">) "namespace" URILiteral`
[13]	`OptionDecl`	::=	`<"declare" "option"> QName StringLiteral`
[14]	`OrderingModeDecl`	::=	`<"declare" "ordering"> ("ordered" \| "unordered")`
[15]	`EmptyOrderDecl`	::=	`<"declare" "default" "order"> (<"empty" "greatest"> \| <"empty" "least">)`
[16]	`CopyNamespacesDecl`	::=	`<"declare" "copy-namespaces"> PreserveMode "," InheritMode`
[17]	`PreserveMode`	::=	`"preserve" \| "no-preserve"`
[18]	`InheritMode`	::=	`"inherit" \| "no-inherit"`
[19]	`DefaultCollationDecl`	::=	`<"declare" "default" "collation"> URILiteral`
[20]	`BaseURIDecl`	::=	`<"declare" "base-uri"> URILiteral`
[21]	`SchemaImport`	::=	`<"import" "schema"> SchemaPrefix? URILiteral (<"at" URILiteral> ("," URILiteral)*)?`
[22]	`SchemaPrefix`	::=	`("namespace" NCName "=") \| (<"default" "element"> "namespace")`
[23]	`ModuleImport`	::=	`<"import" "module"> ("namespace" NCName "=")? URILiteral (<"at" URILiteral> ("," URILiteral)*)?`
[24]	`VarDecl`	::=	`<"declare" "variable" "$"> VarName TypeDeclaration? ((":=" ExprSingle) \| "external")`
[25]	`ConstructionDecl`	::=	`<"declare" "construction"> ("preserve" \| "strip")`
[26]	`FunctionDecl`	::=	`<"declare" "function"> <QName "("> ParamList? (")" \| (<")" "as"> SequenceType)) (EnclosedExpr \| "external")`
[27]	`ParamList`	::=	`Param ("," Param)*`
[28]	`Param`	::=	`"$" VarName TypeDeclaration?`
[29]	`EnclosedExpr`	::=	`"{" Expr "}"`
[30]	`QueryBody`	::=	`Expr`
[31]	`Expr`	::=	`ExprSingle ("," ExprSingle)*`
[32]	`ExprSingle`	::=	`FLWORExpr \| QuantifiedExpr \| TypeswitchExpr \| IfExpr \| OrExpr`
[33]	`FLWORExpr`	::=	`(ForClause \| LetClause)+ WhereClause? OrderByClause? "return" ExprSingle`
[34]	`ForClause`	::=	`<"for" "$"> VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*`
[35]	`PositionalVar`	::=	`"at" "$" VarName`
[36]	`LetClause`	::=	`<"let" "$"> VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*`
[37]	`WhereClause`	::=	`"where" ExprSingle`
[38]	`OrderByClause`	::=	`(<"order" "by"> \| <"stable" "order" "by">) OrderSpecList`
[39]	`OrderSpecList`	::=	`OrderSpec ("," OrderSpec)*`
[40]	`OrderSpec`	::=	`ExprSingle OrderModifier`
[41]	`OrderModifier`	::=	`("ascending" \| "descending")? (<"empty" "greatest"> \| <"empty" "least">)? ("collation" URILiteral)?`
[42]	`QuantifiedExpr`	::=	`(<"some" "$"> \| <"every" "$">) VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle`
[43]	`TypeswitchExpr`	::=	`<"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle`
[44]	`CaseClause`	::=	`"case" ("$" VarName "as")? SequenceType "return" ExprSingle`
[45]	`IfExpr`	::=	`<"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle`
[46]	`OrExpr`	::=	`AndExpr ( "or" AndExpr )*`
[47]	`AndExpr`	::=	`ComparisonExpr ( "and" ComparisonExpr )*`
[48]	`ComparisonExpr`	::=	`RangeExpr ( (ValueComp \| GeneralComp \| NodeComp) RangeExpr )?`
[49]	`RangeExpr`	::=	`AdditiveExpr ( "to" AdditiveExpr )?`
[50]	`AdditiveExpr`	::=	`MultiplicativeExpr ( ("+" \| "-") MultiplicativeExpr )*`
[51]	`MultiplicativeExpr`	::=	`UnionExpr ( ("" \| "div" \| "idiv" \| "mod") UnionExpr )`
[52]	`UnionExpr`	::=	`IntersectExceptExpr ( ("union" \| "\|") IntersectExceptExpr )*`
[53]	`IntersectExceptExpr`	::=	`InstanceofExpr ( ("intersect" \| "except") InstanceofExpr )*`
[54]	`InstanceofExpr`	::=	`TreatExpr ( <"instance" "of"> SequenceType )?`
[55]	`TreatExpr`	::=	`CastableExpr ( <"treat" "as"> SequenceType )?`
[56]	`CastableExpr`	::=	`CastExpr ( <"castable" "as"> SingleType )?`
[57]	`CastExpr`	::=	`UnaryExpr ( <"cast" "as"> SingleType )?`
[58]	`UnaryExpr`	::=	`("-" \| "+")* ValueExpr`
[59]	`ValueExpr`	::=	`ValidateExpr \| PathExpr \| ExtensionExpr`
[60]	`GeneralComp`	::=	`"=" \| "!=" \| "<" \| "<=" \| ">" \| ">="`	/ gn: lt^XQ /
[61]	`ValueComp`	::=	`"eq" \| "ne" \| "lt" \| "le" \| "gt" \| "ge"`
[62]	`NodeComp`	::=	`"is" \| "<<" \| ">>"`
[63]	`ValidateExpr`	::=	`(<"validate" "{"> \| (<"validate" ValidationMode> "{")) Expr "}"`
[64]	`ExtensionExpr`	::=	`Pragma+ "{" Expr? "}"`
[65]	`Pragma`	::=	`"(#" S? QName PragmaContents "#)"`	/ ws: explicit^XQ /
[66]	`PragmaContents`	::=	`(Char* - (Char* '#)' Char*))`
[67]	`PathExpr`	::=	`("/" RelativePathExpr?) \| ("//" RelativePathExpr) \| RelativePathExpr`	/ gn: leading-lone-slash^XQ /
[68]	`RelativePathExpr`	::=	`StepExpr (("/" \| "//") StepExpr)*`
[69]	`StepExpr`	::=	`AxisStep \| FilterExpr`
[70]	`AxisStep`	::=	`(ForwardStep \| ReverseStep) PredicateList`
[71]	`ForwardStep`	::=	`(ForwardAxis NodeTest) \| AbbrevForwardStep`
[72]	`ForwardAxis`	::=	`<"child" "::"> \| <"descendant" "::"> \| <"attribute" "::"> \| <"self" "::"> \| <"descendant-or-self" "::"> \| <"following-sibling" "::"> \| <"following" "::">`
[73]	`AbbrevForwardStep`	::=	`"@"? NodeTest`
[74]	`ReverseStep`	::=	`(ReverseAxis NodeTest) \| AbbrevReverseStep`
[75]	`ReverseAxis`	::=	`<"parent" "::"> \| <"ancestor" "::"> \| <"preceding-sibling" "::"> \| <"preceding" "::"> \| <"ancestor-or-self" "::">`
[76]	`AbbrevReverseStep`	::=	`".."`
[77]	`NodeTest`	::=	`KindTest \| NameTest`
[78]	`NameTest`	::=	`QName \| Wildcard`
[79]	`Wildcard`	::=	`"" \| <NCName ":" ""> \| <"*" ":" NCName>`	/ ws: explicit^XQ /
[80]	`FilterExpr`	::=	`PrimaryExpr PredicateList`
[81]	`PredicateList`	::=	`Predicate*`
[82]	`Predicate`	::=	`"[" Expr "]"`
[83]	`PrimaryExpr`	::=	`Literal \| VarRef \| ParenthesizedExpr \| ContextItemExpr \| FunctionCall \| Constructor \| OrderedExpr \| UnorderedExpr`
[84]	`Literal`	::=	`NumericLiteral \| StringLiteral`
[85]	`NumericLiteral`	::=	`IntegerLiteral \| DecimalLiteral \| DoubleLiteral`
[86]	`VarRef`	::=	`"$" VarName`
[87]	`ParenthesizedExpr`	::=	`"(" Expr? ")"`
[88]	`ContextItemExpr`	::=	`"."`
[89]	`OrderedExpr`	::=	`<"ordered" "{"> Expr "}"`
[90]	`UnorderedExpr`	::=	`<"unordered" "{"> Expr "}"`
[91]	`FunctionCall`	::=	`<QName "("> (ExprSingle ("," ExprSingle)*)? ")"`	/ gn: parens^XQ /
				/ gn: reserved-function-names^XQ /
[92]	`Constructor`	::=	`DirectConstructor \| ComputedConstructor`
[93]	`DirectConstructor`	::=	`DirElemConstructor \| DirCommentConstructor \| DirPIConstructor`
[94]	`DirElemConstructor`	::=	`"<" QName DirAttributeList ("/>" \| (">" DirElemContent* "</" QName S? ">"))`	/ ws: explicit^XQ /
				/ gn: lt^XQ /
[95]	`DirAttributeList`	::=	`(S (QName S? "=" S? DirAttributeValue)?)*`	/ ws: explicit^XQ /
[96]	`DirAttributeValue`	::=	`('"' (EscapeQuot \| QuotAttrValueContent)* '"') \| ("'" (EscapeApos \| AposAttrValueContent)* "'")`	/ ws: explicit^XQ /
[97]	`QuotAttrValueContent`	::=	`QuotAttrContentChar \| CommonContent`
[98]	`AposAttrValueContent`	::=	`AposAttrContentChar \| CommonContent`
[99]	`DirElemContent`	::=	`DirectConstructor \| ElementContentChar \| CDataSection \| CommonContent`
[100]	`CommonContent`	::=	`PredefinedEntityRef \| CharRef \| "{{" \| "}}" \| EnclosedExpr`
[101]	`DirCommentConstructor`	::=	`"<!--" DirCommentContents "-->"`	/ ws: explicit^XQ /
[102]	`DirCommentContents`	::=	`((Char - '-') \| <'-' (Char - '-')>)*`	/ ws: explicit^XQ /
[103]	`DirPIConstructor`	::=	`"<?" PITarget (S DirPIContents)? "?>"`	/ ws: explicit^XQ /
[104]	`DirPIContents`	::=	`(Char* - (Char* '?>' Char*))`	/ ws: explicit^XQ /
[105]	`CDataSection`	::=	`"<![CDATA[" CDataSectionContents "]]>"`	/ ws: explicit^XQ /
[106]	`CDataSectionContents`	::=	`(Char* - (Char* ']]>' Char*))`	/ ws: explicit^XQ /
[107]	`ComputedConstructor`	::=	`CompDocConstructor \| CompElemConstructor \| CompAttrConstructor \| CompTextConstructor \| CompCommentConstructor \| CompPIConstructor`
[108]	`CompDocConstructor`	::=	`<"document" "{"> Expr "}"`
[109]	`CompElemConstructor`	::=	`(<"element" QName "{"> \| (<"element" "{"> Expr "}" "{")) ContentExpr? "}"`
[110]	`ContentExpr`	::=	`Expr`
[111]	`CompAttrConstructor`	::=	`(<"attribute" QName "{"> \| (<"attribute" "{"> Expr "}" "{")) Expr? "}"`
[112]	`CompTextConstructor`	::=	`<"text" "{"> Expr "}"`
[113]	`CompCommentConstructor`	::=	`<"comment" "{"> Expr "}"`
[114]	`CompPIConstructor`	::=	`(<"processing-instruction" NCName "{"> \| (<"processing-instruction" "{"> Expr "}" "{")) Expr? "}"`
[115]	`SingleType`	::=	`AtomicType "?"?`
[116]	`TypeDeclaration`	::=	`"as" SequenceType`
[117]	`SequenceType`	::=	`(ItemType OccurrenceIndicator?) \| <"void" "(" ")">`
[118]	`OccurrenceIndicator`	::=	`"?" \| "*" \| "+"`	/ gn: occurrence-indicators^XQ /
[119]	`ItemType`	::=	`AtomicType \| KindTest \| <"item" "(" ")">`
[120]	`AtomicType`	::=	`QName`
[121]	`KindTest`	::=	`DocumentTest \| ElementTest \| AttributeTest \| SchemaElementTest \| SchemaAttributeTest \| PITest \| CommentTest \| TextTest \| AnyKindTest`
[122]	`AnyKindTest`	::=	`<"node" "("> ")"`
[123]	`DocumentTest`	::=	`<"document-node" "("> (ElementTest \| SchemaElementTest)? ")"`
[124]	`TextTest`	::=	`<"text" "("> ")"`
[125]	`CommentTest`	::=	`<"comment" "("> ")"`
[126]	`PITest`	::=	`<"processing-instruction" "("> (NCName \| StringLiteral)? ")"`
[127]	`AttributeTest`	::=	`<"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")"`
[128]	`AttribNameOrWildcard`	::=	`AttributeName \| "*"`
[129]	`SchemaAttributeTest`	::=	`<"schema-attribute" "("> AttributeDeclaration ")"`
[130]	`AttributeDeclaration`	::=	`AttributeName`
[131]	`ElementTest`	::=	`<"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"`
[132]	`ElementNameOrWildcard`	::=	`ElementName \| "*"`
[133]	`SchemaElementTest`	::=	`<"schema-element" "("> ElementDeclaration ")"`
[134]	`ElementDeclaration`	::=	`ElementName`
[135]	`AttributeName`	::=	`QName`
[136]	`ElementName`	::=	`QName`
[137]	`TypeName`	::=	`QName`

[138]	`IntegerLiteral`	::=	`Digits`
[139]	`DecimalLiteral`	::=	`("." Digits) \| (Digits "." [0-9]*)`	/ ws: explicit^XQ /
[140]	`DoubleLiteral`	::=	`(("." Digits) \| (Digits ("." [0-9]*)?)) [eE] [+-]? Digits`	/ ws: explicit^XQ /
[141]	`URILiteral`	::=	`StringLiteral`
[142]	`StringLiteral`	::=	`('"' (PredefinedEntityRef \| CharRef \| ('"' '"') \| [^"&])* '"') \| ("'" (PredefinedEntityRef \| CharRef \| ("'" "'") \| [^'&])* "'")`	/ ws: explicit^XQ /
[143]	`PITarget`	::=	`[http://www.w3.org/TR/REC-xml#NT-PITarget]^XML`	/ gn: xml-version^XQ /
[144]	`VarName`	::=	`QName`
[145]	`ValidationMode`	::=	`"lax" \| "strict"`
[146]	`Digits`	::=	`[0-9]+`
[147]	`PredefinedEntityRef`	::=	`"&" ("lt" \| "gt" \| "amp" \| "quot" \| "apos") ";"`	/ ws: explicit^XQ /
[148]	`CharRef`	::=	`[http://www.w3.org/TR/REC-xml#NT-CharRef]^XML`	/ gn: xml-version^XQ /
[149]	`EscapeQuot`	::=	`'""'`
[150]	`EscapeApos`	::=	`"''"`
[151]	`ElementContentChar`	::=	`Char - [{}<&]`
[152]	`QuotAttrContentChar`	::=	`Char - ["{}<&]`
[153]	`AposAttrContentChar`	::=	`Char - ['{}<&]`
[154]	`Comment`	::=	`"(:" (CommentContents \| Comment)* ":)"`	/ ws: explicit^XQ /
				/ gn: comments^XQ /
[155]	`CommentContents`	::=	`(Char+ - (Char* ':)' Char*))`
[156]	`QName`	::=	`[http://www.w3.org/TR/REC-xml-names/#NT-QName]^Names`	/ gn: xml-version^XQ /
[157]	`NCName`	::=	`[http://www.w3.org/TR/REC-xml-names/#NT-NCName]^Names`	/ gn: xml-version^XQ /
[158]	`S`	::=	`[http://www.w3.org/TR/REC-xml#NT-S]^XML`	/ gn: xml-version^XQ /
[159]	`Char`	::=	`[http://www.w3.org/TR/REC-xml#NT-Char]^XML`	/ gn: xml-version^XQ /

B References

B.1 Main References

XML 1.0: World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation. See http://www.w3.org/TR/2000/REC-xml-20001006
XML Path Language (XPath) 2.0: XML Path Language (XPath) 2.0, Anders Berglund, Mary F. Fernández, Scott Boag, et. al., Editors. World Wide Web Consortium, 4 Apr 2005. This version is http://www.w3.org/TR/2005/WD-xpath20-20050404. The latest version is available at http://www.w3.org/TR/xpath20.
XQuery 1.0: An XML Query Language: XQuery 1.0: An XML Query Language, Scott Boag, Don Chamberlin, Mary F. Fernández, et. al., Editors. World Wide Web Consortium, 4 Apr 2005. This version is http://www.w3.org/TR/2005/WD-xquery-20050404/. The latest version is available at http://www.w3.org/TR/xquery.

C Glossary (Non-Normative)

fuzzy token scanner: Fuzzy token scanner. Using this strategy, a typical simple lexical scanner is implemented, but the meaning of the tokens are often left somewhat ambiguous. Thus, the word div might be assigned the symbol QNameOrKeyword. The words can be disambiguated during the context sensitive parse process. This approach would probably be a hand-constructed, Top-down parser.
LALR Parser: LALR Parser (Look Ahead Left to right, Rightmost derivation, hence LALR).
lexical action: A lexical action is an action that occurs as an side-effect of a pattern recognition.
lexical scanner: Lexical scanner, which assigns symbols known as tokens to units in the input, in essence recognizing the basic words and punctuation of the language. Typically, the lexer uses regular expressions as a specification for how to do this tokenization. The rules that govern the lexical structure of the language, sometimes known as microsyntax, should be simple, regular, and as context-free as possible.
lexical state: A lexical state is a condition where a defined set of patterns are recognized.
LL Parser: LL Parser (Left to right, Leftmost derivation, hence LL).
long token: Long tokens. Using this approach, declare namespace would be considered a single token.
parser: Parser, which arranges the tokens into a hierarchical structure. Typically, the parser uses a BNF or variant description (EBNF in the case of XPath/XQuery) as a specification for this parsing.
Scan-while-parse scanner: Scan-while-parse scanner. Using this strategy, the parser would evaluate the character stream into tokens, while at the same time branching to the respective hierarchical context. Thus the words can be discovered in context. Complex parsers are typically not built in this fashion, but, given the issue of context sensitivity, this may be a reasonable approach. This would probably be implemented as a hand-constructed recursive-descent parser.
short token: Short tokens. Using this approach, declare namespace would be considered two tokens, the exact symbols assigned based on lexical look-ahead (and look-behind).
state-driven scanner: State-driven scanner. This approach recognizes sets of words only in certain states, each word or punctuation possibly transitioning to a next state. In other words, it implements lexical recognition as transition tables for a push-down automata.
two-pass scanner: Two-pass scanner. This is the same as the Fuzzy token scanner, but the ambiguous words can be disambiguated by a second scan, using some rules (probably expressed as a context-free grammar) that reflect the basic structure of the EBNF (call this the three-pass approach). The three-pass approach could conceivably be implemented via a parser constructor program, with the addition of a custom disambiguator in between the lexer and parser. This method could be used for either Top-down or Bottom-up techniques.

End Notes

[1]: XQuery does not include the ForExpr production and <"namespace" "::"> axis.

Building a Tokenizer for XPath or XQuery

W3C Working Draft 4 April 2005

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

1.1 Structure of Parsers

1.2 Lexical Scanning Strategies

1.2.1 Token Granularity

1.3 Parser Strategies

2 Lexical Processing

2.1 Lexical States

2.1.1 XQuery Lexical States

2.1.2 XPath Lexical States

A XPath/XQuery Grammar

A.1 EBNF

A.1.1 XPath EBNF

A.1.2 XQuery EBNF

B References

B.1 Main References

C Glossary (Non-Normative)

End Notes