A XQuery Grammar

A.1 EBNF

The following grammar uses the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0] with the following minor differences. The notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols. To help readability, this "< ... >" notation is absent in the EBNF in the main body of this document. This appendix is the normative version of the EBNF.

Comments on grammar productions appear between '/*' and '*/' symbols; please note that these comments are normative. A 'gn:' prefix marks a 'Grammar Note', a clarification of the parsing rules that is explained in A.1.1 Grammar Notes. A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in A.2.2 Whitespace Rules.

[1]    Module    ::=    VersionDecl? (MainModule | LibraryModule)
[2]    VersionDecl    ::=    <"xquery" "version"> StringLiteral ("encoding" StringLiteral)? Separator
[3]    MainModule    ::=    Prolog QueryBody
[4]    LibraryModule    ::=    ModuleDecl Prolog
[5]    ModuleDecl    ::=    <"module" "namespace"> NCName "=" URILiteral Separator
[6]    Prolog    ::=    (Setter Separator)* ((Import | NamespaceDecl | DefaultNamespaceDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
[7]    Setter    ::=    BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
[8]    Import    ::=    SchemaImport | ModuleImport
[9]    Separator    ::=    ";"
[10]    NamespaceDecl    ::=    <"declare" "namespace"> NCName "=" URILiteral
[11]    BoundarySpaceDecl    ::=    <"declare" "boundary-space"> ("preserve" | "strip")
[12]    DefaultNamespaceDecl    ::=    (<"declare" "default" "element"> | <"declare" "default" "function">) "namespace" URILiteral
[13]    OptionDecl    ::=    <"declare" "option"> QName StringLiteral
[14]    OrderingModeDecl    ::=    <"declare" "ordering"> ("ordered" | "unordered")
[15]    EmptyOrderDecl    ::=    <"declare" "default" "order"> (<"empty" "greatest"> | <"empty" "least">)
[16]    CopyNamespacesDecl    ::=    <"declare" "copy-namespaces"> PreserveMode "," InheritMode
[17]    PreserveMode    ::=    "preserve" | "no-preserve"
[18]    InheritMode    ::=    "inherit" | "no-inherit"
[19]    DefaultCollationDecl    ::=    <"declare" "default" "collation"> URILiteral
[20]    BaseURIDecl    ::=    <"declare" "base-uri"> URILiteral
[21]    SchemaImport    ::=    <"import" "schema"> SchemaPrefix? URILiteral (<"at" URILiteral> ("," URILiteral)*)?
[22]    SchemaPrefix    ::=    ("namespace" NCName "=") | (<"default" "element"> "namespace")
[23]    ModuleImport    ::=    <"import" "module"> ("namespace" NCName "=")? URILiteral (<"at" URILiteral> ("," URILiteral)*)?
[24]    VarDecl    ::=    <"declare" "variable" "$"> VarName TypeDeclaration? ((":=" ExprSingle) | "external")
[25]    ConstructionDecl    ::=    <"declare" "construction"> ("preserve" | "strip")
[26]    FunctionDecl    ::=    <"declare" "function"> <QName "("> ParamList? (")" | (<")" "as"> SequenceType)) (EnclosedExpr | "external")
[27]    ParamList    ::=    Param ("," Param)*
[28]    Param    ::=    "$" VarName TypeDeclaration?
[29]    EnclosedExpr    ::=    "{" Expr "}"
[30]    QueryBody    ::=    Expr
[31]    Expr    ::=    ExprSingle ("," ExprSingle)*
[32]    ExprSingle    ::=    FLWORExpr
| QuantifiedExpr
| TypeswitchExpr
| IfExpr
| OrExpr
[33]    FLWORExpr    ::=    (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
[34]    ForClause    ::=    <"for" "$"> VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*
[35]    PositionalVar    ::=    "at" "$" VarName
[36]    LetClause    ::=    <"let" "$"> VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*
[37]    WhereClause    ::=    "where" ExprSingle
[38]    OrderByClause    ::=    (<"order" "by"> | <"stable" "order" "by">) OrderSpecList
[39]    OrderSpecList    ::=    OrderSpec ("," OrderSpec)*
[40]    OrderSpec    ::=    ExprSingle OrderModifier
[41]    OrderModifier    ::=    ("ascending" | "descending")? (<"empty" "greatest"> | <"empty" "least">)? ("collation" URILiteral)?
[42]    QuantifiedExpr    ::=    (<"some" "$"> | <"every" "$">) VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
[43]    TypeswitchExpr    ::=    <"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
[44]    CaseClause    ::=    "case" ("$" VarName "as")? SequenceType "return" ExprSingle
[45]    IfExpr    ::=    <"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle
[46]    OrExpr    ::=    AndExpr ( "or" AndExpr )*
[47]    AndExpr    ::=    ComparisonExpr ( "and" ComparisonExpr )*
[48]    ComparisonExpr    ::=    RangeExpr ( (ValueComp
| GeneralComp
| NodeComp) RangeExpr )?
[49]    RangeExpr    ::=    AdditiveExpr ( "to" AdditiveExpr )?
[50]    AdditiveExpr    ::=    MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
[51]    MultiplicativeExpr    ::=    UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
[52]    UnionExpr    ::=    IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
[53]    IntersectExceptExpr    ::=    InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
[54]    InstanceofExpr    ::=    TreatExpr ( <"instance" "of"> SequenceType )?
[55]    TreatExpr    ::=    CastableExpr ( <"treat" "as"> SequenceType )?
[56]    CastableExpr    ::=    CastExpr ( <"castable" "as"> SingleType )?
[57]    CastExpr    ::=    UnaryExpr ( <"cast" "as"> SingleType )?
[58]    UnaryExpr    ::=    ("-" | "+")* ValueExpr
[59]    ValueExpr    ::=    ValidateExpr | PathExpr | ExtensionExpr
[60]    GeneralComp    ::=    "=" | "!=" | "<" | "<=" | ">" | ">=" /* gn: lt */
[61]    ValueComp    ::=    "eq" | "ne" | "lt" | "le" | "gt" | "ge"
[62]    NodeComp    ::=    "is" | "<<" | ">>"
[63]    ValidateExpr    ::=    (<"validate" "{"> | (<"validate" ValidationMode> "{")) Expr "}" /* gn: validate */
[64]    ExtensionExpr    ::=    Pragma+ "{" Expr? "}"
[65]    Pragma    ::=    "(#" S? QName PragmaContents "#)" /* ws: explicit */
[66]    PragmaContents    ::=    (Char* - (Char* '#)' Char*))
[67]    PathExpr    ::=    ("/" RelativePathExpr?)
| ("//" RelativePathExpr)
| RelativePathExpr
/* gn: leading-lone-slash */
[68]    RelativePathExpr    ::=    StepExpr (("/" | "//") StepExpr)*
[69]    StepExpr    ::=    AxisStep | FilterExpr
[70]    AxisStep    ::=    (ForwardStep | ReverseStep) PredicateList
[71]    ForwardStep    ::=    (ForwardAxis NodeTest) | AbbrevForwardStep
[72]    ForwardAxis    ::=    <"child" "::">
| <"descendant" "::">
| <"attribute" "::">
| <"self" "::">
| <"descendant-or-self" "::">
| <"following-sibling" "::">
| <"following" "::">
[73]    AbbrevForwardStep    ::=    "@"? NodeTest
[74]    ReverseStep    ::=    (ReverseAxis NodeTest) | AbbrevReverseStep
[75]    ReverseAxis    ::=    <"parent" "::">
| <"ancestor" "::">
| <"preceding-sibling" "::">
| <"preceding" "::">
| <"ancestor-or-self" "::">
[76]    AbbrevReverseStep    ::=    ".."
[77]    NodeTest    ::=    KindTest | NameTest
[78]    NameTest    ::=    QName | Wildcard
[79]    Wildcard    ::=    "*"
| <NCName ":" "*">
| <"*" ":" NCName>
/* ws: explicit */
[80]    FilterExpr    ::=    PrimaryExpr PredicateList
[81]    PredicateList    ::=    Predicate*
[82]    Predicate    ::=    "[" Expr "]"
[83]    PrimaryExpr    ::=    Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | Constructor | OrderedExpr | UnorderedExpr
[84]    Literal    ::=    NumericLiteral | StringLiteral
[85]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[86]    VarRef    ::=    "$" VarName
[87]    ParenthesizedExpr    ::=    "(" Expr? ")"
[88]    ContextItemExpr    ::=    "."
[89]    OrderedExpr    ::=    <"ordered" "{"> Expr "}"
[90]    UnorderedExpr    ::=    <"unordered" "{"> Expr "}"
[91]    FunctionCall    ::=    <QName "("> (ExprSingle ("," ExprSingle)*)? ")" /* gn: parens */
/* gn: reserved-function-names */
[92]    Constructor    ::=    DirectConstructor
| ComputedConstructor
[93]    DirectConstructor    ::=    DirElemConstructor
| DirCommentConstructor
| DirPIConstructor
[94]    DirElemConstructor    ::=    "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">")) /* ws: explicit */
/* gn: lt */
[95]    DirAttributeList    ::=    (S (QName S? "=" S? DirAttributeValue)?)* /* ws: explicit */
[96]    DirAttributeValue    ::=    ('"' (EscapeQuot | QuotAttrValueContent)* '"')
| ("'" (EscapeApos | AposAttrValueContent)* "'")
/* ws: explicit */
[97]    QuotAttrValueContent    ::=    QuotAttrContentChar
| CommonContent
[98]    AposAttrValueContent    ::=    AposAttrContentChar
| CommonContent
[99]    DirElemContent    ::=    DirectConstructor
| ElementContentChar
| CDataSection
| CommonContent
[100]    CommonContent    ::=    PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
[101]    DirCommentConstructor    ::=    "<!--" DirCommentContents "-->" /* ws: explicit */
[102]    DirCommentContents    ::=    ((Char - '-') | <'-' (Char - '-')>)* /* ws: explicit */
[103]    DirPIConstructor    ::=    "<?" PITarget (S DirPIContents)? "?>" /* ws: explicit */
[104]    DirPIContents    ::=    (Char* - (Char* '?>' Char*)) /* ws: explicit */
[105]    CDataSection    ::=    "<![CDATA[" CDataSectionContents "]]>" /* ws: explicit */
[106]    CDataSectionContents    ::=    (Char* - (Char* ']]>' Char*)) /* ws: explicit */
[107]    ComputedConstructor    ::=    CompDocConstructor
| CompElemConstructor
| CompAttrConstructor
| CompTextConstructor
| CompCommentConstructor
| CompPIConstructor
[108]    CompDocConstructor    ::=    <"document" "{"> Expr "}"
[109]    CompElemConstructor    ::=    (<"element" QName "{"> | (<"element" "{"> Expr "}" "{")) ContentExpr? "}"
[110]    ContentExpr    ::=    Expr
[111]    CompAttrConstructor    ::=    (<"attribute" QName "{"> | (<"attribute" "{"> Expr "}" "{")) Expr? "}"
[112]    CompTextConstructor    ::=    <"text" "{"> Expr "}"
[113]    CompCommentConstructor    ::=    <"comment" "{"> Expr "}"
[114]    CompPIConstructor    ::=    (<"processing-instruction" NCName "{"> | (<"processing-instruction" "{"> Expr "}" "{")) Expr? "}"
[115]    SingleType    ::=    AtomicType "?"?
[116]    TypeDeclaration    ::=    "as" SequenceType
[117]    SequenceType    ::=    (ItemType OccurrenceIndicator?)
| <"empty" "(" ")">
[118]    OccurrenceIndicator    ::=    "?" | "*" | "+" /* gn: occurrence-indicators */
[119]    ItemType    ::=    AtomicType | KindTest | <"item" "(" ")">
[120]    AtomicType    ::=    QName
[121]    KindTest    ::=    DocumentTest
| ElementTest
| AttributeTest
| SchemaElementTest
| SchemaAttributeTest
| PITest
| CommentTest
| TextTest
| AnyKindTest
[122]    AnyKindTest    ::=    <"node" "("> ")"
[123]    DocumentTest    ::=    <"document-node" "("> (ElementTest | SchemaElementTest)? ")"
[124]    TextTest    ::=    <"text" "("> ")"
[125]    CommentTest    ::=    <"comment" "("> ")"
[126]    PITest    ::=    <"processing-instruction" "("> (NCName | StringLiteral)? ")"
[127]    AttributeTest    ::=    <"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")"
[128]    AttribNameOrWildcard    ::=    AttributeName | "*"
[129]    SchemaAttributeTest    ::=    <"schema-attribute" "("> AttributeDeclaration ")"
[130]    AttributeDeclaration    ::=    AttributeName
[131]    ElementTest    ::=    <"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
[132]    ElementNameOrWildcard    ::=    ElementName | "*"
[133]    SchemaElementTest    ::=    <"schema-element" "("> ElementDeclaration ")"
[134]    ElementDeclaration    ::=    ElementName
[135]    AttributeName    ::=    QName
[136]    ElementName    ::=    QName
[137]    TypeName    ::=    QName
[138]    IntegerLiteral    ::=    Digits
[139]    DecimalLiteral    ::=    ("." Digits) | (Digits "." [0-9]*) /* ws: explicit */
[140]    DoubleLiteral    ::=    (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits /* ws: explicit */
[141]    URILiteral    ::=    StringLiteral
[142]    StringLiteral    ::=    ('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'") /* ws: explicit */
[143]    PITarget    ::=    [http://www.w3.org/TR/REC-xml#NT-PITarget] /* gn: xml-version */
[144]    VarName    ::=    QName
[145]    ValidationMode    ::=    "lax" | "strict"
[146]    Digits    ::=    [0-9]+
[147]    PredefinedEntityRef    ::=    "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" /* ws: explicit */
[148]    CharRef    ::=    [http://www.w3.org/TR/REC-xml#NT-CharRef] /* gn: xml-version */
[149]    EscapeQuot    ::=    '""'
[150]    EscapeApos    ::=    "''"
[151]    ElementContentChar    ::=    Char - [{}<&]
[152]    QuotAttrContentChar    ::=    Char - ["{}<&]
[153]    AposAttrContentChar    ::=    Char - ['{}<&]
[154]    Comment    ::=    "(:" (CommentContents | Comment)* ":)" /* ws: explicit */
/* gn: comments */
[155]    CommentContents    ::=    (Char+ - (Char* ':)' Char*))
[156]    QName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-QName] /* gn: xml-version */
[157]    NCName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-NCName] /* gn: xml-version */
[158]    S    ::=    [http://www.w3.org/TR/REC-xml#NT-S] /* gn: xml-version */
[159]    Char    ::=    [http://www.w3.org/TR/REC-xml#NT-Char] /* gn: xml-version */

The following term definitions help to define this exposition precisely.

[Definition: Each rule in the grammar defines one symbol, in the form]

symbol ::= expression

[Definition: A terminal is a single unit of the grammar that cannot be further subdivided, and is specified in the EBNF by a character or characters in quotes, or a regular expression.] The following expressions are used to match strings of one or more characters in a terminal:

#xN

where N is a hexadecimal integer, the expression matches the character in [ISO/IEC 10646] whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

[a-zA-Z], [#xN-#xN]

matches any Char with a value in the range(s) indicated (inclusive).

[abc], [#xN#xN#xN]

matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.

[^a-z], [^#xN-#xN]

matches any Char with a value outside the range indicated.

[^abc], [^#xN#xN#xN]

matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

"string"

matches a literal string matching that given inside the double quotes.

'string'

matches a literal string matching that given inside the single quotes.

[http://www.w3.org/TR/REC-example/#NT-Example]

matches a production defined in the external specification as per the provided reference. For the purposes of this specification, the entire unit is defined as a terminal.

[Definition: A production combines symbols to form more complex patterns. ] The following productions serve as examples, where A and B represent simple expressions:

(expression)

expression is treated as a unit and may be combined as described in this list.

A?

matches A or nothing; optional A.

A B

matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).

A | B

matches A or B but not both.

A - B

matches any string that matches A but does not match B.

A+

matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).

A*

matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).
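As an illustration of how this notation reads, the numeric literal productions (Digits, IntegerLiteral, DecimalLiteral, DoubleLiteral) can be transcribed almost symbol for symbol into regular expressions. The following Python sketch is not part of the grammar; the function name `classify` and the use of Python's `re` module are assumptions made for the example.

```python
import re

# Direct transcription of the EBNF operators into regex syntax:
#   concatenation -> juxtaposition, "|" -> "|", "?" "*" "+" -> same,
#   [0-9] and [eE] character classes -> same.
DIGITS = r"[0-9]+"                                   # Digits ::= [0-9]+
INTEGER = DIGITS                                     # IntegerLiteral ::= Digits
DECIMAL = rf"(?:\.{DIGITS}|{DIGITS}\.[0-9]*)"        # ("." Digits) | (Digits "." [0-9]*)
DOUBLE = rf"(?:\.{DIGITS}|{DIGITS}(?:\.[0-9]*)?)[eE][+-]?{DIGITS}"

_LITERALS = [
    ("IntegerLiteral", re.compile(INTEGER)),
    ("DecimalLiteral", re.compile(DECIMAL)),
    ("DoubleLiteral", re.compile(DOUBLE)),
]


def classify(text):
    """Return the name of the numeric production that matches all of text,
    or None if no numeric literal production matches (hypothetical helper)."""
    for name, pattern in _LITERALS:
        if pattern.fullmatch(text):
            return name
    return None
```

For instance, `classify("4.")` yields `"DecimalLiteral"` because the second alternative of DecimalLiteral allows zero digits after the point, while `classify(".")` yields `None` because both alternatives require at least one digit.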

A.1.1 Grammar Notes

This section contains general notes on the EBNF productions, which may be helpful in understanding how to create a parser based on this EBNF, how to read the EBNF, and generally call out issues with the syntax. The notes below are referenced from the right side of the production, with the notation: /* gn: <id> */ .

grammar-note: parens

A look-ahead of one character is required to distinguish function patterns from a QName or keyword followed by a Pragma, MUExtension or Comment. For example: address (: this may be empty :) may be mistaken for a call to a function named "address" unless this lookahead is employed. Another example is for (: whom the bell :) $tolls = 3 return $tolls, where the keyword "for" must not be mistaken for a function name.
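The look-ahead can be sketched as follows. This is a minimal Python illustration, not a normative algorithm: the name pattern is a simplified ASCII-only stand-in for QName, the function name `starts_function_call` is hypothetical, and treating "(#" as the start of a Pragma is an assumption based on the Pragma production in A.1.

```python
import re

# Simplified ASCII-only stand-in for QName (assumption; the real production
# is defined by XML Names and admits many more characters).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*(?::[A-Za-z_][A-Za-z0-9_.-]*)?")


def starts_function_call(text, pos=0):
    """Return True if text[pos:] begins with a name followed by a "(" that
    opens an argument list rather than a comment or pragma."""
    m = _NAME.match(text, pos)
    if m is None:
        return False
    i = m.end()
    while i < len(text) and text[i] in " \t\r\n":
        i += 1
    if i >= len(text) or text[i] != "(":
        return False
    # One character of look-ahead past the "(":
    # "(:" opens a comment, "(#" opens a pragma.
    nxt = text[i + 1] if i + 1 < len(text) else ""
    return nxt not in (":", "#")
```

With this check, `address (: this may be empty :)` is not taken for a function call, while `address()` is. Keywords such as "for" still need the separate treatment described under grammar-note: reserved-function-names.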

grammar-note: lt

A tokenizer must be aware of the context in which the "<" pattern appears, in order to distinguish the "<" comparison operator from the "<" tag open symbol. The "<" comparison operator cannot occur in the same places as a "<" tag open pattern.

grammar-note: validate

The ValidateExpr in the exposition, which does not use the "< ... >" token grouping, presents the production in a much simplified, and understandable, form. The ValidateExpr presented in the appendix is technically correct, but structurally hard to understand, because of limitations of the "< ... >" token grouping.

grammar-note: leading-lone-slash

The "/" presents an issue because it occurs either as a stand-alone unit or as a leading prefix that expects a pattern to follow, such as a QName or "*". Both of these patterns may also occur in contexts where operators may occur. Thus, expressions such as "/ * 5" can easily be confused with the path expression "/*". Therefore, a stand-alone slash on the left-hand side of an operator needs to be parenthesized in order to stand alone, as in "(/) * 5". "5 * /", on the other hand, is legal syntax.

grammar-note: comments

Expression comments are allowed inside expressions everywhere that ignorable whitespace is allowed. Note that expression comments are not allowed in constructor content.

Comments can nest within each other, as long as all "(:" and ":)" patterns are balanced, no matter where they occur within the outer comment.

Note:

Lexical analysis may typically handle nested comments by incrementing a counter for each "(:" pattern, and decrementing the counter for each ":)" pattern. The comment does not terminate until the counter is back to zero.
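The counter technique described in the note above can be sketched in Python as follows. The function name `skip_comment` and the choice of `ValueError` for error reporting are assumptions made for the example.

```python
def skip_comment(text, start):
    """Return the index just past the comment that opens at text[start].

    A counter is incremented for each "(:" and decremented for each ":)";
    the comment ends when the counter returns to zero.
    """
    if not text.startswith("(:", start):
        raise ValueError("no comment starts at this position")
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")
```

Because the delimiters are counted without regard to the surrounding text, an unbalanced "(:" inside a comment leaves the counter above zero and the comment unterminated.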

Following are some illustrative examples:

  • for (: set up loop :) $i in $x return $i will parse correctly, ignoring the comment.

  • 5 instance (: strange place for a comment :) of xs:integer will also parse correctly, ignoring the comment.

  • <eg (: an example:)> $i//title </eg> will cause a syntax error.

  • <eg> (: an example:) </eg> will parse correctly, but the characters inside the element are element content, and not an expression comment.

See Comments, Pragmas and Extensions for further information and examples.

grammar-note: xml-version

An implementation's choice (see 2.6.5 XML and Names 1.1 Feature) to support the [XML 1.0] and [XML Names], or [XML 1.1] and [XML Names 1.1] lexical specification determines the external document from which to obtain the definition for this production. For convenience, XML 1.0 references are always used. In some cases, the XML 1.0 and XML 1.1 definitions may be exactly the same. Also please note that these external productions follow the whitespace rules of their respective specifications, and not the rules of this specification, in particular A.2.2.1 Default Whitespace Handling. Thus prefix : localname is not a valid QName for purposes of this specification, just as it is not permitted in a textual XML document. Also, comments are not permissible on either side of the colon.

grammar-note: reserved-function-names

Some unprefixed function names may be confused with expression syntax by the parser. For instance, if(foo) could either be a function name invocation or an incomplete IfExpr. Therefore it is not legal syntax for a user to invoke functions with these names. See A.3 Reserved Function Names for a list of these names.

grammar-note: occurrence-indicators

The '+' and '*' Kleene operators are tightly bound to the SequenceType expression, and have higher precedence than other uses of these symbols. Any occurrence of '+' and '*', as well as '?', following a sequence type is assumed to be an occurrence indicator. Thus, 4 treat as item() + - 5 is interpreted as (4 treat as item()+) - 5, so that the '+' would be an occurrence indicator, and the '-' in this case would be a subtraction operator. To have this interpreted as the addition of a negative number, the form (4 treat as item()) + -5 would have to be used, so that the SequenceType expression is bounded by a parenthesis.

A.2 Lexical structure

It is implementation-defined whether the lexical rules of [XML 1.0] and [XML Names] are followed, or alternatively, the lexical rules of [XML 1.1] and [XML Names 1.1] are followed.

Note:

Implementations that support the full [XML 1.1] character set may wish, for purposes of interoperability, to provide a mode that follows only the [XML 1.0] and [XML Names] lexical rules.

When patterns are simple string matches, the strings are embedded directly into the EBNF. In other cases, named terminals are used.

It is up to an implementation to decide on the exact tokenization strategy, which may be different depending on the parser construction. In the EBNF, the notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols.

When tokenizing, the longest possible match that is valid in the current context is preferred.

All keywords are case sensitive. Keywords are not reserved; that is, any QName may duplicate a keyword except as noted in A.3 Reserved Function Names.

Editorial note
The tables found in A.2.4 Lexical Rules of the previous draft have been made non-normative and removed from this document. They will subsequently be republished in a non-normative W3C Note.

A.2.1 Terminal Types

The entire set of terminals in XQuery 1.0 may be divided into two major classes: those that can act as token delimiters, and those that cannot.

[Definition: A delimiting terminal may delimit adjacent non-delimiting terminals.] The following is the list of delimiting terminals:

"<?", "?>", "::", "*", "$", "?", ")", "(", "{", ":", "/", "//", "=", "!=", "<=", "<<", ">=", ">>", ":=", "<", ">", "-", "+", "|", "@", "[", "]", ",", ";", "%%%", """, ".", "..", "<![CDATA[", "]]>", PredefinedEntityRef, CharRef, "/>", "</", "{{", "}}", EscapeQuot, EscapeApos, "'", Pragma, "(::", "::)", Comment, "(:", ":)", "<!--", "-->", S, "}"

[Definition: Non-delimiting terminals generally start and end with alphabetic characters or digits. Adjacent non-delimiting terminals must be delimited by a delimiting terminal.] The following is the list of non-delimiting terminals:

DecimalLiteral, DoubleLiteral, StringLiteral, "xquery", "version", "encoding", "at", "module", "namespace", "child", "descendant", "parent", "attribute", "self", "descendant-or-self", "ancestor", "following-sibling", "preceding-sibling", "following", "preceding", "ancestor-or-self", "declare", "function", "option", "ordering", "ordered", "unordered", "default", "order", "external", "or", "and", "div", "idiv", "mod", "in", ValidationMode, "construction", "satisfies", "return", "then", "else", "boundary-space", "base-uri", "preserve", "strip", "copy-namespaces", "no-preserve", "inherit", "no-inherit", "to", "where", "collation", "intersect", "union", "except", "as", "case", "instance", "of", "castable", "item", "element", "schema-element", "schema-attribute", "processing-instruction", "comment", "text", "empty", "import", "schema", "is", "eq", "ne", "gt", "ge", "lt", "le", "some", "every", "for", "let", "cast", "treat", "validate", Digits, "document-node", "document", "node", "if", "typeswitch", "by", "stable", "ascending", "descending", "greatest", "least", "variable", CommentContents, QName, NCName

A.2.2 Whitespace Rules

A.2.2.1 Default Whitespace Handling

[Definition: Whitespace characters are defined by [http://www.w3.org/TR/REC-xml#NT-S] when these characters occur outside of a StringLiteral.]

[Definition: Unless otherwise specified (see A.2.2.2 Explicit Whitespace Handling), Ignorable whitespace may occur between terminals, and is not significant to the parse tree. For readability, whitespace may be used in most expressions even though not explicitly notated in the EBNF. All allowable whitespace that is not explicitly specified in the EBNF is ignorable whitespace, and conversely, this term does not apply to whitespace that is explicitly specified.] Whitespace is allowed before the first token and after the last token of a module. Whitespace is optional between delimiting terminals. Whitespace is required to prevent two adjacent non-delimiting terminals from being (mis-)recognized as one. Comments may also act as "whitespace" to prevent two adjacent tokens from being recognized as one. Some illustrative examples are as follows:

  • foo- foo is a syntax error. "foo-" would be recognized as a QName.

  • foo -foo parses the same as foo - foo, two QNames separated by a subtraction operator. The parser would match the first "foo" as a QName. The parser would then expect to match an operator, which is satisfied by "-" (but not "-foo"). The last foo would then be matched as another QName.

  • foo(: This is a comment :)- foo also parses the same as foo - foo . This is because the comment prevents the two adjacent tokens from being recognized as one.

  • foo-foo parses as a single QName. This is because "-" is a valid character in a QName. When used as an operator after the characters of a name, the "-" must be separated from the name, e.g. by using whitespace or parentheses.

  • 10div 3 results in a syntax error, since the "10" and the "div" would both be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.

  • 10 div3 also results in a syntax error, since the "div" and the "3" would both be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.

  • 10div3 also results in a syntax error, since the "10", "div" and the "3" would all be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.
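The examples above can be reproduced with a small longest-match tokenizer. The following Python sketch is illustrative only: the terminal patterns are ASCII-only simplifications of NCName and Digits, only a handful of delimiting terminals are recognized, and the names `tokenize` and `_skip_comment` are hypothetical.

```python
import re

# Simplified ASCII-only terminals; assumptions for illustration only.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")  # stand-in for NCName
DIGITS = re.compile(r"[0-9]+")
DELIMITERS = set("-+*(),")  # a small subset of the delimiting terminals


def _skip_comment(text, i):
    """Return the index just past the (possibly nested) comment at text[i]."""
    depth = 0
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise SyntaxError("unterminated comment")


def tokenize(text):
    """Split text into terminals, preferring the longest match and rejecting
    two non-delimiting terminals that touch without a separator."""
    tokens = []
    i = 0
    last_nondelim_end = -1  # where the previous non-delimiting terminal ended
    while i < len(text):
        if text[i] in " \t\r\n":
            i += 1
        elif text.startswith("(:", i):
            i = _skip_comment(text, i)  # a comment separates like whitespace
        else:
            m = NAME.match(text, i) or DIGITS.match(text, i)
            if m:
                if i == last_nondelim_end:
                    raise SyntaxError(f"missing delimiter before {m.group()!r}")
                tokens.append(m.group())
                i = last_nondelim_end = m.end()
            elif text[i] in DELIMITERS:
                tokens.append(text[i])
                i += 1
            else:
                raise SyntaxError(f"unexpected character {text[i]!r}")
    return tokens
```

Under these assumptions, `tokenize("foo-foo")` returns a single name, `tokenize("foo -foo")` returns three tokens, and `tokenize("10div 3")` raises `SyntaxError`. Inputs such as `foo- foo` tokenize successfully into two names; rejecting them is left to the parser, which finds no operator between the two QNames.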

A.2.3 Comments, Pragmas and Extensions

Pragmas and MUExtensions may be used anywhere that ignorable whitespace is allowed. Within a Pragma or MUExtension, the extension content may consist of any sequence of characters that does not include the sequence "::)". Pragmas and MUExtensions are not allowed to nest. Comments are allowed to nest, though the content of a comment must have balanced comment delimiters without regard to structure. Some illustrative examples:

A.3 Reserved Function Names

The following names are not recognized as function names in an unprefixed form because expression syntax takes precedence.

  • attribute

  • comment

  • document-node

  • element

  • empty

  • if

  • item

  • node

  • processing-instruction

  • schema-attribute

  • schema-element

  • text

  • typeswitch
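A parser might apply this rule with a simple set lookup. The Python sketch below is an assumption about how an implementation could do so; the names `RESERVED_FUNCTION_NAMES` and `may_be_function_name` are hypothetical, and the set simply mirrors the list above.

```python
# The names listed above; unprefixed, they are read as expression syntax.
RESERVED_FUNCTION_NAMES = frozenset({
    "attribute", "comment", "document-node", "element", "empty", "if",
    "item", "node", "processing-instruction", "schema-attribute",
    "schema-element", "text", "typeswitch",
})


def may_be_function_name(qname):
    """Return True if qname followed by "(" can be read as a function call.

    A prefixed name (one containing ":") is never reserved; an unprefixed
    name is allowed only if it is absent from the reserved list.
    """
    return ":" in qname or qname not in RESERVED_FUNCTION_NAMES
```

For example, `may_be_function_name("if")` is false, so `if(foo)` is read as the start of an IfExpr, whereas `may_be_function_name("local:if")` is true.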

A.4 Precedence Order

The grammar defines built-in precedence among the operators of XQuery. These operators are summarised here in order of their precedence from lowest to highest. Operators that have a lower precedence number cannot be contained by operators with a higher precedence number. The associativity column indicates the order in which operators of equal precedence in an expression are applied.

The operators in order of increasing precedence are:

#    Operator    Associativity
1    , (comma)    left-to-right
2    := (assignment)    right-to-left
3    for, some, every, typeswitch, if    left-to-right
4    or    left-to-right
5    and    left-to-right
6    eq, ne, lt, le, gt, ge, =, !=, <, <=, >, >=, is, <<, >>    left-to-right
7    to    left-to-right
8    +, -    left-to-right
9    *, div, idiv, mod    left-to-right
10    union, |    left-to-right
11    intersect, except    left-to-right
12    instance of    left-to-right
13    treat    left-to-right
14    castable    left-to-right
15    cast    left-to-right
16    -(unary), +(unary)    right-to-left
17    ?, *(OccurrenceIndicator), +(OccurrenceIndicator)    left-to-right
18    validate, /, //    left-to-right
19    [ ], ( ), {}    left-to-right
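To show how such a table drives parsing, the following Python sketch applies precedence climbing to a few of the binary operators listed above. It is an illustration under stated assumptions, not the normative grammar: the input is a pre-split token list, operands are opaque strings, every operator is treated as left-associative (the EBNF actually allows at most one comparison or "to" per level), and the names `PRECEDENCE` and `parse` are hypothetical.

```python
# Precedence levels follow the order of the table (or lowest, "*" highest
# among the operators included here).
PRECEDENCE = {
    "or": 4, "and": 5,
    "eq": 6, "ne": 6, "lt": 6, "le": 6, "gt": 6, "ge": 6, "=": 6, "!=": 6,
    "to": 7,
    "+": 8, "-": 8,
    "*": 9, "div": 9, "idiv": 9, "mod": 9,
}


def parse(tokens):
    """Parse a token list into nested (operator, left, right) tuples."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = climb(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok == ")" or tok in PRECEDENCE:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def climb(min_level):
        nonlocal pos
        left = primary()
        # Absorb operators at or above min_level; the right operand may only
        # contain operators of strictly higher precedence (left-to-right).
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_level):
            op = tokens[pos]
            pos += 1
            right = climb(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = climb(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, `parse("1 + 2 * 3".split())` groups the multiplication first, and `parse("1 - 2 - 3".split())` groups from the left, which is the behaviour the associativity column describes.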