The following grammar uses the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0] with the following minor differences. The notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols. To help readability, this "< ... >" notation is absent in the EBNF in the main body of this document. This appendix is the normative version of the EBNF.
Comments on grammar productions are between '/*' and '*/' symbols; please note that these comments are normative. A 'gn:' prefix means a 'Grammar Note', which is meant as a clarification for parsing rules and is explained in A.1.1 Grammar Notes. A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in A.2.2 Whitespace Rules.
| [1] |
Module
|
::= |
VersionDecl? (MainModule | LibraryModule)
|
|
| [2] |
VersionDecl
|
::= |
<"xquery" "version"> StringLiteral ("encoding" StringLiteral)? Separator
|
|
| [3] |
MainModule
|
::= |
Prolog QueryBody
|
|
| [4] |
LibraryModule
|
::= |
ModuleDecl Prolog
|
|
| [5] |
ModuleDecl
|
::= |
<"module" "namespace"> NCName "=" URILiteral Separator
|
|
| [6] |
Prolog
|
::= |
(Setter Separator)* ((Import | NamespaceDecl | DefaultNamespaceDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
|
|
| [7] |
Setter
|
::= |
BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
|
|
| [8] |
Import
|
::= |
SchemaImport | ModuleImport
|
|
| [9] |
Separator
|
::= |
";"
|
|
| [10] |
NamespaceDecl
|
::= |
<"declare" "namespace"> NCName "=" URILiteral
|
|
| [11] |
BoundarySpaceDecl
|
::= |
<"declare" "boundary-space"> ("preserve" | "strip")
|
|
| [12] |
DefaultNamespaceDecl
|
::= |
(<"declare" "default" "element"> | <"declare" "default" "function">) "namespace" URILiteral
|
|
| [13] |
OptionDecl
|
::= |
<"declare" "option"> QName StringLiteral
|
|
| [14] |
OrderingModeDecl
|
::= |
<"declare" "ordering"> ("ordered" | "unordered")
|
|
| [15] |
EmptyOrderDecl
|
::= |
<"declare" "default" "order"> (<"empty" "greatest"> | <"empty" "least">)
|
|
| [16] |
CopyNamespacesDecl
|
::= |
<"declare" "
| |
| [17] |
PreserveMode
|
::= |
"preserve" | "no-preserve"
|
|
| [18] |
InheritMode
|
::= |
"inherit" | "no-inherit"
|
|
| [19] |
DefaultCollationDecl
|
::= |
<"declare" "default" "collation"> URILiteral
|
|
| [20] |
BaseURIDecl
|
::= |
<"declare" "base-uri"> URILiteral
|
|
| [21] |
SchemaImport
|
::= |
<"import" "schema"> SchemaPrefix? URILiteral (<"at" URILiteral> ("," URILiteral)*)?
|
|
| [22] |
SchemaPrefix
|
::= |
("namespace" NCName "=") | (<"default" "element"> "namespace")
|
|
| [23] |
ModuleImport
|
::= |
<"import" "module"> ("namespace" NCName "=")? URILiteral (<"at" URILiteral> ("," URILiteral)*)?
|
|
| [24] |
VarDecl
|
::= |
<"declare" "variable" "$"> VarName TypeDeclaration? ((":=" ExprSingle) | "external")
|
|
| [25] |
ConstructionDecl
|
::= |
<"declare" "construction"> ("preserve" | "strip")
|
|
| [26] |
FunctionDecl
|
::= |
<"declare" "function"> <QName "("> ParamList? (")" | (<")" "as"> SequenceType)) (EnclosedExpr | "external")
|
|
| [27] |
ParamList
|
::= |
Param ("," Param)*
|
|
| [28] |
Param
|
::= |
"$" VarName TypeDeclaration?
|
|
| [29] |
EnclosedExpr
|
::= |
"{" Expr "}"
|
|
| [30] |
QueryBody
|
::= |
Expr
|
|
| [31] |
Expr
|
::= |
ExprSingle ("," ExprSingle)*
|
|
| [32] |
ExprSingle
|
::= |
FLWORExpr
|
|
| [33] |
FLWORExpr
|
::= |
(ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
|
|
| [34] |
ForClause
|
::= |
<"for" "$"> VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*
|
|
| [35] |
PositionalVar
|
::= |
"at" "$" VarName
|
|
| [36] |
LetClause
|
::= |
<"let" "$"> VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*
|
|
| [37] |
WhereClause
|
::= |
"where" ExprSingle
|
|
| [38] |
OrderByClause
|
::= |
(<"order" "by"> | <"stable" "order" "by">) OrderSpecList
|
|
| [39] |
OrderSpecList
|
::= |
OrderSpec ("," OrderSpec)*
|
|
| [40] |
OrderSpec
|
::= |
ExprSingle OrderModifier
|
|
| [41] |
OrderModifier
|
::= |
("ascending" | "descending")? (<"empty" "greatest"> | <"empty" "least">)? ("collation" URILiteral)?
|
|
| [42] |
QuantifiedExpr
|
::= |
(<"some" "$"> | <"every" "$">) VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
|
|
| [43] |
TypeswitchExpr
|
::= |
<"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
|
|
| [44] |
CaseClause
|
::= |
"case" ("$" VarName "as")? SequenceType "return" ExprSingle
|
|
| [45] |
IfExpr
|
::= |
<"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle
|
|
| [46] |
OrExpr
|
::= |
AndExpr ( "or" AndExpr )*
|
|
| [47] |
AndExpr
|
::= |
ComparisonExpr ( "and" ComparisonExpr )*
|
|
| [48] |
ComparisonExpr
|
::= |
RangeExpr ( (ValueComp
|
|
| [49] |
RangeExpr
|
::= |
AdditiveExpr ( "to" AdditiveExpr )?
|
|
| [50] |
AdditiveExpr
|
::= |
MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
|
|
| [51] |
MultiplicativeExpr
|
::= |
UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
|
|
| [52] |
UnionExpr
|
::= |
IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
|
|
| [53] |
IntersectExceptExpr
|
::= |
InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
|
|
| [54] |
InstanceofExpr
|
::= |
TreatExpr ( <"instance" "of"> SequenceType )?
|
|
| [55] |
TreatExpr
|
::= |
CastableExpr ( <"treat" "as"> SequenceType )?
|
|
| [56] |
CastableExpr
|
::= |
CastExpr ( <"castable" "as"> SingleType )?
|
|
| [57] |
CastExpr
|
::= |
UnaryExpr ( <"cast" "as"> SingleType )?
|
|
| [58] |
UnaryExpr
|
::= |
("-" | "+")* ValueExpr
|
|
| [59] |
ValueExpr
|
::= |
ValidateExpr | PathExpr | ExtensionExpr
|
|
| [60] |
GeneralComp
|
::= |
"=" | "!=" | "<" | "<=" | ">" | ">="
|
/* gn: lt */ |
| [61] |
ValueComp
|
::= |
"eq" | "ne" | "lt" | "le" | "gt" | "ge"
|
|
| [62] |
NodeComp
|
::= |
"is" | "<<" | ">>"
|
|
| [63] |
ValidateExpr
|
::= |
(<"validate" "{"> | (<"validate" ValidationMode> "{")) Expr "}"
|
/* gn: validate */ |
| [64] |
ExtensionExpr
|
::= |
Pragma+ "{" Expr? "}"
|
|
| [65] |
Pragma
|
::= |
"(#" S? QName PragmaContents "#)"
|
/* ws: explicit */ |
| [66] |
PragmaContents
|
::= |
(Char* - (Char* '#)' Char*))
|
|
| [67] |
PathExpr
|
::= |
("/" RelativePathExpr?)
|
/* gn: leading-lone-slash */ |
| [68] |
RelativePathExpr
|
::= |
StepExpr (("/" | "//") StepExpr)*
|
|
| [69] |
StepExpr
|
::= |
AxisStep | FilterExpr
|
|
| [70] |
AxisStep
|
::= |
(ForwardStep | ReverseStep) PredicateList
|
|
| [71] |
ForwardStep
|
::= |
(ForwardAxis NodeTest) | AbbrevForwardStep
|
|
| [72] |
ForwardAxis
|
::= |
<"child" "::">
|
|
| [73] |
AbbrevForwardStep
|
::= |
"@"? NodeTest
|
|
| [74] |
ReverseStep
|
::= |
(ReverseAxis NodeTest) | AbbrevReverseStep
|
|
| [75] |
ReverseAxis
|
::= |
<"parent" "::">
|
|
| [76] |
AbbrevReverseStep
|
::= |
".."
|
|
| [77] |
NodeTest
|
::= |
KindTest | NameTest
|
|
| [78] |
NameTest
|
::= |
QName | Wildcard
|
|
| [79] |
Wildcard
|
::= |
"*"
|
/* ws: explicit */ |
| [80] |
FilterExpr
|
::= |
PrimaryExpr PredicateList
|
|
| [81] |
PredicateList
|
::= |
Predicate*
|
|
| [82] |
Predicate
|
::= |
"[" Expr "]"
|
|
| [83] |
PrimaryExpr
|
::= |
Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | Constructor | OrderedExpr | UnorderedExpr
|
|
| [84] |
Literal
|
::= |
NumericLiteral | StringLiteral
|
|
| [85] |
NumericLiteral
|
::= |
IntegerLiteral | DecimalLiteral | DoubleLiteral
|
|
| [86] |
VarRef
|
::= |
"$" VarName
|
|
| [87] |
ParenthesizedExpr
|
::= |
"(" Expr? ")"
|
|
| [88] |
ContextItemExpr
|
::= |
"."
|
|
| [89] |
OrderedExpr
|
::= |
<"ordered" "{"> Expr "}"
|
|
| [90] |
UnorderedExpr
|
::= |
<"unordered" "{"> Expr "}"
|
|
| [91] |
FunctionCall
|
::= |
<QName "("> (ExprSingle ("," ExprSingle)*)? ")"
|
/* gn: parens */ |
/* gn: reserved-function-names */ |
| [92] |
Constructor
|
::= |
DirectConstructor
|
|
| [93] |
DirectConstructor
|
::= |
DirElemConstructor
|
|
| [94] |
DirElemConstructor
|
::= |
"<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">"))
|
/* ws: explicit */ |
/* gn: lt */ |
| [95] |
DirAttributeList
|
::= |
(S (QName S? "=" S? DirAttributeValue)?)*
|
/* ws: explicit */ |
| [96] |
DirAttributeValue
|
::= |
('"' (EscapeQuot | QuotAttrValueContent)* '"')
|
/* ws: explicit */ |
| [97] |
QuotAttrValueContent
|
::= |
QuotAttrContentChar
|
|
| [98] |
AposAttrValueContent
|
::= |
AposAttrContentChar
|
|
| [99] |
DirElemContent
|
::= |
DirectConstructor
|
|
| [100] |
CommonContent
|
::= |
PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
|
|
| [101] |
DirCommentConstructor
|
::= |
"<!--" DirCommentContents "-->"
|
/* ws: explicit */ |
| [102] |
DirCommentContents
|
::= |
((Char - '-') | <'-' (Char - '-')>)*
|
/* ws: explicit */ |
| [103] |
DirPIConstructor
|
::= |
"<?" PITarget (S DirPIContents)? "?>"
|
/* ws: explicit */ |
| [104] |
DirPIContents
|
::= |
(Char* - (Char* '?>' Char*))
|
/* ws: explicit */ |
| [105] |
CDataSection
|
::= |
"<![CDATA[" CDataSectionContents "]]>"
|
/* ws: explicit */ |
| [106] |
CDataSectionContents
|
::= |
(Char* - (Char* ']]>' Char*))
|
/* ws: explicit */ |
| [107] |
ComputedConstructor
|
::= |
CompDocConstructor
|
|
| [108] |
CompDocConstructor
|
::= |
<"document" "{"> Expr "}"
|
|
| [109] |
CompElemConstructor
|
::= |
(<"element" QName "{"> | (<"element" "{"> Expr "}" "{")) ContentExpr? "}"
|
|
| [110] |
ContentExpr
|
::= |
Expr
|
|
| [111] |
CompAttrConstructor
|
::= |
(<"attribute" QName "{"> | (<"attribute" "{"> Expr "}" "{")) Expr? "}"
|
|
| [112] |
CompTextConstructor
|
::= |
<"text" "{"> Expr "}"
|
|
| [113] |
CompCommentConstructor
|
::= |
<"comment" "{"> Expr "}"
|
|
| [114] |
CompPIConstructor
|
::= |
(<"processing-instruction" NCName "{"> | (<"processing-instruction" "{"> Expr "}" "{")) Expr? "}"
|
|
| [115] |
SingleType
|
::= |
AtomicType "?"?
|
|
| [116] |
TypeDeclaration
|
::= |
"as" SequenceType
|
|
| [117] |
SequenceType
|
::= |
(ItemType OccurrenceIndicator?)
|
|
| [118] |
OccurrenceIndicator
|
::= |
"?" | "*" | "+"
|
/* gn: occurrence-indicators */ |
| [119] |
ItemType
|
::= |
AtomicType | KindTest | <"item" "(" ")">
|
|
| [120] |
AtomicType
|
::= |
QName
|
|
| [121] |
KindTest
|
::= |
DocumentTest
|
|
| [122] |
AnyKindTest
|
::= |
<"node" "("> ")"
|
|
| [123] |
DocumentTest
|
::= |
<"document-node" "("> (ElementTest | SchemaElementTest)? ")"
|
|
| [124] |
TextTest
|
::= |
<"text" "("> ")"
|
|
| [125] |
CommentTest
|
::= |
<"comment" "("> ")"
|
|
| [126] |
PITest
|
::= |
<"processing-instruction" "("> (NCName | StringLiteral)? ")"
|
|
| [127] |
AttributeTest
|
::= |
<"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")"
|
|
| [128] |
AttribNameOrWildcard
|
::= |
AttributeName | "*"
|
|
| [129] |
SchemaAttributeTest
|
::= |
<"schema-attribute" "("> AttributeDeclaration ")"
|
|
| [130] |
AttributeDeclaration
|
::= |
AttributeName
|
|
| [131] |
ElementTest
|
::= |
<"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
|
|
| [132] |
ElementNameOrWildcard
|
::= |
ElementName | "*"
|
|
| [133] |
SchemaElementTest
|
::= |
<"schema-element" "("> ElementDeclaration ")"
|
|
| [134] |
ElementDeclaration
|
::= |
ElementName
|
|
| [135] |
AttributeName
|
::= |
QName
|
|
| [136] |
ElementName
|
::= |
QName
|
|
| [137] |
TypeName
|
::= |
QName
|
| [138] |
IntegerLiteral
|
::= |
Digits
|
|
| [139] |
DecimalLiteral
|
::= |
("." Digits) | (Digits "." [0-9]*)
|
/* ws: explicit */ |
| [140] |
DoubleLiteral
|
::= |
(("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits
|
/* ws: explicit */ |
| [141] |
URILiteral
|
::= |
StringLiteral
|
|
| [142] |
StringLiteral
|
::= |
('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'")
|
/* ws: explicit */ |
| [143] |
PITarget
|
::= |
[http://www.w3.org/TR/REC-xml#NT-PITarget]
|
/* gn: xml-version */ |
| [144] |
VarName
|
::= |
QName
|
|
| [145] |
ValidationMode
|
::= |
"lax" | "strict"
|
|
| [146] |
Digits
|
::= |
[0-9]+
|
|
| [147] |
PredefinedEntityRef
|
::= |
"&" ("lt" | "gt" | "amp" | "quot" | "apos") ";"
|
/* ws: explicit */ |
| [148] |
CharRef
|
::= |
[http://www.w3.org/TR/REC-xml#NT-CharRef]
|
/* gn: xml-version */ |
| [149] |
EscapeQuot
|
::= |
'""'
|
|
| [150] |
EscapeApos
|
::= |
"''"
|
|
| [151] |
ElementContentChar
|
::= |
Char - [{}<&]
|
|
| [152] |
QuotAttrContentChar
|
::= |
Char - ["{}<&]
|
|
| [153] |
AposAttrContentChar
|
::= |
Char - ['{}<&]
|
|
| [154] |
Comment
|
::= |
"(:" (CommentContents | Comment)* ":)"
|
/* ws: explicit */ |
/* gn: comments */ |
| [155] |
CommentContents
|
::= |
(Char+ - (Char* ':)' Char*))
|
|
| [156] |
QName
|
::= |
[http://www.w3.org/TR/REC-xml-names/#NT-QName]
|
/* gn: xml-version */ |
| [157] |
NCName
|
::= |
[http://www.w3.org/TR/REC-xml-names/#NT-NCName]
|
/* gn: xml-version */ |
| [158] |
S
|
::= |
[http://www.w3.org/TR/REC-xml#NT-S]
|
/* gn: xml-version */ |
| [159] |
Char
|
::= |
[http://www.w3.org/TR/REC-xml#NT-Char]
|
/* gn: xml-version */ |
The following definitions of terms will be helpful in making this exposition precise.
[Definition: Each rule in the grammar defines one symbol, in the form]
symbol ::= expression
[Definition: A terminal is a single unit of the grammar that can not be further subdivided, and is specified in the EBNF by a character or characters in quotes, or a regular expression.] The following expressions are used to match strings of one or more characters in a terminal:
#xN
where N is a hexadecimal integer, the expression matches the character in [ISO/IEC 10646] whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.
[a-zA-Z], [#xN-#xN]
matches any Char with a value in the range(s) indicated (inclusive).
[abc], [#xN#xN#xN]
matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.
[^a-z], [^#xN-#xN]
matches any Char with a value outside the range indicated.
[^abc], [^#xN#xN#xN]
matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.
"string"
matches a literal string matching that given inside the double quotes.
'string'
matches a literal string matching that given inside the single quotes.
[http://www.w3.org/TR/REC-example/#NT-Example]
matches a production defined in the external specification as per the provided reference. For the purposes of this specification, the entire unit is defined as a terminal.
[Definition: A production combines symbols to form more complex patterns. ] The following productions serve as examples, where A and B represent simple expressions:
(expression)
expression is treated as a unit and may be combined as described in this list.
A?
matches A or nothing; optional A.
A B
matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).
A | B
matches A or B but not both.
A - B
matches any string that matches A but does not match B.
A+
matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).
A*
matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).
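As a rough illustration of how the difference, alternation and repetition operators read, the DirCommentContents production `((Char - '-') | <'-' (Char - '-')>)*` can be transliterated into a regular expression. The sketch below uses Python's `re` module; the names `DIR_COMMENT_CONTENTS` and `is_dir_comment_contents` are hypothetical, and Char is approximated as "any character", which is an assumption rather than the XML definition.

```python
import re

# (Char - '-')        -> [^-]      any character except '-'
# <'-' (Char - '-')>  -> -[^-]     a '-' followed by a character that is not '-'
# (A | B)*            -> (?:A|B)*  zero or more repetitions of either alternative
DIR_COMMENT_CONTENTS = re.compile(r"(?:[^-]|-[^-])*")


def is_dir_comment_contents(text: str) -> bool:
    """Return True if the whole string matches the transliterated production."""
    return DIR_COMMENT_CONTENTS.fullmatch(text) is not None
```

Under this reading, a string containing "--" or ending in "-" is rejected, which is the effect the subtraction operator is used for in that production.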
This section contains general notes on the EBNF productions, which may be helpful in understanding how to create a parser based on this EBNF, how to read the EBNF, and generally call out issues with the syntax. The notes below are referenced from the right side of the production, with the notation: /* gn: <id> */ .
A look-ahead of one character is required to distinguish function patterns from a QName or keyword followed by a Pragma, MUExtension or Comment. For example:
address (: this may be empty :)
may be mistaken for a call to a function named "address" unless this lookahead is employed. Another example is
for (: whom the bell :) $tolls = 3 return $tolls
, where the keyword "for" must not be mistaken for a function name.
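The one-character look-ahead described above might be sketched as follows. This is an illustrative helper only: the function name `is_function_call_start` and its parameters are hypothetical, and it assumes the caller has already recognized a name ending at `name_end`. It checks only that the character after "(" is not ":"; it does not skip a comment that sits between the name and a real argument list.

```python
def is_function_call_start(text: str, name_end: int) -> bool:
    """Decide whether the name ending at name_end begins a function call.

    Whitespace after the name is skipped and "(" is required. One more
    character is then inspected: "(:" opens a comment, so the parenthesis
    cannot be the start of an argument list.
    """
    pos = name_end
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text) or text[pos] != "(":
        return False
    following = text[pos + 1 : pos + 2]  # empty string at end of input
    return following != ":"
```

With the two examples from the text, `address (: this may be empty :)` and `for (: whom the bell :) $tolls = 3 return $tolls`, the helper reports that neither name starts a function call.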
A tokenizer must be aware of the context in which the "<" pattern appears, in order to distinguish the "<" comparison operator from the "<" tag open symbol. The "<" comparison operator can not occur in the same places as a "<" tag open pattern.
The ValidateExpr in the exposition, which does not use the "< ... >" token grouping, presents the production in a much simplified, and understandable, form. The ValidateExpr presented in the appendix is technically correct, but structurally hard to understand, because of limitations of the "< ... >" token grouping.
The "/" presents an issue because it occurs either as a stand-alone unit or as a leading prefix that expects a pattern to follow, such as a QName or "*". Both of these patterns may also occur in contexts where operators may occur. Thus, expressions such as "/ * 5" can easily be confused with the path expression "/*". Therefore, a stand-alone slash on the left-hand side of an operator needs to be parenthesized in order to stand alone, as in "(/) * 5". "5 * /", on the other hand, is legal syntax.
Expression comments are allowed inside expressions everywhere that ignorable whitespace is allowed. Note that expression comments are not allowed in constructor content.
Comments can nest within each other, as long as all "(:" and ":)" patterns are balanced, no matter where they occur within the outer comment.
Note:
Lexical analysis may typically handle nested comments by incrementing a counter for each "(:" pattern, and decrementing the counter for each ":)" pattern. The comment does not terminate until the counter is back to zero.
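The counter technique in the note above can be sketched in a few lines. The function name `skip_comment` is hypothetical, and the sketch looks only at the "(:" and ":)" patterns, exactly as the note describes, without regard to string literals or other structure inside the comment.

```python
def skip_comment(text: str, start: int) -> int:
    """Return the index just past the comment that opens at text[start].

    A counter is incremented for every "(:" and decremented for every ":)";
    the comment ends when the counter returns to zero.
    """
    if not text.startswith("(:", start):
        raise ValueError("no comment starts at this position")
    depth = 0
    pos = start
    while pos < len(text):
        if text.startswith("(:", pos):
            depth += 1
            pos += 2
        elif text.startswith(":)", pos):
            depth -= 1
            pos += 2
            if depth == 0:
                return pos
        else:
            pos += 1
    raise SyntaxError("unterminated comment")
```

For example, `skip_comment("(: a (: b :) c :) rest", 0)` stops after the outer ":)", leaving `" rest"` to be tokenized, while an input whose "(:" and ":)" patterns are unbalanced raises `SyntaxError`.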
Following are some illustrative examples:
for (: set up loop :) $i in $x return $i
will parse correctly, ignoring the comment.
5 instance (: strange place for a comment :) of xs:integer
will also parse correctly, ignoring the comment.
<eg (: an example:)> $i//title </eg>
will cause a syntax error.
<eg> (: an example:) </eg>
will parse correctly, but the characters inside the element are element content, and not an expression comment.
See Comments, Pragmas and Extensions for further information and examples.
An implementation's choice (see 2.6.5 XML and Names 1.1 Feature) to support the [XML 1.0] and [XML Names], or [XML 1.1] and [XML Names 1.1] lexical specification determines the external document from which to obtain the definition for this production. For convenience, XML 1.0 references are always used. In some cases, the XML 1.0 and XML 1.1 definitions may be exactly the same. Also please note that these external productions follow the whitespace rules of their respective specifications, and not the rules of this specification, in particular
A.2.2.1 Default Whitespace Handling
. Thus
prefix : localname
is not a valid QName for purposes of this specification, just as it is not permitted in a textual XML document. Also, comments are not permissible on either side of the colon.
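A lexical check along these lines might look like the sketch below. The pattern is an ASCII-only approximation of NCName assumed here for brevity (the real definition in [XML Names] admits a much larger character set), and the names `NCNAME`, `QNAME` and `is_lexical_qname` are hypothetical.

```python
import re

# ASCII-only stand-in for NCName; an assumption made for this illustration.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
# An optional prefix and colon, then the local part, with nothing in between.
QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def is_lexical_qname(text: str) -> bool:
    """Return True if text is a QName with no whitespace or comments inside."""
    return QNAME.fullmatch(text) is not None
```

Because the pattern allows nothing between the prefix, the colon and the local name, `prefix : localname` is rejected while `prefix:localname` is accepted.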
Some unprefixed function names may be confused with expression syntax by the parser. For instance,
if(foo)
could either be a function name invocation or an incomplete IfExpr. Therefore it is not legal syntax for a user to invoke functions with these names. See
A.3 Reserved Function Names
for a list of these names.
The '+' and '*' Kleene operators are tightly bound to the SequenceType expression, and have higher precedence than other uses of these symbols. Any occurrence of '+' and '*', as well as '?', following a sequence type is assumed to be an occurrence indicator. Thus,
4 treat as item() + - 5
is interpreted as
(4 treat as item()+) - 5
, so that the '+' would be an occurrence operator, and the '-' in this case would be a subtraction operator. To have this interpreted as an addition operator of a negative number, the form
(4 treat as item()) + -5
would have to be used, so that the SequenceType expression is bounded by parentheses.
It is implementation-defined whether the lexical rules of [XML 1.0] and [XML Names] are followed, or alternatively, the lexical rules of [XML 1.1] and [XML Names 1.1] are followed.
Note:
Implementations that support the full [XML 1.1] character set may wish, for purposes of interoperability, to provide a mode that follows only the [XML 1.0] and [XML Names] lexical rules.
When patterns are simple string matches, the strings are embedded directly into the EBNF. In other cases, named terminals are used.
It is up to an implementation to decide on the exact tokenization strategy, which may be different depending on the parser construction. In the EBNF, the notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols.
When tokenizing, the longest possible match that is valid in the current context is preferred.
All keywords are case sensitive. Keywords are not reserved; that is, any QName may duplicate a keyword except as noted in A.3 Reserved Function Names.
| Editorial note | |
| The tables found in A.2.4 Lexical Rules of the previous draft have been made non-normative and removed from this document. They will subsequently be republished in a non-normative W3C Note. | |
The entire set of terminals in XQuery 1.0 may be divided into two major classes, those that can act as token delimiters, and those that can not.
[Definition: A delimiting terminal may delimit adjacent non-delimiting terminals.] The following is the list of delimiting terminals:
"<?", "?>", "::", "*", "$", "?", ")", "(", "{", ":", "/", "//", "=", "!=", "<=", "<<", ">=", ">>", ":=", "<", ">", "-", "+", "|", "@", "[", "]", ",", ";", "%%%", """, ".", "..", "<![CDATA[", "]]>", PredefinedEntityRef, CharRef, "/>", "</", "{{", "}}", EscapeQuot, EscapeApos, "'", Pragma, "(::", "::)", Comment, "(:", ":)", "<!--", "-->", S, "}"
[Definition: Non-delimiting terminals generally start and end with alphabetic characters or digits. Adjacent non-delimiting terminals must be delimited by a delimiting terminal.] The following is the list of non-delimiting terminals:
DecimalLiteral, DoubleLiteral, StringLiteral, "xquery", "version", "encoding", "at", "module", "namespace", "child", "descendant", "parent", "attribute", "self", "descendant-or-self", "ancestor", "following-sibling", "preceding-sibling", "following", "preceding", "ancestor-or-self", "declare", "function", "option", "ordering", "ordered", "unordered", "default", "order", "external", "or", "and", "div", "idiv", "mod", "in", ValidationMode, "construction", "satisfies", "return", "then", "else", "boundary-space", "base-uri", "preserve", "strip", "copy-namespaces", "no-preserve", "inherit", "no-inherit", "to", "where", "collation", "intersect", "union", "except", "as", "case", "instance", "of", "castable", "item", "element", "schema-element", "schema-attribute", "processing-instruction", "comment", "text", "empty", "import", "schema", "is", "eq", "ne", "gt", "ge", "lt", "le", "some", "every", "for", "let", "cast", "treat", "validate", Digits, "document-node", "document", "node", "if", "typeswitch", "by", "stable", "ascending", "descending", "greatest", "least", "variable", CommentContents, QName, NCName
[Definition: Whitespace characters are defined by [http://www.w3.org/TR/REC-xml#NT-S] when these characters occur outside of a StringLiteral.]
[Definition: Unless otherwise specified (see A.2.2.2 Explicit Whitespace Handling), ignorable whitespace may occur between terminals, and is not significant to the parse tree. For readability, whitespace may be used in most expressions even though not explicitly notated in the EBNF. All allowable whitespace that is not explicitly specified in the EBNF is ignorable whitespace, and conversely, this term does not apply to whitespace that is explicitly specified.] Whitespace is allowed before the first token and after the last token of a module. Whitespace is optional between delimiting terminals. Whitespace is required to prevent two adjacent non-delimiting terminals from being (mis-)recognized as one. Comments may also act as "whitespace" to prevent two adjacent tokens from being recognized as one. Some illustrative examples are as follows:
foo- foo
is a syntax error. "foo-" would be recognized as a QName.
foo -foo
parses the same as
foo - foo
, two QNames separated by a subtraction operator. The parser would match the first "foo" as a QName. The parser would then expect to match an operator, which is satisfied by "-" (but not "-foo"). The last foo would then be matched as another QName.
foo(: This is a comment :)- foo
also parses the same as
foo - foo
. This is because the comment prevents the two adjacent tokens from being recognized as one.
foo-foo
parses as a single QName. This is because "-" is a valid character in a QName. When used as an operator after the characters of a name, the "-" must be separated from the name, e.g. by using whitespace or parentheses.
10div 3
results in a syntax error, since the "10" and the "div" would both be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.
10 div3
also results in a syntax error, since the "div" and the "3" would both be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.
10div3
also results in a syntax error, since the "10", "div" and the "3" would all be non-delimiting terminals and must be separated by delimiting terminals in order to be recognized.
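The interaction between longest-match tokenization and the delimiting rule can be tried out with a toy tokenizer. This is a sketch under simplifying assumptions, not the tokenization strategy of any implementation: names are ASCII-only, only a handful of single-character operators are recognized, comments are not handled, and the names `TOKEN`, `NON_DELIMITING` and `tokenize` are hypothetical.

```python
import re

# Simplified terminals, assumed for illustration only.
TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<name>[A-Za-z_][A-Za-z0-9_.-]*)   # "-" and "." are valid name characters
  | (?P<digits>[0-9]+)
  | (?P<op>[-+*/=(),])
""", re.VERBOSE)

NON_DELIMITING = {"name", "digits"}


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace.

    Each terminal takes the longest possible match, and two non-delimiting
    terminals may not be directly adjacent.
    """
    tokens = []
    pos = 0
    prev_kind = None
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind in NON_DELIMITING and prev_kind in NON_DELIMITING:
            raise SyntaxError(
                f"{m.group()!r} at offset {pos} must be separated "
                "from the preceding terminal"
            )
        if kind != "ws":
            tokens.append((kind, m.group()))
        prev_kind = kind
        pos = m.end()
    return tokens
```

In this sketch `foo-foo` yields a single name, `foo -foo` yields a name, an operator and a name, and `10div 3` is rejected because "10" and "div" are adjacent. `foo- foo` tokenizes as two names ("foo-" and "foo"); the syntax error in that case would surface later, when a parser finds two names with no operator between them.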
Explicit whitespace notation is specified with the EBNF productions, when it is different from the default rules, as follows. These notations do not inherit. In other words, if an EBNF rule is marked as /* ws: explicit */, the rule does not automatically apply to all the 'child' EBNF productions of that rule.
"ws: explicit" means that the EBNF notation
S
explicitly notates where whitespace is allowed. In productions with this notation,
A.2.2.1 Default Whitespace Handling
does not apply. Comments are also not allowed in these productions.
Whitespace is not freely allowed in the direct Constructor productions, but is specified explicitly in the grammar, in order to be more consistent with XML.
Pragmas and MUExtensions may be used anywhere that ignorable whitespace is allowed. Within a Pragma or MUExtension, the extension content may consist of any sequence of characters that does not include the sequence "::)". Pragmas, and MUExtensions are not allowed to nest. Comments are allowed to nest, though the content of a comment must have balanced comment delimiters without regard to structure. Some illustrative examples:
(: is this a comment? ::)
is a legal Comment.
(: can I comment out a (:: pragma foo ::) like this? :)
is a legal Comment. The inner pragma is seen as a nested comment in this case.
(: is this a comment? ::) or an error? :)
must produce a syntax error. Any unbalanced nesting of "(:" and ":)" will result in an error.
(: what about a partial (:: pragma? :)
must produce a syntax error. Any unbalanced nesting of "(:" and ":)" will result in an error.
(:: pragma foo with a comment (: is this ok? :) or not ::)
is a legal Pragma. The inner "comment" is seen as part of the extension content in this case.
(:: pragma foo with a comment (: is this a comment? ::) or an error? :) ::)
is a syntax error. "::)" patterns are not allowed inside pragmas and extensions.
(:: pragma foo with a comment (: what about a partial (:: pragma? :) ::)
is a legal Pragma. The inner "comment" and "(::" are seen as part of the extension content in this case.
(: commenting out a (: comment :) may be confusing, but often helpful :)
is a legal Comment, since balanced nesting of comments is allowed.
"this is just a string :)"
is a legal expression. However,
(: "this is just a string :)" :)
will cause a syntax error. Likewise,
"this is another string (:"
is a legal expression, but
(: "this is another string (:" :)
will cause a syntax error. It is a limitation of nested comments that literal content can cause unbalanced nesting of comments.
The following names are not recognized as function names in an unprefixed form, because these functions could be confused with expression syntax, which takes precedence. Users should not have unprefixed invocations of functions with these names; to protect themselves from future changes, they should use the prefixed form or put a distinctive string in their function names.
attribute
comment
document-node
element
empty
if
item
node
processing-instruction
schema-attribute
schema-element
text
typeswitch
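A processor or linting tool could test a candidate function name against this list as sketched below. The identifiers `RESERVED_FUNCTION_NAMES` and `is_reserved_function_name` are hypothetical; the set contents are copied from the list above, and a name is treated as prefixed simply when it contains a colon.

```python
RESERVED_FUNCTION_NAMES = frozenset({
    "attribute", "comment", "document-node", "element", "empty", "if", "item",
    "node", "processing-instruction", "schema-attribute", "schema-element",
    "text", "typeswitch",
})


def is_reserved_function_name(qname: str) -> bool:
    """Return True if qname is unprefixed and appears in the reserved list."""
    return ":" not in qname and qname in RESERVED_FUNCTION_NAMES
```

Under this check `if` is reserved, whereas a prefixed name such as `my:if` is not.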
The grammar defines built-in precedence among the operators of XQuery. These operators are summarised here in order of their precedence from lowest to highest. Operators that have a lower precedence number cannot be contained by operators with a higher precedence number. The associativity column indicates the order in which operators of equal precedence in an expression are applied.
The operators in order of increasing precedence are:
| # | Operator | Associativity |
|---|---|---|
| 1 | , (comma) | left-to-right |
| 2 | := (assignment) | right-to-left |
| 3 | for, some, every, typeswitch, if | left-to-right |
| 4 | or | left-to-right |
| 5 | and | left-to-right |
| 6 | eq, ne, lt, le, gt, ge, =, !=, <, <=, >, >=, is, <<, >> | left-to-right |
| 7 | to | left-to-right |
| 8 | +, - | left-to-right |
| 9 | *, div, idiv, mod | left-to-right |
| 10 | union, \| | left-to-right |
| 11 | intersect, except | left-to-right |
| 12 | instance of | left-to-right |
| 13 | treat | left-to-right |
| 14 | castable | left-to-right |
| 15 | cast | left-to-right |
| 16 | -(unary), +(unary) | right-to-left |
| 17 | ?, *(OccurrenceIndicator), +(OccurrenceIndicator) | left-to-right |
| 18 | validate, /, // | left-to-right |
| 19 | [ ], ( ), {} | left-to-right |
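To see how precedence and left-to-right associativity determine grouping, the following sketch applies precedence climbing to a small subset of the binary operators (or, and, +, -, *, div, idiv, mod) plus unary signs. It is an illustration only, not how the normative EBNF is structured: the names `BINARY` and `group` are hypothetical, tokens are assumed to be pre-split, and only the relative order of the precedence numbers matters.

```python
# Precedence numbers for a few binary operators; higher binds tighter.
BINARY = {"or": 4, "and": 5, "+": 8, "-": 8, "*": 9, "div": 9, "idiv": 9, "mod": 9}


def group(tokens):
    """Return a fully parenthesized rendering of a list of tokens."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok in ("+", "-"):
            # Unary operators bind tighter than any binary operator here.
            return f"({tok}{operand()})"
        if tok == "(":
            inner = expr(0)
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in BINARY or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def expr(min_prec):
        left = operand()
        while peek() in BINARY and BINARY[peek()] >= min_prec:
            op = take()
            # Requiring strictly higher precedence on the right gives
            # left-to-right grouping for operators of equal precedence.
            right = expr(BINARY[op] + 1)
            left = f"({left} {op} {right})"
        return left

    result = expr(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For instance, `group("1 + 2 * 3".split())` groups the multiplication first, and `group("8 - 3 - 2".split())` groups the two subtractions from left to right.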