XPath and XQuery Functions and Operators 1.1
W3C Working Draft 15 December 2009
This version: http://www.w3.org/TR/2009/WD-xpath-functions-11-20091215/
Latest version: http://www.w3.org/TR/xpath-functions-11/
Editor: Michael Kay (XSL WG), Saxonica, http://www.saxonica.com/

This document defines constructor functions, operators, and functions on the datatypes defined in XML Schema Part 2: Datatypes and the datatypes defined in the XQuery and XPath Data Model (XDM). It also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM). These functions and operators are defined for use in XPath, XQuery and XSLT and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions.

This is the third version of the specification of this function library. The first version was included as an intrinsic part of the XPath 1.0 specification published on 16 November 1999. The second version was published under the title XQuery 1.0 and XPath 2.0 Functions and Operators on 23 January 2007. This third version is the first to carry its own version number, which has been arbitrarily set at 1.1 to align with version numbering for XQuery.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is one document in a set of eight documents that have progressed to Recommendation together (XQuery 1.1, XQueryX 1.1, XSLT 2.1, Data Model 1.1, Functions and Operators 1.1, Formal Semantics 1.1, Serialization 1.1, XPath 2.1).

This is a First Public Working Draft as described in the Process Document. It has been jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. The Working Groups expect to advance this specification to Recommendation Status.

This is the first public Working Draft of XQuery and XPath Functions and Operators 1.1 (XDM). It is intended to be fully "upwards compatible" with XQuery 1.0 and XPath 2.0 Data Model (XDM). Failures to achieve that goal will be corrected in future versions of the Working Drafts of this document.

A Test Suite has been created for this document. Implementors are encouraged to run this test suite and report their results. The Test Suite can be found at http://dev.w3.org/cvsweb/2006/xquery-test-suite/. An implementation report is available at http://www.w3.org/XML/Query/test-suite/XQTSReport.html.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FO11]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Introduction

The purpose of this document is to catalog the functions and operators required for XPath 2.0, XML Query 1.0 and XSLT 2.0. The exact syntax used to call these functions and operators is specified in the XPath, XQuery and XSLT specifications.

This document defines constructor functions and functions that take typed values as arguments. Some of the functions define the semantics of operators discussed in .

XML Schema Part 2: Datatypes defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the datatypes defined in the XQuery and XPath Data Model (XDM). These functions and operations are defined for use in XPath, XQuery and XSLT and related XML standards. This document also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM) for use in XPath, XQuery and XSLT and other related XML standards.

References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: 'XQ' (XQuery), 'XT' (XSLT), 'XP' (XPath), 'DM' (Data Model) and 'FS' (Formal Semantics).

Conformance

The Functions and Operators specification is intended primarily as a component that can be used by other specifications. Therefore, Functions and Operators relies on specifications that use it (such as XPath, XQuery and XSLT) to specify conformance criteria for their respective environments.

Authors of conformance criteria for the use of the Functions and Operators should pay particular attention to the following features:

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.

It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1.

Support for XML 1.0 and XML 1.1 by the datatypes used in Functions and Operators.

At the time of writing there is a Candidate Recommendation of XML Schema 1.1 that introduces some new data types including xs:precisionDecimal and xs:dateTimeStamp. This specification provides some limited support for the latter, but does not yet include support for xs:precisionDecimal. This is likely to come in a later draft of this specification. Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0. The rules affecting support for XSD 1.0 versus XSD 1.1 and XML 1.0 versus XML 1.1 are likely to be refined in later drafts of this specification.

In this document, text labeled as an example or as a Note is provided for explanatory purposes and is not normative.

Namespaces and Prefixes

The functions and operators discussed in this document are contained in one of several namespaces (see ) and referenced using an xs:QName.

This document uses conventional prefixes to refer to these namespaces. User-written applications can choose a different prefix to refer to the namespace, so long as it is bound to the correct URI. The host language may also define a default namespace for function calls, in which case function names in that namespace need not be prefixed at all. In many cases the default namespace will be http://www.w3.org/2005/xpath-functions, allowing a call on the fn:name function (for example) to be written as name() rather than fn:name(); in this document, however, all example function calls are explicitly prefixed.

The URIs of the namespaces and the conventional prefixes associated with them are:

http://www.w3.org/2001/XMLSchema for constructors -- associated with xs.

The datatypes and constructor functions for the built-in datatypes defined in and in of and discussed in are in the XML Schema namespace, http://www.w3.org/2001/XMLSchema, and named in this document using the xs prefix.

http://www.w3.org/2005/xpath-functions for functions — associated with fn.

The namespace prefix used in this document for most functions that are available to users is fn.

http://www.w3.org/2005/xpath-functions/math for functions — associated with math.

This namespace is used for some mathematical functions. The namespace prefix used in this document for these functions is math. These functions are available to users in exactly the same way as those in the fn namespace.

http://www.w3.org/2005/xqt-errors — associated with err.

There are no functions in this namespace; it is used for error codes.

This document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors, which is the namespace for all XPath and XQuery error codes and messages. This namespace prefix is not predeclared and its use in this document is not normative.

The namespace URI associated with the err prefix is not expected to change from one version of this document to another. The contents of this namespace may be extended to allow additional errors to be returned.

Functions defined with the op prefix are described here to underpin the definitions of the operators in XPath, XQuery and XSLT. These functions are not available directly to users, and there is no requirement that implementations should actually provide these functions. For this reason, no namespace is associated with the op prefix. For example, multiplication is generally associated with the * operator, but it is described in this document as the function op:numeric-multiply.

Function Overloading

In general, the specifications named above do not support function overloading in the sense that functions that have multiple signatures with the same name and the same number of parameters are not supported. Consequently, there are no such overloaded functions in this document except for legacy functions such as fn:string, which accepts a single parameter of a variety of types. In addition, it should be noted that the functions defined in that accept numeric parameters accept arguments of type xs:integer, xs:decimal, xs:float or xs:double. See . Operators such as "+" may be overloaded. This document does define some functions with more than one signature with the same name and different number of parameters. User-defined functions with more than one signature with the same name and different number of parameters are also supported.

Function Signatures and Descriptions

Each function is defined by specifying its signature, a description of the return type and each of the parameters and its semantics. For many functions, examples are included to illustrate their use.

Each function's signature is presented in a form like this:

In this notation, function-name, in bold-face, is the name of the function whose signature is being specified. If the function takes no parameters, then the name is followed by an empty parameter list: "()"; otherwise, the name is followed by a parenthesized list of parameter declarations, each declaration specifies the static type of the parameter, in italics, and a descriptive, but non-normative, name. If there are two or more parameter declarations, they are separated by a comma. The return-type , also in italics, specifies the static type of the value returned by the function. The dynamic type returned by the function is the same as its static type or derived from the static type. All parameter types and return types are specified using the SequenceType notation defined in .

One function, fn:concat, has a variable number of arguments (two or more). More strictly, there is an infinite set of functions having the name fn:concat, with arity ranging from 2 to infinity. For this special case, a single function signature is given, with an ellipsis indicating an indefinite number of arguments.

In some cases the word numeric is used in function signatures as a shorthand to indicate the four numeric types: xs:integer, xs:decimal, xs:float and xs:double. For example, a function with the signature:

represents the following four function signatures:

For most functions there is an initial paragraph describing what the function does followed by semantic rules. These rules are meant to be followed in the order that they appear in this document.

In some cases, the static type returned by a function depends on the type(s) of its argument(s). These special functions are indicated by using bold italics for the return type. The semantic rules specifying the type of the value returned are documented in the function definition. The rules are described more formally in .

The function name is a QName as defined in and must adhere to its syntactic conventions. Following , function names are composed of English words separated by hyphens,"-". If a function name contains a datatype name, it may have intercapitalized spelling and is used in the function name as such. For example, fn:timezone-from-dateTime.

Rules for passing parameters to operators are described in the relevant sections of and . For example, the rules for passing parameters to arithmetic operators are described in . Specifically, rules for parameters of type xs:untypedAtomic and the empty sequence are specified in this section.

As is customary, the parameter type name indicates that the function or operator accepts arguments of that type, or types derived from it, in that position. This is called subtype substitution (See ). In addition, numeric type instances and instances of type xs:anyURI can be promoted to produce an argument of the required type. (See ).

Subtype Substitution: A derived type may substitute for its base type. In particular, xs:integer may be used where xs:decimal is expected.

Numeric Type Promotion: xs:decimal may be promoted to xs:float or xs:double. Promotion to xs:double should be done directly, not via xs:float, to avoid loss of precision.

anyURI Type Promotion: A value of type xs:anyURI can be promoted to the type xs:string.

Some functions accept a single value or the empty sequence as an argument and some may return a single value or the empty sequence. This is indicated in the function signature by following the parameter or return type name with a question mark: "?", indicating that either a single value or the empty sequence must appear. See below.

Note that this function signature is different from a signature in which the parameter is omitted. See, for example, the two signatures for fn:string. In the first signature, the parameter is omitted and the argument defaults to the context item, referred to as .. In the second signature, the argument must be present but may be the empty sequence, referred to as ().

Some functions accept a sequence of zero or more values as an argument. This is indicated by following the name of the type of the items in the sequence with *. The sequence may contain zero or more items of the named type. For example, the function below accepts a sequence of xs:double and returns an xs:double or the empty sequence.
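As a loose analogy only, the two occurrence indicators can be mirrored with Python type hints: "*" corresponds to a sequence of any length and "?" to a value that may be absent. The function name `average` is hypothetical and is not a function of this specification; `None` stands in for the empty sequence.

```python
from typing import Optional, Sequence


def average(values: Sequence[float]) -> Optional[float]:
    """Analogue of a signature taking xs:double* and returning xs:double?."""
    if not values:
        # An empty input yields "no value", standing in for the empty sequence ().
        return None
    return sum(values) / len(values)
```

The caller has to handle the absent result explicitly, just as a caller of a function returning xs:double? has to allow for the empty sequence.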

Type System

The diagrams below show how nodes, function items, primitive simple types, and user defined types fit together into a type system. This type system comprises two distinct hierarchies that both include the primitive simple types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the arrowheads point toward the type from which they are derived. The dashed line represents relationships not present in this diagram, but that appear in one of the other diagrams. Dotted lines represent additional relationships that follow an evident pattern. The information that appears in each diagram is recapitulated in tabular form.

The xs:IDREFS, xs:NMTOKENS, and xs:ENTITIES types and the user-defined list and union types are special types in that these types are lists or unions rather than types derived by extension or restriction.

The first diagram and its corresponding table illustrate the item type hierarchy. In XDM, items include node types, function types, and built-in atomic types.

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

item
  xs:anyAtomicType
  node
    attribute
      user-defined attribute types
    comment
    document
      user-defined document types
    element
      user-defined element types
    processing-instruction
    text

The next diagram and table illustrate the "any type" type hierarchy, in which all types are derived from the distinguished type xs:anyType.

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

xs:anyType
  user-defined complex types
  xs:untyped
  xs:anySimpleType
    user-defined list and union types
    xs:IDREFS
    xs:NMTOKENS
    xs:ENTITIES
    xs:anyAtomicType

The final diagram and table show all of the atomic types, including the primitive simple types and the built-in types derived from the primitive simple types. This includes all the built-in datatypes defined in as well as the two totally ordered subtypes of duration defined in .

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

xs:untypedAtomic
xs:dateTime
  xs:dateTimeStamp
xs:date
xs:time
xs:duration
  xs:yearMonthDuration
  xs:dayTimeDuration
xs:float
xs:double
xs:precisionDecimal
xs:decimal
  xs:integer
    xs:nonPositiveInteger
      xs:negativeInteger
    xs:long
      xs:int
        xs:short
          xs:byte
    xs:nonNegativeInteger
      xs:unsignedLong
        xs:unsignedInt
          xs:unsignedShort
            xs:unsignedByte
      xs:positiveInteger
xs:gYearMonth
xs:gYear
xs:gMonthDay
xs:gDay
xs:gMonth
xs:string
  xs:normalizedString
    xs:token
      xs:language
      xs:NMTOKEN
      xs:Name
        xs:NCName
          xs:ID
          xs:IDREF
          xs:ENTITY
xs:boolean
xs:base64Binary
xs:hexBinary
xs:anyURI
xs:QName
xs:NOTATION

Terminology

The terminology used to describe the functions and operators is defined in the body of this specification. The terms defined in this section are used in building those definitions.

Namespaces and URIs

This document uses the phrase "namespace URI" to identify the concept identified in as "namespace name", and the phrase "local name" to identify the concept identified in as "local part".

It also uses the term expanded-QName defined below.

An expanded-QName is a pair of values consisting of a namespace URI and a local name. They belong to the value space of the datatype xs:QName. When this document refers to xs:QName we always mean the value space, i.e. a namespace URI, local name pair (and not the lexical space referring to constructs of the form prefix:local-name).
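As a rough illustration of the value-space view described above, the following Python sketch models an expanded-QName as a (namespace URI, local name) pair. The `ExpandedQName` class, the `resolve` helper and the prefix bindings are hypothetical names introduced for this example; they are not defined by this specification.

```python
from typing import Mapping, NamedTuple


class ExpandedQName(NamedTuple):
    """Value-space view of xs:QName: a namespace URI plus a local name."""
    namespace_uri: str
    local_name: str


def resolve(lexical: str, bindings: Mapping[str, str]) -> ExpandedQName:
    """Turn a lexical 'prefix:local' form into an expanded-QName (hypothetical helper)."""
    prefix, _, local = lexical.rpartition(":")
    return ExpandedQName(bindings[prefix], local)


FN = "http://www.w3.org/2005/xpath-functions"

# Two different prefixes bound to the same URI yield the same expanded-QName,
# because the prefix is not part of the value.
a = resolve("fn:name", {"fn": FN})
b = resolve("f:name", {"f": FN})
```

Comparing `a` and `b` compares only the URI and the local name, which is why a user-chosen prefix is interchangeable with the conventional one.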

The term URI is used as follows:

Within this specification, the term URI refers to Universal Resource Identifiers as defined in and extended in with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in .

Note that this means, in practice, that where this specification requires a "URI Reference", an IRI as defined in will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications. Note also that the definition of xs:anyURI is a wider definition than the definition in ; for example it does not require non-ASCII characters to be escaped.

Conformance terminology

A feature of this specification included to ensure that implementations that use this feature remain compatible with

may: Conforming documents and processors are permitted to, but need not, behave as described.

must: Conforming documents and processors are required to behave as described; otherwise, they are either non-conformant or else in error.

implementation-defined: Possibly differing between implementations, but specified and documented by the implementor for each particular implementation.

implementation-dependent: Possibly differing between implementations, but not specified by this or any other W3C specification, and not required to be specified by the implementor for any particular implementation.

Properties of functions

This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.

Two function calls are said to be within the same execution scope if the host environment defines them as such. In XSLT, any two calls executed during the same transformation are in the same execution scope. In XQuery, any two calls executed during the evaluation of a top-level expression are in the same execution scope. In other contexts, the execution scope is specified by the host environment that invokes the function library.

The following definition explains more precisely what it means for two function calls to return the same result:

Two values are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:

Both items are atomic values, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings

Both items are nodes, and represent the same node

Both items are function items, and have the same name (or absence of a name), arity, function signature, and closure

Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.

A function may have the property of being contextual: the result of such a function depends on the values of properties in the static and dynamic evaluation context as well as on the actual supplied arguments (if any).

Contextual functions fall into a number of categories:

The functions fn:current-date, fn:current-dateTime, fn:current-time, fn:implicit-timezone, fn:adjust-date-to-timezone, fn:adjust-dateTime-to-timezone, and fn:adjust-time-to-timezone depend on properties of the dynamic context that are fixed within the execution scope. The same applies to a number of functions in the op: namespace that manipulate dates and times and that make use of the implicit timezone. These functions will return the same result if called repeatedly during a single execution scope.

The functions fn:position, fn:last, fn:id, fn:idref, fn:element-with-id, fn:lang, fn:local-name, fn:name, fn:namespace-uri, fn:normalize-space, fn:number, fn:root, fn:string, and fn:string-length depend on the focus. These functions will in general return different results on different calls if the focus is different.

The function fn:default-collation and many string-handling operators and functions depend on the default collation and the in-scope collations, which are both properties of the static context. If a particular call of one of these functions is evaluated twice with the same arguments then it will return the same result each time (because the static context, by definition, does not change at run time). However, two distinct calls (that is, two calls on the function appearing in different places in the source code) may produce different results even if the explicit arguments are the same.

Functions such as fn:static-base-uri, fn:doc, and fn:collection depend on other aspects of the static context. As with functions that depend on collations, a single call will produce the same results on each call if the explicit arguments are the same, but two calls appearing in different places in the source code may produce different results.

For a contextual function, the parts of the context on which it depends are referred to as implicit arguments.

A function that is guaranteed to produce identical results from repeated calls if the explicit and implicit arguments are identical is referred to as stable.

All functions defined in this specification are stable unless otherwise stated. Exceptions include the following:

Some functions (such as fn:distinct-values and fn:unordered) produce results in an implementation-defined or implementation-dependent order. In such cases there is no guarantee that the order of results from different calls will be the same. These functions are said to be ordering-unstable.

The function fn:analyze-string constructs an element node to represent its results. There is no guarantee that repeated calls with the same arguments will return the same identical node (in the sense of the is operator). Such a function is said to be identity-unstable.

Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be stable with the exception that an implementation is allowed to make them unstable as a user option.

Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be stable: that is, that repeated calls with the same explicit and implicit arguments must return identical results.

Accessors

Accessors and their semantics are described in . Some of these accessors are exposed to the user through the functions described below.

Function         Accessor      Accepts                          Returns
fn:node-name     node-name     an optional node                 zero or one xs:QName
fn:nilled        nilled        a node                           an optional xs:boolean
fn:string        string-value  an optional item or no argument  xs:string
fn:data          typed-value   zero or more items               a sequence of atomic values
fn:base-uri      base-uri      an optional node or no argument  zero or one xs:anyURI
fn:document-uri  document-uri  an optional node                 zero or one xs:anyURI

Errors and Diagnostics

Raising Errors

In this document, as well as in , , and , the phrase an error is raised is used. Raising an error is equivalent to calling the fn:error function defined in this section with the provided error code.

The above phrase is normally accompanied by specification of a specific error, to wit: an error is raised [error code]. Each error defined in this document is identified by an xs:QName that is in the http://www.w3.org/2005/xqt-errors namespace, represented in this document by the err prefix. It is this xs:QName that is actually passed as an argument to the fn:error function. Calling this function raises an error. For a more detailed treatment of error handling, see and .

The fn:error function is a general function that may be called as above but may also be called from or applications with, for example, an xs:QName argument.

Diagnostic Tracing

Functions and Operators on Numerics

This section specifies arithmetic operators on the numeric datatypes defined in XML Schema Part 2: Datatypes. It uses an approach that permits lightweight implementation whenever possible.

Numeric Types

The operators described in this section are defined on the following numeric types. Each type whose name is indented is derived from the type whose name appears nearest above with one less level of indentation.

xs:decimal
  xs:integer
xs:float
xs:double

They also apply to types derived by restriction from the above types.

This specification uses IEEE 754 arithmetic for xs:float and xs:double values. This differs from XML Schema, which defines NaN as being equal to itself and defines only a single zero in the value space, while IEEE 754 arithmetic treats NaN as unequal to all other values including itself and can produce distinct results of positive zero and negative zero. (These are two different machine representations for the same value.) The text accompanying several functions defines behavior for both positive and negative zero inputs and outputs in the interest of alignment with IEEE 754.

XML Schema 1.1, however, introduces support for positive and negative zero as distinct values.

Arithmetic Operators on Numeric Values

The following functions define the semantics of arithmetic operators defined in and on these numeric types.

Operators                  Meaning
op:numeric-add             Addition
op:numeric-subtract        Subtraction
op:numeric-multiply        Multiplication
op:numeric-divide          Division
op:numeric-integer-divide  Integer division
op:numeric-mod             Modulus
op:numeric-unary-plus      Unary plus
op:numeric-unary-minus     Unary minus (negation)

The parameters and return types for the above operators are the basic numeric types: xs:integer, xs:decimal, xs:float and xs:double, and types derived from them. The word numeric in function signatures signifies these four types. For simplicity, each operator is defined to operate on operands of the same type and return the same type. The exceptions are op:numeric-divide, which returns an xs:decimal if called with two xs:integer operands and op:numeric-integer-divide which always returns an xs:integer.

If the two operands are not of the same type, subtype substitution and numeric type promotion are used to obtain two operands of the same type. and describe the semantics of these operations in detail.

The result type of operations depends on their argument datatypes and is defined in the following table:

Operator                              Returns
op:operation(xs:integer, xs:integer)  xs:integer (except for op:numeric-divide(integer, integer), which returns xs:decimal)
op:operation(xs:decimal, xs:decimal)  xs:decimal
op:operation(xs:float, xs:float)      xs:float
op:operation(xs:double, xs:double)    xs:double
op:operation(xs:integer)              xs:integer
op:operation(xs:decimal)              xs:decimal
op:operation(xs:float)                xs:float
op:operation(xs:double)               xs:double

These rules define any operation on any pair of arithmetic types. Consider the following example:

op:operation(xs:int, xs:double) => op:operation(xs:double, xs:double)

For this operation, xs:int must be converted to xs:double. This can be done, since by the rules above: xs:int can be substituted for xs:integer, xs:integer can be substituted for xs:decimal, xs:decimal can be promoted to xs:double. As far as possible, the promotions should be done in a single step. Specifically, when an xs:decimal is promoted to an xs:double, it should not be converted to an xs:float and then to xs:double, as this risks loss of precision.

As another example, a user may define height as a derived type of xs:integer with a minimum value of 20 and a maximum value of 100. He may then derive fenceHeight using an enumeration to restrict the permitted set of values to, say, 36, 48 and 60.

op:operation(fenceHeight, xs:integer) => op:operation(xs:integer, xs:integer)

fenceHeight can be substituted for its base type height and height can be substituted for its base type xs:integer.
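The two examples above can be sketched in Python as a small lookup. This is an illustration under stated assumptions, not a normative algorithm: the `BASE_TYPE` and `RANK` tables and both function names are hypothetical, the table covers only the types mentioned in the examples, and ranking xs:float below xs:double assumes the usual promotion of xs:float to xs:double.

```python
# Base type of each derived type used in the examples above.
# "height" and "fenceHeight" are the user-defined types from the text.
BASE_TYPE = {
    "fenceHeight": "height",
    "height": "xs:integer",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
}

# The four basic numeric types, ordered so that a lower-ranked operand
# is substituted or promoted to the higher-ranked one.
RANK = {"xs:integer": 0, "xs:decimal": 1, "xs:float": 2, "xs:double": 3}


def basic_numeric_type(type_name: str) -> str:
    """Walk up the derivation chain to one of the four basic numeric types."""
    while type_name not in RANK:
        type_name = BASE_TYPE[type_name]
    return type_name


def operand_type(left: str, right: str) -> str:
    """Common type that both operands are converted to before the operation."""
    a, b = basic_numeric_type(left), basic_numeric_type(right)
    return a if RANK[a] >= RANK[b] else b
```

With these tables, `operand_type("xs:int", "xs:double")` selects xs:double in one step, without passing through xs:float, and `operand_type("fenceHeight", "xs:integer")` selects xs:integer.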

The basic rules for addition, subtraction, and multiplication of ordinary numbers are not set out in this specification; they are taken as given. In the case of xs:double and xs:float the rules are as defined in IEEE 754. The rules for handling division and modulus operations, as well as the rules for handling special values such as infinity and NaN, and exception conditions such as overflow and underflow, are described more explicitly since they are not necessarily obvious.

On overflow and underflow situations during arithmetic operations conforming implementations must behave as follows:

For xs:float and xs:double operations, overflow behavior must be conformant with IEEE 754. This specification allows the following options:

Raising an error via an overflow trap.

Returning INF or -INF.

Returning the largest (positive or negative) non-infinite number.

For xs:float and xs:double operations, underflow behavior must be conformant with IEEE 754. This specification allows the following options:

Raising an error via an underflow trap.

Returning 0.0E0 or +/- 2**Emin or a denormalized value, where Emin is the smallest possible xs:float or xs:double exponent.

For xs:decimal operations, overflow behavior must raise an error. On underflow, 0.0 must be returned.

For xs:integer operations, implementations that support limited-precision integer operations must select from the following options:

They may choose to always raise an error.

They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value.
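The two xs:integer options can be sketched in Python, whose integers are unbounded and so make the limit easy to simulate. The 64-bit width, the `wrap` switch and the function name `checked_add` are assumptions made for this example, and two's-complement wraparound is only one possible reading of "modulo the largest representable integer value".

```python
INT_BITS = 64  # assumed width of the limited-precision integer


def checked_add(a: int, b: int, wrap: bool = False) -> int:
    """Add two integers under an assumed 64-bit signed limit."""
    lo = -(1 << (INT_BITS - 1))
    hi = (1 << (INT_BITS - 1)) - 1
    result = a + b
    if lo <= result <= hi:
        return result
    if not wrap:
        # First option: overflow is always an error.
        raise OverflowError("integer overflow")
    # Second option, when the user opts in: wrap around the representable range.
    return (result - lo) % (1 << INT_BITS) + lo
```

The `wrap` parameter plays the role of the user-selectable mechanism; leaving it at its default gives the always-raise behavior.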

The functions op:numeric-add, op:numeric-subtract, op:numeric-multiply, op:numeric-divide, op:numeric-integer-divide and op:numeric-mod are each defined for pairs of numeric operands, each of which has the same type: xs:integer, xs:decimal, xs:float, or xs:double. The functions op:numeric-unary-plus and op:numeric-unary-minus are defined for a single operand whose type is one of those same numeric types.

For xs:float and xs:double arguments, if either argument is NaN, the result is NaN.

For xs:decimal values the number of digits of precision returned by the numeric operators is implementation-defined. If the number of digits in the result exceeds the number of digits that the implementation supports, the result is truncated or rounded in an implementation-defined manner.

Comparison Operators on Numeric Values

This specification defines the following comparison operators on numeric values. Comparisons take two arguments of the same type. If the arguments are of different types, one argument is promoted to the type of the other as described above. Each comparison operator returns a boolean value. If either, or both, operands are NaN, false is returned.
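The NaN rule can be illustrated with a minimal sketch in Python, whose float type follows the same IEEE conventions for equality and ordering. The function names below are hypothetical stand-ins for the equal, less-than, and greater-than operators; they are not part of this specification.

```python
# Hypothetical stand-ins for the numeric comparison operators.
def numeric_equal(a: float, b: float) -> bool:
    return a == b

def numeric_less_than(a: float, b: float) -> bool:
    return a < b

def numeric_greater_than(a: float, b: float) -> bool:
    return a > b

nan = float("nan")

# Every comparison involving NaN yields false, including NaN against itself.
results = [
    numeric_equal(nan, nan),
    numeric_less_than(nan, 1.0),
    numeric_greater_than(nan, 1.0),
    numeric_equal(1.0, nan),
]
```

Each entry of results is False, which matches the rule that a comparison returns false when either operand is NaN.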

Functions on Numeric Values

The following functions are defined on numeric types. Each function returns a value of the same type as the type of its argument.

If the argument is the empty sequence, the empty sequence is returned.

For xs:float and xs:double arguments, if the argument is "NaN", "NaN" is returned.

Except for fn:abs, for xs:float and xs:double arguments, if the argument is positive or negative infinity, positive or negative infinity is returned.

fn:round and fn:round-half-to-even produce the same result in all cases except when the argument is exactly midway between two values with the required precision.

Other ways of rounding midway values can be achieved as follows:

Towards negative infinity: -fn:round(-$x)

Away from zero: fn:round(fn:abs($x))*fn:compare($x,0)

Towards zero: -fn:round(-fn:abs($x))*fn:compare($x,0)
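The midway-rounding variants can be checked with a small sketch. It assumes fn:round behaves as floor(x + 0.5), that is, midway values go towards positive infinity, and it uses a hypothetical sign helper in the role that fn:compare($x,0) plays in the expressions above. The towards-zero variant is written here as the negated round of the negated absolute value, multiplied by the sign, which is one way to obtain that behaviour.

```python
import math

def fn_round(x: float) -> int:
    # Assumed model of fn:round: nearest integer, midway values towards +infinity.
    return math.floor(x + 0.5)

def sign(x: float) -> int:
    # Hypothetical helper returning -1, 0 or 1.
    return (x > 0) - (x < 0)

def round_half_towards_negative_infinity(x: float) -> int:
    return -fn_round(-x)

def round_half_away_from_zero(x: float) -> int:
    return fn_round(abs(x)) * sign(x)

def round_half_towards_zero(x: float) -> int:
    return -fn_round(-abs(x)) * sign(x)
```

For the midway values 2.5 and -2.5, the three functions give 2 and -3, 3 and -3, and 2 and -2 respectively. Values that are not midway round to the nearest integer in every variant.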

Formatting Integers

Formatting Numbers

This section has been created by the editor in response to a WG decision in principle; the detailed text needs to be reviewed and approved.

This section defines a function for formatting decimal and floating point numbers.

This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not 100% compatible, though they share some options in common.

Defining a Decimal Format

Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.

The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined.

Each decimal format provides a set of named variables, described in the following table:

Name | Type | Usage (non-normative)
decimal-separator-sign | A single character | Defines the character used to represent the decimal point (typically ".") both in the picture string and in the formatted number
grouping-separator-sign | A single character | Defines the character used to separate groups of digits (typically ",") both in the picture string and in the formatted number
infinity | A string | Defines the string used to represent the value positive or negative infinity in the formatted number (typically "Infinity")
minus-sign | A single character | Defines the character used as a minus sign in the formatted number if there is no subpicture for formatting negative numbers (typically "-", x2D)
NaN | A string | Defines the string used to represent the value NaN in the formatted number
percent-sign | A single character | Defines the character used as a percent sign (typically "%") both in the picture string and in the formatted number
per-mille-sign | A single character | Defines the character used as a per-mille sign (typically "‰", x2030) both in the picture string and in the formatted number
mandatory-digit-sign | A single character, which must be defined in Unicode as a digit with the value zero | Defines the character (typically "0") used in the picture string to represent a mandatory digit, and in the formatted number to represent the digit zero; by implication, this also defines the characters used to represent the digits one to nine.
optional-digit-sign | A single character | Defines the character used in the picture string to represent an optional digit (typically "#")
pattern-separator-sign | A single character | Defines the character used in the picture string to separate the positive and negative subpictures (typically ";")

The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the mandatory-digit-sign.

It is a constraint that, for any named or unnamed decimal format, the variables representing characters used in a picture string must have distinct values. These variables are decimal-separator-sign, grouping-separator-sign, percent-sign, per-mille-sign, optional-digit-sign, and pattern-separator-sign. Furthermore, none of these variables may be equal to any character in the decimal digit family.

Syntax of the Picture String

This differs from the format-number function previously defined in XSLT 2.0 in that any digit can be used in the picture string to represent a mandatory digit: for example the picture strings '000', '001', and '999' are equivalent. This is to align format-number (which previously used '000') with format-dateTime (which used '001').

The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the variables decimal-separator-sign, grouping-separator-sign, decimal-digit-family, optional-digit-sign and pattern-separator-sign are classified as active characters, and all other characters (including the percent-sign and per-mille-sign) are classified as passive characters.

The integer part of the sub-picture is defined as the part that appears to the left of the decimal-separator-sign if there is one, or the entire sub-picture otherwise. The fractional part of the sub-picture is defined as the part that appears to the right of the decimal-separator-sign if there is one; it is a zero-length string otherwise.

An error is raised if the picture string does not conform to the following rules. Note that in these rules the words "preceded" and "followed" refer to characters anywhere in the string; they are not to be read as "immediately preceded" and "immediately followed".

A picture-string consists either of a sub-picture, or of two sub-pictures separated by a pattern-separator-sign. A picture-string must not contain more than one pattern-separator-sign. If the picture-string contains two sub-pictures, the first is used for positive values and the second for negative values.

A sub-picture must not contain more than one decimal-separator-sign.

A sub-picture must not contain more than one percent-sign or per-mille-sign, and it must not contain one of each.

A sub-picture must contain at least one character that is an optional-digit-sign or a member of the decimal-digit-family.

A sub-picture must not contain a passive character that is preceded by an active character and that is followed by another active character.

A sub-picture must not contain a grouping-separator-sign adjacent to a decimal-separator-sign.

The integer part of a sub-picture must not contain a member of the decimal-digit-family that is followed by an optional-digit-sign. The fractional part of a sub-picture must not contain an optional-digit-sign that is followed by a member of the decimal-digit-family.

Analysing the Picture String

This phase of the algorithm analyses the picture string and the variables from the selected decimal format in the static context, and it has the effect of setting the values of various variables, which are used in the subsequent formatting phase. These variables are listed below. Each is shown with its initial setting and its data type.

Several variables are associated with each sub-picture. If there are two sub-pictures, then these rules are applied to one sub-picture to obtain the values that apply to positive numbers, and to the other to obtain the values that apply to negative numbers. If there is only one sub-picture, then the values for both cases are derived from this sub-picture.

The variables are as follows:

The integer-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the integer part of the sub-picture. For each grouping-separator-sign that appears within the integer part of the sub-picture, this sequence contains an integer that is equal to the total number of optional-digit-sign and decimal-digit-family characters that appear within the integer part of the sub-picture and to the right of the grouping-separator-sign. In addition, if these integer-part-grouping-positions are at regular intervals (that is, if they form a sequence N, 2N, 3N, ... for some integer value N, including the case where there is only one number in the list), then the sequence contains all integer multiples of N as far as necessary to accommodate the largest possible number.

The minimum-integer-part-size is an integer indicating the minimum number of digits that will appear to the left of the decimal-separator-sign. It is normally set to the number of decimal-digit-family characters found in the integer part of the sub-picture. But if the sub-picture contains no decimal-digit-family character and no decimal-separator-sign, it is set to one.

There is no maximum integer part size. All significant digits in the integer part of the number will be displayed, even if this exceeds the number of optional-digit-sign and decimal-digit-family characters in the subpicture.

The prefix is set to contain all passive characters in the sub-picture to the left of the leftmost active character. If the picture string contains only one sub-picture, the prefix for the negative sub-picture is set by concatenating the minus-sign character and the prefix for the positive sub-picture (if any), in that order.

The fractional-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the fractional part of the sub-picture. For each grouping-separator-sign that appears within the fractional part of the sub-picture, this sequence contains an integer that is equal to the total number of optional-digit-sign and decimal-digit-family characters that appear within the fractional part of the sub-picture and to the left of the grouping-separator-sign.

The minimum-fractional-part-size is set to the number of decimal-digit-family characters found in the fractional part of the sub-picture.

The maximum-fractional-part-size is set to the total number of optional-digit-sign and decimal-digit-family characters found in the fractional part of the sub-picture.

The suffix is set to contain all passive characters to the right of the rightmost active character in the fractional part of the sub-picture.

If there is only one sub-picture, then all variables for positive numbers and negative numbers will be the same, except for prefix: the prefix for negative numbers will be preceded by the minus-sign character.
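The analysis rules above can be sketched for a single sub-picture as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes the typical characters ".", ",", "#" and the digits "0" to "9", it assumes the sub-picture has already passed the syntax rules, and it does not extend regular grouping intervals to further multiples of N. The names SubPictureVars and analyse_sub_picture are hypothetical.

```python
from dataclasses import dataclass

# Assumed decimal format: the typical characters named in the table of variables.
DECIMAL_SEP, GROUPING_SEP, OPTIONAL_DIGIT = ".", ",", "#"
DIGIT_FAMILY = "0123456789"
ACTIVE = set(DECIMAL_SEP + GROUPING_SEP + OPTIONAL_DIGIT + DIGIT_FAMILY)

@dataclass
class SubPictureVars:
    prefix: str
    suffix: str
    integer_part_grouping_positions: list
    fractional_part_grouping_positions: list
    minimum_integer_part_size: int
    minimum_fractional_part_size: int
    maximum_fractional_part_size: int

def is_digit_slot(ch: str) -> bool:
    return ch == OPTIONAL_DIGIT or ch in DIGIT_FAMILY

def analyse_sub_picture(sub: str) -> SubPictureVars:
    active = [i for i, ch in enumerate(sub) if ch in ACTIVE]
    prefix = sub[:active[0]]          # passive characters before the first active one
    suffix = sub[active[-1] + 1:]     # passive characters after the last active one
    integer_part, sep, fractional_part = sub.partition(DECIMAL_SEP)

    # Digit slots to the right of each grouping separator in the integer part.
    int_groups = [sum(is_digit_slot(c) for c in integer_part[i + 1:])
                  for i, ch in enumerate(integer_part) if ch == GROUPING_SEP]
    # Digit slots to the left of each grouping separator in the fractional part.
    frac_groups = [sum(is_digit_slot(c) for c in fractional_part[:i])
                   for i, ch in enumerate(fractional_part) if ch == GROUPING_SEP]

    min_int = sum(c in DIGIT_FAMILY for c in integer_part)
    if min_int == 0 and not sep:
        min_int = 1                   # no mandatory digit and no decimal separator
    min_frac = sum(c in DIGIT_FAMILY for c in fractional_part)
    max_frac = sum(is_digit_slot(c) for c in fractional_part)
    return SubPictureVars(prefix, suffix, int_groups, frac_groups,
                          min_int, min_frac, max_frac)

example = analyse_sub_picture("$#,##0.00%")
```

For the sub-picture "$#,##0.00%" the sketch yields the prefix "$", the suffix "%", one integer grouping position of 3, a minimum integer part size of 1, and minimum and maximum fractional part sizes of 2.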

Formatting the Number

This section describes the second phase of processing of the format-number function. This phase takes as input a number to be formatted (referred to as the input number), and the variables set up by analysing the decimal format in the static context and the picture string, as described above. The result of this phase is a string, which forms the return value of the format-number function.

The algorithm for this second stage of processing is as follows:

If the input number is NaN (not a number), the result is the specified NaN-symbol (with no prefix or suffix).

In the rules below, the positive sub-picture and its associated variables are used if the input number is positive, and the negative sub-picture and its associated variables are used otherwise. Negative zero is taken as negative, positive zero as positive.

If the input number is positive or negative infinity, the result is the concatenation of the appropriate prefix, the infinity-symbol, and the appropriate suffix.

If the sub-picture contains a percent-sign, the number is multiplied by 100. If the sub-picture contains a per-mille-sign, the number is multiplied by 1000. The resulting number is referred to below as the adjusted number.

The adjusted number is converted (if necessary) to an xs:decimal value, using an implementation of xs:decimal that imposes no limits on the totalDigits or fractionDigits facets. If there are several such values that are numerically equal to the adjusted number (bearing in mind that if the adjusted number is an xs:double or xs:float, the comparison will be done by converting the decimal value back to an xs:double or xs:float), the one that is chosen should be one with the smallest possible number of digits not counting leading or trailing zeroes (whether significant or insignificant). For example, 1.0 is preferred to 0.9999999999, and 100000000 is preferred to 100000001. This value is then rounded so that it uses no more than maximum-fractional-part-size digits in its fractional part. The rounded number is defined to be the result of converting the adjusted number to an xs:decimal value, as described above, and then calling the function fn:round-half-to-even with this converted number as the first argument and the maximum-fractional-part-size as the second argument, again with no limits on the totalDigits or fractionDigits in the result.

The absolute value of the rounded number is converted to a string in decimal notation, with no insignificant leading or trailing zeroes, using the digits in the decimal-digit-family to represent the ten decimal digits, and the decimal-separator-sign to separate the integer part and the fractional part. (The value zero will at this stage be represented by a decimal-separator-sign on its own.)

If the number of digits to the left of the decimal-separator-sign is less than minimum-integer-part-size, leading mandatory-digit-sign characters are added to pad out to that size.

If the number of digits to the right of the decimal-separator-sign is less than minimum-fractional-part-size, trailing mandatory-digit-sign characters are added to pad out to that size.

For each integer N in the integer-part-grouping-positions list, a grouping-separator-sign character is inserted into the string immediately after that digit that appears in the integer part of the number and has N digits between it and the decimal-separator-sign, if there is such a digit.

For each integer N in the fractional-part-grouping-positions list, a grouping-separator-sign character is inserted into the string immediately before that digit that appears in the fractional part of the number and has N digits between it and the decimal-separator-sign, if there is such a digit.

If there is no decimal-separator-sign in the sub-picture, or if there are no digits to the right of the decimal-separator-sign character in the string, then the decimal-separator-sign character is removed from the string (it will be the rightmost character in the string).

The result of the function is the concatenation of the appropriate prefix, the string conversion of the number as obtained above, and the appropriate suffix.
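The core of the formatting phase can be sketched for xs:decimal-like input using Python's decimal module. The sketch is a simplification under stated assumptions: the function name and parameters are hypothetical, the caller supplies the variables from the analysis phase, only one sub-picture is assumed (so negative numbers get the minus sign before the prefix), and NaN, infinity, percent, per-mille and fractional grouping are left out.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def format_decimal(value: Decimal, min_integer: int = 1, min_fraction: int = 0,
                   max_fraction: int = 0, grouping=(), prefix: str = "",
                   suffix: str = "") -> str:
    negative = value.is_signed()      # negative zero counts as negative
    # Round half to even, keeping at most max_fraction fractional digits.
    quantum = Decimal(1).scaleb(-max_fraction)
    rounded = abs(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

    # Decimal notation with insignificant zeroes stripped, then padded.
    int_digits, _, frac_digits = format(rounded, "f").partition(".")
    int_digits = int_digits.lstrip("0").rjust(min_integer, "0")
    frac_digits = frac_digits.rstrip("0").ljust(min_fraction, "0")

    # Insert a grouping separator wherever N digits remain to its right.
    grouped = []
    for i, digit in enumerate(int_digits):
        grouped.append(digit)
        digits_to_right = len(int_digits) - i - 1
        if digits_to_right > 0 and digits_to_right in grouping:
            grouped.append(",")
    body = "".join(grouped)
    if frac_digits:
        body += "." + frac_digits     # separator is dropped when no digits follow
    sign = "-" if negative else ""
    return sign + prefix + body + suffix
```

With grouping positions (3, 6, 9), a minimum integer part size of 1, and fractional part sizes of 2, the value 1234567.891 formats as "1,234,567.89". With the default arguments the value 2.5 formats as "2", because the midway value is rounded to the even neighbour.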

Trigonometrical Functions

The functions in this section perform trigonometrical calculations on xs:double values. They are designed for use in applications performing geometrical computation, for example when generating SVG graphics.

Functions are provided to support the six most commonly used trigonometric calculations: sine, cosine and tangent, and their inverses arc sine, arc cosine, and arc tangent. Other functions such as secant, cosecant, and cotangent are not provided because they are easily computed in terms of these six.

In this section, when the rules for a function say that the returned value must be the xs:double either side of some mathematical quantity, then if the mathematical quantity is precisely representable in the value space of xs:double the exact result must be returned; otherwise it is acceptable to return either the nearest higher xs:double or the nearest lower xs:double, and it is implementation-dependent which of the two is returned.

Functions on Strings

This section specifies functions and operators on the xs:string datatype and the datatypes derived from it.

String Types

The operators described in this section are defined on the following types. Each type whose name is indented is derived from the type whose name appears nearest above with one less level of indentation.

xs:string
  xs:normalizedString
    xs:token
      xs:language
      xs:NMTOKEN
      xs:Name
        xs:NCName
          xs:ID
          xs:IDREF
          xs:ENTITY

They also apply to user-defined types derived by restriction from the above types.

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.

Unless explicitly stated, the xs:string values returned by the functions in this document are not normalized in the sense of .

This document uses the term "codepoint" to mean the non-negative integer assigned to a character by the Unicode consortium. Equivalent terms found in other specifications are "code point", "character number", or "code position". The use of the word "character" in this document is in the sense of production [2] of the XML specification. Unicode defines codepoints that range from #x0000 to #x10FFFF inclusive, and these may include codepoints that have not yet been assigned to characters.

In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML characters in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above xFFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
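The difference between counting codepoints and counting 16-bit units can be seen in a short sketch. Python strings are sequences of codepoints, so len plays the role of fn:string-length here; the UTF-16 encoding step only models an implementation that stores surrogate pairs.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) lies above xFFFF.
text = "a\U0001D11Eb"

# Number of characters in the sense used by this specification.
codepoint_count = len(text)

# Number of 16-bit units in a UTF-16 representation of the same string.
utf16_unit_count = len(text.encode("utf-16-le")) // 2
```

The string has three characters even though a UTF-16 representation needs four 16-bit units, because the surrogate pair counts as one character.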

Functions to Assemble and Disassemble Strings

Equality and Comparison of Strings

Collations

A collation is a specification of the manner in which character strings are compared and, by extension, ordered. When values whose type is xs:string or a type derived from xs:string are compared (or, equivalently, sorted), the comparisons are inherently performed according to some collation (even if that collation is defined entirely on codepoint values). Some applications may require different comparison and ordering behaviors than other applications. Similarly, some users having particular linguistic expectations may require different behaviors than other users. Consequently, the collation must be taken into account when comparing strings in any context. Several functions in this and the following section make use of a collation.

Collations can indicate that two different codepoints are, in fact, equal for comparison purposes (e.g., "v" and "w" are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner, as defined by the collation.

Some collations, especially those based on the Unicode Collation Algorithm, can be "tailored" for various purposes. This document does not discuss such tailoring, nor does it provide a mechanism to perform tailoring. Instead, it assumes that the collation argument to the various functions below is a tailored and named collation.

The Unicode codepoint collation is a collation available in every implementation, which sorts based on codepoint values. For further details see the section The Unicode Codepoint Collation.

In the ideal case, a collation should treat two strings as equal if the two strings are identical after Unicode normalization. Thus, it is recommended that all strings be subjected to early Unicode normalization, and some collations will raise runtime errors if they encounter strings that are not properly normalized. However, it is not possible to guarantee that all strings in all XML documents are, in fact, normalized, or that they are normalized in the same manner. In order to maximize interoperability of operations on XML documents in general, there may be collations that operate on unnormalized strings and other collations that implicitly normalize strings before comparing them. Applications may choose the kind of collation best suited for their needs. Note that collations based on the Unicode collation algorithm implicitly normalize strings before comparison and produce equivalent results regardless of a string's normalization.

This specification assumes that collations are named and that the collation name may be provided as an argument to string functions. Functions that allow specification of a collation do so with an argument whose type is xs:string but whose lexical form must conform to an xs:anyURI. If the collation is specified using a relative URI, it is assumed to be relative to the value of the base-uri property in the static context. This specification also defines the manner in which a default collation is determined if the collation argument is not specified in calls of functions that use a collation but allow it to be omitted.

This specification does not define whether or not the collation URI is dereferenced. The collation URI may be an abstract identifier, or it may refer to an actual resource describing the collation. If it refers to a resource, this specification does not define the nature of that resource. One possible candidate is that the resource is a locale description expressed using the Locale Data Markup Language.

Functions such as fn:compare and fn:max that compare xs:string values use a single collation URI to identify all aspects of the collation rules. This means that any parameters such as the strength of the collation must be specified as part of the collation URI. For example, suppose there is a collation http://www.example.com/collations/French that refers to a French collation that compares on the basis of base characters. Collations that use the same basic rules, but with higher strengths, for example, base characters and accents, or base characters, accents and case, would need to be given different names, say http://www.example.com/collations/French1 and http://www.example.com/collations/French2 . Note that some specifications use the term collation to refer to an algorithm that can be parameterized, but in this specification, each possible parameterization is considered to be a distinct collation.

The XQuery/XPath static context includes a provision for a default collation that can be used for string comparisons and ordering operations. See the description of the static context in . If the default collation is not specified by the user or the system, the default collation is the Unicode codepoint collation.

XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.

The Unicode Codepoint Collation

The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).

The Unicode codepoint collation does not perform any normalization on the supplied strings.

The collation is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:

If both sequences are empty, the strings are equal.

If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.

If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.

If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.

Otherwise (the first pair of integers are equal), the result is obtained by applying the same rules recursively to fn:subsequence($A, 2) and fn:subsequence($B, 2).
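The recursive definition above translates almost directly into code. The following sketch uses hypothetical Python names: string_to_codepoints stands in for fn:string-to-codepoints, list slicing stands in for fn:subsequence, and the result is -1, 0 or 1 for less than, equal and greater than.

```python
def string_to_codepoints(s: str) -> list:
    # Stand-in for fn:string-to-codepoints.
    return [ord(ch) for ch in s]

def compare_sequences(a: list, b: list) -> int:
    if not a and not b:
        return 0                      # both empty: equal
    if not a:
        return -1                     # the empty sequence sorts first
    if not b:
        return 1
    if a[0] < b[0]:
        return -1
    if a[0] > b[0]:
        return 1
    # First integers are equal: recurse on the remainders.
    return compare_sequences(a[1:], b[1:])

def codepoint_compare(s1: str, s2: str) -> int:
    return compare_sequences(string_to_codepoints(s1), string_to_codepoints(s2))
```

Under this ordering "Z" sorts before "a", since the codepoint of "Z" is 90 and that of "a" is 97. The recursion mirrors the wording of the rules; an iterative loop would give the same results.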

While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.

Choosing a Collation

Many functions have two signatures, where one signature includes a $collation argument and the other omits this argument.

The collation to use for these functions is determined by the following rules:

If the function specifies an explicit collation, CollationA (e.g., if the optional collation argument is specified in a call of the fn:compare function), then:

If CollationA is supported by the implementation, then CollationA is used.

Otherwise, an error is raised.

If no collation is explicitly specified for the function and the default collation in the XQuery/XPath static context is CollationB, then:

If CollationB is supported by the implementation, then CollationB is used.

Otherwise, an error is raised.

Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error.
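The selection rules can be summarised in a brief sketch. The exception class, the function name, and the representation of the supported collations as a set of URIs are all assumptions made for illustration; the specification does not prescribe any of them.

```python
CODEPOINT_COLLATION = "http://www.w3.org/2005/xpath-functions/collation/codepoint"

class UnsupportedCollation(Exception):
    """Hypothetical error type raised for a collation the implementation lacks."""

def choose_collation(explicit=None, default=CODEPOINT_COLLATION,
                     supported=frozenset({CODEPOINT_COLLATION})):
    # An explicit collation argument takes precedence over the static-context default.
    requested = explicit if explicit is not None else default
    if requested not in supported:
        raise UnsupportedCollation(requested)
    return requested
```

Calling choose_collation() with no arguments selects the Unicode codepoint collation, while an explicit URI that is not in the supported set raises the error in both branches of the rules.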

Functions on String Values

The following functions are defined on values of type xs:string and types derived from it.

When the above operators and functions are applied to datatypes derived from xs:string, they are guaranteed to return legal xs:strings, but they might not return a legal value for the particular subtype to which they were applied.

The strings returned by fn:concat and fn:string-join are not guaranteed to be normalized. But see note in fn:concat.

Functions Based on Substring Matching

The functions described in this section examine a string $arg1 to see whether it contains another string $arg2 as a substring. The result depends on whether $arg2 is a substring of $arg1, and if so, on the range of characters in $arg1 which $arg2 matches.

When the Unicode codepoint collation is used, this simply involves determining whether $arg1 contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $arg2.

When a collation is specified, the rules are more complex.

All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units. ("collation unit" is equivalent to "collation element" as defined in the Unicode Collation Algorithm.) The string $arg1 is then considered to contain $arg2 as a substring if the sequence of collation units corresponding to $arg2 is a subsequence of the sequence of the collation units corresponding to $arg1. The characters in $arg1 that match are the characters corresponding to these collation units.

This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.

In addition, collations may specify that some collation units should be ignored during matching. If hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true, and fn:contains("codepoint", "-") will also be true.
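The examples above can be reproduced with a toy model of collation units. Everything in this sketch is hypothetical: the three decomposition functions stand in for three imaginary collations, units are plain Python strings compared for equality, and "subsequence" is read as a contiguous run of units.

```python
def ignore_hyphen_units(s: str) -> list:
    # Imaginary collation in which the hyphen is an ignored collation unit.
    return [ch for ch in s if ch != "-"]

def expand_umlaut_units(s: str) -> list:
    # Imaginary collation treating "ä" as the two units "a" and "e".
    units = []
    for ch in s:
        units.extend(["a", "e"] if ch == "ä" else [ch])
    return units

def digraph_units(s: str) -> list:
    # Imaginary collation treating "ae" (and "ä") as one single unit.
    units, i = [], 0
    while i < len(s):
        if s[i:i + 2] == "ae":
            units.append("ae")
            i += 2
        else:
            units.append("ae" if s[i] == "ä" else s[i])
            i += 1
    return units

def contains(arg1: str, arg2: str, units) -> bool:
    hay, needle = units(arg1), units(arg2)
    n = len(needle)
    # True when the needle's units occur as a contiguous run in the haystack's units.
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))
```

With these models, contains("Jäger", "eg", expand_umlaut_units) is true and contains("Jaeger", "eg", digraph_units) is false, as in the text. Both hyphen examples are true under ignore_hyphen_units, the second because "-" decomposes into an empty sequence of units.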

In the definitions below, we refer to the terms match and minimal match as defined in definitions DS2 and DS4 of the Unicode Collation Algorithm. In applying these definitions:

C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.

P is the (candidate) substring $arg2

Q is the (candidate) containing string $arg1

The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (collation elements in the language of the Unicode Collation Algorithm). It is not satisfied between two characters that belong to the same collation unit.

It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation.

String Functions that use Regular Expressions

The three functions described in this section make use of a regular expression syntax for pattern matching. This is described below.

Regular Expression Syntax

The regular expression syntax used by these functions is defined in terms of the regular expression syntax specified in XML Schema, which in turn is based on the established conventions of languages such as Perl. However, because XML Schema uses regular expressions only for validity checking, it omits some facilities that are widely used with languages such as Perl. This section, therefore, describes extensions to the XML Schema regular expressions syntax that reinstate these capabilities.

It is recommended that implementers consult Unicode Regular Expressions for information on using regular expression processing on Unicode characters.

The regular expression syntax and semantics are identical to those defined in XML Schema, with the following additions:

Two meta-characters, ^ and $ are added. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string. In multi-line mode, ^ matches the start of any line (that is, the start of the entire string, and the position immediately after a newline character), while $ matches the end of any line (that is, the end of the entire string, and the position immediately before a newline character). Newline here means the character #x0A only.

This means that the production in XML Schema:

[10] Char ::= [^.\?*+()|#x5B#x5D]

is modified to read:

[10] Char ::= [^.\?*+{}()|^$#x5B#x5D]

The characters #x5B and #x5D correspond to "[" and "]" respectively.

The definition of Char (production [10]) in XML Schema has a known error in which it omits the left brace ("{") and right brace ("}"). That error is corrected here.

The following production:

[11] charClass ::= charClassEsc | charClassExpr | WildCardEsc

is modified to read:

[11] charClass ::= charClassEsc | charClassExpr | WildCardEsc | "^" | "$"
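The effect of the two anchors can be tried out with Python's re module, used here only as a stand-in because its ^ and $ behave similarly for these inputs. One known difference is an assumption to keep in mind: outside multi-line mode, Python's $ also matches just before a newline at the very end of the string, so the inputs below avoid a trailing newline.

```python
import re

text = "one\ntwo"

# Default mode: ^ and $ anchor to the start and end of the entire string.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# Multi-line mode: ^ and $ anchor to the start and end of every line.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)
```

In default mode only "one" is found at a start anchor and only "two" at an end anchor; in multi-line mode both words are found by each pattern.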

Reluctant quantifiers are supported. They are indicated by a ? following a quantifier. Specifically:

X?? matches X, once or not at all

X*? matches X, zero or more times

X+? matches X, one or more times

X{n}? matches X, exactly n times

X{n,}? matches X, at least n times

X{n,m}? matches X, at least n times, but not more than m times

The effect of these quantifiers is that the regular expression matches the shortest possible substring consistent with the match as a whole succeeding. Without the ? , the regular expression matches the longest possible substring.

To achieve this, the production in XML Schema:

[4] quantifier ::= [?*+] | ( '{' quantity '}' )

is changed to:

[4] quantifier ::= ( [?*+] | ( '{' quantity '}' ) ) '?'?

Reluctant quantifiers have no effect on the results of the boolean fn:matches function, since this function is only interested in discovering whether a match exists, and not where it exists.
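
As an informal illustration (not part of this specification), the difference between greedy and reluctant quantifiers can be observed in Python's re module. The sketch assumes that Python's `*?` behaves like the reluctant quantifier described above; the sample string is invented for the example.

```python
import re

text = "<a><b>"

# Greedy: .* extends as far as it can while still letting the match succeed.
greedy = re.search(r"<.*>", text).group(0)

# Reluctant: .*? stops at the earliest point that lets the match succeed.
reluctant = re.search(r"<.*?>", text).group(0)

# Whether a match exists is the same in both cases, which is why the
# boolean fn:matches function is unaffected by reluctant quantifiers.
exists_greedy = re.search(r"<.*>", text) is not None
exists_reluctant = re.search(r"<.*?>", text) is not None

print(greedy)     # <a><b>
print(reluctant)  # <a>
```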

Sub-expressions (groups) within the regular expression are recognized. The regular expression syntax defined by allows a regular expression to contain parenthesized sub-expressions, but attaches no special significance to them. The fn:replace function described below allows access to the parts of the input string that matched a sub-expression (called captured substrings). The sub-expressions are numbered according to the position of the opening parenthesis in left-to-right order within the top-level regular expression: the first opening parenthesis identifies captured substring 1, the second identifies captured substring 2, and so on. 0 identifies the substring captured by the entire regular expression. If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured.

Non-capturing groups are also recognized. These are indicated by the syntax (?:xxxx). Specifically, the production rule for atom in is changed from:

[9] atom ::= Char | charClass | ( '(' regExp ')' )

to:

[9] atom ::= Char | charClass | ( '(' '?:'? regExp ')' )

The presence of the optional ?: has no effect on the set of strings that match the regular expression, but causes the left parenthesis not to be counted by operations that number the groups within a regular expression, for example the fn:replace function.
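
The numbering rules can be tried out informally in Python's re module, which is assumed here to number groups in the same left-to-right order of opening parentheses. The patterns and inputs are invented for illustration.

```python
import re

# Groups are numbered by the position of their opening parenthesis;
# group 0 is the substring matched by the entire regular expression.
m = re.fullmatch(r"((\d+)-(\d+))", "10-20")
numbered = [m.group(i) for i in range(4)]

# A group inside a repetition keeps only the last substring it matched.
last_only = re.fullmatch(r"(?:(\d),?)+", "1,2,3").group(1)

# (?:...) is not counted, so (\d+) is group 1 in this expression.
nc = re.fullmatch(r"(?:ab)+(\d+)", "abab42")

print(numbered)     # ['10-20', '10-20', '10', '20']
print(last_only)    # 3
print(nc.group(1))  # 42
```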

Back-references are allowed outside a character class expression. A back-reference is an additional kind of atom. The construct \N where N is a single digit is always recognized as a back-reference; if this is followed by further digits, these digits are taken to be part of the back-reference if and only if the resulting number NN is such that the back-reference is preceded by NN or more unescaped opening parentheses. The regular expression is invalid if a back-reference refers to a subexpression that does not exist or whose closing right parenthesis occurs after the back-reference.

A back-reference matches the string that was matched by the Nth capturing subexpression within the regular expression, that is, the parenthesized subexpression whose opening left parenthesis is the Nth unescaped left parenthesis within the regular expression. For example, the regular expression ('|").*\1 matches a sequence of characters delimited either by an apostrophe at the start and end, or by a quotation mark at the start and end.

If no string is matched by the Nth capturing subexpression, the back-reference is interpreted as matching a zero-length string.

Back-references change the following production:

[9] atom ::= Char | charClass | ( '(' regExp ')' )

to

[9] atom ::= Char | charClass | ( '(' regExp ')' ) | backReference

[9a] backReference ::= "\" [1-9][0-9]*

Within a character class expression, \ followed by a digit is invalid. Some other regular expression languages interpret this as an octal character reference.
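
The quoted-string example above can be exercised with Python's re module as a rough sketch. Python's own rules for multi-digit back-references, and for back-references to groups that matched nothing, are not guaranteed to agree with the rules given here, so only the simple single-digit case is shown.

```python
import re

# Group 1 captures the opening delimiter; \1 requires the same delimiter again.
quoted = re.compile(r"""('|").*\1""")

same_apostrophes = quoted.fullmatch("'it works'") is not None
same_quotes = quoted.fullmatch('"it works"') is not None
mixed = quoted.fullmatch("'it works\"") is not None

print(same_apostrophes, same_quotes, mixed)  # True True False
```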

Single character escapes are extended to allow the $ character to be escaped. The following production is changed:

[24]SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]

to

[24]SingleCharEsc ::= '\' [nrt\|.?*+(){}$#x2D#x5B#x5D#x5E]

Flags

All these functions provide an optional parameter, $flags, to set options for the interpretation of the regular expression. The parameter accepts an xs:string, in which individual letters are used to set options. The presence of a letter within the string indicates that the option is on; its absence indicates that the option is off. Letters may appear in any order and may be repeated. If there are characters present that are not defined here as flags, then an error is raised.

The following options are defined:

s: If present, the match operates in "dot-all" mode. (Perl calls this the single-line mode.) If the s flag is not specified, the meta-character . matches any character except a newline (#x0A) character. In dot-all mode, the meta-character . matches any character whatsoever. Suppose the input contains "hello" and "world" on two lines. This will not be matched by the regular expression "hello.*world" unless dot-all mode is enabled.

m: If present, the match operates in multi-line mode. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string. In multi-line mode, ^ matches the start of any line (that is, the start of the entire string, and the position immediately after a newline character other than a newline that appears as the last character in the string), while $ matches the end of any line (that is, the position immediately before a newline character, and the end of the entire string if there is no newline character at the end of the string). Newline here means the character #x0A only.

i: If present, the match operates in case-insensitive mode. The detailed rules are as follows. In these rules, a character C2 is considered to be a case-variant of another character C1 if the following XPath expression returns true when the two characters are considered as strings of length one, and the Unicode codepoint collation is used:

fn:lower-case(C1) eq fn:lower-case(C2)

or

fn:upper-case(C1) eq fn:upper-case(C2)

Note that the case-variants of a character under this definition are always single characters.

When a normal character (Char) is used as an atom, it represents the set containing that character and all its case-variants. For example, the regular expression "z" will match both "z" and "Z".

A character range (charRange) represents the set containing all the characters that it would match in the absence of the "i" flag, together with their case-variants. For example, the regular expression "[A-Z]" will match all the letters A-Z and all the letters a-z. It will also match certain other characters such as #x212A (KELVIN SIGN), since fn:lower-case("#x212A") is "k".

This rule applies also to a character range used in a character class subtraction (charClassSub): thus [A-Z-[IO]] will match characters such as "A", "B", "a", and "b", but will not match "I", "O", "i", or "o".

The rule also applies to a character range used as part of a negative character group: thus [^Q] will match every character except "Q" and "q" (these being the only case-variants of "Q" in Unicode).

A back-reference is compared using case-blind comparison: that is, each character must either be the same as the corresponding character of the previously matched string, or must be a case-variant of that character. For example, the strings "Mum", "mom", "Dad", and "DUD" all match the regular expression "([md])[aeiou]\1" when the "i" flag is used.

All other constructs are unaffected by the "i" flag. For example, "\p{Lu}" continues to match upper-case letters only.

x: If present, whitespace characters (#x9, #xA, #xD and #x20) in the regular expression are removed prior to matching with one exception: whitespace characters within character class expressions (charClassExpr) are not removed. This flag can be used, for example, to break up long regular expressions into readable lines.

Examples:

fn:matches("helloworld", "hello world", "x") returns true()

fn:matches("helloworld", "hello[ ]world", "x") returns false()

fn:matches("hello world", "hello\ sworld", "x") returns true()

fn:matches("hello world", "hello world", "x") returns false()

q: If present, all characters in the regular expression are treated as representing themselves, not as metacharacters. In effect, every character that would normally have a special meaning in a regular expression is implicitly escaped by preceding it with a backslash.

Furthermore, when this flag is present, the characters $ and \ have no special significance when used in the replacement string supplied to the fn:replace function.

This flag can be used in conjunction with the i flag. If it is used together with the m, s, or x flag, that flag has no effect.

Examples:

fn:tokenize("12.3.5.6", ".", "q") returns ("12", "3", "5", "6")

fn:replace("a\b\c", "\", "\\", "q") returns "a\\b\\c"

fn:replace("a/b/c", "/", "$", "q") returns "a$b$c"

fn:matches("abcd", ".*", "q") returns false()

fn:matches("Mr. B. Obama", "B. OBAMA", "iq") returns true()
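
The following is a minimal sketch, not a conforming implementation, of how the $flags letters might be mapped onto Python's re options. The helper names strip_x_whitespace, compile_with_flags and matches are hypothetical. Python's regular expression dialect differs from the one defined here in several respects (for example, it has no character class subtraction or \p{...} escapes, and its default treatment of $ before a trailing newline differs), so the sketch only approximates the behaviour of the flags themselves.

```python
import re

WHITESPACE = "\t\n\r "  # #x9, #xA, #xD and #x20


def strip_x_whitespace(pattern):
    """Remove whitespace that lies outside character class expressions."""
    out = []
    depth = 0        # nesting depth of [...]; subtraction can nest brackets
    escaped = False  # True when the previous kept character was a backslash
    for ch in pattern:
        if depth == 0 and ch in WHITESPACE:
            continue  # dropped, even directly after a backslash
        out.append(ch)
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
    return "".join(out)


def compile_with_flags(pattern, flags=""):
    """Hypothetical helper: translate $flags letters into Python re options."""
    unknown = set(flags) - set("smixq")
    if unknown:
        raise ValueError("invalid flag(s): " + "".join(sorted(unknown)))
    options = re.IGNORECASE if "i" in flags else 0
    if "q" in flags:
        # q: every character is literal; m, s and x then have no effect.
        return re.compile(re.escape(pattern), options)
    if "s" in flags:
        options |= re.DOTALL
    if "m" in flags:
        options |= re.MULTILINE
    if "x" in flags:
        pattern = strip_x_whitespace(pattern)
    return re.compile(pattern, options)


def matches(text, pattern, flags=""):
    """Approximation of fn:matches: true if any substring matches."""
    return compile_with_flags(pattern, flags).search(text) is not None


print(matches("helloworld", "hello world", "x"))      # True
print(matches("abcd", ".*", "q"))                     # False
print(compile_with_flags(".", "q").split("12.3.5.6")) # ['12', '3', '5', '6']
```

The x-flag helper deliberately drops whitespace before looking at escapes, so that "hello\ sworld" becomes "hello\sworld" as in the example above.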

Functions that manipulate URIs

This section specifies functions that manipulate URI values, either as instances of xs:anyURI or as strings.

Functions and Operators on Boolean Values

This section defines functions and operators on the xs:boolean datatype.

Boolean Constant Functions

Since no literals are defined in XPath to reference the constant boolean values true and false, two functions are provided for the purpose.

Operators on Boolean Values

The following functions define the semantics of operators on boolean values in and :

The ordering operators op:boolean-less-than and op:boolean-greater-than are provided for application purposes and for compatibility with . The datatype xs:boolean is not ordered.

Functions on Boolean Values

The following functions are defined on boolean values:

Functions and Operators on Durations

Operators are defined on the following type:

xs:duration

and on the two defined subtypes (see ):

xs:yearMonthDuration

xs:dayTimeDuration

No ordering relation is defined on xs:duration values. Two xs:duration values may however be compared for equality.

Operations on durations (including equality comparison, casting to string, and extraction of components) all treat the duration as normalized. This means that the seconds and minutes components will always be less than 60, the hours component less than 24, and the months component less than 12. Thus, for example, a duration of 120 seconds always gives the same result as a duration of two minutes.

This means that in practice, the information content of an xs:duration value can be reduced to an xs:integer number of months, and an xs:decimal number of seconds. For the two defined subtypes this is further simplified so that one of these two components is fixed at zero. Operations such as comparison of durations and arithmetic on durations can be expressed in terms of numeric operations applied to these two components.
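
As a rough sketch of this reduction, a duration can be modelled as a pair (months, seconds) from which the normalized components are derived by integer division. The function name normalized_components and the tuple layout are assumptions made for this example only.

```python
from decimal import Decimal


def normalized_components(months, seconds):
    """Return (sign, years, months, days, hours, minutes, seconds).

    months is an integer and seconds a decimal; for a negative duration
    both are assumed to be less than or equal to zero.
    """
    seconds = Decimal(seconds)
    sign = -1 if (months < 0 or seconds < 0) else 1
    years, mon = divmod(abs(months), 12)
    days, rem = divmod(abs(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return sign, years, mon, int(days), int(hours), int(minutes), secs


# 120 seconds and two minutes are the same value, (0 months, 120 seconds),
# so they yield the same components.
print(normalized_components(0, 120))
print(normalized_components(14, Decimal("90061.5")))
```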

Limits and Precision

This section needs revision: it comes from the old text describing both duration and date/time operations, but it's not clear exactly what it should say for durations.

A processor that limits the number of digits in date and time datatype representations may encounter overflow and underflow conditions when it tries to execute the functions in . In these situations, the processor must return P0M or PT0S in case of duration underflow and 00:00:00 in case of time underflow. It must raise an error in case of overflow.

The value spaces of the two totally ordered subtypes of xs:duration described in are xs:integer months for xs:yearMonthDuration and xs:decimal seconds for xs:dayTimeDuration. If a processor limits the number of digits allowed in the representation of xs:integer and xs:decimal then overflow and underflow situations can arise when it tries to execute the functions in . In these situations the processor must return zero in case of numeric underflow and P0M or PT0S in case of duration underflow. It must raise an error in case of overflow.

Two Totally Ordered Subtypes of Duration

Two totally ordered subtypes of xs:duration are defined in specification using the mechanisms described in for defining user-defined types. Additional details about these types are given below.

These types were not defined in XSD 1.0, but they are defined in the current draft of XSD 1.1. The description given here is believed to be equivalent to that in XSD 1.1, and will become non-normative when XSD 1.1 reaches Recommendation status.

xs:yearMonthDuration

[Definition] xs:yearMonthDuration is derived from xs:duration by restricting its lexical representation to contain only the year and month components. The value space of xs:yearMonthDuration is the set of xs:integer month values. The year and month components of xs:yearMonthDuration correspond to the Gregorian year and month components defined in section 5.5.3.2 of , respectively.

Lexical representation

The lexical representation for xs:yearMonthDuration is the reduced format PnYnM, where nY represents the number of years and nM the number of months. The values of the years and months components are not restricted but allow an arbitrary unsigned xs:integer.

An optional preceding minus sign ('-') is allowed to indicate a negative duration. If the sign is omitted a positive duration is indicated. To indicate a xs:yearMonthDuration of 1 year, 2 months, one would write: P1Y2M. One could also indicate a xs:yearMonthDuration of minus 13 months as: -P13M.

Reduced precision and truncated representations of this format are allowed provided they conform to the following:

If the number of years or months in any expression equals zero (0), the number and its corresponding designator may be omitted. However, at least one number and its designator must be present. For example, P1347Y and P1347M are allowed; P-1347M is not allowed, although -P1347M is allowed. P1Y2MT is not allowed. Also, P24YM is not allowed, nor is PY43M, since both Y and M must have at least one preceding digit.

Calculating the value from the lexical representation

The value of a xs:yearMonthDuration lexical form is obtained by multiplying the value of the years component by 12 and adding the value of the months component. The value is positive or negative depending on the preceding sign.

Canonical representation

The canonical representation of xs:yearMonthDuration restricts the value of the months component to xs:integer values between 0 and 11, both inclusive. To convert from a non-canonical representation to the canonical representation, the lexical representation is first converted to a value in xs:integer months as defined above. This value is then divided by 12 to obtain the value of the years component of the canonical representation. The remaining number of months is the value of the months component of the canonical representation. For negative durations, the canonical form is calculated using the absolute value of the duration and a negative sign is prepended to it. If a component has the value zero (0), then the number and the designator for that component must be omitted. However, if the value is zero (0) months, the canonical form is "P0M".
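
The value calculation and the conversion to canonical form can be sketched as follows. The function names ym_value and ym_canonical are hypothetical, and the regular expression is only one plausible reading of the lexical rules given above.

```python
import re

# Optional sign, 'P', then optional years and months, each with digits.
_YM = re.compile(r"(-)?P(?:([0-9]+)Y)?(?:([0-9]+)M)?")


def ym_value(lexical):
    """Value in months of an xs:yearMonthDuration lexical form."""
    m = _YM.fullmatch(lexical)
    # At least one number and its designator must be present.
    if m is None or (m.group(2) is None and m.group(3) is None):
        raise ValueError("not a valid xs:yearMonthDuration: " + lexical)
    months = int(m.group(2) or 0) * 12 + int(m.group(3) or 0)
    return -months if m.group(1) else months


def ym_canonical(months):
    """Canonical lexical form for a value expressed in months."""
    sign = "-" if months < 0 else ""
    years, rest = divmod(abs(months), 12)
    if years == 0 and rest == 0:
        return "P0M"
    body = (f"{years}Y" if years else "") + (f"{rest}M" if rest else "")
    return sign + "P" + body


print(ym_value("P1Y2M"))               # 14
print(ym_canonical(ym_value("-P13M"))) # -P1Y1M
```

Because the value is a single integer, the order relation described below reduces to ordinary integer comparison of ym_value results.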

Order relation on xs:yearMonthDuration

Let the function that calculates the value of an xs:yearMonthDuration in the manner described above be called V(d). Then for two xs:yearMonthDuration values x and y, x > y if and only if V(x) > V(y). The order relation on yearMonthDuration is a total order.

xs:dayTimeDuration

[Definition] xs:dayTimeDuration is derived from xs:duration by restricting its lexical representation to contain only the days, hours, minutes and seconds components. The value space of xs:dayTimeDuration is the set of fractional second values. The components of xs:dayTimeDuration correspond to the day, hour, minute and second components defined in Section 5.5.3.2 of , respectively.

Lexical representation

The lexical representation for xs:dayTimeDuration is the truncated format PnDTnHnMnS, where nD represents the number of days, T is the date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds.

The values of the days, hours and minutes components are not restricted, but allow an arbitrary unsigned xs:integer. Similarly, the value of the seconds component allows an arbitrary unsigned xs:decimal. An optional minus sign ('-') is allowed to precede the 'P', indicating a negative duration. If the sign is omitted, the duration is positive. See also Date and Time Formats.

For example, to indicate a duration of 3 days, 10 hours and 30 minutes, one would write: P3DT10H30M. One could also indicate a duration of minus 120 days as: -P120D. Reduced precision and truncated representations of this format are allowed, provided they conform to the following:

If the number of days, hours, minutes, or seconds in any expression equals zero (0), the number and its corresponding designator may be omitted. However, at least one number and its designator must be present.

The seconds part may have a decimal fraction.

The designator 'T' must be absent if and only if all of the time items are absent. The designator 'P' must always be present.

For example, P13D, PT47H, P3DT2H, -PT35.89S and P4DT251M are all allowed. P-134D is not allowed (invalid location of minus sign), although -P134D is allowed.

Calculating the value of a xs:dayTimeDuration from the lexical representation

The value of a xs:dayTimeDuration lexical form in fractional seconds is obtained by converting the days, hours, minutes and seconds value to fractional seconds using the conversion rules: 24 hours = 1 day, 60 minutes = 1 hour and 60 seconds = 1 minute.

Canonical representation

The canonical representation of xs:dayTimeDuration restricts the value of the hours component to xs:integer values between 0 and 23, both inclusive; the value of the minutes component to xs:integer values between 0 and 59, both inclusive; and the value of the seconds component to xs:decimal values from 0.0 to 59.999... (see , Appendix D).

To convert from a non-canonical representation to the canonical representation, the value of the lexical form in fractional seconds is first calculated in the manner described above. The value of the days component in the canonical form is then calculated by dividing the value by 86,400 (24*60*60). The remainder is in fractional seconds. The value of the hours component in the canonical form is calculated by dividing this remainder by 3,600 (60*60). The remainder is again in fractional seconds. The value of the minutes component in the canonical form is calculated by dividing this remainder by 60. The remainder in fractional seconds is the value of the seconds component in the canonical form. For negative durations, the canonical form is calculated using the absolute value of the duration and a negative sign is prepended to it. If a component has the value zero (0) then the number and the designator for that component must be omitted. However, if all the components of the lexical form are zero (0), the canonical form is PT0S.
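
The same two steps for xs:dayTimeDuration can be sketched as below. The names dt_value and dt_canonical are hypothetical, and the regular expression is an assumed rendering of the lexical rules described above rather than a normative grammar.

```python
import re
from decimal import Decimal

_DT = re.compile(
    r"(-)?P(?:([0-9]+)D)?"
    r"(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)


def dt_value(lexical):
    """Value in fractional seconds of an xs:dayTimeDuration lexical form."""
    m = _DT.fullmatch(lexical)
    if m is None:
        raise ValueError("not a valid xs:dayTimeDuration: " + lexical)
    sign, days, t_part, hours, minutes, seconds = m.groups()
    no_time_items = hours is None and minutes is None and seconds is None
    # 'T' is present exactly when a time item is present, and at least
    # one number with its designator must appear overall.
    if (t_part is not None and no_time_items) or (days is None and t_part is None):
        raise ValueError("not a valid xs:dayTimeDuration: " + lexical)
    value = (Decimal(days or 0) * 86400 + Decimal(hours or 0) * 3600
             + Decimal(minutes or 0) * 60 + Decimal(seconds or 0))
    return -value if sign else value


def dt_canonical(value):
    """Canonical lexical form for a value expressed in seconds."""
    value = Decimal(value)
    sign = "-" if value < 0 else ""
    days, rem = divmod(abs(value), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    if days == hours == minutes == secs == 0:
        return "PT0S"
    date_part = f"{int(days)}D" if days else ""
    time_part = ""
    if hours:
        time_part += f"{int(hours)}H"
    if minutes:
        time_part += f"{int(minutes)}M"
    if secs:
        # normalize() drops trailing zeros; 'f' avoids exponent notation.
        time_part += format(secs.normalize(), "f") + "S"
    return sign + "P" + date_part + ("T" + time_part if time_part else "")


print(dt_canonical(dt_value("P4DT251M")))  # P4DT4H11M
print(dt_canonical(dt_value("PT47H")))     # P1DT23H
```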

Order relation on xs:dayTimeDuration

Let the function that calculates the value of a xs:dayTimeDuration in the manner described above be called V(d). Then for two xs:dayTimeDuration values x and y, x > y if and only if V(x) > V(y). The order relation on xs:dayTimeDuration is a total order.

Comparison Operators on Durations

The following comparison operators are defined on the duration datatypes. Each operator takes two operands of the same type and returns an xs:boolean result. As discussed in , the order relation on xs:duration is not a total order but, rather, a partial order. For this reason, only equality is defined on xs:duration. A full complement of comparison and arithmetic functions are defined on the two subtypes of duration described in which do have a total order.

Component Extraction Functions on Durations

The duration datatype may be considered to be a composite datatype in that it contains distinct properties or components. The extraction functions specified below extract a single component from a duration value. For xs:duration and its subtypes, including the two subtypes xs:yearMonthDuration and xs:dayTimeDuration, the components are normalized: this means that the seconds and minutes components will always be less than 60, the hours component less than 24, and the months component less than 12.

Arithmetic Operators on Durations

For operators that combine a duration and a date/time value, see .

Functions and Operators on Dates and Times

This section defines operations on the date and time types.

See for a disquisition on working with date and time values with and without timezones.

Date and Time Types

The operators described in this section are defined on the following date and time types:

xs:dateTime

xs:date

xs:time

xs:gYearMonth

xs:gYear

xs:gMonthDay

xs:gMonth

xs:gDay

The only operations defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth and xs:gDay values are equality comparison and component extraction. For other types, further operations are provided, including order comparisons, arithmetic, formatted display, and timezone adjustment.

Limits and Precision

For a number of the above datatypes extends the basic lexical representations, such as YYYY-MM-DDThh:mm:ss.s for dateTime, by allowing a preceding minus sign, more than four digits to represent the year field — no maximum is specified — and an unlimited number of digits for fractional seconds. Leap seconds are not supported.

All minimally conforming processors must support positive year values with a minimum of 4 digits (i.e., YYYY) and a minimum fractional second precision of 1 millisecond or three digits (i.e., s.sss). However, conforming processors may set larger limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0000 and years with negative values. The results of operations on dates that cross the year 0000 are implementation-defined.

A processor that limits the number of digits in date and time datatype representations may encounter overflow and underflow conditions when it tries to execute the functions in . In these situations, the processor must return 00:00:00 in case of time underflow. It must raise an error in case of overflow.

Can time underflow occur, and if so when?

Date/time datatype values

As defined in , xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay values, referred to collectively as date/time values, are represented as seven components or properties: year, month, day, hour, minute, second and timezone. The values of the first five components are xs:integers. The value of the second component (that is, the seconds value) is an xs:decimal and the value of the timezone component is an xs:dayTimeDuration. For all the primitive date/time datatypes, the timezone property is optional and may or may not be present. Depending on the datatype, some of the remaining six properties must be present and some must be absent. Absent, or missing, properties are represented by the empty sequence. This value is referred to as the local value in that the value retains its original timezone. Before comparing or subtracting xs:dateTime values, this local value must be translated or normalized to UTC.

For xs:time, 00:00:00 and 24:00:00 are alternate lexical forms for the same value, whose canonical representation is 00:00:00. For xs:dateTime, a time component 24:00:00 translates to 00:00:00 of the following day.

Examples

An xs:dateTime with lexical representation 1999-05-31T05:00:00 is represented in the datamodel by {1999, 5, 31, 5, 0, 0.0, ()}.

An xs:dateTime with lexical representation 1999-05-31T13:20:00-05:00 is represented by {1999, 5, 31, 13, 20, 0.0, -PT5H}.

An xs:dateTime with lexical representation 1999-12-31T24:00:00 is represented by {2000, 1, 1, 0, 0, 0.0, ()}.

An xs:date with lexical representation 2005-02-28+08:00 is represented by {2005, 2, 28, (), (), (), PT8H}.

An xs:time with lexical representation 24:00:00 is represented by {(), (), (), 0, 0, 0, ()}.
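
The xs:dateTime examples above can be reproduced with the following sketch. It is limited to four-digit positive years, uses Python's None to stand for the empty sequence and a timedelta to stand for the xs:dayTimeDuration timezone, and the function name datetime_components is hypothetical.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

_DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2}(?:\.[0-9]+)?)"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def datetime_components(lexical):
    """Return (year, month, day, hour, minute, second, timezone)."""
    m = _DATETIME.fullmatch(lexical)
    if m is None:
        raise ValueError("unsupported xs:dateTime form: " + lexical)
    year, month, day, hour, minute = (int(g) for g in m.groups()[:5])
    second = Decimal(m.group(6))
    day_value = date(year, month, day)  # also rejects impossible dates
    if hour == 24:
        if minute != 0 or second != 0:
            raise ValueError("hour 24 is only allowed as 24:00:00")
        # 24:00:00 means 00:00:00 of the following day.
        day_value += timedelta(days=1)
        hour = 0
    tz = m.group(7)
    if tz is None:
        offset = None  # absent timezone, i.e. the empty sequence
    elif tz == "Z":
        offset = timedelta(0)
    else:
        sign = -1 if tz[0] == "-" else 1
        offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
    return (day_value.year, day_value.month, day_value.day,
            hour, minute, second, offset)


print(datetime_components("1999-12-31T24:00:00"))
```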

Constructing a dateTime

A function is provided for constructing a xs:dateTime value from a xs:date value and a xs:time value.

Comparison Operators on Duration, Date and Time Values

The following comparison operators are defined on the date/time datatypes. Each operator takes two operands of the same type and returns an xs:boolean result.

also states that the order relation on date and time datatypes is not a total order but a partial order because these datatypes may or may not have a timezone. This is handled as follows. If either operand to a comparison function on date or time values does not have an (explicit) timezone then, for the purpose of the operation, an implicit timezone, provided by the dynamic context, is assumed to be present as part of the value. This creates a total order for all date and time values.
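
A minimal sketch of this rule, using Python's datetime module: a value without a timezone is given the implicit timezone before both operands are normalized to UTC and compared. The helper name to_utc and the choice of -05:00 as the implicit timezone are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value, implicit_timezone):
    """Apply the implicit timezone if none is present, then normalize to UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=implicit_timezone)
    return value.astimezone(timezone.utc)


implicit = timezone(timedelta(hours=-5))  # assumed implicit timezone

a = datetime(1999, 5, 31, 13, 20)                       # no timezone
b = datetime(1999, 5, 31, 18, 20, tzinfo=timezone.utc)  # explicit Z

equal = to_utc(a, implicit) == to_utc(b, implicit)
print(equal)  # True
```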

The following paragraph seems to duplicate material that has already been covered.

An xs:dateTime can be considered to consist of seven components: year, month, day, hour, minute, second and timezone. For xs:dateTime six components: year, month, day, hour, minute and second are required and timezone is optional. For other date/time values, of the first six components, some are required and others must be absent or missing. Timezone is always optional. For example, for xs:date, the year, month and day components are required and hour, minute and second components must be absent; for xs:time the hour, minute and second components are required and year, month and day are missing; for xs:gDay, day is required and year, month, hour, minute and second are missing.

Values of the date/time datatypes xs:time, xs:gMonthDay, xs:gMonth, and xs:gDay, can be considered to represent a sequence of recurring time instants or time periods. An xs:time occurs every day. An xs:gMonth occurs every year. Comparison operators on these datatypes compare the starting instants of equivalent occurrences in the recurring series. These xs:dateTime values are calculated as described below.

Comparison operators on xs:date, xs:gYearMonth and xs:gYear compare their starting instants. These xs:dateTime values are calculated as described below.

The starting instant of an occurrence of a date/time value is an xs:dateTime calculated by filling in the missing components of the local value from a reference xs:dateTime. An example of a suitable reference xs:dateTime is 1972-01-01T00:00:00. Then, for example, the starting instant corresponding to the xs:date value 2009-03-12 is 2009-03-12T00:00:00; the starting instant corresponding to the xs:time value 13:30:02 is 1972-01-01T13:30:02; and the starting instant corresponding to the gMonthDay value --02-29 is 1972-02-29T00:00:00 (which explains why a leap year was chosen for the reference).

In the previous version of this specification, the reference date/time chosen was 1972-12-31T00:00:00. While this gives the same results, it produces a "starting instant" for a gMonth or gMonthDay that bears no relation to the English meaning of the term, and it also required special handling of short months. The original choice was made to allow for leap seconds; but since leap seconds are not recognized in date/time arithmetic, this is not actually necessary.

If the xs:time value written as 24:00:00 is to be compared, filling in the missing components gives 1972-01-01T00:00:00, because 24:00:00 is an alternative representation of 00:00:00 (the lexical value "24:00:00" is converted to the time components {0,0,0} before the missing components are filled in). This has the consequence that when ordering xs:time values, 24:00:00 is considered to be earlier than 23:59:59. However, when ordering xs:dateTime values, a time component of 24:00:00 is considered equivalent to 00:00:00 on the following day.

Note that the reference xs:dateTime does not have a timezone. The timezone component is never filled in from the reference xs:dateTime. In some cases, if the date/time value does not have a timezone, the implicit timezone from the dynamic context is used as the timezone.

This specification uses the reference xs:dateTime 1972-01-01T00:00:00 in the description of the comparison operators. Implementations may use other reference xs:dateTime values as long as they yield the same results. The reference xs:dateTime used must meet the following constraints: when it is used to supply missing components into xs:gMonthDay values, the year must allow for February 29 and so must be a leap year; when it is used to supply missing components into xs:gDay values, the month must allow for 31 days. Different reference xs:dateTime values may be used for different operators.
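
The calculation of a starting instant can be sketched as follows, using the reference xs:dateTime 1972-01-01T00:00:00. The function name starting_instant is hypothetical, timezones are left out, and seconds are restricted to integers because Python's datetime constructor requires that.

```python
from datetime import datetime

REFERENCE = datetime(1972, 1, 1, 0, 0, 0)
FIELDS = ("year", "month", "day", "hour", "minute", "second")


def starting_instant(**present):
    """Fill the missing components of a local value from the reference."""
    values = {name: getattr(REFERENCE, name) for name in FIELDS}
    values.update(present)
    return datetime(**values)


# xs:date 2009-03-12
print(starting_instant(year=2009, month=3, day=12))
# xs:time 13:30:02
print(starting_instant(hour=13, minute=30, second=2))
# xs:gMonthDay --02-29 (valid only because 1972 is a leap year)
print(starting_instant(month=2, day=29))
```

An xs:time written as 24:00:00 would be passed as hour=0, minute=0, second=0, which is why it sorts before 23:59:59.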

Component Extraction Functions on Dates and Times

The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.

A time written as 24:00:00 is treated as 00:00:00 on the following day.

Timezone Adjustment Functions on Dates and Time Values

These functions adjust the timezone component of an xs:dateTime, xs:date or xs:time value. The $timezone argument to these functions is defined as an xs:dayTimeDuration but must be a valid timezone value.

Arithmetic Operators on Durations, Dates and Times

These functions support adding or subtracting a duration value to or from an xs:dateTime, an xs:date or an xs:time value. Appendix E of describes an algorithm for performing such operations.

Formatting Dates and Times

This section has been created by the editor in response to a WG decision in principle; the detailed text needs to be reviewed and approved.

Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The signatures are presented first, followed by the rules which apply to each of the functions.

The date/time formatting functions

The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country specified by the $country argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.

The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.

If $value is the empty sequence, the function returns the empty sequence.

Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to an empty sequence.

For details of the language, calendar, and country arguments, see .

In general, the use of an invalid picture, language, calendar, or country argument results in a dynamic error. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation.

The Picture String

The picture consists of a sequence of variable markers and literal substrings. A substring enclosed in square brackets is interpreted as a variable marker; substrings not enclosed in square brackets are taken as literal substrings. The literal substrings are optional and if present are rendered unchanged, including any whitespace. If an opening or closing square bracket is required within a literal substring, it must be doubled. The variable markers are replaced in the result by strings representing aspects of the date and/or time to be formatted. These are described in detail below.

A variable marker consists of a component specifier followed optionally by one or two presentation modifiers and/or optionally by a width modifier. Whitespace within a variable marker is ignored.
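
As an informal sketch of these rules, a picture string can be split into literal substrings and variable markers as follows. The function name split_picture and the (kind, text) pair representation are assumptions for the example; no attempt is made to validate what is inside a marker.

```python
def split_picture(picture):
    """Split a picture string into ("literal", text) and ("marker", text) pairs."""
    parts = []
    literal = []
    i = 0
    while i < len(picture):
        ch = picture[i]
        if picture.startswith("[[", i):
            literal.append("[")  # doubled bracket stands for a literal '['
            i += 2
        elif ch == "[":
            end = picture.find("]", i)
            if end == -1:
                raise ValueError("unterminated variable marker")
            if literal:
                parts.append(("literal", "".join(literal)))
                literal = []
            # Whitespace within a variable marker is ignored.
            parts.append(("marker", "".join(picture[i + 1:end].split())))
            i = end + 1
        elif picture.startswith("]]", i):
            literal.append("]")  # doubled bracket stands for a literal ']'
            i += 2
        elif ch == "]":
            raise ValueError("unmatched closing square bracket")
        else:
            literal.append(ch)
            i += 1
    if literal:
        parts.append(("literal", "".join(literal)))
    return parts


print(split_picture("[D1] [MNn], [Y]"))
```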

The component specifier indicates the component of the date or time that is required, and takes the following values:

Specifier Meaning Default Presentation Modifier
Y year (absolute value) 1
M month in year 1
D day in month 1
d day in year 1
F day of week n
W week in year 1
w week in month 1
H hour in day (24 hours) 1
h hour in half-day (12 hours) 1
P am/pm marker n
m minute in hour 01
s second in minute 01
f fractional seconds 1
Z timezone as a time offset from UTC, or if an alphabetic modifier is present the conventional name of a timezone (such as PST) 1
z timezone as a time offset using GMT, for example GMT+1 or GMT-05:00. For this component there is a fixed prefix of GMT, or a localized variation thereof for the chosen language, and the presentation modifier controls the representation of the signed time offset that follows. 1
C calendar: the name or abbreviation of a calendar name n
E era: the name of a baseline for the numbering of years, for example the reign of a monarch n

An error is reported if the syntax of the picture is incorrect.

An error is reported if a component specifier within the picture refers to components that are not available in the given type of $value, for example if the picture supplied to the format-time refers to the year, month, or day component.

It is not an error to include a timezone component when the supplied value has no timezone. In these circumstances the timezone component will be ignored.

The first presentation modifier indicates the style in which the value of a component is to be represented. Its value may be either:

any format token permitted as a primary format token in the second argument of the format-integer function, indicating that the value of the component is to be output numerically using the specified number format (for example, 1, 01, i, I, w, W, or Ww) or

the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use an implementation-defined fallback representation.

If a comma is to be used as a grouping separator within the format token, then there must be a width specifier. More specifically: if a variable marker contains one or more commas, then the last comma is treated as introducing the width modifier, and all others are treated as grouping separators. So [Y9,999,*] will output the year as 2,008.

If the implementation does not support the use of the requested format token, it must use the default presentation modifier for that component.

If the first presentation modifier is present, then it may optionally be followed by a second presentation modifier as follows:

Modifier Meaning
t traditional numbering. This has the same meaning as in the second argument of fn:format-integer.
o ordinal form of a number, for example 8th or 8º. This has the same meaning as in the second argument of fn:format-integer. The actual representation of the ordinal form of a number may depend not only on the language, but also on the grammatical context (for example, in some languages it must agree in gender).

Although the formatting rules are expressed in terms of the rules for format tokens in fn:format-integer, the formats actually used may be specialized to the numbering of date components where appropriate. For example, in Italian, it is conventional to use an ordinal number (primo) for the first day of the month, and cardinal numbers (due, tre, quattro ...) for the remaining days. A processor may therefore use this convention to number days of the month, ignoring the presence or absence of the ordinal presentation modifier.

Whether or not a presentation modifier is included, a width modifier may be supplied. This indicates the number of characters or digits to be included in the representation of the value.

The width modifier, if present, is introduced by a comma. It takes the form:

   ","  min-width ("-" max-width)?

where min-width is either an unsigned integer indicating the minimum number of characters to be output, or * indicating that there is no explicit minimum, and max-width is either an unsigned integer indicating the maximum number of characters to be output, or * indicating that there is no explicit maximum; if max-width is omitted then * is assumed. Both integers, if present, must be greater than zero.
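As an informal illustration of the rules above, the following Python sketch decomposes the body of a variable marker (the text between the square brackets, with whitespace already removed). The function name parse_marker and the use of None for an unspecified bound are assumptions of the example. The sketch does not validate the component specifier or apply default presentation modifiers.

```python
def parse_marker(marker):
    """Return (component, presentation, (min_width, max_width)) for a marker body.

    None stands for "*", that is, no explicit bound.
    """
    component, rest = marker[0], marker[1:]
    presentation, width = rest, None
    if "," in rest:
        # The last comma introduces the width modifier; earlier commas
        # are grouping separators inside the format token.
        presentation, width = rest.rsplit(",", 1)
    min_width = max_width = None
    if width is not None:
        low, _, high = width.partition("-")
        min_width = None if low == "*" else int(low)
        max_width = None if high in ("", "*") else int(high)
        for bound in (min_width, max_width):
            if bound is not None and bound <= 0:
                raise ValueError("width values must be greater than zero")
    return component, presentation, (min_width, max_width)
```

With this model, the marker body Y9,999,* yields the format token 9,999 and no explicit width bounds.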

A format token containing more than one digit, such as 001 or 9999, sets the minimum and maximum width to the number of digits appearing in the format token; if a width modifier is also present, then the width modifier takes precedence.

A format token consisting of a single digit, such as 1, does not constrain the number of digits in the output. In the case of fractional seconds in particular, [f001] requests three decimal digits, [f01] requests two digits, but [f1] will produce an implementation-defined number of digits. If exactly one digit is required, this can be achieved using the variable marker [f1,1-1].

If the minimum and maximum width are unspecified, then the output uses as many characters as are required to represent the value of the component without truncation and without padding: this is referred to below as the full representation of the value. For a timezone offset (component specifier z), the full representation consists of a sign for the offset, the number of hours of the offset, and if the offset is not an integral number of hours, a colon (:) followed by the two digits of the minutes of the offset.

If the full representation of the value exceeds the specified maximum width, then the processor should attempt to use an alternative shorter representation that fits within the maximum width. Where the presentation modifier is N, n, or Nn, this is done by abbreviating the name, using either conventional abbreviations if available, or crude right-truncation if not. For example, setting max-width to 4 indicates that four-letter abbreviations should be used, though it would be acceptable to use a three-letter abbreviation if this is in conventional use. (For example, "Tuesday" might be abbreviated to "Tues", and "Friday" to "Fri".) In the case of the year component, setting max-width requests omission of high-order digits from the year, for example, if max-width is set to 2 then the year 2003 will be output as 03. In the case of the fractional seconds component, the value is rounded to the specified size as if by applying the function round-half-to-even(fractional-seconds, max-width). If no mechanism is available for fitting the value within the specified maximum width (for example, when roman numerals are used), then the value should be output in its full representation.
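Two of the shortening rules above can be sketched in Python as follows. This is a non-normative illustration; the function names and the optional table of conventional abbreviations are assumptions of the example, and a real processor would draw abbreviations from its localization data.

```python
def fit_year(year, max_width):
    """Drop high-order digits of the (absolute) year to fit max_width."""
    text = str(abs(year))
    return text[-max_width:] if len(text) > max_width else text


def fit_name(name, max_width, conventional=None):
    """Abbreviate a name: use a conventional abbreviation if one fits,
    otherwise fall back to crude right-truncation."""
    if len(name) <= max_width:
        return name
    short = (conventional or {}).get(name)
    if short is not None and len(short) <= max_width:
        return short
    return name[:max_width]
```

Here fit_name("Friday", 4, {"Friday": "Fri"}) returns the three-letter conventional form, while fit_name("Tuesday", 4) falls back to truncation.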

If the full representation of the value is shorter than the specified minimum width, then the processor should pad the value to the specified width.

For decimal representations of numbers, this should be done by prepending zero digits from the appropriate set of digit characters, or appending zero digits in the case of the fractional seconds component.

For timezone offsets this should be done by first appending a colon (:) followed by two zero digits from the appropriate set of digit characters if the full representation does not already include a minutes component and if the specified minimum width permits adding three characters, and then if necessary prepending zero digits from the appropriate set of digit characters to the hour component.

In other cases, it should be done by appending spaces.
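The three padding rules can be modeled as shown below. This Python sketch is illustrative only: the function names are hypothetical, ASCII digits are assumed, and the phrase "permits adding three characters" is read here as meaning that the full representation plus three characters does not exceed the minimum width.

```python
def pad_digits(digits, min_width, fractional=False):
    """Pad a decimal number: zeros are appended for fractional seconds,
    and prepended for every other numeric component."""
    if fractional:
        return digits.ljust(min_width, "0")
    return digits.rjust(min_width, "0")


def pad_timezone(offset, min_width):
    """Pad a signed timezone offset such as "+2" or "+5:30"."""
    if ":" not in offset and len(offset) + 3 <= min_width:
        offset += ":00"                 # add a minutes component first
    missing = min_width - len(offset)
    if missing > 0:
        # Prepend zeros to the hour component, keeping the sign in front.
        offset = offset[0] + "0" * missing + offset[1:]
    return offset


def pad_other(text, min_width):
    """Pad any other representation by appending spaces."""
    return text.ljust(min_width)
```

Under this reading, an offset of +2 with a minimum width of 6 becomes +02:00.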

The Language, Calendar, and Country Arguments

The set of languages, calendars, and countries that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is an empty sequence, an implementation-defined default value is used.

If the fallback representation uses a different calendar from that requested, the output string must identify the calendar actually used, for example by prefixing the string with [Calendar: X] (where X is the calendar actually used), localized as appropriate to the requested language. If the fallback representation uses a different language from that requested, the output string must identify the language actually used, for example by prefixing the string with [Language: Y] (where Y is the language actually used) localized in an implementation-dependent way. If a particular component of the value cannot be output in the requested format, it should be output in the default format for that component.

The language argument specifies the language to be used for the result string of the function. The value of the argument must be either the empty sequence or a value that would be valid for the xml:lang attribute (see [XML]). Note that this permits the identification of sublanguages based on country codes (from ) as well as identification of dialects and of regions within a country.

If the language argument is omitted or is set to an empty sequence, or if it is set to an invalid value or a value that the implementation does not recognize, then the processor uses an implementation-defined language.

The language is used to select the appropriate language-dependent forms of:

names (for example, of months)

numbers expressed as words or as ordinals (twenty, 20th, twentieth)

hour convention (0-23 vs 1-24, 0-11 vs 1-12)

first day of week, first week of year

Where appropriate this choice may also take into account the value of the country argument, though this should not be used to override the language or any sublanguage that is specified as part of the language argument.

The choice of the names and abbreviations used in any given language for calendar units such as days of the week and months of the year is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices.

Where ordinal numbers are used, the selection of the correct representation of the ordinal (for example, the linguistic gender) may depend on the component being formatted and on its textual context in the picture string.

The calendar argument specifies that the dateTime, date, or time supplied in the $value argument must be converted to a value in the specified calendar and then converted to a string using the conventions of that calendar.

A calendar value must be a valid lexical QName. If the QName does not have a prefix, then it identifies a calendar with the designator specified below. If the QName has a prefix, then the QName is expanded into an expanded-QName using the in-scope namespaces from the static context; the expanded-QName identifies the calendar; the behavior in this case is implementation-defined.

If the calendar argument is omitted, an implementation-defined value is used.

The calendars listed below were known to be in use during the last hundred years. Many other calendars have been used in the past.

This specification does not define any of these calendars, nor the way that they map to the value space of the xs:date data type in . There may be ambiguities when dates are recorded using different calendars. For example, the start of a new day is not simultaneous in different calendars, and may also vary geographically (for example, based on the time of sunrise or sunset). Translation of dates is therefore more reliable when the time of day is also known, and when the geographic location is known. When translating dates between one calendar and another, the processor may take account of the values of the country and/or language arguments, with the country argument taking precedence.

Information about some of these calendars, and algorithms for converting between them, may be found in .

Designator Calendar
AD Anno Domini (Christian Era)
AH Anno Hegirae (Muhammedan Era)
AME Mauludi Era (solar years since Mohammed's birth)
AM Anno Mundi (Jewish Calendar)
AP Anno Persici
AS Aji Saka Era (Java)
BE Buddhist Era
CB Cooch Behar Era
CE Common Era
CL Chinese Lunar Era
CS Chula Sakarat Era
EE Ethiopian Era
FE Fasli Era
ISO ISO 8601 calendar
JE Japanese Calendar
KE Khalsa Era (Sikh calendar)
KY Kali Yuga
ME Malabar Era
MS Monarchic Solar Era
NS Nepal Samwat Era
OS Old Style (Julian Calendar)
RS Rattanakosin (Bangkok) Era
SE Saka Era
SH Mohammedan Solar Era (Iran)
SS Saka Samvat
TE Tripurabda Era
VE Vikrama Era
VS Vikrama Samvat Era

At least one of the above calendars must be supported. It is implementation-defined which calendars are supported.

The ISO 8601 calendar, which is included in the above list and designated ISO, is very similar to the Gregorian calendar designated AD, but it differs in several ways. The ISO calendar is intended to ensure that date and time formats can be read easily by other software, as well as being legible for human users. The ISO calendar prescribes the use of particular numbering conventions as defined in ISO 8601, rather than allowing these to be localized on a per-language basis. In particular it provides a numeric 'week date' format which identifies dates by year, week of the year, and day in the week; in the ISO calendar the days of the week are numbered from 1 (Monday) to 7 (Sunday), and week 1 in any calendar year is the week (from Monday to Sunday) that includes the first Thursday of that year. The numeric values of the components year, month, day, hour, minute, and second are the same in the ISO calendar as the values used in the lexical representation of the date and time as defined in . The era ("E" component) with this calendar is either a minus sign (for negative years) or a zero-length string (for positive years). For dates before 1 January, AD 1, year numbers in the ISO and AD calendars are off by one from each other: ISO year 0000 is 1 BC, -0001 is 2 BC, etc.
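The ISO week-date convention described above is the one implemented by the isocalendar method of Python's standard datetime.date class, so it can be used as a non-normative cross-check. The wrapper name iso_week_date is an assumption of this example.

```python
from datetime import date


def iso_week_date(year, month, day):
    """Return (ISO year, week of year, day of week) for a Gregorian date.

    Days are numbered 1 (Monday) to 7 (Sunday); week 1 is the week
    containing the first Thursday of the year.
    """
    iso_year, week, weekday = date(year, month, day).isocalendar()
    return iso_year, week, weekday
```

Note that a date near the turn of the year may belong to week 1 of the following ISO year, or to the last week of the preceding one.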

The value space of the date and time data types, as defined in XML Schema, is based on absolute points in time. The lexical space of these data types defines a representation of these absolute points in time using the proleptic Gregorian calendar, that is, the modern Western calendar extrapolated into the past and the future; but the value space is calendar-neutral. The date formatting functions produce a representation of this absolute point in time, but denoted in a possibly different calendar. So, for example, the date whose lexical representation in XML Schema is 1502-01-11 (the day on which Pope Gregory XIII was born) might be formatted using the Old Style (Julian) calendar as 1 January 1502. This reflects the fact that there was at that time a ten-day difference between the two calendars. It would be incorrect, and would produce incorrect results, to represent this date in an element or attribute of type xs:date as 1502-01-01, even though this might reflect the way the date was recorded in contemporary documents.
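The ten-day difference mentioned above can be checked with a small Python sketch. It is illustrative only: the function name is hypothetical, and the arithmetic is a commonly used Julian Day Number conversion rather than anything defined by this specification. Python's date type uses the proleptic Gregorian calendar, which matches the lexical space of xs:date for years 1 and later.

```python
from datetime import date


def gregorian_to_julian(year, month, day):
    """Convert a proleptic Gregorian date to (year, month, day) in the Julian calendar."""
    # toordinal() counts days from 0001-01-01 (Gregorian); adding 1721425
    # gives the Julian Day Number, a calendar-neutral day count.
    jdn = date(year, month, day).toordinal() + 1721425
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    j_day = (h % 153) // 5 + 1
    j_month = (h // 153 + 2) % 12 + 1
    j_year = e // 1461 - 4716 + (14 - j_month) // 12
    return j_year, j_month, j_day
```

The call gregorian_to_julian(1502, 1, 11) returns (1502, 1, 1), agreeing with the example in the preceding paragraph.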

When referring to years occurring in antiquity, modern historians generally use a numbering system in which there is no year zero (the year before 1 CE is thus 1 BCE). This is the convention that should be used when the requested calendar is OS (Julian) or AD (Gregorian). When the requested calendar is ISO, however, the conventions of ISO 8601 should be followed: here the year before +0001 is numbered zero. In (version 1.0), the value space for xs:date and xs:dateTime does not include a year zero: however, a future edition is expected to endorse the ISO 8601 convention. This means that the date on which Julius Caesar was assassinated has the ISO 8601 lexical representation -0043-03-13, but will be formatted as 15 March 44 BCE in the Julian calendar or 13 March 44 BCE in the Gregorian calendar (dependent on the chosen localization of the names of months and eras).
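The off-by-one relationship between ISO 8601 year numbers and the historians' convention can be stated compactly. The following Python sketch is illustrative only; the function name and the era labels returned are assumptions of the example.

```python
def historian_year(iso_year):
    """Map an ISO 8601 year number to (year, era) with no year zero."""
    if iso_year > 0:
        return iso_year, "CE"
    # ISO year 0 is 1 BCE, ISO year -1 is 2 BCE, and so on.
    return 1 - iso_year, "BCE"
```

Applied to the ISO year -43, this yields 44 BCE, as in the example above.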

The intended use of the country argument is to identify the place where an event represented by the dateTime, date, or time supplied in the $value argument took place or will take place. If the value is supplied, and is not the empty sequence, then it should be a country code defined in . Implementations may also allow the use of codes representing subdivisions of a country from ISO 3166-2, or codes representing formerly used names of countries from ISO 3166-3. This argument is not intended to identify the location of the user for whom the date or time is being formatted; that should be done by means of the language argument. This information may be used to provide additional information when converting dates between calendars or when deciding how individual components of the date and time are to be formatted. For example, different countries using the Old Style (Julian) calendar started the new year on different days, and some countries used variants of the calendar that were out of synchronization as a result of differences in calculating leap years. The geographical area identified by a country code is defined by the boundaries as they existed at the time of the date to be formatted, or the present-day boundaries for dates in the future.

Examples of Date and Time Formatting

Gregorian Calendar

The following examples show a selection of dates and times and the way they might be formatted. These examples assume the use of the Gregorian calendar as the default calendar.

Required Output Expression
2002-12-31 format-date($d, "[Y0001]-[M01]-[D01]")
12-31-2002 format-date($d, "[M]-[D]-[Y]")
31-12-2002 format-date($d, "[D]-[M]-[Y]")
31 XII 2002 format-date($d, "[D1] [MI] [Y]")
31st December, 2002 format-date($d, "[D1o] [MNn], [Y]", "en", (), ())
31 DEC 2002 format-date($d, "[D01] [MN,*-3] [Y0001]", "en", (), ())
December 31, 2002 format-date($d, "[MNn] [D], [Y]", "en", (), ())
31 Dezember, 2002 format-date($d, "[D] [MNn], [Y]", "de", (), ())
Tisdag 31 December 2002 format-date($d, "[FNn] [D] [MNn] [Y]", "sv", (), ())
[2002-12-31] format-date($d, "[[[Y0001]-[M01]-[D01]]]")
Two Thousand and Three format-date($d, "[YWw]", "en", (), ())
einunddreißigste Dezember format-date($d, "[Dwo] [MNn]", "de", (), ())
3:58 PM format-time($t, "[h]:[m01] [PN]", "en", (), ())
3:58:45 pm format-time($t, "[h]:[m01]:[s01] [Pn]", "en", (), ())
3:58:45 PM PDT format-time($t, "[h]:[m01]:[s01] [PN] [ZN,*-3]", "en", (), ())
3:58:45 o'clock PM PDT format-time($t, "[h]:[m01]:[s01] o'clock [PN] [ZN,*-3]", "en", (), ())
15:58 format-time($t,"[H01]:[m01]")
15:58:45.762 format-time($t,"[H01]:[m01]:[s01].[f001]")
15:58:45 GMT+02:00 format-time($t,"[H01]:[m01]:[s01] [z,6-6]", "en", (), ())
15.58 Uhr GMT+2 format-time($t,"[H01].[m01] Uhr [z]", "de", (), ())
3.58pm on Tuesday, 31st December format-dateTime($dt, "[h].[m01][Pn] on [FNn], [D1o] [MNn]")
12/31/2002 at 15:58:45 format-dateTime($dt, "[M01]/[D01]/[Y0001] at [H01]:[m01]:[s01]")

Non-Gregorian Calendars

The following examples use calendars other than the Gregorian calendar.

These examples use non-Latin characters which might not display correctly in all browsers, depending on the system configuration.

Description Request Result
Islamic format-date($d, "[D١] [Mn] [Y١]", "ar", "AH", ()) ٢٦ ﺸﻭّﺍﻝ ١٤٢٣
Jewish (with Western numbering) format-date($d, "[D] [Mn] [Y]", "he", "AM", ()) ‏26 טבת 5763
Jewish (with traditional numbering) format-date($d, "[Dאt] [Mn] [Yאt]", "he", "AM", ()) כ״ו טבת תשס״ג
Julian (Old Style) format-date($d, "[D] [MNn] [Y]", "en", "OS", ()) 18 December 2002
Thai format-date($d, "[D๑] [Mn] [Y๑]", "th", "BE", ()) ๓๑ ธันวาคม ๒๕๔๕
Functions Related to QNames

Functions to Create a QName

In addition to the xs:QName constructor function, QName values can be constructed by combining a namespace URI, prefix, and local name, or by resolving a lexical QName against the in-scope namespaces of an element node. This section defines these functions. Leading and trailing whitespace, if present, is stripped from string arguments before the result is constructed.

Functions and Operators Related to QNames

This section specifies functions on QNames as defined in .

Operators on base64Binary and hexBinary

Comparisons of base64Binary and hexBinary Values

The following comparison operators on xs:base64Binary and xs:hexBinary values are defined. Comparisons take two operands of the same type; that is, both operands must be xs:base64Binary or both operands must be xs:hexBinary. Each returns a boolean value.

A value of type xs:hexBinary can be compared with a value of type xs:base64Binary by casting one value to the other type. See .

Operators on NOTATION

This section specifies operators that take xs:NOTATION values as arguments.

Functions and Operators on Nodes

This section specifies functions and operators on nodes. Nodes are formally defined in .

For the illustrative examples below, assume an XQuery or transformation operating on a PurchaseOrder document containing a number of line-item elements. Each line-item has child elements called description, price, quantity, etc., whose content is different for each line-item. The quantity element has simple content of type xs:decimal. Further assume that variables $item1, $item2, etc. are each bound to single line-item element nodes in the document in sequence and that the value of the quantity child of the first line-item is 5.0.

let $po := <PurchaseOrder>
             <line-item>
               <description>Large widget</description>
               <price>8.95</price>
               <quantity>5.0</quantity>
             </line-item>
             <line-item>
               <description>Small widget</description>
               <price>3.99</price>
               <quantity>2.0</quantity>
             </line-item>
             <line-item>
               <description>Tiny widget</description>
               <price>1.49</price>
               <quantity>805</quantity>
             </line-item>
           </PurchaseOrder>

let $item1 := $po/line-item[1]

let $item2 := $po/line-item[2]

let $item3 := $po/line-item[3]

Functions and Operators on Sequences

A sequence is an ordered collection of zero or more items. An item is either a node or an atomic value. The terms sequence and item are defined formally in and .

General Functions and Operators on Sequences

The following functions are defined on sequences.

As in the previous section, for the illustrative examples below, assume an XQuery or transformation operating on a non-empty Purchase Order document containing a number of line-item elements. The variable $seq is bound to the sequence of line-item nodes in document order. The variables $item1, $item2, etc. are bound to separate, individual line-item nodes in the sequence.

Functions That Test the Cardinality of Sequences

The following functions test the cardinality of their sequence arguments.

The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one, defined in this section, check that the cardinality of a sequence is in the expected range. They are particularly useful with regard to static typing. For example, the function call fn:remove($seq, index-of($seq2, 'abc')) requires the result of the call on fn:index-of to be a singleton integer, but the static type system cannot infer this; writing the expression as fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc'))) will provide a suitable static type at query analysis time, and ensure that the length of the sequence is correct with a dynamic check at query execution time.

The type signatures for these functions deliberately declare the argument type as item()*, permitting a sequence of any length. A more restrictive signature would defeat the purpose of the function, which is to defer cardinality checking until query execution time.
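The run-time behavior of the three cardinality functions can be modeled informally as follows. In this Python sketch, which is not part of the specification, a sequence is represented by a list, the function names mirror the fn: names, and ValueError stands in for the dynamic error that a processor would raise.

```python
def zero_or_one(seq):
    """Return seq unchanged if it has at most one item."""
    if len(seq) > 1:
        raise ValueError("sequence contains more than one item")
    return seq


def one_or_more(seq):
    """Return seq unchanged if it has at least one item."""
    if len(seq) == 0:
        raise ValueError("sequence is empty")
    return seq


def exactly_one(seq):
    """Return seq unchanged if it has exactly one item."""
    if len(seq) != 1:
        raise ValueError("sequence does not contain exactly one item")
    return seq
```

Each function accepts a sequence of any length and defers the cardinality check to execution time, which is the reason given above for the permissive argument type.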

Equals, Union, Intersection and Except

As in the previous sections, for the illustrative examples below, assume an XQuery or transformation operating on a Purchase Order document containing a number of line-item elements. The variables $item1, $item2, etc. are bound to individual line-item nodes in the sequence. We use sequences of these nodes in some of the examples below.

Aggregate Functions

Aggregate functions take a sequence as argument and return a single value computed from values in the sequence. Except for fn:count, the sequence must consist of values of a single type or one of its subtypes, or they must be numeric. xs:untypedAtomic values are permitted in the input sequence and handled by special conversion rules. The type of the items in the sequence must also support certain operations.

Functions and Operators that Generate Sequences
Context Functions

The following functions are defined to obtain information from the dynamic context.

Functions on Functions

The following functions operate on function items, that is, values referring to a function.

Constructor Functions

Constructor Functions for XML Schema Built-in Types

The following reference needs to be rephrased so it refers to "whichever version of XSD the implementation chooses to support".

Every built-in atomic type that is defined in , except xs:anyAtomicType and xs:NOTATION, has an associated constructor function. xs:untypedAtomic, defined in and the two derived types xs:yearMonthDuration and xs:dayTimeDuration defined in also have associated constructor functions.

A constructor function is not defined for xs:anyAtomicType as there are no atomic values with type annotation xs:anyAtomicType at runtime, although this can be a statically inferred type. A constructor function is not defined for xs:NOTATION since it is defined as an abstract type in . If the static context (See ) contains a type derived from xs:NOTATION then a constructor function is defined for it. See .

The form of the constructor function for a type prefix:TYPE is:

prefix:TYPE($arg as xs:anyAtomicType?) as prefix:TYPE?

If $arg is the empty sequence, the empty sequence is returned. For example, the signature of the constructor function corresponding to the xs:unsignedInt type defined in is:

xs:unsignedInt($arg as xs:anyAtomicType?) as xs:unsignedInt?

Invoking the constructor function xs:unsignedInt(12) returns the xs:unsignedInt value 12. Another invocation of that constructor function that returns the same xs:unsignedInt value is xs:unsignedInt("12"). The same result would also be returned if the constructor function were to be invoked with a node that had a typed value equal to the xs:unsignedInt 12. The standard features described in would 'atomize' the node to extract its typed value and then call the constructor with that value. If the value passed to a constructor is illegal for the datatype to be constructed, an error is raised.
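The behavior described for xs:unsignedInt can be approximated by the following Python sketch. It is a simplified, non-normative model: the function name is hypothetical, None stands for the empty sequence, only integer and string arguments are handled, and ValueError stands in for the error raised for an illegal value. The upper bound 4294967295 is the maximum value of xs:unsignedInt.

```python
import re


def unsigned_int(arg):
    """Model of the xs:unsignedInt constructor for integer and string arguments."""
    if arg is None:
        return None                      # empty sequence in, empty sequence out
    if isinstance(arg, str):
        text = arg.strip()               # surrounding whitespace is not significant
        if not re.fullmatch(r"\+?[0-9]+", text):
            raise ValueError("not a valid lexical form: %r" % arg)
        value = int(text)
    elif isinstance(arg, int):
        value = arg
    else:
        raise TypeError("unsupported argument type in this sketch")
    if not 0 <= value <= 4294967295:
        raise ValueError("value out of range for xs:unsignedInt")
    return value
```

Both unsigned_int(12) and unsigned_int("12") return 12, mirroring the two invocations discussed above.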

The semantics of the constructor function xs:TYPE(arg) are identical to the semantics of arg cast as xs:TYPE?. See Casting.

If the argument to a constructor function is a literal, the result of the function may be evaluated statically; if an error is found during such evaluation, it may be reported as a static error.

Special rules apply to constructor functions for xs:QName and types derived from xs:QName and xs:NOTATION. See Constructor Functions for xs:QName and xs:NOTATION.

The following constructor functions for the built-in types are supported:

Implementations return negative zero for xs:float("-0.0E0"). does not distinguish between the values positive zero and negative zero.

Implementations return negative zero for xs:double("-0.0E0"). does not distinguish between the values positive zero and negative zero.

See for special rules.

See for rules related to constructing values of type xs:ENTITY and types derived from it.

Constructor Functions for xs:QName and xs:NOTATION

Special rules apply to constructor functions for the types xs:QName and xs:NOTATION, for two reasons:

The lexical representation of these types uses namespace prefixes, whose meaning is context-dependent.

Values cannot belong directly to the type xs:NOTATION, only to its subtypes.

These constraints result in the following restrictions:

Conversion from an xs:string to a value of type xs:QName, a type derived from xs:QName or a type derived from xs:NOTATION is permitted only if the xs:string is written as a string literal. This applies whether the conversion is expressed using a constructor function or using the "cast as" syntax. Such a conversion can be regarded as a pseudo-function, which is always evaluated statically. It is also permitted for these constructors and casts to take a dynamically-supplied argument in the normal manner, but as the casting table (see ) indicates, the only arguments that are supported in this case are values of type xs:QName or xs:NOTATION respectively.

There is no constructor function for xs:NOTATION. Constructors are defined, however, for xs:QName, for types derived from xs:QName, and for types derived from xs:NOTATION.

When converting from an xs:string, the prefix within the lexical xs:QName supplied as the argument is resolved to a namespace URI using the statically known namespaces from the static context. If the lexical xs:QName has no prefix, the namespace URI of the resulting expanded-QName is the default element/type namespace from the static context. Components of the static context are discussed in . A static error is raised if the prefix is not bound in the static context. As described in , the supplied prefix is retained as part of the expanded-QName value.
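The prefix resolution described above can be sketched as follows. This Python model is illustrative only: the function name, the dictionary representing the statically known namespaces, and the namespace URI used in the usage note are assumptions of the example, and KeyError stands in for the static error.

```python
def resolve_qname(lexical, namespaces, default_namespace=None):
    """Resolve a lexical QName to (namespace URI, prefix, local name).

    namespaces maps prefixes to URIs (the statically known namespaces);
    default_namespace models the default element/type namespace.
    """
    prefix, colon, local = lexical.strip().rpartition(":")
    if not colon:
        # No prefix: use the default element/type namespace.
        return default_namespace, "", local
    if prefix not in namespaces:
        raise KeyError("prefix %r is not bound in the static context" % prefix)
    # The supplied prefix is retained as part of the result.
    return namespaces[prefix], prefix, local
```

For example, with the hypothetical binding {"my": "http://example.com/hats"}, resolve_qname("my:hatSize", ...) returns that URI together with the prefix my and the local name hatSize.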

Constructor Functions for User-Defined Types

For every atomic type in the static context (See ) that is derived from a primitive type, there is a constructor function (whose name is the same as the name of the type) whose effect is to create a value of that type from the supplied argument. The rules for constructing user-defined types are defined in the same way as the rules for constructing built-in derived types discussed in .

Special rules apply to constructor functions for types derived from xs:QName and xs:NOTATION. See Constructor Functions for xs:QName and xs:NOTATION.

Consider a situation where the static context contains a type called hatSize defined in a schema whose target namespace is bound to the prefix my. In such a case the constructor function:

my:hatSize($arg as xs:anyAtomicType?) as my:hatSize?

is available to users.

To construct an instance of an atomic type that is not in a namespace, it is necessary to use a cast expression or undeclare the default function namespace. For example, if the user-defined type apple is derived from xs:integer but is not in a namespace, an instance of this type can be constructed as follows using a cast expression (this requires that the default element/type namespace is no namespace):

17 cast as apple

The following shows the use of the constructor function:

declare default function namespace ""; apple(17)

Casting

Constructor functions and cast expressions accept an expression and return a value of a given type. They both convert a source value, SV, of a source type, ST, to a target value, TV, of the given target type, TT, with identical semantics and different syntax. The name of the constructor function is the same as the name of the built-in datatype or the datatype defined in of (see Constructor Functions for XML Schema Built-in Types) or the user-derived datatype (see Constructor Functions for User-Defined Types) that is the target for the conversion, and the semantics are exactly the same as for a cast expression; for example, xs:date("2003-01-01") means exactly the same as "2003-01-01" cast as xs:date?.

The cast expression takes a type name to indicate the target type of the conversion. See . If the type name allows the empty sequence and the expression to be cast is the empty sequence, the empty sequence is returned. If the type name does not allow the empty sequence and the expression to be cast is the empty sequence, a type error is raised.

Where the argument to a cast is a literal, the result of the function may be evaluated statically; if an error is encountered during such evaluation, it may be reported as a static error.

Casting from primitive type to primitive type is discussed in . Casting to derived types is discussed in . Casting from derived types is discussed in , and .

When casting from xs:string the semantics in apply, regardless of target type.

Casting from primitive types to primitive types

This section defines casting between the 19 primitive types defined in as well as xs:untypedAtomic, xs:integer and the two derived types of xs:duration (xs:yearMonthDuration and xs:dayTimeDuration). These four types are not primitive types but they are treated as primitive types in this section. The type conversions that are supported are indicated in the table below. In this table, there is a row for each primitive type with that type as the source of the conversion and there is a column for each primitive type as the target of the conversion. The intersections of rows and columns contain one of three characters: Y indicates that a conversion from values of the type to which the row applies to the type to which the column applies is supported; N indicates that there are no supported conversions from values of the type to which the row applies to the type to which the column applies; and M indicates that a conversion from values of the type to which the row applies to the type to which the column applies may succeed for some values in the value space and fails for others.

defines xs:NOTATION as an abstract type. Thus, casting to xs:NOTATION from any other type including xs:NOTATION is not permitted and raises . However, casting from one subtype of xs:NOTATION to another subtype of xs:NOTATION is permitted.

Casting is not supported to or from xs:anySimpleType. Thus, there is no row or column for this type in the table below. For any node that has not been validated or has been validated as xs:anySimpleType, the typed value of the node is an atomic value of type xs:untypedAtomic. There are no atomic values with the type annotation xs:anySimpleType at runtime. Casting to a type that is not atomic raises .

Similarly, casting is not supported to or from xs:anyAtomicType and will raise error . There are no atomic values with the type annotation xs:anyAtomicType at runtime, although this can be a statically inferred type.

If casting is attempted from an ST to a TT for which casting is not supported, as defined in the table below, a type error is raised .

In the following table, the columns and rows are identified by short codes that identify simple types as follows:

uA = xs:untypedAtomic
str = xs:string
flt = xs:float
dbl = xs:double
dec = xs:decimal
int = xs:integer
dur = xs:duration
yMD = xs:yearMonthDuration
dTD = xs:dayTimeDuration
dT = xs:dateTime
tim = xs:time
dat = xs:date
gYM = xs:gYearMonth
gYr = xs:gYear
gMD = xs:gMonthDay
gDay = xs:gDay
gMon = xs:gMonth
bool = xs:boolean
b64 = xs:base64Binary
hxB = xs:hexBinary
aURI = xs:anyURI
QN = xs:QName
NOT = xs:NOTATION

In the following table, the notation S\T indicates that the source (S) of the conversion is indicated in the column below the notation and that the target (T) is indicated in the row to the right of the notation.

S\T   uA   str  flt  dbl  dec  int  dur  yMD  dTD  dT   tim  dat  gYM  gYr  gMD  gDay gMon bool b64  hxB  aURI QN   NOT
uA    Y    Y    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    N    N
str   Y    Y    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M
flt   Y    Y    Y    Y    M    M    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dbl   Y    Y    Y    Y    M    M    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dec   Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
int   Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dur   Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
yMD   Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
dTD   Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
dT    Y    Y    N    N    N    N    N    N    N    Y    Y    Y    Y    Y    Y    Y    Y    N    N    N    N    N    N
tim   Y    Y    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N    N    N    N
dat   Y    Y    N    N    N    N    N    N    N    Y    N    Y    Y    Y    Y    Y    Y    N    N    N    N    N    N
gYM   Y    Y    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N    N
gYr   Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N
gMD   Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N
gDay  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N
gMon  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N
bool  Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
b64   Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    Y    N    N    N
hxB   Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    Y    N    N    N
aURI  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N
QN    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    M
NOT   Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    M

The following sub-sections define the semantics of casting from a primitive type to a primitive type. Semantics of casting to and from a derived type are defined in sections , , and .
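The three kinds of table entry can be probed with castable expressions, as in the following non-normative sketch. It assumes a processor that does not apply the optional static typing feature, so that an unsupported cast is reported by castable as false rather than rejected at compile time.

```xquery
(
  (: int to str is "Y": the cast succeeds for every value :)
  xs:integer(7) castable as xs:string,            (: true :)

  (: dat to int is "N": no conversion is supported :)
  xs:date("2003-01-01") castable as xs:integer,   (: false :)

  (: str to int is "M": the outcome depends on the value :)
  "12" castable as xs:integer,                    (: true :)
  "twelve" castable as xs:integer                 (: false :)
)
```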

Casting from xs:string and xs:untypedAtomic

When the supplied value is an instance of xs:string or an instance of xs:untypedAtomic, it is treated as being a string value and mapped to a typed value of the target type as defined in . Whitespace normalization is applied as indicated by the whiteSpace facet for the datatype. The resulting whitespace-normalized string must be a valid lexical form for the datatype. The semantics of casting are identical to XML Schema validation. For example, "13" cast as xs:unsignedInt returns the xs:unsignedInt typed value 13. This could also be written xs:unsignedInt("13").
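The following non-normative sketch illustrates the rule above for xs:unsignedInt, whose whiteSpace facet is collapse. The comments give the results expected under that rule.

```xquery
(
  (: cast expression and constructor function agree :)
  ("13" cast as xs:unsignedInt) eq xs:unsignedInt("13"),   (: true :)

  (: leading and trailing whitespace is removed before the lexical check :)
  xs:unsignedInt("  13  ") eq 13,                          (: true :)

  (: "13.0" is not a valid lexical form for an integer type :)
  "13.0" castable as xs:unsignedInt,                       (: false :)

  (: "-1" is lexically an integer but lies outside the value space :)
  "-1" castable as xs:unsignedInt                          (: false :)
)
```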

When casting from xs:string or xs:untypedAtomic to a derived type where the derived type is restricted by a pattern facet, the lexical form is first checked against the pattern before further casting is attempted (See ). If the lexical form does not conform to the pattern, error is raised.

Consider a user-defined Schema whose target namespace is bound to the prefix mySchema which defines a restriction of xs:boolean called trueBool which allows only the lexical forms 1 and 0. "true" cast as mySchema:trueBool would fail with . If the Schema also defines a datatype called height as a restriction of xs:integer with a maximum value of 84 then "100" cast as mySchema:height would also fail with .

Casting is permitted from xs:string and xs:untypedAtomic to any primitive atomic type or any atomic type derived by restriction, except xs:QName or xs:NOTATION. Casting to xs:NOTATION is not permitted because it is an abstract type.

Casting is permitted from xs:string literals to xs:QName and types derived from xs:NOTATION. If the argument to such a cast is computed dynamically, is raised if the value is of any type other than xs:QName or xs:NOTATION (including the case where it is an xs:string). The process is described in more detail in .

This version of the specification allows casting between xs:QName and xs:NOTATION in either direction; this was not permitted in the previous Recommendation.

In casting to numerics, if the value is too large or too small to be accurately represented by the implementation, it is handled as an overflow or underflow as defined in .

In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error . The choice of rounding algorithm and the choice between rounding and error behavior are implementation-defined.

In casting to xs:date, xs:dateTime, xs:gYear, or xs:gYearMonth (or types derived from these), if the value is too large or too small to be represented by the implementation, error is raised.

In casting to a duration value, if the value is too large or too small to be represented by the implementation, error is raised.

For xs:anyURI, the extent to which an implementation validates the lexical form of xs:anyURI is .

If the cast fails for any other reason, error is raised.

Casting to xs:string and xs:untypedAtomic

Casting is permitted from any primitive type to the primitive types xs:string and xs:untypedAtomic.

When a value of any simple type is cast as xs:string, the derivation of the xs:string value TV depends on the ST and on the SV, as follows.

If ST is xs:string or a type derived from xs:string, TV is SV.

If ST is xs:anyURI, the type conversion is performed without escaping any characters.

If ST is xs:QName or xs:NOTATION:

if the qualified name has a prefix, then TV is the concatenation of the prefix of SV, a single colon (:), and the local name of SV.

otherwise TV is the local-name.

If ST is a numeric type, the following rules apply:

If ST is xs:integer, TV is the canonical lexical representation of SV as defined in . There is no decimal point.

If ST is xs:decimal, then:

If SV is in the value space of xs:integer, that is, if there are no significant digits after the decimal point, then the value is converted from an xs:decimal to an xs:integer and the resulting xs:integer is converted to an xs:string using the rule above.

Otherwise, the canonical lexical representation of SV is returned, as defined in .

If ST is xs:float or xs:double, then:

TV will be an xs:string in the lexical space of xs:double or xs:float that when converted to an xs:double or xs:float under the rules of produces a value that is equal to SV, or is NaN if SV is NaN. In addition, TV must satisfy the constraints in the following sub-bullets.

If SV has an absolute value that is greater than or equal to 0.000001 (one millionth) and less than 1000000 (one million), then the value is converted to an xs:decimal and the resulting xs:decimal is converted to an xs:string according to the rules above, as though using an implementation of xs:decimal that imposes no limits on the totalDigits or fractionDigits facets.

If SV has the value positive or negative zero, TV is "0" or "-0" respectively.

If SV is positive or negative infinity, TV is the string "INF" or "-INF" respectively.

In other cases, the result consists of a mantissa, which has the lexical form of an xs:decimal, followed by the letter "E", followed by an exponent which has the lexical form of an xs:integer. Leading zeroes and "+" signs are prohibited in the exponent. For the mantissa, there must be a decimal point, and there must be exactly one digit before the decimal point, which must be non-zero. The "+" sign is prohibited. There must be at least one digit after the decimal point. Apart from this mandatory digit, trailing zero digits are prohibited.

The above rules allow more than one representation of the same value. For example, the xs:float value whose exact decimal representation is 1.26743223E15 might be represented by any of the strings "1.26743223E15", "1.26743222E15" or "1.26743224E15" (inter alia). It is implementation-dependent which of these representations is chosen.

If ST is xs:dateTime, xs:date or xs:time, TV is the local value. The components of TV are individually cast to xs:string using the functions described in and the results are concatenated together. The year component is cast to xs:string using eg:convertYearToString. The month, day, hour and minute components are cast to xs:string using eg:convertTo2CharString. The second component is cast to xs:string using eg:convertSecondsToString. The timezone component, if present, is cast to xs:string using eg:convertTZtoString.

Note that the hours component of the resulting string will never be "24". Midnight is always represented as "00:00:00".

If ST is xs:yearMonthDuration or xs:dayTimeDuration, TV is the canonical representation of SV as defined in or , respectively.

If ST is xs:duration, then let SYM be SV cast as xs:yearMonthDuration, and let SDT be SV cast as xs:dayTimeDuration. Now let the next intermediate value, TYM, be SYM cast as TT, and let TDT be SDT cast as TT. If TYM is "P0M", then TV is TDT. Otherwise, TYM and TDT are merged according to the following rules:

If TDT is "PT0S", then TV is TYM.

Otherwise, TV is the concatenation of all the characters in TYM and all the characters except the first "P" and the optional negative sign in TDT.

In all other cases, TV is the canonical representation of SV. For datatypes that do not have a canonical lexical representation defined an canonical representation may be used.

To cast as xs:untypedAtomic the value is cast as xs:string, as described above, and the type annotation changed to xs:untypedAtomic.

The string representations of numeric values are backwards compatible with XPath 1.0 except for the special values positive and negative infinity, negative zero and values outside the range 1.0e-6 to 1.0e+6.
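A few of the rules in this section are illustrated by the following non-normative sketch. The values are chosen so that each has a single permitted string representation; the comments give the results expected from the rules above.

```xquery
(
  xs:string(1.0),                                  (: "1"      decimal in the integer value space :)
  xs:string(1.50),                                 (: "1.5"    canonical decimal form :)
  xs:string(xs:double(12.5)),                      (: "12.5"   magnitude within 0.000001 to 1000000 :)
  xs:string(xs:double(1000000)),                   (: "1.0E6"  magnitude not less than one million :)
  xs:string(xs:double("-0")),                      (: "-0"     negative zero :)
  xs:string(xs:double("-INF")),                    (: "-INF"   negative infinity :)
  xs:string(xs:dateTime("2003-01-01T24:00:00")),   (: "2003-01-02T00:00:00"  hours are never "24" :)
  xs:string(xs:dayTimeDuration("PT90M")),          (: "PT1H30M" canonical duration form :)
  xs:string(xs:duration("P14M3DT4H"))              (: "P1Y2M3DT4H"  TYM "P1Y2M" merged with TDT "P3DT4H" :)
)
```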

Casting to numeric types

Casting to xs:float

When a value of any simple type is cast as xs:float, the xs:float TV is derived from the ST and the SV as follows:

If ST is xs:float, then TV is SV and the conversion is complete.

If ST is xs:double, then TV is obtained as follows:

if SV is the xs:double value INF, -INF, NaN, positive zero, or negative zero, then TV is the xs:float value INF, -INF, NaN, positive zero, or negative zero respectively.

otherwise, SV can be expressed in the form m × 2^e where the mantissa m and exponent e are signed xs:integers whose value range is defined in , and the following rules apply:

if m (the mantissa of SV) is outside the permitted range for the mantissa of an xs:float value, that is, -(2^24 - 1) to +(2^24 - 1), then it is divided by 2^N where N is the lowest positive xs:integer that brings the result of the division within the permitted range, and the exponent e is increased by N. This is integer division (in effect, the binary value of the mantissa is truncated on the right). Let M be the mantissa and E the exponent after this adjustment.

if E exceeds 104 (the maximum exponent value in the value space of xs:float) then TV is the xs:float value INF or -INF depending on the sign of M.

if E is less than -149 (the minimum exponent value in the value space of xs:float) then TV is the xs:float value positive or negative zero depending on the sign of M.

otherwise, TV is the xs:float value M × 2^E.

If ST is xs:decimal, or xs:integer, then TV is xs:float( SV cast as xs:string) and the conversion is complete.

If ST is xs:boolean, SV is converted to 1.0E0 if SV is true and to 0.0E0 if SV is false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .

Implementations return negative zero for "-0.0E0" cast as xs:float. does not distinguish between the values positive zero and negative zero.
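The overflow and underflow cases for casting xs:double to xs:float can be observed with the following non-normative sketch. The source values are chosen to lie far outside the xs:float range, so the result does not depend on how an implementation rounds the mantissa.

```xquery
(
  (: exponent too large for xs:float: the result is INF :)
  xs:string(xs:float(xs:double("1E300"))),     (: "INF" :)

  (: exponent too small for xs:float: the result is a signed zero :)
  xs:string(xs:float(xs:double("-1E-300"))),   (: "-0" :)

  (: xs:decimal is converted by way of its string representation :)
  xs:float(1.5) eq xs:float("1.5"),            (: true :)

  (: xs:boolean true becomes 1.0E0 :)
  xs:float(fn:true()) eq 1                     (: true :)
)
```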

Casting to xs:double

When a value of any simple type is cast as xs:double, the xs:double value TV is derived from the ST and the SV as follows:

If ST is xs:double, then TV is SV and the conversion is complete.

If ST is xs:float or a type derived from xs:float, then TV is obtained as follows:

if SV is the xs:float value INF, -INF, NaN, positive zero, or negative zero, then TV is the xs:double value INF, -INF, NaN, positive zero, or negative zero respectively.

otherwise, SV can be expressed in the form m × 2^e where the mantissa m and exponent e are signed xs:integer values whose value range is defined in , and TV is the xs:double value m × 2^e.

If ST is xs:decimal or xs:integer, then TV is xs:double( SV cast as xs:string) and the conversion is complete.

If ST is xs:boolean, SV is converted to 1.0E0 if SV is true and to 0.0E0 if SV is false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .

Implementations return negative zero for "-0.0E0" cast as xs:double. does not distinguish between the values positive zero and negative zero.

Casting to xs:decimal

When a value of any simple type is cast as xs:decimal, the xs:decimal value TV is derived from ST and SV as follows:

If ST is xs:decimal, xs:integer or a type derived from them, then TV is SV, converted to an xs:decimal value if need be, and the conversion is complete.

If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closest to zero is chosen. If SV is too large to be accommodated as an xs:decimal, (see for limits on numeric values) an error is raised . If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, an error is raised .

If ST is xs:boolean, SV is converted to 1.0 if SV is 1 or true and to 0.0 if SV is 0 or false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .
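A non-normative sketch of the xs:decimal rules follows. The value 2.5 is used because it is exactly representable both as an xs:double and as an xs:decimal, so no implementation-defined rounding is involved.

```xquery
(
  xs:decimal(xs:double("2.5")) eq 2.5,       (: true :)
  xs:decimal(fn:false()) eq 0,               (: true :)

  (: the special values cannot be represented as xs:decimal :)
  xs:float("INF") castable as xs:decimal,    (: false :)
  xs:double("NaN") castable as xs:decimal    (: false :)
)
```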

Casting to xs:integer

When a value of any simple type is cast as xs:integer, the xs:integer value TV is derived from ST and SV as follows:

If ST is xs:integer, or a type derived from xs:integer, then TV is SV, converted to an xs:integer value if need be, and the conversion is complete.

If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 and -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer, (see for limits on numeric values) an error is raised . If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, an error is raised .

If ST is xs:boolean, SV is converted to 1 if SV is 1 or true and to 0 if SV is 0 or false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .
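The examples in the rules above can be written out as follows. This sketch is not normative. The last item shows that truncation applies only to numeric source values: a string must already be a valid lexical form for xs:integer.

```xquery
(
  xs:integer(3.1456),                        (: 3 :)
  xs:integer(-17.89),                        (: -17 :)
  xs:integer(xs:double("3.124E1")),          (: 31 :)
  xs:integer(fn:true()),                     (: 1 :)
  xs:double("NaN") castable as xs:integer,   (: false :)
  "3.1456" castable as xs:integer            (: false :)
)
```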

Casting to duration types

When a value of type xs:untypedAtomic, xs:string, a type derived from xs:string, xs:yearMonthDuration or xs:dayTimeDuration is cast as xs:duration, xs:yearMonthDuration or xs:dayTimeDuration, TV is derived from ST and SV as follows:

If ST is the same as TT, then TV is SV.

If ST is xs:duration, or a type derived from xs:duration, but not xs:dayTimeDuration or a type derived from xs:dayTimeDuration, and TT is xs:yearMonthDuration, then TV is derived from SV by removing the day, hour, minute and second components from SV.

If ST is xs:duration, or a type derived from xs:duration, but not xs:yearMonthDuration or a type derived from xs:yearMonthDuration, and TT is xs:dayTimeDuration, then TV is derived from SV by removing the year and month components from SV.

If ST is xs:yearMonthDuration or xs:dayTimeDuration, and TT is xs:duration, then TV is derived from SV as discussed in .

If ST is xs:yearMonthDuration and TT is xs:dayTimeDuration, the cast is permitted and returns an xs:dayTimeDuration with value 0.0 seconds.

If ST is xs:dayTimeDuration and TT is xs:yearMonthDuration, the cast is permitted and returns an xs:yearMonthDuration with value 0 months.

If ST is xs:untypedAtomic or xs:string, see .

Note that casting from xs:duration to xs:yearMonthDuration or xs:dayTimeDuration loses information. To avoid this, users can cast the xs:duration value to both an xs:yearMonthDuration and an xs:dayTimeDuration and work with both values.
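The loss of information, and the technique of keeping both parts, can be seen in the following non-normative sketch. The variable name $d is chosen for illustration only.

```xquery
let $d := xs:duration("P1Y2M3DT4H")
return (
  (: each cast keeps only its own components :)
  xs:string(xs:yearMonthDuration($d)),                           (: "P1Y2M" :)
  xs:string(xs:dayTimeDuration($d)),                             (: "P3DT4H" :)

  (: casting between the two derived types yields a zero-length duration :)
  xs:string(xs:dayTimeDuration(xs:yearMonthDuration("P1Y"))),    (: "PT0S" :)
  xs:string(xs:yearMonthDuration(xs:dayTimeDuration("P1D")))     (: "P0M" :)
)
```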

Casting to date and time types

In several situations, casting to date and time types requires the extraction of a component from SV or from the result of fn:current-dateTime and converting it to an xs:string. These conversions must follow certain rules. For example, converting an xs:integer year value requires converting to an xs:string with four or more characters, preceded by a minus sign if the value is negative.

This document defines four functions to perform these conversions. These functions are for illustrative purposes only and make no recommendations as to style or efficiency. References to these functions from the following text are not normative.

The arguments to these functions come from functions defined in this document. Thus, the functions below assume that they are correct and do no range checking on them.

(: Formats a year with at least four digits, preceded by "-" if negative. :)
declare function eg:convertYearToString($year as xs:integer) as xs:string
{
  let $plusMinus := if ($year >= 0) then "" else "-"
  let $yearString := fn:abs($year) cast as xs:string
  let $length := fn:string-length($yearString)
  return
    if ($length = 1) then fn:concat($plusMinus, "000", $yearString)
    else if ($length = 2) then fn:concat($plusMinus, "00", $yearString)
    else if ($length = 3) then fn:concat($plusMinus, "0", $yearString)
    else fn:concat($plusMinus, $yearString)
};

(: Formats a month, day, hour or minute value as two digits. :)
declare function eg:convertTo2CharString($value as xs:integer) as xs:string
{
  let $string := $value cast as xs:string
  return
    if (fn:string-length($string) = 1) then fn:concat("0", $string)
    else $string
};

(: Formats seconds with a two-digit integer part, keeping any fraction. :)
declare function eg:convertSecondsToString($seconds as xs:decimal) as xs:string
{
  let $string := $seconds cast as xs:string
  let $intLength := fn:string-length(($seconds cast as xs:integer) cast as xs:string)
  return
    if ($intLength = 1) then fn:concat("0", $string)
    else $string
};

(: Formats a timezone as "", "Z", or a signed hh:mm offset. :)
declare function eg:convertTZtoString($tz as xs:dayTimeDuration?) as xs:string
{
  if (fn:empty($tz)) then ""
  else if ($tz eq xs:dayTimeDuration('PT0S')) then "Z"
  else
    let $tzh := fn:hours-from-duration($tz)
    let $tzm := fn:minutes-from-duration($tz)
    (: The sign is taken from the duration as a whole, so that an offset
       such as -PT30M, whose hours component is 0, is still negative. :)
    let $plusMinus := if ($tz gt xs:dayTimeDuration('PT0S')) then "+" else "-"
    let $tzhString := eg:convertTo2CharString(fn:abs($tzh))
    let $tzmString := eg:convertTo2CharString(fn:abs($tzm))
    return fn:concat($plusMinus, $tzhString, ":", $tzmString)
};

Conversion from primitive types to date and time types follows the rules below.

When a value of any primitive type is cast as xs:dateTime, the xs:dateTime value TV is derived from ST and SV as follows:

If ST is xs:dateTime, then TV is SV.

If ST is xs:date, then let SYR be eg:convertYearToString( fn:year-from-date( SV )), let SMO be eg:convertTo2CharString( fn:month-from-date( SV )), let SDA be eg:convertTo2CharString( fn:day-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:dateTime( fn:concat( SYR , '-', SMO , '-', SDA , 'T00:00:00', STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:time, the xs:time value TV is derived from ST and SV as follows:

If ST is xs:time, then TV is SV.

If ST is xs:dateTime, then TV is xs:time( fn:concat( eg:convertTo2CharString( fn:hours-from-dateTime( SV )), ':', eg:convertTo2CharString( fn:minutes-from-dateTime( SV )), ':', eg:convertSecondsToString( fn:seconds-from-dateTime( SV )), eg:convertTZtoString( fn:timezone-from-dateTime( SV )) )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:date, the xs:date value TV is derived from ST and SV as follows:

If ST is xs:date, then TV is SV.

If ST is xs:dateTime, then let SYR be eg:convertYearToString( fn:year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( fn:month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( fn:day-from-dateTime( SV )) and let STZ be eg:convertTZtoString(fn:timezone-from-dateTime( SV )); TV is xs:date( fn:concat( SYR , '-', SMO , '-', SDA, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gYearMonth, the xs:gYearMonth value TV is derived from ST and SV as follows:

If ST is xs:gYearMonth, then TV is SV.

If ST is xs:dateTime, then let SYR be eg:convertYearToString( fn:year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( fn:month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-dateTime( SV )); TV is xs:gYearMonth( fn:concat( SYR , '-', SMO, STZ ) ).

If ST is xs:date, then let SYR be eg:convertYearToString( fn:year-from-date( SV )), let SMO be eg:convertTo2CharString( fn:month-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:gYearMonth( fn:concat( SYR , '-', SMO, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gYear, the xs:gYear value TV is derived from ST and SV as follows:

If ST is xs:gYear, then TV is SV.

If ST is xs:dateTime, let SYR be eg:convertYearToString( fn:year-from-dateTime( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-dateTime( SV )); TV is xs:gYear(fn:concat( SYR, STZ )).

If ST is xs:date, let SYR be eg:convertYearToString( fn:year-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:gYear(fn:concat( SYR, STZ )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gMonthDay, the xs:gMonthDay value TV is derived from ST and SV as follows:

If ST is xs:gMonthDay, then TV is SV.

If ST is xs:dateTime, then let SMO be eg:convertTo2CharString( fn:month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( fn:day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-dateTime( SV )); TV is xs:gMonthDay( fn:concat( '--', SMO , '-', SDA, STZ ) ).

If ST is xs:date, then let SMO be eg:convertTo2CharString( fn:month-from-date( SV )), let SDA be eg:convertTo2CharString( fn:day-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:gMonthDay( fn:concat( '--', SMO , '-', SDA, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gDay, the xs:gDay value TV is derived from ST and SV as follows:

If ST is xs:gDay, then TV is SV.

If ST is xs:dateTime, then let SDA be eg:convertTo2CharString( fn:day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-dateTime( SV )); TV is xs:gDay( fn:concat( '---', SDA, STZ )).

If ST is xs:date, then let SDA be eg:convertTo2CharString( fn:day-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:gDay( fn:concat( '---', SDA, STZ )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gMonth, the xs:gMonth value TV is derived from ST and SV as follows:

If ST is xs:gMonth, then TV is SV.

If ST is xs:dateTime, then let SMO be eg:convertTo2CharString( fn:month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-dateTime( SV )); TV is xs:gMonth( fn:concat( '--' , SMO, STZ )).

If ST is xs:date, then let SMO be eg:convertTo2CharString( fn:month-from-date( SV )) and let STZ be eg:convertTZtoString( fn:timezone-from-date( SV )); TV is xs:gMonth( fn:concat( '--', SMO, STZ )).

If ST is xs:untypedAtomic or xs:string, see .
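The conversions between date and time types can be sketched as follows. The example is not normative and the variable name $dt is chosen for illustration only. Each result is shown as the string obtained by casting the target value to xs:string; note that the timezone of the source value is carried over.

```xquery
let $dt := xs:dateTime("2003-05-17T10:30:00Z")
return (
  xs:string(xs:date($dt)),                         (: "2003-05-17Z" :)
  xs:string(xs:time($dt)),                         (: "10:30:00Z" :)
  xs:string(xs:gYearMonth($dt)),                   (: "2003-05Z" :)
  xs:string(xs:gYear($dt)),                        (: "2003Z" :)
  xs:string(xs:gMonthDay($dt)),                    (: "--05-17Z" :)
  xs:string(xs:gDay($dt)),                         (: "---17Z" :)

  (: a date has no time of day, so midnight is supplied :)
  xs:string(xs:dateTime(xs:date("2003-05-17"))),   (: "2003-05-17T00:00:00" :)

  (: there is no conversion from xs:time to xs:date :)
  xs:time("10:30:00") castable as xs:date          (: false :)
)
```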

Casting to xs:boolean

When a value of any primitive type is cast as xs:boolean, the xs:boolean value TV is derived from ST and SV as follows:

If ST is xs:boolean, then TV is SV.

If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is 0, +0, -0, 0.0, 0.0E0 or NaN, then TV is false.

If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is not one of the above values, then TV is true.

If ST is xs:untypedAtomic or xs:string, see .
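A non-normative sketch of the xs:boolean rules follows. The last two items are string casts, which are governed by the lexical forms of xs:boolean rather than by the numeric rules.

```xquery
(
  xs:boolean(0),                   (: false :)
  xs:boolean(xs:double("NaN")),    (: false :)
  xs:boolean(-0.5),                (: true: any other numeric value :)
  xs:boolean("1"),                 (: true: "1" is a lexical form of xs:boolean :)
  "yes" castable as xs:boolean     (: false: not a lexical form of xs:boolean :)
)
```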

Casting to xs:base64Binary and xs:hexBinary

Values of type xs:base64Binary can be cast as xs:hexBinary and vice versa, since the two types have the same value space. Casting to xs:base64Binary and xs:hexBinary is also supported from the same type and from xs:untypedAtomic, xs:string and subtypes of xs:string using semantics.
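Because the two binary types share a value space, a cast between them changes only the representation. The following non-normative sketch uses the three octets 01 02 03, whose base64 encoding is "AQID".

```xquery
(
  xs:string(xs:hexBinary(xs:base64Binary("AQID"))),     (: "010203" :)
  xs:string(xs:base64Binary(xs:hexBinary("010203"))),   (: "AQID" :)

  (: lexical case is not significant in the value space of xs:hexBinary :)
  xs:hexBinary("0a") eq xs:hexBinary("0A")              (: true :)
)
```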

Casting to xs:anyURI

Casting to xs:anyURI is supported only from the same type, xs:untypedAtomic or xs:string.

When a value of any primitive type is cast as xs:anyURI, the xs:anyURI value TV is derived from the ST and SV as follows:

If ST is xs:untypedAtomic or xs:string see .

Casting to xs:QName and xs:NOTATION

Casting from xs:string or xs:untypedAtomic to xs:QName or xs:NOTATION is described in .

It is also possible to cast from xs:NOTATION to xs:QName, or from xs:QName to any type derived by restriction from xs:NOTATION. (Casting to xs:NOTATION itself is not allowed, because xs:NOTATION is an abstract type.) The resulting xs:QName or xs:NOTATION has the same prefix, local name, and namespace URI parts as the supplied value.

Casting to derived types

Casting a value to a derived type can be separated into four cases. Note that xs:untypedAtomic, xs:integer and the two derived types of xs:duration, xs:yearMonthDuration and xs:dayTimeDuration, are treated as primitive types.

When SV is an instance of a type that is derived by restriction from TT. This is described in section .

When SV is an instance of a type derived by restriction from the same primitive type as TT. This is described in .

When the derived type is derived, directly or indirectly, from a different primitive type than the primitive type of ST. This is described in .

When SV is an instance of the TT, the cast always succeeds (Identity cast).

Casting from derived types to parent types

Except in the case of xs:NOTATION, it is always possible to cast a value of any atomic type to an atomic type from which it is derived, directly or indirectly, by restriction. For example, it is possible to cast an xs:unsignedShort to an xs:unsignedInt, an xs:integer, or an xs:decimal. Since the value space of the original type is a subset of the value space of the target type, such a cast is always successful. The result will have the same value as the original, but will have a new type annotation.
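The following non-normative sketch shows that such a cast keeps the value and changes only the type annotation. The variable name $v is chosen for illustration only.

```xquery
let $v := xs:unsignedShort(5)
return (
  (: the value is unchanged :)
  ($v cast as xs:unsignedInt) eq $v,                        (: true :)

  (: the result carries the annotation of the target type, not of the source type :)
  ($v cast as xs:integer) instance of xs:unsignedShort,     (: false :)
  ($v cast as xs:decimal) instance of xs:decimal            (: true :)
)
```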

Casting within a branch of the type hierarchy

It is possible to cast an SV to a TT if the type of the SV and the TT type are both derived by restriction (directly or indirectly) from the same primitive type, provided that the supplied value conforms to the constraints implied by the facets of the target type. This includes the case where the target type is derived from the type of the supplied value, as well as the case where the type of the supplied value is derived from the target type. For example, an instance of xs:byte can be cast as xs:unsignedShort, provided the value is not negative.

If the value does not conform to the facets defined for the target type, then an error is raised . See . In the case of the pattern facet (which applies to the lexical space rather than the value space), the pattern is tested against the canonical lexical representation of the value, as defined for the source type (or the result of casting the value to an xs:string, in the case of types that have no canonical lexical representation defined for them).

Note that this will cause casts to fail if the pattern excludes the canonical lexical representation of the source type. For example, if the type my:distance is defined as a restriction of xs:decimal with a pattern that requires two digits after the decimal point, casting of an xs:integer to my:distance will always fail, because the canonical representation of an xs:integer does not conform to this pattern.
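The xs:byte example given earlier in this section can be written out as follows. The sketch is not normative and uses only built-in types, so the pattern facet case is not shown.

```xquery
(
  (: 100 satisfies the facets of xs:unsignedShort :)
  (xs:byte(100) cast as xs:unsignedShort) eq 100,   (: true :)

  (: -1 violates the minimum value of xs:unsignedShort :)
  xs:byte(-1) castable as xs:unsignedShort          (: false :)
)
```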

In some cases, casting from a parent type to a derived type requires special rules. See for rules regarding casting to xs:yearMonthDuration and xs:dayTimeDuration. See , below, for casting to xs:ENTITY and types derived from it.

Casting to xs:ENTITY

says that "The value space of ENTITY is the set of all strings that match the NCName production ... and have been declared as an unparsed entity in a document type definition." However, and do not check that constructed values of type xs:ENTITY match declared unparsed entities. Thus, this rule is relaxed in this specification and, in casting to xs:ENTITY and types derived from it, no check is made that the values correspond to declared unparsed entities.

Casting across the type hierarchy

When the ST and the TT are derived, directly or indirectly, from different primitive types, this is called casting across the type hierarchy. Casting across the type hierarchy is logically equivalent to three separate steps performed in order. Errors can occur in either of the latter two steps.

Cast the SV, up the hierarchy, to the primitive type of the source, as described in .

If SV is an instance of xs:string or xs:untypedAtomic, check its value against the pattern facet of TT, and raise an error if the check fails.

Cast the value to the primitive type of TT, as described in .

If TT is derived from xs:NOTATION, assume for the purposes of this rule that casting to xs:NOTATION succeeds.

Cast the value down to the TT, as described in
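As a non-normative illustration, casting an xs:double to xs:unsignedByte first converts the value to xs:integer (which is treated as a primitive type in this section) and then casts it down to the target type, where the facets of xs:unsignedByte are checked.

```xquery
(
  (: 200 is within the range of xs:unsignedByte :)
  (xs:double("2.0E2") cast as xs:unsignedByte) eq 200,   (: true :)

  (: 300 exceeds the maximum value 255, so the final step fails :)
  xs:double("3.0E2") castable as xs:unsignedByte,        (: false :)

  (: xs:unsignedByte to xs:boolean goes up to the numeric type, then across :)
  xs:unsignedByte(1) cast as xs:boolean                  (: true :)
)
```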

References

Normative References

IEEE. IEEE Standard for Binary Floating-Point Arithmetic.

Unicode Technical Standard #35, Locale Data Markup Language. Available at: http://www.unicode.org/unicode/reports/tr35/

IETF. RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. Available at: http://www.ietf.org/rfc/rfc2396.txt

IETF. RFC 3986: Uniform Resource Identifiers (URI): Generic Syntax. Available at: http://www.ietf.org/rfc/rfc3986.txt

IETF. RFC 3987: Internationalized Resource Identifiers (IRIs). Available at: http://www.ietf.org/rfc/rfc3987.txt

Character Model for the World Wide Web 1.0: Fundamentals. Available at: http://www.w3.org/TR/2005/REC-charmod-20050215/

Character Model for the World Wide Web 1.0: Normalization, Last Call Working Draft. Available at: http://www.w3.org/TR/2004/WD-charmod-norm-20040225/

ISO (International Organization for Standardization) Codes for the representation of names of countries and their subdivisions - Part 1: Country codes ISO 3166-1:1997.

ISO (International Organization for Standardization). ISO/IEC 10967-1:1994, Information technology - Language Independent Arithmetic - Part 1: Integer and floating point arithmetic [Geneva]: International Organization for Standardization, 1994. Available from: http://www.iso.ch/

The Unicode Consortium, Reading, MA, Addison-Wesley, 2003. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is , but implementations are recommended to use the latest Unicode version; currently, Version 4.0.00, Addison-Wesley, 2003 ISBN 0-321-18578-1

Unicode Technical Standard #10, Unicode Collation Algorithm. Available at: http://www.unicode.org/unicode/reports/tr10/

Unicode Technical Standard #18, Unicode Regular Expressions. Available at: http://www.unicode.org/unicode/reports/tr18/

World Wide Web Consortium. Extensible Markup Language (XML) 1.0 Third Edition. Available at: http://www.w3.org/TR/REC-xml

World Wide Web Consortium. Extensible Markup Language (XML) 1.1. Available at: http://www.w3.org/TR/2004/REC-xml11-20040204/

XML Schema Part 2: Datatypes Second Edition, Oct. 28 2004. Available at: http://www.w3.org/TR/xmlschema-2/

XML Schema 1.1 Part 2: Datatypes, 30 January 2009. Available at: http://www.w3.org/TR/xmlschema11-2/

Namespaces in XML. Available at: http://www.w3.org/TR/1999/REC-xml-names-19990114/

Non-normative References

Edward M. Reingold and Nachum Dershowitz. Calendrical Calculations Millennium edition (2nd Edition). Cambridge University Press, ISBN 0 521 77752 6

HTML 4.01 Recommendation, 24 December 1999. Available at: http://www.w3.org/TR/REC-html40/

ISO (International Organization for Standardization). Representations of dates and times, 2000-08-03. Available from: http://www.iso.ch/

World Wide Web Consortium Working Group Note. Working With Timezones, October 13, 2005. Available at: http://www.w3.org/TR/2005/NOTE-timezone-20051013/

Error Summary

The error text provided with these errors is non-normative.

Unidentified error.

This error is raised whenever an attempt is made to divide by zero.

This error is raised whenever numeric operations result in an overflow or underflow.

This error is raised if the decimal format name supplied to fn:format-number is not a valid QName, or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with a matching name.

This error is raised if the picture string supplied to fn:format-number has invalid syntax.

This error is raised if the picture string supplied to fn:format-date, fn:format-time, or fn:format-dateTime has invalid syntax.

This error is raised if the picture string supplied to fn:format-date selects a component that is not present in a date, or if the picture string supplied to fn:format-time selects a component that is not present in a time.

Compatibility with XPath 1.0

This appendix summarizes the relationship between certain functions defined in and the corresponding functions defined in this document. The first column of the table provides the signature of functions defined in this document. The second column provides the signature of the corresponding function in . The third column describes the differences in the semantics of the corresponding functions. The functions appear in the order they appear in .

The evaluation of the arguments to the functions defined in this document depends on whether the XPath 1.0 compatibility mode is on or off. See . If the mode is on, the following conversions are applied, in order, before the argument value is passed to the function:

If the expected type is a single item or an optional single item (examples: xs:string, xs:string?, xs:untypedAtomic, xs:untypedAtomic?, node(), node()?, item(), item()?), then the given value V is effectively replaced by fn:subsequence(V, 1, 1).

If the expected type is xs:string or xs:string?, then the given value V is effectively replaced by fn:string(V).

If the expected type is numeric or optional numeric, then the given value V is effectively replaced by fn:number(V).

Otherwise, the given value is unchanged.
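
As a rough illustration only, the conversions listed above can be mimicked by hand in XQuery, for example when porting an XPath 1.0 expression to a context where compatibility mode is off. The helper names eg:first-item, eg:compat-string and eg:compat-number, and the namespace URI bound to the eg prefix, are hypothetical and are not defined by this specification.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

(: First conversion: keep only the first item of the supplied value. :)
declare function eg:first-item($v as item()*) as item()? {
  fn:subsequence($v, 1, 1)
};

(: First and second conversions, for an expected type of xs:string or xs:string?. :)
declare function eg:compat-string($v as item()*) as xs:string {
  fn:string(eg:first-item($v))
};

(: First and third conversions, for a numeric or optional numeric expected type. :)
declare function eg:compat-number($v as item()*) as xs:double {
  fn:number(eg:first-item($v))
};

(
  eg:compat-string(("a", "b", "c")),   (: "a" :)
  eg:compat-number(("12", "x")),       (: the xs:double value 12 :)
  eg:compat-string(())                 (: zero-length string :)
)
```

The sketch applies the conversions in the same order as the text: truncation to the first item happens before the conversion to a string or a number.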

XQuery 1.0 and XPath 2.0 XPath 1.0 Notes
last() => number Precision of numeric results may be different.
position() => number Precision of numeric results may be different.
count(node-set) => number Precision of numeric results may be different.
id(object) => node-set XPath 2.0 behavior is different for boolean and numeric arguments. The recognition of a node as an id value is sensitive to the manner in which the datamodel is constructed. In XPath 1.0 the whole string is treated as a unit. In XPath 2.0 each string is treated as a list.
local-name(node-set?) => string If compatibility mode is off, an error will occur if argument has more than one node.
namespace-uri(node-set?) => string If compatibility mode is off, an error will occur if argument has more than one node.
name(node-set?) => string If compatibility mode is off, an error will occur if argument has more than one node. The rules for determining the prefix are more precisely defined in . Function is not "well-defined" for parentless attribute nodes.
string(object) => string If compatibility mode is off, an error will occur if argument has more than one node. Representations of numeric values are XPath 1.0 compatible except for the special values positive and negative infinity, and for values outside the range 1.0e-6 to 1.0e+6.
concat(string, string, string*) => string If compatibility mode is off, an error will occur if an argument has more than one node. If compatibility mode is on, the first node in the sequence is used.
starts-with(string, string) => boolean If compatibility mode is off, an error will occur if either argument has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
contains(string, string) => boolean If compatibility mode is off, an error will occur if either argument has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
substring-before(string, string) => string If compatibility mode is off, an error will occur if either argument has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
substring-after(string, string) => string If compatibility mode is off, an error will occur if either argument has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
substring(string, number, number?) => string If compatibility mode is off, an error will occur if $sourceString has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
string-length(string?) => number If compatibility mode is off, numbers and booleans will give errors for the first argument. Also, multiple nodes will give an error.
normalize-space(string?) => string If compatibility mode is off, an error will occur if $arg has more than one node or is a number or a boolean. If compatibility mode is on, implicit conversion is performed.
translate(string, string, string) => string
boolean(object) => boolean
not(boolean) => boolean
true() => boolean
false() => boolean
lang(string) => boolean If compatibility mode is off, numbers and booleans will give errors. Also, multiple nodes will give an error. If compatibility mode is on, implicit conversion is performed.
number(object?) => number Error if argument has more than one node when not in compatibility mode.
sum(node-set) => number 2.0 raises an error if sequence contains values that cannot be added together such as NMTOKENS and other subtypes of string. 1.0 returns NaN.
floor(number) => number In 2.0, if argument is (), the result is (). In 1.0, the result is NaN. If compatibility mode is off, an error will occur with more than one node. If compatibility mode is on, implicit conversion is performed.
ceiling(number) => number In 2.0, if argument is (), the result is (). In 1.0, the result is NaN. If compatibility mode is off, an error will occur with more than one node. If compatibility mode is on, implicit conversion is performed.
round(number) => number In 2.0, if argument is (), the result is (). In 1.0, the result is NaN. If compatibility mode is off, an error will occur with more than one node. If compatibility mode is on, implicit conversion is performed.

Illustrative User-written Functions

Certain functions that were proposed for inclusion in this function library have been excluded on the basis that it is straightforward for users to implement these functions themselves using XSLT 2.0 or XQuery 1.0.

This Appendix provides sample implementations of some of these functions.

To emphasize that these functions are examples of functions that vendors may write, their names carry the prefix 'eg'. Vendors are free to define such functions in any namespace. A group of vendors may also choose to create a collection of such useful functions and put them in a common namespace.

eg:if-empty and eg:if-absent

In some situations, users may want to provide default values for missing information that may be signaled by elements that are omitted, have no value or have the empty sequence as their value. For example, a missing middle initial may be indicated by omitting the element or a non-existent bonus signaled with an empty sequence. This section includes examples of functions that provide such defaults. These functions return xs:anyAtomicType*. Users may want to write functions that return more specific types.

eg:if-empty

If the first argument is the empty sequence or an element without simple or complex content, if-empty() returns the second argument; otherwise, it returns the content of the first argument.

XSLT implementation

<xsl:function name="eg:if-empty" as="xs:anyAtomicType*">
  <xsl:param name="node" as="node()?"/>
  <xsl:param name="value" as="xs:anyAtomicType"/>
  <xsl:choose>
    <xsl:when test="$node and $node/child::node()">
      <xsl:sequence select="fn:data($node)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$value"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

XQuery implementation

declare function eg:if-empty (
  $node as node()?,
  $value as xs:anyAtomicType) as xs:anyAtomicType*
{
  if ($node and $node/child::node())
  then fn:data($node)
  else $value
};

eg:if-absent

If the first argument is the empty sequence, if-absent() returns the second argument; otherwise, it returns the content of the first argument.

XSLT implementation

<xsl:function name="eg:if-absent" as="xs:anyAtomicType*">
  <xsl:param name="node" as="node()?"/>
  <xsl:param name="value" as="xs:anyAtomicType"/>
  <xsl:choose>
    <xsl:when test="$node">
      <xsl:sequence select="fn:data($node)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$value"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

XQuery implementation

declare function eg:if-absent (
  $node as node()?,
  $value as xs:anyAtomicType) as xs:anyAtomicType*
{
  if ($node)
  then fn:data($node)
  else $value
};
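
The difference between the two functions shows up when an element is present but has no content. The following sketch repeats both declarations so that it is self-contained; the sample employee element and the namespace URI bound to the eg prefix are hypothetical.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

declare function eg:if-empty($node as node()?, $value as xs:anyAtomicType)
  as xs:anyAtomicType*
{
  if ($node and $node/child::node()) then fn:data($node) else $value
};

declare function eg:if-absent($node as node()?, $value as xs:anyAtomicType)
  as xs:anyAtomicType*
{
  if ($node) then fn:data($node) else $value
};

(: Hypothetical sample data: <middle/> is present but empty, <bonus> is omitted. :)
let $emp := <employee><first>Ann</first><middle/></employee>
return (
  eg:if-empty($emp/first, "n/a"),    (: the content of <first> :)
  eg:if-empty($emp/middle, "n/a"),   (: "n/a", because <middle/> has no content :)
  eg:if-absent($emp/middle, "n/a"),  (: the zero-length content of <middle/> :)
  eg:if-absent($emp/bonus, 0)        (: 0, because <bonus> is absent :)
)
```
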
union, intersect and except on sequences of values

eg:value-union

This function returns a sequence containing all the distinct items in $arg1 and $arg2, in an undefined order.

XSLT implementation

<xsl:function name="eg:value-union" as="xs:anyAtomicType*">
  <xsl:param name="arg1" as="xs:anyAtomicType*"/>
  <xsl:param name="arg2" as="xs:anyAtomicType*"/>
  <xsl:sequence select="fn:distinct-values(($arg1, $arg2))"/>
</xsl:function>

XQuery implementation

declare function eg:value-union (
  $arg1 as xs:anyAtomicType*,
  $arg2 as xs:anyAtomicType*) as xs:anyAtomicType*
{
  fn:distinct-values(($arg1, $arg2))
};

eg:value-intersect

This function returns a sequence containing all the distinct items that appear in both $arg1 and $arg2, in an undefined order.

XSLT implementation

<xsl:function name="eg:value-intersect" as="xs:anyAtomicType*">
  <xsl:param name="arg1" as="xs:anyAtomicType*"/>
  <xsl:param name="arg2" as="xs:anyAtomicType*"/>
  <xsl:sequence select="fn:distinct-values($arg1[.=$arg2])"/>
</xsl:function>

XQuery implementation

declare function eg:value-intersect (
  $arg1 as xs:anyAtomicType*,
  $arg2 as xs:anyAtomicType*) as xs:anyAtomicType*
{
  fn:distinct-values($arg1[.=$arg2])
};

eg:value-except

This function returns a sequence containing all the distinct items that appear in $arg1 but not in $arg2, in an undefined order.

XSLT implementation

<xsl:function name="eg:value-except" as="xs:anyAtomicType*">
  <xsl:param name="arg1" as="xs:anyAtomicType*"/>
  <xsl:param name="arg2" as="xs:anyAtomicType*"/>
  <xsl:sequence select="fn:distinct-values($arg1[not(.=$arg2)])"/>
</xsl:function>

XQuery implementation

declare function eg:value-except (
  $arg1 as xs:anyAtomicType*,
  $arg2 as xs:anyAtomicType*) as xs:anyAtomicType*
{
  fn:distinct-values($arg1[not(.=$arg2)])
};
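
A short usage sketch for the three value-based set functions follows. It repeats the declarations so that it is self-contained, and it uses fn:count where the order of the result is undefined. The input sequences and the namespace URI bound to the eg prefix are hypothetical.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

declare function eg:value-union($arg1 as xs:anyAtomicType*, $arg2 as xs:anyAtomicType*)
  as xs:anyAtomicType*
{
  fn:distinct-values(($arg1, $arg2))
};

declare function eg:value-intersect($arg1 as xs:anyAtomicType*, $arg2 as xs:anyAtomicType*)
  as xs:anyAtomicType*
{
  fn:distinct-values($arg1[. = $arg2])
};

declare function eg:value-except($arg1 as xs:anyAtomicType*, $arg2 as xs:anyAtomicType*)
  as xs:anyAtomicType*
{
  fn:distinct-values($arg1[not(. = $arg2)])
};

let $a := (1, 2, 2, 3)
let $b := (3, 4)
return (
  fn:count(eg:value-union($a, $b)),   (: 4: the values 1, 2, 3 and 4 in some order :)
  eg:value-intersect($a, $b),         (: 3 :)
  fn:count(eg:value-except($a, $b))   (: 2: the values 1 and 2 in some order :)
)
```

The predicates rely on the general comparison operator =, which is true if the context item equals any item in the other sequence.
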
eg:index-of-node

This function returns a sequence of positive integers giving the positions within the sequence $sequence of nodes that are identical to $srch.

The nodes in the sequence $sequence are compared with $srch under the rules for the is operator. If a node compares identical, then the position of that node in the sequence $sequence is included in the result.

If the value of $sequence is the empty sequence, or if no node in $sequence matches $srch, then the empty sequence is returned.

The index is 1-based, not 0-based.

The result sequence is in ascending numeric order.

XSLT implementation

<xsl:function name="eg:index-of-node" as="xs:integer*">
  <xsl:param name="sequence" as="node()*"/>
  <xsl:param name="srch" as="node()"/>
  <xsl:for-each select="$sequence">
    <xsl:if test=". is $srch">
      <xsl:sequence select="position()"/>
    </xsl:if>
  </xsl:for-each>
</xsl:function>

XQuery implementation

declare function eg:index-of-node (
  $sequence as node()*,
  $srch as node()) as xs:integer*
{
  for $n at $i in $sequence
  where ($n is $srch)
  return $i
};
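
The following sketch shows that the comparison is by node identity, not by value. The declaration is repeated so that the example is self-contained; the sample list element and the namespace URI bound to the eg prefix are hypothetical.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

declare function eg:index-of-node($sequence as node()*, $srch as node())
  as xs:integer*
{
  for $n at $i in $sequence
  where ($n is $srch)
  return $i
};

let $doc := <list><item>a</item><item>b</item></list>
let $a := $doc/item[1]
let $b := $doc/item[2]
return (
  eg:index-of-node(($a, $b, $a), $a),          (: 1, 3 :)
  eg:index-of-node(($a, $a), $b),              (: empty sequence :)
  eg:index-of-node(($a, <item>a</item>), $a)   (: 1: the newly constructed element is a different node :)
)
```
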
eg:string-pad

Returns an xs:string consisting of a given number of copies of an xs:string argument concatenated together.

XSLT implementation

<xsl:function name="eg:string-pad" as="xs:string">
  <xsl:param name="padString" as="xs:string?"/>
  <xsl:param name="padCount" as="xs:integer"/>
  <xsl:sequence select="fn:string-join((for $i in 1 to $padCount return $padString), '')"/>
</xsl:function>

XQuery implementation

declare function eg:string-pad (
  $padString as xs:string?,
  $padCount as xs:integer) as xs:string
{
  fn:string-join((for $i in 1 to $padCount return $padString), "")
};

This returns the zero-length string if $padString is the empty sequence, which is consistent with the general principle that if an xs:string argument is the empty sequence it is treated as if it were the zero-length string.
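
A brief usage sketch, with the declaration repeated so that it is self-contained, shows the ordinary case and the two cases that yield the zero-length string. The namespace URI bound to the eg prefix is hypothetical.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

declare function eg:string-pad($padString as xs:string?, $padCount as xs:integer)
  as xs:string
{
  fn:string-join((for $i in 1 to $padCount return $padString), "")
};

(
  eg:string-pad("ab", 3),   (: "ababab" :)
  eg:string-pad("-", 0),    (: zero-length string, because 1 to 0 is the empty sequence :)
  eg:string-pad((), 5)      (: zero-length string, because $padString is the empty sequence :)
)
```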

eg:distinct-nodes-stable

This function illustrates one possible implementation of a distinct-nodes function. It removes duplicate nodes by identity, preserving the first occurrence of each node.

XPath

$arg[empty(subsequence($arg, 1, position()-1) intersect .)]

XSLT implementation

<xsl:function name="eg:distinct-nodes-stable" as="node()*">
  <xsl:param name="arg" as="node()*"/>
  <xsl:sequence select="$arg[empty(subsequence($arg, 1, position()-1) intersect .)]"/>
</xsl:function>

XQuery implementation

declare function eg:distinct-nodes-stable ($arg as node()*) as node()*
{
  for $a at $apos in $arg
  let $before_a := fn:subsequence($arg, 1, $apos - 1)
  where every $ba in $before_a satisfies not($ba is $a)
  return $a
};
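
The following sketch illustrates what "stable" means here: duplicates are removed, but the surviving nodes keep the order of their first occurrence in the input. The declaration is repeated so that the example is self-contained; the sample element and the namespace URI bound to the eg prefix are hypothetical.

```xquery
declare namespace eg = "http://example.com/eg";  (: hypothetical namespace :)

declare function eg:distinct-nodes-stable($arg as node()*) as node()*
{
  for $a at $apos in $arg
  let $before_a := fn:subsequence($arg, 1, $apos - 1)
  where every $ba in $before_a satisfies not($ba is $a)
  return $a
};

let $doc := <r><x/><y/><z/></r>
let $seq := ($doc/z, $doc/x, $doc/z, $doc/y, $doc/x)
return
  for $n in eg:distinct-nodes-stable($seq)
  return fn:local-name($n)
  (: "z", "x", "y" :)
```

By contrast, an expression such as $seq | () also removes duplicate nodes but returns them in document order, which for this sample data would be x, y, z.
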
Checklist of Implementation-Defined Features

This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.

This list is incomplete in this working draft.

The destination of the trace output is implementation-defined. See .

For xs:integer operations, implementations that support limited-precision integer operations either raise an error or provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See .

For xs:decimal values the number of digits of precision returned by the numeric operators is implementation-defined. See . See also and

If the number of digits in the result of a numeric operation exceeds the number of digits that the implementation supports, the result is truncated or rounded in an implementation-defined manner. See . See also and

It is implementation-defined which version of Unicode is supported by the features defined in this specification, but it is recommended that the most recent version of Unicode be used. See .

For , conforming implementations must support normalization form "NFC" and may support normalization forms "NFD", "NFKC", "NFKD", "FULLY-NORMALIZED". They may also support other normalization forms with implementation-defined semantics.

The ability to decompose strings into collation units suitable for substring matching is an implementation-defined property of a collation. See .

All minimally conforming processors must support year values with a minimum of 4 digits (i.e., YYYY) and a minimum fractional second precision of 1 millisecond or three digits (i.e., s.sss). However, conforming processors may set larger limits on the maximum number of digits they support in these two situations. See .

The result of casting a string to xs:decimal, when the resulting value is not too large or too small but nevertheless has too many decimal digits to be accurately represented, is implementation-defined. See .

Various aspects of the processing provided by are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user.

The manner in which implementations provide options to weaken the characteristic of and is implementation-defined.

Changes since previous Recommendation

Substantive changes

The following changes have been made since the first edition of the Functions and Operators specification for XPath 2.0 and XQuery 1.0 published on 23 January 2007:

Errata E1 through E47 have been applied.

A two-argument version of the fn:round function has been introduced. (Bugzilla 6240)

A single-argument version of the fn:string-join function has been introduced.

Specifications for the functions fn:format-date, fn:format-time, and fn:format-dateTime have been transferred from the XSLT 2.0 specification.

The specification of fn:format-number has been transferred from the XSLT specification.

A function fn:format-integer is introduced.

The function fn:generate-id is introduced, transferred from the XSLT specification.

A range of trigonometric functions is defined (in a new namespace).

New functions fn:parse and fn:serialize are defined.

A new function fn:analyze-string is defined.

The syntax of regular expressions is extended to allow non-capturing groups.

A new flag is introduced for the $flags argument of functions that use regular expressions: the q flag causes all characters in a regular expression to be treated as ordinary characters rather than metacharacters.

Supporting the new language feature of higher-order functions, a number of functions are defined that operate on function items as their arguments.

The description of the fn:error function has been rewritten to allow for the introduction of try/catch facilities into XQuery and XSLT.

The section describing what it means for functions to be contextual and/or stable has been rewritten.
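
A few of the changes listed above can be illustrated with short calls. This is a sketch only: it assumes a processor that implements these draft features, that the second argument of the two-argument fn:round is a precision given as a number of decimal places, and that non-capturing groups use the (?:...) syntax. Details in a given implementation of this Working Draft may differ.

```xquery
(
  fn:round(3.14159, 2),                        (: 3.14, rounding to two decimal places :)
  fn:string-join(("a", "b", "c")),             (: "abc", no separator argument supplied :)
  fn:replace("a.b.c", ".", "-", "q"),          (: "a-b-c", the "." is matched literally :)
  fn:replace("abcabc", "(?:ab)(c)", "[$1]")    (: "[c][c]", only (c) is a capturing group :)
)
```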

Editorial changes

The following editorial changes have been made since the first edition of the Functions and Operators specification for XPath 2.0 and XQuery 1.0 published on 23 January 2007. These are not explicitly marked in the change-highlighted version of the specification:

A quick reference section containing links to the functions has been added before the full table of contents.

The section on constructor functions has been moved so that it is now adjacent to the closely-related section on casting.

The function fn:dateTime has been moved out of the section describing constructor functions, and is no longer described as "a special constructor function". It is now an ordinary function described in the appropriate section along with other functions on dates and times. This allows the term "constructor function" to be associated exclusively with single-argument functions whose name is the same as the type name of the value that they return, and avoids any suggestion that this function has special behavior. Similarly, the functions fn:true and fn:false are no longer described as constructor functions.

Where a function is referred to by name, the reference is now always in the form (for example) fn:base-uri rather than fn:base-uri(). The latter form is used only to indicate a call on the function in which no arguments are supplied.

The specification of each function now consists of a set of standard subsections: Summary, Operator Mapping, Signature, Rules, Error Conditions, Notes, and Examples.

The "Summary" of the effect of each function is now just that: it never contains any information that cannot be found in the more detailed rules, and it does not attempt to list unusual or error conditions. Such rules have been moved into separate paragraphs. Sometimes the language used in the summary is relatively informal. Although the summary remains normative, it must be regarded as being subservient to the rules that follow.

Functions are always called, never invoked.

The specification no longer discusses functions; it now specifies or defines them.

Rules have been rewritten in a more consistent style: "If $arg is X, the function returns Y" (avoiding alternatives such as "Returns Y if $arg is X", and avoiding the passive "is returned").

The section heading for a section that defines a function is now always the name of the function. Some function definitions have been moved into subsections to achieve this.

Statements within the rules of a function that follow inevitably from other rules have in many cases been downgraded to notes. An example is the statement that remove($seq, N) returns an empty sequence if $seq is an empty sequence.

The functions for durations and those for dates/times have been split into separate sections.

The fn:boolean function has been moved from "General Functions and Operators on Sequences" to "Functions on Boolean Values".

In the interests of automating the testing of examples, the convention has been adopted that the result of an example expression is wherever possible given in the form of a simple but legal XPath expression. Specifically a numeric or string literal is used for numbers and strings; the expressions true() and false() for booleans; constructors such as xs:duration('PT0S') for other atomic types; expressions such as (1, 2, 3, 4) for sequences. The expression will always return a value of the correct type; so the xs:double value zero is shown as 0.0e0, not as 0, which is the way the value would be serialized on output. The value NaN is given as xs:double('NaN'). Previously results were sometimes given in this form, sometimes in the form of a serialization of the result value, and sometimes (particularly for dates, times, and durations) in the form of an informal description.

In some cases where one function can be readily specified in terms of another, the opportunity has been taken to simplify the specification. For example, all the operator support functions of the form op:xx-greater-than are now specified by reference to the corresponding op:xx-less-than function with the arguments reversed. This reduces the risk of introducing errors and inconsistencies.

In some cases, the rules for a function have been reordered. For example, the rule describing how an empty sequence is handled now generally comes before any rule that works only if the argument is not an empty sequence.

Some non-normative examples and notes have been added.