W3C

XQuery 1.0 and XPath 2.0 Functions and Operators

W3C Working Draft 16 August 2002

This version:
http://www.w3.org/TR/2002/WD-xquery-operators-20020816/
Latest version:
http://www.w3.org/TR/xquery-operators/
Previous version:
http://www.w3.org/TR/2002/WD-xquery-operators-20020430/
Editors:
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Jim Melton (XML Query WG), Oracle Corp <jim.melton@acm.org>
Jonathan Robie (XML Query WG), Data Direct Technologies <jonathan.robie@datadirect-technolgies.com>
Norman Walsh (XSL WG), Sun Microsystems <Norman.Walsh@Sun.COM>

Abstract

This document defines basic operators and functions on the datatypes defined in [XML Schema Part 2: Datatypes] for use in XQuery, XPath, XSLT and other related XML standards. It also discusses operators and functions on nodes and node sequences as defined in the [XQuery 1.0 and XPath 2.0 Data Model] for use in XQuery, XPath, XSLT and other related XML standards.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a Public Working Draft of this document for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.

This document describes constructor functions, operators, and functions that are used in [XPath 2.0], [XQuery 1.0: An XML Query Language], and [XSLT 2.0]. The document is generally unconcerned with the specific syntax with which these constructor functions, operators, and functions will be used, and focuses instead on defining their semantics as precisely as feasible.

Among the more important changes from the previous version of this document are a new algorithm for selecting the collation to be used when certain character string operations are performed (see section 6.2 Equality and Comparison of Strings), simplifications of the rules for casting (see section 4 Constructor Functions and section 16 Casting Functions), and a more complete specification of functions that implement regular expression capabilities (see sections 6.3.15 xf:matches, 6.3.16 xf:replace, and 6.3.17 xf:tokenize).

The discussion about whether the syntax for constructor functions and casting can be unified continues. This has potentially wide impact and may reopen the closed issue of casting to and from derived types.

This document has been produced following the procedures set out for the W3C Process. This document was produced through the efforts of a joint task force of the W3C XML Query Working Group and the W3C XML Schema Working Group (both part of the W3C XML Activity) and a second joint task force of the W3C XML Query Working Group and the W3C XSL Working Group (part of the W3C Style Activity). It is designed to be read in conjunction with the following documents: [XQuery 1.0 and XPath 2.0 Data Model], [XPath 2.0], [XQuery 1.0: An XML Query Language] and [XSLT 2.0].

The following are identified as high priority issues. Reviewers are requested to provide feedback on these issues using the address below.

[Issue 145: Need decisions and text in several of our documents detailing conformance requirements based on resource limitations.]

Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page and the XSL Working Group's patent disclosure page.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Table of Contents

1 Introduction
    1.1 Terminology
    1.2 Datatypes
    1.3 Syntax
    1.4 Notations
    1.5 Namespace Prefix
2 Accessors
    2.1 xf:node-kind
    2.2 xf:node-name
    2.3 xf:string
    2.4 xf:data
    2.5 xf:base-uri
    2.6 xf:unique-ID
3 The xf:error Function
    3.1 Examples
4 Constructor Functions
5 Functions and Operators on Numbers
    5.1 Numeric Types
    5.2 Operators on Numeric Values
        5.2.1 op:numeric-add
        5.2.2 op:numeric-subtract
        5.2.3 op:numeric-multiply
        5.2.4 op:numeric-divide
        5.2.5 op:numeric-integer-divide
        5.2.6 op:numeric-mod
        5.2.7 op:numeric-unary-plus
        5.2.8 op:numeric-unary-minus
    5.3 Comparisons of Numeric Values
        5.3.1 op:numeric-equal
        5.3.2 op:numeric-less-than
        5.3.3 op:numeric-greater-than
    5.4 Functions on Numeric Values
        5.4.1 xf:floor
        5.4.2 xf:ceiling
        5.4.3 xf:round
6 Functions on Strings
    6.1 String Types
    6.2 Equality and Comparison of Strings
        6.2.1 xf:compare
    6.3 Functions on String Values
        6.3.1 xf:concat
        6.3.2 xf:starts-with
        6.3.3 xf:ends-with
        6.3.4 xf:contains
        6.3.5 xf:substring
        6.3.6 xf:string-length
        6.3.7 xf:substring-before
        6.3.8 xf:substring-after
        6.3.9 xf:normalize-space
        6.3.10 xf:normalize-unicode
        6.3.11 xf:upper-case
        6.3.12 xf:lower-case
        6.3.13 xf:translate
        6.3.14 xf:string-pad
        6.3.15 xf:matches
        6.3.16 xf:replace
        6.3.17 xf:tokenize
        6.3.18 xf:escape-uri
7 Functions and Operators on Booleans
    7.1 Boolean Constructor Functions
        7.1.1 xf:true
        7.1.2 xf:false
    7.2 Operators on Boolean Values
        7.2.1 op:boolean-equal
        7.2.2 op:boolean-less-than
        7.2.3 op:boolean-greater-than
    7.3 Functions on Boolean Values
        7.3.1 xf:not
8 Functions and Operators on Durations, Dates, and Times
    8.1 Duration, Date, and Time Types
    8.2 Two Totally Ordered Subtypes of Duration
        8.2.1 yearMonthDuration
        8.2.2 dayTimeDuration
    8.3 Comparisons of Duration, Date and Time Values
        8.3.1 op:duration-equal
        8.3.2 op:yearMonthDuration-equal
        8.3.3 op:yearMonthDuration-less-than
        8.3.4 op:yearMonthDuration-greater-than
        8.3.5 op:dayTimeDuration-equal
        8.3.6 op:dayTimeDuration-less-than
        8.3.7 op:dayTimeDuration-greater-than
        8.3.8 op:dateTime-equal
        8.3.9 op:dateTime-less-than
        8.3.10 op:dateTime-greater-than
        8.3.11 op:date-equal
        8.3.12 op:date-less-than
        8.3.13 op:date-greater-than
        8.3.14 op:time-equal
        8.3.15 op:time-less-than
        8.3.16 op:time-greater-than
        8.3.17 op:gYearMonth-equal
        8.3.18 op:gYear-equal
        8.3.19 op:gMonthDay-equal
        8.3.20 op:gMonth-equal
        8.3.21 op:gDay-equal
    8.4 Component Extraction Functions on Duration, Date and Time Values
        8.4.1 xf:get-years-from-yearMonthDuration
        8.4.2 xf:get-months-from-yearMonthDuration
        8.4.3 xf:get-days-from-dayTimeDuration
        8.4.4 xf:get-hours-from-dayTimeDuration
        8.4.5 xf:get-minutes-from-dayTimeDuration
        8.4.6 xf:get-seconds-from-dayTimeDuration
        8.4.7 xf:get-year-from-dateTime
        8.4.8 xf:get-month-from-dateTime
        8.4.9 xf:get-day-from-dateTime
        8.4.10 xf:get-hours-from-dateTime
        8.4.11 xf:get-minutes-from-dateTime
        8.4.12 xf:get-seconds-from-dateTime
        8.4.13 xf:get-timezone-from-dateTime
        8.4.14 xf:get-year-from-date
        8.4.15 xf:get-month-from-date
        8.4.16 xf:get-day-from-date
        8.4.17 xf:get-timezone-from-date
        8.4.18 xf:get-hours-from-time
        8.4.19 xf:get-minutes-from-time
        8.4.20 xf:get-seconds-from-time
        8.4.21 xf:get-timezone-from-time
    8.5 Arithmetic Functions on yearMonthDuration and dayTimeDuration
        8.5.1 op:add-yearMonthDurations
        8.5.2 op:subtract-yearMonthDurations
        8.5.3 op:multiply-yearMonthDuration
        8.5.4 op:divide-yearMonthDuration
        8.5.5 op:add-dayTimeDurations
        8.5.6 op:subtract-dayTimeDurations
        8.5.7 op:multiply-dayTimeDuration
        8.5.8 op:divide-dayTimeDuration
    8.6 Timezone Functions on dateTime, date, and time
        8.6.1 xf:add-timezone-to-dateTime
        8.6.2 xf:remove-timezone-from-dateTime
        8.6.3 xf:add-timezone-to-date
        8.6.4 xf:add-timezone-to-time
        8.6.5 xf:remove-timezone-from-time
    8.7 Functions and Operators on TimePeriod Values
        8.7.1 xf:get-yearMonthDuration-from-dateTimes
        8.7.2 xf:get-dayTimeDuration-from-dateTimes
        8.7.3 op:subtract-dates
        8.7.4 op:subtract-times
        8.7.5 op:add-yearMonthDuration-to-dateTime
        8.7.6 op:add-dayTimeDuration-to-dateTime
        8.7.7 op:subtract-yearMonthDuration-from-dateTime
        8.7.8 op:subtract-dayTimeDuration-from-dateTime
        8.7.9 op:add-yearMonthDuration-to-date
        8.7.10 op:add-dayTimeDuration-to-date
        8.7.11 op:subtract-yearMonthDuration-from-date
        8.7.12 op:subtract-dayTimeDuration-from-date
        8.7.13 op:add-dayTimeDuration-to-time
        8.7.14 op:subtract-dayTimeDuration-from-time
9 Functions on QNames
    9.1 Constructor Functions for QNames
        9.1.1 xf:expanded-QName
    9.2 Functions on QNames
        9.2.1 op:QName-equal
        9.2.2 xf:get-local-name-from-QName
        9.2.3 xf:get-namespace-from-QName
10 Functions and Operators for anyURI
    10.1 Constructor Functions for anyURI
        10.1.1 xf:resolve-uri
    10.2 Functions on anyURI
        10.2.1 op:anyURI-equal
11 Functions and Operators on base64Binary and hexBinary
    11.1 Comparisons of base64Binary and hexBinary Values
        11.1.1 op:hex-binary-equal
        11.1.2 op:base64-binary-equal
12 Functions and Operators on NOTATION
    12.1 Functions on NOTATION
        12.1.1 op:NOTATION-equal
13 Functions and Operators on Nodes
    13.1 Functions and Operators on Nodes
        13.1.1 xf:name
        13.1.2 xf:local-name
        13.1.3 xf:namespace-uri
        13.1.4 xf:number
        13.1.5 xf:lang
        13.1.6 op:node-equal
        13.1.7 xf:deep-equal
        13.1.8 op:node-before
        13.1.9 op:node-after
        13.1.10 xf:copy
        13.1.11 xf:root
    13.2 xf:if-absent() and xf:if-empty()
        13.2.1 xf:if-absent
        13.2.2 xf:if-empty
14 Functions and Operators on Sequences
    14.1 Constructor Functions on Sequences
        14.1.1 op:to
    14.2 Functions and Operators on Sequences
        14.2.1 xf:boolean
        14.2.2 op:concatenate
        14.2.3 xf:item-at
        14.2.4 xf:index-of
        14.2.5 xf:empty
        14.2.6 xf:exists
        14.2.7 xf:distinct-nodes
        14.2.8 xf:distinct-values
        14.2.9 xf:insert
        14.2.10 xf:remove
        14.2.11 xf:subsequence
    14.3 Equals, Union, Intersection and Except
        14.3.1 xf:sequence-deep-equal
        14.3.2 xf:sequence-node-equal
        14.3.3 op:union
        14.3.4 op:intersect
        14.3.5 op:except
    14.4 Aggregate Functions
        14.4.1 xf:count
        14.4.2 xf:avg
        14.4.3 xf:max
        14.4.4 xf:min
        14.4.5 xf:sum
    14.5 Functions that Generate Sequences
        14.5.1 xf:id
        14.5.2 xf:idref
        14.5.3 xf:document
        14.5.4 xf:collection
        14.5.5 xf:input
15 Context Functions
    15.1 xf:context-item
    15.2 xf:position
    15.3 xf:last
    15.4 op:context-document
    15.5 xf:current-dateTime
        15.5.1 Examples
    15.6 xf:current-date
        15.6.1 Examples
    15.7 xf:current-time
        15.7.1 Examples
16 Casting Functions
    16.1 Casting from primitive types to primitive types
    16.2 Casting from derived types to primitive types
    16.3 Casting to derived types
    16.4 Casting from strings
    16.5 Casting within a branch of the type hierarchy
    16.6 Casting to string
    16.7 Casting to numeric types
    16.8 Casting to duration and date and time types
    16.9 Casting to boolean
    16.10 Casting to base64Binary and hexBinary
    16.11 Casting to anyURI and NOTATION

Appendices

A References
    A.1 Normative
    A.2 Non-normative
B Compatibility with XPath 1.0 (Non-Normative)
C Functions and Operators Issues List (Non-Normative)
D ChangeLog since Last Public Version on 2002-04-30 (Non-Normative)
E Function and Operator Quick Reference (Non-Normative)
    E.1 Functions and Operators by Section
    E.2 Functions and Operators Alphabetically


1 Introduction

[XML Schema Part 2: Datatypes] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines operations on those datatypes for use in XQuery, XPath, XSLT and related XML standards. This document also discusses operators and functions on nodes and node sequences as defined in the [XQuery 1.0 and XPath 2.0 Data Model] for use in XQuery, XPath, XSLT and other related XML standards.

1.1 Terminology

The terminology used to describe the functions and operators on [XML Schema Part 2: Datatypes] is defined in the body of this specification. The terms defined in the following list are used in building those definitions:

[Definition] for compatibility

A feature of this specification included to ensure that implementations that use this feature remain compatible with [XPath 1.0].

[Definition] may

Conforming documents and processors are permitted to but need not behave as described.

[Definition] must

Conforming documents and processors are required to behave as described; otherwise, they are non-conformant or in error.

[Definition] implementation defined

Possibly differing between implementations, but specified by the implementor for each particular implementation.

[Definition] implementation dependent

Possibly differing between implementations, but not specified by this or any other W3C specification, and not required to be specified by the implementor for any particular implementation.

1.2 Datatypes

The diagram below shows the built-in [XML Schema Part 2: Datatypes]. Solid lines connect a base datatype above to a derived datatype below. Dashed lines connect an item type above to a datatype below that is created as a list of that item type.

Type hierarchy graphic

Diagram courtesy Asir Vedamuthu, webMethods

1.3 Syntax

The purpose of this document is to catalog the functions and operators required for XPath 2.0, XML Query 1.0, and XSLT 2.0. The exact syntax used to invoke these functions and operators is specified in [XPath 2.0], [XQuery 1.0: An XML Query Language], and [XSLT 2.0].

In general, the above specifications do not support function overloading. Consequently, there are no overloaded functions in this document except for legacy [XPath 1.0] functions such as string(), which takes a single argument of a variety of types, and concat(), which takes a variable number of string arguments. This restriction does not apply to operators such as "+", which may be overloaded. Functions with optional arguments are allowed. If optional arguments are omitted, omissions are assumed to begin from the right.

1.4 Notations

This document defines, among other things, constructor functions and other functions that apply to one or more data types. Each function is defined by specifying its signature, a description of each of its arguments, and its semantics. For many functions, examples are included to illustrate their use.

Each function's signature is presented in a form like this:

xf:function-name(parameter-type $parameter-name, ...) => return-type

In this notation, function-name is the name of the function whose signature is being specified. If the function takes no parameters, then the name is followed by an empty set of parentheses: (); otherwise, the name is followed by a parenthesized list of parameter declarations, each declaration specifying the static type of the parameter and a non-normative name used to reference the parameter when the function's semantics are specified. If there are two or more parameter declarations, they are separated by a comma. The return-type specifies the static type of the value returned by the function.

The function name is a QName as defined in [XML 1.0 Recommendation (Second Edition)] and must adhere to its syntactic conventions. Following [XPath 1.0], function names are composed of English words separated by hyphens, "-". If a function name contains an [XML Schema Part 2: Datatypes] datatype name, this may have intercapitalized spelling and is used in the function name as such. For example, xf:get-timezone-from-dateTime.

As is customary, the parameter type name indicates that the function accepts arguments of that type, or types derived from it, in that position.

Some functions accept the empty sequence as an argument and some may return the empty sequence. This is indicated in the function signature by following the parameter type name with a question mark:

xf:function-name(parameter-type? $parameter-name) => return-type?

[Issue 133: Syntax for indicating that function accepts empty sequence is incorrect]

1.5 Namespace Prefix

The functions and operators discussed in this document are contained in two namespaces (see [Namespaces in XML]) and referenced using a QName. The namespace prefixes used in this document, purely for illustrative purposes, are xf: for the user functions and op: for the operator functions. The namespace prefix for these functions can vary, as long as the prefix is bound to the correct URI.

The actual namespaces (that is, the URIs of the namespaces) are:

  • http://www.w3.org/2002/08/xquery-operators for operators

  • http://www.w3.org/2002/08/xquery-functions for functions.

The functions defined with an xf: prefix are callable by the user. Functions defined with the op: prefix are described here to underpin the definitions of the operators in [XPath 2.0], [XQuery 1.0: An XML Query Language], and [XSLT 2.0]. These functions are not available directly to users, and implementations are not required to provide them. For example, multiplication is generally associated with the * operator, but it is described as a function in this document:

op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric

2 Accessors

The [XQuery 1.0 and XPath 2.0 Data Model] describes accessors on different types of nodes and defines their semantics. Some of these accessors are exposed to the user through the functions described below.

Function       Accessor       Accepts                    Returns
xf:node-kind   node-kind      any kind of node           string
xf:node-name   name           any kind of node           zero or one QName
xf:string      string-value   item                       string
xf:data        typed-value    any kind of node           a sequence of atomic values
xf:base-uri    base-uri       Element or Document node   zero or one anyURI
xf:unique-ID   unique-ID      Element node               zero or one ID

2.1 xf:node-kind

xf:node-kind(node $srcval) => string

This function returns a string value representing the node's kind: either "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
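
As a rough illustration, the mapping from a node to its kind string can be sketched in Python with the standard library's xml.dom.minidom. This is only an approximation under stated assumptions: the DOM is not the [XQuery 1.0 and XPath 2.0 Data Model], it has no namespace nodes, and the helper name node_kind is hypothetical.

```python
from xml.dom import Node, minidom

# DOM node types mapped to the kind strings listed above.
# The DOM has no namespace nodes, so "namespace" cannot occur here.
_KIND_BY_DOM_TYPE = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
}


def node_kind(node):
    """Return the kind string for a DOM node (hypothetical helper)."""
    return _KIND_BY_DOM_TYPE[node.nodeType]


doc = minidom.parseString('<a id="1">hi<!--note--></a>')
root = doc.documentElement

print(node_kind(doc))                         # document
print(node_kind(root))                        # element
print(node_kind(root.getAttributeNode("id")))  # attribute
print(node_kind(root.firstChild))             # text
print(node_kind(root.lastChild))              # comment
```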

2.2 xf:node-name

xf:node-name(node $srcval) => QName?

This function returns an expanded QName for node kinds that can have names. For other node kinds, it returns the empty sequence. Expanded QName is defined in [XQuery 1.0 and XPath 2.0 Data Model], and consists of a namespace URI and a local name.

2.3 xf:string

xf:string() => string
xf:string(item $srcval) => string

Returns the value of $srcval represented as a string. If no argument is supplied, $srcval defaults to the context item (.).

If $srcval is the empty sequence, the zero-length string is returned.

If $srcval is a node, the function returns the string value of the node, as obtained using the string-value accessor defined in the [XQuery 1.0 and XPath 2.0 Data Model].

If $srcval is an atomic value, then the function returns the same string as is returned by the expression cast as xs:string ($srcval), except in the cases listed below:

  • Editorial note  
    The "special rule" for xf:string, in which decimal values without a fractional component are converted to xs:string without a trailing decimal point, has been eliminated. This has not yet been considered by the Working Groups, but is felt to be appropriate for inclusion in this edition of this document.
  • If the type of $srcval is xs:anyURI, the URI is converted to a string without any escaping of special characters.

NOTE: The reason for the special rule for xs:anyURI is that, although XML Schema strongly discourages the use of spaces within URI values, the escaping of spaces can cause problems with legacy applications (for example, this applies to spaces within fragment identifiers in many HTML browsers), and should therefore be under user control.

NOTE: The string representation of double values is not backwards-compatible with the representation of number values in [XPath 1.0]. Ordinary double values are now represented using scientific notation; the representations of positive and negative infinity are now 'INF' and '-INF' rather than 'Infinity' and '-Infinity'. (It should be observed that '+INF' is not supported as a lexical form of infinity in [XML Schema Part 2: Datatypes] and is thus not supported by this specification; if that lexical form is added in a future version of [XML Schema Part 2: Datatypes], then it will be supported by a future version of this specification that aligns with that future version of [XML Schema Part 2: Datatypes].) However, most expressions that would have produced a number in [XPath 1.0] will produce a decimal (or integer) in [XPath 2.0], so unless there is a loss of precision caused by numeric approximation, the result of the expression will in most simple cases be the same after conversion to a string.

[Issue 160: Align the string() function with 'cast as string'.]

2.4 xf:data

xf:data(node $srcval) => atomic value*

If $srcval is a text node, an element node, or an attribute node, xf:data returns the typed value of $srcval, as defined by the accessor function dm:typed-value defined for that kind of node in [XQuery 1.0 and XPath 2.0 Data Model].

Specifically:

If $srcval is a text node, then its typed value is equal to its string value, as an instance of xs:anySimpleType.

If $srcval is an attribute node with type annotation xs:anySimpleType, then its typed value is equal to its string value, as an instance of xs:anySimpleType. The typed value of any other attribute node is derived from its string value and type annotation in a way that is consistent with schema validation, as described in [XQuery 1.0 and XPath 2.0 Data Model].

If $srcval is an element node with type annotation xs:anySimpleType, then its typed value is equal to its string value, as an instance of xs:anySimpleType. The typed value of an element node with a type annotation other than xs:anySimpleType is derived from its string value and type annotation in a way that is consistent with schema validation, as described in [XQuery 1.0 and XPath 2.0 Data Model].

If $srcval is not a text node, an attribute node, or an element node, then xf:data causes a static type error.

2.5 xf:base-uri

xf:base-uri(node $srcval) => anyURI?

For document and element nodes this function returns the value of the base-uri property. For other kinds of node it returns the empty sequence.

2.6 xf:unique-ID

xf:unique-ID(node $srcval) => ID?

This function accepts an element node and returns the identifier (ID) which may have been assigned by the user. It corresponds to the normalized value property of the attribute information item in the attributes property that has a type ID, if one exists. If no ID attribute exists the empty sequence is returned.

3 The xf:error Function

In this document, as well as in [XQuery 1.0: An XML Query Language], [XPath 2.0], and [XQuery 1.0 and XPath 2.0 Formal Semantics], the phrase "an error is raised" is used whenever the semantics being described encounter an error other than a static type error. The occurrence of that phrase implicitly causes the invocation of the xf:error function defined in this section. Whenever the raising of an error is accompanied by a specific error, the phrase "an error is raised (name-of-error)" is used, and the value name-of-error is passed as an argument to the xf:error function invocation. The xf:error function may also be invoked from XQuery and XPath 2.0 applications.

xf:error()
xf:error(item $srcval)

The xf:error function accepts any item (e.g., an atomic value or an element) as an argument, and may be invoked without any argument. The xf:error function never returns a value.
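
The "never returns a value" behavior can be pictured as a function that always raises. The Python sketch below is one possible reading only; the exact semantics are still open (see Issue 181), and the names XQueryError and xf_error are hypothetical.

```python
from typing import NoReturn


class XQueryError(Exception):
    """Hypothetical exception type carrying the item passed to xf:error."""

    def __init__(self, item=None):
        super().__init__(item)
        self.item = item


def xf_error(item=None) -> NoReturn:
    """Model of xf:error: accepts an optional item and never returns."""
    raise XQueryError(item)


try:
    xf_error("Invalid argument")
except XQueryError as exc:
    caught = exc.item

print(caught)  # Invalid argument
```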

[Issue 181: What are the semantics of xf:error?]

[Issue 182: Every condition that raises an error should specify what error is raised]

3.1 Examples

  • xf:error()

  • xf:error("Invalid argument")

  • xf:error(<a>Really <emph>dumb</emph> decision!</a>)

4 Constructor Functions

Every built-in type that is defined in [XML Schema Part 2: Datatypes], as well as each of the two derived types xf:yearMonthDuration and xf:dayTimeDuration defined in this specification, has an associated constructor function. The form of that function for a type TYP is:

xs:TYP(item $srcval) => TYP

For example, the signature of the constructor function corresponding to the unsignedInt type is:

xs:unsignedInt(item $srcval) => unsignedInt

An invocation of that constructor function such as xs:unsignedInt(12) returns the unsignedInt value 12. Another invocation of that constructor function that returns the same unsignedInt value is xs:unsignedInt("12").

The semantics of the constructor function xs:TYP(item) are identical to the semantics of cast as xs:TYP (item).

Where the argument to a constructor function is a literal, the literal must be a valid lexical form of its type, as specified in [XML Schema Part 2: Datatypes].

Where the argument to a constructor function is a literal, the result of the function may be evaluated statically; if an error is found during such evaluation, it may be reported as a static error.
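
To make the xs:unsignedInt example concrete, the following Python sketch models such a constructor. It rests on simplifying assumptions: only Python int and str arguments are handled, the accepted lexical form is limited to a run of decimal digits with surrounding whitespace stripped, and the function name xs_unsigned_int is hypothetical. The authoritative lexical rules are those of [XML Schema Part 2: Datatypes].

```python
import re

# Largest value of the 32-bit unsigned integer type.
_UNSIGNED_INT_MAX = 4294967295


def xs_unsigned_int(srcval):
    """Hypothetical model of the xs:unsignedInt constructor function."""
    if isinstance(srcval, bool):
        raise TypeError("boolean arguments are not handled in this sketch")
    if isinstance(srcval, int):
        value = srcval
    elif isinstance(srcval, str):
        text = srcval.strip()
        # Assumption: digits only; the real lexical space may be wider.
        if not re.fullmatch(r"[0-9]+", text):
            raise ValueError("not a valid lexical form: %r" % srcval)
        value = int(text)
    else:
        raise TypeError("unsupported argument type in this sketch")
    if not 0 <= value <= _UNSIGNED_INT_MAX:
        raise ValueError("value out of range for unsignedInt")
    return value


print(xs_unsigned_int(12))    # 12
print(xs_unsigned_int("12"))  # 12
```

Both calls yield the same value, mirroring the statement that xs:unsignedInt(12) and xs:unsignedInt("12") return the same unsignedInt.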

5 Functions and Operators on Numbers

This section discusses arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes]. It uses an approach that permits lightweight operations whenever possible.

5.1 Numeric Types

The operators described in this section are defined on the following numeric types. Each type whose name is indented is derived from the type whose name appears nearest above with one less level of indent.

decimal
    integer
float
double

They also apply to user-defined types derived by restriction from these types.

5.2 Operators on Numeric Values

The following functions are defined to back up operators defined in [XQuery 1.0: An XML Query Language] and [XPath 2.0] on these numeric types.

Function                    Meaning                  Source
op:numeric-add              Addition                 XPath 1.0
op:numeric-subtract         Subtraction              XPath 1.0
op:numeric-multiply         Multiplication           XPath 1.0
op:numeric-divide           Division                 XPath 1.0
op:numeric-integer-divide   Integer division         XPath 1.0
op:numeric-mod              Modulus                  XPath 1.0
op:numeric-unary-plus       Unary plus               XPath 2.0 Req 1.7 Should
op:numeric-unary-minus      Unary minus (negation)   XPath 1.0

The arguments and return types for the arithmetic operators are the basic numeric types: integer, decimal, float, and double, and types derived from them. For simplicity, each operator is defined to operate on operands of the same datatype and to return the same datatype. (The one exception is op:numeric-divide, which returns a double if called with two integer operands.) If the two operands are not of the same datatype, one operand is promoted to be the type of the other operand.

The type promotion scheme includes only two rules:

  1. A derived type may be promoted to its base type. In particular, integer may be promoted to decimal.

  2. decimal may be promoted to float, and float may be promoted to double.

The result type of operations depends on their argument datatypes and is defined in the following table:

Operator                         Returns
op:operation(integer, integer)   integer (except for op:numeric-divide(integer, integer), which returns a double)
op:operation(decimal, decimal)   decimal
op:operation(float, float)       float
op:operation(double, double)     double
op:operation(integer)            integer
op:operation(decimal)            decimal
op:operation(float)              float
op:operation(double)             double

These rules define any operation on any pair of arithmetic types. Consider the following example:

op:operation(int, double) => op:operation(double, double)

For this operation, int must be converted to double. This can be done, since by the rules above: int can be promoted to integer, integer can be promoted to decimal, decimal can be promoted to float, and float can be promoted to double. As far as possible, the promotions should be done in a single step. Specifically, when a decimal is promoted to a double, it must not be converted to a float and then to double as this will lose precision.
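
The promotion order, and the reason a decimal should go to double in a single step, can be sketched in Python. This is an illustration under assumptions: the four basic types are modeled as plain strings, single precision is emulated by round-tripping through the struct module, and the helper names are hypothetical.

```python
import struct
from decimal import Decimal

# Promotion order of the four basic numeric types, lowest to highest.
_RANK = {"integer": 0, "decimal": 1, "float": 2, "double": 3}


def promoted_type(type1, type2):
    """Return the type both operands are promoted to (hypothetical helper)."""
    return type1 if _RANK[type1] >= _RANK[type2] else type2


def round_to_single(value):
    """Emulate a single-precision float by packing into 4 bytes and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


print(promoted_type("integer", "double"))  # double

# Promoting the decimal 0.1 directly to double ...
direct = float(Decimal("0.1"))
# ... versus squeezing it through single precision first.
two_step = round_to_single(float(Decimal("0.1")))

print(direct == two_step)  # False
```

The two results differ because the intermediate single-precision value has already discarded bits that the double could have kept.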

As another example, a user may define height as a derived type of integer with a minimum value of 20 and a maximum value of 100. The user may then derive oddHeight using a pattern to restrict the value to odd integers.

op:operation(oddHeight, integer) => op:operation(integer, integer)

oddHeight is first promoted to its base type height. height is promoted to its base type integer.
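
The walk from a user-defined type up to a basic numeric type can be pictured as following base-type links. In the Python sketch below, the table of base types and the function name promotion_chain are hypothetical and cover only the height and oddHeight example.

```python
# Hypothetical table of user-defined types and their base types.
_BASE_TYPE = {"oddHeight": "height", "height": "integer"}

# The four basic numeric types on which the operators are defined.
_BASIC_NUMERIC = {"integer", "decimal", "float", "double"}


def promotion_chain(type_name):
    """List the types visited while promoting up to a basic numeric type."""
    chain = [type_name]
    while chain[-1] not in _BASIC_NUMERIC:
        chain.append(_BASE_TYPE[chain[-1]])
    return chain


print(promotion_chain("oddHeight"))  # ['oddHeight', 'height', 'integer']
```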

[Issue 177: Must overflow and underflow always be reported?]

Overflow and underflow behavior is implementation defined. See [ISO 10967]. That is, implementations may determine that, when overflow or underflow is detected in any of the above operations, an error is raised ("overflow or underflow error"). However, implementations are not required to catch or report such errors.

Finally, consider some examples involving special IEEE 754 numerics.

  1. If either argument is "NaN", the result is "NaN".

  2. If neither argument is "NaN", but either argument is "INF", the result is "INF".

  3. If neither argument is "NaN" or "INF", but either argument is "-INF", the result is "-INF".

Note: In the case of multiplication and division, "INF" may become "-INF", and vice versa, as appropriate.
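
Python floats are IEEE 754 doubles, so the special values can be explored directly. The short sketch below is only an illustration of that arithmetic, not a statement of what this specification requires; the variable names are arbitrary.

```python
import math

nan_sum = math.nan + 1.0        # NaN propagates through arithmetic
inf_sum = math.inf + 1.0        # INF plus a finite value stays INF
neg_inf_sum = -math.inf + 1.0   # likewise for -INF
flipped = math.inf * -2.0       # multiplication can flip INF to -INF
inf_minus_inf = math.inf - math.inf  # IEEE 754 arithmetic gives NaN here

print(nan_sum)        # nan
print(inf_sum)        # inf
print(neg_inf_sum)    # -inf
print(flipped)        # -inf
print(inf_minus_inf)  # nan
```

The last line shows a case the simplified rules above do not single out: in IEEE 754 arithmetic, subtracting INF from INF yields NaN even though neither operand is NaN.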

The functions op:numeric-add, op:numeric-subtract, op:numeric-multiply, op:numeric-divide, op:numeric-integer-divide, and op:numeric-mod are each defined for pairs of numeric operands, each of which has the same type: integer, decimal, float, or double. The functions op:numeric-unary-plus and op:numeric-unary-minus are defined for a single operand whose type is one of those same numeric types.

5.2.1 op:numeric-add

op:numeric-add(numeric $operand1, numeric $operand2) => numeric

Backs up the "+" operator and returns the arithmetic sum of its operands: ($operand1 + $operand2).

5.2.2 op:numeric-subtract

op:numeric-subtract(numeric $operand1, numeric $operand2) => numeric

Backs up the "-" operator and returns the arithmetic difference of its operands: ($operand1 - $operand2).

5.2.3 op:numeric-multiply

op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric

Backs up the "*" operator and returns the arithmetic product of its operands: ($operand1 * $operand2).

5.2.4 op:numeric-divide

op:numeric-divide(numeric $operand1, numeric $operand2) => numeric

Backs up the "div" operator and returns the arithmetic quotient of its operands: ($operand1 div $operand2).

Note:

For compatibility with [XPath 1.0], if the types of both $operand1 and $operand2 are xs:integer, then the return type is xs:double.

For xs:decimal and xs:integer operands, if the divisor is 0, then an error is raised ("Division by zero"). For xs:float and xs:double operands, performs floating point division as specified in [IEEE 754-1985].
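
The three cases above can be sketched in Python under a modeling assumption: int stands in for xs:integer, decimal.Decimal for xs:decimal, and float for xs:double, with both operands already promoted to the same type. The function name numeric_divide is hypothetical. Python raises an exception on float division by zero, so the IEEE 754 results are produced explicitly.

```python
import math
from decimal import Decimal


def numeric_divide(a, b):
    """Hypothetical model of op:numeric-divide on already-promoted operands."""
    if isinstance(a, int) and isinstance(b, int):
        if b == 0:
            raise ZeroDivisionError("Division by zero")
        return a / b  # two integers yield a double (Python float)
    if isinstance(a, Decimal) and isinstance(b, Decimal):
        if b == 0:
            raise ZeroDivisionError("Division by zero")
        return a / b
    # Floating point: follow IEEE 754 instead of raising.
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b


print(numeric_divide(7, 2))       # 3.5
print(numeric_divide(1.0, 0.0))   # inf
print(numeric_divide(-1.0, 0.0))  # -inf
```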

5.2.5 op:numeric-integer-divide

op:numeric-integer-divide(integer $operand1, integer $operand2) => integer

Backs up the "idiv" operator and returns the arithmetic quotient of its operands: ($operand1 idiv $operand2). If the dividend is not evenly divided by the divisor, then the quotient is the integer value obtained, ignoring any remainder that results from the division (that is, no rounding is performed).

If the divisor is 0, then an error is raised ("Division by zero").
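
As a rough illustration, the following Python sketch models this behavior on the assumption that "ignoring any remainder" means truncation toward zero. The function name is hypothetical, and ZeroDivisionError stands in for the "Division by zero" error. Python's own // operator rounds toward negative infinity, so it is applied to the magnitudes only.

```python
def numeric_integer_divide(operand1: int, operand2: int) -> int:
    """Illustrative model of op:numeric-integer-divide (truncation assumed)."""
    if operand2 == 0:
        raise ZeroDivisionError("Division by zero")
    # Divide the magnitudes, then restore the sign of the quotient.
    magnitude = abs(operand1) // abs(operand2)
    same_sign = (operand1 < 0) == (operand2 < 0)
    return magnitude if same_sign else -magnitude
```

With this reading, 7 idiv 2 is 3 and -7 idiv 2 is -3.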

5.2.6 op:numeric-mod

op:numeric-mod(numeric $operand1, numeric $operand2) => numeric

Backs up the "mod" operator and returns the remainder after dividing the first operand by the second operand: ($operand1 mod $operand2). The result is of the same type as the operands after type promotion. The following rules apply:

  • For xs:decimal and xs:integer operands, if the divisor is 0, then an error is raised ("Division by zero").

  • For xs:float and xs:double operands:

    • If either operand is NaN, the result is NaN.

    • If the dividend is positive or negative infinity, or the divisor is positive or negative zero (0), or both, the result is NaN.

    • If not NaN, the sign of the result equals the sign of the dividend.

    • If the dividend is finite and the divisor is an infinity, the result equals the dividend.

    • If the dividend is positive or negative zero and the divisor is finite, the result is the same as the dividend.

    • In the remaining cases, where neither positive or negative infinity, nor positive or negative zero, nor NaN is involved, the float or double remainder r from a dividend n and a divisor d is defined by the mathematical relation r = n-(d * q) where q is an integer that is negative only if n/d is negative and positive only if n/d is positive, and whose magnitude is as large as possible without exceeding the magnitude of the true mathematical quotient of n and d. This is truncating division, analogous to integer division, not [IEEE 754-1985] rounding division.

5.2.6.1 Examples
  • op:numeric-mod(10,3) returns 1.

  • op:numeric-mod(6,2) returns 0.

  • op:numeric-mod(4.5,1.2) returns 0.9.

  • op:numeric-mod(1.23E2, 0.6E1) returns 3.0E0.
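
The xs:float and xs:double rules in 5.2.6 can be sketched in Python as follows. This is an illustration only: numeric_mod is a hypothetical name, Python floats stand in for xs:double, and math.fmod is used for the general case because it computes a truncating remainder that takes the sign of the dividend.

```python
import math


def numeric_mod(n: float, d: float) -> float:
    """Illustrative model of op:numeric-mod for xs:double operands."""
    if math.isnan(n) or math.isnan(d):
        return math.nan          # either operand is NaN
    if math.isinf(n) or d == 0.0:
        return math.nan          # infinite dividend or zero divisor
    if math.isinf(d):
        return n                 # finite dividend, infinite divisor
    # Truncating remainder; a zero dividend is returned with its sign.
    return math.fmod(n, d)
```

Because the operands are binary floating-point values, a result such as numeric_mod(4.5, 1.2) is only approximately equal to the decimal value 0.9.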

5.2.7 op:numeric-unary-plus

op:numeric-unary-plus(numeric $operand) => numeric

Backs up the unary "+" operator and returns its operand with the sign unchanged: (+ $operand). Semantically, this function has no effect on the value of its operand.

5.2.8 op:numeric-unary-minus

op:numeric-unary-minus(numeric $operand) => numeric

Backs up the unary "-" operator and returns its operand with the sign reversed: (- $operand). If $operand is positive, its negative is returned; if it is negative, its positive is returned.

5.3 Comparisons of Numeric Values

We define the following comparison operators on numeric values. Comparisons take two arguments of the same type. If the arguments are of different types, one argument is promoted to the type of the other. Each comparison operator returns a boolean value. If either, or both, operands are "NaN", false is returned.

Operator                 Meaning                  Source
op:numeric-equal         Equality comparison      XPath 1.0
op:numeric-less-than     Less-than comparison     XPath 1.0
op:numeric-greater-than  Greater-than comparison  XPath 1.0

5.3.1 op:numeric-equal

op:numeric-equal(numeric $operand1, numeric $operand2) => boolean

Returns true if and only if $operand1 is exactly equal to $operand2. For xs:float and xs:double values, 0 (zero), +0 (positive zero), and -0 (negative zero) all compare equal. NaN does not equal itself.

This function backs up the "eq" and "ne" operators on numeric values.

5.3.2 op:numeric-less-than

op:numeric-less-than(numeric $operand1, numeric $operand2) => boolean

Returns true if and only if $operand1 is less than $operand2. For xs:float and xs:double values, positive infinity is greater than all other non-NaN values; negative infinity is less than all other non-NaN values. NaN is not comparable with (neither greater than nor less than) any other value including itself.

This function backs up the "lt" and "ge" operators on numeric values.

5.3.3 op:numeric-greater-than

op:numeric-greater-than(numeric $operand1, numeric $operand2) => boolean

Returns true if and only if $operand1 is greater than $operand2. For xs:float and xs:double values, positive infinity is greater than all other non-NaN values; negative infinity is less than all other non-NaN values. NaN is not comparable with (neither greater than nor less than) any other value including itself.

This function backs up the "gt" and "le" operators on numeric values.
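
The special cases for zero, infinity, and NaN described in 5.3.1 through 5.3.3 can be observed with a short Python sketch. It assumes that Python floats behave as IEEE 754 doubles (true for common implementations) and uses them as a stand-in for xs:double; the function names are hypothetical.

```python
import math


def numeric_equal(operand1: float, operand2: float) -> bool:
    return operand1 == operand2


def numeric_less_than(operand1: float, operand2: float) -> bool:
    return operand1 < operand2


def numeric_greater_than(operand1: float, operand2: float) -> bool:
    return operand1 > operand2


# Positive and negative zero compare equal; NaN does not equal itself.
zero_equal = numeric_equal(0.0, -0.0)
nan_equal = numeric_equal(math.nan, math.nan)
# NaN is neither less than nor greater than any value.
nan_less = numeric_less_than(math.nan, 1.0)
nan_greater = numeric_greater_than(math.nan, 1.0)
```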

5.4 Functions on Numeric Values

The following functions are defined on these numeric types. Each function returns an integer except:

  • If the argument is the empty sequence, the empty sequence is returned.

  • If the argument is "NaN", "NaN" is returned.

  • If the argument is positive or negative infinity, positive or negative infinity is returned.

Function    Meaning                                                             Source
xf:floor    Returns the largest integer less than or equal to the argument      XPath 1.0
xf:ceiling  Returns the smallest integer greater than or equal to the argument  XPath 1.0
xf:round    Rounds to the nearest integer                                       XPath 1.0

[Issue 79: How many digits of precision (etc.) are returned from certain functions?]

[Issue 142: Should floor ceiling and round return the same type as their argument? ]

[Issue 179: What is the appropriate return type for xf:floor, xf:ceiling, and xf:round?]

5.4.1 xf:floor

xf:floor(double? $srcval) => double?

Returns the largest (closest to positive infinity) integer that is not greater than the value of $srcval. If the argument is the empty sequence, returns the empty sequence.

5.4.1.1 Examples
  • xf:floor(10.5) returns 10.

  • xf:floor(-10.5) returns -11.

5.4.2 xf:ceiling

xf:ceiling(double? $srcval) => double?

Returns the smallest (closest to negative infinity) integer that is not smaller than the value of $srcval. If the argument is the empty sequence, returns the empty sequence.

5.4.2.1 Examples
  • xf:ceiling(10.5) returns 11.

  • xf:ceiling(-10.5) returns -10.
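
A minimal Python sketch of xf:floor and xf:ceiling, including the exceptions listed in 5.4, might look like the following. It is an illustration under stated assumptions: None models the empty sequence, Python floats model xs:double, and the function names are hypothetical.

```python
import math
from typing import Optional


def xf_floor(srcval: Optional[float]) -> Optional[float]:
    """Illustrative model of xf:floor."""
    if srcval is None:
        return None  # empty sequence in, empty sequence out
    if math.isnan(srcval) or math.isinf(srcval):
        return srcval  # NaN and the infinities are returned unchanged
    return float(math.floor(srcval))


def xf_ceiling(srcval: Optional[float]) -> Optional[float]:
    """Illustrative model of xf:ceiling."""
    if srcval is None:
        return None
    if math.isnan(srcval) or math.isinf(srcval):
        return srcval
    return float(math.ceil(srcval))
```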

5.4.3 xf:round

xf:round(double? $srcval) => double?

Returns the number that is closest to the argument. If there are two such numbers, then the one that is closest to positive infinity is returned. More formally, xf:round(x) produces the same result as xf:floor(x+0.5). If the argument is NaN, then NaN is returned. If the argument is positive infinity, then positive infinity is returned. If the argument is negative infinity, then negative infinity is returned. If the argument is positive zero (+0), then positive zero (+0) is returned. If the argument is negative zero (-0), then negative zero (-0) is returned. If the argument is less than zero (0), but greater than or equal to -0.5, then negative zero (-0) is returned. If the argument is the empty sequence, then the empty sequence is returned.

5.4.3.1 Examples
  • xf:round(2.5) returns 3.

  • xf:round(2.4999) returns 2.

  • xf:round(-2.5) returns -2 (not the possible alternative, -3).
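
The rules for xf:round can be sketched in Python along the same lines. The sketch follows the floor(x+0.5) formulation given above and handles the special cases explicitly; None models the empty sequence, Python floats model xs:double, and the function name is hypothetical. The addition x + 0.5 is itself performed in floating point, so values extremely close to a rounding boundary may not behave as exact arithmetic would suggest.

```python
import math
from typing import Optional


def xf_round(srcval: Optional[float]) -> Optional[float]:
    """Illustrative model of xf:round, based on floor(x + 0.5)."""
    if srcval is None:
        return None  # empty sequence
    if math.isnan(srcval) or math.isinf(srcval):
        return srcval  # NaN and the infinities are returned unchanged
    if srcval == 0.0:
        return srcval  # keeps the sign of a positive or negative zero
    if -0.5 <= srcval < 0.0:
        return -0.0  # small negative values round to negative zero
    return float(math.floor(srcval + 0.5))
```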

6 Functions on Strings

This section discusses functions and operators on the [XML Schema Part 2: Datatypes] string datatype and the datatypes derived from string.

6.1 String Types

The operators described in this section are defined on the following string types. Each type whose name is indented is derived from the type whose name appears nearest above with one less level of indent.

string
  normalizedString
    token
      language
      NMTOKEN
      Name
        NCName
          ID
          IDREF
          ENTITY

They also apply to user-defined types derived by restriction from these types.

6.2 Equality and Comparison of Strings

When values whose type is string or some type derived from string are compared (or, equivalently, sorted), the comparisons are inherently performed according to some collation (even if that collation is defined entirely on code point values or on the binary representations of the characters of the string). The [Character Model for the World Wide Web 1.0] observes that some applications may require different comparison and ordering behaviors than other applications. Similarly, some users having particular linguistic expectations may require different behaviors than other users. Consequently, the collation must be taken into account when comparing strings in any context. Several functions in this and the following section make use of a collation.

Collations can indicate that some characters that are rendered differently are, in fact, equal for collation purposes (e.g., "uve" and "uwe" are considered equivalent in some European languages). Strings can be compared character-by-character or in a logical manner, as defined by the collation.

Some collations, especially those based on the [Unicode Collation Algorithm], can be "tailored" for various purposes. This document does not discuss such tailoring. Instead, it assumes that the collation argument to the various functions below is a tailored and named collation. A specific collation with a distinguished name, http://www.w3.org/2002/08/query-operators/collation/codepoint, provides the ability to compare strings based on code point values. Every implementation of XQuery must support the collation based on code point values.

NOTE: This document uses the term "code point" as a synonym for "Unicode scalar value". [The Unicode Standard] sometimes spells this term "codepoint". Code points range from #x0000 to #x10FFFF inclusive.

While the [Character Model for the World Wide Web 1.0] recommends that all strings be subjected to early Unicode normalization, it is not possible to guarantee that all strings in all XML documents are, in fact, normalized, or that they are normalized in the same manner. In order to maximize interoperable results of operations on XML documents in general, there may be collations that operate on unnormalized strings, other collations that raise runtime errors when unnormalized strings are encountered, and still other collations that implicitly normalize strings for the purposes of collating them. For alignment with the [Character Model for the World Wide Web 1.0], applications may choose collations that treat unnormalized strings as though they were normalized (that is, that implicitly normalize the strings). Note that collations based on the Unicode collation algorithm produce equivalent results regardless of a string's normalization.

This document assumes that collations are named and that the collation name may be provided as an argument to string comparison functions. Functions that allow specification of a collation do so with an argument whose type is anyURI. This document also defines the manner in which a default collation is determined when the collation argument is not specified in invocations of the functions that allow it to be omitted.

The XQuery/XPath static context includes provision for a default collation that can be used for string comparisons (including ordering operations). However, the static context is not required to have a default collation specified; an implementation might choose to provide a default collation only under certain circumstances, or not at all. The static context default collation, if provided, is determined by ·implementation-defined· means. Such means might include determination from the host operating system environment, determination during XQuery/XPath installation, determination when the XQuery/XPath implementation was created, determination from the locale of some user environment, or even an ·implementation-defined· language through which the user can specify that collation.

The decision of what collation to use for a given comparison or ordering operation is determined by the following algorithm:

  1. If the operation specifies an explicit collation CollationA (e.g., if the optional collation argument is specified in an invocation of the xf:compare() function), then:

    • If CollationA is supported by the implementation, then CollationA is used.

    • Otherwise, an error is raised ("Unsupported collation").

  2. If no collation is explicitly specified for the operation and the XQuery/XPath static context specifies a collation CollationB, then:

    • If CollationB is supported by the implementation, then CollationB is used.

    • Otherwise, an error is raised ("Unsupported collation").

    NOTE: There might be several ways in which a collation might be specified in the XQuery/XPath static context. For example, XQuery might provide syntax that specifies a default collation as part of the query prolog.

  3. Otherwise, the Unicode code point collation (http://www.w3.org/2002/08/query-operators/collation/codepoint) is used.
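
The three steps above can be sketched as a small Python function. This is an illustration, not part of the specification: select_collation, its parameters, and UnsupportedCollationError are hypothetical names, collations are represented by their URI strings, and None means "not specified".

```python
from typing import Optional, Set

CODEPOINT_COLLATION = "http://www.w3.org/2002/08/query-operators/collation/codepoint"


class UnsupportedCollationError(Exception):
    """Stands in for the "Unsupported collation" error."""


def select_collation(explicit: Optional[str],
                     static_default: Optional[str],
                     supported: Set[str]) -> str:
    # Step 1: an explicitly specified collation takes precedence.
    if explicit is not None:
        if explicit in supported:
            return explicit
        raise UnsupportedCollationError("Unsupported collation")
    # Step 2: otherwise use the static context default, if one is specified.
    if static_default is not None:
        if static_default in supported:
            return static_default
        raise UnsupportedCollationError("Unsupported collation")
    # Step 3: fall back to the code point collation.
    return CODEPOINT_COLLATION
```

Note that an unsupported explicit collation raises an error in this sketch rather than falling through to the static default, which mirrors step 1 of the algorithm.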

XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation, in part because collations should be determined by the user of the data, not (normally) the data itself, and because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.

NOTE: Some data management environments allow collations to be associated with the definition of string items (that is, with the metadata that describes items whose type is string). While such association may be appropriate for use in environments in which data is held in a repository tightly bound to its descriptive metadata, it is not appropriate in the XML environment in which different documents being processed by a single query may be described by differing schemas.

[Issue 44: Collations: URIs and URI references or short names?]

[Issue 170: Some functions require collations with special capabilities. ]

Function Meaning Source
xf:compare Compares two character strings; a collation may optionally be specified XSLT 2.0, Req. 2.13 (Could)

[Issue 73: Is a "between" function needed?]

6.2.1 xf:compare

xf:compare(string? $comparand1, string? $comparand2) => integer?
xf:compare(string? $comparand1, string? $comparand2, anyURI $collationLiteral) => integer?

Returns -1, 0, or 1, depending on whether the value of $comparand1 is respectively less than, equal to, or greater than the value of $comparand2, according to the rules of the collation that is used.

The collation used by the invocation of this function is determined according to the rules in 6.2 Equality and Comparison of Strings.

If the value of $comparand2 begins with a string that is equal to the value of $comparand1 (according to the collation that is used) and has additional characters following that beginning string, then the result is -1. If the value of $comparand1 begins with a string that is equal to the value of $comparand2 (according to the collation that is used) and has additional characters following that beginning string, then the result is 1.

If either argument is the empty sequence, the result is the empty sequence.

This function backs up the "eq", "ne", "gt", "lt", "le" and "ge" operators on string values.

6.2.1.1 Examples
  • xf:compare('abc', 'abc') returns 0.

  • xf:compare('Strasse', 'Straße') returns 0 if and only if the default collation includes provisions that equate "ss" and the (German) character "ß" ("sharp-s"). (Otherwise, the returned value depends on the semantics of the default collation.)

  • xf:compare('Strasse', 'Straße', anyURI('deutsch')) returns 0 if and only if the collation identified by the relative URI constructed from the string value "deutsch" includes provisions that equate "ss" and the (German) character "ß" ("sharp-s"). (Otherwise, the returned value depends on the semantics of that collation.)

  • xf:compare('Strassen', 'Straße') returns 1 if and only if the default collation includes provisions that equate "ss" and the (German) character "ß" ("sharp-s"). (Since the value of $comparand1 has an additional character, an "n", following the string that is equal to "Straße", it is greater than the value of $comparand2.)
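
The behavior of xf:compare can be approximated in Python as shown below. The sketch is illustrative only. None models the empty sequence, and a collation is modeled as an optional key function that maps each string to a form compared by code point; this is a deliberate simplification of real collations. The german_key function is a hypothetical toy collation that equates "ß" with "ss".

```python
from typing import Callable, Optional


def xf_compare(comparand1: Optional[str],
               comparand2: Optional[str],
               key: Callable[[str], str] = lambda s: s) -> Optional[int]:
    """Illustrative model of xf:compare; the default key compares code points."""
    if comparand1 is None or comparand2 is None:
        return None  # empty sequence in, empty sequence out
    k1, k2 = key(comparand1), key(comparand2)
    if k1 < k2:
        return -1
    if k1 > k2:
        return 1
    return 0


def german_key(s: str) -> str:
    # Toy collation (assumption): treat sharp-s as equal to "ss".
    return s.replace("ß", "ss")
```

With the toy collation, xf_compare('Strasse', 'Straße', german_key) returns 0 and xf_compare('Strassen', 'Straße', german_key) returns 1, matching the examples above. With plain code point comparison, 'Strasse' sorts before 'Straße' because "s" has a lower code point than "ß".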

6.3 Functions on String Values

The following functions are defined on these string types. Several of these functions use a collation. See 6.2 Equality and Comparison of Strings for a discussion of collations.

Function Meaning Source
xf:concat Concatenates two or more character strings. XPath 1.0
xf:starts-with Indicates whether the value of one string begins with the characters of the value of another string. XPath 1.0
xf:ends-with Indicates whether the value of one string ends with the characters of the value of another string. XPath 1.0
xf:contains Indicates whether the value of one string contains the characters of the value of another string. A collation may optionally be specified. XPath 1.0
xf:substring Returns a string located at a specified place in the value of a string. XPath 1.0
xf:string-length Returns the length of the argument. XPath 1.0
xf:substring-before Returns the characters of one string that precede in that string the characters in the value of another string. A collation may optionally be specified. XPath 1.0
xf:substring-after Returns the characters of one string that follow in that string the characters in the value of another string. A collation may optionally be specified. XPath 1.0
xf:normalize-space Returns the whitespace-normalized value of the argument. XPath 1.0
xf:normalize-unicode Returns the normalized value of the first argument in the normalization form specified by the second argument. XPath 2.0 Req 2.9 (Should)
xf:upper-case Returns the upper-cased value of the argument. XPath 2.0 Req 2.4.3 (Should)
xf:lower-case Returns the lower-cased value of the argument. XPath 2.0 Req 2.4.3 (Should)
xf:translate Returns the first argument string with occurrences of characters in the second argument replaced by the character at the corresponding position in the third string. XPath 1.0
xf:string-pad Returns a string composed of as many copies of its first argument as specified in its second argument. XPath 2.0 Req 2.4.2, 4.4 (Should)
xf:matches Returns a boolean value that indicates whether the value of the first argument is matched by the regular expression that is the value of the second argument. XPath 2.0 Req 3. (Must)
xf:replace Returns the value of the first argument with every substring matched by the regular expression that is the value of the second argument replaced by the replacement string that is the value of the third argument. XPath 2.0 Req 2.4.1. (Should)
xf:tokenize Returns a sequence of zero or more strings whose values are substrings of the value of the first argument separated by substrings that match the regular expression that is the value of the second argument. XSLT 2.0 Req (Should)
xf:escape-uri Returns the string representing a URI value with certain characters escaped as specified in [RFC 2396].

[Issue 23: "Returns a copy" is not appropriate wording