W3C

XQuery 3.1 Requirements and Use Cases

W3C Working Group Note 13 December 2016

This version:
https://www.w3.org/TR/2016/NOTE-xquery-31-requirements-20161213/
Latest version:
https://www.w3.org/TR/xquery-31-requirements/
Previous version:
https://www.w3.org/TR/2015/NOTE-xquery-31-requirements-20150811/
Editor:
Jonathan Robie, EMC Corporation

Abstract

This document specifies goals and requirements for XQuery 3.1.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document is governed by the 1 September 2015 W3C Process Document.

This is a Working Group Note as described in the Process Document. It was developed by the W3C XML Query Working Group, which is part of the XML Activity.

These Requirements identify extensions to the XQuery 3.0 Recommendation, published 08 April 2014, that have been requested by WG participants and by reviewers who do not participate in the W3C activities. The XML Query WG has not yet fully reviewed these requirements.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at https://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[XQuery31Req]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at https://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1 Goals

The primary goal of XML Query 3.1 is to extend XML Query 3.0 with support for JSON maps and arrays, and to leverage these structures to make XQuery more useful. These data structures are also part of XPath 3.1, and are used in XSLT as well as XQuery.

Other features that improve usability or compatibility will be considered as time permits.

Satisfying these goals may require changes to the set of seven documents that have progressed to Recommendation together (Data Model 3.1, Functions and Operators 3.1, Serialization 3.1, XPath 3.1, XQuery 3.1, XQueryX 3.1, and XSLT 3.0).

2 Requirements

2.1 Terminology

The following keywords are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:

MUST

The item is an absolute requirement.

MUST NOT

The item is an absolute prohibition.

SHOULD

There may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.

SHOULD NOT

There may exist valid reasons why the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

MAY

An item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.

When the words MUST, SHOULD, or MAY are used in this technical sense [IETF RFC 2119], they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.

Each requirement also includes a status section, indicating its current situation in the XQuery/XPath/XSLT family of specifications. Three status levels are used:

"Green" status

This indicates that the requirement, according to its original formulation, has been completely met. Optional clarifying text may follow.

"Yellow" status

This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to clarify the current scope of the requirement.

"Red" status

This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.

2.2 General Requirements

2.2.1 Backward compatibility

XQuery 3.1 MUST be backward compatible with [XQuery 3.0].

Every valid XQuery 3.0 expression MUST be valid in XQuery 3.1 and it MUST evaluate to the same result.

Status (green): this requirement has been met.

2.2.2 Extension compatibility

XQuery 3.1 MUST be compatible with XQuery 3.0 extensions developed by the XML Query Working Group, including [XQuery Update Facility 3.0] and [XQuery and XPath Full Text 3.0].

Status (green): this requirement has been met.

2.3 Maps, Arrays, Nulls, and JSON

2.3.1 Maps

XQuery 3.1 MUST support collections of name / value pairs, which we call maps (in JSON, they are called objects; in other languages, they are sometimes called records, structs, dictionaries, hash tables, keyed lists, or associative arrays).

Status (green): this requirement has been met.

The map feature MUST provide a convenient syntax for creating maps.

Status (green): this requirement has been met.

The map feature MUST provide a convenient syntax for returning the value associated with a key.

Status (green): this requirement has been met.
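A minimal XQuery 3.1 sketch of how these two requirements are met: a map constructor, and lookup either by calling the map with a key or with the ? lookup operator. The keys and values are arbitrary examples.

```xquery
(: Construct a map with string keys :)
let $book := map {
  "title" : "Emma",
  "author" : "Jane Austen"
}
return (
  (: Look up a value by calling the map with a key :)
  $book("title"),
  (: The ? lookup operator is shorthand when the key is an NCName :)
  $book?author,
  (: An absent key yields the empty sequence, so nothing is returned here :)
  $book("isbn")
)
```

The query returns the two strings "Emma" and "Jane Austen".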

The map feature MUST provide a convenient way to enumerate the keys in a map.

Status (green): this requirement has been met (using functions).

The map feature MUST provide a convenient way to create modified copies of maps, e.g. by adding or deleting entries.

Status (green): this requirement has been met (using functions).
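A sketch of key enumeration and modified copies, assuming the standard map functions that XQuery 3.1 binds to the predeclared prefix map. map:put and map:remove return new maps and leave their input unchanged; since map:keys returns keys in an implementation-dependent order, the example sorts them.

```xquery
let $m := map { "a" : 1, "b" : 2 }
(: map:put returns a new map with the extra entry :)
let $added := map:put($m, "c", 3)
(: map:remove also returns a new map :)
let $removed := map:remove($added, "a")
return (
  sort(map:keys($removed)),   (: "b", "c" :)
  map:size($m)                (: 2: the original map is unchanged :)
)
```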

The map feature MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.

Status (green): this requirement has been met.

A map SHOULD allow any atomic value as a key. The map feature SHOULD allow keys of various types to be used as keys in the same map.

Status (green): this requirement has been met.

A map SHOULD allow any XDM sequence as a value. A map MUST allow any XDM item, map, or array as a value.

Status (green): this requirement has been met.
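The following sketch illustrates keys of several atomic types in one map, and values that are sequences; all keys and values are made up for the example.

```xquery
let $m := map {
  (: The integer 1 and the string "1" are distinct keys :)
  1 : "an integer key",
  "1" : "a string key",
  xs:date("2016-12-13") : "a date key",
  (: Values may be arbitrary sequences, including the empty sequence :)
  "colors" : ("red", "green", "blue"),
  "nothing" : ()
}
return (
  $m(1),                       (: "an integer key" :)
  $m("1"),                     (: "a string key" :)
  count($m("colors")),         (: 3 :)
  map:contains($m, "nothing")  (: true: the key is present although its value is empty :)
)
```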

A map MUST be allowed as a member of an XDM sequence.

Status (green): this requirement has been met.

It MAY be possible to use a map as a function.

Status (green): this requirement has been met.
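Because a map is a function item, it can be passed wherever a function is expected. A small sketch with an illustrative word-to-lemma map:

```xquery
(: A map from word forms to lemmas :)
let $lemma-of := map { "went" : "go", "goes" : "go", "sees" : "see" }
(: The map is passed to fn:for-each in place of a function :)
return for-each(("went", "sees"), $lemma-of)   (: "go", "see" :)
```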

For the sake of optimizability, a map SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.

Status (green): this requirement has been met.

2.3.2 Arrays

XQuery 3.1 MUST support arrays, which can nest.

Status (green): this requirement has been met.

XQuery 3.1 MUST provide a convenient syntax for creating arrays.

Status (green): this requirement has been met.
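XQuery 3.1 has two array constructors, which differ in how they treat sequences. This sketch (with arbitrary member values) shows the difference, and nesting:

```xquery
(: Square constructor: each comma-separated expression is one member :)
let $square := [ 1, (2, 3), [4, 5] ]
(: Curly constructor: each item of the enclosed sequence is one member :)
let $curly := array { 1, (2, 3) }
return (
  array:size($square),  (: 3 :)
  array:size($curly),   (: 3 :)
  count($square(2)),    (: 2: the second member is the sequence (2, 3) :)
  $square(3)(2)         (: 5: member 2 of the nested array [4, 5] :)
)
```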

Arrays MUST provide a convenient syntax for returning the value found in a given position.

Status (green): this requirement has been met (using function call syntax).

Arrays SHOULD provide a convenient way to create modified copies of an array, e.g. by adding or deleting entries.

Status (green): this requirement has been met (using functions).
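A sketch of positional access and of creating modified copies with functions from the predeclared array namespace. Each function returns a new array, and ?* unpacks the members so the results appear as plain sequences:

```xquery
let $a := [ "x", "y", "z" ]
return (
  $a(2),                      (: "y": positions are 1-based :)
  $a?3,                       (: "z": lookup operator with an integer key :)
  array:append($a, "w")?*,    (: "x", "y", "z", "w" :)
  array:remove($a, 1)?*,      (: "y", "z" :)
  array:put($a, 2, "Y")?*     (: "x", "Y", "z" :)
)
```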

Arrays MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.

Status (green): this requirement has been met.

An array MUST allow any XDM item, array, or map as a member.

Status (green): this requirement has been met.

An array MUST be allowed as a member of an XDM sequence.

Status (green): this requirement has been met.

It MAY be possible to use an array as a function.

Status (green): this requirement has been met.

For the sake of optimizability, an array SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.

Status (green): this requirement has been met.

2.3.3 Nulls

XQuery 3.1 MUST support nulls. It MAY represent nulls using the empty sequence, or it MAY represent nulls with a new item.

Status (green): this requirement has been met. Nulls are represented by empty sequences.
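A sketch of what representing null as the empty sequence means in practice, using fn:parse-json on a made-up JSON object. The key remains present, so map:contains() can distinguish a null value from a missing key:

```xquery
(: JSON null becomes the empty sequence :)
let $obj := parse-json('{ "name" : "Emma", "spouse" : null }')
return (
  empty($obj?spouse),            (: true :)
  map:contains($obj, "spouse"),  (: true: the key is still present :)
  empty($obj?children),          (: true as well, for an absent key :)
  map:contains($obj, "children") (: false :)
)
```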

2.3.4 Serialization

XQuery 3.1 MUST support JSON serialization.

Status (green): this requirement has been met.
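A sketch of JSON serialization with fn:serialize() and the json output method. The resulting string is a JSON object whose array value becomes a JSON array and whose empty value becomes null; the order of the object's members is implementation-dependent, so no exact output is shown:

```xquery
(: Serialize a map containing an array and an empty value as JSON :)
serialize(
  map { "title" : "Emma", "tags" : [ "novel", "comedy" ], "sequel" : () },
  map { "method" : "json" }
)
```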

XQuery 3.1 MAY support serialization to multiple resources from a single query.

Status (red): this requirement has not been met. However, the EXPath File Module provides this functionality for implementations that support it.

2.4 Usability Features

2.4.1 Scientific Notation

XQuery 3.1 MUST provide support for numbers in scientific notation.

Status (green): this requirement has been met.
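Numeric literals with an exponent have existed since XQuery 1.0, so the new support most plausibly concerns output formatting; assuming that reading, here is a sketch of format-number() with an exponent in the picture string (an XQuery 3.1 addition), alongside a double literal:

```xquery
(: A numeric literal with an exponent is an xs:double :)
let $n := 1.2345e3
return (
  $n instance of xs:double,      (: true :)
  $n eq 1234.5,                  (: true :)
  (: An exponent part in the picture string requests scientific notation :)
  format-number($n, '0.0e0')
)
```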

2.4.2 Type Aliases

XQuery 3.1 MAY support aliases for types.

Status (red): this requirement has not been met.

2.4.3 Invoking XSLT Transformations

XQuery 3.1 MUST provide a means to invoke XSLT transformations.

Status (green): this requirement has been met. fn:transform() invokes an XSLT transformation.
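A sketch of invoking a stylesheet with fn:transform(), assuming a processor that has an XSLT engine available. The stylesheet location, input document, and the title parameter are placeholders:

```xquery
(: "report.xsl", "input.xml" and the "title" parameter are placeholders :)
let $result := transform(map {
  "stylesheet-location" : "report.xsl",
  "source-node" : doc("input.xml"),
  (: Stylesheet parameters are supplied as a map keyed by QName :)
  "stylesheet-params" : map { QName("", "title") : "Quarterly report" }
})
(: With the default delivery format, the principal result is under the key "output" :)
return $result?output
```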

2.4.4 Collations

XQuery 3.1 MAY provide a standard mechanism for referring to collations.

Status (green): this requirement has been met.
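A sketch of the standard collation URI scheme for the Unicode Collation Algorithm, http://www.w3.org/2013/collation/UCA, with query parameters selecting the language and comparison strength. How completely a processor supports these parameters varies, so the comments describe expected rather than guaranteed results:

```xquery
(: Primary strength ignores differences of case and accents :)
let $uca := "http://www.w3.org/2013/collation/UCA?lang=en;strength=primary"
return (
  compare("resume", "Résumé", $uca),         (: expected 0 where UCA is supported :)
  compare("Cherry", "apple"),                (: -1 under the default codepoint collation :)
  sort(("banana", "apple", "Cherry"), $uca)  (: expected "apple", "banana", "Cherry" :)
)
```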

3 Use Cases

The solutions provided for the following Use Cases use the XQuery 3.1 query language, and frequently create maps rather than XML. In some cases, XQuery 3.0 solutions that create XML are also provided for comparison. Every XQuery 3.0 solution provided is also a valid XQuery 3.1 solution.

3.1 XSLT 3.0 Streaming Use Cases

These use cases were originally proposed for XSLT 3.0 streaming. In XQuery, they are solved using grouping. In these use cases, we assume that the user is using maps as a lightweight structure to represent the results of grouping.

3.1.1 Simple Grouping

Find the highest earning employee in each department.

3.1.1.1 Solution in XQuery 3.0
for $e in doc("employees.xml")/employees/employee,
    $d in $e/department
group by $d
return
   <department name="{$d}">
     {
       let $max := max($e/salary)
       return $e[salary=$max]
     }
   </department>
3.1.1.2 Solution in XQuery 3.1
for $e in doc("employees.xml")/employees/employee,
    $d in $e/department
group by $d
return
   map {
     "department" : $d,
     "highest paid employee" :
       let $max := max($e/salary)
       return $e[salary=$max]
   } 

3.1.2 Simultaneous Grouping

Find both the highest earning employee in each department and the total number of employees for each job type across all departments.

3.1.2.1 Solution in XQuery 3.0
for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $department := $employee/department
let $max-salary := max($salary)
let $highest-earners := $employee[salary = $max-salary]
return
   <department name="{$department}">{ $highest-earners }</department>,

for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $job-type := $employee/job-type
let $totals := count($employee)
return
   <total-by-job-type type="{$job-type}">{ $totals }</total-by-job-type>

	  
3.1.2.2 Solution in XQuery 3.1
for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $department := $employee/department
let $max-salary := max($salary)
let $highest-earners := $employee[salary = $max-salary]
return
   map { 
     "department" : $department,
     "highest earners" : $highest-earners 
   }
,
for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $job-type := $employee/job-type
let $totals := count($employee)
return
   map {
      "job type" : $job-type,
      "count(employee)" : $totals
   }

3.1.3 Word Count by Lemma

Calculate the word count by lemma of the verbs in the following document.

3.1.3.1 Input Data

The XML document, gnt.xml.

<gnt>
<s>
 <w pos="PP">I</w>
 <w pos="V" lemma="go">go</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">She</w>
 <w pos="V" lemma="go">went</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">He</w>
 <w pos="V" lemma="go">goes</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">I</w>
 <w pos="V" lemma="see">see</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">She</w>
 <w pos="V" lemma="see">sees</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">I</w>
 <w pos="V" lemma="have">have</w>
 <pu>.</pu>
</s>
<s>
 <w pos="PP">She</w>
 <w pos="V" lemma="have">has</w>
 <pu>.</pu>
</s>
</gnt>
3.1.3.2 Result
<verb lemma="go" count="3"/>
<verb lemma="see" count="2"/>
<verb lemma="have" count="2"/>
3.1.3.3 Solution in XQuery 3.0:

A solution that uses grouping alone, without maps.

for $word in doc("gnt.xml")//w
let $lemma := $word/@lemma
where m:is-verb($word)
group by $lemma
order by count($word) descending
return
  <verb lemma="{ $lemma }" count="{count($word)}" />
3.1.3.4 Solution in XQuery 3.1:
for $word in doc("gnt.xml")//w
let $lemma := $word/@lemma
where m:is-verb($word)
group by $lemma
order by count($word) descending
return
  map { 
   "lemma" :  $lemma,
   "count" : count($word)
  }
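Both solutions call m:is-verb(), which this document does not define. A minimal sketch of such a helper, assuming a hypothetical namespace URI and the tagging convention of gnt.xml, where pos="V" marks a verb:

```xquery
declare namespace m = "http://example.com/morphology"; (: hypothetical namespace URI :)

(: Hypothetical helper: a w element is a verb when its pos attribute is "V" :)
declare function m:is-verb($word as element(w)) as xs:boolean {
  $word/@pos = "V"
};

(: Quick check against an inline fragment of the input data :)
let $s := <s><w pos="PP">She</w><w pos="V" lemma="go">went</w><pu>.</pu></s>
return $s/w[m:is-verb(.)]/@lemma/string()   (: "go" :)
```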

3.2 Compound Values

3.2.1 Complex Number Library

Implement a complex number library for XQuery or XSLT 3.0. Complex numbers should be represented as single items, so that they can be manipulated like regular numbers, for example by returning sequences of them.

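In this library, the complex number 2 + 3i is represented as a map that associates the key true() with the real part and the key false() with the imaginary part, that is, map { true() : 2, false() : 3 }. In the JSON-like notation used for results below, this appears as { "true" : 2, "false" : 3 }.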

3.2.1.1 Solution in XQuery 3.1
declare namespace i = "http://example.com/complex"; (: illustrative namespace URI :)

declare function i:complex(
  $real as xs:double,
  $imaginary as xs:double
) as map(xs:boolean, xs:double)
{
  map { true() : $real, false() : $imaginary }
};

declare function i:real(
  $complex as map(xs:boolean, xs:double)
) as xs:double
{
  $complex(true())
};

declare function i:imaginary(
  $complex as map(xs:boolean, xs:double)
) as xs:double
{
  $complex(false())
};

declare function i:add(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double)
{
  i:complex(i:real($arg1)+i:real($arg2),
  i:imaginary($arg1)+i:imaginary($arg2))
};

declare function i:multiply(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double)
{
  i:complex(
    i:real($arg1)*i:real($arg2) - i:imaginary($arg1)*i:imaginary($arg2),
    i:real($arg1)*i:imaginary($arg2) + i:imaginary($arg1)*i:real($arg2))
};

Here is a query that uses this library:

i:add(i:complex(2, 3), i:complex(1, -6)),
i:multiply(i:complex(2, -1), i:complex(3, 4))

Here is the result of the above query:

{ "true" : 3, "false" : -3 },
{ "true" : 10, "false" : 5 }
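Because each complex number is a single item, sequences of them can be processed with ordinary higher-order functions. A sketch that sums a sequence with fold-left(); i:complex() and a simplified i:add() are restated from the library above so the example runs on its own, and the namespace URI is illustrative:

```xquery
declare namespace i = "http://example.com/complex"; (: illustrative namespace URI :)

declare function i:complex($real as xs:double, $imaginary as xs:double)
  as map(xs:boolean, xs:double)
{
  map { true() : $real, false() : $imaginary }
};

declare function i:add($arg1 as map(xs:boolean, xs:double),
                       $arg2 as map(xs:boolean, xs:double))
  as map(xs:boolean, xs:double)
{
  i:complex($arg1(true()) + $arg2(true()), $arg1(false()) + $arg2(false()))
};

(: Fold the sequence, starting from 0 + 0i :)
let $numbers := (i:complex(2, 3), i:complex(1, -6), i:complex(0.5, 0.5))
let $sum := fold-left($numbers, i:complex(0, 0), i:add#2)
return ($sum(true()), $sum(false()))   (: 3.5, -2.5 :)
```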

3.3 Manual Indexing

Build an index to manually optimize retrieval of books in a catalog by their ISBN number.

3.3.1 Simple Manual Join

Construct a list of all authors, and the books they have written.

3.3.1.1 Input Data

Book elements of the form:

<book>
<isbn>0470192747</isbn>
<publisher>Wiley</publisher>
<title>XSLT 2.0 and XPath 2.0 Programmer's Reference</title>
</book>

Author elements of the form:

<author>
<name>Michael H. Kay</name>
<isbn>0470192747</isbn>
<isbn>...</isbn>
</author>
3.3.1.2 Solution in XQuery 3.1:
declare variable $index := map:merge( //book ! map { isbn : . });

<table>{
  for $a in //author
  return <tr>
    <td>{ $a/name/string() }</td>
    <td>{ string-join($a/isbn ! $index(.)/title/string(), ", ") }</td>
  </tr>
}</table>

3.4 Interface / Implementation Pattern

As in JavaScript, a map whose keys are strings and whose associated values are function items can be used much like a class in object-oriented programming languages.

3.4.1 Data Variety

Suppose an application needs to handle customer order information that may arrive in three different formats, each with a different hierarchical arrangement.

An application can isolate itself from these differences by defining a set of functions to navigate the relationships between customers, orders, and products: orders-for-customer, orders-for-product, customer-for-order, product-for-order. These functions can be implemented in different ways for the three different input formats.

3.4.1.1 Input Data

Flat structure:

<customer id="c123">...</customer>
<product id="p789">...</product>
<order customer="c123" product="p789">...</order>

Orders within customer elements:

<customer id="c123">
<order product="p789">...</order>
</customer>
<product id="p789">...</product>

Orders within product elements:

<customer id="c123">...</customer>
<product id="p789">
<order customer="c123">...</order>
</product>
3.4.1.2 Solution in XQuery 3.1

For example, with the first format the implementation might be:

declare variable $flat-input-functions := 
  map {
    'orders-for-customer' : 
       function($c as element(customer)) as element(order)* { $c/../order[@customer=$c/@id] },
    'orders-for-product' : 
       function($p as element(product)) as element(order)* { $p/../order[@product=$p/@id] },
    'customer-for-order' : 
       function($o as element(order)) as element(customer) { $o/../customer[@id=$o/@customer] },
    'product-for-order' : 
       function($o as element(order)) as element(product) { $o/../product[@id=$o/@product] }
  };
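Application code then calls through the map instead of navigating the document directly. A self-contained sketch with a small inline document in the flat format and two of the four functions:

```xquery
(: A minimal document in the flat format :)
let $data := document {
  <data>
    <customer id="c123"/>
    <product id="p789"/>
    <order customer="c123" product="p789"/>
  </data>
}
(: Two of the four functions from the flat-format implementation :)
let $flat := map {
  'orders-for-customer' :
     function($c as element(customer)) as element(order)* { $c/../order[@customer = $c/@id] },
  'product-for-order' :
     function($o as element(order)) as element(product) { $o/../product[@id = $o/@product] }
}
(: The application looks up a function by name, then calls it :)
let $customer := $data//customer[@id = "c123"]
for $order in $flat('orders-for-customer')($customer)
return $flat('product-for-order')($order)/@id/string()   (: "p789" :)
```

Supporting another format means supplying a different map with the same four keys; the calling code stays unchanged.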

3.4.2 Search and Snippeting

Create a general interface that takes some words as input, performs a full-text search for them, and returns snippets of the top 10 results, ordered by score. The nodes to search, their structure, how snippets are constructed, and how results are scored all differ between data sets.

3.4.2.1 Solution in XQuery 3.1:

Create a template method and use a map of functions to define the implementation of the plug-in points.

(: General interface module :)

module namespace this="http://example.com/search-interface/";

declare function this:search(
    $words as xs:string*, $collection as map(xs:string, function(*)))
{
  (: Each map entry is a function item: look it up, then call it :)
  for $d in $collection('select')()[. contains text {$words} any word]
  order by $collection('score')($d, $words) descending
  count $c
  where $c <= 10
  return $collection('snippet')($d, $words)
};

(: Specific implementation example :)

import module namespace s="http://example.com/search-interface/";

declare variable $twitter as map(xs:string, function(*)) :=
  map {
    'select' : 
       function() as node()* { collection("twitter") },
    'score' : 
       function($n as node(), $words as xs:string*) as xs:double
        { 
          let score $s1 := $n contains text {$words} any word
          let score $s2 := $n contains text {$words} all words
          return $s1 + $s2
        },
    'snippet' : 
       function($node as node(), $words as xs:string*) as node() { $node }
  };

declare variable $blog as map(xs:string, function(*)) :=
  map {
    'select' : 
       function() as node()* { collection("blogs")/body },
    'score' : 
       function($n as node(), $words as xs:string*) as xs:double
        {
          (: avg() of an empty sequence is empty, so fall back to 0 :)
          let $s1 := (avg(
            for $p score $s in $n/para[. contains text {$words} any word]
            return $s
          ), 0)[1]
          let $s2 := (avg(
            for $p score $s in $n/comment[. contains text {$words} any word weight {0.5}]
            return $s
          ), 0)[1]
          let score $s3 := $n/title contains text {$words} any word weight {5.0}
          return $s1 + $s2 + $s3
        },
    'snippet' : 
       function($node as node(), $words as xs:string*) as node()
        { 
          <result>
           {
             $node/title, $node/para[1], $node/comment[1]
           }
          </result>  
        }
  };

declare variable $books as map(xs:string, function(*)) :=
  map {
   'select' : 
      function() as node()* { collection()//chapter },
   'score' : 
      function($n as node(), $words as xs:string*) as xs:double
        { 
          let score $s1 := $n contains text {$words} any word
          let score $s2 := $n/title contains text {$words} any word weight {5.0}
          return $s1 + $s2
        },
   'snippet' : 
      function($node as node(), $words as xs:string*) as node()
       { 
         <result>
          {
            $node/title,
            ((for $p score $s in $node/p[. contains text {$words} all words]
              order by $s descending
              return $p),
             (for $p score $s in $node/p[. contains text {$words} any word]
              order by $s descending
              return $p))[1]
          }
         </result> 
       }
  };

(: Get top 10 from various sources :)
s:search(("fire","earthquake"),$books),
s:search(("fire","earthquake"),$twitter),
s:search(("fire","earthquake"),$blog)

3.4.3 Abstracting Document Structure

Provide an application with access to various pieces of metadata, insulating the application code from variations in document structure.

3.4.3.1 Solution in XQuery 3.1:

Define the metadata interface through a map of functions.

(: Specific implementations :)
declare namespace xh="http://www.w3.org/1999/xhtml";
declare variable $xhtml as map(xs:string, function(*)) :=
  map {
    'title' :
       function($n as document-node()) as xs:string? { $n/xh:html/xh:head/xh:title },
    'author' :
       function($n as document-node()) as xs:string? { $n/xh:html/xh:head/xh:meta[@name='author']/@content },
    'pubdate' :
       function($n as document-node()) as xs:string? { $n/xh:html/xh:head/xh:meta[@name='created']/@content },
    'publisher' : 
       function($n as document-node()) as xs:string? { () }
    };

declare variable $medline-citation as map(xs:string, function(*)) :=
  map {
    'title' : 
       function($n as document-node()) as xs:string? 
        { 
          $n/MedlineCitation/Article/ArticleTitle 
        },
    'author': 
       function($n as document-node()) as xs:string?
        {
          string-join(
            for $a in $n/MedlineCitation//Author 
            return concat($a/LastName, ", ", $a/ForeName) 
            , 
            "; "
          )
        },
    'pubdate' : 
       function($n as document-node()) as xs:string?
        {
          let $d := $n/MedlineCitation/Article/PubDate
          return string-join(($d/Day,$d/Month,$d/Year), " ")
        },
    'publisher' : 
       function($n as document-node()) as xs:string?
        {  
          $n/MedlineCitation/MedlineJournalInfo/MedlineTA
        }
  };

3.5 Parameter Passing

Library functions often have a large number of optional arguments, which are awkward or impossible to supply using the existing mechanism of variable-arity functions.

3.5.1 XSLT Stylesheet Parameters

Pass the list of parameter names and values to the xdmp:xslt-invoke() function, which invokes an XSLT stylesheet.

3.5.1.1 Solution in XQuery 3.1:
declare function xdmp:xslt-invoke($path as xs:string, $input as node(),
  $params as map(xs:QName, item()*)) as document-node()* external;

let $params := map {
  xs:QName("toc") : true(),
  xs:QName("index") : doc("index_terms.xml")
}
return xdmp:xslt-invoke("my-stylesheet.xsl", doc("my-doc.xml"), $params)

3.5.2 Function Options

Provide a mechanism for supplying option values (which otherwise take default values) to the my:doc() function; these options control aspects of its behaviour, including:

  • Parsing of external entities

  • DTD validation

  • XML Schema validation

  • Lax (XML Schema) validation

  • Whitespace stripping

  • URI resolution

Using maps in this scenario brings benefits over using XML structure, including:

  • Nodes are not copied; their identity is retained

  • Atomic items are not serialized, and retain their specific type

  • Functions can be passed in as options; in this case, the relevant example is the URI resolver.

3.5.2.1 Solution in XQuery 3.1
declare function my:doc($uri as xs:string, $options as map(xs:string, item()*)) as document-node()? external;

(: Enable lax XML Schema validation :)
my:doc("validate-me.xml", map {
  "schema-validation" : true(),
  "lax-validation" : true()
}),

(: Enable whitespace stripping, and a custom URI resolution :)
my:doc("../relative-uri.xml", map {
  "strip-whitespace" : true(),
  (: static-base-uri() does not depend on a context item, unlike base-uri() :)
  "uri-resolver" : resolve-uri(?, static-base-uri())
})
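Inside a function such as my:doc(), the supplied options would typically be combined with defaults. A sketch using map:merge(); the default values and the helper local:effective-options() are hypothetical:

```xquery
(: Hypothetical defaults; the option names follow the examples above :)
declare variable $default-options := map {
  "schema-validation" : false(),
  "lax-validation" : false(),
  "strip-whitespace" : false()
};

(: Caller options come first; "use-first" keeps the caller's value for duplicate keys :)
declare function local:effective-options($options as map(*)) as map(*) {
  map:merge(($options, $default-options), map { "duplicates" : "use-first" })
};

let $opts := local:effective-options(map { "strip-whitespace" : true() })
return ($opts?strip-whitespace, $opts?schema-validation)   (: true, false :)
```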

3.5.3 Translation

Design a language-agnostic game (only the core is shown here) that accepts a translation function or a map as a parameter.

3.5.3.1 Solution in XQuery 3.1:
declare function local:play(
  $secret-number as xs:integer,
  $guessed-number as xs:integer,
  $translator as function(xs:string) as xs:string)
{
  switch (true())
  case $guessed-number eq $secret-number
    return $translator("You won!")
  case $guessed-number lt $secret-number
    return $translator("The secret number is greater.")
  default (: $guessed-number gt $secret-number :)
    return $translator("The secret number is lower.")
};

local:play(76, 86, function($x) { $x }), (: Keep English :)

local:play(76, 86, map {
  "You won!" : "Du hast gewonnen!",
  "The secret number is greater." : "Die geheime Zahl ist groesser.",
  "The secret number is lower." : "Die geheime Zahl ist kleiner." }
),

local:play(76, 86, $automated-translator-based-on-natural-language-processing)

3.6 Natural Language Processing

Software for natural language processing and text analytics frequently uses data structures such as maps and arrays. For instance, the Python Natural Language Toolkit (NLTK) uses lists and tuples extensively. In this use case, we use a library that invokes NLTK to perform simple natural language processing and returns results in a format very similar to the one NLTK uses; we then use these results to perform a variety of simple tasks.

3.6.1 Input Data

In this use case, we use the Project Gutenberg edition of Jane Austen's "Emma", as packaged in NLTK. To return the sentences of a text, we use the nltk:sentences() function, which returns sentences using the same data structures as NLTK.

Here are a few sentences returned by the function call nltk:sentences('austen-emma.txt'), using arrays to represent Python's list structures:

Sentence Representation:

[
  ['I', 'must', 'put', 'on', 'a', 'few', 'ornaments', 'now', ',', 'because', 'it', 'is', 'expected', 'of', 'me', '.'],
  ['A', 'bride', ',', 'you', 'know', ',', 'must', 'appear', 'like', 'a', 'bride', ',', 'but', 'my', 'natural', 'taste', 
   'is', 'all', 'for', 'simplicity', ';', 'a', 'simple', 'style', 'of', 'dress', 'is', 'so', 'infinitely', 'preferable', 
   'to', 'finery', '.'],
  ['But', 'I', 'am', 'quite', 'in', 'the', 'minority', ',', 'I', 'believe', ';', 'few', 'people', 'seem', 'to', 'value', 
   'simplicity', 'of', 'dress', ',--', 'show', 'and', 'finery', 'are', 'every', 'thing', '.']
]
      

NLTK has multiple representations of sentences. If $s is bound to the second sentence in the above data structure, then nltk:pos-tag($s) returns the following:

Part of Speech Representation:

[['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], 
 ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], 
 ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], 
 ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], 
 ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.']
]
      

3.6.2 Convert Part of Speech Data to XML

If $s is bound to a part of speech representation, we can convert it to an XML format using the following query:

<s>
 {
  for $w in $s?*
  return <w pos="{ $w(2) }">{ $w(1) }</w>
 }
</s>
      

Or if we prefer to use meaningful names instead of the numeric positions, we can create an index that maps between names and positions and use it as follows:

declare variable $index := map { "pos" : 2, "lemma" : 1 };

<s>
 {
  for $w in $s?*
  return <w pos="{ $w($index("pos")) }">{ $w($index("lemma")) }</w>
 }
</s>
      

Both queries have the same result:

<s>
  <w pos="DT">A</w>
  <w pos="NN">bride</w>
  <w pos=",">,</w>
  <w pos="PRP">you</w>
  <w pos="VBP">know</w>
  <w pos=",">,</w>
  <w pos="MD">must</w>
  <w pos="VB">appear</w>
  <w pos="IN">like</w>
  <w pos="DT">a</w>
  <w pos="NN">bride</w>
  <w pos=",">,</w>
  <w pos="CC">but</w>
  <w pos="PRP$">my</w>
  <w pos="JJ">natural</w>
  <w pos="NN">taste</w>
  <w pos="VBZ">is</w>
  <w pos="DT">all</w>
  <w pos="IN">for</w>
  <w pos="NN">simplicity</w>
  <w pos=":">;</w>
  <w pos="DT">a</w>
  <w pos="JJ">simple</w>
  <w pos="NN">style</w>
  <w pos="IN">of</w>
  <w pos="NN">dress</w>
  <w pos="VBZ">is</w>
  <w pos="RB">so</w>
  <w pos="RB">infinitely</w>
  <w pos="JJ">preferable</w>
  <w pos="TO">to</w>
  <w pos="VB">finery</w>
  <w pos=".">.</w>
</s>
      
      

3.6.3 Converting arrays to maps

If $s is bound to a sentence in part of speech representation, the following query converts each word to a map with meaningful property names, returning an array of maps:

array {
  for $w in $s?*
  return map { "pos" : $w(2), "lemma" : $w(1) }
}
       

Here is the output of the above query:

[ { "pos" : "DT", "lemma" : "A" }, 
  { "pos" : "NN", "lemma" : "bride" }, 
  { "pos" : ",", "lemma" : "," }, 
  { "pos" : "PRP", "lemma" : "you" }, 
  { "pos" : "VBP", "lemma" : "know" }, 
  { "pos" : ",", "lemma" : "," }, 
  { "pos" : "MD", "lemma" : "must" }, 
  { "pos" : "VB", "lemma" : "appear" }, 
  { "pos" : "IN", "lemma" : "like" }, 
  { "pos" : "DT", "lemma" : "a" }, 
  { "pos" : "NN", "lemma" : "bride" }, 
  { "pos" : ",", "lemma" : "," }, 
  { "pos" : "CC", "lemma" : "but" }, 
  { "pos" : "PRP$", "lemma" : "my" }, 
  { "pos" : "JJ", "lemma" : "natural" }, 
  { "pos" : "NN", "lemma" : "taste" }, 
  { "pos" : "VBZ", "lemma" : "is" }, 
  { "pos" : "DT", "lemma" : "all" }, 
  { "pos" : "IN", "lemma" : "for" }, 
  { "pos" : "NN", "lemma" : "simplicity" }, 
  { "pos" : ":", "lemma" : ";" }, 
  { "pos" : "DT", "lemma" : "a" }, 
  { "pos" : "JJ", "lemma" : "simple" }, 
  { "pos" : "NN", "lemma" : "style" }, 
  { "pos" : "IN", "lemma" : "of" }, 
  { "pos" : "NN", "lemma" : "dress" }, 
  { "pos" : "VBZ", "lemma" : "is" }, 
  { "pos" : "RB", "lemma" : "so" }, 
  { "pos" : "RB", "lemma" : "infinitely" }, 
  { "pos" : "JJ", "lemma" : "preferable" }, 
  { "pos" : "TO", "lemma" : "to" }, 
  { "pos" : "VB", "lemma" : "finery" }, 
  { "pos" : ".", "lemma" : "." } 
]
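Once each word is a map, later queries can retrieve its fields by name with the lookup operator instead of by position. A self-contained sketch, with $s bound to a short excerpt of the sentence, that builds the array of maps and then produces XML in the same format as in 3.6.2:

```xquery
(: a short excerpt of the part of speech representation :)
let $s := [ [ "A", "DT" ], [ "bride", "NN" ], [ ",", "," ] ]
(: one map per word, as in the query above :)
let $words := array {
  for $w in $s?*
  return map { "pos" : $w(2), "lemma" : $w(1) }
}
return
  <s>{
    for $m in $words?*
    return <w pos="{ $m?pos }">{ $m?lemma }</w>
  }</s>
```

The result is `<s><w pos="DT">A</w><w pos="NN">bride</w><w pos=",">,</w></s>`.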
       

3.6.4 Group by Part of Speech

If $s is bound to a sentence in part of speech representation, the following query groups words by part of speech, selecting parts of speech particularly illustrative of Jane Austen's writing style.

for $word in $s?*
let $pos := $word(2)
let $lexeme := $word(1)
where $pos = ("JJ", "NN", "RB", "VB")
group by $pos
order by $pos
return 
  <pos name="{$pos}">
    { 
      for $l in distinct-values($lexeme)
      return <lexeme>{ $l }</lexeme>
    }
  </pos>
      

Here is the output of the above query:

<pos name="JJ">
  <lexeme>natural</lexeme>
  <lexeme>simple</lexeme>
  <lexeme>preferable</lexeme>
</pos>
<pos name="NN">
  <lexeme>bride</lexeme>
  <lexeme>taste</lexeme>
  <lexeme>simplicity</lexeme>
  <lexeme>style</lexeme>
  <lexeme>dress</lexeme>
</pos>
<pos name="RB">
  <lexeme>so</lexeme>
  <lexeme>infinitely</lexeme>
</pos>
<pos name="VB">
  <lexeme>appear</lexeme>
  <lexeme>finery</lexeme>
</pos>
      

3.6.5 Trigrams

In corpus linguistics, n-grams are the basis for certain statistical techniques used to explore and compare texts; for instance, they are used to help determine the authorship of texts. If $s is bound to a sentence in part of speech representation, the following query computes the trigrams of the sentence, ignoring punctuation:

declare function local:words-only($s)
{
  for $w in $s
  where not($w(2) = (".", ",", ";", ":"))
  return $w(1)
};

for sliding window $w in local:words-only($s?*)
    start at $i when true()
    only end at $j when $j - $i eq 2
return 
    array { $w }

Here is the result for a sentence used in an earlier example:

[ "A", "bride", "you" ], 
[ "bride", "you", "know" ], 
[ "you", "know", "must" ], 
[ "know", "must", "appear" ], 
[ "must", "appear", "like" ], 
[ "appear", "like", "a" ], 
[ "like", "a", "bride" ], 
[ "a", "bride", "but" ], 
[ "bride", "but", "my" ], 
[ "but", "my", "natural" ], 
[ "my", "natural", "taste" ], 
[ "natural", "taste", "is" ], 
[ "taste", "is", "all" ], 
[ "is", "all", "for" ], 
[ "all", "for", "simplicity" ], 
[ "for", "simplicity", "a" ], 
[ "simplicity", "a", "simple" ], 
[ "a", "simple", "style" ], 
[ "simple", "style", "of" ], 
[ "style", "of", "dress" ], 
[ "of", "dress", "is" ], 
[ "dress", "is", "so" ], 
[ "is", "so", "infinitely" ], 
[ "so", "infinitely", "preferable" ], 
[ "infinitely", "preferable", "to" ], 
[ "preferable", "to", "finery" ]
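The same window clause generalizes from trigrams to n-grams of any length. A minimal sketch, assuming a hypothetical helper local:ngrams that takes a sequence of words (for example, the output of local:words-only) and the n-gram length:

```xquery
(: returns every run of $n consecutive words as an array :)
declare function local:ngrams($words as xs:string*, $n as xs:integer) as array(*)*
{
  for sliding window $w in $words
    start at $i when true()
    (: "only" discards the incomplete windows at the end of the sequence :)
    only end at $j when $j - $i eq $n - 1
  return array { $w }
};

local:ngrams(("A", "bride", "you", "know"), 2)
```

This returns the bigrams `[ "A", "bride" ]`, `[ "bride", "you" ]` and `[ "you", "know" ]`; with $n set to 3 and local:words-only($s?*) as input, it yields trigrams of the kind listed above.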
	  

3.6.6 Partitioning using filters

Filters can be used to partition the words of a sentence in a variety of ways. In this simple example, we use filters to distinguish verbs from other parts of speech. In NLTK, part of speech tags that start with the string VB denote verb forms.

In this example, the variable $s is bound to a sentence in part of speech representation, for example:

[
 ['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], 
 ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], 
 ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], 
 ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], 
 ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.']
]

The local:filter function takes a sequence and a boolean function (a predicate), and returns two arrays: one with the items that satisfy the predicate, and a second with the items that do not.

declare function local:filter($s as item()*, $p as function(item()) as xs:boolean)
{
  array { $s[$p(.)] },   array { $s[not($p(.))] }
};
        

We can call it with a predicate that uses starts-with() to test whether a word's tag begins with "VB", which partitions the sentence into verbs and non-verbs:

let $f := function($a) { starts-with($a(2), "VB") }
return
  local:filter($s?*, $f)
       

Here is the output of the query for the sentence shown above.

[ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ], [ "is", "VBZ" ], 
  [ "finery", "VB" ] ],

[ [ "A", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "you", "PRP" ], 
  [ ",", "," ], [ "must", "MD" ], [ "like", "IN" ], [ "a", "DT" ], 
  [ "bride", "NN" ], [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], 
  [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT" ], [ "for", "IN" ], 
  [ "simplicity", "NN" ], [ ";", ":" ], [ "a", "DT" ], [ "simple", "JJ" ], 
  [ "style", "NN" ], [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], 
  [ "infinitely", "RB" ], [ "preferable", "JJ" ], [ "to", "TO"], 
  [ ".", "." ] ]
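XQuery 3.1 also provides a built-in higher-order function, fn:filter, which returns the items of a sequence for which a predicate returns true. A sketch of the same partition written with it, with $s bound to a short excerpt of the sentence:

```xquery
let $s := [ [ "A", "DT" ], [ "bride", "NN" ], [ "know", "VBP" ], [ "appear", "VB" ] ]
let $is-verb := function($w) { starts-with($w(2), "VB") }
return (
  (: items that satisfy the predicate :)
  array { filter($s?*, $is-verb) },
  (: items that do not :)
  array { filter($s?*, function($w) { not($is-verb($w)) }) }
)
```

This returns `[ [ "know", "VBP" ], [ "appear", "VB" ] ]` followed by `[ [ "A", "DT" ], [ "bride", "NN" ] ]`.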
       

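A programmer might choose to represent the filter results as a map instead of a pair of arrays, using the key true() for the items that satisfy the predicate and false() for those that do not, as shown in the following code.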

declare function local:filter($s as item()*, $p as function(item()) as xs:boolean)
{
  (: a map constructor needs the "map" keyword in XQuery 3.1 :)
  map {
    true() : array { $s[$p(.)] },   
    false() : array { $s[not($p(.))] }
  }
};


let $f := function($a) { starts-with($a(2), "VB") }
return
  local:filter($s?*, $f)
      

Here is the output of the above query using the same data.

{ 

  "true" : 
             [ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ],
               ["is", "VBZ" ], [ "finery", "VB" ] ],

  "false" :  

             [ [ "A", "DT" ], [ "bride", "NN" ], [ ",", "," ], 
               [ "you", "PRP" ], [ ",", "," ], [ "must", "MD" ], 
               [ "like", "IN" ], [ "a", "DT" ], [ "bride", "NN" ], 
               [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], 
               [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT" ],
               [ "for", "IN" ], [ "simplicity", "NN" ], [ ";", ":" ],
               [ "a", "DT" ], [ "simple", "JJ" ], [ "style", "NN" ], 
               [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], 
               [ "infinitely", "RB" ], [ "preferable", "JJ" ], 
               [ "to", "TO" ], [ ".", "." ] ] 
}
      

3.7 Comparing Sequences in Optical Character Recognition

When Rigaudon optical character recognition software is used for multilingual texts, languages are identified by character set where possible, and the output uses the hOCR format. For instance, the text "the other possible derivation from ἡ ἐπιοῦσα, dies crastinus", which contains English, Greek, and Latin, might be represented as follows in raw OCR output (the format is simplified somewhat for the sake of presentation).

<span class="ocr_word" title="bbox 1388 430 1461 474">the</span> 
<span class="ocr_word" title="bbox 1514 433 1635 476">other</span>
<span class="ocr_word" title="bbox 133 498 317 554">pcssible</span> 
<span class="ocr_word" title="bbox 354 498 590 541">derivation</span> 
<span class="ocr_word" title="bbox 631 497 738 538">from</span> 
<span class="ocr_word" title="bbox 772 495 799 547" lang="grc" xml:lang="grc">ἡ</span> 
<span class="ocr_word" title="bbox 835 495 1019 538" lang="grc" xml:lang="grc">ἐπιοῦσα</span> 
<span class="ocr_word" title="bbox 134 567 220 607">dies</span> 
<span class="ocr_word" title="bbox 257 566 462 607">erastinus</span>
    

In the above output, two words were not correctly recognized: the English word "possible" (read as "pcssible") and the Latin word "crastinus" (read as "erastinus"). Rigaudon uses multilingual spell checkers to find the nearest likely word in one of the languages expected in a given text. For this particular text, we expect to find English, Greek, and Latin.

In this use case, we take the above hOCR as input and call a spellcheck function, implemented as an external function (ext:sc), to identify which words are likely in each candidate language. Having done so, we combine the results to construct the most likely text.

The following function extracts the text from the above data.

declare function local:extract-text($spans)
{
  for $s in $spans return string($s)
};
    

Here is the output of the function for the data shown above.

"the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus"
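In practice, the spans must first be selected from the hOCR page. The following sketch uses an inline excerpt of the data shown earlier; hOCR pages are HTML documents, so if a page is parsed as XHTML its span elements are in the XHTML namespace and a wildcard such as *:span would be needed. Besides the text, it keeps each word's language and the bounding box from the title attribute; the "unknown" marker for words without a language is a hypothetical convention:

```xquery
let $page :=
  <div>
    <span class="ocr_word" title="bbox 631 497 738 538">from</span>
    <span class="ocr_word" title="bbox 772 495 799 547" lang="grc" xml:lang="grc">ἡ</span>
  </div>
for $span in $page//span[@class = "ocr_word"]
return map {
  "text" : string($span),
  (: words without an xml:lang attribute get the hypothetical marker "unknown" :)
  "lang" : string(($span/@xml:lang, "unknown")[1]),
  (: "bbox x0 y0 x1 y1" becomes an array of four integers :)
  "bbox" : array { tokenize($span/@title, " ")[position() gt 1] ! xs:integer(.) }
}
```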
    

The following function performs a spellcheck in each of a set of languages, creating a map that holds the list of languages, the original ("raw") words, and, for each language, an array with one spellcheck result per word.

declare variable $languages := ("English", "Greek", "Latin");

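declare function local:spellcheck($languages, $text)
{
  (: map:merge takes a single sequence of maps, hence the double parentheses :)
  map:merge((
     map { "languages" : $languages },
     map { "raw" : array { $text } },

     for $l in $languages
     return map {
       (: ext:sc is the external spellcheck function; it returns the empty
          sequence (JSON null) for a word it cannot match. Wrapping each result
          in [ ] keeps exactly one array member per word, so positions line up :)
       $l : array:join(
         for $w in $text
         return [ ext:sc($l, $w) ]
       )
     }
  ))
};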

let $t := local:extract-text($spans)
return local:spellcheck($languages, $t)
    

Here is the output of the above query.

{ 
   "languages" : ( "English", "Greek", "Latin" ), 
   "raw" :     [ "the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus" ], 
   "English" : [ "the", "other", "possible", "derivation", "from", null, null, "dies", null ], 
   "Greek" :   [ null, null, null, null, null, "ἡ", "ἐπιοῦσα", null, null ],
   "Latin" :   [ null, null, null, null, null, null, null, "dies", "crastinus" ]
}
    

The following function merges lookup results in the above format. The first parameter lists a set of languages, in preference order. For each word, the function picks the non-null lookup result for the most preferred language available, or the original "raw" word if all lookups return null. In this code, we assume that $m is bound to the data structure shown above.

declare variable $languages := ("English", "Greek", "Latin");

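declare function local:merge($languages, $m)
{
  for $i in 1 to array:size($m("raw"))
  (: a null lookup result is the empty sequence, so it drops out of the list,
     leaving the most preferred non-null result (or the raw word) first :)
  let $candidates := ($languages ! $m(.)($i), $m("raw")($i))
  return $candidates[1]
};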

local:merge($languages, $m)
    

Here is the result of the query:

the other possible derivation from ἡ ἐπιοῦσα dies crastinus
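The merge step can be checked without the external spellchecker by writing a small lookup map by hand. In this sketch, each JSON null is written as the empty sequence () inside a square array constructor, which keeps it as a member so that positions still line up; the three-word map is an illustrative excerpt:

```xquery
declare function local:merge($languages as xs:string*, $m as map(*)) as xs:string*
{
  for $i in 1 to array:size($m("raw"))
  (: empty members (nulls) contribute nothing, so [1] is the first real match :)
  return ($languages ! $m(.)($i), $m("raw")($i))[1]
};

let $m := map {
  "languages" : ("English", "Greek", "Latin"),
  "raw"       : [ "pcssible", "ἡ", "erastinus" ],
  "English"   : [ "possible", (), () ],
  "Greek"     : [ (), "ἡ", () ],
  "Latin"     : [ (), (), "crastinus" ]
}
return local:merge($m("languages"), $m)
```

This returns `"possible", "ἡ", "crastinus"`.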

3.8 Transforms for Graphics

This use case uses rotation matrices to rotate a shape in three dimensions.

The following functions implement three-dimensional rotation in XQuery:

(: math:sin and math:cos expect angles in radians :)
declare function local:rotate-x( $theta )
{
   [
     [ 1, 0, 0 ],
     [ 0, math:cos($theta), - math:sin($theta) ],
     [ 0, math:sin($theta), math:cos($theta) ]
   ]
}; 

declare function local:rotate-y( $theta )
{
   [
     [ math:cos($theta), 0, math:sin($theta) ],
     [ 0, 1, 0],
     [ - math:sin($theta), 0, math:cos($theta) ]
   ]
}; 

declare function local:rotate-z( $theta )
{
   [
     [ math:cos($theta), - math:sin($theta), 0 ],
     [ math:sin($theta), math:cos($theta), 0 ],
     [ 0, 0, 1]
   ]
}; 

declare function local:rotate($pitch as xs:double, $yaw as xs:double, $roll as xs:double)
{
   let $p := local:rotate-x($pitch)
   let $y := local:rotate-y($yaw)
   let $r := local:rotate-z($roll)
   let $py :=local:mult($p, $y)
   return local:mult($py, $r)
};

declare function local:mult( $matrix1, $matrix2 )
{
  (: an m*n matrix can only be multiplied by an n*p matrix :)
  if (array:size($matrix1(1)) ne array:size($matrix2))
  then error((), "Matrices must be m*n and n*p to multiply!")
  else array {
     for $i in 1 to array:size($matrix1)            (: rows of the result :)
     return array {
         for $j in 1 to array:size($matrix2(1))     (: columns of the result :)
         return
            sum (
               for $k in 1 to array:size($matrix2)
               return $matrix1($i)($k) * $matrix2($k)($j)
            )
     }
  }
};

let $rect := [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0], [0, 0, 0]]
(: each point is treated as a 1*3 matrix, i.e. a row vector :)
let $rot := for $r in $rect?*
            return local:mult([ $r ], local:rotate( 10, 10, 10 ))
return img:render( $rot )
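As a quick sanity check of the rotation matrices, rotating the point (1, 0, 0) by a quarter turn (π/2 radians) about the z axis should give (0, 1, 0). The sketch below treats the point as a column vector and uses a hypothetical helper, local:apply, rather than local:mult:

```xquery
declare function local:rotate-z( $theta )
{
   [
     [ math:cos($theta), - math:sin($theta), 0 ],
     [ math:sin($theta), math:cos($theta), 0 ],
     [ 0, 0, 1 ]
   ]
};

(: hypothetical helper: multiplies a 3*3 matrix by a point given as a column vector :)
declare function local:apply( $matrix, $point )
{
  array {
    for $row in $matrix?*
    return sum(for $k in 1 to 3 return $row($k) * $point($k))
  }
};

(: rounding hides floating-point noise, e.g. cos(pi/2) is about 6.1E-17, not 0 :)
local:apply(local:rotate-z(math:pi() div 2), [ 1, 0, 0 ])?* ! round(., 10)
```

After rounding, the query returns 0, 1, 0.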
                        
        

3.9 JSON

JSON has become an important data format that many XQuery and XSLT users have to deal with. Typical tasks include importing JSON, processing it, and exporting JSON.

3.9.1 Information Retrieval

Import a JSON document and retrieve the mobile phone number from it.

The fn:parse-json() function parses a JSON document into an XDM value as follows:

  1. A JSON object is converted into a map of type map(xs:string, item()?).

  2. A JSON array is converted into an array of type array(item()?).

  3. A JSON string is converted into an xs:string atomic value.

  4. A JSON number is converted into an xs:double atomic value.

  5. A JSON boolean is converted into an xs:boolean atomic value.

  6. A JSON null is converted into the empty sequence.
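The conversion rules can be observed directly with fn:parse-json. In this sketch each line of the returned sequence tests one rule against a small, illustrative JSON text; all seven tests return true:

```xquery
let $v := parse-json('{ "a" : [ 1, "two", true, null ] }')
return (
  $v instance of map(xs:string, item()?),  (: object  -> map :)
  $v("a") instance of array(item()?),      (: array   -> array :)
  $v("a")(1) instance of xs:double,        (: number  -> xs:double :)
  $v("a")(2) instance of xs:string,        (: string  -> xs:string :)
  $v("a")(3) instance of xs:boolean,       (: boolean -> xs:boolean :)
  empty($v("a")(4)),                       (: null    -> empty sequence :)
  array:size($v("a")) eq 4                 (: the null still occupies a member :)
)
```

The fn:json-doc() function used in the following examples applies the same rules to a JSON resource identified by a URI.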

3.9.1.1 Input Data

The JSON document, mildred.json:

{
     "firstname": "Mildred",
     "lastname": "Moore",
     "age": 32,
     "address":
     {
         "street": "91 High Street",
         "town": "Biscester",
         "county": "Oxfordshire",
         "postcode": "OX6 3PD"
     },
     "phone":
     [
         {
           "type": "home",
           "number": "01869 378073"
         },
         {
           "type": "mobile",
           "number": "07356 740756"
         }
     ]
}
3.9.1.2 Solution in XQuery 3.1
        json-doc("mildred.json")?phone?*[?type = 'mobile']?number
3.9.1.3 Result
"07356 740756"

3.9.2 Converting JSON to XML

Convert a JSON data file to XML.

3.9.2.1 Input Data

The JSON document, employees.json:

{ "accounting" : [
      { "firstName" : "John",
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary",
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],
  "sales"     : [
      { "firstName" : "Sally",
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}
3.9.2.2 Result
<department name="accounting">
  <employee>
    <firstName>John</firstName>
    <lastName>Doe</lastName>
    <age>23</age>
  </employee>
  <employee>
    <firstName>Mary</firstName>
    <lastName>Smith</lastName>
    <age>32</age>
  </employee>
</department>
<department name="sales">
  <employee>
    <firstName>Sally</firstName>
    <lastName>Green</lastName>
    <age>27</age>
  </employee>
  <employee>
    <firstName>Jim</firstName>
    <lastName>Galley</lastName>
    <age>41</age>
  </employee>
</department>
3.9.2.3 Solution in XQuery 3.1
let $input := json-doc('employees.json')
for $k in map:keys($input)
return
  <department name="{ $k }">
    {
    let $array := $input($k)
    for $i in 1 to array:size($array)
    let $emp := $array($i)
    return
      <employee>
        <firstName>{ $emp('firstName') }</firstName>
        <lastName>{ $emp('lastName') }</lastName>
        <age>{ $emp('age') }</age>
      </employee>
    }
  </department>

3.9.3 Update by Copying

Update the first name of the author "Dan Suciu" to "John" in the "bookinfo.json" document.

3.9.3.1 Input Data

The JSON document, bookinfo.json:

{
    "book": {
        "title": "Data on the Web",
        "year": 2000,
        "author": [
            {
                "last": "Abiteboul",
                "first": "Serge"
            },
            {
                "last": "Buneman",
                "first": "Peter"
            },
            {
                "last": "Suciu",
                "first": "Dan"
            }
        ],
        "publisher": "Morgan Kaufmann Publishers",
        "price": 39.95
    }
}
3.9.3.2 Solution in XQuery 3.1
declare function local:deep-put($input as item()*, $key as xs:string, $value as item()*) as item()* 
{                                                             
  let $mf := function($k, $v) {
                if ($k eq $key) 
                then map{$k : $value} 
                else map{$k : local:deep-put($v, $key, $value)} 
             }
  for $i in $input 
  return
    if ($i instance of map(*))
    then map:merge(map:for-each($i, $mf))
    else if ($i instance of array(*))
    then array{ local:deep-put($i?*, $key, $value) }
    else $i
};

local:deep-put(json-doc("bookinfo.json"), "first", "John")

Note:

Extending the Update Facility to allow updating maps would allow a simpler solution.
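Note that local:deep-put replaces the value of every "first" key it finds, so it also changes the first names of Serge Abiteboul and Peter Buneman. A sketch that changes only the author "Dan Suciu", rebuilding just the maps on the path to that author with map:put:

```xquery
let $doc := json-doc("bookinfo.json")
let $book := $doc("book")
let $authors := array {
  for $a in $book("author")?*
  (: only the matching author gets a new "first" entry :)
  return if ($a("last") = "Suciu" and $a("first") = "Dan")
         then map:put($a, "first", "John")
         else $a
}
return map:put($doc, "book", map:put($book, "author", $authors))
```

Because maps are immutable, map:put returns modified copies; the other authors' maps are reused unchanged.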

3.9.4 Joins

3.9.4.1 Input Data

The following queries are based on a social media site that allows users to interact with their friends. collection("users") contains data on users and their friends:

{
  "name" : "Sarah",
  "age" : 13,
  "gender" : "female",
  "friends" : [ "Jim", "Mary", "Jennifer"]
}

{
  "name" : "Jim",
  "age" : 13,
  "gender" : "male",
  "friends" : [ "Sarah" ]
}
          
3.9.4.2 Solution in XQuery 3.1

The following query performs a join on Sarah's friend list to return the object (map) representing each of her friends:

for $sarah in collection("users"),
    $friend in collection("users")
where $sarah("name") = "Sarah"
  and $friend("name") = $sarah("friends")?*
return $friend 
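The same join can also be written with predicates and the lookup operator, without a second for clause. A sketch that uses the two user objects above as an in-memory sequence in place of collection("users"):

```xquery
(: the two user objects shown above, as XQuery maps :)
let $users := (
  map { "name" : "Sarah", "age" : 13, "gender" : "female",
        "friends" : [ "Jim", "Mary", "Jennifer" ] },
  map { "name" : "Jim", "age" : 13, "gender" : "male",
        "friends" : [ "Sarah" ] }
)
let $sarah := $users[?name = "Sarah"]
(: general comparison: true if the name equals any member of the friends array :)
return $users[?name = $sarah?friends?*]
```

With this data, only Jim's map is returned, since Mary and Jennifer have no user records.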

3.9.5 Grouping Queries for JSON

Note:

These queries are based on similar queries in the XQuery 3.0 Use Cases.

3.9.5.1 Input Data

The input is a sequence (whose order is of no concern) that contains the following sales data, represented here in JSON notation:

{ "product" : "broiler", "store number" : 1, "quantity" : 20  },
{ "product" : "toaster", "store number" : 2, "quantity" : 100 },
{ "product" : "toaster", "store number" : 2, "quantity" : 50 },
{ "product" : "toaster", "store number" : 3, "quantity" : 50 },
{ "product" : "blender", "store number" : 3, "quantity" : 100 },
{ "product" : "blender", "store number" : 3, "quantity" : 150 },
{ "product" : "socks", "store number" : 1, "quantity" : 500 },
{ "product" : "socks", "store number" : 2, "quantity" : 10 },
{ "product" : "shirt", "store number" : 3, "quantity" : 10 }

We want to group sales by product, across stores.

3.9.5.2 Result
{
  "blender" : 250,
  "broiler" : 20,
  "shirt" : 10,
  "socks" : 510,
  "toaster" : 200
  }       
3.9.5.3 Solution in XQuery 3.1

We assume a function collection("sales") that returns a sequence of maps, one for each row of the sales data shown above.

Query:

map:merge((
  for $sales in collection("sales")
  let $pname := $sales("product")
  group by $pname
  return map { $pname : sum(for $s in $sales return $s("quantity")) }
))    
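Since XQuery 3.0, a group by clause can bind the grouping key directly, which shortens the query slightly. A self-contained sketch with the sales data above written as an in-memory sequence of maps instead of collection("sales"):

```xquery
let $sales := (
  map { "product" : "broiler", "store number" : 1, "quantity" : 20 },
  map { "product" : "toaster", "store number" : 2, "quantity" : 100 },
  map { "product" : "toaster", "store number" : 2, "quantity" : 50 },
  map { "product" : "toaster", "store number" : 3, "quantity" : 50 },
  map { "product" : "blender", "store number" : 3, "quantity" : 100 },
  map { "product" : "blender", "store number" : 3, "quantity" : 150 },
  map { "product" : "socks", "store number" : 1, "quantity" : 500 },
  map { "product" : "socks", "store number" : 2, "quantity" : 10 },
  map { "product" : "shirt", "store number" : 3, "quantity" : 10 }
)
return map:merge(
  for $s in $sales
  (: bind and group in one step; afterwards $s is the sequence of one product's sales :)
  group by $product := $s?product
  return map { $product : sum($s?quantity) }
)
```

The resulting map has the same entries as the result shown in 3.9.5.2.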

3.9.6 More Complex Grouping Queries for JSON

Now let's do a more complex grouping query, showing sales by category within each state. We need further data to describe the categories of products and the location of stores.

3.9.6.1 Input Data

collection("products") contains the following data:

{ "name" : "broiler", "category" : "kitchen", "price" : 100, "cost" : 70 },
{ "name" : "toaster", "category" : "kitchen", "price" : 30, "cost" : 10 },
{ "name" : "blender", "category" : "kitchen", "price" : 50, "cost" : 25 },
{ "name" : "socks", "category" : "clothes", "price" : 5, "cost" : 2 },
{ "name" : "shirt", "category" : "clothes", "price" : 10, "cost" : 3 }

collection("stores") contains the following data:

{ "store number" : 1, "state" : "CA" },
{ "store number" : 2, "state" : "CA" },
{ "store number" : 3, "state" : "MA" },
{ "store number" : 4, "state" : "MA" }
        
3.9.6.2 Result
            [
              { "CA" : 
                [
                  {"kitchen" : { "broiler" : 20, "toaster" : 150 }},
                  {"clothes" : { "socks" : 510 }}
                ]
              },
              { "MA" : 
                [ 
                  { "kitchen" : { "blender" : 250, "toaster" : 50 }},
                  { "clothes" : { "shirt" : 10 }}
                ]
              }
            ]
        
3.9.6.3 Solution in XQuery 3.1

The following query groups by state, then by category, then lists individual products and the sales associated with each.

Query:

array {
  for $store in json-doc('stores.json') ? *
  let $state := $store?state
  group by $state
  return
    map {
      $state :  array {
        for $product in json-doc('products.json') ? *
        let $category := $product?category
        group by $category
        return
          map {
            $category :  map:merge((
              for $sales in json-doc('sales.json') ? *
              where $sales?("store number") = $store?("store number") and $sales?product = $product?name
              let $pname := $sales?product
              group by $pname
              return map { $pname :  sum(for $s in $sales return $s?quantity)}
            ))
          }
      }
   }
}
        

3.9.7 JSON to JSON Transformations

The following query takes satellite data and summarizes which satellites are visible. The data for the query is a simplified version of a Stellarium file that contains this information.

3.9.7.1 Input Data
{
  "creator" : "Satellites plugin version 0.6.4",
  "satellites" : {
    "AAU CUBESAT" : {
      "tle1" : "1 27846U 03031G 10322.04074654  .00000056  00000-0  45693-4 0  8768",
      "visible" : false
    },
    "AJISAI (EGS)" : {
      "tle1" : "1 16908U 86061A 10321.84797408 -.00000083  00000-0  10000-3 0  3696",
      "visible" : true
    },
    "AKARI (ASTRO-F)" : {
      "tle1" : "1 28939U 06005A 10321.96319841  .00000176  00000-0  48808-4 0  4294",
      "visible" : true
    }
  }
}

We want to query this data to return a summary that looks like this.

3.9.7.2 Result
{
  "visible" : [
     "AJISAI (EGS)",
     "AKARI (ASTRO-F)"
  ],
  "invisible" : [
     "AAU CUBESAT"
  ]
}       
3.9.7.3 Solution in XQuery 3.1

The following is a query that returns the desired result.

Query:

let $sats := json-doc("satellites.json")("satellites")
return map {
  "visible" : array {
     map:keys($sats)[$sats(.)("visible")]
  },
  "invisible" : array {
     map:keys($sats)[not($sats(.)("visible"))]
  }
}
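The order of the keys returned by map:keys() is implementation-dependent, so the satellite names may come back in any order. If a stable order matters, the keys can be sorted with fn:sort; a sketch using an in-memory version of the satellites map, with the tle1 entries omitted:

```xquery
let $sats := map {
  "AAU CUBESAT"     : map { "visible" : false() },
  "AJISAI (EGS)"    : map { "visible" : true() },
  "AKARI (ASTRO-F)" : map { "visible" : true() }
}
return map {
  (: fn:sort returns the names in alphabetical order :)
  "visible"   : array { sort(map:keys($sats)[$sats(.)("visible")]) },
  "invisible" : array { sort(map:keys($sats)[not($sats(.)("visible"))]) }
}
```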

3.9.8 Converting XML to JSON

JSON programmers frequently need to convert XML to JSON. The following query is based on a Wikipedia XML export format, using data from the category "Origami". Here is an excerpt of this data:

3.9.8.1 Input Data
<mediawiki>
  <siteinfo>
    <sitename>Wikipedia</sitename>

    <page>
      <title>Kawasaki's theorem</title>
      <id>14511776</id>
      <revision>
        <id>435519187</id>
        <timestamp>2011-06-21T20:08:56Z</timestamp>
        <contributor>
          <username>Some jerk on the Internet</username>
          <id>6636894</id>
        </contributor>

!!! SNIP !!!

    <page>
      <title>Origami techniques</title>
      <id>193590</id>
      <revision>
        <id>447687387</id>
        <timestamp>2011-08-31T17:21:49Z</timestamp>
        <contributor>
          <username>Dmcq</username>
          <id>3784322</id>
        </contributor>

!!! SNIP !!!

    <page>
      <title>Mathematics of paper folding</title>
      <id>232840</id>
      <revision>
        <id>440970828</id>
        <timestamp>2011-07-23T09:10:42Z</timestamp>
        <contributor>
          <username>Tabletop</username>
          <id>173687</id>
        </contributor>
       
3.9.8.2 Result
[
 {
  "title" : "Kawasaki's theorem",
  "id" : "14511776",
  "timestamp" : "2011-06-21T20:08:56Z",
  "authors" : ["Some jerk on the Internet" ]
 },
 {
  "title" : "Origami techniques",
  "id" : "193590",
  "timestamp" : "2011-08-31T17:21:49Z",
  "authors" : ["Dmcq" ]
 },
 {
  "title" : "Mathematics of paper folding",
  "id" : "232840",
  "timestamp" : "2011-07-23T09:10:42Z",
  "authors" : ["Tabletop" ]
 }
]
          
3.9.8.3 Solution in XQuery 3.1

The following query converts this data to JSON:

Query:

array {
 for $page in doc("Wikipedia-Origami.xml")//page
 return map {
  "title": string($page/title),
  "id" : string($page/id),
  "timestamp" : string($page/revision[1]/timestamp),
  "authors" : array {
       for $a in $page/revision/contributor/username
       return string($a)
  }
 }
}          
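The query above returns XDM maps and arrays; to obtain JSON text, for example to send it to a client, the result can be passed to fn:serialize with the JSON output method. A sketch for a single page from the excerpt, written inline:

```xquery
let $page :=
  <page>
    <title>Origami techniques</title>
    <id>193590</id>
    <revision>
      <timestamp>2011-08-31T17:21:49Z</timestamp>
      <contributor><username>Dmcq</username></contributor>
    </revision>
  </page>
let $json := map {
  "title"     : string($page/title),
  "id"        : string($page/id),
  "timestamp" : string($page/revision[1]/timestamp),
  "authors"   : array { $page/revision/contributor/username ! string(.) }
}
(: the JSON output method writes maps as objects and arrays as arrays :)
return serialize($json, map { "method" : "json", "indent" : true() })
```

The order of the keys in the serialized object is implementation-dependent.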

3.9.9 Transforming JSON to SVG

Suppose a JavaScript implementation provides an interface for queries, and a JavaScript program contains the following data [1]:

3.9.9.1 Input Data
var data = {
   "color" : "blue",
   "closed" : true,
   "points" : [[10,10], [20,10], [20,20], [10,20]]
   };
          
3.9.9.2 Solution in XQuery 3.1

This data can be converted to SVG by placing the text of a query in a JavaScript variable and calling the appropriate JavaScript function to invoke the query:

// a template literal lets the query span several lines and contain double quotes
var query = `
  declare variable $data external;
  declare variable $stroke := attribute stroke { $data("color") };
  declare variable $points := attribute points { $data("points")?*?* };
  if ($data("closed")) then
    <svg><polygon>{ $stroke, $points }</polygon></svg>
  else
    <svg><polyline>{ $stroke, $points }</polyline></svg>`;

This query can be invoked with a JavaScript API call:

jsoniq(data, query)
          

Here is the result of the above query:

<svg><polygon stroke="blue" points="10 10 20 10 20 20 10 20" /></svg>
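To try the query without a JavaScript host, the data can be declared directly as an XQuery map. A sketch, assuming the same variable names as the JavaScript example:

```xquery
(: the JavaScript object, written as an XQuery map; JSON true becomes true() :)
declare variable $data := map {
  "color"  : "blue",
  "closed" : true(),
  "points" : [ [10, 10], [20, 10], [20, 20], [10, 20] ]
};
declare variable $stroke := attribute stroke { $data("color") };
(: ?*?* flattens the nested arrays into the sequence 10 10 20 10 ... :)
declare variable $points := attribute points { $data("points")?*?* };

if ($data("closed"))
then <svg><polygon>{ $stroke, $points }</polygon></svg>
else <svg><polyline>{ $stroke, $points }</polyline></svg>
```

It returns the same polygon as the JavaScript version.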

3.9.10 Transforming Arrays to HTML Tables

The data in a JSON array is frequently displayed using HTML tables. The following query shows how to transform such an array into an HTML table.

3.9.10.1 Input Data

The following object contains the labels desired for columns and rows, as well as the data for the table.

{
  "col labels" : ["singular", "plural"],
  "row labels" : ["1p", "2p", "3p"],
  "data" :
     [
        ["spinne", "spinnen"],
        ["spinnst", "spinnt"],
        ["spinnt", "spinnen"]
     ]
}
3.9.10.2 Solution in XQuery 3.1

The following query creates an HTML table, using the column headings and row labels as well as the data in the object shown above.

<html>
  <body>

    <table>
      <tr>
         {  (: Column headings :)
            <th> </th>,
            for $th in json-doc("table.json")("col labels")?*
            return <th>{ $th }</th>
         }
      </tr>
      {  (: Data for each row :)
         for $r at $i in json-doc("table.json")("data")?*
         return
            <tr>
             {
               <th>{ json-doc("table.json")("row labels")($i) }</th>,
               for $c in $r?*
               return <td>{ $c }</td>
             }
            </tr>
      }
    </table>

  </body>
</html>    

3.9.11 Windowing Queries

XQuery provides support for both sliding windows and tumbling windows, which are frequently used to analyze event streams or other sequential data. This simple windowing example uses a tumbling window to convert a sequence of items to a table with three columns, using as many rows as necessary.

3.9.11.1 Input Data
[
  { "color" : "Green" },
  { "color" : "Pink" },
  { "color" : "Lilac" },
  { "color" : "Turquoise" },
  { "color" : "Peach" },
  { "color" : "Opal" },
  { "color" : "Champagne" }
]
	  
3.9.11.2 Result

Result:

<table>
  <tr>
    <td>Green</td>
    <td>Pink</td>
    <td>Lilac</td>
  </tr>
  <tr>
    <td>Turquoise</td>
    <td>Peach</td>
    <td>Opal</td>
  </tr>
  <tr>
    <td>Champagne</td>
  </tr>
</table>
	  
3.9.11.3 Solution in XQuery 3.1

Query:

<table>{
  for tumbling window $w in json-doc("colors.json")?*
    start at $x when fn:true()
    end at $y when $y - $x = 2
  return
    <tr>{
      for $i in $w
      return
        <td>{ $i?color }</td>
    }</tr>
}</table>
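A variation takes the number of columns as a parameter and numbers each row with the XQuery 3.0 count clause. The function name local:rows and the n attribute are hypothetical, and the colors are given as plain strings rather than as maps:

```xquery
(: returns one <tr> per group of $n items, numbered from 1 :)
declare function local:rows($items as item()*, $n as xs:integer) as element(tr)*
{
  for tumbling window $w in $items
    start at $x when true()
    end at $y when $y - $x eq $n - 1
  (: the count clause numbers the windows in order :)
  count $row
  return <tr n="{ $row }">{ $w ! <td>{ . }</td> }</tr>
};

let $colors := ("Green", "Pink", "Lilac", "Turquoise", "Peach", "Opal", "Champagne")
return <table>{ local:rows($colors, 3) }</table>
```

This produces three rows, numbered 1 to 3; the last row contains only Champagne.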
	  

3.9.12 JSON views in middleware

This example assumes a middleware system that presents relational tables as JSON arrays. The following two tables are used as sample data.

3.9.12.1 Input Data
Users
userid  firstname  lastname
W0342   Walter     Denisovich
M0535   Mick       Goulish

The JSON representation this particular implementation provides for the above table looks like this:

[
  { "userid" : "W0342", "firstname" : "Walter", "lastname" : "Denisovich" },
  { "userid" : "M0535", "firstname" : "Mick", "lastname" : "Goulish" }
]       
Holdings
userid  ticker  shares
W0342   DIS     153212312
M0535   DIS     10
M0535   AIG     23412

The JSON representation this particular implementation provides for the above table looks like this:

[
  { "userid" : "W0342", "ticker" : "DIS", "shares" : 153212312 },
  { "userid" : "M0535", "ticker" : "DIS", "shares" : 10 },
  { "userid" : "M0535", "ticker" : "AIG", "shares" : 23412 }
]       
3.9.12.2 Solution in XQuery 3.1

The following query uses the fictitious vendor's vendor:table() function to retrieve the rows of a table, and creates a map (a JSON object) for each user, with an array of the user's holdings as the value of its "holdings" entry.

array {
  for $u in vendor:table("Users")?*
  order by $u("userid")
  return map {
    "userid" : $u("userid"),
    "first" :  $u("firstname"),
    "last" :   $u("lastname"),
    "holdings" : array {
         for $h in vendor:table("Holdings")?*
         where $h("userid") = $u("userid")
         order by $h("ticker")
         return map {
            "ticker" : $h("ticker"),
            "share" : $h("shares")
         }
    }
  }
}       

3.9.13 In-Place Updates

The XQuery Update Facility allows XML data to be updated. These use cases explore what it means to update JSON in the same way. They are based on use cases for JSONiq's updating functions.

Suppose an application receives an order that contains a credit card number, and needs to put the user on probation.

3.9.13.1 Input Data

Data for an order:

{
  "user" : "Deadbeat Jim",
  "credit card" : "VISA 4111 1111 1111 1111",
  "product" : "lottery tickets",
  "quantity" : 243
}
        

collection("users") contains the data for each individual user:

{
  "name" : "Deadbeat Jim",
  "address" : "1 E 161st St, Bronx, NY 10451",
  "risk tolerance" : "high"
}
        
3.9.13.2 Solution in XQuery 3.1 with Updates

The following query adds the pair "status" : "credit card declined" to the user's record.

let $dbj := collection("users")[ .("name") = "Deadbeat Jim" ]
return insert map { "status" : "credit card declined" } into $dbj
        

After the update is finished, the user's record looks like this:

{
  "name" : "Deadbeat Jim",
  "address" : "1 E 161st St, Bronx, NY 10451",
  "status" : "credit card declined",
  "risk tolerance" : "high"
}
        

3.9.14 Data Transformations

Many applications need to modify data before forwarding it to another destination. The XQuery Update Facility provides an expression called a transform expression that can be used to create modified copies: it applies updating expressions to a copy of the data, leaving the original unchanged.

3.9.14.1 Input Data

Suppose an application makes videos available using feeds from YouTube. The following data comes from one such feed:

{
    "encoding" : "UTF-8",
    "feed" : {
        "author" : [
            {
                "name" : {
                    "$t" : "YouTube"
                },
                "uri" : {
                    "$t" : "http://www.youtube.com/"
                }
            }
        ],
        "category" : [
            {
                "scheme" : "http://schemas.google.com/g/2005#kind",
                "term" : "http://gdata.youtube.com/schemas/2007#video"
            }
        ],
        "entry" : [
            {
                "app$control" : {
                    "yt$state" : {
                        "$t" : "Syndication of this video was restricted by its owner.",
                        "name" : "restricted",
                        "reasonCode" : "limitedSyndication"
                    }
                },
                "author" : [
                    {
                        "name" : {
                            "$t" : "beyonceVEVO"
                        },
                        "uri" : {
                            "$t" : "http://gdata.youtube.com/feeds/api/users/beyoncevevo"
                        }
                    }
                ]
!!! SNIP !!!         
3.9.14.2 Solution in XQuery 3.1

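The following query creates a modified copy of the feed that omits every entry whose syndication is restricted.

let $feed := json-doc("incoming.json")
let $entries := $feed?feed?entry?*
(: keep only entries whose yt$state name is not "restricted" :)
let $allowed := $entries[not(?("app$control")?("yt$state")?name = "restricted")]
return
  map:put($feed, "feed",
    map:put($feed?feed, "entry", array { $allowed }))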
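The intended result of this transformation can be checked in Python on a small hypothetical sample. The helpers `is_restricted` and `drop_restricted` and the two-entry `sample` feed are illustrative assumptions, since the real feed above is abbreviated. The sketch copies the input and keeps only the entries whose `app$control`/`yt$state`/`name` is not `"restricted"`. Entries without that path are treated as unrestricted.

```python
import copy


def is_restricted(entry):
    """True if the entry's app$control/yt$state/name equals "restricted"."""
    state = entry.get("app$control", {}).get("yt$state", {})
    return state.get("name") == "restricted"


def drop_restricted(feed):
    """Return a deep copy of feed without restricted entries in feed["feed"]["entry"]."""
    result = copy.deepcopy(feed)  # leave the input untouched
    inner = result.get("feed")
    if isinstance(inner, dict) and "entry" in inner:
        inner["entry"] = [e for e in inner["entry"] if not is_restricted(e)]
    return result


# Hypothetical two-entry sample; the real feed shown above is abbreviated.
sample = {
    "encoding": "UTF-8",
    "feed": {
        "entry": [
            {"app$control": {"yt$state": {"name": "restricted",
                                          "reasonCode": "limitedSyndication"}}},
            {"placeholder": "hypothetical unrestricted entry"},
        ]
    },
}

filtered = drop_restricted(sample)
print(len(filtered["feed"]["entry"]))  # 1
```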

A References

RFC 2119
S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF RFC 2119. See http://www.ietf.org/rfc/rfc2119.txt.
XQuery 3.0
XQuery 3.0: An XML Query Language, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 08 April 2014. This version is https://www.w3.org/TR/2014/REC-xquery-30-20140408/. The latest version is available at https://www.w3.org/TR/xquery-30/.
XQuery and XPath Data Model 3.1
XQuery and XPath Data Model (XDM) 3.1, Norman Walsh, John Snelson, Andrew Coleman, Editors. World Wide Web Consortium, 13 December 2016. This version is https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/. The latest version is available at https://www.w3.org/TR/xpath-datamodel-31/.
XPath 3.1
XML Path Language (XPath) 3.1, Jonathan Robie, Michael Dyck and Josh Spiegel, Editors. World Wide Web Consortium, 13 December 2016. This version is https://www.w3.org/TR/2016/CR-xpath-31-20161213/. The latest version is available at https://www.w3.org/TR/xpath-31/.
XQuery 3.1
XQuery 3.1: An XML Query Language, Jonathan Robie, Michael Dyck and Josh Spiegel, Editors. World Wide Web Consortium, 13 December 2016. This version is https://www.w3.org/TR/2016/CR-xquery-31-20161213/. The latest version is available at https://www.w3.org/TR/xquery-31/.
XSLT 3.0
XSL Transformations (XSLT) Version 3.0, Michael Kay, Editor. World Wide Web Consortium, 19 November 2015. This version is https://www.w3.org/TR/2015/CR-xslt-30-20151119/. The latest version is available at https://www.w3.org/TR/xslt-30/.
XQuery Update Facility 3.0
XQuery Update Facility 3.0, John Snelson, Editor. World Wide Web Consortium, 19 February 2015. This version is https://www.w3.org/TR/2015/WD-xquery-update-30-20150219/. The latest version is available at https://www.w3.org/TR/xquery-update-30/.
XQuery and XPath Full Text 3.0
XQuery and XPath Full Text 3.0, Mary Holstege, Jim Melton, Editors. World Wide Web Consortium, 24 November 2015. This version is https://www.w3.org/TR/2015/REC-xpath-full-text-30-20151124/. The latest version is available at https://www.w3.org/TR/xpath-full-text-30/.
JSONiq
Jonathan Robie, Matthias Brantner, Daniela Florescu, Ghislain Fourny, Till Westmann. JSONiq: XQuery for JSON, JSON for XQuery. See http://www.jsoniq.org/docs/JSONiqExtensionToXQuery/pdf/Language_Specification-0.4.42-JSONiq-en-US.pdf.
JSONiq Use Cases
Jonathan Robie, Matthias Brantner, Daniela Florescu, Ghislain Fourny, Till Westmann. JSONiq Use Cases. See http://www.jsoniq.org/docs/JSONiq-usecases/html-single/.

End Notes

[1]

This example is based on an example on Stefan Goessner's JSONT site (http://goessner.net/articles/jsont/).