The Tree Structure of XML Queries

Jonathan Robie, Software AG
Last modified: 15 October, 1999
First draft: 13 September, 1999

The structure of XML is fundamentally tree oriented. This document explores relationships found in the tree structures of XML, and derives XML Query Language requirements from this structure. We will see that supporting the tree structure of XML makes a query language very expressive, capable of combining hierarchy, sequence, position, and text in powerful ways. A query language for XML should be able to take advantage of this tree structure to express conditions that combine hierarchy and sequence, and it should be possible to preserve the containment and order found in the original document, even when they are not explicitly expressed in the query.

An XML document can be modeled as an ordered, labeled tree, with a document node serving as the root node. Without the document node, an XML document may be modeled as an ordered, labeled forest, containing only one root element, but also containing the XML declaration, the doctype declaration, and perhaps comments or processing instructions at the root level. Either forest-based or tree-based models for XML documents may be used consistently. In many contexts, it is useful to be able to apply a query to a set or list of document nodes, so a forest-based model may be more appropriate. Links allow more general graph relationships to be expressed among the subtrees found in documents. The relationships expressed with links are vital to the structure of XML documents, and queries that make use of links need to be supported.

The underlying tree structure for an XML document corresponds to a context free grammar; specifically, the context free grammar specified by the DTD or schema. If there is no DTD or schema, one may be constructed to describe any document. If an XML document is a tree structure corresponding to a grammar, it comes as no surprise that the relationships found in an XML document are similar to those encountered in transformational linguistics. Radford lists the following fundamental relationships that may be found among nodes in a parse tree:

The first two relationships involve hierarchy, the second two involve sequence. Hierarchy and sequence form two axes which may be used to locate any node in a document.

Hierarchy

The parent/child relationship and the ancestor/descendant relationship are both important in XML documents. It is important to be able to distinguish them. For instance, consider the following data:

<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>
    </Section>
</Section>

A query for the Titles that are children of Section elements at the root level returns only one Title, the one belonging to the outer-level Section. In particular, it must not return Title elements that belong to Section elements nested within the outermost Section.
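The distinction can be sketched in Python. This is only an illustration, assuming the standard library's `xml.etree.ElementTree` and an abbreviated copy of the data above; it is not the query language this paper argues for.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the nested Section data shown above.
doc = ET.fromstring(
    "<Section><Title>Basic Relationships</Title>"
    "<Para>...</Para>"
    "<Section><Title>Dominates</Title><Para>...</Para></Section>"
    "</Section>"
)

# Parent/child: only Title elements that are direct children of the root Section.
child_titles = [t.text for t in doc.findall("./Title")]

# Ancestor/descendant: every Title anywhere in the subtree, in document order.
descendant_titles = [t.text for t in doc.findall(".//Title")]

print(child_titles)
print(descendant_titles)
```

The first list holds only the outer Title; the second also includes the Title of the nested Section.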

The ability to query for all descendants is also important. Consider a query to find all the tables in a chapter. In the data shown below, this is equivalent to a query for all Table elements in the subtree dominated by Chapter; that is, all Table elements that are descendants of Chapter:

<Chapter><Title>Chapter One</Title>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Section One</Title> 
      <Table><Title>Table Two</Title>...</Table>
    </Section>
</Chapter>

It is often important to maintain containment relationships (and sequence) in results even when they are not expressed explicitly in the query. Suppose we wanted to create a table of contents for the above data. A query might be used to find all the Chapter and Section elements. Since a table of contents generally reflects both the hierarchy and the sequence of items, Section elements should be found in the correct Chapter, and in the order found in the original document. If Section elements nest, the query can not know ahead of time how many levels will be found in the Section hierarchy, so the number of levels can not be expressed in the query itself. It must be possible to preserve relative hierarchy found in the document even when there is no prior knowledge of this hierarchy.
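A minimal Python sketch of such a table of contents follows. It assumes `xml.etree.ElementTree`, and it adds a hypothetical nested Section ("Section One A") to the data so that more than one level of nesting is visible. The function records relative depth without knowing in advance how deep the nesting goes.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Chapter><Title>Chapter One</Title>"
    "<Table><Title>Table One</Title></Table>"
    "<Section><Title>Section One</Title>"
    "<Table><Title>Table Two</Title></Table>"
    "<Section><Title>Section One A</Title></Section>"  # hypothetical nested Section
    "</Section>"
    "</Chapter>"
)

WANTED = {"Chapter", "Section"}

def outline(node, depth=0):
    """Yield (depth, title) for each wanted element, in document order."""
    if node.tag in WANTED:
        yield depth, node.findtext("Title")
        depth += 1
    for child in node:
        yield from outline(child, depth)

toc = list(outline(doc))
for depth, title in toc:
    print("  " * depth + title)
```

Depth here counts only the selected ancestors, so intervening elements that were not asked for do not add levels to the result.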

An XML query language needs to be able to express both parent/child and ancestor/descendant relationships. It should be able to maintain relative hierarchy when returning results. A set of representative queries involving hierarchy may be found in the Appendix.

Sequence

The sequence in which data occurs can often be meaningful, especially in document data or semistructured data that embeds structured data within a narrative. Consider the following data:

<Directions> 
    <Title> Washing Hair </Title>
    <Sequence>
        <Step> Wet Hair </Step>
        <Step> Lather </Step>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
    </Sequence>
</Directions>

An XML query language should be able to use "before" and "after" conditions. For instance, in this data a query language should be able to list the Step elements that occur before the Step with the value "Lather" or list the Step elements that occur after the Step with the value "Lather".
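As a rough illustration of "before" and "after" conditions, the following Python sketch assumes `xml.etree.ElementTree`, whose `iter()` method visits elements in document order. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Directions><Title> Washing Hair </Title><Sequence>"
    "<Step> Wet Hair </Step><Step> Lather </Step>"
    "<Step> Rinse </Step><Step> Repeat </Step>"
    "</Sequence></Directions>"
)

# iter() walks the tree in document order, so list indexes encode sequence.
steps = list(doc.iter("Step"))
pivot = next(i for i, s in enumerate(steps) if s.text.strip() == "Lather")

before = [s.text.strip() for s in steps[:pivot]]
after = [s.text.strip() for s in steps[pivot + 1:]]

print(before)
print(after)
```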

One reason that sequence is important is that written language is sequential, and semi-structured data may combine data with natural language descriptions, as in the following excerpt from an example used in the HL7 Healthcare Initiative (supplied to me by Tom Lincoln):

<section><section.title>Procedure</section.title> 
  The patient was taken to the operating room where she was placed in supine position and 
  <Anesthesia>induced under general anesthesia.</Anesthesia> 
  <Prep>
  <action>Foley catheter was placed to decompress the bladder</action> 
  and the abdomen was then prepped and draped in sterile fashion. 
  </Prep>
  <paragraph /> 
  <Incision>
  A curvilinear incision was made 
  <Geography>in the midline immediately infraumbilical</Geography> 
  and the subcutaneous tissue was divided 
  <Instrument>using electrocautery.</Instrument> 
  </Incision>
  The fascia was identified and 
  <action>#2 0 Maxon stay sutures were placed on each side of the midline.</action> 
  <Incision>
  The fascia was divided using 
  <Instrument>electrocautery</Instrument> 
  and the peritoneum was entered. 
  </Incision>
  <Observation>The small bowel was identified</Observation> 
  and 
  <action>
  the 
  <Instrument>Hasson trocar</Instrument> 
  was placed under direct visualization. 
  </action>
  <action>
  The 
  <Instrument>trocar</Instrument> 
  was secured to the fascia using the stay sutures 
  </action>

A query that returns the action elements for this surgical procedure needs to return them in the correct sequence, since sequence is part of the meaning of the document. For instance, in the above data, the meaning of an Observation or action element is dependent on the Incision that precedes it, since what may be observed or done depends on the incisions that have been made. It may be relevant to ask what Incision was performed just prior to an action, and what Observations were made between the Incision and the action. These queries involve the "precedes" or "follows" relationship. It may also be useful to ask what text or element occurs immediately after the incision. This requires the "immediately precedes" relationship. As Beech, Malhotra, and Rys point out in "A Formal Data Model and Algebra for XML", it is important to be able to ignore intervening whitespace for the purpose of "immediately precedes", since the answer to this question would generally be a region of whitespace according to the XML Infoset, which contains information that should be disregarded when evaluating query conditions. It would also be reasonable to consider allowing full regular expressions to express conditions on sequences -- in fact, it may be more important and more natural to use regular expressions to express conditions on sequences than on paths.
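The question "what Incision was performed just prior to an action" can be sketched in Python as a single pass in document order. This assumes `xml.etree.ElementTree` and a heavily abbreviated version of the excerpt; `preceding_incisions` is a hypothetical helper, and it treats an Incision as preceding anything that starts after its start tag.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the procedure excerpt above; most text is omitted.
doc = ET.fromstring(
    "<section><section.title>Procedure</section.title>"
    "<Prep><action>Foley catheter was placed</action></Prep>"
    "<Incision>A curvilinear incision was made</Incision>"
    "<action>stay sutures were placed</action>"
    "<Incision>The fascia was divided</Incision>"
    "<Observation>The small bowel was identified</Observation>"
    "<action>the Hasson trocar was placed</action>"
    "</section>"
)

def preceding_incisions(root):
    """Pair each action with the closest Incision before it in document order."""
    pairs = []
    last_incision = None
    for node in root.iter():  # pre-order traversal, i.e. document order
        if node.tag == "Incision":
            last_incision = node
        elif node.tag == "action":
            prior = last_incision.text if last_incision is not None else None
            pairs.append((node.text, prior))
    return pairs

pairs = preceding_incisions(doc)
for action, incision in pairs:
    print(action, "<-", incision)
```

The first action has no preceding Incision, which is itself meaningful: it happened before any incision was made.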

Note that the above markup includes elements nested within other elements, and the text flow can be directly read by starting at the beginning and ignoring the tree structures defined in this document. The XML and SGML communities use the term "document order" to define a pre-order traversal of the document tree. Document order corresponds to the order in which a document's text is rendered when no transformations are performed, e.g. when rendering XML documents with CSS. Since individual subtrees of a document may reside in various data stores, and documents may contain multiple versions, the design of a query system that can impose this order on an entire document may not always be straightforward, but it is an important part of the semantics of a document.
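Document order can be seen directly in a short Python sketch, assuming `xml.etree.ElementTree` and one Incision from the excerpt above. Both `iter()` and `itertext()` perform a pre-order traversal.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Incision>A curvilinear incision was made "
    "<Geography>in the midline immediately infraumbilical</Geography> "
    "and the subcutaneous tissue was divided "
    "<Instrument>using electrocautery.</Instrument>"
    "</Incision>"
)

# Tags visited in pre-order: a parent first, then its children left to right.
tags = [node.tag for node in doc.iter()]

# itertext() yields text in the same order, which recovers the narrative.
text = "".join(doc.itertext())

print(tags)
print(text)
```

Joining the text in this order reproduces the sentence as a reader would see it, with the markup ignored.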

Another reason for supporting sequence is that XML schemas are context free grammars, and the productions in these grammars involve sequence. If a schema is to be modified, it is useful to be able to search existing documents for instances that will no longer be valid when a given production is changed. The ability to use full regular expressions would also be helpful for this scenario.

An XML query language should be able to express conditions on the relative order of nodes (but not for attributes, which do not have a defined order). Full regular expressions for sequences are worth considering. An XML query language should also be able to return results in document order.

Position

Absolute position can also be useful in queries. For instance, consider the set of directions from earlier in this document:

<Directions> 
    <Title> Washing Hair </Title>
    <Sequence>
        <Step> Wet Hair </Step>
        <Step> Lather </Step>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
    </Sequence>
</Directions>

It can be useful to be able to ask for the first Step, the last Step, etc.
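For illustration, positional access can be sketched in Python with `xml.etree.ElementTree`, whose limited path syntax accepts a 1-based index and `last()` as predicates. This is an assumption about one toolkit, not a statement about any particular XML query language.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Directions><Title> Washing Hair </Title><Sequence>"
    "<Step> Wet Hair </Step><Step> Lather </Step>"
    "<Step> Rinse </Step><Step> Repeat </Step>"
    "</Sequence></Directions>"
)

sequence = doc.find("Sequence")
first = sequence.find("Step[1]")       # positions are 1-based in this syntax
last = sequence.find("Step[last()]")

print(first.text.strip())
print(last.text.strip())
```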

Another application in which position is useful is parsing HTML documents or other documents that contain tables where there is no semantic markup. For instance, consider the following table, which reports scores for some fictitious sport:

Western League
Aardvarks 12     Weasels 10
Mosquitos 17     Slugs 2
Southern League
Tortoises 25     Hares 0
Platypii 17      Amoebae 16

Here is the HTML source for the above table:

<TABLE WIDTH="80%" BORDER="1">
   <TBODY> 
      <TR> 
         <TD COLSPAN="2"><B>Western League</B> </TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Aardvarks 12</TD> 
         <TD>Weasels 10</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Mosquitos 17</TD> 
         <TD>Slugs 2</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="2"><B>Southern League</B></TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Tortoises 25</TD> 
         <TD>Hares 0</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Platypii 17</TD> 
         <TD>Amoebae 16</TD> 
      </TR> 
   </TBODY> 
</TABLE>  

The semantics of this table are embedded in the position of the information. A row that contains only one column is a heading. A row that contains two columns has the scores for a game, with the winner in the first column, and the loser in the second column. Queries can take advantage of this fact to search for winners and losers in the table.
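The positional reading of the table can be sketched in Python as follows, assuming `xml.etree.ElementTree` and that the HTML is well-formed enough to parse as XML, as it is here. The tuple layout is an illustrative choice.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring(
    '<TABLE WIDTH="80%" BORDER="1"><TBODY>'
    '<TR><TD COLSPAN="2"><B>Western League</B></TD></TR>'
    '<TR><TD COLSPAN="1">Aardvarks 12</TD><TD>Weasels 10</TD></TR>'
    '<TR><TD COLSPAN="1">Mosquitos 17</TD><TD>Slugs 2</TD></TR>'
    '<TR><TD COLSPAN="2"><B>Southern League</B></TD></TR>'
    '<TR><TD COLSPAN="1">Tortoises 25</TD><TD>Hares 0</TD></TR>'
    '<TR><TD COLSPAN="1">Platypii 17</TD><TD>Amoebae 16</TD></TR>'
    '</TBODY></TABLE>'
)

games = []
league = None
for row in table.iter("TR"):          # rows arrive in document order
    cells = row.findall("TD")
    if len(cells) == 1:               # one column: a league heading
        league = "".join(cells[0].itertext()).strip()
    elif len(cells) == 2:             # two columns: winner, then loser
        winner, loser = (c.text.strip() for c in cells)
        games.append((league, winner, loser))

for game in games:
    print(game)
```

Note that the league for each game is recovered from sequence: it is the most recent heading row before the game row.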

Position can also be a simple way to perform cardinality tests. For instance, in the HL7 data shown in the previous section, one way to search for Procedure elements containing more than one Incision is to seek the second Incision in a Procedure.
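A small Python sketch of this cardinality test follows. The Record and Procedure elements and their `id` attributes are hypothetical data invented for the illustration, and `has_second` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two Procedure elements with different Incision counts.
doc = ET.fromstring(
    "<Record>"
    "<Procedure id='p1'><Incision/><Incision/></Procedure>"
    "<Procedure id='p2'><Incision/></Procedure>"
    "</Record>"
)

def has_second(procedure, tag):
    """True if a second `tag` element exists among the descendants."""
    found = procedure.iter(tag)
    next(found, None)              # skip the first match, if any
    return next(found, None) is not None

multi = [p.get("id") for p in doc.iter("Procedure") if has_second(p, "Incision")]
print(multi)
```

Asking for the second match lets the search stop early instead of counting every Incision.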

An XML query language should be able to express position.

Top Level Nested Structures

When dealing with nested structures, such as tables that can nest within tables, it may be important to be able to find the top level structure, no matter where it occurs within other markup. For instance, suppose each of the tables in the following excerpt also contained tables:

<Chapter><Title>Chapter One</Title>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Section One</Title> 
      <Table><Title>Table Two</Title>...</Table>
    </Section>
</Chapter>

In some queries, we may well want to return all tables, including the tables nested within others. In other queries, however, we may want only the top level structure.
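The two cases can be contrasted in a short Python sketch, assuming `xml.etree.ElementTree`. The nested "Table Three" is hypothetical data added for the illustration, and `top_level` is a hypothetical helper that stops descending once it finds a match.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: Table Two contains a nested Table Three.
doc = ET.fromstring(
    "<Chapter><Title>Chapter One</Title>"
    "<Table><Title>Table One</Title></Table>"
    "<Section><Title>Section One</Title>"
    "<Table><Title>Table Two</Title>"
    "<Table><Title>Table Three</Title></Table>"
    "</Table>"
    "</Section>"
    "</Chapter>"
)

def top_level(node, tag):
    """Yield `tag` elements below node that are not inside another `tag`."""
    for child in node:
        if child.tag == tag:
            yield child            # do not descend: anything below is nested
        else:
            yield from top_level(child, tag)

all_tables = [t.findtext("Title") for t in doc.iter("Table")]
top_tables = [t.findtext("Title") for t in top_level(doc, "Table")]

print(all_tables)
print(top_tables)
```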

An XML query language should allow either to be specified.

Combining Hierarchy, Sequence, Position, and Text

By combining conditions on hierarchy, sequence, and position, we can be very precise about complex relationships in the tree. For instance, consider the medical data given above. Here are some queries that combine hierarchy, sequence, and position:

These queries require that conditions based on sequence and position may be applied to the descendants of a node. For instance, Instrument is not a direct child of Procedure, so the second query in the list refers to the second Instrument in the descendant list of Procedure.
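The difference between position among siblings and position in the descendant list can be sketched in Python, assuming `xml.etree.ElementTree` and an abbreviated version of the medical data.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the medical data; surrounding text is omitted.
doc = ET.fromstring(
    "<section><section.title>Procedure</section.title>"
    "<Incision><Instrument>using electrocautery.</Instrument></Incision>"
    "<Incision><Instrument>electrocautery</Instrument></Incision>"
    "<action><Instrument>Hasson trocar</Instrument></action>"
    "</section>"
)

# Position per parent: an Instrument that is the second one under its own
# parent. No parent here has two, so nothing matches.
per_parent = doc.findall(".//Instrument[2]")

# Position in the descendant list: flatten in document order, then index.
second = list(doc.iter("Instrument"))[1]

print(per_parent)
print(second.text)
```

Only the second form answers a question such as "the second Instrument in the procedure", because the Instrument elements sit under different parents.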

Many XML documents are very text-centric, and many systems that store XML contain full-text engines. In markup that intermixes text and structure, it can be extremely helpful to be able to mix full text conditions with structured conditions to express a sequence of text patterns and structures.

An XML query language should allow flexible combination of conditions based on hierarchy, sequence, and position, and should allow conditions on text to be expressed in sequences.

Links and Trees

Links allow more general graph relationships to be expressed among the subtrees found in documents. Some query languages for XML do not distinguish between a "refers-to" relationship and a parent/child relationship. This seems inconsistent with the way XML documents are generally used and understood. For instance, suppose the medical procedure shown above included links to other operations performed on the same patient. A query that searches for Incision elements within the procedure should not return Incision elements from other operations to which the procedure refers. To me, this is a very important distinction.

The semantic distinction between containment and reference is also reflected in the way that a document is managed. The nodes contained within a document are part of that document, and will be deleted if that document is deleted. Nodes contained within documents to which a given document refers are not part of that document, and will not be deleted if that document is deleted. Similarly, deleting a node deletes the subtree associated with it, but not nodes to which it refers.

Both XPath and XQL use an explicit function to dereference links. One advantage of this approach, though not exploited in these languages, is that it allows parameters to be supplied to the function to specify how the links should be dereferenced. For instance, parameters could specify what to do if a link can not be dereferenced, that a link should be dereferenced recursively to a particular depth, that only links within the document should be dereferenced, etc. I feel that this kind of flexibility and power is needed to be able to handle links well in a variety of environments.
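The idea of a parameterized dereferencing function can be sketched in Python. Everything here is hypothetical: the `deref` helper, its `depth` and `on_missing` parameters, the Link element, and the `id` and `ref` attribute names are assumptions made for the illustration and do not correspond to XPath or XQL functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: Link elements refer to other Procedure elements by id.
doc = ET.fromstring(
    "<Record>"
    "<Procedure id='p1'><Incision>first</Incision><Link ref='p2'/></Procedure>"
    "<Procedure id='p2'><Incision>second</Incision><Link ref='p9'/></Procedure>"
    "</Record>"
)

def deref(link, root, depth=1, on_missing="skip"):
    """Follow a Link up to `depth` levels; hypothetical helper."""
    if depth < 1:
        return []
    ref = link.get("ref")
    target = root.find(f".//*[@id='{ref}']")
    if target is None:
        if on_missing == "error":
            raise LookupError(ref)
        return []                  # "skip": silently ignore dangling links
    found = [target]
    for nested in target.iter("Link"):
        found.extend(deref(nested, root, depth - 1, on_missing))
    return found

p1 = doc.find("Procedure")
contained = [i.text for i in p1.iter("Incision")]
referenced = [i.text for t in deref(p1.find("Link"), doc) for i in t.iter("Incision")]

print(contained)
print(referenced)
```

Searching within the procedure returns only the contained Incision; the Incision in the referenced procedure appears only when the link is dereferenced explicitly.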

In an XML query language, there must be a way to use the relationships expressed by links as part of a query. Reference should be distinguished from containment. Explicit functions are one useful way to dereference links.

Joins in Trees

I believe the need for joins in an XML query language has been well demonstrated. Joins can be integrated into a tree view of a document in a variety of ways. One way is shown in "A Formal Data Model and Algebra for XML" by Beech, Malhotra, and Rys. In their model, references introduce virtual, directional edges from nodes in one collection to nodes in another collection. This approach is consistent with a tree structured approach to XML queries, and also similar to the approach taken by XQL. In XQL, however, a join does not establish a reference; it adds nodes to a virtual tree, specifying where the joined nodes should be logically merged into the left-hand tree.

An XML query language must have some form of join that is consistent with a tree structured approach to XML queries.

Flattening

In XML Query Languages: Experiences and Exemplars, flattening is used to retrieve one result for every title/author pair in a document. This is an extremely powerful and useful transformation, and since the same information is frequently found repeated in data-oriented XML documents, it is important to support this in an XML query language. However, if a book has multiple authors, one might want to be able to create an identity for a title and the set of authors who wrote that book. Skolem functions based on a straight tuple of single values can not distinguish four books with the same title written by different authors from one book with four authors.
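The identity problem can be made concrete with a small Python sketch. The bibliography data and element names below are hypothetical, chosen only to show the difference between flat pairs and a key that includes the whole author set.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography: two different books share the same title.
doc = ET.fromstring(
    "<bib>"
    "<book><title>XML</title><author>A</author><author>B</author></book>"
    "<book><title>XML</title><author>C</author></book>"
    "</bib>"
)

# Flattening: one (title, author) tuple per pair.
pairs = [
    (book.findtext("title"), author.text)
    for book in doc.iter("book")
    for author in book.iter("author")
]

# An identity built from the title plus the set of authors keeps books apart.
books = {
    (book.findtext("title"), frozenset(a.text for a in book.iter("author")))
    for book in doc.iter("book")
}

print(pairs)
print(len(books))
```

From the flat pairs alone it is impossible to tell whether there is one book with three authors or several books; the set-based key preserves that distinction.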

An XML query language should have a means of flattening nodes that share values.

Returning and Constructing Results

A query model for XML must address both the manner in which nodes are selected from one or more documents and how results are constructed from the selected nodes. A variety of approaches are possible. One approach is abstraction from the original documents, maintaining the hierarchy and order of the original. Another approach is to construct results from variables bound during selection. This approach is very flexible and powerful, but means that hierarchy and sequence must be reconstructed as part of the query instead of using the hierarchy and sequence found in the original document. These approaches are not mutually exclusive; it is possible to abstract from the original document while providing a means for sorting, reordering, flattening, and other transformations. Similarly, if the default is to explicitly construct results, means for using the original hierarchy and sequence could be added.

In many environments, some means must also be provided for maintaining the identity of the nodes returned by a query. For instance, in an Internet search engine, it is important to be able to provide hyperlinks into the original document, and in a programming API like the DOM, a query should allow the original document nodes to be returned.
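Node identity in a programming API can be illustrated with a brief Python sketch. It assumes `xml.etree.ElementTree`, where a lookup returns the element object that lives in the tree rather than a copy; the `seen` attribute is a hypothetical marker.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Section><Title>Basic Relationships</Title><Para>...</Para></Section>"
)

title = doc.find("Title")

# The result is the very node held by the document, not a copy of it,
# so a change made through the result is visible in the document.
same_node = title is doc[0]
title.set("seen", "yes")

print(same_node)
print(doc[0].get("seen"))
```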

An XML query language must provide some means for returning nodes while maintaining the original hierarchy and sequence, but should also allow data oriented transformations. It must also provide a means for maintaining the identity of the nodes returned by a query.

Appendix A: Structure Preserving Queries

This appendix uses XQL to illustrate structure preserving queries that should be possible in an XML query language. Operations that transform structure or add new nodes will be explored in a future appendix.

Hierarchy

The queries in this section use the following data:

<!DOCTYPE Book PUBLIC "-//Foo//DTD Book 4.0//EN" "book.dtd">
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>

In XQL, a query returns the same results regardless of the environment, but the manner in which results are realized is dependent on the environment. In a DOM environment, a node can be returned directly as a DOM Node. In a search engine, a link to the Node could be returned. In a text-oriented environment, the results are wrapped in an <xql:result /> element, and all descendants of the node are included in the output, as shown below:

<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Note that the start and end tags of the top-level Section are bold to show that the Section node is the node actually returned by the query. In a DOM environment, this Section node would be returned as a single DOM Node, and in a hypertext environment the result of the query would be materialized as a link to this Section node. In our examples, we will show the serialized text representation, including the text belonging to the children of returned nodes, text which is added by the serialization process. The start and end tags of the nodes actually returned by the query are shown in bold.

Child of the Root Node

The / operator, when it occurs on the left, matches the root node, which corresponds roughly to the document itself. A well-formed XML document always has one element at the root level, the document element, which is a child of the root node. In addition, there may be doctype declarations, processing instructions, and comments at the root level of a document.

Query:
/
 Result:
<xql:result>
<!DOCTYPE Book PUBLIC "-//Foo//DTD Book 4.0//EN" "book.dtd">
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

A wildcard matches all elements. When applied at the root level, it matches the root element:

Query:
/*
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

If the above query is applied to one document, it will always return precisely one element, the document element. If it is applied to more than one document, it returns the document element for each document. If it is applied to a set of nodes, it returns the root element of the document to which each input node belongs.

Parent/Child:

XQL distinguishes the Parent/Child relationship from the Ancestor/Descendant relationship. The following query finds the Title of the top level Section; other Title elements contained within the Section are not returned:

Query:
/Section/Title
 Result:
<xql:result>
<Title>Basic Relationships</Title>
</xql:result>

Child/Parent:

To find the parent of a child node that meets certain criteria, XQL uses filters. The conditions for the child node are placed within a filter that applies to the parent.

Query:
/Section[Title="Basic Relationships"]
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Descendants of the Root

In XQL and XPath, the // operator performs a recursive descent of the document tree. If it occurs on the left-hand side of a path expression, it will descend through the entire document. The following query finds all the elements in the document:

Query:
//*
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Ancestor/Descendant:

The // operator can also be used to perform a recursive descent on each node in a set of nodes. For instance, the query "/Section" returns a set of one node, the Section node found at the root. The query "/Section//Section" finds all Section nodes contained within it.

Query:
/Section//Section
 Result:
<xql:result>
<Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
</Section>
</xql:result>

Descendant/Ancestor:

Conditions for a descendant may be placed in a filter to return ancestors of such a descendant. The following query finds all Section elements that contain a Table:

Query:
//Section[.//Table]
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Preserving Hierarchy in Results

Suppose a query is done to create a table of contents, and Section elements can nest to any level. A query may ask for all Section elements, and the results must be able to preserve hierarchy (and sequence), even though these are not specified in the query per se:

Query:
//Section
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title>
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Note that only the Section nodes are directly contained in the query results, and that they are properly nested.

Since XQL preserves hierarchy in query results, a query that returns Section elements and their Title children could be written as follows:

Query:
//Section | //Section/Title
 Result:
<xql:result>
<Section><Title>Basic Relationships</Title>
    <Para>Here are the basic tree relationships of transformational linguistics, 
  as described by Radford.</Para>
    <Table><Title>Table One</Title>...</Table>
    <Section><Title>Dominates</Title> 
      <Para>Node A dominates node B if node B is found in the subtree of which 
      node A is the root.</Para>    
            <Table><Title>Table Two</Title>...</Table>
             <Image><Caption>Image One</Caption>...</Image>
    </Section>
</Section>
</xql:result>

Position

XQL provides a subscript operator to allow absolute position to be specified. Consider the following data:

<Directions> 
    <Title> Washing Hair </Title>
    <Sequence>
        <Step> Wet Hair </Step>
        <Step> Lather </Step>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
    </Sequence>
</Directions>

In XQL, the first Step is Step[1]:

Query:
/Directions/Sequence/Step[1]
 Result:
<xql:result>
        <Step> Wet Hair </Step>
</xql:result>

Now consider an HTML table that contains scores from a sport:

<TABLE WIDTH="80%" BORDER="1">
   <TBODY> 
      <TR> 
         <TD COLSPAN="2"><B>Western League</B> </TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Aardvarks 12</TD> 
         <TD>Weasels 10</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Mosquitos 17</TD> 
         <TD>Slugs 2</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="2"><B>Southern League</B></TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Tortoises 25</TD> 
         <TD>Hares 0</TD> 
      </TR> 
      <TR> 
         <TD COLSPAN="1">Platypii 17</TD> 
         <TD>Amoebae 16</TD> 
      </TR> 
   </TBODY> 
</TABLE>  

To do a search for scores in which the Aardvarks beat the Weasels, we can look for rows in which Aardvarks are in the first column and Weasels are in the second column:

Query:
//TR [TD[1] contains "Aardvarks" and TD[2] contains "Weasels"]
 Result:
<xql:result>
      <TR> 
         <TD COLSPAN="1">Aardvarks 12</TD> 
         <TD>Weasels 10</TD> 
      </TR>
</xql:result> 

Sequence

XQL originally provided two operators for sequence, "precedes" and "immediately precedes". The current draft provides only "precedes". One of the main reasons for this involved ambiguity of the "immediately precedes" relationship in the presence of whitespace-only text nodes in DOM implementations.

Consider the following data:

<Directions> 
    <Title> Washing Hair </Title>
    <Sequence>
        <Step> Wet Hair </Step>
        <Step> Lather </Step>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
    </Sequence>
</Directions>

In XQL '99, the "before" and "after" operators are used to indicate sequence. The query "x before y" returns all x nodes which occur before some y node. The following query returns all steps that occur before the step that equals "Lather":

Query:
//(Step before Step="Lather")
 Result:
<xql:result>
        <Step> Wet Hair </Step>
</xql:result> 

Similarly, the query "x after y" returns all x nodes which occur after some y node. The following query returns all steps that occur after the step that equals "Lather":

Query:
//(Step after Step="Lather")
 Result:
<xql:result>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
</xql:result> 

Note that the sequence found in the original document is maintained in these results. It is important to be able to preserve document order. If the sequence of the above Steps were reversed, it would change the meaning of the results.

Combining Hierarchy, Sequence, and Position

The following queries illustrate the ability of XQL to combine hierarchy, sequence, and position, maintaining the relative hierarchy and sequence in the results. Actually, our sections on sequence and position have already illustrated this to some extent, using hierarchy together with sequence or hierarchy together with position. In this section, we will discuss slightly more sophisticated queries.

The following query finds the same data as the last example in the section on sequence, but adds the constraint that the Step elements must occur in a Directions element whose Title is "Washing Hair":

Query:
//Directions[Title="Washing Hair"]//(Step after Step="Lather")
 Result:
<xql:result>
        <Step> Rinse </Step>
        <Step> Repeat </Step>
</xql:result> 

In the main body of this paper, the section entitled "Combining Hierarchy, Sequence, Position, and Text" referred to the data for a surgical procedure, together with a series of queries. We will show these queries in XQL now. Here is the data:

<section><section.title>Procedure</section.title> 
  The patient was taken to the operating room where she was placed in supine position and 
  <Anesthesia>induced under general anesthesia.</Anesthesia> 
  <Prep>
  <action>Foley catheter was placed to decompress the bladder</action> 
  and the abdomen was then prepped and draped in sterile fashion. 
  </Prep>
  <paragraph /> 
  <Incision>
  A curvilinear incision was made 
  <Geography>in the midline immediately infraumbilical</Geography> 
  and the subcutaneous tissue was divided 
  <Instrument>using electrocautery.</Instrument> 
  </Incision>
  The fascia was identified and 
  <action>#2 0 Maxon stay sutures were placed on each side of the midline.</action> 
  <Incision>
  The fascia was divided using 
  <Instrument>electrocautery</Instrument> 
  and the peritoneum was entered. 
  </Incision>
  <Observation>The small bowel was identified</Observation> 
  and 
  <action>
  the 
  <Instrument>Hasson trocar</Instrument> 
  was placed under direct visualization. 
  </action>
  <action>
  The 
  <Instrument>trocar</Instrument> 
  was secured to the fascia using the stay sutures 
  </action>

Now let's examine the queries.

What Instrument elements occur as descendants of the second Incision in the Procedure?

Query:
//section[section.title="Procedure"]//Incision[2]/Instrument
 Result:
<xql:result>
<Instrument>electrocautery</Instrument>
</xql:result>

What are the first two Instruments that occur in the procedure?

This illustrates the need to be able to indicate position among things that may occur at different levels of hierarchy.

Query:
(//section[section.title="Procedure"]//Instrument)[1-2]
 Result:
<xql:result>
  <Instrument>using electrocautery.</Instrument>
  <Instrument>electrocautery</Instrument>
</xql:result>

What Instruments were used in the first two Actions after the second Incision?

Query:
(//action[1-2] after //Incision[2])//Instrument
 Result:
<xql:result>
  <Instrument>Hasson trocar</Instrument>
  <Instrument>trocar</Instrument>
</xql:result>

Show procedures where no Anesthesia element occurs before the first Incision.

Query:
//section[section.title="Procedure"][not(..//Anesthesia before ..//Incision)]
 Result:
<xql:result>
   <xql:null />
</xql:result>

What happened between the first Incision and the second Incision?

Query:
//section[section.title="Procedure"]//((* after Incision[1]) before Incision[2])
 Result:
<xql:result>
<action>#2 0 Maxon stay sutures were placed on each side of the midline.</action> 
</xql:result>
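
For comparison, the last question can be approximated in Python. This sketch assumes `xml.etree.ElementTree` and an abbreviated version of the procedure data, and it considers only elements that are siblings of the two Incision elements; it is not an implementation of the XQL operators.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the procedure data; most text is omitted.
doc = ET.fromstring(
    "<section><section.title>Procedure</section.title>"
    "<Incision>A curvilinear incision was made"
    "<Instrument>using electrocautery.</Instrument></Incision>"
    "<action>#2 0 Maxon stay sutures were placed</action>"
    "<Incision>The fascia was divided</Incision>"
    "<Observation>The small bowel was identified</Observation>"
    "</section>"
)

children = list(doc)               # child elements in document order
incisions = [i for i, c in enumerate(children) if c.tag == "Incision"]

# Everything strictly after the first Incision and before the second.
between = children[incisions[0] + 1 : incisions[1]]

print([node.tag for node in between])
```

The Instrument inside the first Incision is not reported, because it is contained in that Incision rather than following it.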