17600 – [XQ31ReqUC] Range predicates

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	NEW

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Requirements for Future Versions
Version:	Working drafts
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jim Melton
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2012-06-26 11:55 UTC by John Snelson
Modified:	2014-05-20 16:59 UTC
CC List:	2 users

Description John Snelson 2012-06-26 11:55:23 UTC

Users often need to select a subsequence of a given sequence, for instance, when paginating or sampling a sequence of data. Currently we have the fn:subsequence() function which can do this:

fn:subsequence($seq,1,10)

However, after using positional predicates, users are often surprised that the following intuitive syntax does not work:

$seq[1 to 10]

Indeed this is such a useful shorthand that MarkLogic already implements it as an extension to XQuery. I think it would be good to standardize this usage and make it available to other XQuery users.
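
For comparison, here is a minimal sketch of the same selection in standard XQuery, written once with fn:subsequence() and once with an explicit position() test. The sample sequence $seq is invented for illustration, and the "!" and "||" operators assume an XQuery 3.0 processor.

```xquery
xquery version "3.0";

(: $seq is illustrative sample data, not taken from this report :)
let $seq := (1 to 25) ! ("item-" || .)
return (
  (: first page of ten: both spellings select items 1..10 :)
  fn:deep-equal(fn:subsequence($seq, 1, 10),
                $seq[position() = 1 to 10]),
  (: second page of ten: start at 11, length 10 :)
  fn:deep-equal(fn:subsequence($seq, 11, 10),
                $seq[position() = 11 to 20])
)
```

Both comparisons should evaluate to true. Note that fn:subsequence() takes a start position and a length, whereas the predicate form takes a start and an end position.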

Comment 1 Michael Kay 2012-06-26 12:20:52 UTC

The proposed syntax was in an early draft of XQuery and was removed because the semantics are so convoluted. To retain orthogonality, we have to allow any sequence of integers to be used as a predicate, and then we get issues about what happens when the integers in this sequence are not monotonically increasing or contain duplicates, and messy problems if the predicate is supplied in a form such as [1 to @max] where the value of @max depends on the context item. It's not hard to write [position() = 1 to 10] and I suggest we leave it that way.

However, there's another form of range predicate that would be very useful, namely selecting all items in a sequence up to (and optionally including) the first one that satisfies some condition: something like

$sequence [[from|after] condition] [[to|before] condition]

including the possibility of

$sequence from position()=3 to position()=5

but more usefully things like

$sequence before self::h1

or

$sequence after self::h1 before self::h2
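
The "before" and "after" forms can be approximated today with helper functions. The following is only a sketch under stated assumptions: local:before and local:after are hypothetical names, the condition is passed as a function item (XQuery 3.0), and the behaviour when no item matches (return everything for "before", nothing for "after") is a guess, since the proposal does not specify it.

```xquery
xquery version "3.0";

(: Hypothetical helper: items strictly before the first one satisfying $cond.
   Assumption: if nothing matches, the whole sequence is returned. :)
declare function local:before(
  $seq  as item()*,
  $cond as function(item()) as xs:boolean
) as item()* {
  let $first := (for $s at $p in $seq where $cond($s) return $p)[1]
  return
    if (empty($first)) then $seq
    else fn:subsequence($seq, 1, $first - 1)
};

(: Hypothetical helper: items strictly after the first one satisfying $cond.
   Assumption: if nothing matches, the empty sequence is returned. :)
declare function local:after(
  $seq  as item()*,
  $cond as function(item()) as xs:boolean
) as item()* {
  let $first := (for $s at $p in $seq where $cond($s) return $p)[1]
  return
    if (empty($first)) then ()
    else fn:subsequence($seq, $first + 1)
};

(: illustrative sample data :)
let $body :=
  <body><p>intro</p><h1>Title</h1><p>one</p><p>two</p><h2>Sub</h2><p>three</p></body>
let $is-h1 := function($n as item()) as xs:boolean { exists($n/self::h1) }
let $is-h2 := function($n as item()) as xs:boolean { exists($n/self::h2) }
return
  (: emulates: $body/* after self::h1 before self::h2 :)
  local:before(local:after($body/*, $is-h1), $is-h2) ! string(.)
```

With this sample data the expression selects the two paragraphs between the h1 and the h2, so the result is the strings "one" and "two". The sketch applies "before" to the result of "after"; whether the proposed syntax would compose that way is not stated here.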

Comment 2 John Snelson 2012-06-26 13:18:06 UTC

Really, I don't know how you can call the semantics convoluted. You've given the perfect syntactic expansion yourself:

$seq[1 to 10] === $seq[position() = (1 to 10)]

There isn't a problem with sequences that have holes in them: the expansion still means the same thing.
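
As a small illustration of that expansion, the explicit form can already be run in standard XQuery with a sequence that has gaps, duplicates and is out of order. The sample data is invented for this sketch.

```xquery
xquery version "3.0";

(: illustrative sample data :)
let $seq := ("a", "b", "c", "d", "e", "f", "g")
return
  (: the general comparison "=" is true when position() equals ANY listed
     integer, so duplicates and ordering in the list make no difference :)
  $seq[position() = (1, 3, 3, 7, 2)]
```

This selects "a", "b", "c", "g", in the order of $seq rather than the order of the integer list, and each item at most once. Written as the bare predicate $seq[(1, 3, 3, 7, 2)], a conforming processor without the extension should instead raise a type error (err:FORG0006), because a sequence of several atomic values has no effective boolean value.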

Comment 3 Michael Kay 2012-06-26 13:26:14 UTC

>$seq[1 to 10] === $seq[position() = (1 to 10)]

For what class of predicates does this equivalence hold? Any predicate whose value is a sequence of integers? Does it have to be the same sequence of integers for every node in the sequence?

Or does it only apply to an expression written syntactically as a RangeExpr? That would be a horrible breach of orthogonality.

Comment 4 John Snelson 2012-06-26 13:27:43 UTC

> For what class of predicates does this equivalence hold? Any predicate whose
> value is a sequence of integers? Does it have to be the same sequence of
> integers for every node in the sequence?

For any sequence of numeric values.

> Or does it only apply to an expression written syntactically as a RangeExpr?
> That would be a horrible breach of orthogonality.

Agreed.

Comment 5 Michael Kay 2012-06-26 13:43:20 UTC

>For any sequence of numeric values.

Does it have to be a sequence of numeric values for every item in the input?

$seq[if (f(.)) then number(@min) to number(@max) else is-married(.)]

I think the overloading of predicates is bad enough already without introducing this.

Comment 6 John Snelson 2012-06-26 14:50:31 UTC

(In reply to comment #5)
> Does it have to be a sequence of numeric values for every item in the input?

No.

> $seq[if (f(.)) then number(@min) to number(@max) else is-married(.)]
> 
> I think the overloading of predicates is bad enough already without introducing
> this.

It might be slightly more complicated to evaluate, but it's actually extremely useful to users. The full semantics of a predicate expression "SEQ[PRED]" under this proposal could be expressed as follows:

for $s at $p in SEQ
where
  typeswitch($s)
  case op:numeric+ return $p = $s
  default return fn:boolean($s)
return $s

This is not a very big departure from the present semantics:

for $s at $p in SEQ
where
  typeswitch($s)
  case op:numeric return $p eq $s
  default return fn:boolean($s)
return $s

Comment 7 Michael Kay 2012-06-26 15:57:44 UTC

> for $s at $p in SEQ
> where
>   typeswitch($s)
>   case op:numeric+ return $p = $s
>   default return fn:boolean($s)
> return $s

Surely not? It depends on the type of PRED, rather than the type of $s. It must be something like this:

for $s at $p in SEQ
where
  typeswitch($s/(PRED))
  case op:numeric+ return $p = $s/(PRED)
  default return fn:boolean($s/(PRED))
return $s
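
The formulation above uses the pseudo-type op:numeric, so it cannot be run as written. A runnable approximation, as a sketch only: local:filter and local:is-numeric are hypothetical names, PRED is modelled as a function of the context item alone (so position() and last() inside PRED are not modelled), and "numeric" is spelled out as xs:decimal, xs:double or xs:float.

```xquery
xquery version "3.0";

(: Hypothetical helper: true for a non-empty sequence of numeric values :)
declare function local:is-numeric($v as item()*) as xs:boolean {
  exists($v) and
  (every $x in $v satisfies
     ($x instance of xs:decimal or
      $x instance of xs:double or
      $x instance of xs:float))
};

(: Hypothetical model of SEQ[PRED] under the proposal; $pred stands in for PRED :)
declare function local:filter(
  $seq  as item()*,
  $pred as function(item()) as item()*
) as item()* {
  for $s at $p in $seq
  let $v := $pred($s)
  where
    if (local:is-numeric($v)) then $p = $v   (: positional test against any value :)
    else fn:boolean($v)                      (: otherwise effective boolean value :)
  return $s
};

(: illustrative sample data :)
let $seq := ("a", "b", "c", "d", "e")
return (
  (: models $seq[2 to 4] :)
  local:filter($seq, function($s as item()) as item()* { 2 to 4 }),
  (: models an ordinary boolean predicate, $seq[. = ("a", "e")] :)
  local:filter($seq, function($s as item()) as item()* { $s = ("a", "e") })
)
```

The first call selects "b", "c", "d" and the second selects "a", "e". Because the predicate is evaluated per item, nothing in this model prevents it from yielding numbers for some items and booleans for others, which is the case discussed in comment 5.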

Comment 8 John Snelson 2012-06-26 16:00:54 UTC

(In reply to comment #7)
> Surely not? It depends on the type of PRED, rather than the type of $s. It must
> be something like this:
> 
> for $s at $p in SEQ
> where
>   typeswitch($s/(PRED))
>   case op:numeric+ return $p = $s/(PRED)
>   default return fn:boolean($s/(PRED))
> return $s

Err, yes - that was a mistake.

Comment 9 Jonathan Robie 2014-05-20 16:59:31 UTC

Assigning to future requirements per Working Group decision (https://lists.w3.org/Archives/Member/w3c-xsl-query/2012Oct/0087.html).