This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 11821 - [FT] Thesaurus option FTRange
Summary: [FT] Thesaurus option FTRange
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Proposed Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-01-20 10:33 UTC by Tim Mills
Modified: 2011-04-04 12:13 UTC

Description Tim Mills 2011-01-20 10:33:38 UTC
I apologize for bringing this up rather late in the day.

The rules:

[172] FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?

[157] FTRange ::= ("exactly" AdditiveExpr)
                | ("at" "least" AdditiveExpr)
                | ("at" "most" AdditiveExpr)
                | ("from" AdditiveExpr "to" AdditiveExpr)

permit queries such as:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content contains text "people" using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most ./content/@levels levels]

i.e. that the number of levels can be dynamic.

All other match options are static - either string literals or URI literals.

Why is this part of one match option dynamic?

The static nature of other match options makes life easier with regard to pre-indexing.
Comment 1 Tim Mills 2011-01-20 10:37:45 UTC
Indeed, the FTThesaurusOption is part of the static context. It seems unlikely that it should be allowed to depend on the dynamic context.
Comment 2 Michael Dyck 2011-01-21 08:56:29 UTC
[Personal response:]

Hm, yes, that's a problem. E.g., consider the query

    for $i in 1 to 5
    where $x contains text "cat"
        using thesaurus at "http://example.com/th" exactly $i levels
    return $i

Is it (meant to be) allowed, and does it mean what it appears to mean?
If so, the FTThesaurusOption can't be in the static context.
Comment 3 Christian Gruen 2011-02-05 23:30:43 UTC
True! I'd like to see the FTRange value restricted to a simple xs:integer as well. In our implementation, we decided to raise a parser error for dynamic level values.

Hope this helps,
Christian
Comment 4 Paul J. Lucas 2011-02-06 01:37:07 UTC
But what about allowing externally defined integer variables?  I think the simplest course of action is to make the evaluation implementation-defined and define *possible* error codes for implementations that do not support dynamic values.
Comment 5 Christian Gruen 2011-02-06 11:25:06 UTC
This might be a good solution; on the other hand, one could wonder why the URILiteral, and the operands of other match options such as FTStopWords, cannot be dynamic as well. Personally, I would vote for Tim's proposal - making the value static - as it looks like the most consistent solution to me.
Comment 6 Paul J. Lucas 2011-02-07 20:11:41 UTC
OK, I'll bite: why can't the operands of other match options be dynamic as well?

I suggest that in a large number of uses for XQuery, it won't be the only programming language used in some larger application. For example, if I have some web service that offers a thesaurus API to the world, the entire application is not written in XQuery, i.e., the web server isn't written in XQuery.

Instead, the web server (written in C or Java) will accept a request with parameters and ultimately call some function written in XQuery. Said function will need to be parameterized. Why then can't (for example) all URILiterals, instead of being literals, be allowed to be dynamic?

declare variable $thesaurus-uri as xs:string external;

(: ... :) $doc contains text $query using thesaurus at $thesaurus-uri

To achieve such functionality now, developers are going to do ugly hack substitution things like:

(: ... :) $doc contains text $query using thesaurus at %%tTHESAURUS_URI%%

then do substitutions on the source code, compile the XQuery on the fly, then execute it.  If what are currently all URILiterals were allowed to be dynamic, such queries could be pre-compiled or "prepared" just like SQL statements can be prepared by using '?'.
Comment 7 Christian Gruen 2011-02-07 20:58:11 UTC
True, at first glance; I have many scenarios in mind as well in which a more dynamic approach would offer additional possibilities. Judging by the feedback of our numerous existing XQFT users, however, most people would wish that the specs actually allowed *fewer* features.

IMO, the specification won't benefit from even more flexibility; if the existing complexity doesn't satisfy all users, there will be plenty of opportunities to extend the language in future versions. Last but not least, additional requests will delay the finalization of the Recommendation even more (as new test cases might become necessary, etc.) and threaten its wide application.

Just my two cents,
Christian
Comment 8 Paul J. Lucas 2011-02-07 21:20:53 UTC
The goal here (as far as I know) is NOT to do something for the benefit of the specification.  It's to do something for the benefit of the language.

You could say this is just a semantic argument, but I don't consider allowing expressions where previously only string literals were allowed to be another "feature": it's just making an existing feature more flexible.

While it's true that more tests might need to be written, such a change is completely backwards compatible: all existing examples and tests would still be valid.

As far as delaying the finalization of the recommendation: do you want it fast or right?
Comment 9 Liam R E Quin 2011-02-07 22:12:40 UTC
+1 to making it static for 1.0 but only if also allowing an implementation to support a dynamic value for compatibility (we can't make incompatible changes to a Proposed Recommendation really).

| As far as delaying the finalization of the recommendation: do you want it fast
| or right?

Paul - a substantive change at this point would send the spec back to Working Draft in W3C Process, so we'd be talking 18 months to 2 years. At this point new features (including improvements to existing features) are for the next release, the current development version, 3.0.

You could usefully make a new issue against 3.0 for making the URILiterals dynamic rather than static - it will actually get addressed sooner that way in practice.
Comment 10 Christian Gruen 2011-02-07 22:20:25 UTC
Well, it's exactly the language we're talking about, and its existing users:
regarding all XQFT use cases I've come across so far, I see no practical
benefit in making the existing static context properties dynamic.

Personally, I agree with Tim's proposal. I'll stop arguing, but I'll be glad to
hear about the opinions of other readers.

Christian
Comment 11 Paul J. Lucas 2011-02-08 08:34:21 UTC
The XQuery use-cases aren't real-world use-cases, by which I mean that they assume XQuery is the entire universe and don't consider that XQuery is merely a part of a solution.

Other than delaying the specification or speculating that some users might think it's too complicated, does anybody have any hard technical reasons why what are now URILiterals could not (not "should not") be dynamic strings?

However, if it has to wait for 3.0, fine.
Comment 12 Tim Mills 2011-02-08 08:45:54 UTC
Suppose an implementation of XQFT supports pre-indexing of documents, and has available at compile time a set of statically known full text indices.

By limiting match options to be part of the static context, the processor is able to determine statically at compile time whether an index may be used to accelerate the query when producing its execution plan.

Allowing dynamic values would mean that index selection would have to be delayed until the query is executed.
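
The trade-off described in this comment can be sketched roughly as follows. This is only an illustration in Python, not how any XQFT processor actually works: the index catalogue `KNOWN_INDEXES`, the `ThesaurusOption` class, and both planning functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalogue of pre-built full-text indexes, keyed by the
# thesaurus settings they were built with: (URI, relationship, levels).
KNOWN_INDEXES = {
    ("http://example.com/th", "NT", 2): "idx_th_nt_2",
}


@dataclass(frozen=True)
class ThesaurusOption:
    uri: str
    relationship: str
    levels: Optional[int]  # None models a value only known at run time


def plan_at_compile_time(opt: ThesaurusOption) -> str:
    """Choose an index while compiling, or defer if the option is dynamic."""
    if opt.levels is None:
        return "DEFERRED"
    key = (opt.uri, opt.relationship, opt.levels)
    return KNOWN_INDEXES.get(key, "FULL_SCAN")


def plan_at_run_time(opt: ThesaurusOption, levels: int) -> str:
    """Index selection that has to be repeated for each dynamic value."""
    key = (opt.uri, opt.relationship, levels)
    return KNOWN_INDEXES.get(key, "FULL_SCAN")
```

With a literal number of levels the plan is fixed once, before execution; with a dynamic value the lookup moves into the execution phase and may give a different answer on each evaluation.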
Comment 13 Pat Case 2011-02-08 14:19:54 UTC
I support restricting the FTRange value to a simple xs:integer. None of the users I represent need or want dynamic match options.
Comment 14 Paul J. Lucas 2011-02-08 15:24:28 UTC
If the static options are indeed constants, then of course an implementation is free to optimize using that knowledge. This is no different than any other compiler for any other language. However, I don't see how optimization is a compelling argument against allowing dynamic values. Again, this is no different than SQL allowing the use of '?' in a prepared statement:

SELECT * FROM EMPLOYEE WHERE ID = ?
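
For readers unfamiliar with the SQL analogy, a minimal sketch using Python's standard `sqlite3` module is given here. The table layout and the sample rows are assumptions made up for the example; only the `SELECT` statement comes from the comment above.

```python
import sqlite3

# In-memory database with a hypothetical EMPLOYEE table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE (ID, NAME) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# The statement text is fixed; only the value bound to '?' changes per call.
QUERY = "SELECT * FROM EMPLOYEE WHERE ID = ?"


def find_employee(employee_id: int) -> list:
    """Run the fixed statement with a value supplied at execution time."""
    return conn.execute(QUERY, (employee_id,)).fetchall()
```

The statement text stays constant across calls, which is the property being asked for here in place of textual substitution into the query source.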
Comment 15 Tim Mills 2011-02-15 13:15:45 UTC
Note that FTThesaurusOption can be used via the query prolog.

[6] Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*

[24] FTOptionDecl ::= "declare" "ft-option" FTMatchOptions


I'm not sure that it is particularly desirable to permit any of the following queries:

EXAMPLE 1

declare ft-option using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most $x levels;

declare variable $x := 1;

...

EXAMPLE 2

declare ft-option using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most $x levels;

declare variable $x := (doc('foo.xml')//p[. contains text 'foo'])[1]/@number;

...


EXAMPLE 3

declare ft-option using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most string-length(default-collation()) levels;

declare default collation
         "http://example.org/languages/Icelandic";

...

EXAMPLE 4

declare ft-option using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most fn:number(<foo:bar>1</foo:bar>) levels;

declare namespace foo = "http://www.exaxmple.com/";

...
Comment 16 Michael Dyck 2011-03-01 19:32:29 UTC
At the joint meeting on 2011-02-22, the WGs decided to resolve this issue by altering the grammar to only allow integer literals when specifying the number of levels. (Specifically, in FTThesaurusID, FTRange is replaced by new symbol FTLiteralRange, whose definition is a copy of FTRange's with the AdditiveExprs changed to IntegerLiterals. You can see the EBNF in Bug 12036 comment #1.)

For compatibility with earlier versions of the spec, we added a Note to the effect that implementations may allow more general syntax.

Therefore, I am marking this issue resolved-fixed. Tim, if you agree with this resolution, please mark it closed.

(Note that the Test Suite has not yet been updated with respect to this change. That should happen soon.)
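
The effect of the resolution in comment 16 can be approximated with a small checker. This is a rough Python sketch, not the normative grammar: it assumes an integer literal is a plain run of digits, ignores XQuery comments between tokens, and the name `is_literal_range` is hypothetical.

```python
import re

# Assumption: an integer literal is one or more ASCII digits.
_INT = r"[0-9]+"

# The four FTRange alternatives, with each AdditiveExpr narrowed to an
# integer literal as described in the resolution.
FT_LITERAL_RANGE = re.compile(
    rf"(?:exactly\s+{_INT}"
    rf"|at\s+least\s+{_INT}"
    rf"|at\s+most\s+{_INT}"
    rf"|from\s+{_INT}\s+to\s+{_INT})"
)


def is_literal_range(text: str) -> bool:
    """Return True if text (the part before "levels") uses only integer literals."""
    return FT_LITERAL_RANGE.fullmatch(text.strip()) is not None
```

Under this sketch, `at most 2` and `from 1 to 3` are accepted, while `exactly $i` and `at most ./content/@levels` from the earlier comments are rejected.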