This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3931 - [FT] Section 3.2.5: order of options
Summary: [FT] Section 3.2.5: order of options
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Working drafts
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2006-11-01 00:34 UTC by Michael Rys
Modified: 2007-04-19 15:12 UTC

Description Michael Rys 2006-11-01 00:34:26 UTC
/book[@number="1"]//p ftcontains "propagation of errors"
with stemming with stop words ("a", "the", "of")

Does stemming apply to the stop word list too, or are the stop words applied first? In other words, is that query different from:

/book[@number="1"]//p ftcontains "propagation of errors"
with stop words ("a", "the", "of") with stemming?
Comment 1 Pat Case 2007-01-31 18:48:13 UTC
Section 3.2, FTMatchOptions, says that FTMatchOptions are applied in the order in which they are written in the query.
So yes, the queries are different. Stemming and stop words are applied in the order written.

Stemming applies to the words in the query. Each stop word is replaced with a placeholder that matches any word. So the user can stem the words in the query either before or after the stop words are replaced.

So if we have
"to be or not to be" with stemming with stop words ("be")
--You would stem all the words in the query, including "be", then replace "be". You might end up with "being" in your query, but it doesn't matter, because any word is accepted in the position of "be", and that includes "being" anyway.

"to be or not to be" with stop words ("be") with stemming
--You would replace "be", then stem all the words left in the query. You will get the same results as above.
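
The two orderings above can be sketched in a few lines of Python. This is only a toy model, not the semantics defined by the spec: the query is assumed to be a list of tokens, a stop word is assumed to be replaced by a wildcard "*" that matches any single word, and stem() is a hypothetical suffix stripper standing in for a real stemmer. The stop word is "be", as in the explanation above.

```python
# Toy model (assumption): a query is a list of tokens, and a stop word is
# replaced by a wildcard that matches any single word.
ANY = "*"

def stem(token):
    """Hypothetical toy stemmer: strips 'ing' or 's'. Not a real algorithm."""
    if token == ANY:
        return token  # wildcards are left untouched
    for suffix in ("ing", "s"):
        # Only strip when at least two characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

def apply_stemming(tokens):
    return [stem(t) for t in tokens]

def apply_stop_words(tokens, stop_words):
    return [ANY if t in stop_words else t for t in tokens]

query = "to be or not to be".split()
stops = {"be"}

# "with stemming with stop words": stem first, then replace stop words.
stem_then_stop = apply_stop_words(apply_stemming(query), stops)

# "with stop words with stemming": replace stop words first, then stem.
stop_then_stem = apply_stemming(apply_stop_words(query, stops))

print(stem_then_stop)
print(stop_then_stem)
```

In this toy model both orders yield the same pattern for this query, because the stemmer leaves "be" unchanged and the wildcard accepts any word in that position.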

Either way, it would be spitting in the wind to say "with stop words x" and then to search for x in your query, but you will get the same, possibly nasty, results. And it appears that only when the stop word and the search word are the same might there have been a difference in the results.

In 4.2.3 The evaluate function, we say:
Ordering among match options is necessary because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)). Naturally, match options may be reordered when they commute, but this is an optimization issue and is beyond the scope of this document.
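
As an illustration of why synonym(stem(word)) and stem(synonym(word)) can differ, here is a minimal Python sketch. The thesaurus entries and the stemmer are hypothetical and chosen only to make the point. Mapping each word to a single synonym is also a simplifying assumption.

```python
# Hypothetical toy thesaurus and stemmer, for illustration only.
THESAURUS = {"running": "jogging", "run": "sprint"}

def stem(word):
    """Toy stemmer: strip a trailing 'ing' and undouble a final consonant."""
    if word.endswith("ing") and len(word) > 5:
        base = word[:-3]
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    return word

def synonym(word):
    """Return the thesaurus entry for word, or the word itself if none."""
    return THESAURUS.get(word, word)

word = "running"
stem_first = synonym(stem(word))     # "running" -> "run" -> "sprint"
synonym_first = stem(synonym(word))  # "running" -> "jogging" -> "jog"

print(stem_first, synonym_first)
```

With these made-up entries the two orders produce different search terms, so the options do not commute for this word.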

So I don't think there is any need to protect users, and we have said that implementations can vary the order to optimize. I think the spec is clear here, the results are satisfactory, and there is no need for change. May I mark this as won't fix?
Comment 2 Jim Melton 2007-02-18 23:09:01 UTC
At its recent F2F, the TF adopted changes that respond to your questions. Specifically, it was agreed that the various FTMatchOptions would be applied in a specific sequence, as you agreed during the meeting. This bug remains open pending review of the actual changes to the spec.
Comment 3 Jim Melton 2007-04-19 15:12:37 UTC
The changes agreed for this bug have been reviewed by the TF, which approves the wording.  Therefore, this bug is marked FIXED.  Since you were present when the review took place we assume that you are satisfied with this resolution to your comment and have also marked it CLOSED.