This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5975 - FT and FTUC: Attribute tokenization and searching
Summary: FT and FTUC: Attribute tokenization and searching
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Candidate Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Dyck
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-08-19 11:58 UTC by Julia Imhof
Modified: 2011-01-08 00:20 UTC

Description Julia Imhof 2008-08-19 11:58:07 UTC
Please clarify in the specification whether attribute tokenization and searching is supported or not:

- in the Use Case 4.2.1 (OTHER: Query on Attribute values) in the FTUC document the query context is an attribute
- in the FT Candidate Recommendation in 3.6.3. Distance Selection it is written (after the second example): "The phrase "Improving Web Site Usability" would satisfy the given full-text selection, but it occurs in an attribute value, and so is not subject to tokenization"


If it is supported, please clarify whether it is optional in the Conformance Section.
Comment 1 Michael Dyck 2008-08-19 17:13:21 UTC
[Personal response]

(In reply to comment #0)
> Please clarify in the specification whether attribute tokenization and
> searching is supported or not:

The value of an attribute is tokenized and searched when it is in the
string value of a search context item, which I believe is only the case
when the attribute *is* the search context item. That is, the attribute
node must be an item in the value of the FTContainsExpr's RangeExpr.

> - in the FT Candidate Recommendation in 3.6.3. Distance Selection it is
> written (after the second example): "The phrase "Improving Web Site
> Usability" would satisfy the given full-text selection, but it occurs in
> an attribute value, and so is not subject to tokenization"

That sentence should probably have ended with "in this example". Would that
have sufficed to preempt this issue?

The wording around that example is being changed for Bug 5886, and will
make this point clearer too.
Comment 2 Pat Case 2008-08-19 18:25:33 UTC
Folks,

I agree that the language document requires clarification on this point.

I propose that the FTTF consider adding the following simple text to 2.2.1 (Full-Text Contains Expression) Description:
A full-text contains expression searches element text implicitly across descendant elements. It may also search attribute values, but only explicitly.

I prefer that attribute searching not be optional, but can live with optional attribute searching.

Pat Case, Library of Congress, FTTF, personal response


Comment 3 Michael Dyck 2008-08-19 19:40:57 UTC
(In reply to comment #2)
> 
> I prefer that attribute searching not be optional, but can live with
> optional attribute searching.

I don't think "optional" is the word you should be using there.  There are
no optional features (see section 5.2) that particularly relate to the
treatment of attribute nodes, and no proposal (that I know of) to change
that.

To put it another way: the features that allow a query to search an
attribute's value are not optional features.
Comment 4 zhen hua liu 2008-09-15 20:59:11 UTC
If attribute content needs to be searched, the attribute node has to be explicitly stated in the search context.

For example:
let $x := <root><a b="foo bar">text</a></root>
return
$x/a/@b ftcontains "foo"

This searches the content of attribute @b, which should return true.

let $x := <root><a b="foo bar">text</a></root>
return
$x/a ftcontains "foo"

This searches the atomization of element node "a", which is "text". It does
NOT search the content of attribute "b", so this returns false.
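
The two queries above can be approximated outside an XQuery Full Text processor. The following is only a rough Python sketch, not the spec's algorithm: tokens() is a hypothetical tokenizer (real tokenization is implementation-defined), ft_contains() is a made-up stand-in for a single-word ftcontains, and the element's string value is modelled as the concatenation of its descendant text.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text):
    # Hypothetical tokenizer: lowercased runs of word characters.
    # Real XQuery Full Text tokenization is implementation-defined.
    return re.findall(r"\w+", text.lower())


def ft_contains(text, word):
    # Made-up stand-in for a single-word full-text match.
    return word.lower() in tokens(text)


root = ET.fromstring('<root><a b="foo bar">text</a></root>')
a = root.find("a")

# The attribute is searched only when it is selected explicitly.
attribute_value = a.get("b")

# Descendant text of <a>; the value of @b is not part of it.
element_string_value = "".join(a.itertext())

print(ft_contains(attribute_value, "foo"))       # attribute as search context
print(ft_contains(element_string_value, "foo"))  # element as search context
```

Under these assumptions the first call matches and the second does not, mirroring the true/false results described above.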
Comment 5 Mary Holstege 2008-10-30 18:36:18 UTC
The WG considered this comment and agreed to clarify the situation by adding notes to sections 1.1 (Full-Text Search and XML), 2.2.1 (Full-Text Contains Expression, Description), and 4.1 (Tokenization). Since tokenization applies to the string value of a node, the contents of attribute nodes (as well as comment nodes and processing instructions) are generally not tokenized and not included in the search unless the attribute node (or comment node, etc.) is itself the direct target of the search. A reference to the XQuery 1.0 and XPath 2.0 Data Model, which specifies "string value", has been added.
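
To illustrate why the string value excludes these node kinds, here is a minimal Python sketch using xml.dom.minidom. The string_value() helper is a hypothetical approximation of the Data Model's string-value accessor for a handful of node kinds, and the sample document is invented for illustration.

```python
from xml.dom import minidom, Node


def string_value(node):
    """Approximate the data model's string value for a few node kinds."""
    if node.nodeType in (Node.DOCUMENT_NODE, Node.ELEMENT_NODE):
        # Elements and documents: concatenate descendant text nodes only.
        # Attributes, comments and processing instructions are skipped.
        return "".join(
            string_value(child)
            for child in node.childNodes
            if child.nodeType
            in (Node.ELEMENT_NODE, Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
        )
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return node.value
    # Text, CDATA, comment and processing-instruction nodes expose .data.
    return node.data


# Invented sample document with an attribute, a comment and a PI.
doc = minidom.parseString(
    '<a b="foo bar"><!--hidden note--><?pi some data?>text</a>'
)
a = doc.documentElement
attr = a.getAttributeNode("b")
comment, pi = a.childNodes[0], a.childNodes[1]

print(repr(string_value(a)))        # element: descendant text only
print(repr(string_value(attr)))     # attribute selected directly
print(repr(string_value(comment)))  # comment selected directly
print(repr(string_value(pi)))       # processing instruction selected directly
```

In this sketch the attribute, comment, and processing-instruction contents become available for tokenization only when those nodes are themselves passed to string_value(), which is the behaviour the resolution describes.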

If you are satisfied with this resolution, please signify that by marking this bug as CLOSED.