Internationalization Comments on XQuery 1.0 and XPath 2.0 Full-Text 1.0
Version reviewed: http://www.w3.org/TR/2007/WD-xpath-full-text-10-20070518/
Lead reviewer and date of initial review: Felix Sasaki, Jun 2007
These are comments on behalf of the Internationalization Core WG, unless otherwise stated. The "Owner" column indicates who has been assigned the responsibility of tracking discussions on a given comment.
We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads. Click on the icons in the right-most column to see email discussions.
ID | Location | Subject | Comment | Owner | Ed. / Subs. | |
---|---|---|---|---|---|---|
1 | 3.3.6 | Language matching in XPath/XQuery | Section 3.3.6 defines a language option that is used to select language-specific behaviors, such as stop word lists. Because the specific language requested may not be available, we believe that implementations should be advised to implement one of the matching schemes defined in BCP 47 (in RFC 4647) when selecting content or behavior. We suggest the following wording: The language option specified might not exactly match the available language resources. An implementation MAY use language tag matching (such as one of the algorithms defined in [BCP 47, currently "Part II, RFC 4647"]) to determine the best available match. Matching and defaulting behavior are implementation defined. | AP | S | |
2 | 1.1 | Notion of language | The second numbered item in the first list contains: There is an expectation that a full-text search will support language-based searches which substring search cannot. We propose to add the following sentence at the end of that list item: Note that language is used as a broad term here and throughout this document. Language information can encompass information about the user's language, region, scripts, variants etc., which all influence the search result. Compare for example a search for [...] Rationale: We think that readers should be made aware of the information beyond language that is used in search. | FS | E | |
3 | 2.2.1 and 3.3.6 | xml:lang vs. language option in a query | XQuery 1.0 and XPath 2.0 Full-Text allows using language information as an input to a query, for example to trigger the choice of a language-specific stop word list: //p ftcontains "salon de the" with default stop words language "fr" Language information can be given in the static context as described in sec. 3.3.6, in a query as described in sec. 2.2.1, and via [...] We think that the relation between language information given via [...] An XQuery 1.0 and XPath 2.0 Full-Text processor SHOULD try to use the information available in xml:lang for processing of collations, as well as the various match options defined in Section 3.3 Match Options. Questions which arise are: what has higher precedence ( [...] //p ftcontains "salon de the" with default stop words language "fr" and <p>salon de the</p> or <p>... <phrase xml:lang="en">salon de the</phrase> ...</p> In 3.3.6 you write that the relation between [...] | FS | S | |
4 | General | Need for the term "word"? | Throughout the document, you use the terms "word" and "token" interchangeably. We propose to drop the term "word", since (as you note in sec. 1.1) it is language-specific. This proposal includes renaming of expressions like [...] The background of this proposal becomes obvious in sec. 3.2. That section defines the [...] | FS | S | |
5 | 3.3.6 | xs:language vs. xml:lang | You write: The StringLiteral following the keyword language designates one language. It must be castable to "xs:language". We propose to use the data type definition for [...] | FS | S | |
6 | 3.3.5 | White space in stop words | You describe a stop word as a literal. We wonder whether a stop word is allowed to contain white space. The last example in sec. 3.3.5 looks as if white space is used as a separator between the stop words. | FS | S |
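
To illustrate comment 1, the following is a minimal sketch of how an implementation might resolve a requested language option against the language resources it actually has. It follows the "Lookup" scheme of RFC 4647 (progressive truncation of the requested tag), which is only one of the matching schemes that RFC defines. The function name `lookup_language`, the `available` list, and the default value are hypothetical and are not taken from the Full-Text draft. Wildcard ranges and lists of ranges are not handled.

```python
from typing import Iterable, Optional


def lookup_language(requested: str,
                    available: Iterable[str],
                    default: Optional[str] = None) -> Optional[str]:
    """Return the best available tag for `requested`, or `default` if none fits.

    Hypothetical helper sketching the RFC 4647 "Lookup" scheme.
    """
    # Language tags compare case-insensitively; keep the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        # Truncate from the end, one subtag at a time.
        subtags.pop()
        # A single-character subtag (such as "x") left at the end is dropped too.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Assumed example: languages for which stop word lists happen to exist.
available = ["en", "fr", "de-CH"]
print(lookup_language("fr-CA", available, default="en"))       # fr
print(lookup_language("de-CH-1996", available, default="en"))  # de-CH
print(lookup_language("ja", available, default="en"))          # en
```

With this scheme a query that asks for "fr-CA" would still pick up a French stop word list when no Canadian French list is available. What happens when nothing matches (here, falling back to a default) would remain implementation defined, as the suggested wording states.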