This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Editorial

Some of the entries of the Diacritics Matrix in 3.2.2 do not clearly describe what the intended comparison operation for the given case should be. In particular:

- the entry for UCC / "insensitive" states "compare as if with and without" (with and without what?);
- the 4 entries for UCC+CDS / "with" + "without diacritics" use an example query.

The reader has no clue how to interpret those example queries, and even if they are meant to show how to reduce the "with" and "without" options to the other options, there are several problems with them. E.g. in the entry for CDS / "with diacritics", the query stated there:

  "resume diacritics insensitive" not in "resume"

(i) is syntactically not what it is meant to be (probably: "resume" diacritics insensitive not in "resume"); (ii) depends on diacritics options higher up the query tree, or on a specified default for the diacritics option (note that the second "resume" term is matched according to that setting); and (iii) can never have a match in the default case, where the second "resume" is matched insensitively as well.

So maybe this query should be:

  "resume" diacritics insensitive not in "resume" diacritics sensitive

(which would indeed be an equivalent rewrite for "resume" with diacritics, because the term "resume" is deliberately spelled without diacritics in the second subquery), but then what would the rewrite be for "without diacritics"? The rewriting also relies on our having control over whether the query term itself contains diacritics, and over how it would need to be transposed if it did. In general, however, we cannot assume this. E.g. consider the query:

  $node ftcontains $term with diacritics

/jochen
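The objections to the rewrite can be checked with a small token-level model. This is only a rough sketch, not the spec's semantics: it assumes that "diacritics" means Unicode combining marks, that each search term matches single tokens, and that "not in" can be approximated by set difference over token positions. The helper names (strip_diacritics, match, not_in) are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    # Assumption: diacritics are the combining marks exposed by NFD decomposition.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def match(term, tokens, sensitive):
    """Return the set of token positions that match the search term."""
    if sensitive:
        wanted = unicodedata.normalize("NFC", term)
        return {i for i, t in enumerate(tokens)
                if unicodedata.normalize("NFC", t) == wanted}
    base = strip_diacritics(term)
    return {i for i, t in enumerate(tokens) if strip_diacritics(t) == base}


def not_in(left, right):
    # Simplified stand-in for "not in": keep left matches that are not right matches.
    return left - right


tokens = ["resume", "résumé", "resumé"]

# Corrected rewrite: "resume" insensitive not in "resume" sensitive.
with_diacritics = not_in(match("resume", tokens, False),
                         match("resume", tokens, True))

# Point (iii): both operands insensitive, so the difference is always empty.
never_matches = not_in(match("resume", tokens, False),
                       match("resume", tokens, False))

# Query term that itself carries diacritics: the rewrite now selects the
# plain token "resume" at position 0, which has no diacritics at all.
term_with_marks = not_in(match("résumé", tokens, False),
                         match("résumé", tokens, True))
```

In this model the corrected rewrite selects positions 1 and 2, the all-insensitive form selects nothing, and a query term with diacritics selects positions 0 and 2, which is not what "with diacritics" is supposed to mean.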
Here is my proposal to fix the matrix.

1. UCC / "insensitive" should read: compare base characters only, disregarding diacritics.

2. Rows 3 and 4 (for with + without diacritics) should be dropped. Instead, add the following sentence after the table and the Note:

   For options "with diacritics" and "without diacritics" the underlying comparison is the same as for "diacritics insensitive"; however, only tokens are considered that contain, respectively do not contain, characters with diacritical marks.

I hope this improves it.

/Jochen
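One possible reading of the proposed wording, sketched as a per-token predicate. This is a sketch under assumptions, not normative text: diacritics are taken to be Unicode combining marks after NFD decomposition (characters such as "ø" that do not decompose are treated as base characters), case handling is left out, and the function and option names are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    # Assumption: diacritics are the combining marks exposed by NFD decomposition.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def has_diacritics(text):
    # True if any character of the text carries a combining mark.
    return any(unicodedata.combining(c)
               for c in unicodedata.normalize("NFD", text))


def token_matches(term, token, option):
    """Compare a query term with a document token under a diacritics option."""
    if option == "sensitive":
        return (unicodedata.normalize("NFC", term)
                == unicodedata.normalize("NFC", token))
    # All remaining options share the insensitive comparison on base characters.
    same_base = strip_diacritics(term) == strip_diacritics(token)
    if option == "insensitive":
        return same_base
    if option == "with":
        # Only tokens that contain characters with diacritical marks.
        return same_base and has_diacritics(token)
    if option == "without":
        # Only tokens that contain no characters with diacritical marks.
        return same_base and not has_diacritics(token)
    raise ValueError(f"unknown diacritics option: {option}")
```

Under this reading the filter looks only at the document token, so the result does not depend on whether the query term itself was written with diacritics: token_matches("résumé", "resume", "without") holds just as token_matches("resume", "resume", "without") does.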
I support Jochen's proposal to reduce the diacritics options in v.1 to sensitive and insensitive. If this proposal is accepted, it may also prompt closing of Bug 3927.
The proposed change was discussed by the TF at its February 1/2 F2F and was adopted in principle. We are marking this bug FIXED. Since you were present when we adopted this resolution, and agreed to it, we are also marking it CLOSED.