Bug 8651 - [SER] What does it mean to compare without consideration of case?
Status: CLOSED FIXED
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 1.0
Version: Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Assigned To: Henry Zongaro
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL: http://www.w3.org/TR/2007/REC-xslt-xq...
Reported: 2010-01-05 14:54 UTC by Henry Zongaro
Modified: 2010-06-29 13:54 UTC (History)
Description Henry Zongaro 2010-01-05 14:54:22 UTC
In two places, the Serialization recommendation indicates that comparisons of strings of characters should be performed without regard to case.  In section 7.1,[1] the second paragraph following the numbered list begins, "The HTML output method MUST recognize the names of HTML elements regardless of case."  In section 6.1.13, we have "making the comparison without consideration of casing and leading/trailing spaces" [2]

Two errata have also been issued that use a similar formulation.  Erratum SE.E5 [3] added the phrase "making the comparison without consideration of case and leading or trailing spaces" to section 7.4.13.  The yet-to-be-published erratum SE.E14 [4] adds the phrase "if the value of the attribute node actually is equal to the name of the attribute without regard to case" to section 7.2.

[1] http://www.w3.org/TR/2007/REC-xslt-xquery-serialization-20070123/#HTML_MARKUP
[2] http://www.w3.org/TR/2007/REC-xslt-xquery-serialization-20070123/#XHTML_INCLUDE-CONTENT-TYPE
[3] http://www.w3.org/XML/2007/qt-errata/xslt-xquery-serialization-errata.html#E5
[4] http://www.w3.org/Bugs/Public/show_bug.cgi?id=7829
Comment 1 Henry Zongaro 2010-01-05 15:25:23 UTC
I propose the following resolution to this problem:

In section 1.1,[5] add the definition:

[Definition:  Where this specification indicates that two strings are to be <b>compared without regard to case</b>, the serializer <rfc2119>MUST</rfc2119> translate any characters in the range #x41 (LATIN CAPITAL LETTER A) to #x5A (LATIN CAPITAL LETTER Z), inclusive, to the corresponding lower-case letters in the range #x61 (LATIN SMALL LETTER A) to #x7A (LATIN SMALL LETTER Z) only for the purposes of making the comparison.  The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.]
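
As an illustration only, the proposed definition could be sketched in Python roughly as follows. The function names fold_ascii_upper and equal_without_regard_to_case are hypothetical and do not come from the specification; the sketch simply follows the two steps of the definition (translate #x41-#x5A to #x61-#x7A, then compare length and code points).

```python
def fold_ascii_upper(s: str) -> str:
    # Translate only #x41-#x5A (A-Z) to #x61-#x7A (a-z); every other
    # character, including non-ASCII letters, is left untouched.
    return "".join(
        chr(ord(ch) + 0x20) if 0x41 <= ord(ch) <= 0x5A else ch
        for ch in s
    )


def equal_without_regard_to_case(a: str, b: str) -> bool:
    # Fold both strings for the comparison only; the inputs are unchanged.
    fa, fb = fold_ascii_upper(a), fold_ascii_upper(b)
    # Same length, and equal code points at every position.
    return len(fa) == len(fb) and all(
        ord(x) == ord(y) for x, y in zip(fa, fb)
    )


print(equal_without_regard_to_case("TABLE", "table"))  # True
print(equal_without_regard_to_case("Br", "bR"))        # True
print(equal_without_regard_to_case("br", "b"))         # False
```

The folding is applied to temporary copies, which matches the "only for the purposes of making the comparison" wording: nothing in the serialized output changes as a result of the comparison.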

In section 7.1,[1] change "regardless of case" to "<termref def="caseless-compare">making the comparison without regard to case</termref>".

In section 6.1.13,[2] change "making the comparison without consideration of
casing and leading/trailing spaces" to "<termref def="caseless-compare">making the comparison without regard to case</termref>, after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison."

In section 7.4.13 as modified by erratum SE.E5,[3] change "making the comparison without consideration of case and leading or trailing spaces" to "<termref def="caseless-compare">making the comparison without regard to case</termref>, after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison."

In section 7.2 as modified by erratum SE.E14,[4] change "is equal to the name of
the attribute without regard to case" to "is equal to the name of the attribute, <termref def="caseless-compare">making the comparison without regard to case</termref>."
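
For sections 6.1.13 and 7.4.13, the order of operations (strip first, then compare without regard to case) might look like the following Python sketch. Several things here are assumptions rather than quotations from the specification: that the attribute being tested is http-equiv with the target value "Content-Type", that "spaces" means only #x20, and the helper name is_content_type_value, which is hypothetical.

```python
# Mapping of #x41-#x5A (A-Z) to #x61-#x7A (a-z) for str.translate.
ASCII_FOLD = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def is_content_type_value(http_equiv: str) -> bool:
    # Assumption: "spaces" is read as #x20 only, not all whitespace.
    stripped = http_equiv.strip(" ")
    # Fold ASCII capitals on a temporary copy, solely for the comparison.
    return stripped.translate(ASCII_FOLD) == "content-type"


print(is_content_type_value("  Content-Type "))  # True
print(is_content_type_value("CONTENT-TYPE"))     # True
print(is_content_type_value("Content Type"))     # False
```

Interior spaces are not removed, so only values that differ in ASCII case or in surrounding spaces are treated as matches.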

[5] http://www.w3.org/TR/2007/REC-xslt-xquery-serialization-20070123/#terminology
Comment 2 Henry Zongaro 2010-01-06 16:49:48 UTC
It's probably clear from comment #1, but I believe the intent was that the case of a character is ignored only if the character is in the ASCII range.  So, for instance, #x131 (LATIN SMALL LETTER DOTLESS I) would ordinarily be treated as equal to #x49 (LATIN CAPITAL LETTER I) in a caseless string comparison, but an element named &#305; should not be recognized as an HTML I element under the rules of section 7.1.
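
The difference can be seen in a small Python sketch. Upper-casing both strings is used here only as one plausible stand-in for a Unicode-aware caseless comparison; it is an assumption for illustration, not something the specification prescribes, and both function names are hypothetical.

```python
# Mapping of #x41-#x5A (A-Z) to #x61-#x7A (a-z) for str.translate.
ASCII_FOLD = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def equal_ascii_caseless(a: str, b: str) -> bool:
    # Ignores case only for characters in the ASCII range.
    return a.translate(ASCII_FOLD) == b.translate(ASCII_FOLD)


def equal_unicode_caseless(a: str, b: str) -> bool:
    # Illustrative Unicode-aware comparison via upper-casing.
    return a.upper() == b.upper()


dotless_i = "\u0131"  # LATIN SMALL LETTER DOTLESS I

print(equal_unicode_caseless(dotless_i, "I"))  # True
print(equal_ascii_caseless(dotless_i, "I"))    # False
print(equal_ascii_caseless("i", "I"))          # True
```

Under the ASCII-only rule, an element named with #x131 is therefore not recognized as the HTML I element, while "i" and "I" still match.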

The most recent public draft of HTML 5.0 [6] defines the term "ASCII case-insensitive" to mean the same thing as the term "compared without regard to case" that I've proposed.  That draft uses the term in defining Boolean attributes and in defining the permitted values of enumerated attributes (including http-equiv), and it defines HTML tag names to use characters only in the ASCII range.  Those are all the places noted by this bug report.  There's no reason to believe that HTML 5.0 has placed additional constraints in these areas rather than simply clarified the rules.

[6] http://www.w3.org/TR/2009/WD-html5-20090825/infrastructure.html#case-sensitivity-and-string-comparison
Comment 3 Henry Zongaro 2010-01-13 19:08:08 UTC
At the joint call of the XQuery and XSL Working Groups on 2010-01-12, the working groups adopted the proposal in comment #1.[7]  As not many XSL WG members were present, I will bring this back to the XSL Working Group for ratification.

[7] http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jan/0055.html (Member-only link to minutes of joint teleconference)
Comment 4 Henry Zongaro 2010-03-17 14:39:31 UTC
Bug was marked resolved/fixed by an unknown intruder.  Reopening.
Comment 5 Henry Zongaro 2010-06-03 21:12:22 UTC
At its teleconference of 3 June 2010,[8] the XSL Working Group ratified the
decision to adopt the proposal made in comment #1.  This will be Serialization
erratum SE.E17.

[8] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2010Jun/0011.html
(Member-only link)