W3C Architecture Domain
W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Internationalization Comments on XmlHttpRequest

Version reviewed: http://www.w3.org/TR/2007/WD-XMLHttpRequest-20071026/
Lead reviewer and date of initial review: Addison Phillips, November 2007
Subject lead in: [XHR]

These are comments on behalf of the Internationalization Core WG, unless otherwise stated. The "Owner" column indicates who has been assigned the responsibility of tracking discussions on a given comment.

We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads.

ID Location Subject Comment Owner Ed. /
1 general Editorial needs

The introduction is rather stiffly written. Section 2 really should be divided into subsections. Among other things, it is difficult to reference parts of section 2 in its current state.

2 1.2 Define conforming XML user agent

The definition of "Conforming XML user agent" is not written in the same normative style as that of "conforming user agent" (which appears just above it in section 1.2). In particular, it should use normative RFC 2119 language.

3 1.2.1 define what DOM subsets means

Section 1.2.1 requires that conforming user agents support some subset of DOM Events, DOM Core, and Window Object, but doesn't specify what that subset is. This makes the normative language difficult to conform to.

AP S

4 1.2.2 clarify that case-insensitive matching is limited to ASCII-only HTTP header values

Section 1.2.2 defines case-insensitive matching as follows:


There is a case-insensitive match of strings s1 and s2 if after uppercasing both strings (by mapping a-z to A-Z) they are identical.


This does not make clear that the definition applies only to the limited domain of ASCII-only HTTP headers, rather than serving as a general definition of case-insensitivity.

It should also mention that this is the default case mapping for Latin text.

In some languages (Turkish, for example) the default mapping does not apply, and this can cause problems for matching: when case-mapping functions are invoked under these locales, by default they do the "wrong thing".

In any case, I would propose that this be changed as suggested below, since some programmers forget about locale-specific rules in their default case-mappings:


There is a case-insensitive match of strings s1 and s2 if they compare identically using the default case foldings defined by Unicode (which equates the ranges [a-z] and [A-Z]). Note that these do not include language-specific mappings, such as the dotted/dotless 'i' mappings in Turkish or Azerbaijani (see Unicode Section 3.13 and the CaseFolding.txt file in the UCD).
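The difference between the two definitions, and the Turkish pitfall, can be illustrated with a short sketch. Python is used purely for illustration (XHR itself is a JavaScript API), and all function names here are hypothetical. `str.casefold()` applies Unicode default case folding without locale tailoring; the Turkish-style uppercasing is simulated by hand, since Python's `str.upper()` is not locale-sensitive.

```python
import string

# Translation table that maps only ASCII a-z to A-Z, as in the quoted definition.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_case_insensitive_match(s1: str, s2: str) -> bool:
    """Match after uppercasing a-z only; every other character compares as-is."""
    return s1.translate(_ASCII_UPPER) == s2.translate(_ASCII_UPPER)


def default_fold_match(s1: str, s2: str) -> bool:
    """Match using Unicode default case folding (no language-specific tailoring)."""
    return s1.casefold() == s2.casefold()


def turkish_style_upper(s: str) -> str:
    """Hand-written simulation of Turkish uppercasing (an assumption for
    illustration): i -> U+0130 (dotted capital I), U+0131 (dotless i) -> I."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


# Header names match under the ASCII-only rule.
headers_match = ascii_case_insensitive_match("Content-Type", "CONTENT-TYPE")

# A Turkish-tailored mapping turns "title" into "T\u0130TLE", which no longer
# equals the ASCII string "TITLE".
turkish_mismatch = turkish_style_upper("title") != "TITLE"
```

Under these assumptions, a header comparison that goes through a locale-tailored case mapping could fail on a Turkish system, while both the ASCII-only rule and the untailored default folding treat `title` and `TITLE` as equal.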


5 Section 2 charset detection health warning

Before the section on charset detection in Section 2, there should be a health warning stating something like:


For interoperability, the use of a Unicode encoding, particularly UTF-8, is RECOMMENDED. Non-Unicode encodings are difficult to detect and effectively limit the range of character data that can be transmitted reliably.
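Both halves of this warning can be shown in a few lines. The sketch below is illustrative only: the helper name `can_encode` and the sample text are assumptions, not anything taken from the XHR draft.

```python
# The same byte is a different character under different legacy encodings,
# and nothing in the byte itself says which encoding was intended.
raw = b"\xe9"
as_latin1 = raw.decode("iso-8859-1")  # Western European reading
as_greek = raw.decode("iso-8859-7")   # Greek reading of the same byte


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Mixed Latin and Japanese text: representable in UTF-8, not in ISO-8859-1.
sample = "caf\u00e9 \u65e5\u672c\u8a9e"
utf8_ok = can_encode(sample, "utf-8")
latin1_ok = can_encode(sample, "iso-8859-1")
```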


6 Section 2, setRequestHeader combining of multiple Accept-Language headers

The section on setRequestHeader says:


If the header argument is in the list of request headers either use multiple headers, combine the values or use a combination of those


It then gives an example in which the headers are combined algorithmically (basically, concatenating them). However, some headers, such as Accept-Language, use q-weights and other structure, and this approach may not work acceptably in those cases. Perhaps provide some guidance on these cases?
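As a rough illustration of the concern, the sketch below concatenates two Accept-Language values and then ranks the result by q-weight. The parser is deliberately simplified and hypothetical (it only understands an optional `q=` parameter), and the header values are made up for the example.

```python
from typing import List, Tuple


def parse_accept_language(value: str) -> List[Tuple[str, float]]:
    """Parse a header value into (language-range, q) pairs, highest q first.

    Simplified: only a single optional "q=" parameter is recognized.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # RFC 2616: a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        entries.append((tag.strip(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


first = "en, fr;q=0.5"   # value set by the first setRequestHeader call
second = "de"            # value set by a later call
combined = ", ".join([first, second])

ranking = parse_accept_language(combined)
```

In this example the later value `de` carries an implicit q=1, so after concatenation it outranks `fr;q=0.5` even though it was added last. Whether that is what the script author intended is exactly the kind of question the specification could give guidance on.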

7 send() Encoding of DOMString

In the send() method, if 'data' is a DOMString, it is always encoded as UTF-8 (good). But this seems at odds with the ability to specify different encodings in the headers, etc. The currently specified behavior is the behavior the I18N WG would recommend. Perhaps the specification should explicitly state that string data is always sent as UTF-8?
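The tension described here can be sketched as follows. This is an illustration under assumed names (`encode_body` is hypothetical), showing what a receiver would see if it trusted a header that declares an encoding other than the UTF-8 actually used for the body.

```python
def encode_body(data: str) -> bytes:
    """Encode string data as UTF-8, regardless of any charset named in a header."""
    return data.encode("utf-8")


body = encode_body("r\u00e9sum\u00e9")

# A receiver that trusts a mismatched declaration (here ISO-8859-1)
# misreads each two-byte UTF-8 sequence as two separate characters.
misread = body.decode("iso-8859-1")

# A receiver that knows string data is always UTF-8 recovers the text.
correct = body.decode("utf-8")
```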

8 send() content negotiation handling

At the end of the section on the send method, this paragraph appears:


If the user agent implements server-driven content-negotiation it should set Accept-Language, Accept-Encoding and Accept-Charset headers as appropriate; it must not automatically set the Accept header. Responses to such requests must have the content-encodings automatically decoded. [RFC2616]


The Accept-Language header is currently the only mechanism available in XHR for locale management. It may be important for locale-sensitive interactions to convey a language or locale to the server. Thus, it would be useful to separately mention the Accept-Language header and its use in informing the server of language/locale preference. In addition, we suggest you recommend the use of BCP 47's Lookup algorithm (found in the RFC 4647 portion of BCP 47) for matching the Accept-Language header.
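For readers unfamiliar with Lookup, a minimal sketch of the idea follows: each language range in the priority list is progressively truncated from the end until it matches an available tag, and a default is returned if nothing matches. The function name and parameters are hypothetical, and the sketch omits validation and anything beyond the basic truncation rule.

```python
from typing import Iterable, Optional


def lookup(priority_list: Iterable[str],
           available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the best available tag for a prioritized list of language ranges."""
    # Language tags compare case-insensitively; keep the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the bare wildcard never selects a specific tag here
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # truncate from the end
            # A single-character subtag (e.g. "x") is dropped together with
            # the subtag that followed it.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# "fr-CA" is not available, so it is truncated to "fr" before "en" is tried.
best = lookup(["fr-CA", "en"], ["en", "fr", "de"])
```

The priority list would typically be the Accept-Language ranges ordered by descending q-weight.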

Note: the normative words "should", "must not", and "must" all appear in non-normative form.

9 general locale information

In addition to the comments below, we note that there is barely any mention of language or locale negotiation or locale considerations in this document. This is probably appropriate given the scope of this document, focused strictly on the XmlHttpRequest object. However, it should be noted that the lack of these capabilities will require non-interoperable custom implementations. Standardization of language/locale negotiation for AJAX and REST type interactions (that rely on XHR) should be described somewhere. This may represent a work item for the Internationalization Core WG. In particular, we note these documents:


Page template by Richard Ishida (ishida@w3.org).