Bugzilla – Bug 17918
i18n-ISSUE-136: Recognition of number formats in Number state
Last modified: 2013-02-01 02:49:44 UTC
This was cloned from bug 16165 as part of operation convergence.
Originally filed: 2012-02-29 19:51:00 +0000
Original reporter: Richard Ishida <firstname.lastname@example.org>
#0 Richard Ishida 2012-02-29 19:51:05 +0000
188.8.131.52.13 Number state
"The algorithm to convert a string to a number, given a string input, is as follows: If applying the rules for parsing floating point number values to input results in an error, then return an error; otherwise, return the resulting number."
Does this refer to conversion of the text the user inputs into the field, i.e. if the user types in
will the rules for parsing floating point number values be applied to that to determine whether or not that is a valid number?
Or does this apply only to interpretation of the attribute value or the value passed to the value by the browser after interpretation of the user input? I assumed that this is the case, but it isn't clear.
I think a note would be useful, both here and for other states where formats vary around the world, to clarify this and btw to remind browser developers that the normal way of representing numbers differs around the world and they should expect to have to convert the user input to the expected internal representation (but not change the user's input itself).
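To make the distinction in this comment concrete, here is a minimal sketch of a locale-neutral parse of the kind the quoted algorithm describes. It is an illustration only: the function name `parse_neutral_number` and the regular expression are assumptions, and the grammar is a simplified, strict approximation rather than the spec's actual rules for parsing floating point number values (which differ in details such as whitespace handling).

```python
import re

# Hypothetical, simplified grammar: optional "-", ASCII digits, optional
# fraction, optional exponent. Only ASCII digits and "." are accepted.
_NEUTRAL_FLOAT = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)

def parse_neutral_number(text):
    """Return a float for locale-neutral input, or None to signal an error."""
    if _NEUTRAL_FLOAT.fullmatch(text) is None:
        return None
    return float(text)

print(parse_neutral_number("12345.67"))            # accepted
print(parse_neutral_number("12.456,78"))           # European format: error
print(parse_neutral_number("\u0661\u0662\u0663"))  # Arabic-Indic digits: error
```

A parse like this suits attribute values and submitted values, but applied directly to what the user types it would reject ordinary input in many locales, which is the ambiguity being asked about.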
New spec location:
Discussed at the Internationalization WG meeting 2012-11-01: What's really causing the confusion is the phrase "User agents must not allow the user to set the value...".
If we understand the intent of the spec correctly, the user doesn't set the value at all; instead, the user types a string in a localized format, it gets interpreted and converted to a value, using algorithms that are not specified in HTML. The conversion algorithms discussed in the spec have nothing to do with user input; they handle text that's part of the HTML source or form submissions.
The phrase above contradicts this intent and should be fixed.
In addition, the note in the section uses a somewhat extreme example (Persian or Arabic numbers), so it's not obvious that more subtle differences also need to be handled, such as different decimal and grouping separators for German or for India.
The user agent provides a mechanism by which the user is to be able to change the value. It's an indirect manipulation, but the user is still setting the value.
This is just like I am setting the value of this textarea as I type in it, but I'm not personally going into the system RAM with an oscilloscope and probe to manipulate the element's state; the browser is handling my key presses and through a long sequence of operations this results in the value changing. But I'm still changing it.
To put it another way, a cook boils an egg by putting the egg in water and heating the water. The cook isn't strictly speaking boiling the egg directly; it's heated by the water, which is heated by the stove, etc. But the cook is still said to be boiling the egg.
Does that make sense?
Your explanation makes sense, but I think that the section itself isn't clear.
As Norbert pointed out, the example is given as:
For example, a user agent in Persian or Arabic markets might support Persian and Arabic numeric input (converting it to the format required for submission as described above).
This isn't clear enough: it calls out a "more special case" for Persian digits, but potentially leaves intact an impression that ASCII input must conform to the "algorithm to convert a string to a number". That algorithm is intended for the internal, locale-neutral representation and is not suitable for an international user's input.
So, the problem isn't with the algorithms applied or the processing done internally to the user-agent. It's that the intervening processing of the user's input (as well as the presentation of the value to the user) is not clearly enough earmarked as needing to be localized. I would suggest that the way to clarify this is to modify the example given above to say:
For example, a user agent might support numeric input in the user's local format, converting it to the format required for submission as described above. This might include handling different grouping or decimal separators (such as "12.456,78", as customary in many European locales) or local digit shaping (such as the use of digits in Arabic, Persian, Devanagari, Thai, and other scripts) or a combination thereof.
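The conversion step described in the proposed note could look roughly like the sketch below. It is not taken from the spec or from any browser: `to_neutral_format` and its parameters are hypothetical, and the separators are passed in explicitly where a real user agent would presumably take them from locale data.

```python
import unicodedata

def to_neutral_format(user_text, decimal_sep, group_sep):
    """Convert localized numeric input to a locale-neutral string.

    decimal_sep and group_sep are assumptions supplied by the caller.
    """
    out = []
    for ch in user_text.strip():
        if ch == group_sep:
            continue                      # drop grouping separators
        if ch == decimal_sep:
            out.append(".")               # normalize the decimal separator
            continue
        value = unicodedata.decimal(ch, None)
        if value is not None:
            out.append(str(value))        # map any script's decimal digit to ASCII
        elif ch == "-":
            out.append(ch)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return "".join(out)

print(to_neutral_format("12.456,78", ",", "."))     # European separators
print(to_neutral_format("12,34,567.89", ".", ","))  # Indian grouping
print(to_neutral_format("१२३", ".", ","))            # Devanagari digits
```

The result is only a candidate string; it would still need to be checked against the locale-neutral number format before being used as the control's value. The user's typed text itself is left untouched.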
Finally, there is no guidance for how the page author can control the appearance of the input field in the page. Ideally, the user agent would follow @lang of the element when presenting the value so that the page looks consistent. That is, this:
<label lang="en-GB">How much are you willing to pay:<input type=number min=0 step=0.01 name=price value=12345.67></label>
Looks like this:
How much are you willing to pay: [ 12345,67 ]
How to configure the control's l10n settings is intentionally left unanswered at the moment; there's a bug filed for adding a locale="" attribute or some such to control that.
What text is it that gives 'an impression that ASCII input must conform to the "algorithm to convert a string to a number"'? As far as I can tell, there's nothing that says that. Please don't read between the lines, the spec only means what it says, nothing more.
(I'll try to expand the example along the lines of what you suggest.)
I'm trying not to read between the lines, but I think changing the note would help eliminate the impression of numeric input being a direct interpretation of the user's input. I have no technical problem with the text, as it (correctly) pertains to the actual representation/handling of the value. Strengthening the note to convey the separation of presentation/parsing of the user's input from the internal processing could handle the issue.
Regarding the second thing, let's cover that in the other bug(s).
(In reply to comment #4)
> For example, a user agent might support numeric input in the user's local
> format, converting it to the format required for submission as described
> above. This might include handling different grouping or decimal separators
> (such as "12.456,78", as customary in many European locales) or local digit
> shaping (such as the use of digits in Arabic, Persian, Devanagari, Thai, and
> other scripts) or a combination thereof.
Please avoid "local digit shaping". It means displaying ASCII digit codepoints with totally different glyphs.
The difference between using local digit shapes on display and using the actual local digit codepoints in the case at hand may be irrelevant because there is an additional conversion step, but as far as I understand, local digit shaping is discouraged (it would address just part of localization needs for number representations anyway).
So please change "local digit shaping (such as the use of digits in" to "using local digits (e.g. digits in".
Ok, I added more text in two places. Please reopen if it's still unclear.
Checked in as WHATWG revision r7686.
Check-in comment: Try adding yet more explanatory text.