This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6834 - add more number bases to form inputs?
Summary: add more number bases to form inputs?
Status: CLOSED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P4 enhancement
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/TR/html5/single-page/
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-20 09:47 UTC by Nick Levinson
Modified: 2010-10-04 14:56 UTC

Description Nick Levinson 2009-04-20 09:47:29 UTC
For form input type states, either number base extensibility or at least some discrete number bases other than 10 should be added, although the prime audience might be limited to IT professionals. The relevant states are number and range, in sections 4.10.4.1.12 and 4.10.4.1.13, respectively, in <http://www.w3.org/TR/html5/single-page/>, Working Draft 12 February 2009, as accessed 4-19-09.

If specifying and implementing extensibility to any positive integer base is feasible, one approach would be to add two new states, number-baseful and range-baseful, and to incorporate the existing number and range states into them as special cases that are largely predefined. The new states would:

- let a page author specify whether all characters are legal or only certain characters (or none) in addition to those needed for possible valid numbers;
- strip or ignore certain substrings usually used to identify a number base (e.g., 0x, b, h, or a leading 0 or, with more difficulty, a trailing subscript n) for sortation;
- sort starting at the rightmost character before the dot (unlike for text), with accommodation for what is to the right of the dot, including extended notation, regardless of base;
- base sorting on the characters listed by the page author in the intended sort order (including ordinary digits); and
- let the page author specify whether sorting would be case-insensitive (it usually is for hex but not for base-85).

The valid floating point number described in section 2.4.3.3 is base-10, so its definition would have to be extended or complemented.
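
As a rough illustration of an author-supplied digit list, here is a minimal Python sketch. The function `parse_in_alphabet` and its parameters are hypothetical and appear in no HTML5 draft; the sketch handles non-negative integers only and assumes the position of a character in the list is its digit value.

```python
def parse_in_alphabet(text, alphabet, case_insensitive=False):
    """Interpret text as a non-negative integer whose digits, in ascending
    value order, are the characters of alphabet (hypothetical helper)."""
    if case_insensitive:
        # Fold case on both sides so "F" and "f" name the same digit.
        text, alphabet = text.lower(), alphabet.lower()
    if len(alphabet) < 2 or len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet needs at least two distinct characters")
    if not text:
        raise ValueError("empty input")
    base = len(alphabet)
    value = 0
    for ch in text:
        digit = alphabet.find(ch)
        if digit < 0:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    return value


HEX_DIGITS = "0123456789abcdef"
entries = ["ff", "A", "1b"]
# Sort by value rather than by character, ignoring case.
by_value = sorted(entries, key=lambda s: parse_in_alphabet(s, HEX_DIGITS, True))
```

Case folding fails on purpose for an alphabet such as base 62 or base 85, where upper and lower case are distinct digits, which matches the note above that case-insensitivity suits hex but not base-85.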

If extensibility is not worth the complexity, discrete bases 2, 8, 16, and 85 might be good candidates, provided that, for base 85, using character representations in place of the less-than and greater-than angle brackets is workable in such a field (the angle brackets are specified in RFC 1924, section 4.2). If discreteness is preferred, the new states could be number-2, range-2, number-16, and so on, with each field having its characteristics predefined.
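
To make the base-85 concern concrete, the following Python sketch encodes an IPv6 address as 20 base-85 digits and then escapes the result for use in markup. The digit string is assumed to match the character set in RFC 1924, section 4.2; the function names are hypothetical.

```python
import html
import ipaddress
import string

# Assumed to match the character set listed in RFC 1924, section 4.2.
RFC1924_DIGITS = (string.digits + string.ascii_uppercase +
                  string.ascii_lowercase + "!#$%&()*+-;<=>?@^_`{|}~")


def ipv6_to_base85(address):
    """Encode an IPv6 address as 20 base-85 digits, most significant first."""
    number = int(ipaddress.IPv6Address(address))
    digits = []
    for _ in range(20):  # 85**20 exceeds 2**128, so 20 digits always suffice
        number, remainder = divmod(number, 85)
        digits.append(RFC1924_DIGITS[remainder])
    return "".join(reversed(digits))


def base85_to_ipv6(text):
    """Decode base-85 digits back into an IPv6 address string."""
    number = 0
    for ch in text:
        number = number * 85 + RFC1924_DIGITS.index(ch)
    return str(ipaddress.IPv6Address(number))


def for_markup(text):
    """Replace the dangerous characters (<, >, &, quotes) with references."""
    return html.escape(text)
```

The escaping step is what "character representations in place of the angle brackets" would amount to when such a value is written into a page; the field value itself would still contain the raw characters.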

Thank you.

-- 
Nick
Comment 1 Lachlan Hunt 2009-04-20 10:36:48 UTC
What use cases are you trying to address and what problems are you trying to solve?  If numbers other than base 10 are needed, why isn't type=text with a pattern attribute to restrict the allowable characters sufficient?  Can you provide significant examples of sites accepting non-base-10 numerical input for something?
Comment 2 Nick Levinson 2009-04-24 08:24:34 UTC
As I more or less noted (and why I set the priority to P4), the need is infrequent and specialized, largely limited to computer scientists, many of whom deal from time to time in hexadecimal and, by extension, binary, although octal is much less common. Base 85 is established for compact expression of IPv6 addresses. To my knowledge IPv6 is not yet in use anywhere, but that is because of difficulties in transitioning from IPv4; I understand (if I'm right) that part of the delay is that Internet backbone computers may not be implementing IPv6 yet and won't until their firmware is replaced as they age out. But it's fair to expect IPv6 to come into implementation during the lifespan of HTML5.

Mathematicians might justify a wider range of bases, hence extensibility, so that others can write extensions on a base-by-base basis.

For programmers, the most interesting base is 16. Bases 2 and 8 can be handled with a base-10 field, with the caveat that it lacks entry validation against ineligible digits; solving that might require a non-HTML facility, such as a CGI script. And base 85 requires dangerous characters, though that problem is perhaps solvable.

But probably the most important reason for a database management system to have multiple data types is that sorting methods differ. E.g., by text-sorting rules "200" comes before "30" because of left-to-right evaluation and "Z" comes before "a" because of the codepoint sequence for a single-character comparison, while number-sorting rules compare values. In base 16, this is not a problem until users type in different cases, a difference that is meaningless in hexadecimal. So, to meet users' expectations and thus a usability standard, case has to be made irrelevant to the sorting of hex entries, even if a user wants case-checking for style consistency.
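
The difference between the two orderings can be shown in a few lines of Python. This is only a sketch; the helper names are hypothetical, and Python's built-in `int(text, base)` already ignores letter case for digits above 9.

```python
def sort_as_text(values):
    """Sort by code point, comparing character by character from the left."""
    return sorted(values)


def sort_as_numbers(values, base=10):
    """Sort by numeric value in the given base; letter case is ignored."""
    return sorted(values, key=lambda s: int(s, base))


decimal_entries = ["30", "200"]
hex_entries = ["ff", "A", "1b"]

text_order = sort_as_text(decimal_entries)         # ['200', '30']
value_order = sort_as_numbers(decimal_entries)     # ['30', '200']
hex_text_order = sort_as_text(hex_entries)         # ['1b', 'A', 'ff']
hex_value_order = sort_as_numbers(hex_entries, 16) # ['A', '1b', 'ff']
```

With mixed case the text ordering also separates "C" from "b" by case rather than by value, which is the usability problem described above.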

Add to that the common practice of identifying the base of a number as part of a spaceless string, and usability in meeting users' expectations suggests a need to accept a wider range of characters during entry, to allow single-zero-padding, to perform more flexible entry validation, and to ignore substrings for sortation. In that case, bases 2 and 8 need their own fields or field extensibility, and even decimal needs modification to accept the occasional practice of suffixing "d" when other numbers are comparably base-labeled.
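
A minimal Python sketch of reading base-labeled strings follows. The label conventions (0x and trailing h for hex, 0b and trailing b for binary, a leading 0 for octal, trailing d or no label for decimal) are assumptions chosen for illustration, and `parse_labeled` is a hypothetical name.

```python
def parse_labeled(text):
    """Return (value, base) for a string carrying a common base label.

    Prefixes are checked before suffixes, so "0x1d" is hex 1d, not decimal.
    Unlabeled input is treated as decimal.
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        base, digits = 16, s[2:]
    elif s.startswith("0b"):
        base, digits = 2, s[2:]
    elif s.endswith("h"):
        base, digits = 16, s[:-1]
    elif s.endswith("b"):
        base, digits = 2, s[:-1]
    elif s.endswith("d"):
        base, digits = 10, s[:-1]
    elif len(s) > 1 and s.startswith("0"):
        base, digits = 8, s[1:]
    else:
        base, digits = 10, s
    # int() raises ValueError when a digit is not legal in the chosen base.
    return int(digits, base), base


mixed = ["0x10", "12d", "0b11"]
# Labels are ignored for sorting; only the values are compared.
sorted_by_value = sorted(mixed, key=lambda s: parse_labeled(s)[0])
```

The ambiguity is visible even in this small sketch: "b" and "d" are also hex digits, so a fixed precedence has to be picked, and an input such as "0b1h" is rejected rather than guessed at.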

Fields for bases 2 and 8, if they don't provide entry validation against oversized digits such as "9", will fail to meet a usability expectation: when "Q" is rejected in a number field and "f" is rejected in a present-day (decimal) number field, then "9" should be rejected in an octal field and "3" in a binary field. (The "8" and the "2", respectively, also should be rejected unless used in base identifier strings.)
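
The expectation can be stated as a small check: a character is acceptable only if its digit value is below the base. The Python sketch below assumes the conventional digits 0-9 followed by a-z (bases 2 through 36); `invalid_digits` is a hypothetical helper.

```python
import string

# Conventional digit characters for values 0..35.
DIGITS = string.digits + string.ascii_lowercase


def invalid_digits(text, base):
    """Return the characters of text that are not digits in base (2..36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    allowed = set(DIGITS[:base])
    return [ch for ch in text if ch.lower() not in allowed]


octal_problems = invalid_digits("1079", 8)   # ['9']
binary_problems = invalid_digits("1031", 2)  # ['3']
```

An empty result means every character is a legal digit; base identifier strings such as a "0x" prefix would have to be stripped before this check.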

Falling short in usability is not a technical failure, but if it results in people not noticing their own errors, they'll blame the software for technical errors such as bad sorting or inconsistent behavior. It is better to design the software to accommodate these expectations before finalization than to await the virtually inevitable "why doesn't this work" afterwards. Not offering multiple bases won't be criticized on that ground, but not validating or sorting as expected will be.

Perhaps multiple-base capability is too specialized a demand, although computer scientists tend to be early adopters of computer technology, so I think it'll be used, for example, to allow one person to receive and diagnose another user's issue.

Thank you.

-- 
Nick
Comment 3 Philip Taylor 2009-04-24 13:43:42 UTC
As Lachlan said, you can already do client-side validation with the 'pattern' attribute, like:
  <input type="text" pattern="[01]+">
  <input type="text" pattern="[0-9a-fA-F]+">
  <input type="text" pattern="0b[01]+|0x[0-9a-fA-F]+">
etc.
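
For server-side code that wants to repeat the same checks, a Python sketch could look like the following. It assumes that the pattern attribute must match the entire value, which `re.fullmatch` approximates; the dictionary keys and the function name are hypothetical.

```python
import re

# The same expressions as in the pattern attributes above.
PATTERNS = {
    "binary": r"[01]+",
    "hex": r"[0-9a-fA-F]+",
    "prefixed": r"0b[01]+|0x[0-9a-fA-F]+",
}


def matches_pattern(value, pattern):
    """True when the whole value matches, as the pattern attribute requires."""
    return re.fullmatch(pattern, value) is not None


submitted = "0xFF"
if matches_pattern(submitted, PATTERNS["prefixed"]):
    # Base 0 tells int() to infer the base from the 0b / 0x prefix.
    number = int(submitted, 0)
```

Client-side validation can be bypassed, so repeating the check where the value is consumed is a common precaution.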

Sorting doesn't seem relevant to HTML - that's an issue for the server's application code or its database, and has nothing to do with HTML form inputs.
Comment 4 Nick Levinson 2009-04-25 06:58:15 UTC
HTML5 explicitly provides for sorting. See sections 4.11.2.1 (nonnormative) and 4.11.2.2, 4.11.2.2.2-.3, and for the future possibly 10.4.4 (all apparently normative by default).

Sorting is a major part of database usage, and HTML5, in providing as many data types as it does, actively supports sorting, not just input validation. Most of the data types that HTML5 separates out would require different rules for sorting. The types might have been separated only for ease in input validation, but their existence also supports sorting. For example, the URL type can be sorted by TLD, domain, directory, query component, or full URL, the email type by TLD, domain, userinfo, or whether multiple, and, of course, number, chronological, and text fields by their respective rules.
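
As an illustration of a type-specific ordering, the Python sketch below sorts URLs by TLD, then host, then path and query. The key function is hypothetical, and taking the last dot-separated label of the host as the TLD is a simplification.

```python
from urllib.parse import urlsplit


def url_sort_key(url):
    """Build a sort key of (TLD, host, path, query) for a URL string."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    tld = host.rsplit(".", 1)[-1]  # last label only; a simplification
    return (tld, host, parts.path, parts.query)


urls = [
    "http://www.w3.org/TR/html5/",
    "http://example.com/b",
    "http://example.com/a?x=1",
    "http://example.net/",
]
ordered = sorted(urls, key=url_sort_key)
```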

Were validation the only goal of HTML5's distinction of types, adding bases other than 10 would not be hard to justify or execute anyway, although the patterns should be more sophisticated to handle base identifiers within number strings. But clearly sortation is part of the draft HTML5 specification.

Thank you.

-- 
Nick
Comment 5 Ian 'Hixie' Hickson 2009-06-28 10:33:22 UTC
The spec doesn't preclude user agents from implementing type=number in a way that exposes other bases, so as far as I can tell this is already possible.

For use cases such as entering large amounts of hex data, however, I would recommend using <textarea> rather than <input type=number>. It's not like people are going to be entering chunks of hex using a spin control.

For sorting in datagrids, the sorting is done by the script, so HTML doesn't have to provide any particular base support.
Comment 6 Maciej Stachowiak 2010-03-14 14:48:04 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.