D Input Modes

The attribute inputMode provides a hint to the user agent to select an appropriate input mode for the text input expected in an associated form control. The input mode may be a keyboard configuration, an input method editor (also called front end processor) or any other setting affecting input on the device(s) used.

Using inputMode, the author can give hints to the agent that make form input easier for the user. Authors should provide inputMode attributes wherever possible, making sure that the values used cover a wide range of devices.

D.1 `inputMode` Attribute Value Syntax

The value of the inputMode attribute is a white space separated list of tokens. Tokens are either sequences of alphabetic letters or absolute URIs. The latter can be distinguished from the former by noting that absolute URIs contain a ':'. Tokens are case-sensitive. All tokens consisting of alphabetic letters only are defined in this specification (or a successor of this specification), in D.3 List of Tokens.
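The token syntax above can be sketched in Python. This is a minimal illustration rather than normative behavior: the function name tokenize_input_mode, the "name"/"uri" labels, and the example URI are hypothetical, and treating white space as space, tab, carriage return and line feed is an assumption.

```python
import re
from typing import List, Tuple


def tokenize_input_mode(value: str) -> List[Tuple[str, str]]:
    """Split an inputMode value into (token, kind) pairs, preserving order.

    kind is "uri" when the token contains ':' (the rule given in the text
    for telling absolute URIs apart), otherwise "name".
    Assumption: white space means space, tab, CR and LF.
    """
    tokens = [t for t in re.split(r"[ \t\r\n]+", value) if t]
    return [(t, "uri" if ":" in t else "name") for t in tokens]


# Hypothetical extension URI, used only for illustration.
print(tokenize_input_mode("latin upperCase http://example.org/modes/numpad"))
```

Because tokens are case-sensitive, the sketch performs no case folding; "Latin" and "latin" would be distinct tokens.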

This specification does not define any URIs for use as tokens, but allows others to define such URIs for extensibility. This may become necessary for devices with input modes that cannot be covered by the tokens provided here. The URI should dereference to a human-readable description of the input mode associated with the use of the URI as a token. This description should describe the input mode indicated by this token, and whether and how this token modifies other tokens or is modified by other tokens.

D.2 User Agent Behavior

Upon entering an empty form control with an inputMode attribute, the user agent should select the input mode indicated by the inputMode attribute value. User agents should not use the inputMode attribute to set the input mode when entering a form control with text already present. To set the appropriate input mode when entering a form control that already contains text, user agents should rely on platform-specific conventions.

User agents should make available all the input modes that are supported by the operating system or devices they run on or have access to, and that are installed for regular use by the user. This is typically only a small subset of the input modes that can be described with the tokens defined here.

The following simple algorithm is used to define how user agents match the values of an inputMode attribute to the input modes they can provide. This algorithm does not have to be implemented directly; user agents just have to behave as if they used it. The algorithm is not designed to produce "obvious" or "desirable" results for every possible combination of tokens, but to produce correct behavior for frequent token combinations and predictable behavior in all cases.

First, each of the input modes available is represented by one or more lists of tokens. An input mode may correspond to more than one list of tokens; as an example, on a system set up for a Greek user, both "greek upper" and "user upper" would correspond to the same input mode. No two lists will be the same.

Second, the inputMode attribute is scanned from front to back. For each token t in the inputMode attribute, if in the remaining list of tokens representing available input modes there is any list of tokens that contains t, then all lists of tokens representing available input modes that do not contain t are removed. If there is no remaining list of tokens that contains t, then t is ignored.

Third, if one or more lists of tokens are left, and they all correspond to the same input mode, then this input mode is chosen. If no list is left (meaning that there was none at the start) or if the remaining lists correspond to more than one input mode, then no input mode is chosen.

Example: Assume the list of lists of tokens representing the available input modes is {"cyrillic upper", "cyrillic lower", "cyrillic", "latin", "user upper", "user lower"}. Then the following inputMode values select the following input modes: "cyrillic title" selects "cyrillic", "cyrillic lower" selects "cyrillic lower", "lower cyrillic" selects "cyrillic lower", and "latin upper" selects "latin". However, "upper latin" selects "cyrillic upper" or "user upper" if they correspond to the same input mode, and selects no input mode if "cyrillic upper" and "user upper" do not correspond to the same input mode.
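The three steps of the matching algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the specification: the function name select_input_mode and the representation of available input modes as a dictionary from token lists (tuples) to input mode names are hypothetical. The sketch follows the steps literally. For "cyrillic title" with the lists in the example, three lists containing "cyrillic" remain after scanning, so the function returns None; reproducing the "cyrillic" result given in the example would need an additional tie-breaking rule that the steps do not state.

```python
from typing import Dict, Optional, Tuple


def select_input_mode(
    input_mode: str, available: Dict[Tuple[str, ...], str]
) -> Optional[str]:
    """Return the chosen input mode name, or None if no mode is chosen.

    available maps each token list (as a tuple) to the name of the input
    mode it represents; several lists may map to the same mode.
    """
    remaining = dict(available)
    # Step 2: scan the attribute value from front to back.
    for token in input_mode.split():
        containing = {
            tokens: mode for tokens, mode in remaining.items() if token in tokens
        }
        if containing:
            # Drop every remaining list that does not contain the token.
            remaining = containing
        # Otherwise no remaining list contains the token: ignore it.
    # Step 3: choose a mode only if all remaining lists agree on it.
    modes = set(remaining.values())
    return modes.pop() if len(modes) == 1 else None


available = {
    ("cyrillic", "upper"): "cyrillic upper",
    ("cyrillic", "lower"): "cyrillic lower",
    ("cyrillic",): "cyrillic",
    ("latin",): "latin",
    ("user", "upper"): "user upper",
    ("user", "lower"): "user lower",
}
print(select_input_mode("lower cyrillic", available))  # cyrillic lower
print(select_input_mode("latin upper", available))     # latin
print(select_input_mode("upper latin", available))     # None
```

In the last call, "upper" narrows the lists to "cyrillic upper" and "user upper", and "latin" is then ignored. Because the two remaining lists map to different modes here, nothing is chosen; mapping both to the same mode name would make that mode the result.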

D.3 List of Tokens

Tokens defined in this specification are separated into two categories: script tokens and modifier tokens. In inputMode attributes, script tokens should always be listed before modifier tokens.

D.3.1 Script Tokens

Script tokens provide a general indication of the set of characters that is covered by an input mode. In most cases, script tokens correspond directly to Unicode Scripts (see http://www.unicode.org/Public/UNIDATA/Scripts.txt). Some tokens correspond to the block names in Java class java.lang.Character.UnicodeBlock (see http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.UnicodeBlock.html; see also Unicode Block names at http://www.unicode.org/Public/UNIDATA/Blocks.txt). However, this means neither that an input mode has to allow input for all the characters in the script or block, nor that an input mode is limited to characters from that specific script. As an example, a "latin" keyboard doesn't cover all the characters in the Latin script, and includes punctuation which is not assigned to the Latin script. The version of the Unicode Standard that these script names are taken from is 3.2.

Input Mode Token	Comments
arabic	Unicode script name
armenian	Unicode script name
bengali	Unicode script name
bopomofo	Unicode script name
braille	used to input braille patterns (not to indicate a braille input device)
buhid	Unicode script name (Unicode 3.2)
canadianAboriginal	Unicode script name
cherokee	Unicode script name
cyrillic	Unicode script name
devanagari	Unicode script name
ethiopic	Unicode script name
georgian	Unicode script name
greek	Unicode script name
gujarati	Unicode script name
gurmukhi	Unicode script name
han	Unicode script name
hangul	Unicode script name
hanunoo	Unicode script name (Unicode 3.2)
hebrew	Unicode script name
hiragana	Unicode script name (may include other Japanese scripts produced by conversion from hiragana)
ipa	International Phonetic Alphabet
kannada	Unicode script name
katakana	Unicode script name (full-width, not half-width)
khmer	Unicode script name
lao	Unicode script name
latin	Unicode script name
malayalam	Unicode script name
math	mathematical symbols and related characters
mongolian	Unicode script name
myanmar	Unicode script name
ogham	Unicode script name
oriya	Unicode script name
runic	Unicode script name
sinhala	Unicode script name
syriac	Unicode script name
tagalog	Unicode script name (Unicode 3.2)
tagbanwa	Unicode script name (Unicode 3.2)
tamil	Unicode script name
telugu	Unicode script name
thaana	Unicode script name
thai	Unicode script name
tibetan	Unicode script name
user	Special value denoting the 'native' input of the user (e.g. to input her name or text in her native language).
yi	Unicode script name
oldItalic	Unicode script name
gothic	Unicode script name
deseret	Unicode script name
hanja	Subset of 'han' used in writing Korean
kanji	Subset of 'han' used in writing Japanese
simplifiedHanzi	Subset of 'han' used in writing Simplified Chinese
traditionalHanzi	Subset of 'han' used in writing Traditional Chinese

D.3.2 Modifier Tokens

Modifier tokens can be added to the script tokens they apply to in order to specify more closely the kind of characters expected in the form field. Traditional PC keyboards do not need most modifier tokens (indeed, users on such devices would be quite confused if the software decided to change case on its own; CAPS lock for upperCase may be an exception). However, modifier tokens can be very helpful to set input modes for small devices.

Input Mode Token	Comments
lowerCase	lower case (for bicameral scripts)
upperCase	upper case (for bicameral scripts)
titleCase	title case (for bicameral scripts): words start with an upper case letter
startUpper	start input with one upper case letter, then continue with lower case letters
digits	digits of a particular script (e.g. inputMode='thai digits')
symbols	symbols, punctuation (suitable for a particular script)
predict	indicates that text prediction should be switched on (e.g. for running text)
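The recommendation that script tokens be listed before modifier tokens can be checked mechanically. The following Python sketch is illustrative only: the function name scripts_precede_modifiers is hypothetical, the two sets are copied from the tables in D.3.1 and D.3.2, and skipping URIs and unrecognized tokens is an assumption.

```python
SCRIPT_TOKENS = frozenset("""
arabic armenian bengali bopomofo braille buhid canadianAboriginal cherokee
cyrillic devanagari ethiopic georgian greek gujarati gurmukhi han hangul
hanunoo hebrew hiragana ipa kannada katakana khmer lao latin malayalam math
mongolian myanmar ogham oriya runic sinhala syriac tagalog tagbanwa tamil
telugu thaana thai tibetan user yi oldItalic gothic deseret hanja kanji
simplifiedHanzi traditionalHanzi
""".split())

MODIFIER_TOKENS = frozenset(
    "lowerCase upperCase titleCase startUpper digits symbols predict".split()
)


def scripts_precede_modifiers(value: str) -> bool:
    """Return True if no script token appears after a modifier token.

    Tokens in neither set (URIs, unknown names) are skipped.
    """
    seen_modifier = False
    for token in value.split():
        if token in MODIFIER_TOKENS:
            seen_modifier = True
        elif token in SCRIPT_TOKENS and seen_modifier:
            return False
    return True


print(scripts_precede_modifiers("thai digits"))  # True
print(scripts_precede_modifiers("digits thai"))  # False
```

A check like this suits authoring tools. The text states the ordering as a recommendation to authors ("should"), so a user agent would not be expected to reject a value that fails it.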

D.4 Relationship to XML Schema pattern facets

User agents may use information available in an XML Schema pattern facet to set the input mode. Note that a pattern facet is a hard restriction on the lexical value of an instance data node, and can specify different restrictions for different parts of the data item. Attribute inputMode is a soft hint about the kinds of characters that the user will most probably start to input into the form control. Attribute inputMode is provided in addition to pattern facets for the following reasons:

The set of allowable characters specified in a pattern may be so wide that it is not possible to deduce a reasonable input mode setting. Nevertheless, there frequently is a kind of character that will be input by the user with high probability. In such a case, inputMode allows the input mode to be set for the user's convenience.
In some cases, it would be possible to derive the input mode setting from the pattern because the set of characters allowed in the pattern closely corresponds to a set of characters covered by an inputMode attribute value. However, such a derivation would require a lot of data and calculations on the user agent.
Small devices may leave the checking of patterns to the server, but will easily be able to switch to those input modes that they support. Being able to make data entry for the user easier is of particular importance on small devices.

D.5 Examples

This is an example of a form for Japanese address input. It is shown in table form; it will be replaced by actual syntax in a later version of this specification.

Caption:	`inputMode`
Family name	hiragana
(in kana)	katakana
Given name	hiragana
(in kana)	katakana
Zip code	latin digits
Address	hiragana
(in kana)	katakana
Email	latin lowerCase
Telephone	latin digits
Comments	user predict

