B Input Modes

Editorial note  
The following proposal was presented to the XForms Working Group and is presently under consideration. We invite feedback and comments.

Attribute name: inputMode

Attribute values: See list below

Semantics: The 'inputMode' attribute provides an indication to the user agent to select an appropriate input mode for the text input expected in the input field. The input mode may be a keyboard configuration, an input method editor (also called a front end processor) or a particular configuration thereof, or any other setting affecting input on the device(s) used.

Upon entering an empty text input field with an inputMode attribute, the user agent SHOULD set the configuration so that the characters indicated by the attribute value can be input easily. User agents MAY use information about the text already present to set the appropriate input mode when entering a text input field that already contains text, or when moving around in such a text field.

User agents SHOULD NOT use the inputMode attribute to set the input mode when entering a field with text already present. User agents SHOULD recognize all the input modes which are supported by the (operating) system/device(s) they run on/have access to, and which are installed for regular use by the user. User agents are not required to recognize all of the attribute values, only those that they support. Unrecognized attribute values SHOULD be treated the same way as if the attribute were not present. Unrecognized attribute values MUST NOT result in a user agent error. Future versions of this specification may add new attribute values.
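The selection rules above can be sketched as follows. This is only an illustration under stated assumptions, not part of the proposal: the names `SUPPORTED_MODES`, `guess_mode_from_text` and `select_input_mode` are hypothetical, the set of supported values is invented for the example, and guessing a mode from the first word of each character's Unicode name is just one heuristic a user agent might choose for fields that already contain text.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical: the inputMode values this imaginary user agent supports.
SUPPORTED_MODES = {"LATIN", "LATIN_DIGITS", "CYRILLIC", "GREEK", "HIRAGANA", "KATAKANA"}


def guess_mode_from_text(text: str) -> Optional[str]:
    """Heuristic (an assumption, not mandated): count the first word of each
    character's Unicode name and return the most frequent supported one."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")
        first_word = name.split(" ", 1)[0] if name else ""
        if first_word in SUPPORTED_MODES:
            counts[first_word] += 1
    return counts.most_common(1)[0][0] if counts else None


def select_input_mode(input_mode: Optional[str], existing_text: str) -> Optional[str]:
    """Return the mode to switch to, or None to leave the current mode alone."""
    if existing_text:
        # Text already present: the attribute SHOULD NOT be used here,
        # but the user agent MAY look at the text itself.
        return guess_mode_from_text(existing_text)
    if input_mode in SUPPORTED_MODES:
        # Empty field and a recognized value: follow the hint.
        return input_mode
    # Missing or unrecognized value: behave as if the attribute were
    # absent, and never raise an error.
    return None
```

In this sketch an unrecognized value such as "KLINGON" simply yields None, which mirrors the requirement that unrecognized values be treated as if the attribute were not present.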

User agents MAY use information available in an XML Schema pattern facet to set the input mode. Note that a pattern facet is a hard restriction on the contents of a data item, and can specify different restrictions for different parts of the data item. 'inputMode' is a soft hint about the kinds of characters that the user is most likely to (start to) input into the text field. 'inputMode' is provided in addition to pattern facets for the following reasons:

  1. The set of allowable characters specified in a pattern may be so wide that it is not possible to deduce a reasonable input mode setting. Nevertheless, there frequently is one kind of character that the user will input with high probability. In such a case, inputMode allows the input mode to be set for the user's convenience.

  2. In some cases, it would be possible to derive the input mode setting from the pattern because the set of characters allowed in the pattern closely corresponds to a set of characters covered by an 'inputMode' attribute value. However, such a derivation would require a lot of data and calculation on the part of the user agent.

  3. Small devices may leave the checking of patterns to the server, but will easily be able to switch to those input modes that they support. Being able to make data entry for the user easier is of particular importance on small devices.

List of allowed attribute values: [there are three sections: 1) in, 2) questionable, 3) out]

Where there are no comments, the values are the Unicode Block names (see [UnicodeBlocks]: http://www.unicode.org/Public/UNIDATA/Blocks.txt). The block names are upper-cased, and use underlines for spaces, so that they correspond to the values in the Java java.lang.Character.UnicodeBlock class (see [JavaUnicodeBlocks]: http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.UnicodeBlock.html). The version of the Unicode Standard that these block names are taken from is 3.1.
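The naming convention described above can be illustrated with a small sketch. The function name `block_name_to_value` is hypothetical. Replacing hyphens with underscores is an assumption inferred from names such as LATIN_EXTENDED_A that appear elsewhere in this appendix; the paragraph above only mentions upper-casing and underlines for spaces.

```python
def block_name_to_value(block_name: str) -> str:
    """Turn a block name as written in Blocks.txt into the attribute-value form."""
    value = block_name.strip().upper()
    # Spaces become underlines, as stated in the text above.
    value = value.replace(" ", "_")
    # Assumption: hyphens are mapped to underlines as well.
    value = value.replace("-", "_")
    return value


if __name__ == "__main__":
    for name in ("Old Italic", "Unified Canadian Aboriginal Syllabics", "Latin Extended-A"):
        print(name, "->", block_name_to_value(name))
```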

Most block names selected as attribute values are equivalent to script names. Please also see [UnicodeScripts]: http://www.unicode.org/Public/UNIDATA/Scripts.txt for further information. The block names have been chosen because they are more formally defined.

Block names containing words such as 'extended' or 'supplement' have not been included in the list of attribute values; the characters these blocks include are covered by the (non-'extended') attribute value with the same script name. [A list of excluded block names can be found below.]

Block names from [JavaUnicodeBlocks], corresponding to [UnicodeBlocks]. With the exception of 'UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS' (whose script name is CANADIAN-ABORIGINAL), all of these names also appear in [UnicodeScripts]:

ARABIC
ARMENIAN
BENGALI
BOPOMOFO
CHEROKEE
CYRILLIC
DEVANAGARI
ETHIOPIC
GEORGIAN
GREEK
GUJARATI
GURMUKHI
HEBREW
HIRAGANA
KANNADA
KATAKANA
KHMER
LAO
MALAYALAM
MONGOLIAN
MYANMAR
OGHAM
ORIYA
RUNIC
SINHALA
SYRIAC
TAMIL
TELUGU
THAANA
THAI
TIBETAN
UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS

Additional block names not in [JavaUnicodeBlocks] (added to [UnicodeBlocks] for Unicode 3.1). These are identical to the names in [UnicodeScripts] (with the change of '-' to '_'):

OLD_ITALIC
GOTHIC
DESERET

Additional attribute values from [JavaInputSubset]: http://java.sun.com/j2se/1.4/docs/api/java/awt/im/InputSubset.html:

FULLWIDTH_DIGITS

Constant for the fullwidth digits included in the Unicode halfwidth and fullwidth forms character block.

FULLWIDTH_LATIN

Constant for the fullwidth ASCII variants subset of the Unicode halfwidth and fullwidth forms character block.

HALFWIDTH_KATAKANA

Constant for the halfwidth katakana subset of the Unicode halfwidth and fullwidth forms character block.

HANJA

Constant for all Han characters used in writing Korean, including a subset of the CJK unified ideographs as well as Korean Han characters defined in higher planes.

KANJI

Constant for all Han characters used in writing Japanese, including a subset of the CJK unified ideographs as well as Japanese Han characters defined in higher planes.

LATIN

Constant for all Latin characters, including the characters in the BASIC_LATIN, LATIN_1_SUPPLEMENT, LATIN_EXTENDED_A, LATIN_EXTENDED_B Unicode character blocks.

LATIN_DIGITS

Constant for the digits included in the BASIC_LATIN Unicode character block.

SIMPLIFIED_HANZI

Constant for all Han characters used in writing Simplified Chinese, including a subset of the CJK unified ideographs as well as Simplified Chinese Han characters defined in higher planes.

TRADITIONAL_HANZI

Constant for all Han characters used in writing Traditional Chinese, including a subset of the CJK unified ideographs as well as Traditional Chinese Han characters defined in higher planes.
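To make the width-related values above more concrete, the following sketch tests which of them a single character falls under. It is illustrative only: the code point ranges are assumptions taken from the layout of the Unicode Basic Latin and Halfwidth and Fullwidth Forms blocks, not from [JavaInputSubset], which defines no ranges. The names `WIDTH_SUBSETS` and `matching_values` are hypothetical.

```python
from typing import List, Tuple

# Assumed ranges (inclusive). HALFWIDTH_KATAKANA here also covers the
# halfwidth CJK punctuation at U+FF61..U+FF64, which is a judgment call.
WIDTH_SUBSETS: List[Tuple[str, int, int]] = [
    ("LATIN_DIGITS", 0x0030, 0x0039),
    ("FULLWIDTH_DIGITS", 0xFF10, 0xFF19),
    ("FULLWIDTH_LATIN", 0xFF01, 0xFF5E),
    ("HALFWIDTH_KATAKANA", 0xFF61, 0xFF9F),
]


def matching_values(ch: str) -> List[str]:
    """Return every value from WIDTH_SUBSETS whose range contains ch."""
    cp = ord(ch)
    return [name for name, lo, hi in WIDTH_SUBSETS if lo <= cp <= hi]


if __name__ == "__main__":
    for ch in ("7", "\uFF11", "\uFF21", "\uFF71", "\u30A2"):
        print(f"U+{ord(ch):04X}", matching_values(ch))
```

Under these assumed ranges a fullwidth digit matches both FULLWIDTH_DIGITS and FULLWIDTH_LATIN, because the fullwidth ASCII variants include the digits. A user agent would need its own rule for choosing between overlapping values.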

Block names [JavaUnicodeBlocks] for which inclusion is *unclear* (with questions):

BRAILLE_PATTERNS

Are there devices/forms where it is useful to indicate that actual braille patterns are requested (in contrast to braille device input that is immediately converted to other characters)?

GENERAL_PUNCTUATION

We probably need something to stand for symbol input modes on handhelds and mobiles. But this may not be the right name. Also, even if there is such an input mode, do we expect a field to start in it?

CJK_SYMBOLS_AND_PUNCTUATION

Are there devices (e.g. mobiles) with separate input modes for punctuation in different scripts? Does such punctuation start a field? Should we use a modifier rather than a separate list of values for such punctuation modes?

CJK_UNIFIED_IDEOGRAPHS

We have values for Kanji, Hanja, and simplified/traditional Hanzi (i.e. for ideograph input optimized for Japanese, Korean, and simplified/traditional Chinese). Do we need another value that encompasses all CJK ideographs?

HANGUL_JAMO/HANGUL_SYLLABLES

We need something for Hangul, but neither of these seems appropriate. Maybe just HANGUL.

IPA_EXTENSIONS

Are there IPA keyboard layouts or input devices?

MATHEMATICAL_OPERATORS

Are there math-specific keyboard layouts or input devices?

YI_SYLLABLES

We need something for Yi. Is this the right name?

From [UnicodeBlocks], only in V3.1:

Byzantine Musical Symbols and Musical Symbols

Do they need support? Are there input devices?

Script names [UnicodeScripts] not yet covered (see comments above):

HAN
HANGUL
YI

Additional questions:

- Do we need other attribute values for digits (e.g. Devanagari, Thai,...) or a modifier value for digits?
- Do we need values for upper-case/lower-case, mixed-case (i.e. starting with upper-case letter, as many fields e.g. on Palm), or modifier values?
- What is the value for mixed Kanji/Kana?
- Do we need other values for compatibility, e.g. half-width hangul?

Values from various lists that have been excluded:

Excluded from [JavaUnicodeBlocks]:

ALPHABETIC_PRESENTATION_FORMS
ARABIC_PRESENTATION_FORMS_A
ARABIC_PRESENTATION_FORMS_B
ARROWS
BASIC_LATIN
BLOCK_ELEMENTS
BOPOMOFO_EXTENDED
BOX_DRAWING
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_RADICALS_SUPPLEMENT
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
COMBINING_DIACRITICAL_MARKS
COMBINING_HALF_MARKS
COMBINING_MARKS_FOR_SYMBOLS
CONTROL_PICTURES
CURRENCY_SYMBOLS
DINGBATS
ENCLOSED_ALPHANUMERICS
ENCLOSED_CJK_LETTERS_AND_MONTHS
GEOMETRIC_SHAPES
GREEK_EXTENDED
HALFWIDTH_AND_FULLWIDTH_FORMS
HANGUL_COMPATIBILITY_JAMO
IDEOGRAPHIC_DESCRIPTION_CHARACTERS
KANBUN
KANGXI_RADICALS
LATIN_1_SUPPLEMENT
LATIN_EXTENDED_A
LATIN_EXTENDED_ADDITIONAL
LATIN_EXTENDED_B
LETTERLIKE_SYMBOLS
MISCELLANEOUS_SYMBOLS
MISCELLANEOUS_TECHNICAL
NUMBER_FORMS
OPTICAL_CHARACTER_RECOGNITION
PRIVATE_USE_AREA
SMALL_FORM_VARIANTS
SPACING_MODIFIER_LETTERS
SPECIALS
SUPERSCRIPTS_AND_SUBSCRIPTS
SURROGATES_AREA
YI_RADICALS

Excluded from [UnicodeBlocks] and not in [JavaUnicodeBlocks]:

High Private Use Surrogates
High Surrogates
Low Surrogates
Mathematical Alphanumeric Symbols
CJK Unified Ideographs Extension B
CJK Compatibility Ideographs Supplement
Tags
Private Use