B Input Modes

Editorial note  
The XForms Working Group invites feedback and comments on the open issues in this section.

The inputMode attribute provides a hint to the user agent to select an appropriate input mode for the text input expected in an associated form control. The input mode may be a keyboard configuration, an input method editor (also called a front-end processor), or any other setting affecting input on the device(s) used.

Upon entering an empty form control with an inputMode attribute, the user agent should set the configuration so that the characters indicated by the attribute value can be input easily. User agents may use information about the text already present to set the appropriate input mode when entering a form control that already contains text, or when moving around in such a form control.

User agents should not use the inputMode attribute to set the input mode when entering a form control with text already present. User agents should, however, recognize all input modes that are supported by the operating system or device(s) they run on or have access to, and that are installed for regular use by the user. User agents are not required to recognize all of the attribute values, only those that they support. An unrecognized attribute value should be treated as if the attribute were not present, and must not result in a user agent error. Future versions of this specification may add new attribute values.
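The rules above can be sketched as a small decision function. This is only an illustration, not part of the specification: the function name `choose_input_mode`, its parameters, and the sample set of supported modes are all hypothetical, and the optional inspection of existing text is left out.

```python
from typing import Optional, Set


def choose_input_mode(attr_value: Optional[str],
                      supported: Set[str],
                      existing_text: str = "") -> Optional[str]:
    """Return the input mode to activate, or None to leave the mode unchanged."""
    # With text already present, the attribute should not drive the mode.
    # A user agent may inspect the existing text instead (not modelled here).
    if existing_text:
        return None
    # A missing or unrecognized value is treated as if the attribute were
    # absent; it must never raise an error.
    if attr_value is None or attr_value not in supported:
        return None
    return attr_value


# Hypothetical set of modes installed on the device.
supported_modes = {"LATIN", "LATIN_DIGITS", "HIRAGANA", "KATAKANA"}

print(choose_input_mode("HIRAGANA", supported_modes))         # HIRAGANA
print(choose_input_mode("OGHAM", supported_modes))            # None
print(choose_input_mode("HIRAGANA", supported_modes, "abc"))  # None
```

Returning `None` rather than raising keeps unrecognized values harmless, which also leaves room for values added by future versions.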

User agents may use information available in an XML Schema pattern facet to set the input mode. Note that a pattern facet is a hard restriction on the contents of a data item, and can specify different restrictions for different parts of the data item. In contrast, inputMode is a soft hint about the kinds of characters that the user is most likely to input (or start to input) into the form control. inputMode is provided in addition to pattern facets for the following reasons:

  1. The set of allowable characters specified in a pattern may be so wide that it is not possible to deduce a reasonable input mode setting. Nevertheless, there is frequently one kind of character that the user will input with high probability. In such a case, inputMode allows the input mode to be set for the user's convenience.

  2. In some cases, it would be possible to derive the input mode setting from the pattern because the set of characters allowed in the pattern closely corresponds to a set of characters covered by an inputMode attribute value. However, such a derivation would require a lot of data and calculations on the user agent.

  3. Small devices may leave the checking of patterns to the server, but will easily be able to switch to those input modes that they support. Being able to make data entry for the user easier is of particular importance on small devices.

B.1 List of Possible Input Modes

The list of allowed attribute values is divided into three sections: 1) included values, 2) questionable values, and 3) excluded values. The XForms Working Group invites feedback on narrowing down the final choices.

Where there are no comments, the values are the Unicode block names (see http://www.unicode.org/Public/UNIDATA/Blocks.txt). The block names are upper-cased, and use underscores for spaces, so that they correspond to the values in the Java java.lang.Character.UnicodeBlock class (see http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.UnicodeBlock.html). The block names are taken from version 3.1 of the Unicode Standard.
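As an illustration of this naming convention, the following sketch converts a block name as written in Blocks.txt into the corresponding attribute value. The function name `block_name_to_value` is hypothetical. Replacing hyphens as well as spaces is an assumption, made so that the result matches Java-style names such as LATIN_1_SUPPLEMENT.

```python
def block_name_to_value(block_name: str) -> str:
    """Upper-case a Unicode block name and replace separators with '_'."""
    # Spaces become underscores, as described in the text. Hyphens are
    # replaced too (an assumption) so "Latin-1 Supplement" maps cleanly.
    return block_name.strip().upper().replace(" ", "_").replace("-", "_")


print(block_name_to_value("Old Italic"))          # OLD_ITALIC
print(block_name_to_value("Latin-1 Supplement"))  # LATIN_1_SUPPLEMENT
```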

Editorial note  
Do we say anything about future Unicode versions?

Most block names selected as attribute values are equivalent to script names. Please also see UnicodeScripts (http://www.unicode.org/Public/UNIDATA/Scripts.txt) for further information. The block names have been chosen because they are more formally defined.

Block names containing words such as 'extended' or 'supplement' have not been included in the list of attribute values; the characters in these blocks are covered by the (non-'extended') attribute value with the same script name. A list of excluded block names can be found below.
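A rough sketch of this exclusion rule follows. It is a heuristic for illustration only: the word list in `QUALIFIER_WORDS` and the function name `is_qualified_block` are assumptions, and the actual excluded list also contains blocks (symbols, presentation forms, and so on) that this test does not catch.

```python
# Hypothetical set of qualifier words; the text only says "words such as".
QUALIFIER_WORDS = {"EXTENDED", "SUPPLEMENT", "EXTENSION"}


def is_qualified_block(value: str) -> bool:
    """True if any '_'-separated word of the block name is a qualifier."""
    return any(token in QUALIFIER_WORDS for token in value.split("_"))


candidates = ["GREEK", "GREEK_EXTENDED", "BOPOMOFO", "BOPOMOFO_EXTENDED",
              "LATIN_1_SUPPLEMENT", "IPA_EXTENSIONS"]
kept = [v for v in candidates if not is_qualified_block(v)]
print(kept)  # ['GREEK', 'BOPOMOFO', 'IPA_EXTENSIONS']
```

Matching whole words keeps IPA_EXTENSIONS out of the filter, in line with its listing as a questionable rather than excluded value.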

Block names from JavaUnicodeBlocks, corresponding to UnicodeBlocks. With the exception of 'UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS' (with the script name CANADIAN-ABORIGINAL), these names all also appear in UnicodeScripts:

Input Mode Comments
ARABIC
ARMENIAN
BENGALI
BOPOMOFO
CHEROKEE
CYRILLIC
DEVANAGARI
ETHIOPIC
GEORGIAN
GREEK
GUJARATI
GURMUKHI
HEBREW
HIRAGANA
KANNADA
KATAKANA
KHMER
LAO
MALAYALAM
MONGOLIAN
MYANMAR
OGHAM
ORIYA
RUNIC
SINHALA
SYRIAC
TAMIL
TELUGU
THAANA
THAI
TIBETAN
CANADIAN_ABORIGINAL
Additional block names not in JavaUnicodeBlocks (added to UnicodeBlocks for Unicode 3.1). These are identical to the names in UnicodeScripts (with the change of '-' to '_'):
OLD_ITALIC
GOTHIC
DESERET
FULLWIDTH_DIGITS Constant for the fullwidth digits included in the Unicode halfwidth and fullwidth forms character block.
FULLWIDTH_LATIN Constant for the fullwidth ASCII variants subset of the Unicode halfwidth and fullwidth forms character block.
HALFWIDTH_KATAKANA Constant for the halfwidth katakana subset of the Unicode halfwidth and fullwidth forms character block.
HANJA Constant for all Han characters used in writing Korean, including a subset of the CJK unified ideographs as well as Korean Han characters defined in higher planes.
KANJI Constant for all Han characters used in writing Japanese, including a subset of the CJK unified ideographs as well as Japanese Han characters defined in higher planes.
LATIN Constant for all Latin characters, including the characters in the BASIC_LATIN, LATIN_1_SUPPLEMENT, LATIN_EXTENDED_A, LATIN_EXTENDED_B Unicode character blocks.
LATIN_DIGITS Constant for the digits included in the BASIC_LATIN Unicode character block.
SIMPLIFIED_HANZI Constant for all Han characters used in writing Simplified Chinese, including a subset of the CJK unified ideographs as well as Simplified Chinese Han characters defined in higher planes.
TRADITIONAL_HANZI Constant for all Han characters used in writing Traditional Chinese, including a subset of the CJK unified ideographs as well as Traditional Chinese Han characters defined in higher planes.
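The subset values based on digits and width variants can be illustrated with code point ranges. The sketch below is not normative: the table `SUBSET_RANGES` and the function `subsets_for` are hypothetical, and the ranges reflect one reading of the Unicode charts (in particular, HALFWIDTH_KATAKANA is assumed to start at U+FF65 and exclude the halfwidth CJK punctuation before it).

```python
from typing import List

# Assumed inclusive code point ranges for four of the subset values.
SUBSET_RANGES = {
    "LATIN_DIGITS": (0x0030, 0x0039),        # '0'..'9' in BASIC_LATIN
    "FULLWIDTH_DIGITS": (0xFF10, 0xFF19),    # fullwidth '0'..'9'
    "FULLWIDTH_LATIN": (0xFF01, 0xFF5E),     # fullwidth ASCII variants
    "HALFWIDTH_KATAKANA": (0xFF65, 0xFF9F),  # assumed start at U+FF65
}


def subsets_for(ch: str) -> List[str]:
    """Return every subset value whose range contains the character."""
    cp = ord(ch)
    return [name for name, (lo, hi) in SUBSET_RANGES.items() if lo <= cp <= hi]


print(subsets_for("7"))       # ['LATIN_DIGITS']
print(subsets_for("\uff15"))  # ['FULLWIDTH_DIGITS', 'FULLWIDTH_LATIN']
print(subsets_for("\uff71"))  # ['HALFWIDTH_KATAKANA']
```

Note that the fullwidth digits fall inside the fullwidth ASCII variants, so a character can belong to more than one subset.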

B.1.1 Questionable Values

Block names for which inclusion is *unclear* (with questions):

Input Mode Comments
BRAILLE_PATTERNS
Editorial note  
Are there devices/forms where it is useful to indicate that actual braille patterns are requested (in contrast to braille device input that is immediately converted to other characters)?
GENERAL_PUNCTUATION
Editorial note  
We probably need something to stand for symbol input modes on handhelds and mobiles. But this may not be the right name. Also, even if there is such an input mode, do we expect data entry to begin with one of these?
CJK_SYMBOLS_AND_PUNCTUATION
Editorial note  
Are there devices (e.g. mobiles) with separate input modes for punctuation in different scripts? Would such punctuation occur at the start of data entry? Should we use a modifier rather than a separate list of values for such punctuation modes?
CJK_UNIFIED_IDEOGRAPHS
Editorial note  
We have values for Kanji, Hanja, and simplified/traditional Hanzi (i.e. for ideograph input optimized for Japanese, Korean, and simplified/traditional Chinese). Do we need another value that encompasses all CJK ideographs?
HANGUL_JAMO
HANGUL_SYLLABLES
Editorial note  
We need something for Hangul, but neither of these seems appropriate. Maybe just HANGUL.
IPA_EXTENSIONS
Editorial note  
Are there IPA keyboard layouts or input devices?
MATHEMATICAL_OPERATORS
Editorial note  
Are there math-specific keyboard layouts or input devices?
YI_SYLLABLES
Editorial note  
We need something for Yi. Is this the right name?
Musical Symbols and Byzantine Musical Symbols
Editorial note  
Do they need support? Are there input devices?
Editorial note  
Script names not yet covered (see comments above): HAN, HANGUL, YI.
Editorial note  
Additional questions below:

- Do we need other attribute values for digits (e.g. Devanagari, Thai,...) or a modifier value for digits?
- Do we need values for upper-case/lower-case, mixed-case (i.e. starting with upper-case letter, e.g. on PalmPilot), or modifier values?
- What is the value for mixed Kanji/Kana?
- Do we need other values for compatibility, e.g. half-width hangul?

B.1.2 Excluded Values

Values found in JavaUnicodeBlocks that have been excluded:

Input Mode
ALPHABETIC_PRESENTATION_FORMS
ARABIC_PRESENTATION_FORMS_A
ARABIC_PRESENTATION_FORMS_B
ARROWS
BASIC_LATIN
BLOCK_ELEMENTS
BOPOMOFO_EXTENDED
BOX_DRAWING
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_RADICALS_SUPPLEMENT
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
COMBINING_DIACRITICAL_MARKS
COMBINING_HALF_MARKS
COMBINING_MARKS_FOR_SYMBOLS
CONTROL_PICTURES
CURRENCY_SYMBOLS
DINGBATS
ENCLOSED_ALPHANUMERICS
ENCLOSED_CJK_LETTERS_AND_MONTHS
GEOMETRIC_SHAPES
GREEK_EXTENDED
HALFWIDTH_AND_FULLWIDTH_FORMS
HANGUL_COMPATIBILITY_JAMO
IDEOGRAPHIC_DESCRIPTION_CHARACTERS
KANBUN
KANGXI_RADICALS
LATIN_1_SUPPLEMENT
LATIN_EXTENDED_A
LATIN_EXTENDED_ADDITIONAL
LATIN_EXTENDED_B
LETTERLIKE_SYMBOLS
MISCELLANEOUS_SYMBOLS
MISCELLANEOUS_TECHNICAL
NUMBER_FORMS
OPTICAL_CHARACTER_RECOGNITION
PRIVATE_USE_AREA
SMALL_FORM_VARIANTS
SPACING_MODIFIER_LETTERS
SPECIALS
SUPERSCRIPTS_AND_SUBSCRIPTS
SURROGATES_AREA
YI_RADICALS

Additionally, the following blocks, which appear in UnicodeBlocks but not in JavaUnicodeBlocks, have been excluded:

High Private Use Surrogates
High Surrogates
Low Surrogates
Mathematical Alphanumeric Symbols
CJK Unified Ideographs Extension B
CJK Compatibility Ideographs Supplement
Tags
Private Use