Input Method Editor API

Abstract

This specification defines an “IME API” that provides Web applications with scripted access to an IME (input-method editor) associated with a hosting user agent. This IME API includes:

An InputMethodContext interface, which provides methods to retrieve detailed data from an in-progress IME composition.
A Composition dictionary, which represents read-only attributes about the current composition, such as the actual text and its style.

This API is designed to be used in conjunction with DOM events [DOM-LEVEL-3-EVENTS].

1. Introduction

This section is non-normative.

Existing Web-platform APIs allow developers to implement very complicated Web applications, such as visual chat applications or WYSIWYG presentation editors, but developers still have difficulty implementing Web applications that involve input-method editors. To mitigate these difficulties, the DOM Level 3 Events specification [DOM-LEVEL-3-EVENTS] introduces composition events, which let an application retrieve composition text while it is being composed in an associated IME.

However, Web applications can still run into difficulties when they manipulate IMEs on non-editable elements such as the <canvas> element; those difficulties include the fact that a Web application cannot do the following:

indicate to the user whether the Web application renders composition text by itself, or needs to ask user agents to render it
determine the place where user agents render composition text
determine the place where user agents render candidate windows

To solve these IME-related problems, this specification introduces an IME API that allows Web applications to interact with the IME. This specification introduces interfaces for compositions, so Web applications can read detailed composition data. A Composition object provides a reference to an ongoing IME composition, so Web applications can retrieve the composition text and its attributes. In addition, this API also gives Web applications the ability to give a hint as to where to position a composition window.

There are also proposed standards for changing IME mode via CSS (ime-mode property in CSS3 UI) and controlling generic input modality (inputmode attribute), but they are independent of this API and they are for solving different issues.

Consider the following examples.

The first example shows the source for a Web application that renders composition text by itself on a canvas.

Example 1

<!DOCTYPE html>
<html>
<head>
<script language="javascript" type="text/javascript">
// Remembers cursor position.
var cursor_x = 0;
var cursor_y = 0;

// Remembers cursor position after current composition is committed.
var next_cursor_x = 0;

function init() {
    var node = document.getElementById('canvas0');
    // Associates an IME with this node.
    if (node.getInputContext && node.getInputContext().open()) {
        node.addEventListener('compositionupdate', onCompositionUpdate, false);
        node.addEventListener('compositionend', onCompositionEnd, false);
    }
}

function getStartAndLengthFromComposition(composition) {
    var start = 0;
    var length = 0;
    // For the brevity of this example, code to convert composition's Range
    // object to (start, length) pair is omitted.
    return { start: start, length: length };
}

function onCompositionUpdate(event) {
    var canvas = document.getElementById('canvas0');
    var context = canvas.getContext('2d');
    var inputContext = canvas.getInputContext();
    var composition = inputContext.composition;
    var text = composition.text.textContent;

    // Position of drawing text.  Assuming font height is 16px.
    var x = cursor_x;
    var y = cursor_y;

    // Clear the drawing area including the 5px underline area.
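    // Note: For brevity, the case where the text is shortened (e.g. as a
    //       result of typing backspace) is omitted.
    // Set the font before measuring so the width matches the drawn text.
    context.font = '16px sans-serif';
    var width = context.measureText(text).width;
    next_cursor_x = cursor_x + width;
    context.fillStyle = 'white';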
    context.fillRect(cursor_x, cursor_y, width, 16 + 5);

    // Use { start, length } for simplicity.
    var range = getStartAndLengthFromComposition(composition);

    // Render the clauses in the composition.
    // Note: This code assumes LTR text.
    // substring() takes (start, end) indices, not (start, length).
    var selectionEnd = range.start + range.length;
    var texts = [ text.substring(0, range.start),
                  text.substring(range.start, selectionEnd),
                  text.substring(selectionEnd) ];

    // Set font drawing style.
    context.textBaseline = 'top';
    context.font = '16px sans-serif';
    context.fillStyle = 'black';

    for (var i = 0; i < texts.length; i++) {
      var metrics = context.measureText(texts[i]);
      // Draw the text of this clause.
      context.fillText(texts[i], x, y);
      // Draw an underline under the text. For brevity, this code draws a bold
      // underline for the selected clause, or a thin underline for others.
      if (i == 1) { // Selected clause.
          context.fillRect(x + 1, y + 16 + 1, metrics.width - 1, 2);
          // Let the IME know where the caret is.
          inputContext.setCaretRectangle(canvas, x, y, metrics.width, 16 + 5);
      } else {
          context.fillRect(x + 1, y + 16 + 1, metrics.width - 1, 1);
      }
      x += metrics.width;
    }

    // Prevent the browser from drawing composition text.
    event.preventDefault();
}

function onCompositionEnd(event) {
    cursor_x = next_cursor_x;
    // Line breaks are not handled yet.
}
</script>
</head>
<body onload="init()">
<canvas id="canvas0" width="640" height="480"></canvas>
</body>
</html>

The next example shows the source for a simple Web search page that offers suggestions while the user is composing text. It gives the IME a hint about the area where the application wants the IME to avoid placing its UI elements.

Example 2

<!DOCTYPE html>
<html>
<head>
<style type="text/css">
#search0 {
    max-width: 400px;
}

#input0 {
    width: 100%;
}

#suggest0 {
    width: 100%;
    list-style: none;
    margin: 0;
    padding: 0;
    border-style: solid;
    border-width: 1px;
    border-color: #000;
}

.suggest {
    margin: 0;
    padding: 0;
}
</style>
<script language="javascript" type="text/javascript">
function init() {
    var node = document.getElementById('input0');
    // Associate an IME with this node.
    node.getInputContext().enabled = true;
    // This code only handles the compositionupdate event for brevity of the
    // example, but of course other input field changes should also be handled.
    node.addEventListener('compositionupdate', onCompositionUpdate, false);
}

// Sends an XHR request to get search suggestions.
// Upon receiving the result, expandSuggest() is called back.
function getSuggests(query) {
    // For brevity, implementation of this function is omitted.
}

function expandSuggest(candidates) {
    // Callback after getting search suggestions.
    var suggest = document.getElementById('suggest0');
    var i;
    // Remove all existing suggestions. Looping on firstChild avoids skipping
    // nodes as childNodes shrinks.
    while (suggest.firstChild) {
        suggest.removeChild(suggest.firstChild);
    }
    for (i = 0; i < candidates.length; i++) {
        suggest.appendChild(document.createElement("li"));
        suggest.childNodes[i].textContent = candidates[i];
    }
    // Set exclusion area hint for the IME: the rectangle covered by the
    // suggestion list, expressed relative to the top-left of the input field.
    var input = document.getElementById('input0');
    var context = input.getInputContext();
    var relative_x = suggest.offsetLeft - input.offsetLeft;
    var relative_y = suggest.offsetTop - input.offsetTop;
    context.setExclusionRectangle(input, relative_x, relative_y,
                                  suggest.offsetWidth, suggest.offsetHeight);
}

function onCompositionUpdate(event) {
    var query = document.getElementById('input0').value;
    getSuggests(query);
}
</script>
</head>
<body onload="init()">
<div id="search0">
  <input type="text" id="input0" placeholder="search here">
  <ul id="suggest0"></ul>
</div>
</body>
</html>

Please also refer to the separate use cases for Input Method Editor API document.

2. Background: What’s an Input Method Editor?

This section is non-normative.

An IME (input-method editor) is an application that allows a standard keyboard (such as a US-101 keyboard) to be used to type characters and symbols that are not directly represented on the keyboard itself. In China, Japan, and Korea, IMEs are used ubiquitously to enable standard keyboards to be employed to type the very large number of characters required for writing in Chinese, Japanese, and Korean.

On platforms with touch-based input devices, such as mobile phones, an IME also makes it possible to type text that a simple on-screen keyboard cannot type directly.

A system IME is an IME already installed on a user's system.

An IME consists of two modules: a composer and a converter.

A composer is a context-free parser that composes non-ASCII characters (including phonetic characters) from keystrokes, e.g. Hiragana or Pinyin.

A converter is a context-sensitive parser that looks up a dictionary to convert phonetic characters to a set of ideographic characters, e.g. Kanji.

An IME clause is a grammatical word produced in an IME.

An IME selected clause is an IME clause currently being converted by an IME.

An IME composition is an instance of text produced in an IME. For IMEs that can produce multiple words, an IME composition consists of multiple IME clauses. For IMEs that produce only one word, an IME composition is equal to an IME clause.

When an IME receives keystrokes, it sends the keystrokes to a composer and receives phonetic characters matching the keystrokes. When an IME receives phonetic characters from a composer, it sends them to a converter and receives the list of ideographic characters matching the phonetic characters. The following figure shows the basic structure of an IME.

2.1 Composer

This section is non-normative.

There are two types of composers: a phonetic composer and a radical composer.

A phonetic composer composes a phonetic character from its ASCII representation.

A radical composer composes a phonetic character from phonetic radicals.

A phonetic radical is a character component of a Latin character, a Chinese character, or a Korean character. A Latin character can consist of an ASCII character and accent marks, e.g. the character ‘á’ consists of the ASCII character ‘a’ and the accent mark ‘´’. A Chinese character can consist of Chinese character components that refer to its semantic origins, e.g. the Chinese character ‘略’ consists of two components ‘田’ and ‘各’. A Korean character consists of Korean character components that represent consonants or vowels, e.g. the Korean character ‘가’ consists of the consonant ‘ㄱ’ and the vowel ‘ㅏ’.

A composition window is a window that shows ASCII characters being composed by phonetic composers or phonetic radicals being composed by radical composers.

An IME usually shows the text being composed by a composer in its own style to distinguish it from the existing text. Even though most composers output phonetic characters, some composers (such as Bopomofo composers) output a placeholder character instead of phonetic characters while composing text.

Issue 1

need to define composition window

Issue 2

probably should define radical

Issue 3

probably should define clause here too

2.1.1 Phonetic Composer

This section is non-normative.

Phonetic composers are used not only for typing Simplified Chinese and Japanese, but also for typing other non-ASCII characters (such as mathematical symbols, Yi, Amharic, etc.) with a US-101 keyboard. Each of these languages has a mapping table from each of its characters to a sequence of ASCII characters representing its pronunciation: e.g., ‘か’ to ‘ka’ in Japanese, and ‘卡’ to ‘ka’ in Simplified Chinese. This mapping table is called Romaji for Japanese and Pinyin for Simplified Chinese. A phonetic composer uses these mapping tables to compose a phonetic character from a sequence of ASCII characters produced by a US keyboard.

For example, a phonetic composer for Simplified Chinese outputs the ASCII characters typed by the user as its composition text.

Fig. 2 Composition text (Simplified Chinese)

On the other hand, a typical phonetic composer for Japanese outputs phonetic characters when the typed ASCII characters have corresponding phonetic characters.
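The behaviour described above can be illustrated with a minimal sketch of a Romaji-based phonetic composer. This is not part of the API defined by this specification: the table contents, the function name composePhonetic, and the greedy longest-match strategy are all assumptions chosen for illustration, and a real composer uses a far larger table and handles many more cases.

```javascript
// Toy Romaji table for illustration only (hypothetical contents).
const ROMAJI_TO_KANA = new Map([
  ['ka', 'か'], ['ki', 'き'], ['kyo', 'きょ'],
  ['u', 'う'], ['ha', 'は'], ['a', 'あ'],
]);
const MAX_KEY_LENGTH = 3; // Longest key in the toy table above.

// Greedy longest-match composition. Converts ASCII input from the left and
// stops at the first position where no table entry matches; the unmatched
// tail is returned as pending ASCII text.
function composePhonetic(ascii) {
  let composed = '';
  let i = 0;
  while (i < ascii.length) {
    let matchedLength = 0;
    const longest = Math.min(MAX_KEY_LENGTH, ascii.length - i);
    for (let len = longest; len >= 1; len--) {
      const kana = ROMAJI_TO_KANA.get(ascii.slice(i, i + len));
      if (kana !== undefined) {
        composed += kana;
        matchedLength = len;
        break;
      }
    }
    if (matchedLength === 0) break; // Nothing matches yet: keep as pending.
    i += matchedLength;
  }
  return { composed: composed, pending: ascii.slice(i) };
}
```

With this table, composePhonetic('kyouha') yields the phonetic characters for the whole input, while composePhonetic('kak') composes ‘か’ and leaves the trailing ‘k’ pending until more keystrokes arrive.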

A phonetic composer for mathematical symbols, for example, outputs the composed mathematical symbol and shows the source keystrokes in its own window, which is an example of a composition window.

2.1.2 Radical Composer

Radical composers are mainly used for typing Traditional Chinese and Korean with phonetic keyboards. Each phonetic keyboard of these languages can produce phonetic radicals: e.g., typing ‘r’ produces ‘ㄱ’ on a Korean keyboard; typing ‘o’ produces ‘人’ on a Traditional-Chinese (or Bopomofo) keyboard, etc. A radical composer composes a phonetic character from phonetic radicals given by these keyboards: e.g., typing ‘ㄱ’ (r) and ‘ㅏ’ (k) produces ‘가’ on a Korean keyboard; typing ‘人’ (o), ‘弓’ (n), and ‘火’ (f) produces ‘你’ on a Traditional-Chinese keyboard, etc.

A radical composer for Korean outputs the phonetic radicals as its composition text.
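As an illustration of radical composition for Korean, the following sketch combines phonetic radicals into one precomposed syllable using the arithmetic layout of the Unicode Hangul Syllables block (starting at U+AC00, with 19 initial consonants, 21 vowels, and 28 final positions including "no final"). The function name composeHangul and the table names are hypothetical; a real radical composer also tracks state across keystrokes, which is omitted here.

```javascript
// Radicals in the order used by the Unicode Hangul Syllables block.
const INITIALS = [...'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ'];
const VOWELS = [...'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ'];
const FINALS = ['', ...'ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ'];

const SYLLABLE_BASE = 0xAC00; // Code point of the first Hangul syllable.

// Composes one syllable from an initial consonant, a vowel, and an optional
// final consonant. Returns null when any radical is not in the tables.
function composeHangul(initial, vowel, final = '') {
  const l = INITIALS.indexOf(initial);
  const v = VOWELS.indexOf(vowel);
  const t = FINALS.indexOf(final);
  if (l < 0 || v < 0 || t < 0) return null;
  const codePoint =
      SYLLABLE_BASE + (l * VOWELS.length + v) * FINALS.length + t;
  return String.fromCharCode(codePoint);
}
```

For example, composing the consonant ‘ㄱ’ with the vowel ‘ㅏ’ gives ‘가’, matching the example in the text above.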

A radical composer for Traditional Chinese outputs a placeholder character (U+3000) and shows the phonetic radicals being composed in its own window. This window is an example of a composition window.

Fig. 6 Radical composer (Traditional Chinese)

Some platforms (such as Mac and Linux) use radical composers for typing accented characters used in European countries. For example, typing ‘ ̈ ’ (option+u) and ‘a’ (a) produces ‘ä’ on a US keyboard on Mac.

2.1.3 On-Screen Keyboard

On touch-based platforms without a hardware keyboard, such as mobile phones or tablets, an on-screen keyboard is displayed to help the user type text, and it occupies part of the screen. The user uses this keyboard to type composition text.

Fig. 8 An example of an on-screen keyboard (English)

The layout of an on-screen keyboard may vary depending on the language or the input modality (e.g. a telephone number input field requires number buttons only).

Fig. 9 An example of an on-screen keyboard (Japanese)

2.2 Converter

A converter is a context-sensitive parser used for replacing the output of a composer with ideographic characters in Chinese, Japanese, and Korean.

Note

Korean seldom uses ideographic characters.

Because Chinese, Japanese, and Korean have many homonyms, each sequence of phonetic characters usually matches many ideographic characters: e.g., the Japanese phonetic character ‘か’ matches the Japanese ideographic characters ‘化’, ‘科’, ‘課’, etc.; the Pinyin characters ‘ka’ match the Simplified-Chinese ideographic characters ‘卡’, ‘喀’, ‘咯’, etc.; the Bopomofo characters ‘人弓’ match the Traditional-Chinese ideographic characters ‘乞’, ‘亿’, ‘亇’, etc.

A converter looks up a dictionary and shows a list of candidates of possible ideographic characters so a user can choose one. This list is known as a candidate list. A candidate list is known as a candidate window when it has its own window.

Some Japanese IMEs show annotations in their candidate windows for characters that are not easy to distinguish from other characters (such as full-width alphabetic characters, full-width Katakana, and half-width Katakana), as shown in the following figure.

The next figure shows a candidate window of a Simplified-Chinese IME.

Fig. 11 Candidate window (Simplified Chinese)

And the next figure shows a candidate window of a Traditional-Chinese IME.

Fig. 12 Candidate window (Traditional Chinese)

Several techniques are used to improve conversion quality. One is an MRU (most recently used) list. Even though many ideographic characters match each phonetic character (or phonetic radical), a user does not usually use all of them. A converter uses an MRU list to filter rarely used ideographic characters out of a candidate list.

Another technique is a grammar parser. A converter that integrates a grammar parser splits the given phonetic characters into grammatical clauses and converts only one clause at a time. When a sequence of phonetic characters consists of n clauses and the i-th clause has m_i candidates, the total number of candidates for the input characters is (m_1 * m_2 * … * m_n). To reduce the number of candidates it has to manage, a converter usually processes one clause at a time. This clause is called the selected clause.
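The effect of converting one clause at a time can be checked with a little arithmetic. The sketch below is illustrative only; all function names are hypothetical. It compares the product of the per-clause candidate counts with their sum, and also shows one possible use of an MRU list. Note that orderByMru moves recently used candidates to the front rather than filtering candidates out, which is a gentler variant of the filtering described above.

```javascript
// Number of whole-input candidates if all clauses were converted at once:
// the product m_1 * m_2 * ... * m_n.
function totalCandidates(perClauseCounts) {
  return perClauseCounts.reduce((product, m) => product * m, 1);
}

// Number of candidates listed when each clause is converted separately:
// the sum m_1 + m_2 + ... + m_n.
function candidatesListedClauseByClause(perClauseCounts) {
  return perClauseCounts.reduce((sum, m) => sum + m, 0);
}

// Moves candidates found in the MRU list (most recent first) to the front,
// keeping the dictionary order for all other candidates.
function orderByMru(candidates, mru) {
  const rank = (candidate) => {
    const position = mru.indexOf(candidate);
    return position === -1 ? mru.length : position;
  };
  return candidates
      .map((candidate, index) => ({ candidate, index }))
      .sort((a, b) => rank(a.candidate) - rank(b.candidate) || a.index - b.index)
      .map((entry) => entry.candidate);
}
```

For three clauses with 3, 4, and 5 candidates, converting everything at once would involve 3 * 4 * 5 = 60 combinations, whereas converting clause by clause lists only 3 + 4 + 5 = 12 candidates in total.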

An IME usually renders a selected clause with a special style to distinguish it from other clauses, as shown in the following figure.

When a converter converts two or more clauses, it chooses candidates for the selected clause so it becomes grammatically consistent with the surrounding clauses: e.g., Japanese converters usually output ‘危機一髪’ (not ‘危機一発’) for Japanese phonetic characters ‘ききいっぱつ’ because ‘危機一発’ is grammatically incorrect.

On a mobile platform, candidates may not appear in a separate window but instead occupy part of the screen, where the user chooses the intended candidate word.

Fig. 14 Composition on mobile platform (Japanese)

Fig. 15 Composition on mobile platform (Japanese)

7. The InputMethodContext Interface

interface InputMethodContext {
    readonly    attribute Composition composition;
                attribute boolean     enabled;
    readonly    attribute DOMString   locale;
    void    confirmComposition ();
    void    setCaretRectangle (Node anchor, long x, long y, long w, long h);
    void    setExclusionRectangle (Node anchor, long x, long y, long w, long h);
    boolean open ();
};

7.1 Attributes

composition of type Composition, readonly

Represents the detailed information of the ongoing IME composition. When an IME is not composing text, this value MUST be null.

When assigned, updates the composition information of the hosting user agent.

enabled of type boolean

Controls the state of the IME associated with this context.

The value represents whether a user agent activates this IME when the given node gains the input focus.

When this attribute is set to true, a user agent activates this IME when the given node gains the input focus and sends composition events to the given node even if the node is not an editable one, such as a <canvas> element.

locale of type DOMString, readonly

Represents the locale of the current input method as a BCP-47 tag (e.g. "en-US"). The locale MAY be the empty string when inapplicable or unknown.

7.2 Methods

confirmComposition

Finishes the ongoing composition of the hosting user agent.

When a Web application calls this function, a user agent sends a compositionend event and a textInput event, as if the user had typed an ‘Accept’ key, as described in the “Input Method Editors” section of the DOM Level 3 Events specification [DOM-LEVEL-3-EVENTS].

Note

This function is just copied from WebKit, to solicit opinions from developers.

No parameters.

Return type: void

open

Requests the hosting user agent to associate the context with an IME. This function returns true if a user agent can associate its IME with the context.

No parameters.

Return type: boolean

setCaretRectangle

Notifies the user agent of the rectangle occupied by composition text. When a user agent renders a candidate window or a composition window, it uses this rectangle to avoid rendering these windows over it.

On Windows, this rectangle is used as a parameter for ImmSetCandidateWindow(). On Mac, this rectangle is sent when it calls [firstRectForCharacterRange:]. On Linux (GTK), this rectangle is used as a parameter for gtk_im_context_set_cursor_location().

The anchor parameter represents the DOM node against which the rectangle is positioned. This MAY be a different node from the focused node that listens to composition events, so a Web application can draw composition text on a node that does not have focus.
The x and y parameters are the offsets of the top-left corner of the rectangle relative to the top-left corner of the anchor node.
The w and h parameters are the width and height of the rectangle.

A user agent MAY need to convert these coordinates to the screen coordinates when it shows a candidate window.

Parameter	Type	Nullable	Optional
anchor	`Node`	✘	✘
x	`long`	✘	✘
y	`long`	✘	✘
w	`long`	✘	✘
h	`long`	✘	✘

Return type: void

setExclusionRectangle

Gives a hint to a user agent to avoid showing any input-related UI elements (e.g. an on-screen keyboard or a candidate window) over the given rectangle, so that an application can show its own input-related UI elements (such as search suggestions) there.

A user agent MAY use this hint to explicitly control the position of a candidate window, or to determine the zoom level and viewport when an on-screen keyboard appears.

The anchor parameter represents the DOM node against which the rectangle is positioned.
The x and y parameters are the offsets of the top-left corner of the rectangle relative to the top-left corner of the anchor node.
The w and h parameters are the width and height of the rectangle.

Parameter	Type	Nullable	Optional
anchor	`Node`	✘	✘
x	`long`	✘	✘
y	`long`	✘	✘
w	`long`	✘	✘
h	`long`	✘	✘

Return type: void
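Both setCaretRectangle() and setExclusionRectangle() take a rectangle expressed relative to the top-left corner of the anchor node, as long values. The following sketch shows one way a Web application might derive those values. The helper name rectRelativeToAnchor is hypothetical, and the use of getBoundingClientRect()-style boxes as the definition of a node's top-left corner is an assumption, since this section does not say which box is meant.

```javascript
// Converts a target box into anchor-relative { x, y, w, h } integers.
// Both arguments are DOMRect-like objects ({ left, top, width, height }),
// for example the results of getBoundingClientRect() (an assumption; the
// specification text does not mandate this way of measuring).
function rectRelativeToAnchor(anchorRect, targetRect) {
  return {
    x: Math.round(targetRect.left - anchorRect.left),
    y: Math.round(targetRect.top - anchorRect.top),
    w: Math.round(targetRect.width),
    h: Math.round(targetRect.height),
  };
}

// Possible usage in a page, where getInputContext() is the method proposed
// by this draft and may be unavailable in a given user agent:
//
//   var r = rectRelativeToAnchor(input.getBoundingClientRect(),
//                                suggest.getBoundingClientRect());
//   input.getInputContext().setExclusionRectangle(input, r.x, r.y, r.w, r.h);
```

Rounding is applied because the parameters are declared as long in the interface definition, whereas layout boxes may have fractional sizes.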

8. Best Practices

This section is non-normative.

This specification provides an interface for developing IME-aware Web applications.

This section describes practices for some use-cases.

8.1 Drawing Composition Text

This section is non-normative.

If a Web application wants to draw composition text by itself, it SHOULD handle the compositionupdate event to be notified by the IME that the composition text has changed, then use the interface described in this document to retrieve the composition, and let the IME know where the composition text is drawn by calling the setCaretRectangle() method. If setCaretRectangle() is not called, the IME has no information about where to show its UI and may show it at an obtrusive position. To avoid this situation, a user agent may set a reasonable default position in the vicinity of the focused input field. Optionally, the Web application MAY call the setExclusionRectangle() method to hint to the IME that a certain area is not suitable for showing the IME UI.

When a Web application draws composition text, it MUST call preventDefault() in its compositionupdate handler so that the user agent will not draw the text.

When a Web application wants to handle DOM Level 3 composition events on a node that is not an <input>, <textarea>, or contenteditable node, it MUST enable the IME via getInputContext().open() to associate the node with composition events.
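The practices above can be combined into a small helper, sketched below under stated assumptions. The helper name attachImeHandlers and its callback parameters are hypothetical; getInputContext() and open() are the members proposed in this draft, so the sketch feature-detects them and reports failure instead of assuming that the user agent implements them.

```javascript
// Associates an IME with a node and wires up composition handlers.
// Returns true when the IME was associated, false otherwise.
//   onUpdate(event, inputContext): draws the composition text and is expected
//     to call inputContext.setCaretRectangle() with the drawn position.
//   onEnd(event): finalizes the committed text.
function attachImeHandlers(node, onUpdate, onEnd) {
  // getInputContext() is proposed by this draft and may be missing.
  if (typeof node.getInputContext !== 'function') return false;
  var inputContext = node.getInputContext();
  if (!inputContext || !inputContext.open()) return false;

  node.addEventListener('compositionupdate', function (event) {
    onUpdate(event, inputContext);
    // The application draws the text itself, so stop the user agent
    // from drawing it as well.
    event.preventDefault();
  }, false);
  node.addEventListener('compositionend', onEnd, false);
  return true;
}
```

A caller can fall back to an ordinary editable element when the helper returns false, for example by overlaying an <input> element on the canvas.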

The following diagram shows the flow of events among the keyboard, the IME, the user agent, and the Web application when a user types ‘kyouha’ to convert to ‘今日は’.

Fig. 16 Event flow of IME and an IME-aware Web application.

Input Method Editor API

W3C Working Draft 04 April 2013

Abstract

Status of This Document

Table of Contents

1. Introduction

2. Background: What’s an Input Method Editor?

2.1 Composer

2.1.1 Phonetic Composer

2.1.2 Radical Composer

2.1.3 On-Screen Keyboard

2.2 Converter

3. Conformance

4. Terminology and Algorithms

5. The getInputContext() method

5.1 Methods

6. The Composition Dictionary

6.1 Dictionary `Composition` Members

7. The InputMethodContext Interface

7.1 Attributes

7.2 Methods

8. Best Practices

8.1 Drawing Composition Text

A. References

A.1 Normative references

6.1 Dictionary `Composition` Members