W3C

Requirements for the Ink Markup Language

W3C Note 22 January 2003

This version:
http://www.w3.org/TR/2003/NOTE-inkreqs-20030122/
Latest version:
http://www.w3.org/TR/inkreqs/
Previous version:
This is the first public version
Editors:
Yi-Min Chee, IBM ymchee@us.ibm.com
Sai Prasad, Intel sai.prasad@intel.com
Contributors:
See Acknowledgements

Abstract

This document describes requirements for the Ink Markup Language that will be used in the multimodal interaction framework as proposed by the W3C Multimodal Interaction Working Group. The Ink Markup Language will serve as the data format for representing ink entered with an electronic pen or stylus in a multimodal system. The markup will allow for the input and processing of handwriting, gestures, sketches, music and other notational languages in web-based multimodal applications. In the context of the W3C Multimodal Interaction Framework, the markup provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

W3C's Multimodal Interaction Activity is developing specifications for extending the Web to support multiple modes of interaction. One mode of interaction that is expected to play a role in many multimodal use cases is pen input. The requirements described in this document will be used to guide the development of a markup language for representing ink data captured by a pen-enabled multimodal system. The Ink Markup Language is intended to be deployed within the W3C Multimodal Interaction Framework as a means for exchanging and storing such information.

This document is a NOTE made available by the W3C for archival purposes, and is not expected to undergo frequent changes. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members. A list of current W3C technical reports and publications, including Recommendations, Working Drafts, and Notes can be found at http://www.w3.org/TR/.

This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only). This is a Royalty Free Working Group, as described in W3C's Current Patent Practice NOTE. Working Group participants are required to provide patent disclosures.

Please send comments about this document to the public mailing list: www-multimodal@w3.org (public archives). To subscribe, send an email to <www-multimodal-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).

Introduction

The Ink Markup Language is the data format used to represent ink entered with an electronic pen or stylus in a multimodal system.

These requirements have been compiled based on review of the fundamental Multimodal Interaction Requirements and additional considerations pertaining to the role of the markup in pen-enabled systems. For each requirement that has been derived (in whole or in part) from fundamental Multimodal Interaction Requirements, the derivation is noted.

The Ink Markup will consist of primitive elements that represent low-level ink data information and application-specific elements that characterize meta information about the ink. Examples of primitive elements are device and screen context characteristics, and pen traces. Application-specific elements provide a higher level description of the ink data. For example, a segment tag could represent a group of ink traces that belong to a field in a form. Consequently, the requirements for the Ink Markup Language could fall in either of the two categories. This document does not attempt to classify requirements based on whether they are low-level or application specific.

The requirements are organized into six categories: General Requirements, Input Processing, Output Processing, Architectural, Mobility, and Multimodal Synchronization.

1. General Requirements

1.1 Application Scope

The Ink Markup is intended to enable a broad range of ink related applications.

(INK-G1): The Ink Markup MUST have a rich set of expressive features to directly support a range of digital ink capabilities as needed by the wide spectrum of multimodal applications targeted by the MMIWG.
   Derived from MMI-G1

Examples of pen-enabled multimodal applications can be found in the Multimodal Interaction Use Cases and an internal presentation (W3C Members only) to the working group.

Pen-enabled applications can be divided into broad application types depending upon the way in which the ink data is used.

(INK-G2): The Ink Markup MUST support both persistent ink documents and interactive applications (MUST specify).

Persistent ink documents include forms in a distributed forms processing application. Another example is a distributed recognition environment where ink is captured on a device and processed at a later time and place. In this case, user and device profile information might need to be included with the ink data at the time of capture, because the recognizer may not have access to it.

Electronic whiteboarding is an example of an interactive application.

1.2 Recognition Support

Recognition of digital ink is expected to be an important application of the Ink Markup.

(INK-G3): The Ink Markup MUST provide a mechanism to reference external resources and constraints that are common across recognition-based input modalities (MUST specify).
   Derived from MMI-G3, MMI-I16

Examples of resources are syntactic constraints such as grammars and lexicons. The markup must link to these resources to avoid duplication of effort in authoring.

(INK-G4): The Ink Markup MUST provide means to indicate the language that the handwritten ink represents (MUST specify).
   Derived from MMI-G8

In this context, the term language refers not only to human languages, but also to music, mathematical, and other notations.

This requirement ensures that the Ink Markup has multilingual support. The language information is important not only as input to handwriting recognizers but also to represent the results from the recognizer.

This language information might come from the user profile, application or device context, or it might be generated by a language identification module. It could be treated as an external resource (see INK-G3 above).

It is assumed that the Ink Markup will need to support annotations on ink made in different languages. For example, a particular ink document may contain handwritten ink that represents multiple languages. Even for a particular piece of ink data, it may be necessary to allow for annotations in multiple languages which may not be the same as the language of the handwritten ink itself. In addition, the character encoding of the ink document may not match the language represented by its ink data. For example, a document which is encoded in ISO646-US may contain ink data which represents handwritten katakana.

(INK-G5): The Ink Markup MUST allow for the representation of all information needed for the training of adaptive recognition systems.

In order for the Ink Markup to be widely adopted as the data representation for handwriting recognition systems, it must contain an application-specific module covering all elements required to support handwriting recognition training and development. For example, in training a handwriting recognition system, it is often necessary to label the ink data at various levels (e.g. corresponding to characters, words, and phrases) and to be able to affiliate traces or portions of traces with those labels.
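
As a rough illustration of the labelling need described above, the following sketch attaches labels at character and word level to whole traces or to portions of traces. The class names, fields, and point ranges are hypothetical and are not part of any Ink Markup definition; they only show the kind of association a training module would have to express.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TraceRef:
    """Reference to a whole trace, or to a slice of its points (hypothetical)."""
    trace_id: str
    # Half-open point range [start, end); None means the whole trace.
    point_range: Optional[Tuple[int, int]] = None


@dataclass
class Label:
    """A ground-truth label at some level, e.g. 'character' or 'word'."""
    level: str
    text: str
    refs: List[TraceRef] = field(default_factory=list)


def labels_for_trace(labels: List[Label], trace_id: str) -> List[Label]:
    """Return every label that refers to the given trace, fully or partially."""
    return [lab for lab in labels if any(r.trace_id == trace_id for r in lab.refs)]


labels = [
    Label("word", "it", [TraceRef("t1"), TraceRef("t2"), TraceRef("t3")]),
    Label("character", "i", [TraceRef("t1"), TraceRef("t2")]),  # stem and dot
    Label("character", "t", [TraceRef("t3", (0, 40))]),         # portion of a trace
]
```

Keeping labels separate from the traces they refer to allows the same ink to carry labels at several levels at once.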

1.3 Ease of Use and Adoption

(INK-G6): The Ink Markup MUST NOT require special hardware to implement (MUST specify).
   Derived from MMI-G11.

(INK-G7): The Ink Markup MUST be readable by developers/authors of multimodal applications and be able to be automatically generated by software (MUST specify).
   Derived from MMI-G10

Readable means we do not need a software decoder to make sense of the ink data. However, we need to weigh the benefits of readability against the increased storage/bandwidth that would be required in order to make each element of the markup human readable. For example, it might make sense at the trace level, but not at the granularity of individual points/samples.

(INK-G8): The Ink Markup MUST be aligned with the W3C specifications for security and privacy (MUST specify).
   Derived from MMI-G13.

Ink data often represents sensitive information, such as a signature. Privacy considerations likewise apply to any user profile data carried in the Ink Markup. However, at this time, we don't anticipate any specific measures within the Ink Markup itself for guaranteeing security and privacy of ink data.

(INK-G9): The Ink Markup SHOULD provide a mechanism for checking markup integrity (NICE to specify).

This refers to the integrity of the ink data itself, rather than well-formedness or validity of the Ink Markup, which can be verified by the use of readily available validation tools. One example of a simple integrity check would be an optional count attribute on items within an element, such as the number of points within a trace, which could be validated against the contents found between the trace's start-tag and end-tag.
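
The count-based check described above could be sketched as follows. The element name trace, the attribute name count, and the comma-separated point syntax are assumptions made for illustration only; this document does not define any concrete syntax.

```python
import xml.etree.ElementTree as ET


def trace_count_ok(trace: ET.Element) -> bool:
    """Check an optional 'count' attribute against the points actually present.

    Assumes (hypothetically) that points are comma-separated in the element text.
    """
    declared = trace.get("count")
    if declared is None:
        return True  # the attribute is optional, so there is nothing to verify
    text = trace.text or ""
    points = [p for p in text.split(",") if p.strip()]
    try:
        return int(declared) == len(points)
    except ValueError:
        return False  # a non-numeric count cannot match


good = ET.fromstring('<trace count="3">10 0, 9 14, 8 28</trace>')
bad = ET.fromstring('<trace count="4">10 0, 9 14, 8 28</trace>')
```

Here trace_count_ok(good) succeeds while trace_count_ok(bad) fails, although both fragments are well-formed XML. This is the distinction between data integrity and markup validity drawn above.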

2. Input Processing Requirements

2.1 Ink Capture

(INK-I1): The Ink Markup MUST allow for the capture of a wide variety of different data channels according to the needs of the target engine (i.e. processor) or engine type, and it must allow for extension into new channel definitions to support future devices and applications (MUST specify).
    Derived from MMI-I1

Typical data channels include x- and y-coordinates, pen tip force, and pen angle.

Examples of engine types include handwriting/gesture recognition and signature verification.

(INK-I2): The Ink Markup MUST allow for recording of information about user profile and ink capture that contributes to the performance of recognition algorithms used in handwriting and gesture recognition (MUST specify).
   Derived from MMI-G14, MMI-I1, MMI-I10, MMI-A5

User profile includes writer information such as handedness, age, style and skill. Device profile includes elements such as manufacturer, model, tablet resolution, sampling mode, and sampling rate.

(INK-I3): The Ink Markup MUST allow for the representation of information about the screen context in which the ink data was captured (MUST specify).
   See also INK-A5

Screen context includes such information as the relationship between the digitizer and the display device during pen input. This information can be important for applications such as handwriting and gesture recognition. For example, the display resolution may affect the size of the user's handwriting, and this information can be used during normalization of the ink data for recognition.

(INK-I4): The Ink Markup MUST provide a means to represent trace attributes that could influence interpretation (e.g. highlighting vs. strikethrough) (MUST specify).

(INK-I5): The Ink Markup SHOULD define mechanisms to allow clients to query device characteristics that may be important in determining how to adapt to platform capabilities, e.g., quality metrics (NICE to specify).
    Derived from MMI-I8, MMI-I17, MMI-A15

The Ink Markup should provide device and context information that can be used to decide how to prioritize ink input relative to other modalities (this could be in addition to a higher-level processor's judgment of the uncertainty of its semantic interpretation of the ink). The markup doesn't need to know about which other modalities it is being combined with, but it needs to provide the necessary information to the higher layers for making decisions. For example, the Ink Markup can provide device level information that can be cached by a driver or the layers above.

The multimodal system may also invoke different ink processors depending on how the ink was captured. For example, ink captured on low-resolution devices may be handled by a specific processor.

2.2 External Events

(INK-I6): The Ink Markup MUST provide a way to preserve relative positioning of external events with respect to the ink stream (MUST specify).

Time-stamps will allow Ink Markup data to be integrated with other types of logged data. Alternatively, the Ink Markup will support time-stamped comment fields, which may contain application-specific (even proprietary) information, that may be used to integrate references to external events directly into the ink event stream.
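
As a minimal sketch of how time-stamps allow external events to be positioned relative to the ink stream, the following code merges two time-ordered streams. It assumes that both streams are stamped against a shared clock; the item structure and field names are hypothetical.

```python
import heapq
from typing import Iterable, List, NamedTuple


class StreamItem(NamedTuple):
    timestamp_ms: int   # time on a clock shared by all sources (assumed)
    kind: str           # "trace" or "event"
    payload: str


def interleave(ink: Iterable[StreamItem], events: Iterable[StreamItem]) -> List[StreamItem]:
    """Merge two time-ordered streams into one, ordered by timestamp."""
    return list(heapq.merge(ink, events, key=lambda item: item.timestamp_ms))


ink = [StreamItem(0, "trace", "t1"), StreamItem(900, "trace", "t2")]
events = [StreamItem(450, "event", "page-turn")]
merged = interleave(ink, events)
```

In this example the page-turn event lands between the two traces, so its position relative to the ink is preserved.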

(INK-I7): The Ink Markup MUST allow for the representation of changes to screen context or to ink attributes that are the result of commonly occurring control events in pen applications (MUST specify).

An example of a control event which results in an ink attribute change is a user action (button press, voice command, or other means) which switches pen colors.

3. Output Processing Requirements

Since MMI is intended for design of interactions, the ink specification should consider the role of ink in the output portion of interactions as well as input.

Consideration of the output portion of interactions requires two elements: output of ink, and synchronization of ink input/output with other output modalities. Aside from making ink visible as it is being written, ink will also be displayed later or in different places, and in some cases may be generated and used by the application as a user prompt (e.g. in a cursive writing tutorial).

However, the ink specification is intended primarily to support representation of captured ink. Output specifications generally are driven by many requirements that are distinct from input requirements. In particular, it will often be desirable to modify the attributes of ink prior to display, for example in color, line width, or even resolution. Adding the flexibility to handle a wide range of output attributes would add unnecessary complexity to the specification and minimum implementation. Accordingly, the Ink Markup will concentrate instead on those attributes that are required for accurate representation of ink capture, not display. It will include attributes for capture that may have use in display, such as color or force, but display is not the driving consideration.

It is possible that some display services, such as windowing systems or browsers, may implement support for display of Ink Markup directly. More generally, we anticipate the transcoding of the Ink Markup to other encodings that are more widely supported (e.g. image formats or SVG), for use on display devices that do not directly support the Ink Markup.

(INK-O1): The Ink Markup MUST be displayable on a wide variety of output platforms and must be convertible to a variety of different display formats (MUST specify).
   Derived from MMI-O8

It must also support transformation to alternative output media types: to text via handwriting recognition, to commands via gesture recognition etc.

(INK-O2): The Ink Markup MUST support the specification of how presentation of ink, when used as output media, could be adapted or styled on different devices (MUST specify).
    Derived from MMI-O9, MMI-A13

4. Architectural Requirements

4.1 Reuse

(INK-A1): The Ink Markup MUST reuse standard language specifications, where possible (MUST specify).
   Derived from MMI-A1, MMI-S12

One example is the use of CC/PP to encode device characteristics.

4.2 Modularity

(INK-A2): The Ink Markup MUST be defined in a modular fashion. Mandatory functionality should be encapsulated in a base module which is usable by other XML based languages (MUST specify).
   Derived from MMI-A2

(INK-A3): The Ink Markup MUST provide for modular extensions. It must specify mechanisms for defining and using extension modules for application specific functionality.
   Cross reference with INK-A2

4.3 Deployment

(INK-A4): The Ink Markup MUST support multiple levels of granularity of ink streaming for real-time input (MUST specify).

(INK-A5): The Ink Markup MUST provide information to enable 'consistent' input behavior and display across devices with differing capabilities, and across multiple sessions (MUST specify).
    Derived from MMI-A6, MMI-A12

A user may start to fill out a form on one device using ink and complete the form on a different device with a different display and digitizer. The specification must allow merging of ink data with previously entered data that may have been captured with different devices and contexts, and even in different modalities.

At the session level, the user may change modalities, e.g., from speech to pen, or from a PDA to a tablet PC. The UI should allow editing of previous entries even though modality or device characteristics are different. This influences the ink specification indirectly in that it requires sufficient device information to allow correct merging of new pen data with previously entered data.
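
One way in which recorded device information could support such merging is sketched below: coordinates from each device are converted to a common physical unit before being combined. The units_per_inch parameter stands in for whatever resolution value a device profile would carry and is an assumption of this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MM_PER_INCH = 25.4


def to_millimetres(points: List[Point], units_per_inch: float) -> List[Point]:
    """Convert digitizer units to millimetres using the device resolution."""
    if units_per_inch <= 0:
        raise ValueError("units_per_inch must be positive")
    scale = MM_PER_INCH / units_per_inch
    return [(x * scale, y * scale) for x, y in points]


# The same physical position captured on a 100 units/inch PDA
# and on a 1000 units/inch tablet.
pda_ink = to_millimetres([(100.0, 50.0)], units_per_inch=100.0)
tablet_ink = to_millimetres([(1000.0, 500.0)], units_per_inch=1000.0)
```

After conversion both points describe the same location, which would not be recoverable if the device resolution had not been recorded with the ink.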

(INK-A6): The Ink Markup MUST provide a mechanism to support changes in delivery context that impact ink characteristics or their interpretation (MUST specify).
   Derived from MMI-A4, MMI-S1

One way to support changes in the delivery context is to start a new ink session whenever a change is detected.

4.4 Integration

(INK-A7): The Ink Markup MUST allow other systems to identify subsections or portions of an ink document for annotation or markup purposes (MUST specify).

The purpose of this requirement is to provide reliable external references to an ink document. This is a MUST specify requirement for static documents. In addition, it would be desirable to have a mechanism for either preserving or invalidating references when document editing or modification occurs.

(INK-A8): The Ink Markup SHOULD provide mechanisms to facilitate random access and application specific grouping or structuring of ink traces (NICE to specify).

In this context, random access refers to the ability to navigate to a trace or group of traces within an ink document without undue processing overhead.
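
A simple form of random access can be sketched by indexing elements by identifier in a single pass. The document structure, the element names ink, group, and trace, and the id attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DOC = """
<ink>
  <group id="field-name">
    <trace id="t1">1 1, 2 2</trace>
    <trace id="t2">3 3, 4 4</trace>
  </group>
  <trace id="t3">5 5, 6 6</trace>
</ink>
"""


def build_index(root: ET.Element) -> Dict[str, ET.Element]:
    """Map each id to its element in a single pass over the document."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


root = ET.fromstring(DOC)
index = build_index(root)
trace_ids_in_field = [t.get("id") for t in index["field-name"].findall("trace")]
```

Once the index is built, a trace or a group of traces can be reached by identifier without rescanning the document.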

(INK-A9): The Ink Markup MUST allow other systems to embed ink documents within documents of other types (MUST specify).

The embedding capability should support anchoring, alignment, and registration of ink within other documents.

5. Mobility Requirements

(INK-R1): The Ink Markup MUST allow lightweight client implementations for thin mobile devices (MUST specify).
   Derived from MMI-R1

(INK-R2): The Ink Markup MUST provide a mechanism to allow for communications over low bandwidth network connections (MUST specify).
   Derived from MMI-G15, MMI-R2, MMI-S13.

This has an impact on how ink is transmitted over the network. Low bandwidth may require smaller transmission size of ink documents. Common mechanisms for achieving this goal include downsampling and compression.
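
The two mechanisms mentioned above can be sketched as follows. The textual point encoding is an assumption, and zlib is used only as a readily available general-purpose compressor; this document does not prescribe either. Note that downsampling discards data, whereas the compression step shown here is lossless.

```python
import zlib
from typing import List, Tuple

Point = Tuple[int, int]


def downsample(points: List[Point], step: int) -> List[Point]:
    """Keep every step-th point, always retaining the final point."""
    if step < 1:
        raise ValueError("step must be >= 1")
    kept = points[::step]
    if points and (len(points) - 1) % step != 0:
        kept.append(points[-1])
    return kept


def encode(points: List[Point]) -> bytes:
    """Serialize points as text and compress the result with zlib."""
    text = ",".join(f"{x} {y}" for x, y in points)
    return zlib.compress(text.encode("ascii"))


def decode(blob: bytes) -> List[Point]:
    """Inverse of encode()."""
    text = zlib.decompress(blob).decode("ascii")
    if not text:
        return []
    return [tuple(int(v) for v in pair.split()) for pair in text.split(",")]


trace = [(i, 2 * i) for i in range(100)]
reduced = downsample(trace, 4)
blob = encode(reduced)
```

Whether a given downsampling step is acceptable depends on the consumer: a display may tolerate far fewer points than a recognizer or a signature verifier.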

6. Multimodal Synchronization Requirements

The Ink Markup must provide mechanisms to allow higher levels of an MMI system to coordinate the ink input with other aspects of the user interface dialog, including both input and output modalities. Consequently, the Ink Markup will preserve information about time, time delays, and granularity that is useful to the MMI manager in coordinating the elements of the user interface. This information should also enable manipulation of ink data by output interface components to synchronize rendering of ink data with other output data.

The temporal information of ink data may also be useful to other processes involved in interpretation of MMI interactions, if it can be resolved against temporal information from other aspects of the UI, allowing accurate reconstruction of the sequence of user prompts and responses.

(INK-S1): The Ink Markup MUST preserve and make available sufficient time information to support chronological positioning and sequencing, grouping and synchronization of data and events across different input modalities (MUST specify).
    Derived from MMI-G4, MMI-G5, MMI-G6, MMI-I2, MMI-I3, MMI-I7, MMI-I18, MMI-I19, MMI-O1, MMI-O2, MMI-O3, MMI-O4, MMI-O5, MMI-O6, MMI-O7, MMI-A17, MMI-A18.

Ink elements must be time-stamped to support this requirement. Ink elements include traces and groups of traces.
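
As an illustration of how time-stamped ink elements could be grouped with data from another modality, the sketch below finds the speech segments whose time span overlaps that of a pen gesture. The class, the field names, and the slack parameter are hypothetical, and a clock shared across modalities is assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timed:
    label: str
    start_ms: int
    end_ms: int


def overlapping(item: Timed, others: List[Timed], slack_ms: int = 0) -> List[Timed]:
    """Return the items in `others` whose time span overlaps `item`, within a slack."""
    return [
        o for o in others
        if o.start_ms <= item.end_ms + slack_ms and o.end_ms >= item.start_ms - slack_ms
    ]


gesture = Timed("circle-gesture", start_ms=1200, end_ms=1800)
speech = [
    Timed("zoom in here", start_ms=1000, end_ms=1900),
    Timed("thanks", start_ms=5000, end_ms=5400),
]
related = overlapping(gesture, speech, slack_ms=250)
```

Only the first utterance is associated with the gesture. The decision itself belongs to a higher layer of the multimodal system; the markup only needs to preserve the start and end times that make it possible.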

Acknowledgements

This document was jointly prepared by members of the pen input subgroup of the W3C Multimodal Interaction Working Group.

Special acknowledgements to Giovanni Seni (Motorola) and Greg Russell (IBM) for their contributions.

Appendices

Appendix A. Glossary

Accuracy: (1) Percentage of words correctly transcribed by a handwriting recognition engine; (2) Error bounds of a coordinate measurement, relative to a physical reference frame

Annotation: Elements in an Ink Markup file that describe meta-data, or semantic information, about the traces themselves (See ink annotation)

Application-specific elements: Provide higher-level description of the digital ink captured in the primitive elements

Bounding box: A minimal-sized rectangle that encloses a group of traces

Canvas: Widget or window in a graphical user interface where ink is drawn during ink capture

Capture: Digitally recording physical measurements of handwriting, typically using a stylus

Channel: A measurement recorded by the digitizer (e.g. coordinates, force, tilt) when capturing pen input.

Chunks: A group of pen traces

Compression: The coding of data to save storage space or transmission time

Device: See digitizer

Digital ink: An electronic representation of the pen movement, pressure, and other characteristics of handwritten input using a digitizing device

Digitizer: A hardware device (a.k.a. tablet) capable of sensing the digital pen tip position. The digital pen can be a passive stylus containing no electronic components, or an active stylus containing electronic components.

Electronic ink: See digital ink

Events: An action, either human or machine generated; for example, page turn, pen up, or ink color change

Force: The pressure applied to a writing implement, typically measured in grams, ounces, or newtons

Gesture: Collection of ink traces that indicate a certain action to be performed

Ink: See digital ink

Ink annotation: A handwritten note or markup referencing (by proximity) another visible writing or printed matter

Ink attribute: A basic named value for an ink trace, such as color and width

Ink document: A collection of one or more pages containing ink traces

Ink label: A descriptive or identifying word or phrase accompanying some ink traces

Ink point: An element in the stream of data recorded by a real-time digitizer of handwriting; for example, a tuple <x, y, pressure, tilt>

Ink-enabled system: A system capable of recording digital ink data

Primitive elements: Set of rudimentary elements sufficient for all basic ink applications

Recognition grammar: Specification of words and patterns of words that a recognizer should expect when processing input ink

Resolution: The minimal change or difference in a measurement (coordinate, force, tilt) that a digitizer reports

Sampling rate: The frequency at which a digitizer reports coordinate (or other) information. Sampling rate is not always directly related to the bandwidth.

Screen context: The characteristics of the display area and the correspondence between the display area and the ink-capturing device.

Semantic: A contextual interpretation of handwriting, such as character, word, sentence, and paragraph

Session: (1) The span of time from a user beginning an interaction to ending the interaction with the system; (2) The data gathered during this span of time.

Signature verification: Confirmation that a presented signature is the same as the one on file (a.k.a. one-to-one matching)

Streaming: Continuously sending handwriting events over a communication channel

Stroke: Ink resulting from an elementary pen movement, such as bounded by two consecutive velocity extrema. A sequence of strokes constitutes a trace.

Tilt angle: The angle of the pen with respect to the writing surface, which is usually measured as angles of the projection onto x and y vertical planes

Trace: A complete pen-down movement bounded by two pen-up movements, or a complete pen-up movement bounded by two pen-down movements. A sequence of traces accumulates to meaningful units, such as characters and words.