W3C

EMMA 1.0: Last Call Disposition of Comments

28 November, 2007

Editor(s):
Michael Johnston, AT&T
Deborah Dahl, W3C Invited Expert

Abstract

This document details the responses made by the Multimodal Interaction Working Group to issues raised during the first Last Call (beginning 29 September 2005 and ending 28 October 2005) and the second Last Call (beginning 9 April, 2007 and ending 30 April, 2007). Comments were provided by other W3C Working Groups and the public via the www-multimodal-request@w3.org (archive) mailing list.

Status

This document of the W3C's Multimodal Interaction Working Group describes the disposition of comments as of 27 September, 2007 on the first and second Last Call Working Drafts of Extensible Multimodal Annotation (EMMA) Version 1.0. It may be updated, replaced or rendered obsolete by other W3C documents at any time.

For background on this work, please see the Multimodal Interaction Activity Statement.

1. Introduction

This document describes the disposition of comments in relation to Extensible Multimodal Annotation (EMMA) Version 1.0 (http://www.w3.org/TR/emma/). The goal is to allow readers to understand the background behind the modifications made to the specification. It also provides a useful checkpoint for the people who submitted comments to evaluate the resolutions applied by the W3C's Multimodal Interaction Working Group.
In this document each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved. For some of the issues the status is "Waiting Response", because the Working Group did not receive a formal acceptance or denial, or because acceptance was pending on the applied resolution.

This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review periods.

2. Summary

Item | Commentator | Nature | Disposition
WAI-PF-1 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-2.1 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-2.2 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-3 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-4 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-5 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-6 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-7a | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-7b | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-7c | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-7d | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
WAI-PF-7e | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted
i18N-1 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-2 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-3 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-4 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-5 | Felix Sasaki (2005-10-26) | Clarification / Typo / Editorial | Accepted
i18N-6 | Felix Sasaki (2005-10-26) | Clarification / Typo / Editorial | Withdrawn
i18N-7 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-8 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-9 | Felix Sasaki (2005-10-26) | Feature Request | Accepted
i18N-10 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-11 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-12 | Felix Sasaki (2005-10-26) | Technical Error | Accepted
i18N-13 | Felix Sasaki (2005-10-26) | Feature Request | Accepted
i18N-2-1 | Richard Ishida (2007-05-02) | Feature Request | Accepted
i18N-2-2 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted
i18N-2-3 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted
i18N-2-4 | Richard Ishida (2007-05-02) | Feature Request | Accepted
i18N-2-5 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted
i18N-2-6 | Richard Ishida (2007-05-02) | Technical Error | Accepted
i18N-2-7 | Richard Ishida (2007-05-02) | Technical Error | Accepted
SW-1 | Jin Liu (2006-10-2) | Feature Request | Accepted
SW-2 | Jin Liu (2006-10-2) | Feature Request | Accepted
SW-3 | Jin Liu (2006-10-2) | Feature Request | Accepted
SW-4 | Jin Liu (2006-10-2) | Feature Request | Accepted
VB-A1 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
VB-A1.1 | Paolo Baggia (2006-04-03) | Change to Existing Feature | Accepted
VB-A1.2 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
VB-A2 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
VB-A3 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
VB-A4 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
VB-A5 | Paolo Baggia (2006-04-03) | Change to Existing Feature | Accepted
VB-B | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted
Public-01 | Paolo Martini (2007-04-25) | Change to Existing Feature | Accepted
ITS-01 | Christian Lieske (2007-05-03) | Feature Request | Accepted
ITS-02 | Christian Lieske (2007-05-03) | Feature Request | Accepted

2.1 Clarifications, Typographical, and Other Editorial

Issue WAI-PF-1

From Al Gilman (2005-12-14):

1. We are concerned that in an approach that focuses on input and output modalities that are "widely used today" Assistive Technology devices might be left out in practice. Although theoretically it seems to be possible to apply EMMA to all types of input and output devices (modalities), including Assistive Technology, the important question is "Who is going to write the device-specific code for Assistive Technology devices?"
If this is outside the scope of EMMA, please let us know who we should address with this question.

Resolution: Rejected

We share the concern of the WAI group as to whether the introduction of new protocols such as EMMA could adversely impact assistive technology, and the EMMA subgroup have discussed this in some detail in response to your feedback.
EMMA is a markup for the representation and annotation of user inputs and is intended to enable support for modalities beyond keyboard and mouse, such as speech and pen. As such EMMA can play an important role in enabling the representation of user inputs from assistive technology devices. The EMMA group would greatly welcome your feedback on classifications of different kinds of assistive devices that could be used as values of emma:mode.
The broader issue concerns providing support for assistive technologies while minimizing the burden on application developers building multimodal applications. We see three ways in which assistive devices may operate with multimodal applications:
1. The application developer building the interaction manager (IM) for the multimodal application builds it specifically with support for particular assistive devices. The IM might, for example, use different timeouts or break up the dialog differently depending on the kind of assistive device in use. In this case the assistive technology will produce an EMMA representation of the user input, annotated to indicate the kind of device it is from, and the IM will have specific dialog/interaction logic for that device.
2. The application developer does not directly provide support for the assistive devices but the developer of the assistive technology provides EMMA as a representation of the input on the assistive device. For example, for an application with speech input, the assistive technology would generate EMMA for the assistive device that looks like a sequence of words from speech recognition.
3. The third case is more like what we believe is prevalent today and likely (unfortunately) to remain the case for most devices: the assistive technology, generally at an operating system level, serves as an emulator of the keyboard and/or mouse. In this case, the only way to ensure that multimodal applications also support assistive devices is to establish best practices for multimodal application design. One principle would be that in any case where the interaction manager expects a verbal input, be it from speech or handwriting recognition, it will also accept input from the keyboard. Another would be that if commands can be issued in one mode, e.g. GUI, they can also be issued in the other, e.g. speech (symmetry among the modes).
Since EMMA does not provide an authoring language for interaction management or authoring of applications, this lies outside the scope of the EMMA specification itself. Within the MMI group this relates most closely to the multimodal architecture work and work on interaction management.
The EMMA subgroup are starting to compile a list of best practices for authoring applications that consume EMMA but see this as better suited to a separate best practices Note rather than as part of the EMMA specification.

Email Trail:

Issue WAI-PF-2.1

From Al Gilman (2005-12-14):

System and Environment
Composite input should provide environmental information. Since input is used to define a response, the system response should take into account environmental conditions that should be captured at input time. Here are some examples:
Signal to Noise Ratio (SNR)
Lighting conditions
Power changes (may throw out input or prompt user to re-enter information)
In the case of a low SNR you might want to change the volume, pitch, or if the system provides it - captioning. Sustained SNR issues may result in noise cancellation to improve voice recognition. This should be included with EMMA structural elements. Some of these issues could be reflected in confidence but the confidence factor provides no information as to why the confidence level is low and how to adapt the system.

Resolution: Rejected

System and environment issues were initially addressed within the MMI working group and included the kinds of information described above, along with other factors such as the location of the device. That work is now called DCCI (Delivery Context Interfaces) and has moved to the Ubiquitous Web Applications working group:
http://www.w3.org/TR/2005/WD-DPF-20051111/
In the Multimodal architecture work within the MMI group, DCI (previously DPF) is accessed directly from the interaction manager, rather than through the annotation of EMMA inputs.
http://www.w3.org/TR/mmi-arch/
We believe it is important for system and environment information to be accessed directly through the DCI from the IM, because the interaction should be able to adapt whether or not the user provides an input (EMMA only arrives at the IM when the user makes an input). For example, the interaction manager may adapt and use visual prompts rather than audio when the SNR is beneath a threshold. This adaptation should occur regardless of whether the user has produced a spoken input or not.
One possible reason for attaching DCCI information to EMMA documents would be to log what the conditions were when a particular input was received. For this case, the emma:info element can be used as a container for an XML serialization of system and environment information accessed through the DCCI.
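For illustration, a logging-oriented use of emma:info along these lines might look as follows. This is only a sketch: the env:conditions element, its children, and the property values, as well as the destination payload, are hypothetical application-namespace markup, not defined by EMMA or DCCI.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns:env="http://www.example.com/environment">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic">
    <destination>Boston</destination>
  </emma:interpretation>
  <emma:info>
    <!-- application-specific snapshot of conditions at input time, kept for logging -->
    <env:conditions>
      <env:snr>12.5</env:snr>
      <env:lighting>low</env:lighting>
    </env:conditions>
  </emma:info>
</emma:emma>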

Email Trail:

Issue WAI-PF-2.2

From Al Gilman (2005-12-14):

User factors
How does the Emma group plan to address user capabilities. ... At the Emma input level or somewhere else in the system? Example: I may have a hearing impairment changing the situation for me over another person. If multiple people are accessing a system it may be important to address the user and their specific capabilities for adaptive response.

Resolution: Rejected

Along with system and environment factors and device factors, user preferences, e.g. choice of mode, volume level, etc., are intended to be accessed using the DCI:
http://www.w3.org/TR/2005/WD-DPF-20051111/
The preferences for a specific user should be queried from the DCCI based on the user's id, and those preferences then used by the interaction manager to adapt the interaction. The EMMA group discussed the possibility of having an explicit user-id annotation in EMMA and concluded that this information is frequently provided explicitly by the user as an input; it is therefore application data and so should not be standardized in EMMA. Typically user ids will come from entering a value in a form, and this will be submitted as a user input. This will either be done directly from XHTML or perhaps in some cases enclosed in an EMMA message (e.g. if the user id is specified by voice). The id may also come from a cookie, or be determined based on the user's phone number or other more detailed info from a mobile provider. In all of these cases, the user id (and other information such as authentication) is not an annotation of a user input. A user id may be transmitted as the payload of a piece of EMMA markup, as application data inside emma:interpretation, but will not be encoded as an EMMA annotation.
Again, for logging purposes, the user id or information describing the user could be stored within emma:info.
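For illustration, a user id carried as application data rather than as an EMMA annotation might look as follows. This is a sketch only; userid is a hypothetical application-namespace element.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic">
    <!-- the user id is part of the application payload, not an emma: annotation -->
    <userid>jsmith42</userid>
  </emma:interpretation>
</emma:emma>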

Email Trail:

Issue WAI-PF-3

From Al Gilman (2005-12-14):

Settling time
How does this technology address settling time and multiple keys being hit. People with mobility impairments may push more than one key, inadvertently hit specific keys, or experience tremors whereby it needs to be smoothed. This may or may not effect confidence factors but again the "why" question comes up. This information may need to be processed in the drivers.

Resolution: Rejected

The issue appears to be at a different level from EMMA. In many cases this will be a matter of the driver used for the keyboard input device. In the case where keyboard input is used to fill a field in a form and is sent when the user hits return or a SEND/GO button, any editing or correction takes place before the input is sent, and the interaction manager would only see the final string. If there is a more complex direct interface from the keystrokes to the interaction manager (each keystroke being sent individually), then details regarding the nature of the keyboard input could be encoded in the application semantics.

Email Trail:

Issue WAI-PF-4

From Al Gilman (2005-12-14):

Directional information
Should we have an emma:directional information? Examples are right, left, up, down, end, top, north, south, east, west, next, previous. These could be used to navigate a menu with arrow keys, voice reco, etc. They could be used to navigate a map also. This addresses device independence. This helps with intent-based events.
We should include into and out of to address navigation up and down the hierarchy of a document as in DAISY. The device used to generate this information should be irrelevant. Start, Stop, reduce speed, may also be an addition. These higher levels of navigation may be used to control a media player independent of the device.

Resolution: Rejected

Specific intents such as up, down, left, right, etc. are part of the application semantics and so are not standardized as part of EMMA. EMMA provides containers for the representation of intents and a way to specify various kinds of annotations on those intents, but it is outside the scope of EMMA to standardize the semantic representation of user intents.
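For illustration, a directional command expressed as application semantics inside an EMMA container might look as follows. This is a sketch only; the navigate and direction elements are hypothetical application-namespace markup, not part of EMMA.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic">
    <!-- the command vocabulary is defined by the application, not by EMMA -->
    <navigate>
      <direction>up</direction>
    </navigate>
  </emma:interpretation>
</emma:emma>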

Email Trail:

Issue WAI-PF-5

From Al Gilman (2005-12-14):

Zoom: What about Zoom out?

Resolution: Accepted

In order to clarify the example we will change the speech from 'zoom' to 'zoom in'. Zoom out is of course another possible command but this is intended here as an example rather than an exhaustive presentation of map manipulation commands.

Email Trail:

Issue WAI-PF-6

From Al Gilman (2005-12-14):

Device independence and keyboard equivalents
For the laptop/desktop class of client devices, there has been a "safe haven" input channel provided by the keyboard interface. Users who cannot control other input methods have assistive technologies that at least emulate the keyboard, and so full command of applications is required from the keyboard. Compare with Checkpoints 1.1 and 1.2 of the User Agent Accessibility Guidelines 1.0 [UAAG10].
[UAAG10] http://www.w3.org/TR/UAAG10-TECHS/guidelines.html#gl-device-independence
How does this MMI Framework support having the User Agent supply the user with alternate input bindings for un-supported modalities expected by the application?
How will applications developed in this MMI Framework (EMMA applications) meet the "full functionality from keyboard" requirement, or what equivalent facilitation is supported?

Resolution: Rejected

The general principle of allowing people to interact more flexibly depending on needs and device capabilities is part of the broader work in the MMI group on multimodal architecture and interfaces. EMMA is at a different level: it provides a standardized markup for containing and annotating interpretations of particular user inputs, and it does not standardize the authoring of the logic of the application. At the architecture level this is likely to be a matter of specifying best practices for multimodal application authoring. There is a need for best practices at different levels. On one level there should be best practices for the design of multimodal applications so that they can support a broad range of modalities and tailor the interaction (timeouts etc.) on the basis of annotations (e.g. medium, mode) and information from the DCI. At another, more pragmatic, level, multimodal applications should be designed so that in addition to supporting new modalities such as speech they also support keyboard and mouse, so that assistive devices which emulate keyboard and/or mouse input can be used to interact with these applications. One principle would be that verbal inputs such as speech and handwriting have 'alternate bindings' to keyboard input fields. Another would be that if an application supports pointing using a device such as a pen or touchscreen, it should also support pointing with the mouse.

Email Trail:

Issue WAI-PF-7a

From Al Gilman (2005-12-14):

Use cases
To make things more concrete, we have compiled the following use cases to be investigated by the MMI group as Assistive Technology use cases which might bear requirements beyond the typical mainstream use cases. We are willing to discuss these with you in more detail with the goal of coming to a joint conclusion about their feasibility in EMMA.
(a) Input by switch. The user is using an on-screen keyboard and inputs each character by scanning over the rows and columns of the keys and hitting the switch for row and column selection. This takes significantly more time than the average user would take to type in the characters. Would this switch-based input be treated like any keyboard input (keyboard emulation)? If yes, could the author impose time constraints that would be a barrier to the switch user? Or, alternatively, would this use case require device-specific (switch-specific) code?

Resolution: Rejected

Imposing time constraints is not something that is done by EMMA; rather it is a matter of interaction management. In this particular case we think such constraints are unlikely, since general fields for keyboard input do not 'time out'. If a switch were being used to generate substitute speech input then there could be a problem with timeouts (in fact probably a problem for almost any keyboard input). Again this may be a matter of best practices, and the best practice should be that when speech input is supported, keyboard input should also be supported, and for the keyboard input there should be no timeout.

Email Trail:

Issue WAI-PF-7b

From Al Gilman (2005-12-14):

Word prediction. Is there a way for word prediction programs to communicate with the interaction manager (or other pertinent components of the framework) in order to find out about what input is expected from the user? For example, could a grammar that is used for parsing be passed on to a word prediction program in the front end?

Resolution: Rejected

Again this certainly lies outside the scope of EMMA, since EMMA does not define grammar formats or interaction management. The W3C SRGS grammar specification, from the Voice Browser working group, could potentially be used by a word prediction system.

Email Trail:

Issue WAI-PF-7c

From Al Gilman (2005-12-14):

User overwrites default output parameters. For example, voice output could be described in an application with EMMA and SSML. Can the user overwrite (slow down or speed up) the speech rate of the speech output?

Resolution: Rejected

EMMA is solely used for the representation of user inputs and so does not address voice output. Within the MMI framework the way to achieve this would be to specify the user preference for speech output rate in the DCI and have the interaction manager query the DCI in order to determine the speech rate. The voice modality component is then responsible for honoring users' preferences regarding speech including dynamic changes. The working group responsible for this component is the Voice Browser working group and requirements for this mechanism should be raised there.

Email Trail:

Issue WAI-PF-7d

From Al Gilman (2005-12-14):

WordAloud (http://www.wordaloud.co.uk/). This is a program that displays text a word at a time, in big letters on the screen, additionally with speech output. How could this special output modality be accommodated with EMMA?

Resolution: Rejected

EMMA is solely used for the representation and annotation of user inputs and does not address output. At a later stage the EMMA group may address output, but at this time the language is solely for input.

Email Trail:

Issue WAI-PF-7e

From Al Gilman (2005-12-14):

Aspire Reader (http://www.aequustechnologies.com/), This is a daisy reader and browser that also supports speech output, word highlighting, enhanced navigations, extra text and auditory descriptions that explain the page outline and content as you go, alterative renderings such as following through key points of content and game control type navigation. Alternative texts are for the struggling student (for example a new immigrant)

Resolution: Rejected

EMMA is solely used for the representation and annotation of user inputs and does not address output. At a later stage the EMMA group may address output, but at this time the language is solely for input.

Email Trail:

Issue i18N-5

From Felix Sasaki (2005-10-26):

On terminology: Please reference standards like XForms RELAX-NG, SIP, TCP, SOAP, HTTP, SMTP, MRCP etc. if you mention them.

Resolution: Accepted

Agreed, although in most cases these would be informative references rather than normative ones.

Email Trail:

Issue i18N-6

From Felix Sasaki (2005-10-26):

Section 2.2 Your list of data models is a little bit confusing. A proposal: List the DOM, the infoset and the XPath 2.0 data model.

Resolution: Rejected

Section 2.2 is about the use of constraints on the structure and content of EMMA documents. Your comment seems to be more related to the data model exposed to EMMA processors.

Email Trail:

Issue VB-A1

From Paolo Baggia (2006-04-03):

EMMA profile
Describe in EMMA spec a VoiceXML 2.0/2.1 profile, either in an Appendix or in a Section of the specification. This profile should describe the mandatory annotations to allow a complete integration in a VoiceXML 2.0/2.1 compliant browser.
The VoiceXML 2.0/2.1 requires four annotations related to an input. They are described normatively in http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1 as shadow variables related to a form input item. The same values are also accessible from the application.lastresult$ variable, see http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1
The annotations are the following:
- name$.utterance
which might be conveyed by "emma:token" attribute (http://www.w3.org/TR/emma/#s4.2.1)
- name$.confidence
which might be conveyed by "emma:confidence" attribute (http://www.w3.org/TR/emma/#s4.2.8) The range of values seem to be fine: 0.0 - 1.0, but some checks could be made in the schema of both the specs.
- name$.inputmode
which might be conveyed by "emma:mode" attribute (http://www.w3.org/TR/emma/#s4.2.11) Proposal 1.1 for a discussion of its values
- name$.interpretation, is an ECMA script value containing the semantic result which has to be derived by the content of "emma:interpretation"
As regards the N-best results, see http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml5.1.5for details, the one-of element should be suitable to convey them to the voice Browser.

Resolution: Rejected

The Multimodal working group sees significant benefit in the creation of an EMMA profile for VoiceXML 2.0/2.1. However, the group rejects the request to place this work within the EMMA specification itself. The request might best be resolved by a W3C Note on these issues, or perhaps more broadly on the whole chain that connects a VoiceXML page to SRGS+SISR grammars and then to EMMA for returning speech/DTMF results to VoiceXML. We suggest that this document should be edited by the VBWG with some support from the MMIWG.
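For illustration, the kind of EMMA result that such a profile might map onto the VoiceXML 2.0/2.1 shadow variables could look as follows. This is a sketch only, not part of the proposed Note; the destination payload and the attribute values are illustrative.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic"
                       emma:tokens="flights to boston"
                       emma:confidence="0.85">
    <!-- name$.utterance ~ emma:tokens, name$.confidence ~ emma:confidence,
         name$.inputmode ~ emma:mode, name$.interpretation ~ the element content -->
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>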

Email Trail:

Issue VB-A1.2

From Paolo Baggia (2006-04-03):

Optional/mandatory
The profile should clarify which is mandatory and which is optional. For instance N-best are an optional feature for VoiceXML 2.0/2.1, while the other annotations are mandatory.

Resolution: Accepted (w/modifications)

With respect to the specification of what is optional and mandatory for the profile, that information should be part of the EMMA VoiceXML profile which we propose be edited within the VBWG (see VB-A1). As regards the optional/mandatory status of EMMA features separate from any specific profile, we have reviewed them in detail for the whole EMMA specification and this will be reflected in the next draft.

Email Trail:

Issue VB-A2

From Paolo Baggia (2006-04-03):

Consider 'noinput' and 'nomatch'
Besides user input from a successful recognition, there are several other types of results that VoiceXML applications deal with that should be part of a VoiceXML profile for EMMA as well as the ones suggested in Proposal 1.
'noinput' and 'nomatch' situations are mandatory for VoiceXML 2.0/2.1. Since EMMA can also represent these, the EMMA annotations for 'noinput' and 'nomatch' should be part of the VoiceXML EMMA profile.
Note that in VoiceXML 'nomatch' may carry recognition results as described in Proposal 1 to be inserted in the application.lastresult$ variable only.

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goal and requirements of the current specification.

Email Trail:

Issue VB-A3

From Paolo Baggia (2006-04-03):

DTMF/speech
It is very important that EMMA will be usable for either speech or DTMF input results, because VoiceXML2.0/2.1 allows both these inputmode values. We expect that the VoiceXML profile in EMMA will make this clear to enforce a complete usage of EMMA for Voice Browser applications.

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goal and requirements of the current specification.

Email Trail:

Issue VB-A4

From Paolo Baggia (2006-04-03):

Record results
EMMA can represent the results of a record operation, see the description of the record element of VoiceXML http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.6, so the EMMA annotations for recordings should also be part of a VoiceXML profile. This feature is optional in VoiceXML 2.0/2.1.

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goal and requirements of the current specification.

Email Trail:

Issue VB-B

From Paolo Baggia (2006-04-03):

EMMA and the evolution of VoiceXML
For the evolution of VoiceXML the current time is too premature to give precise feedback, but the clear intention is to take care of an extended usage of EMMA inside a future VoiceXML application.
This includes, but it is not limited to:
- leave access to the whole EMMA document inside the application.lastresult variable (both a raw EMMA document and a processed one, i.e. in ECMA-262 format)
- include proper media-types to allow a clear indication if the raw results are expressed in EMMA or other formats (e.g. NLSML). The same for the processed results.
Other possible evolutions will be to have a simple way to pass EMMA results from VoiceXML to other modules to allow further processing.
A last point is that EMMA should be used to return results of Speaker Identification Verification (SIV) too. Voice Browser SIV subgroup is working to create a few examples to circulate them to you to get feedbacks.
We will highly appreciate your comments on these ideas to better address this subject in the context of the evolution of Voice Browser standards.

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goal and requirements of the current specification.

Email Trail:

2.2 Technical Errors

Issue i18N-1

From Felix Sasaki (2005-10-26):

Reference to RFC 1738
RFC 1738 is obsoleted by RFC 3986 (URI Generic Syntax). It would be good if you could refer to RFC 3986 instead of 1738. The best thing would be if you could add a normative reference to RFC 3987 (Internationalized Resource Identifiers (IRIs)).

Resolution: Accepted

Agreed, document has been revised as suggested.

Email Trail:

Issue i18N-2

From Felix Sasaki (2005-10-26):

General: Reference to RFC1766
--------------------------------------------------
RFC 1766 is obsoleted by 3066 (Tags for the Identification of Languages). What is essential here is the reference to a BCP (best common practice), which is for language identification BCP 47. Currently bcp 47 is represented by RFC 3066, so could you change the reference to "IETF BCP 47, currently represented by RFC 3066"?

Resolution: Accepted

Agreed, document has been revised as suggested.

Email Trail:

Issue i18N-3

From Felix Sasaki (2005-10-26):

General and sec. 2.4.1: References to XML and XMLNS
-------------------------------------------------------------------------------------
As for XML, you reference version 1.0. As for XMLNS, you reference version 1.1. Is there a reason for the mismatch of the versions?

Resolution: Accepted

Thank you for pointing this out. We have updated the specification to reference XML 1.1 and XMLNS 1.1.

Email Trail:

Issue i18N-4

From Felix Sasaki (2005-10-26):

Sec. 1.2, definition of "URI: Uniform Resource Identifier"
------------------------------------------------------------------------------------
Here you refer to XML Schema for URIs. It would be good if you could also refer to the underlying RFCs (see comment 1).

Resolution: Accepted

Agreed, document has been revised as suggested.

Email Trail:

Issue i18N-7

From Felix Sasaki (2005-10-26):

On terminology: "An EMMA attribute is prefixed ..." should be "An EMMA attribute is prefixed (qualified) ...". Also: "An EMMA attribute is not prefixed ..." should be "An EMMA attribute is not prefixed (unqualified) ..."

Resolution: Accepted

Thanks for pointing this out. We have investigated the use of these terms in recent specifications and revised Section 2.3 of the EMMA specification as follows to clarify the terminology:
"An EMMA attribute is qualified with the EMMA namespace prefix if the attribute can also be used as an in-line annotation on elements in the application's namespace. Most of the EMMA annotation attributes in Section 4.2 are in this category. An EMMA attribute is not qualified with the EMMA namespace prefix if the attribute only appears on an EMMA element. This rule ensures consistent usage of the attributes across all examples."

Email Trail:

Issue i18N-8

From Felix Sasaki (2005-10-26):

Have you thought of using RFC 2119 to indicate requirements levels (e.g. with "must", "should", "must not" etc.)?

Resolution: Accepted

Agreed. In response we have conducted an extensive review of the document, revising language as needed and adding capitalization in accordance with RFC 2119. We have also added a small paragraph near the beginning of the document indicating this.

Email Trail:

Issue i18N-10

From Felix Sasaki (2005-10-26):

Reference to RFC 3023 (MIME media types), e.g. in appendix B.1
Work is undertaken for a successor of RFC 3023. To be able to take its changes into account, it would be good if you could change the reference to RFC 3023 to "RFC 3023 or its successor." Please have a look at How to Register an Internet Media Type for a W3C Specification.

Resolution: Accepted

Agreed, document has been revised as suggested.

Email Trail:

Issue i18N-11

From Felix Sasaki (2005-10-26):

Reference to RFC 3023 in appendix B.1, on security considerations
Please refer to the security considerations mentioned in RFC 3987.

Resolution: Accepted

Agreed, document has been revised as suggested.

Email Trail:

Issue i18N-12

From Felix Sasaki (2005-10-26):

It would be good if you could make a clearer difference between normative and non-normative parts of the specification.

Resolution: Accepted

Agreed, we have reviewed and reorganized the document so that normative vs informative sections are clearly marked.

Email Trail:

Issue i18N-2-6

From Richard Ishida (2007-05-02):

Definition of URI not normative. A definition of URI is given in the Terminology section that defines it in terms of RFC 3986 and XML Schema Part 2:Datatypes, but that section is not normative. We think the definition of URI should be normative

Resolution: Accepted

We will reference RFC 3986 and RFC 3987 where the document first uses the term "URI" in normative text (section 3.2). We will use the following text, following the example in XQuery: "Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications."

Email Trail:

Issue i18N-2-7

From Richard Ishida (2007-05-02):

IRIs and URIs. [[A URI is a unifying syntax for the expression of names and addresses of objects on the network as used in the World Wide Web (RFC3986). A URI is defined as any legal anyURI primitive as defined in XML Schema Part 2: Datatypes Second Edition Section 3.2.17[SCHEMA2].]] We are concerned that you are disallowing IRIs here. (Btw, we did propose that you reference RFC 3987 as part of the first comment in a previous review [http://www.w3.org/International/2005/10/emma-review.html], and you agreed to implement that comment, but you seem to have overlooked this aspect.) The XML Schema 1.0 definition of anyURI does not encompass IRIs either (though this will be changed for XMLSchema 1.1). We suggest that you adopt a definition like that of XQuery. The XQuery definition reads: "Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications."

Resolution: Accepted

There was no intention to disallow IRIs. We will add the proposed language from the XQuery definition to section 1.2.

Email Trail:

2.3 Requests for Change to Existing Features

Issue i18N-2-2

From Richard Ishida (2007-05-02):

Use of emma:lang. It's not at all clear to us what the difference is between emma:lang and xml:lang, the relationship between them, or when we should use which. (It might help to create examples that show the use of xml:lang as well as emma:lang.) [[In order handle inputs involving multiple languages, such as through code switching, the emma:lang tag MAY contain several language identifiers separated by spaces.]] This is definitely something you cannot do with xml:lang, but we are wondering what is the value of doing it anyway. We are not sure what benefit it would provide.

Resolution: Accepted

We address each of these two points in turn.

Point 1 (ACCEPT): Clarification of emma:lang vs xml:lang function

The W3C Multimodal working group accept that it is important to make clear the differences between the xml:lang and emma:lang attributes and plan to add clarificatory text to the emma:lang section in the next draft of the EMMA specification. The xml:lang and emma:lang attributes serve distinct and equally important purposes. The role of xml:lang is to indicate the language used for content in an XML element or document. In contrast, the emma:lang attribute is used to indicate the language employed by a user when entering an input into a spoken or multimodal dialog system. Critically, emma:lang annotates the language of the signal originating from the user rather than the specific tokens used at a particular stage of processing. This is most clearly illustrated through consideration of an example involving multiple stages of processing of a user input, the primary use of EMMA markup. Consider the following scenario: EMMA is being used to represent three stages in the processing of a spoken input to a system for ordering products. The user input is in Italian; after speech recognition, the user input is first translated into English, and then a natural language understanding system converts the English translation into a product ID (which is not in any particular language). Since the input signal is a user speaking Italian, the emma:lang will be emma:lang="it" at all of these stages of processing. The xml:lang attribute, in contrast, will initially be "it", after translation the xml:lang will be "en-US", and after language understanding "zxx", assuming the use of "zxx" to indicate non-linguistic content. The following table illustrates the relation between the content in the EMMA document, the emma:lang and the xml:lang:
CONTENT | emma:lang | xml:lang | processing stage
condizionatore | emma:lang="it" | xml:lang="it" | result from speech recognition
air conditioner | emma:lang="it" | xml:lang="en" | result from machine translation
id1456 | emma:lang="it" | xml:lang="zxx" | result from natural language understanding
The following are examples of EMMA documents corresponding to these three processing stages, abbreviated to show the critical attributes under discussion here. Note that <transcription>, <translation>, and <understanding> are application namespace elements, not part of the EMMA markup.
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<transcription xml:lang="it">condizionatore</transcription>
</emma:interpretation>
</emma:emma>
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<translation xml:lang="en">air conditioner</translation>
</emma:interpretation>
</emma:emma>
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<understanding xml:lang="zxx">id1456</understanding>
</emma:interpretation>
</emma:emma>
In order to make these differences clear we will add clarifying text and examples to the specification.

Point 2: Clarification, multiple values in emma:lang

In call center and other applications, multilingual users provide inputs in which they switch input language in mid utterance. The emma:lang value in these cases needs to indicate that the input involved more than one language, e.g. "quisiera hacer una collect call". The emma:lang in this case would have the value "sp en":
<emma:emma>
<emma:interpretation emma:lang="sp en" emma:mode="voice" emma:medium="acoustic">
<transcription>quisiera hacer una collect call</transcription>
</emma:interpretation>
</emma:emma>
In order to use xml:lang in this example perhaps an additional element could be used, e.g. <span>. Would this work?
<emma:emma>
<emma:interpretation emma:lang="sp en" emma:mode="voice" emma:medium="acoustic">
<transcription xml:lang="sp">quisiera hacer una <span xml:lang="en">collect call </span></transcription>
</emma:interpretation>
</emma:emma>

Email Trail:

Issue i18N-2-3

From Richard Ishida (2007-05-02):

HTTP and HTML meta elements also allow for multiple language tags, but use commas to separate tags, rather than just spaces. It may reduce confusion to follow the same approach.

Resolution: Rejected

We agree that ',' is used in those elements for the separation of multiple values, but in this case we are addressing the separation of multiple values within attribute values. It is common practice for values within attributes to be space separated. Furthermore, the value of emma:lang and of other EMMA attributes which can hold multiple values, such as emma:medium and emma:mode, is of type xsd:NMTOKENS, which is a white-space-separated list of xsd:NMTOKEN values.
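For illustration, a composite input carrying several space-separated annotation values might look as follows. This is a sketch only; it assumes "ink" as the mode value for pen input, and the command element is a hypothetical application-namespace element.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- each multi-valued annotation is a white-space-separated list of NMToken values -->
  <emma:interpretation emma:lang="es en"
                       emma:medium="acoustic tactile"
                       emma:mode="voice ink">
    <command>zoom in here</command>
  </emma:interpretation>
</emma:emma>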

Email Trail:

Issue i18N-2-5

From Richard Ishida (2007-05-02):

typo. [[in order handle]] -> 'in order to handle' ?

Resolution: Accepted

Comment: We will correct this typo in the next draft of the specification.

Email Trail:

Issue VB-A1.1

From Paolo Baggia (2006-04-03):

Values of emma:mode
Some clarification should be needed to explain how to map the values of "emma:mode" (http://www.w3.org/TR/emma/#s4.2.11) to the expected values of the "inputmode" variable (Table 10 in http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1). The voiceXML 2.0/2.1 prescribes two values: "speech" and "dtmf".
Anther option is to adopt in EMMA the exact values expected by VoiceXML 2.0/2.1 to simplify the mapping. Other related fine grained EMMA annotation are not possible in VoiceXML 2.0/2.1.

Resolution: Accepted

The MMIWG agree that the values of emma:mode of specific relevance to VoiceXML should be revised in EMMA. For the current editor's draft, and for the Candidate Recommendation, we will change the emma:mode values in Section 4.2.11 and throughout the document as follows:
- from "dtmf_keypad" to "dtmf"
- from "speech" to "voice"
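For illustration, a DTMF input using the revised mode value might look as follows. This is a sketch only; the pin payload is a hypothetical application-namespace element, and the acoustic medium classification for DTMF is assumed.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:mode="dtmf" emma:medium="acoustic"
                       emma:tokens="1 2 3 4" emma:confidence="1.0">
    <pin>1234</pin>
  </emma:interpretation>
</emma:emma>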

Email Trail:

Issue VB-A5

From Paolo Baggia (2006-04-03):

Add informative examples
We think that some informative examples will improve the description of the profile. This might include a SISR grammar for DTMF/speech and the expected EMMA results to be compliant to the profile.
The examples should include both single result returned and N-best results.
We think that also an alternative example of lattices would be very interesting, even if in the VoiceXML 2.0/2.1 it will not be representable, but nonetheless it will be useful for the evolution of VoiceXML, see point B below.

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goal and requirements of the current specification.

Email Trail:

Issue Public-01

From Paolo Martini (2007-04-25):

Node anchoring on signal time axis. I approached only recently EMMA and I have some problems understanding the temporal anchoring of an emma:node. I would instinctively expect a node to correspond to what ISO 8601 calls an "instant", a "point on the time axis". With reference to paragraph 3.4, if I read correctly the document:
1. An emma:node can be anchored with absolute or relative timestamps. In the absolute mode, the optional emma:start and emma:end attributes seem to allow a duration, while in the relative mode, the optional emma:offset-to-start (with emma:duration not allowed) seems to force an instant status. If, conceptually, a node is allowed to correspond to a segment of the signal, I would welcome a comment on the rationale for that. If not, I would suggest to replace emma:start and emma:end with a single "time point"-like attribute or, at least, to forbid emma:end, implicitly adding ambiguity in the semantics of emma:start.
2. An emma:arc implicitly asserts the existence of two nodes, but I would say that the temporal attributes of the arcs, if present, define those nodes. A node could be therefore defined more than once. I simplify the example in 3.4.2:
<emma:arc from="1" to="2"
          emma:start="1087995961542" emma:end="1087995962042">flights</emma:arc>
<emma:arc from="2" to="3"
          emma:start="1087995962042" emma:end="1087995962542">to</emma:arc>
Being node 2 the same, what if emma:end in the "flights" arc and emma:start in the "to" arc do not have the very same value? Again, if this is conceptually allowed, I would welcome an explanation of the rationale. Otherwise, I would prefer enforcing a coherent description directly in the language instead of relying on validity checks. For example, restricting the "definition" of nodes inside node:element, i.e. forbidding timestamps in arcs. I went through the document and the list archive and I wasn't able to find answers to these doubts. Nevertheless, I apologize if these points have already been addressed. Thanks for your help and your work, Paolo Martini

Resolution: Accepted (w/modifications)

You are correct that emma:node elements are intended to correspond to instants.

Regarding 1., we agree that as it stands the ability to place both emma:start and emma:end on emma:node appears to allow a duration. This is an error in the current draft, as we did not intend for emma:start and emma:end to be used on emma:node. In the next draft of the EMMA specification and the corresponding schema we will remove the emma:start and emma:end attributes from emma:node. The primary motivation for the addition of emma:node was to provide a place for annotations which apply specifically to nodes rather than to arcs. For example, in some representations of speech recognition lattices, confidences or weights are placed on nodes in addition to arcs. For this reason we define both nodes and arcs.

It is critical that we have both timestamps and 'from'/'to' node annotations on arcs, as they serve different purposes. The role of the 'from' and 'to' annotations on arcs is to define the topology of the graph. The timestamps emma:start and emma:end, on the other hand, are annotations which describe temporal properties associated not necessarily with the arc but with the label on the arc. There is in fact no guarantee that the emma:end on 'flights' in your example will be equivalent to the emma:start on 'to'. If they were required to be the same, the transition point from one arc to the next would have to be assigned to an arbitrary point in the silence between the two words. Similarly, if there is no silence between two words in sequence, and in fact they share a geminate consonant, for example "well lit" or "gas station", word timings from the recognizer may in fact overlap; that is, the end of the arc for the word "well" may be later than the beginning of the arc for "lit".

Perhaps the even stronger case for having both time and the 'from'/'to' annotations is that in the lattice representation being at a particular time point does not guarantee that you are on the same node in the lattice. For example, imagine a lattice representing two possible strings:
'to boston'
'two blouses'
The lattice representation:
<emma:lattice initial="1" final="4">
    <emma:arc from="1" to="2" start="1000" end="2000">to</emma:arc>
    <emma:arc from="1" to="3" start="1000" end="2000">two</emma:arc>
    <emma:arc from="2" to="4" start="2000" end="4000">boston</emma:arc>
    <emma:arc from="3" to="4" start="2000" end="4000">blouses</emma:arc>
</emma:lattice>
Note that even though the first two arcs end at the same time point, those arcs lead to different states, 2 vs. 3, encoding which path has been taken in the graph. The critical factor here is that the lattice representation does not necessarily have to correspond to a time sequence. The lattice representation is used to encode a range of possible interpretations of a signal. It is often the case that the left-to-right sequence of symbols in the lattice corresponds to time, but there is no guarantee. For example, the lattice may represent interpretations of a typed text string rather than speech. It is also possible that a semantic representation encoded as a lattice could have time annotations on the first arc which are later than time annotations on the final arc. Since lattices represent abstractions over the signal, we cannot assume that time annotations define their topology. In order to clarify this we will add text to the specification making clear that lattices represent abstractions of the signal, and that time annotations may describe labels rather than arcs.
We would greatly appreciate it if you would review this response and respond within three weeks indicating whether it resolves your concern. If we do not receive a response within three weeks we will assume that this response resolves your concern.

Email Trail:

2.4 New Feature Requests

Issue i18N-9

From Felix Sasaki (2005-10-26):

Sec. 4.2.15 on references to a grammar
You identify a grammar by an URI. It might also be useful to be able to say "just a french grammar", without specifying which one. That is, to have a mechanism to specify the relations like general vs specific between grammars.

Resolution: Rejected

We do not see any important use cases addressed by this potential feature. Specifically, we don't believe that specifying 'just a french grammar' would provide sufficient additional information over and above the information provided by the 'emma:lang' attribute to make it worth adding. This is due to the fact that it is only through successful processing using a language-specific grammar that the processor can identify the language used by the speaker in the first place.

Email Trail:

Issue i18N-13

From Felix Sasaki (2005-10-26):

Is it possible to apply the emma:lang annotation also to tokens?

Resolution: Rejected

There is no language associated with the contents of emma:tokens. In many cases, this attribute value will not be meaningful to the casual reader. For instance, it may describe the phonemes or phonetic units for speech recognition. Proper nouns or shared words such as 'no' in English and Spanish may appear in the grammars for several languages, though the meaning may be identical and the system may not care which language applied.
It is proper to say that emma:tokens and emma:lang provide information about the user's input but not that emma:lang describes the language of the contents of emma:tokens.

Email Trail:

Issue i18N-2-1

From Richard Ishida (2007-05-02):

There is no language attribute available for use on the emma:literal element. Please add one.

Resolution: Accepted (w/modifications)

Every emma:literal element appears within an emma:interpretation element, and the emma:lang attribute is permitted on emma:interpretation. Therefore, there is no need for another emma:lang attribute on the emma:literal element. For consistency we prefer that emma:lang appear on emma:interpretation rather than having emma:lang potentially appear on both elements. With respect to xml:lang, we will clarify in the specification that the xml:lang attribute can appear on any EMMA element (including emma:literal).
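For illustration, the preferred arrangement might look as follows, a sketch in which emma:lang appears on emma:interpretation while xml:lang appears on the emma:literal content.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
    <!-- emma:lang annotates the language of the user's input;
         xml:lang may appear on any EMMA element, including emma:literal -->
    <emma:literal xml:lang="it">condizionatore</emma:literal>
  </emma:interpretation>
</emma:emma>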

Email Trail:

Issue i18N-2-4

From Richard Ishida (2007-05-02):

Use of xml:lang="". In XML 1.0 you can indicate the lack of language information using xml:lang="". How does EMMA allow for that with xml:lang and emma:lang? We feel it ought to. See http://www.w3.org/International/questions/qa-no-language

Resolution: Accepted (w/modifications)

Thank you for raising this important issue. In addressing it and reading related documents such as http://www.w3.org/International/questions/qa-no-language, we determined that in addition to the use of emma:lang="" we should also address the use of emma:lang="zxx". Below we address each in turn.

1. Non-linguistic input (emma:lang="zxx")

Given the use of EMMA for capturing multimodal input, including input using pen/ink, sensors, computer vision, etc., there are many EMMA results that capture non-linguistic input. Examples include drawing areas or arrows on maps and music input for tune recognition. This raises the question of how non-linguistic inputs should be annotated for emma:lang. Following on from the usage in xml:lang, we propose that non-linguistic input should be marked using the value "zxx". Since we already refer to BCP 47 and use the values from the IANA subtag registry for emma:lang values, this does not require revision of the EMMA markup. We will, however, add an example and clarifying text to the EMMA specification indicating the use of emma:lang="zxx" for non-linguistic inputs. To illustrate the difference between emma:lang and xml:lang for this kind of case: hummed input to a tune recognition application would be emma:lang="zxx" since the input is not in a human language, but if the result was a song title in English, that would be marked as xml:lang="en":
<emma:emma>
<emma:interpretation emma:lang="zxx" emma:mode="tune" emma:medium="acoustic">
<songtitle xml:lang="en">another one bites the dust</songtitle>
</emma:interpretation>
</emma:emma>
2. Non-specification (emma:lang="")

Parallel to the suggested usage for xml:lang (http://www.w3.org/International/questions/qa-no-language), cases in which there is no information about whether the source input is in a particular human language, and if so which language, are annotated as emma:lang="". Furthermore, in cases where there is no explicit emma:lang annotation, and none is inherited from a higher element in the document, the default value for emma:lang is "", meaning that there is no information about whether the source input is in a language and if so which language.
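For illustration, an input whose language is unknown might look as follows. This is a sketch only; it assumes "keys" as the mode value for keyboard input, and the text element is a hypothetical application-namespace element.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- emma:lang="" indicates that no information is available about the input language -->
  <emma:interpretation emma:lang="" emma:mode="keys" emma:medium="tactile">
    <text>asdf1234</text>
  </emma:interpretation>
</emma:emma>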

Email Trail:

Issue SW-1

From Jin Liu (2006-10-2):

Suggest use of EMMA to represent output using emma:result element.

Resolution: Deferred

The current scope of the EMMA specification is to provide a framework for representing and annotating user inputs. There are considerably more issues to address and work needed to give an adequate representation of output, and so for the current specification document the multimodal working group have chosen to defer work on output. For example, how would graphical output be handled if the system is going to draw ink, display a table, or zoom a map? There has been interest in output representation both inside and outside the working group. In a future version of EMMA we may consider this topic, and would at that time return to your contribution and others we have received.

Email Trail:

Issue SW-2

From Jin Liu (2006-10-2):

USING EMMA FOR STATUS COMMUNICATION AMONG COMPONENTS
PROPOSAL TO ADD EMMA ANNOTATIONS FOR STATUS COMMUNICATION AMONG COMPONENTS:
emma:status
emma:actual-answer-time
emma:expected-answer-time
emma:query-running

Resolution: Rejected

The scope of EMMA is to provide a representation and annotation mechanism for user inputs to spoken and multimodal systems. As such, status communication messages among processing components fall outside the scope of EMMA and are better addressed as part of the MMI architecture. We are forwarding this feedback to the architecture and authoring subgroups within the W3C Multimodal working group. This contribution is of particular interest to the authoring effort.

Email Trail:

Issue SW-3

From Jin Liu (2006-10-2):

OOV
=======================================================================
PROPOSAL TO ADD EMMA:OOV MARKUP FOR INDICATING PROPERTIES OF OUT OF VOCABULARY ITEMS:
emma:oov
<emma:arc emma:from="6" emma:to="7"
          emma:start="1113501463034"
          emma:end="1113501463934"
          emma:confidence="0.72">
  <emma:one-of id="MMR-1-1-OOV" emma:start="1113501463034" emma:end="1113501463934">
    <emma:oov emma:class="OOV-Celestial-Body"
              emma:phoneme="stez"
              emma:grapheme="sters"
              emma:confidence="0.74"/>
    <emma:oov emma:class="OOV-Celestial-Body"
              emma:phoneme="stO:z"
              emma:grapheme="staurs"
              emma:confidence="0.77"/>
    <emma:oov emma:class="OOV-Celestial-Body"
              emma:phoneme="stA:z"
              emma:grapheme="stars"
              emma:confidence="0.81"/>
  </emma:one-of>
</emma:arc>

Resolution: Rejected

While the ability to recognize and annotate the presence of out-of-vocabulary items appears extremely valuable, the EMMA group are concerned as to how many recognizers will in fact provide this capability. Furthermore, to develop this proposal fully, significant time would have to be assigned. Therefore we believe that the proposed annotation for OOV items is best handled as a vendor-specific annotation. EMMA provides an extensibility mechanism for such annotations through the emma:info element. The markup from your feedback above does not meet the EMMA XML schema, as it contains emma:one-of within a lattice emma:arc. Also, the timestamp on the emma:one-of may not be necessary since it matches that on emma:arc. The OOV information could alternatively be encoded as a vendor- or application-specific extension using emma:info as follows:
<emma:arc emma:from="6" emma:to="7"
          emma:start="1113501463034"
          emma:end="1113501463934"
          emma:confidence="0.72">
  <emma:info>
    <example:oov class="OOV-Celestial-Body" phoneme="stez" grapheme="sters" confidence="0.74"/>
    <example:oov class="OOV-Celestial-Body" phoneme="stO:z" grapheme="staurs" confidence="0.77"/>
    <example:oov class="OOV-Celestial-Body" phoneme="stA:z" grapheme="stars" confidence="0.81"/>
  </emma:info>
</emma:arc>

Email Trail:

Issue SW-4

From Jin Liu (2006-10-2):

In dialog applications it is important to distinguish between each distinct turn. The xs:nonNegativInteger annotation specifies the turn ID associated with an element.
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/emma/emma10.xsd"
xmlns="http://www.example.com/example"> <emma:interpretation turn-id="42"> ...
</emma:interpretation> </emma:emma>

Resolution: Accepted

We agree that it is important to have an annotation indicating the turn id, and we adopt your suggestion.
We have added a new section to the specification:
4.2.17 Dialog turns: emma:dialog-turn attribute
The emma:dialog-turn annotation associates the EMMA result in the container element with a dialog turn. The syntax and semantics of dialog turns is left open to suit the needs of individual applications. For example, some applications may use an integer value, where successive turns are represented by successive integers. Other applications may combine a name of a dialog participant with an integer value representing the turn number for that participant. Ordering semantics for comparison of emma:dialog-turn is deliberately unspecified and left for applications to define.
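For illustration, the new annotation might be used as follows. This is a sketch only; the turn identifier format (here a participant name plus an integer) and the destination payload are application-defined and illustrative.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:dialog-turn="user:42" emma:mode="voice" emma:medium="acoustic">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>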

Email Trail:

Issue ITS-01

From Christian Lieske (2007-05-03):

i. Allowing ITS markup in EMMA. With this provision in place, EMMA could for example easily carry for example information on directionality, or ruby. Your example [emma:tokens="arriving at 'Liverpool Street'"] could for example be enhanced by local ITS markup (see http://www.w3.org/TR/its/#basic-concepts-selection-local) as follows in order to explicitly encode directionality information: [its:dir="ltr" emma:tokens="arriving at 'Liverpool Street'"]. Please note, that the EMMA design decision to encode tokens in an attribute prevents a decoration of individual tokens. With an elements-based encoding of tokens, the example [<tokens> arriving at 'Liverpool Street'</tokens>] furthermore could be enhanced by local ITS markup as follows in order to explicitly encode the fact that 'Liverpool Street' is a specific type of linguistic unit ('span' by the way is an element which ITS recommends): [<tokens>arriving at <span its:term="yes">Liverpool Street</span></tokens>"]. Aside: We have considered your response on tokens in http://lists.w3.org/Archives/Public/public-i18n-core/2006JulSep/0074.html while crafting this suggestion. We felt, that ITS-annotations to tokens despite of your response would be valuable.

Resolution: Rejected

EMMA provides different mechanisms for representing captured input and the various stages of semantic analysis that follow. We agree that there are situations where ITS markup is appropriate within an EMMA document and that the 'emma:tokens' attribute does not permit embedded ITS annotations. The restricted content model of emma:tokens has been intentionally chosen to make common use cases simple. There are other approaches with greater expressive power where ITS annotations may be specified. EMMA anticipates a rich diversity of user inputs (e.g. keyboard entry, speech, handwriting input) and provides multiple mechanisms for representing that input. The 'emma:tokens' attribute is the most limited of these. Other mechanisms such as 'emma:signal' and the emma:derivation element offer far more freedom. To better explain these different mechanisms, we offer some background and walk through two illustrative examples showing how user input may be represented and/or summarized at various levels within the semantic analysis. We expect this review will better explain where 'emma:tokens' is appropriate.

Email Trail:

Issue ITS-02

From Christian Lieske (2007-05-03):

ii. Creating an ITS Rule file (see http://www.w3.org/TR/its/#link-external-rules) along with the EMMA specification (e.g. as a non-normative appendix). With this in place, localization/translation would become easier in case EMMA instances or parts of EMMA instances (eg. an "interpretation") would need to be transferred from one natural language to another one. Several EMMA and elements and attributes contain text. Most, if not all localization tools (as well as ITS) assume element content is translatable and attribute content is not translatable. However in EMMA, this assumption does not seem to be valid. The EMMA element "interpretation" for example does not seem to contain immediate translatable content, and the EMMA attribute "tokens" in some circumstances might have to be translated. While this is fine because tools have ways to specify an element should not be translated, it is very often quite difficult no know *which elements* or *which attributes* should behave like that. Having a list of elements that are non-translatable (or conversely if there are more non-translatable than translatable elements) would help a lot. This list could be expressed using ITS rules (see http://www.w3.org/TR/its/#basic-concepts-selection-global) relating to "its:translate" (see "its:translate" see http://www.w3.org/TR/its/#trans-datacat). This way all user of translation tools (or other language-related applications such as machine-translation engines, etc.) could look up that set of rules and process accordingly. For the examples given above, and ITS rules file could be as simple as: <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:translateRule selector="//interpretation" translate="no"/> <its:translateRule selector="//@tokens" translate="yes"/> </its:rules>

Resolution: Deferred

See Archived document

Email Trail: