Translation, Modality Transformation and Assistance Services -- Research and Infrastructure Development

Position paper for RDIG (call); March 21, 2003
Gottfried Zimmermann, Trace R&D Center, University of Wisconsin-Madison

Introduction

The Translation, Modality Transformation and Assistance Services program at the Trace Center examines how network-based services can be used to provide assistance for people with disabilities and people who are aging. These services can also benefit everyone, whether operating in a constrained environment (e.g., driving a car), communicating with people who speak different languages, or simply needing accurate and searchable meeting minutes. The spectrum of envisioned services includes speech recognition (speech to text), sign language and sign language recognition, international language translation, assistance/mentoring, language simplification, print recognition (advanced OCR), and image/video description. This program is funded by the National Institute on Disability and Rehabilitation Research (NIDRR) and the National Science Foundation (NSF).

Speech-to-Text Service for Collaborative Environments

Currently, Trace is developing an experimental speech-to-text service for a scientific tele-collaborative environment, the Access Grid. A commercial provider of captioning services is engaged to deliver real-time text transcription for virtual meetings. An initial prototype has been developed and used in experimental settings. Current development focuses on a modular, Web-service-based system for speech-to-text transformation and simple meeting facilitation services. Field testing will begin in July 2003 with selected Access Grid sessions.
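
To give a rough picture of what such a modular, Web-service-based transcript interface could look like, the Python sketch below exposes a minimal request/response service: a captioning client posts transcript segments, and meeting clients poll for segments they have not yet seen. The endpoint, port, and JSON message format are illustrative assumptions only and do not describe the actual Trace or Access Grid implementation.

    # Illustrative sketch only -- endpoint, port, and message format are
    # assumptions, not the actual Trace Center or Access Grid interfaces.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    SEGMENTS = []  # in-memory transcript: list of {"seq": int, "text": str}

    class TranscriptHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # A captioning client posts one transcript segment as JSON: {"text": "..."}
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            SEGMENTS.append({"seq": len(SEGMENTS), "text": body["text"]})
            self.send_response(204)
            self.end_headers()

        def do_GET(self):
            # Meeting clients poll for segments newer than the last one they saw:
            # GET /transcript?since=<seq>
            query = parse_qs(urlparse(self.path).query)
            since = int(query.get("since", ["-1"])[0])
            new = [s for s in SEGMENTS if s["seq"] > since]
            payload = json.dumps(new).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), TranscriptHandler).serve_forever()

A polling interface of this kind is only one possible design; a production service could equally push segments to subscribed clients. The point of the sketch is the modular separation between the transcription provider writing to the service and the meeting clients reading from it.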

Selected Use Cases

The speech-to-text service can be used in a variety of scenarios; here are some of them:

Text transcription for presentations. In a conference-like setting, a large screen in the front of the room shows a text transcript of the speech. This was demonstrated at the SuperComputing conference in November 2002. The remote speech-to-text service is more economical than having a captioner on site in person.

Scientific meeting with diverse participants. Distributed groups of scientists collaborate via a tele-collaborative infrastructure. Participants (with and without hearing impairment) can follow the text on a shared, large display or on their personal laptops. Participants can also make textual contributions via an integrated message queueing service. This can be an important aid when people participate in different languages and modalities, in addition to participating from different places. The text archive can be searched for content after the meeting.

Telephone conferences with diverse participants. As in the scientific meeting scenario, participants (with and without hearing impairment) can run personal clients and follow the text on their screens. The meeting facilitation service can support an orderly meeting process (submitting textual messages for "hand raising" or content submission, and managing turn taking); a sketch of such a facilitation queue follows these scenarios.
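
As an illustration of how a facilitation service might order contributions, the short Python sketch below keeps a first-come, first-served queue of floor requests alongside submitted text messages. The class name, fields, and behavior are assumptions made for illustration; they do not describe the actual meeting facilitation service under development.

    # Illustrative sketch only -- names and fields are assumptions.
    from collections import deque
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class FacilitationQueue:
        """Orders 'hand raising' requests and textual contributions for a meeting."""
        floor_requests: deque = field(default_factory=deque)  # participants waiting to speak
        contributions: list = field(default_factory=list)     # submitted text messages

        def raise_hand(self, participant: str) -> None:
            # Queue a request for the floor, first come, first served.
            if participant not in self.floor_requests:
                self.floor_requests.append(participant)

        def submit_text(self, participant: str, text: str) -> None:
            # Record a textual contribution for display and for the meeting archive.
            self.contributions.append({"from": participant, "text": text})

        def next_speaker(self) -> Optional[str]:
            # The facilitator grants the floor to the next queued participant.
            return self.floor_requests.popleft() if self.floor_requests else None

    # Example: two remote participants request the floor; one submits a comment in text.
    queue = FacilitationQueue()
    queue.raise_hand("participant-A")
    queue.raise_hand("participant-B")
    queue.submit_text("participant-A", "Please repeat the last slide number.")
    print(queue.next_speaker())  # -> participant-A
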


Acknowledgments: Partial funding was provided by the National Institute on Disability and Rehabilitation Research and the National Science Foundation. Opinions expressed in this paper are those of the author only.