Position Paper

IBM Special Needs Self Voicing Browser

IBM Home Page Reader™ - "the voice of the world wide web"
a new self voicing browser

Jim Thatcher - thatch@us.ibm.com
Phill Jenkins - pjenkins@us.ibm.com
Cathy Laws - claws@us.ibm.com

This paper was originally prepared for the W3C working group meeting held in October, 1998.

Over a decade of blind accessibility experience

There is no community that better understands the issues of access with speech output than computer users who are blind. If you want to use the web without a display, with only audio output - in the same way you would use a telephone - you are in exactly the same position as a blind person. Since the mid-1970s, IBM has been working with blind computer users to design screen readers and provide accessibility to information technology.

Scope - there is a lot to it

There is much more to making a self voicing browser usable than devising a speech mark-up language or mimicking a telephone voice response system. Browsing the Web with audio only is much more than adding multimedia capabilities to a visual browser. The mouse and display need to be replaced by a simple keypad and audio feedback. Many pages were designed on the assumption that hand and eye coordination was available, so the interaction they rely on has to be carried out with only the keypad and audio. If there is no full keyboard, as on a phone, then voice recognition may be necessary for entering form data and search strings.

What is Home Page Reader™?

IBM Home Page Reader™ (HPR) is a voice browser designed by a blind researcher in IBM Tokyo Research Labs. Chieko Asakawa found browsing the Web somewhere between difficult and impossible using today's Windows screen readers. Home Page Reader was released as an IBM product in Japan in October of 1997. IBM Special Needs Systems in Austin picked it up and has been adapting it to the North American blind user, adding recommended access features and support for aspects of HTML 4.0.

What must a self voicing browser do?

A blind user needs to be able to open a URL, read a page, stop, hear short segments again, continue, hear a link, follow that link, listen again and fast forward through the unimportant parts. If you cannot see the display you need to be able to get some kind of orientation information: how big is this page, how many links does it have, and "where am I?". Sighted users get that from visually scanning the page.

The display is two dimensional, with colors and images, so it is really two-plus dimensions. Speech is one dimensional. We can add voice changes and pitch changes, just as color is added to the screen, and that makes speech one-plus dimensions. It is easy to get lost on the Web. It is much easier to get lost with a one dimensional view of a two dimensional universe. That is a fundamental difference that the self voicing browser has to accommodate.

Web Authors can help

Comfort is an important part of browsing the Web with speech.  If you can see, you train yourself to ignore pictures and navigation links going down one side of the page - you get familiar with the visual layout.  If you cannot see, and the page structure is bad, you may not know what to ignore. How difficult will it be to ignore that 30 item navigation menu?
 
The Web site of the American Council of the Blind has several navigation links at the top of its home page; the first link, however, is "skip navigation links." That is the kind of helpful page design needed by people who listen to the web.

The IBM alphaWorks Web site has a text-only site. Usually text-only links appear at the bottom of the page or, on one page we have seen, inside a large server side image map. On the alphaWorks site, the text-only link is at the top of the page as "seen" by a text or voice browser. It is actually an invisible image on the graphics page. Text-only and self-voicing browsers hear that link first, while users of graphics browsers are not bothered. If more Web authors would take that kind of care for users with voice browsers, life would be a lot simpler for those users.

Future directions and issues

Academicians spend days on how to speak a long description of an image that nobody cares about anyway. Little effort has gone into the problem of intelligently speaking an entire page. How do you intelligently speak a heavily formatted page that uses tables inside tables inside tables?

Aural style sheets are perhaps the best way to add a one-plus dimension to the audio only browser. They help maintain the separation of structure from presentation.

We also need to merge the work done in voice response systems with voice recognition and replace the full keyboard with speech input for complex input like URLs, search strings, forms, and object manipulations.
 

Home Page Reader design can help

Home Page Reader has a play and a stop key. With the play key, you say "read on." With the stop key, you stop!

Then there are the next, current, and previous keys for word, item and link (nine keys in all), which do exactly what their names suggest. Words and links are easy to identify, but what is an item? It is a form element, a list item or a paragraph. Basically an HPR 'item' is a reasonably sized chunk of text, so you can comfortably read through a page one item at a time.
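
As an illustration only, here is a minimal Python sketch of how a page might be split into items of that kind. It is not the Home Page Reader implementation: the class name ItemSplitter, the choice of tags, and the "[input q]" label format are assumptions made for this example, built on the standard html.parser module.

```python
from html.parser import HTMLParser

ITEM_TAGS = {"p", "li"}                      # paragraphs and list items
FORM_TAGS = {"input", "select", "textarea"}  # each form element is its own item


class ItemSplitter(HTMLParser):
    """Collects a flat list of readable items from an HTML page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # text pieces of the item being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ITEM_TAGS:
            self._flush()
            self._buffer = []
        elif tag in FORM_TAGS:
            # A form element ends the text item it sits in and stands alone.
            self._flush()
            name = dict(attrs).get("name", "unnamed")
            self.items.append(f"[{tag} {name}]")

    def handle_endtag(self, tag):
        if tag in ITEM_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        """Store the collected text as one item, with whitespace collapsed."""
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
            self._buffer = None


splitter = ItemSplitter()
splitter.feed(
    "<p>Welcome to the site.</p>"
    "<ul><li>Home</li><li>Products</li></ul>"
    "<form><p>Search: <input name='q'></p></form>"
)
splitter.close()
print(splitter.items)
```

Text outside the chosen tags is simply dropped here; a real browser would need a rule for it, and for headings and table cells as well.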

HPR has a history key to move backwards and forwards in the history list, and a bookmarks key to bring up bookmarks, add them and delete them. There is also a help key to bring up the on-line help.

Everything can be taken to extremes; so it is with Home Page Reader. We have extended functions, so, for example, extended-stop is stop loading. The extended-previous character, item and link keys go to the first character of the line, the first item of the page (top of page), and the first link of the page. The same holds for extended-next: it goes to the last character of the line, the last item of the page, and the last link of the page.

Because it really doesn't make any sense to extend current character, item or link, the extended-current combinations are assigned functions important for voice navigation. Extended-current character switches between character and word reading; extended-current item is the "where am I?" function. And the important extended-current link is follow that link, the equivalent of a mouse click.
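
The pairing of plain and extended keys can be pictured as a lookup table. The Python sketch below covers only the three item keys and is a sketch under stated assumptions, not HPR's actual design: the Reader class, the key names "previous", "current" and "next", and the press function are all hypothetical.

```python
class Reader:
    """Holds a page already split into items, plus the reading position."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def _speak(self):
        return self.items[self.index]

    def next_item(self):
        # Stay on the last item rather than running off the end of the page.
        self.index = min(self.index + 1, len(self.items) - 1)
        return self._speak()

    def current_item(self):
        return self._speak()

    def previous_item(self):
        self.index = max(self.index - 1, 0)
        return self._speak()

    def first_item(self):
        self.index = 0
        return self._speak()

    def last_item(self):
        self.index = len(self.items) - 1
        return self._speak()

    def where_am_i(self):
        return f"item {self.index + 1} of {len(self.items)}"


# (key name, extended?) -> action. The key names are made up for this sketch.
KEYMAP = {
    ("previous", False): Reader.previous_item,
    ("current", False): Reader.current_item,
    ("next", False): Reader.next_item,
    ("previous", True): Reader.first_item,   # extended-previous: top of page
    ("current", True): Reader.where_am_i,    # extended-current: "where am I?"
    ("next", True): Reader.last_item,        # extended-next: last item
}


def press(reader, key, extended=False):
    """Look up the key, with or without the extended modifier, and run it."""
    return KEYMAP[(key, extended)](reader)


reader = Reader(["Welcome", "Home", "Products", "Contact"])
press(reader, "next")                    # speaks "Home"
press(reader, "next", extended=True)     # jumps to "Contact"
press(reader, "current", extended=True)  # reports "item 4 of 4"
```

Keeping the table separate from the actions means the same small keypad can carry twice as many functions without any new keys, which is the point of the extended modifier.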

The help key opens the on-line help manual, which in turn can be browsed with Home Page Reader. The extended-help key enters a keys help mode in which every key pressed explains its use. The extended-history key reloads the page; extended-bookmark adds the current page or deletes the current bookmark, depending on whether you are looking at the page or the bookmark.

Then there are the jump functions: jump tables and jump structure. Jump tables takes you to the previous, current and next table. Jump structure moves to the next item which has a different structure, the first new major HTML element. We define major elements to be lists, table rows, select menus, maps, and forms. So if you are reading a map or menu, jump structure takes you out of that map or menu. You can also jump between headers, or just jump forwards or backwards ten items at a time, like page up and page down. These jumps are included to assist in navigating sites whose use of HTML as a formatting language makes logical navigation almost impossible.
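
One way to picture jump structure and the ten-item jump is the Python sketch below. It assumes, purely for illustration, that every item already carries a label for the major element that encloses it (None for plain text); the labels such as "list-1" and the function names are hypothetical and not taken from HPR.

```python
def jump_structure(structures, index):
    """Return the index of the first later item whose enclosing structure differs."""
    here = structures[index]
    for i in range(index + 1, len(structures)):
        if structures[i] != here:
            return i
    return index  # nothing different ahead: stay where we are


def jump_items(count, index, step=10):
    """Move forwards (step > 0) or backwards (step < 0), clamped to the page."""
    return max(0, min(index + step, count - 1))


# One label per item: a paragraph, a three-item list, a paragraph, a two-item form.
page = [None, "list-1", "list-1", "list-1", None, "form-1", "form-1"]

print(jump_structure(page, 1))   # from inside the list to the paragraph after it
print(jump_items(87, 80))        # ten items forward, stopping at the last item
print(jump_items(87, 3, -10))    # ten items back, stopping at the first item
```

Labeling each list or form separately, rather than by element type alone, lets the jump stop at the start of a second list that directly follows the first.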

The page summary function announces the title and size of the document, as measured by number of items and number of links. It also announces the number of tables and forms. Then the "Where Am I?" function explains the position in the document relative to that summary, e.g., "you are in row 2 of 3 rows of table 5, item 56 of 87."
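
A hedged Python sketch of these two announcements follows. The PageSummary and Position classes, their field names, the page title and the exact wording of the spoken strings are assumptions for illustration; only the quantities reported come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


def count_phrase(n, noun):
    """Format a count with a simple plural, e.g. '1 form' or '5 tables'."""
    return f"{n} {noun}" + ("" if n == 1 else "s")


@dataclass
class Position:
    item: int                             # 1-based item number
    table: Optional[int] = None           # 1-based table number, if inside a table
    row: Optional[int] = None             # 1-based row within that table
    rows_in_table: Optional[int] = None


@dataclass
class PageSummary:
    title: str
    items: int
    links: int
    tables: int
    forms: int

    def announce(self):
        counts = [
            count_phrase(self.items, "item"),
            count_phrase(self.links, "link"),
            count_phrase(self.tables, "table"),
            count_phrase(self.forms, "form"),
        ]
        return f"{self.title}: " + ", ".join(counts) + "."

    def where_am_i(self, pos):
        item_part = f"item {pos.item} of {self.items}"
        if pos.table is None:
            return f"You are at {item_part}."
        return (
            f"You are in row {pos.row} of {pos.rows_in_table} rows "
            f"of table {pos.table}, {item_part}."
        )


summary = PageSummary("Example page", items=87, links=40, tables=5, forms=1)
print(summary.announce())
print(summary.where_am_i(Position(item=56, table=5, row=2, rows_in_table=3)))
```

Reporting the position against the same totals given in the summary is what lets a listener judge how much of the page is left.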

In addition to these "reading" functions there are also a number of editing, file management, history list and bookmark tasks that must be managed by Home Page Reader. The well designed reading functions are the ones that make it possible for a blind person to productively hear the voice of the world wide web.