Proposed Solutions to Make Data Mining Results Accessible

This paper summarizes the accessibility features that were implemented for the development of IBM DB2® Intelligent Miner™ Visualization. Intelligent Miner Visualization is a Java application for visualizing data mining results. It uses pie charts, histograms, directed graphs, binary tree graphs, and tables to display these results. In order to make this information accessible to users with disabilities, special presentation and interaction techniques had to be developed.

Possible Accessibility Features for Visualizations

In terms of the Web Content Accessibility Guidelines [2], the implemented accessibility features fall into the following categories:

a) Provide equivalent alternatives to auditory and visual content
Each visualizer also provides the data underlying its charts and graphs in tabular format, so that assistive technologies can access this information. The clustering visualizer offers a special view that summarizes the most important characteristics of each cluster in textual form (see detailed description below).

In addition, visually impaired users can choose the built-in accessibility display schemes for font sizes and contrast settings for foreground and background colors: high contrast / black on white, high contrast / black on white / large fonts, high contrast / white on black, and high contrast / white on black / large fonts.
b) Don't rely on color alone
For users with normal color vision, the visualizers provide color coding on multiple dimensions. The needs of users with color vision deficiencies are accounted for in several ways: the default color palette is suitable for the most common forms of color vision deficiency, all color coding is customizable, and monochrome textures can be used instead of colors.
c) Provide context and orientation information
Each visualizer is divided into a number of graphical and textual views, presented on tabbed pages. Within a page, information is logically grouped into sections that allow users to quickly navigate to the desired location.
d) Provide clear navigation mechanisms
The visualizers are fully enabled for mouse-less operation via the keyboard. This includes navigation and interaction in directed graphs and tree graphs.

A Closer Look at Textual Descriptions as Alternative Representations

A special technique had to be developed for the visualization of clustering results because of the complexity of the information displayed. Clustering is a data mining function that sorts the analysed data into clusters of similar data. An example of this data mining function is the segmentation of customer profiles. For example, a bank wants to understand what its different customer segments look like, so that it can predict their preferences. The source data may contain information such as customer income, marital status, profession, and age. The clustering algorithm analyses these data and builds segments of customers that share similarities.

An example of such a customer segment could be "married men over 50 with a high income, representing 15% of all customers". Another segment could be "single students under 30 with a low income, representing 20% of all customers". The result of such an analysis is presented to the user as a collection of graphs, histograms, and pie charts that describe the statistical characteristics of each cluster and compare this information with the statistical characteristics of the whole customer population. The challenge for accessibility is that the user can only interpret the result with an overview of all the diagrams describing a cluster. For instance, a sighted user would conclude from the diagrams that a specific segment represents "married men over 50 with a high income" because the pie chart of the distribution of men and women in this segment shows a predominance of men, the histogram of the age distribution shows a peak at 60, and the income histogram shows a peak at values much higher than the average income. While aggregating all this information is not a real problem for a user who can see all diagrams at the same time, it is very difficult for a blind user. Even if the values represented by the diagrams are available in tabular format, this does not help such a user, who needs an overview of all the data before he or she can use the individual values provided in the tables.

To solve this problem, we have developed an algorithm that interprets the statistical information (minimum, maximum, mean value, and standard deviation) contained in each graph and generates a textual description that summarizes the different diagrams describing a cluster. Such a description could read: "Marital status is predominantly 'married', sex is predominantly 'man', age is high, income is high".

It is quite easy to build a textual description of a pie chart showing the distribution of a discrete value, such as marital status, by listing the most represented categories. Building such a description for numeric information, such as income, is less trivial, because additional information is needed to categorize an income as 'high', 'low', or 'medium'. To do that, our algorithm compares the statistics of the diagram to be described with the statistics of the entire, unsegmented population. Knowing the mean value (M) and standard deviation (S) of the income for all customers, and assuming that the data are normally distributed, it deduces that about 95% of the customers have an income between M - 2S and M + 2S. With this background information, it defines the limits within which an income is considered high, medium, or low. For example, the algorithm could decide that a value between M - S/2 and M + S/2 is medium, a value above M + S/2 is high, and a value below M - S/2 is low.
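
The example thresholds can be illustrated with a minimal Python sketch. This is not the implementation used in the product: the function name, the handling of values that fall exactly on a boundary, and the income figures are assumptions made for illustration only. The sketch also checks the 95% figure against the standard normal distribution.

```python
from statistics import NormalDist


def categorize(value, pop_mean, pop_std):
    """Label a value 'low', 'medium' or 'high' relative to the whole population.

    Uses the example limits from the text: 'medium' lies within
    pop_mean +/- pop_std / 2. Treating values exactly on a limit as
    'medium' is an assumption of this sketch.
    """
    half = pop_std / 2
    if value > pop_mean + half:
        return "high"
    if value < pop_mean - half:
        return "low"
    return "medium"


# Share of a normally distributed population within two standard
# deviations of the mean (roughly 95%).
share_within_2s = NormalDist().cdf(2) - NormalDist().cdf(-2)

# Hypothetical figures: income of all customers has mean 40000 and
# standard deviation 12000, so 'medium' spans 34000 to 46000.
POP_MEAN, POP_STD = 40_000, 12_000
labels = [categorize(v, POP_MEAN, POP_STD) for v in (30_000, 41_000, 65_000)]
print(round(share_within_2s, 4), labels)
```

Which statistic of a cluster is passed as `value` (for example, the cluster mean) is a design choice the text leaves open.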


By using the statistical information of the whole population to describe the characteristics of a subset of this population, the description algorithm gives the user a quick interpretation of complex diagrams. Interestingly, this feature is also often praised and preferred by sighted users, because it helps them understand complex information that is usually difficult for non-statisticians to interpret.
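
To make the idea concrete, the following Python sketch assembles a complete cluster description from per-field summaries. It is a sketch under stated assumptions, not the algorithm as implemented: all function names, the input layout, the 50% coverage rule for discrete fields, the "mostly" wording, and the sample figures are hypothetical.

```python
def describe_categorical(name, shares, coverage=0.5):
    """List the most represented categories of a discrete field.

    `shares` maps each category to its fraction of the cluster.
    Categories are listed in decreasing order until they cover at least
    `coverage` of the cluster (the 50% default is an assumption).
    """
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    chosen, covered = [], 0.0
    for category, share in ranked:
        chosen.append(category)
        covered += share
        if covered >= coverage:
            break
    if len(chosen) == 1:
        return f"{name} is predominantly '{chosen[0]}'"
    listed = " or ".join(f"'{c}'" for c in chosen)
    return f"{name} is mostly {listed}"


def describe_numeric(name, cluster_mean, pop_mean, pop_std):
    """Rate a cluster mean against the population using M +/- S/2 limits."""
    half = pop_std / 2
    if cluster_mean > pop_mean + half:
        level = "high"
    elif cluster_mean < pop_mean - half:
        level = "low"
    else:
        level = "medium"
    return f"{name} is {level}"


def describe_cluster(categorical, numeric):
    """Join the per-field descriptions into one sentence-like summary."""
    parts = [describe_categorical(n, s) for n, s in categorical.items()]
    parts += [describe_numeric(n, *stats) for n, stats in numeric.items()]
    text = ", ".join(parts)
    return text[0].upper() + text[1:]


# Hypothetical cluster: category shares, and for numeric fields
# (cluster mean, population mean, population standard deviation).
categorical = {
    "marital status": {"married": 0.80, "single": 0.15, "divorced": 0.05},
    "sex": {"man": 0.70, "woman": 0.30},
}
numeric = {"age": (60, 42, 15), "income": (70_000, 40_000, 12_000)}
summary = describe_cluster(categorical, numeric)
print(summary)
```

With these sample figures, the sketch yields a description of the same form as the example given earlier for the "married men over 50 with a high income" segment.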

Conclusions and Outlook

We consider this work a starting point. Many of the features described satisfy the minimum requirements for accessibility, i.e., users with disabilities are not excluded from working with the software. We believe, however, that design for accessibility must also incorporate the usability requirements of users with special needs. In other words, when designing for accessibility, measurable usability objectives with regard to effectiveness, efficiency, and satisfaction must be considered [3], for instance, "a visually impaired or blind user is able to understand and interpret the representation of a pie chart accurately within 1 minute". This may be achieved by developing alternative representations of data visualizations, such as the one outlined in this paper. Establishing standards and guidelines for suitable representations could be a promising step towards that goal.

References

  1. D. Willuhn, C. Schulz, L. Knoth-Weber, S. Feger, and Y. Saillet, Developing Accessible Software for Data Visualization, IBM Systems Journal, Vol. 42, No. 4, 2003.
  2. W. Chisholm, G. Vanderheiden, and I. Jacobs (eds.), Web Content Accessibility Guidelines 1.0, World Wide Web Consortium, http://www.w3.org/TR/WAI-WEBCONTENT/
  3. ISO 9241-11:1998 Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 11: Guidance on usability.

Authors

Dirk Willuhn, Senior Usability Engineer

IBM Deutschland Entwicklung GmbH, Schoenaicherstrasse 220, 71032 Boeblingen, Germany.

dwilluhn@de.ibm.com

Yannick Saillet, Advisory Software Engineer  

IBM Deutschland Entwicklung GmbH, Schoenaicherstrasse 220, 71032 Boeblingen, Germany.

ysaillet@de.ibm.com