This paper summarizes the accessibility features implemented during the development of IBM DB2® Intelligent Miner™ Visualization. Intelligent Miner Visualization is a Java application for visualizing data mining results. It uses pie charts, histograms, directed graphs, binary tree graphs, and tables to display these results. To make this information accessible to users with disabilities, special presentation and interaction techniques had to be developed.
In terms of the guidelines for web content accessibility [2], the deployed accessibility features fall into the following categories:
A special technique had to be developed for the visualization of clustering results because of the complexity of the information displayed. Clustering is a data mining function that sorts the analysed data into clusters of similar records. An example of this data mining function is the segmentation of customer profiles. For example, a bank wants to understand what its different customer segments look like, so that it can predict their preferences. The source data may contain information such as customer income, marital status, profession, and age. The clustering algorithm analyses these data and builds segments of customers that share similar characteristics.
An example of such a customer segment could be:
"married men over 50 with a high income, representing 15% of all customers".
Another segment could be "single students under 30 with a low income, representing 20% of all customers". The result of such an analysis is presented to the user as a collection of graphs, histograms, and pie charts that describe the statistical characteristics of each cluster and compare them with the statistical characteristics of the whole customer population. The challenge for accessibility is that the user can only interpret the result if he or she has an overview of all the diagrams describing a cluster. For instance, a sighted user would conclude from the diagrams that a specific segment represents the "married men over 50 with a high income", because he or she sees that the pie chart showing the distribution of men and women in this segment shows a predominance of men, that the histogram showing the age distribution in this segment peaks at 60, and that the histogram representing the income peaks at values much higher than the average income. While aggregating all this information is not a real problem for a user who can see all the diagrams at the same time, it is very difficult for a blind user. Even if the values represented by the diagrams are available in tabular format, this does not help such a user, who needs an overview of all the data before he or she can use the individual values provided in the tables.
To solve this problem, we have developed an algorithm that interprets the statistical information (minimum, maximum, mean value, and standard deviation) contained in each graph, and generates a textual description that summarizes the different diagrams describing a cluster. Such a description could read: "Marital status is predominantly 'married', sex is predominantly 'man', age is high, income is high".
It is quite easy to build a textual description of a pie chart showing the distribution of a discrete value, such as the marital status, by listing the most represented categories. It is less trivial to build such a description for numeric information, such as the income, because additional information is needed to categorize an income as 'high', 'low', or 'medium'. To do that, our algorithm compares the statistics of the diagram being described with the statistics of the entire, unsegmented population. Knowing the mean value (M) and standard deviation (S) of the income for all the customers, and assuming that the data are normally distributed, it deduces that about 95% of the customers have an income between M - 2S and M + 2S. With this background information, it defines the thresholds at which an income can be considered high, medium, or low. For example, the algorithm could decide that a value between M - S/2 and M + S/2 is medium, a value above M + S/2 is high, and a value below M - S/2 is low.
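The categorization described above can be illustrated with a minimal sketch. It is not the Intelligent Miner Visualization implementation (which is a Java application). All function names, the 50% dominance threshold for discrete values, and the choice of the cluster mean as the value compared against the population thresholds are assumptions made for illustration. Only the M - S/2 and M + S/2 limits come from the example in the text.

```python
def classify_numeric(cluster_mean, pop_mean, pop_std):
    """Label a cluster's mean as 'low', 'medium' or 'high'.

    Uses the example thresholds from the text: values between
    M - S/2 and M + S/2 are 'medium', where M and S are the mean and
    standard deviation of the whole (unsegmented) population.
    Assumption: the cluster is represented by its mean value.
    """
    lower = pop_mean - pop_std / 2
    upper = pop_mean + pop_std / 2
    if cluster_mean > upper:
        return "high"
    if cluster_mean < lower:
        return "low"
    return "medium"


def describe_categorical(name, shares, dominance=0.5):
    """Describe a discrete attribute by its most represented categories.

    `shares` maps each category to its fraction within the cluster.
    Assumption: a category covering at least `dominance` of the cluster
    is called 'predominant'; otherwise the two largest are listed.
    """
    ranked = sorted(shares, key=shares.get, reverse=True)
    if shares[ranked[0]] >= dominance:
        return f"{name} is predominantly '{ranked[0]}'"
    listed = " or ".join(f"'{category}'" for category in ranked[:2])
    return f"{name} is mostly {listed}"


def describe_numeric(name, cluster_mean, pop_mean, pop_std):
    """Describe a numeric attribute relative to the whole population."""
    return f"{name} is {classify_numeric(cluster_mean, pop_mean, pop_std)}"


def describe_cluster(categorical, numeric):
    """Join the per-diagram descriptions into one sentence.

    `categorical`: list of (name, shares) pairs.
    `numeric`: list of (name, cluster_mean, pop_mean, pop_std) tuples.
    """
    parts = [describe_categorical(name, shares) for name, shares in categorical]
    parts += [describe_numeric(*entry) for entry in numeric]
    return ", ".join(parts)


if __name__ == "__main__":
    # Hypothetical figures for the "married men over 50" segment.
    print(describe_cluster(
        categorical=[
            ("Marital status", {"married": 0.8, "single": 0.2}),
            ("sex", {"man": 0.7, "woman": 0.3}),
        ],
        numeric=[
            ("age", 60, 40, 15),
            ("income", 90000, 50000, 20000),
        ],
    ))
```

With these hypothetical figures, the population thresholds for age are 32.5 and 47.5, so a cluster mean of 60 is labelled 'high'. Deriving the thresholds from the population rather than from the cluster itself is what gives labels such as 'high' a meaning relative to all customers.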
Because this description algorithm uses the statistical information of the whole population to describe the characteristics of a subset of that population, the user gets a quick interpretation of complex diagrams. Interestingly, this feature is often praised and preferred by sighted users as well, because it helps them understand complex information that is usually difficult for non-statisticians to interpret.
We consider this work a starting point. Many of the features described satisfy the minimum requirements for accessibility, i.e. disabled users are not excluded from working with the software. We believe, however, that design for accessibility must also incorporate the usability requirements of users with special needs. In other words, when designing for accessibility, measurable usability objectives with regard to effectiveness, efficiency, and satisfaction must be considered [3], for instance, "a visually impaired or blind user is able to understand and interpret the representation of a pie chart accurately within 1 minute". This may be achieved by developing alternative representations of data visualizations, such as the one outlined in this paper. Establishing standards and guidelines for suitable representations could be a promising step towards that goal.
Dirk Willuhn, Senior Usability Engineer
IBM Deutschland Entwicklung GmbH, Schoenaicherstrasse 220, 71032 Boeblingen, Germany.
Yannick Saillet, Advisory Software Engineer
IBM Deutschland Entwicklung GmbH, Schoenaicherstrasse 220, 71032 Boeblingen, Germany.