Important note: This Wiki page is edited by participants of the RDWG. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Working Group participants, WAI, or W3C. It may also have some very useful information.
User Evaluation Methods for Web Accessibility Evaluation
- 1 Contacts
- 2 Keywords
- 3 Description
- 4 Background
- 5 Discussion
- 5.1 Can we adapt and use traditional usability methods?
- 5.2 What methodologies are most suitable for user testing with people with disabilities?
- 5.3 What are the benefits of user testing in iterative development cycles?
- 5.4 Most user testing is traditionally informal and lacks scientific rigour. What are the benefits of a more scientific/controlled approach to user testing, and how would this work in practice?
- 5.5 How do we ensure that we cover a broad range of disabilities, a broad range of expertise, etc.?
- 5.6 What about alternative techniques such as remote user evaluation techniques?
- 5.7 How would you verify the results obtained by remote user evaluation?
- 5.8 How do you set up and support a user evaluation team?
- 6 References
User evaluation, Evaluation, Conformance, Guidelines, User Testing
User evaluation is considered to be very important for web accessibility evaluation; however, little is known about how to properly conduct a user evaluation for web accessibility with disabled users. A number of disability advocacy groups perform user evaluation, but most of them appear to focus on assessment as it pertains to their particular group of users. It is costly and difficult to set up and maintain a user testing group that spans a greater number of disability categories, but some are doing it well. We need to know the difficulties of resourcing such a group and training them in the art/science of website evaluation.
Many website accessibility evaluations are carried out relying solely on automated tools, or on automated tools with the addition of a manual check of the results. W3C and other literature specifically stipulate that an evaluation should not rely solely on automated tools. Automated tools may miss items that cause a person using assistive technology to have problems with a website, and some tools flag items as problems that do not appear to negatively affect such users. An effective user evaluation provides detail on the problems faced by a user with assistive technology and often suggests additional resources for the website that will improve the user experience for all groups of users. A further complication is that altering a website so that it helps one group of users may negatively affect users with other disabilities.
Can we adapt and use traditional usability methods?
It is not advisable to use traditional usability methods as-is, since the focus of the evaluation and the desired outcome are fundamentally different. Standard usability tests aim at revealing general usability flaws and at examining how the concept of operations works out overall, whereas usability evaluations focused on accessibility for people with disabilities aim to detect issues related to this specific topic. Data collection during these tests focuses on understanding errors related to accessibility barriers, rather than on user satisfaction or time-on-task measures. [IBM12]
What methodologies are most suitable for user testing with people with disabilities?
Basically, there exist three different types of user testing methodologies, for general-purpose user tests as well as for user testing in the field of accessibility. Each of the following evaluation methods comes with advantages and disadvantages, and the correct selection of the method that fits a specific product or project depends heavily on the desired target groups and on the availability of resources in terms of participants and infrastructure. [IBM12]
User testing in a lab environment
The first method requires a laboratory which provides the space and the equipment needed to reconstruct an adequate reproduction of the user's work environment. The advantage is that the test runs, and the evaluations in general, are easier for the administrator to control. Session administrators can relatively easily define the testing conditions, including the environment: the technical equipment, the lighting, or even the furniture. It is also easier to take notes and record a session in detail, since the lab can be adapted to optimally fulfill this task. Consequently, better control allows potentially better results in terms of consistency and meaningfulness. [IBM12]
User testing at the user’s location
Evaluations at the user’s workplace or home overcome some of the flaws of the lab evaluation, but they also come with disadvantages. The main advantage is that users can use their own equipment and assistive technology in the environment they are familiar with. The configuration effort for setting up a working environment is therefore much lower, but the administrator has much less control over the test setup. On the other hand, the configuration effort needed to record a session reasonably well and to adequately observe the user and his or her behavior is much higher. Some other important issues also need to be taken into account. Many participants do not appreciate a large audience, so the team in charge of observing the test run needs to be kept small and the roles need to be clearly defined beforehand. However, one person is not sufficient; in the case of liability issues, for example, a second person can help to dispel any doubts. A practicable arrangement is that one person sits with the participant while the other stays off to the side taking notes. Sessions should be recorded using portable camcorders and/or sound recorders. [IBM12]
What are the benefits of user testing in iterative development cycles?
The benefit is that, depending on the specific software engineering methodology, the results and outcome of each iteration cycle can be evaluated and tested against the specifications as well as against user expectations, which can change during the development process. This can provide important advantages in terms of monetary savings and the overall quality of the product. Errors revealed in early stages of the project can be fixed much more easily. In the worst case, fundamental conceptual flaws that are only discovered in later stages may be too expensive to fix, which decreases the quality of the product. [CON11, WAI12]
Most user testing is traditionally informal and lacks scientific rigour. What are the benefits of a more scientific/controlled approach to user testing, and how would this work in practice?
More scientific approaches basically derive from methods used for scientifically controlled experiments, and their goals are the same. Scientifically controlled experiments, as well as more formal, scientifically driven user evaluation techniques, focus on the reliability and significance of the results, which usually requires more preparation and effort to set up the testing environment. The following are identified as key factors for formal user testing [CON11]:
- Hypothesis: a hypothesis needs to be formulated beforehand, as specifically as possible; it must clarify the goal of the test and state what is to be proved or disproved.
- Randomly chosen participants: participants are chosen at random to undertake the evaluation tests. This group needs to be representative in terms of composition and size.
- Tight controls: tight experimental controls are crucial for the integrity of the test results and the conclusions derived from them.
- Control groups: control groups assure that there is no effect when there should be no effect.
- Sufficient group size: the group of participants needs to exceed a certain size, since a user evaluation with a group that is too small can yield poor or misleading data. [CON11]
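The points above on hypotheses and participant numbers can be made concrete with a standard power calculation. The sketch below is only an illustration of how a minimum sample size might be estimated for a two-group comparison of task-success rates; the success-rate figures and the two-proportion formula are assumptions for illustration, not something prescribed by [CON11].

```python
import math

# Standard normal quantiles for a two-sided alpha = 0.05 and power = 0.80
Z_ALPHA = 1.959964  # z_{1 - alpha/2}
Z_BETA = 0.841621   # z_{1 - beta}

def sample_size_per_group(p1, p2):
    """Minimum participants per group to detect a difference between two
    task-success proportions p1 and p2 (normal-approximation formula)."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical example: detect a drop in task success from 90% to 60%
print(sample_size_per_group(0.6, 0.9))  # 32 participants per group
```

Note how quickly the required group size grows as the expected difference shrinks: detecting a 90% vs. 80% gap under the same assumptions requires roughly 200 participants per group, which illustrates why under-sized evaluations produce misleading data.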
How do we ensure that we cover a broad range of disabilities, a broad range of expertise, etc.?
Basically, the composition of the participants in the user evaluations should reflect all target groups of the future product. In particular, products solving general-purpose tasks, which aim at accessibility in general in order to cover the broadest possible user group, should consider as many groups of disabilities as possible. Kinds of disabilities which should be taken into account are those affecting vision, hearing, motor skills, and cognitive and learning skills. Users with low vision usually use screen magnifiers, while blind users typically rely on screen readers or Braille displays. People who are deaf or hard of hearing have problems with auditory cues, alarm sounds, and audio output in general, which need to be communicated in some other way. Also quite important in this context is that people who have been deaf since birth often have problems with written language as well, since its grammar is completely different from that of sign language. Easy-to-read text helps here, as do meaningful symbols, which are crucial for people with learning and cognitive disabilities or people who are not completely familiar with the interface language. Finally, it must be stated that there are many different user groups for accessible applications which need to be considered, and an adaptation which provides an improvement for one group might even negatively influence the performance of another group [WAI12]. However, it is advisable, especially if the monetary resources for user tests are limited, that user tests involve users with different kinds of disabilities. For example, many elderly people are not very familiar with handling a computer and are therefore a good indicator for a good user interface in general. Besides that, they may have limited dexterity through arthritis, and might also have low vision or be hard of hearing.
A user interface which works very well for this kind of user group might be a big step in the right direction towards an accessible application or website. However, it does not solve issues related to, for example, Braille display users: the information needs to be adapted and structured differently to assure flawless interaction with today's screen readers. [IBM12] The need for a broad and widespread composition of the target group might not hold for very specific applications, for example for assistive technology itself. These products focus on a very small and specific target group and do not need to deal with other kinds of disabilities. Hence, they do not need to include groups with different kinds of disabilities.
What about alternative techniques such as remote user evaluation techniques?
Sometimes limitations and restrictions arise in terms of time, funding, and other kinds of resources which prevent administrators from running tests either at a lab or at the user's location. In these cases, remote user evaluations can fulfill parts of the role of traditional user evaluation methods; however, it is highly advisable to prefer the other evaluation methods when possible. Proven approaches to conducting remote user evaluations are [POW09]:
- Remote questionnaires/surveys: the display of questions is triggered by actions in the user interface
- Remote control evaluation: the user’s environment is especially equipped for synchronous or asynchronous recording of the user test session.
- Video conferencing as an extension of the usability laboratory: the user performing the test is connected with an evaluator at a remote site through a video conferencing tool.
- Instrumented remote evaluation: user applications are prepared and enriched to provide additional functionality for recording the test session and the user workflow.
- Semi-instrumented remote evaluation: users are trained to reveal user interface and accessibility flaws and record positive and negative experiences with the concept of operations.
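As a rough illustration of the instrumented approach above, the sketch below aggregates a hypothetical client-side interaction log into per-task counts of errors, completions, and abandonments. The log format, field names, and event types are assumptions made for this example; they are not part of the framework described in [POW09].

```python
from collections import defaultdict

def summarize(events):
    """Aggregate raw interaction events into per-task counts.

    Each event is a (participant_id, task_id, event_type) tuple, where
    event_type is one of: "task_start", "error", "task_complete",
    "task_abandon" (a hypothetical vocabulary for this sketch)."""
    summary = defaultdict(lambda: {"errors": 0, "completed": 0, "abandoned": 0})
    for participant, task, event in events:
        if event == "error":
            summary[task]["errors"] += 1
        elif event == "task_complete":
            summary[task]["completed"] += 1
        elif event == "task_abandon":
            summary[task]["abandoned"] += 1
    return dict(summary)

# Example log as an instrumented client might record it
log = [
    ("p1", "search", "task_start"),
    ("p1", "search", "error"),
    ("p1", "search", "task_complete"),
    ("p2", "search", "task_start"),
    ("p2", "search", "task_abandon"),
]
print(summarize(log)["search"])  # {'errors': 1, 'completed': 1, 'abandoned': 1}
```

A real instrumented evaluation would capture far richer data (timestamps, assistive-technology output, page state), but even counts like these can flag which tasks are barriers for which participant groups.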
Over the last decade, researchers and practitioners have tried to improve remote user evaluation into a feasible user evaluation methodology for cases where traditional approaches are not possible. Remote user evaluation was undertaken in earlier days as well, but most of those solutions required some kind of tool support which had to be individually adapted to the product being tested. These days, there are ambitious efforts to build unified frameworks and tool support that help evaluators reduce the cost of carrying out remote user evaluations, such as Power et al. [POW09].
How would you verify the results obtained by remote user evaluation?
In general, it is not advisable to undertake only a remote user evaluation. A lot of potentially useful information may get lost if there is no physically present observer watching the user fulfill the tasks of the test run. As a last resort, however, remote user evaluation can reveal some usability and accessibility flaws as well. It remains highly beneficial to conduct at least part of the user testing either at the lab or at the user's location. If the results do not vary too much, one could conclude that remote user evaluation is a feasible option for this specific kind of application or website too. [IBM12]
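One simple way to make "the results do not vary too much" operational is to compare the sets of problems identified remotely against those found in the lab or on-site session. The overlap measure below is an illustrative choice for this sketch, not a method mandated by [IBM12]; the problem identifiers are hypothetical.

```python
def jaccard_overlap(remote_findings, lab_findings):
    """Jaccard similarity between two sets of identified problems:
    size of the intersection divided by size of the union (0.0 to 1.0)."""
    remote, lab = set(remote_findings), set(lab_findings)
    if not remote and not lab:
        return 1.0  # nothing found either way counts as full agreement
    return len(remote & lab) / len(remote | lab)

# Hypothetical findings from the two evaluation settings
remote = {"missing-alt-text", "low-contrast", "keyboard-trap"}
lab = {"missing-alt-text", "keyboard-trap", "unlabeled-form-field"}
print(jaccard_overlap(remote, lab))  # 0.5: two shared problems out of four distinct
```

A high overlap on a pilot sample would support extending the remote method to the rest of the evaluation; a low overlap suggests the remote setup is missing whole classes of problems.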
How do you set up and support a user evaluation team?
The size and composition of the user evaluation team depends heavily on the methods used. Lab evaluations allow a large group size, while evaluations at the user's location do not. In particular, evaluation teams at the user's location should have a minimum size of two; the upper limit depends on the user and on boundary conditions such as the size of the location. At lab evaluations the evaluation team can be bigger, and for remote evaluation the group size is, from a theoretical point of view, not subject to any limitation. Most important for all types of evaluations, however, is that the roles are clearly defined beforehand. For the first two types, it is practicable if one person sits next to the participant to assist, give hints, and observe the interaction at close range, while at least one other person takes notes, staying off to the side. [IBM12]
- [CON11] Joshue O Connor, Real World User Testing: An Assessment of User Testing Methodologies in Theory and Practice, 2011
- [IBM12] IBM, Conducting User Evaluations with People with Disabilities, http://www-03.ibm.com/able/resources/ueplansessions.html#13, 2012
- [POW09] Christopher Power, Helen Petrie, and Richard Mitchell, A Framework for Remote User Evaluation of Accessibility and Usability of Websites, in: Universal Access in Human-Computer Interaction. Addressing Diversity (C. Stephanidis, ed.), Springer Berlin Heidelberg, http://dx.doi.org/10.1007/978-3-642-02707-9_67, 2009
- [WAI12] W3C - Web Accessibility Initiative, Involving Users in Web Projects for Better, Easier Accessibility, http://www.w3.org/WAI/users/involving.html, 2012