AI, Machine Learning and Accessibility

Purpose and Scope

This wiki page serves as a planning ground for Task Force work on artificial intelligence (AI), machine learning (ML), and Web accessibility.

Types of AI

Reactive Machine AI

Reactive machines are AI systems that have no memory and are designed to perform a very specific task. Because they cannot recall previous outcomes or decisions, they work only with presently available data.

Limited Memory AI

Unlike Reactive Machine AI, this form of AI can recall past events and outcomes and monitor specific objects or situations over time. Limited Memory AI can use past- and present-moment data to decide on a course of action most likely to help achieve a desired outcome. However, while Limited Memory AI can use past data for a specific amount of time, it can’t retain that data in a library of past experiences to use over a long-term period. As it’s trained on more data over time, Limited Memory AI can improve in performance.

Theory of Mind AI

Theory of Mind AI would be able to understand the thoughts and emotions of other entities.

Self-Aware AI

If ever achieved, Self-Aware AI would be able to understand its own internal conditions and traits as well as human emotions and thoughts. It would also have its own emotions, needs and beliefs.

Three types of AI mapped to disability requirements:

Process automation: advances in technology & industry, such as tools.
Analytics & insight: data & programmatic directives.
Human engagement & individual requirements: user adaptation & machine self-learning.

Practical Applications of AI technologies:

1. Computer vision applications:
• Image recognition and classification
• Object detection
• Object tracking
• Facial recognition
• Content-based image retrieval

2. Robotic applications:
• Repetitive tasks
• Assistance & monitoring
• Industrial applications
• Smart home applications

3. Expert systems:
• Decision making
• Complex problem solving
• Data evaluation & simulation

What areas of WCAG could AI help with?

Many AAA success criteria in W3C's WCAG are often too expensive or difficult to implement, and many Level A and AA criteria could also be met more effectively with the help of AI. These criteria can be grouped by the mobility, cognitive, visual, and auditory needs they address.

Principle 1 – Perceivable

Guideline 1.1 – Text Alternatives

Provide text alternatives for any non-text content so that it can be changed into other forms people need, such as large print, braille, speech, symbols or simpler language.

1.1.1 Non-text Content Level A All non-text content that is presented to the user has a text alternative that serves the equivalent purpose, except for the situations listed below.

Potential: auto-generated alt text could ensure that every image is represented by a text alternative.
Implementation: some tools already generate alternative text automatically.
Effectiveness: automated alternative text is currently not of usable quality, so manual alternative text is still required.
Possible Shortcomings: as with live captioning, auto-generated alt text would need to be checked to make sure it makes sense and to correct any mistakes.
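A minimal sketch of that checking step, assuming a simple rule-based filter rather than an AI model: the hypothetical `review_flags` function below flags auto-generated alt text that commonly needs human review (empty strings, file names, redundant "image of" prefixes, very long text). The prefix list and the 150-character limit are illustrative assumptions, not WCAG requirements.

```python
import re

# Illustrative heuristics only; these are assumptions, not an established standard.
REDUNDANT_PREFIXES = ("image of", "picture of", "photo of", "graphic of")
FILENAME_PATTERN = re.compile(r"\.(jpe?g|png|gif|svg|webp)$", re.IGNORECASE)
MAX_LENGTH = 150  # assumed soft limit; longer descriptions may belong elsewhere


def review_flags(alt: str) -> list[str]:
    """Return reasons why an auto-generated alt text string may need manual review."""
    text = alt.strip()
    if not text:
        # Empty alt is valid for decorative images, so this only prompts a check.
        return ["empty"]
    flags = []
    if FILENAME_PATTERN.search(text):
        flags.append("looks like a file name")
    if text.lower().startswith(REDUNDANT_PREFIXES):
        flags.append("redundant prefix")
    if len(text) > MAX_LENGTH:
        flags.append("too long")
    return flags


if __name__ == "__main__":
    for sample in ["IMG_0042.jpg", "Image of a dog",
                   "A guide dog leading a person across a street", ""]:
        print(repr(sample), review_flags(sample))
```

An empty result does not mean the alt text is accurate; it only means none of these heuristics fired, so a person still needs to confirm that the description matches the image.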

Guideline 1.2 – Time-based Media

Provide alternatives for time-based media.

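1.2.1 Audio-only and Video-only (Prerecorded) Level A For prerecorded audio-only and prerecorded video-only media, the following are true, except when the audio or video is a media alternative for text and is clearly labelled as such: for prerecorded audio-only content, an alternative for time-based media is provided that presents equivalent information; for prerecorded video-only content, either an alternative for time-based media or an audio track is provided that presents equivalent information.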

1.2.2 Captions (Prerecorded) Level A Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labelled as such.

1.2.3 Audio Description or Media Alternative (Prerecorded) Level A An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media, except when the media is a media alternative for text and is clearly labelled as such.

1.2.4 Captions (Live) Level AA Captions are provided for all live audio content in synchronized media.
Potential: automated captions can support people who are Deaf or hard of hearing by converting speech to text in real time.
Implementation: some tools and social media platforms provide auto-generated captions.
Effectiveness: research suggests 85-90% accuracy, so automated captions are usable as a supporting mechanism but are not yet reliably and consistently accurate. Quality also varies significantly with audio quality.
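One common way to quantify caption accuracy is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the automatic transcript into a human reference transcript, divided by the number of words in the reference. Accuracy is then often reported as 1 - WER. The page does not say which measure the cited research used, so the sketch below only illustrates WER, computed with a word-level edit distance; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    using a word-level Levenshtein edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from caption
                curr[j - 1] + 1,     # insertion: extra word in caption
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox jumps over the lazy dog",
                          "the quick brown fox jumped over a lazy dog")
    print(f"WER {wer:.2f}, accuracy {1 - wer:.0%}")
```

In this example two of nine words are substituted, giving a WER of about 0.22 (roughly 78% accuracy). WER can exceed 1.0 when a caption contains many inserted words. This simple version only lowercases and splits on whitespace, so punctuation attached to a word counts as an error, and it ignores speaker labels and timing, which also matter for captions.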

1.2.5 Audio Description (Prerecorded) Level AA Audio description is provided for all prerecorded video content in synchronized media.

1.2.6 Sign Language (Prerecorded) Level AAA Sign language interpretation is provided for all prerecorded audio content in synchronized media.
Potential: providing sign language currently means hiring an interpreter and recording a video of them signing the content of the website or video, which can be expensive for a large website. There are also many different sign languages used in different countries. AI could change this by analysing the video or website and translating it into a sign language that Deaf or hard-of-hearing people understand.
Implementation: rather than being built into individual websites, this could be a separate program, like a screen reader, that translates the website into sign language.
Possible Shortcomings: it could lead to a whole new set of criteria for websites to follow, similar to 2.3.3 Animation from Interactions.

1.2.7 Extended Audio Description (Prerecorded) Level AAA Where pauses in foreground audio are insufficient to allow audio descriptions to convey the sense of the video, extended audio description is provided for all prerecorded video content in synchronized media.

1.2.8 Media Alternative (Prerecorded) Level AAA An alternative for time-based media is provided for all prerecorded synchronized media and for all prerecorded video-only media.

1.2.9 Audio-only (Live) Level AAA An alternative for time-based media that presents equivalent information for live audio-only content is provided.

Guideline 1.3 – Adaptable

Create content that can be presented in different ways (for example simpler layout) without losing information or structure.

1.3.1 Info and Relationships Level A Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text. Potential: AI could spot bold text that is being used as a heading and mark it up correctly as a heading.
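As a rough illustration of the detection step, here is a rule-based sketch (not an AI model) using Python's standard `html.parser` module. It flags `<p>` elements whose only text is a short run of `<b>` or `<strong>` text, a common sign of a visual heading without heading markup. The eight-word limit and the `BoldHeadingFinder` and `find_bold_headings` names are illustrative assumptions, and the sketch assumes paragraphs are closed with explicit `</p>` tags.

```python
from html.parser import HTMLParser


class BoldHeadingFinder(HTMLParser):
    """Collect paragraphs that contain only short bold text (heuristic heading candidates)."""

    BOLD_TAGS = {"b", "strong"}

    def __init__(self, max_words: int = 8):
        super().__init__()
        self.max_words = max_words  # assumed limit for heading-like text
        self.candidates: list[str] = []
        self._in_p = False
        self._bold_depth = 0
        self._bold_text: list[str] = []
        self._plain_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start collecting text for a new paragraph.
            self._in_p = True
            self._bold_depth = 0
            self._bold_text, self._plain_text = [], []
        elif tag in self.BOLD_TAGS and self._in_p:
            self._bold_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self._in_p and self._bold_depth > 0:
            self._bold_depth -= 1
        elif tag == "p" and self._in_p:
            self._in_p = False
            bold = " ".join("".join(self._bold_text).split())
            plain = "".join(self._plain_text).strip()
            # Candidate only if all visible text in the paragraph is bold and short.
            if bold and not plain and len(bold.split()) <= self.max_words:
                self.candidates.append(bold)

    def handle_data(self, data):
        if not self._in_p:
            return
        (self._bold_text if self._bold_depth > 0 else self._plain_text).append(data)


def find_bold_headings(html: str) -> list[str]:
    finder = BoldHeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


if __name__ == "__main__":
    page = ("<p><strong>Opening hours</strong></p>"
            "<p>We are open <b>daily</b> from 9 to 5.</p><p><b>Call us</b></p>")
    print(find_bold_headings(page))
```

A real tool would also weigh font size, position, and surrounding content, and a person should confirm the intended heading level before the text is re-marked as `<h2>`, `<h3>`, and so on.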

1.3.2 Meaningful Sequence Level A When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined.

1.3.3 Sensory Characteristics Level A Instructions provided for understanding and operating content do not rely solely on sensory characteristics of components such as shape, color, size, visual location, orientation, or sound. Note 1: For requirements related to color, refer to Guideline 1.4.

1.3.4 Orientation Level AA (Added in 2.1) Content does not restrict its view and operation to a single display orientation, such as portrait or landscape, unless a specific display orientation is essential.

1.3.5 Identify Input Purpose Level AA (Added in 2.1) The purpose of each input field collecting information about the user can be programmatically determined. Potential: a text-to-speech AI implementation could analyse the page behind a link and describe its general purpose.

1.3.6 Identify Purpose Level AAA (Added in 2.1) In content implemented using markup languages, the purpose of user interface components, icons, and regions can be programmatically determined.

Guideline 1.4 – Distinguishable

Make it easier for users to see and hear content including separating foreground from background.

1.4.1 Use of Color Level A Color is not used as the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element. Note 1: This success criterion addresses color perception specifically. Other forms of perception are covered in Guideline 1.3 including programmatic access to color and other visual presentation coding. Potential: AI may be able to detect where color alone is used to indicate a change and automatically add another visual cue, for example a border that appears around the element when it changes.

1.4.2 Audio Control Level A If any audio on a Web page plays automatically for more than 3 seconds, either a mechanism is available to pause or stop the audio, or a mechanism is available to control audio volume independently from the overall system volume level.

1.4.3 Contrast (Minimum) Level AA The visual presentation of text and images of text has a contrast ratio of at least 4.5:1, except for large-scale text (at least 3:1), incidental text, and logotypes. Potential: AI could enforce the 4.5:1 ratio by replacing poorly contrasting colours with compliant ones that keep a similar look.
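The contrast check itself is a published formula: WCAG defines relative luminance from linearised sRGB channels, and the contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 is the luminance of the lighter colour. The sketch below implements that formula and then, as a simple stand-in for the "keep a similar look" idea (not an AI model), blends the text colour toward black or white in small steps until it reaches the target ratio. The function names, the blending strategy, and the 2% step size are assumptions for illustration. It uses the 0.04045 linearisation threshold from the sRGB specification; older WCAG text cites 0.03928, which gives identical results for 8-bit colours.

```python
Rgb = tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    s = channel / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: Rgb) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: Rgb, bg: Rgb) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def adjust_foreground(fg: Rgb, bg: Rgb, target: float = 4.5, step: float = 0.02) -> Rgb:
    """Hypothetical fix-up: blend fg toward black or white (whichever contrasts more
    with bg) in small steps, stopping at the first colour that meets the target."""
    black, white = (0, 0, 0), (255, 255, 255)
    anchor = black if contrast_ratio(black, bg) >= contrast_ratio(white, bg) else white
    t = 0.0
    candidate = fg
    while contrast_ratio(candidate, bg) < target and t < 1.0:
        t = min(1.0, t + step)
        candidate = tuple(round(f + (a - f) * t) for f, a in zip(fg, anchor))
    return candidate


if __name__ == "__main__":
    white = (255, 255, 255)
    grey = (119, 119, 119)  # #777777
    fixed = adjust_foreground(grey, white)
    print(round(contrast_ratio(grey, white), 2), fixed, round(contrast_ratio(fixed, white), 2))
```

For example, #777777 text on white measures about 4.48:1, just below 4.5:1; with these settings the sketch darkens it by one step to (117, 117, 117), about 4.6:1. For large-scale text, pass `target=3.0`.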

1.4.4 Resize Text Level AA Except for captions and images of text, text can be resized without assistive technology up to 200 percent without loss of content or functionality.

1.4.5 Images of Text Level AA If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text. Potential: images of text are problematic because screen readers cannot read the text within the image. AI could address this by extracting the text from the image and presenting it as plain text, which websites could expose through a button near the image so that screen readers can read it.

1.4.6 Contrast (Enhanced) Level AAA The visual presentation of text and images of text has a contrast ratio of at least 7:1.

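1.4.7 Low or No Background Audio Level AAA For prerecorded audio-only content that (1) contains primarily speech in the foreground, (2) is not an audio CAPTCHA or audio logo, and (3) is not vocalization intended to be primarily musical expression such as singing or rapping, at least one of the following is true: the audio contains no background sounds; the background sounds can be turned off; or the background sounds are at least 20 decibels lower than the foreground speech content, except for occasional sounds that last only one or two seconds.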

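1.4.8 Visual Presentation Level AAA For the visual presentation of blocks of text, a mechanism is available to achieve the following: foreground and background colors can be selected by the user; width is no more than 80 characters or glyphs (40 if CJK); text is not justified; line spacing is at least space-and-a-half within paragraphs, and paragraph spacing is at least 1.5 times larger than the line spacing; and text can be resized without assistive technology up to 200 percent without requiring horizontal scrolling to read a line of text on a full-screen window.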

1.4.9 Images of Text (No Exception) Level AAA Images of text are only used for pure decoration or where a particular presentation of text is essential to the information being conveyed. Note 1: Logotypes (text that is part of a logo or brand name) are considered essential.

1.4.10 Reflow Level AA (Added in 2.1) Content can be presented without loss of information or functionality, and without requiring scrolling in two dimensions.

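1.4.11 Non-text Contrast Level AA (Added in 2.1) The visual presentation of user interface components (the visual information required to identify them and their states) and of graphical objects (the parts of graphics required to understand the content) has a contrast ratio of at least 3:1 against adjacent color(s). Potential: AI could detect low-contrast interface elements and improve their contrast.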

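1.4.12 Text Spacing Level AA (Added in 2.1) In content implemented using markup languages that support the following text style properties, no loss of content or functionality occurs by setting all of the following and by changing no other style property: line height (line spacing) to at least 1.5 times the font size; spacing following paragraphs to at least 2 times the font size; letter spacing (tracking) to at least 0.12 times the font size; and word spacing to at least 0.16 times the font size.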

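1.4.13 Content on Hover or Focus Level AA (Added in 2.1) Where receiving and then removing pointer hover or keyboard focus triggers additional content to become visible and then hidden, the following are true: the additional content can be dismissed without moving pointer hover or keyboard focus (unless it communicates an input error or does not obscure or replace other content); if pointer hover can trigger it, the pointer can be moved over the additional content without it disappearing; and it remains visible until the hover or focus trigger is removed, the user dismisses it, or its information is no longer valid.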

Principle 2 – Operable

User interface components and navigation must be operable.

Guideline 2.1 – Keyboard Accessible

Make all functionality available from a keyboard.

2.1.1 Keyboard Level A All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes, except where the underlying function requires input that depends on the path of the user's movement and not just the endpoints. Note 1: This exception relates to the underlying function, not the input technique. For example, if using handwriting to enter text, the input technique (handwriting) requires path-dependent input but the underlying function (text input) does not. Note 2: This does not forbid and should not discourage providing mouse input or other input methods in addition to keyboard operation.

2.1.2 No Keyboard Trap Level A If keyboard focus can be moved to a component of the page using a keyboard interface, then focus can be moved away from that component using only a keyboard interface, and, if it requires more than unmodified arrow or tab keys or other standard exit methods, the user is advised of the method for moving focus away. Note 1: Since any content that does not meet this success criterion can interfere with a user's ability to use the whole page, all content on the Web page (whether it is used to meet other success criteria or not) must meet this success criterion. See Conformance Requirement 5: Non-Interference.

2.1.3 Keyboard (No Exception) Level AAA All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes.

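2.1.4 Character Key Shortcuts Level A (Added in 2.1) If a keyboard shortcut is implemented in content using only letter (including upper- and lower-case letters), punctuation, number, or symbol characters, then at least one of the following is true: the shortcut can be turned off; it can be remapped to include one or more non-printable keyboard keys (for example Ctrl or Alt); or it is active only when the relevant user interface component has focus.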

Guideline 2.2 – Enough Time

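Provide users enough time to read and use content.

2.2.1 Timing Adjustable Level A For each time limit that is set by the content, at least one of the following is true: the user can turn off the time limit; the user can adjust it to at least ten times the default; the user is warned before time expires, given at least 20 seconds to extend it with a simple action, and allowed to extend it at least ten times; or the time limit is a required part of a real-time event (such as an auction), is essential, or is longer than 20 hours.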

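2.2.2 Pause, Stop, Hide Level A For moving, blinking, scrolling, or auto-updating information, all of the following are true: moving, blinking, or scrolling information that starts automatically, lasts more than five seconds, and is presented in parallel with other content can be paused, stopped, or hidden by the user unless it is part of an activity where it is essential; and auto-updating information that starts automatically and is presented in parallel with other content can be paused, stopped, or hidden by the user, or its update frequency can be controlled, unless the auto-updating is part of an activity where it is essential.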

2.2.3 No Timing Level AAA Timing is not an essential part of the event or activity presented by the content, except for non-interactive synchronized media and real-time events.

2.2.4 Interruptions Level AAA Interruptions can be postponed or suppressed by the user, except interruptions involving an emergency.

2.2.5 Re-authenticating Level AAA When an authenticated session expires, the user can continue the activity without loss of data after re-authenticating. Potential: an AI tool could remember the options a user selected in a timed form and restore them after the session is re-authenticated, so the user does not have to enter them again.
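Whatever part AI plays, a tool like this needs somewhere to keep the user's unfinished input while they sign in again. The sketch below shows only that storage step, as a plain in-memory store with no AI involved. The `DraftStore` class, its method names, and the 20-hour retention period (borrowed from the threshold in 2.2.6 Timeouts) are assumptions for illustration. A real service would need persistent, access-controlled storage and should avoid keeping sensitive fields such as passwords or card numbers.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DraftStore:
    """Hypothetical in-memory store for unfinished form data, keyed by user and form,
    so the data can be restored after the user re-authenticates."""

    ttl_seconds: float = 20 * 60 * 60  # assumed retention period: 20 hours
    _drafts: dict = field(default_factory=dict)

    def save(self, user_id: str, form_id: str, data: dict, now: Optional[float] = None) -> None:
        """Record a copy of the form data together with the time it was saved."""
        saved_at = time.time() if now is None else now
        self._drafts[(user_id, form_id)] = (saved_at, dict(data))

    def restore(self, user_id: str, form_id: str, now: Optional[float] = None) -> Optional[dict]:
        """Return a copy of the saved data, or None if missing or older than ttl_seconds."""
        current = time.time() if now is None else now
        entry = self._drafts.get((user_id, form_id))
        if entry is None:
            return None
        saved_at, data = entry
        if current - saved_at > self.ttl_seconds:
            del self._drafts[(user_id, form_id)]  # expired: discard rather than restore
            return None
        return dict(data)


if __name__ == "__main__":
    store = DraftStore()
    store.save("user-1", "booking", {"date": "2024-05-01", "seats": "2"})
    # ...session expires, user signs in again...
    print(store.restore("user-1", "booking"))
```

After the user signs in again, the form page would call `restore` and pre-fill whatever fields it returns.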

2.2.6 Timeouts Level AAA (Added in 2.1) Users are warned of the duration of any user inactivity that could cause data loss unless the data is preserved for more than 20 hours when the user does not take any actions.

Guideline 2.3 – Seizures and Physical Reactions

Do not design content in a way that is known to cause seizures or physical reactions.

2.3.1 Three Flashes or Below Threshold Level A Web pages do not contain anything that flashes more than three times in any one second period, or the flash is below the general flash and red flash thresholds. Note 1: Since any content that does not meet this success criterion can interfere with a user's ability to use the whole page, all content on the Web page (whether it is used to meet other success criteria or not) must meet this success criterion. See Conformance Requirement 5: Non-Interference.

2.3.2 Three Flashes Level AAA Web pages do not contain anything that flashes more than three times in any one second period. Potential: AI could analyse videos and automatically slow down flashing sections to no more than three flashes per second, or warn users about flashes when they enable a setting.
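WCAG defines a flash as a pair of opposing changes in relative luminance of 10% or more of the maximum relative luminance, where the relative luminance of the darker image is below 0.80. A minimal sketch of the detection step could look like this, assuming the video has already been reduced to one relative-luminance value (0.0 to 1.0) per frame for the region being checked. It ignores the area and red-flash parts of the 2.3.1 thresholds and uses a simplified way of tracking peaks and troughs, so it is an illustration rather than a conformance test; `transition_frames` and `max_flashes_per_second` are hypothetical names.

```python
def transition_frames(luminance: list[float], min_delta: float = 0.1,
                      dark_limit: float = 0.8) -> list[int]:
    """Return frame indices where luminance reverses direction by at least min_delta."""
    if not luminance:
        return []
    transitions: list[int] = []
    extreme = luminance[0]  # most recent peak or trough
    direction = 0           # +1 after a rise, -1 after a fall, 0 before any change
    for i, value in enumerate(luminance[1:], start=1):
        # Follow the current rise (or fall) so changes are measured from its peak (or trough).
        if (direction == 1 and value > extreme) or (direction == -1 and value < extreme):
            extreme = value
        delta = value - extreme
        step = 1 if delta > 0 else -1
        # Count an opposing change only if it is large enough and the darker
        # of the two states is below the luminance limit.
        if abs(delta) >= min_delta and step != direction and min(value, extreme) < dark_limit:
            transitions.append(i)
            direction = step
            extreme = value
    return transitions


def max_flashes_per_second(luminance: list[float], fps: int, min_delta: float = 0.1,
                           dark_limit: float = 0.8) -> int:
    """Largest number of flashes (pairs of opposing transitions) in any one-second window."""
    frames = transition_frames(luminance, min_delta, dark_limit)
    best = 0
    for start, first in enumerate(frames):
        count = 0
        for frame in frames[start:]:
            if frame - first >= fps:  # outside the one-second window starting at `first`
                break
            count += 1
        best = max(best, count // 2)
    return best


if __name__ == "__main__":
    strobe = [0.0, 1.0] * 15                 # toggles every frame
    slow = ([0.0] * 10 + [1.0] * 10) * 3     # toggles every 10 frames
    print(max_flashes_per_second(strobe, fps=30), max_flashes_per_second(slow, fps=30))
```

At 30 frames per second, a region alternating between black and white on every frame reports 14 flashes in one second, far above the limit of three, while one that switches every 10 frames reports 1. A flicker between two bright values (both above 0.8) is not counted.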

2.3.3 Animation from Interactions Level AAA (Added in 2.1) Motion animation triggered by interaction can be disabled, unless the animation is essential to the functionality, or the information being conveyed.

Guideline 2.4 – Navigable

Provide ways to help users navigate, find content, and determine where they are.

2.4.1 Bypass Blocks Level A A mechanism is available to bypass blocks of content that are repeated on multiple Web pages.

2.4.2 Page Titled Level A Web pages have titles that describe topic or purpose.

2.4.3 Focus Order Level A If a Web page can be navigated sequentially and the navigation sequences affect meaning or operation, focusable components receive focus in an order that preserves meaning and operability.

2.4.4 Link Purpose (In Context) Level A The purpose of each link can be determined from the link text alone or from the link text together with its programmatically determined link context, except where the purpose of the link would be ambiguous to users in general.

2.4.5 Multiple Ways Level AA More than one way is available to locate a Web page within a set of Web pages except where the Web page is the result of, or a step in, a process.

2.4.6 Headings and Labels Level AA Headings and labels describe topic or purpose.

2.4.7 Focus Visible Level AA Any keyboard operable user interface has a mode of operation where the keyboard focus indicator is visible.

2.4.8 Location Level AAA Information about the user's location within a set of Web pages is available.

2.4.9 Link Purpose (Link Only) Level AAA A mechanism is available to allow the purpose of each link to be identified from link text alone, except where the purpose of the link would be ambiguous to users in general.

2.4.10 Section Headings Level AAA Section headings are used to organize the content. Note 1: "Heading" is used in its general sense and includes titles and other ways to add a heading to different types of content. Note 2: This success criterion covers sections within writing, not user interface components. User Interface components are covered under Success Criterion 4.1.2.

2.4.11 Focus Not Obscured (Minimum) Level AA (Added in 2.2) When a user interface component receives keyboard focus, the component is not entirely hidden due to author-created content. Note 1: Where content in a configurable interface can be repositioned by the user, then only the initial positions of user-movable content are considered for testing and conformance of this Success Criterion. Note 2: Content opened by the user may obscure the component receiving focus. If the user can reveal the focused component without advancing the keyboard focus, the component with focus is not considered hidden due to author-created content.

2.4.12 Focus Not Obscured (Enhanced) Level AAA (Added in 2.2) When a user interface component receives keyboard focus, no part of the component is hidden by author-created content.

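2.4.13 Focus Appearance Level AAA (Added in 2.2) When the keyboard focus indicator is visible, an area of the focus indicator is at least as large as the area of a 2 CSS pixel thick perimeter of the unfocused component or sub-component, and has a contrast ratio of at least 3:1 between the same pixels in the focused and unfocused states.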

Guideline 2.5 – Input Modalities

Make it easier for users to operate functionality through various inputs beyond keyboard.

2.5.1 Pointer Gestures Level A (Added in 2.1) All functionality that uses multipoint or path-based gestures for operation can be operated with a single pointer without a path-based gesture, unless a multipoint or path-based gesture is essential.

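2.5.2 Pointer Cancellation Level A (Added in 2.1) For functionality that can be operated using a single pointer, at least one of the following is true: the down-event is not used to execute any part of the function; completion of the function is on the up-event and a mechanism is available to abort it before completion or to undo it after completion; the up-event reverses any outcome of the preceding down-event; or completing the function on the down-event is essential.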

2.5.3 Label in Name Level A (Added in 2.1) For user interface components with labels that include text or images of text, the name contains the text that is presented visually.

2.5.4 Motion Actuation Level A (Added in 2.1) Functionality that can be operated by device motion or user motion can also be operated by user interface components and responding to the motion can be disabled to prevent accidental actuation.

2.5.5 Target Size (Enhanced) Level AAA (Added in 2.1) The size of the target for pointer inputs is at least 44 by 44 CSS pixels.

2.5.6 Concurrent Input Mechanisms Level AAA (Added in 2.1) Web content does not restrict use of input modalities available on a platform except where the restriction is essential, required to ensure the security of the content, or required to respect user settings.

2.5.7 Dragging Movements Level AA (Added in 2.2) All functionality that uses a dragging movement for operation can be achieved by a single pointer without dragging, unless dragging is essential or the functionality is determined by the user agent and not modified by the author. Note: This requirement applies to web content that interprets pointer actions (i.e. this does not apply to actions that are required to operate the user agent or assistive technology).

2.5.8 Target Size (Minimum) Level AA (Added in 2.2) The size of the target for pointer inputs is at least 24 by 24 CSS pixels.

Principle 3 – Understandable

Information and the operation of the user interface must be understandable.

Guideline 3.1 – Readable

Make text content readable and understandable.

3.1.1 Language of Page Level A The default human language of each Web page can be programmatically determined.

3.1.2 Language of Parts Level AA The human language of each passage or phrase in the content can be programmatically determined except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text.

3.1.3 Unusual Words Level AAA A mechanism is available for identifying specific definitions of words or phrases used in an unusual or restricted way, including idioms and jargon.

3.1.4 Abbreviations Level AAA A mechanism for identifying the expanded form or meaning of abbreviations is available.

3.1.5 Reading Level AAA When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available.

3.1.6 Pronunciation Level AAA A mechanism is available for identifying specific pronunciation of words where meaning of the words, in context, is ambiguous without knowing the pronunciation.

Guideline 3.2 – Predictable

Make Web pages appear and operate in predictable ways.

3.2.1 On Focus Level A When any user interface component receives focus, it does not initiate a change of context.

3.2.2 On Input Level A Changing the setting of any user interface component does not automatically cause a change of context unless the user has been advised of the behaviour before using the component.

3.2.3 Consistent Navigation Level AA Navigational mechanisms that are repeated on multiple Web pages within a set of Web pages occur in the same relative order each time they are repeated, unless a change is initiated by the user.

3.2.4 Consistent Identification Level AA Components that have the same functionality within a set of Web pages are identified consistently.

3.2.5 Change on Request Level AAA Changes of context are initiated only by user request, or a mechanism is available to turn off such changes.

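3.2.6 Consistent Help Level A (Added in 2.2) If a Web page contains any of the following help mechanisms, and those mechanisms are repeated on multiple Web pages within a set of Web pages, they occur in the same order relative to other page content, unless a change is initiated by the user: human contact details, a human contact mechanism, a self-help option, or a fully automated contact mechanism.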

Guideline 3.3 – Input Assistance

Help users avoid and correct mistakes.

3.3.1 Error Identification Level A If an input error is automatically detected, the item that is in error is identified and the error is described to the user in text.

3.3.2 Labels or Instructions Level A Labels or instructions are provided when content requires user input.

3.3.3 Error Suggestion Level AA If an input error is automatically detected and suggestions for correction are known, then the suggestions are provided to the user, unless it would jeopardize the security or purpose of the content.

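3.3.4 Error Prevention (Legal, Financial, Data) Level AA For Web pages that cause legal commitments or financial transactions for the user to occur, that modify or delete user-controllable data in data storage systems, or that submit user test responses, at least one of the following is true: submissions are reversible; data entered by the user is checked for input errors and the user is given an opportunity to correct them; or a mechanism is available for reviewing, confirming, and correcting information before finalizing the submission.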

3.3.5 Help Level AAA Context-sensitive help is available.

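3.3.6 Error Prevention (All) Level AAA For Web pages that require the user to submit information, at least one of the following is true: submissions are reversible; data entered by the user is checked for input errors and the user is given an opportunity to correct them; or a mechanism is available for reviewing, confirming, and correcting information before finalizing the submission.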

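3.3.7 Redundant Entry Level A (Added in 2.2) Information previously entered by or provided to the user that is required to be entered again in the same process is either auto-populated or available for the user to select, except when re-entering the information is essential, the information is required to ensure the security of the content, or previously entered information is no longer valid.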

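3.3.8 Accessible Authentication (Minimum) Level AA (Added in 2.2) A cognitive function test (such as remembering a password or solving a puzzle) is not required for any step in an authentication process unless that step provides at least one of the following: an alternative authentication method that does not rely on a cognitive function test; a mechanism to assist the user in completing the test; a test that asks the user to recognize objects; or a test that asks the user to identify non-text content they provided to the website.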

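3.3.9 Accessible Authentication (Enhanced) Level AAA (Added in 2.2) A cognitive function test (such as remembering a password or solving a puzzle) is not required for any step in an authentication process unless that step provides an alternative authentication method that does not rely on a cognitive function test, or a mechanism to assist the user in completing the test.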

Principle 4 – Robust

Content must be robust enough that it can be interpreted by a wide variety of user agents, including assistive technologies.

Guideline 4.1 – Compatible

Maximize compatibility with current and future user agents, including assistive technologies.

4.1.2 Name, Role, Value Level A For all user interface components (including but not limited to: form elements, links and components generated by scripts), the name and role can be programmatically determined; states, properties, and values that can be set by the user can be programmatically set; and notification of changes to these items is available to user agents, including assistive technologies. Note 1: This success criterion is primarily for Web authors who develop or script their own user interface components. For example, standard HTML controls already meet this success criterion when used according to specification.

4.1.3 Status Messages Level AA (Added in 2.1) In content implemented using markup languages, status messages can be programmatically determined through role or properties such that they can be presented to the user by assistive technologies without receiving focus.

AI Crossover Work

As AI spreads into more areas of everyday life, more platforms and devices will require accessibility assessments. Areas where AI is increasingly involved include:

AR/VR
Implant devices
Automotive/transportation
Wearables

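Relevant work within the Research Questions Task Force (RQTF) may include:

RAUR (RTC Accessibility User Requirements)
XAUR (XR Accessibility User Requirements)
CTAUR (Collaboration Tools Accessibility User Requirements)
Remote Meetings
MAUR (Media Accessibility User Requirements)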

External Work

Anthropic
Location: Fully Remote
Company size: 51 - 200 employees
Anthropic develops Claude, an AI assistant used for answering questions, summarizing text and images, or generating new content. The company also provides access to the Claude API, where users can build and launch their own generative AI tools. Additionally, Anthropic focuses on conducting AI safety research to make the development and deployment of its systems, such as Claude, more reliable.

IBM
Location: Armonk, New York
Company size: 10,001+ employees
IBM offers a suite of AI-based solutions centred around its AI assistant IBM Watson. IBM Watson Orchestrate specializes in automating tasks and workflows, so teams can redirect resources toward more pressing matters and boost their productivity. Meanwhile, IBM Watson Code Assistant can offer recommendations to developers, speeding up the coding process and reducing errors.

NATE
Location: New York, New York
Company size: 51 - 200 employees
Nate operates an AI-powered app that gathers products from websites across the internet and makes them available for purchase in one place. The Nate app lets users save their favourite items to lists on their Nate dashboard and buy them with one click, while the AI handles all checkout and shipping steps on its own. Users can also split payments on items purchased through Nate into four instalments.

Ascent*
Location: Chicago, Illinois
Company size: 11 - 50 employees
Ascent is an AI-powered regulatory platform that identifies the regulations a company in the financial sector must comply with and keeps them up to date as the rules change. Ascent's platform uses AI to continuously monitor for rule changes and quickly alert the right people to any compliance issues.

Note: Ascent's goals are limited to the financial sector, but its approach could potentially be extended to the accessibility sector.

3Play Media
Location: Boston, Massachusetts
Company size: 51 - 200 employees
3Play Media provides services to make online videos more accessible using a combination of human expertise and automated machine learning technology. For example, the company's live automatic captioning service relies on automatic speech recognition technology to generate text in real time.

Kin + Carta
Location: London, United Kingdom
Company size: 1,001 - 5,000 employees
Digital transformation company Kin + Carta uses AI in a variety of contexts, from personalization for B2B to product data analysis. The company specializes in "intelligent experiences", digital experiences in which the user is supplied with all the data they need rather than having to seek it out. AI facilitates this data personalization and availability.

Durable
Location: Louisville, Colorado
Company size: 2 - 10 employees
Durable aims to make custom software more accessible by using AI systems that can reason and engage in dialogue in the same way as humans. The company says it is working to bring together the capabilities of deep learning and symbolic AI, with the ultimate goal of developing a style of artificial intelligence designed to improve long-term software reliability.

OpenAI
Location: San Francisco, California
Company size: 51 - 200 employees
OpenAI is a nonprofit research company with a mission to create artificial general intelligence (AGI) with human-like capabilities. OpenAI's ChatGPT is an AI chatbot that has been trained to engage in human-like virtual interactions. With a focus on long-term research and transparency, OpenAI aims to advance AGI safely and responsibly. The company's sponsors have included Amazon, Microsoft, Elon Musk and Reid Hoffman.

Useful Tools

Sourced from "Designing Generative AI to Work for People with Disabilities" (hbr.org). Consider the following basic requirements for inclusive interfaces:

Keyboard Navigation

Assess all functionality of the chatbot, including menus, options, and buttons, against WCAG fundamentals for keyboard compatibility. Ask: What design elements would make these accessible to most users? What add-ons can be personalized for others? One example of such an add-on is ChatGPT Let's Talk, a Chrome extension that adds keyboard shortcuts to the interface and lets users talk to the chatbot and hear its AI-generated responses on the site.

Alternative Text

Provide contextual descriptions so that visually impaired users who use an audio interface with a generative AI tool such as ChatGPT can more fully understand the content. For example, Microsoft's Bot Framework for developers provides guidelines and features that support the inclusion of alternative text.

Voice-Enabled Interface/Speech-to-Text

Integrate voice-enabled interfaces that let people with a broad range of disabilities (e.g., mobility or motor, visual, cognitive, or physical disabilities) interact with generative AI. For instance, Google's Dialogflow has built-in integration with the Google Cloud Speech-to-Text API, allowing developers to create chatbots that support voice-enabled input.

Text/Image-to-Speech

Include accessibility features such as image-to-speech or text-to-speech technology to support people with dyslexia or with vision or motor impairments. One example is Be My Eyes Virtual Volunteer. Powered by OpenAI's GPT-4 language model, it lets users send images through the app to an AI-powered Virtual Volunteer, which answers questions about the image and provides instant visual assistance for a wide variety of tasks.

Colour Contrast, Dyslexia-Friendly Fonts, and Clear Language

Use a high-contrast interface design so visually impaired users can distinguish between elements. People with dyslexia may benefit from improved text readability when you use a plug-in such as Dyslexie Font. It is also important to use clear and concise language in the instructions for using the chatbot and in the chatbot's responses (in each language provided) so that users with cognitive disabilities can easily understand the conversation.