Pronunciation Overview

Introduction

Most people who are blind rely on screen readers, software that reads content aloud using text-to-speech (TTS). Some people with cognitive disabilities who have difficulty processing written text also use screen readers. Text-to-speech is essential for some people with disabilities and useful for all.

TTS is now widely used in popular applications such as voice assistants. Many computers and mobile devices today have built-in text-to-speech functionality that people without disabilities also use in some situations, such as when they have lost their glasses or their eyes are tired.

Accurate pronunciation is essential in many situations, such as education and assessment (testing students).

Currently, text-to-speech pronunciation is often inaccurate and inconsistent because of technology limitations. For example, the correct pronunciation of a word can depend on context, regional variation, or emphasis, which TTS software cannot always determine from the text alone.

W3C is developing normative specifications (standards) and guidance on best practices so that TTS synthesizers pronounce HTML content (for example, web pages) correctly. This will allow content creators to specify how words should be pronounced.

Explainer and User Scenarios

Explainer: Improving Spoken Presentation on the Web provides an overview of the work. It:

Briefly introduces the context for W3C work on pronunciation
Describes the advantages and disadvantages of two approaches
Poses questions for additional input

Pronunciation User Scenarios provides examples of:

End-users, including screen reader users
Content providers, including educators
Software developers, including developers of content management systems

Exploring Technical Solutions

The Pronunciation Task Force has been exploring technical options for content authors to provide pronunciation information. One challenge is developing a solution that screen readers will actually implement. One aspect of that work is analyzing how well existing technical specifications, including HTML5, cover the features required for accurate pronunciation.

Pronunciation Gap Analysis and Use Cases provides details on the analysis. It:

Provides more detailed context
Describes required features for pronunciation and spoken presentation
Describes specific implementation approaches for introducing presentation authoring markup into HTML5 (called “use cases”)
Provides a gap analysis
Describes how the required features may be met by existing approaches

Specification for Spoken Presentation in HTML provides details on two markup approaches. Both satisfy the requirements and provide consistent results. We seek feedback from authors and implementors on which approach would be most implementable across all spoken presentation applications.

Who Develops the Pronunciation Documents

Pronunciation documents are developed by the Pronunciation Task Force of the Accessible Platform Architectures Working Group (APA WG), which is part of the World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI). For more information about the Task Force, see the Pronunciation Task Force page.

Contributing to the Work

Opportunities for contributing to Pronunciation and other WAI work are introduced in Participating in WAI.

To get notifications of drafts for review, see Get WAI News for links to WAI tweets, the RSS feed, and WAI Interest Group (WAI IG) emails. An email address for sending comments on the pronunciation documents is included in the “Status of this Document” section of each document.

If you are interested in becoming a participant in the Pronunciation Task Force or have questions regarding its work, email Task Force Facilitators Irfan Ali and Paul Grenier, and W3C Staff Contact Roy Ran.