Pronunciation

Roy Ran

ran@w3.org

W3C Team contact

International standards harmonization in China

Slides available online:

https://www.w3.org/People/Roy/Talks/pronunciation/

Pronunciation Examples

How is my last name "Ran" pronounced?
Same spelling, different phonemes: "The farm was used to produce produce."
An amusing story from my English teacher: what does "Two to two to two two" mean?

The Problem

Text-to-speech pronunciation is often inaccurate and inconsistent because of technology limitations.

The Need

Accurate, consistent pronunciation of content spoken by text-to-speech (TTS) synthesis

Screen readers used by people who are blind
Read Aloud tools that assist language learners and people with cognitive and learning disabilities
Voice-based assistants

Goals

Define a standard way for content authors to provide pronunciation information in HTML
Leverage Speech Synthesis Markup Language (SSML), if possible
Provide a solution that screen readers will use

Features

SSML features critical for implementation:

Language
Voice Family / Gender
Phonetic Pronunciation of String Values
String Substitution
Rate / Pitch / Volume
Emphasis
Say As
Pausing
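
As a rough illustration of how the features above map onto SSML markup, the sketch below builds one small SSML document with Python's standard `xml.etree.ElementTree`. The function name `build_ssml_demo`, the sample sentences, and the attribute values are assumptions made for this example; in particular, `interpret-as` values are processor-defined, so `"characters"` may not be honored by every synthesizer.

```python
import xml.etree.ElementTree as ET


def build_ssml_demo() -> str:
    """Return one SSML string that touches each feature listed above (illustrative only)."""
    # Language: xml:lang on the root <speak> element.
    speak = ET.Element("speak", {"version": "1.1", "xml:lang": "en-US"})
    # Voice Family / Gender: <voice>.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "The "
    # String Substitution: <sub> speaks the alias instead of the written text.
    sub = ET.SubElement(voice, "sub", {"alias": "World Wide Web Consortium"})
    sub.text = "W3C"
    sub.tail = " works on "
    # Say As: <say-as>; the value "characters" is an assumption, support varies.
    say_as = ET.SubElement(voice, "say-as", {"interpret-as": "characters"})
    say_as.text = "TTS"
    say_as.tail = "."
    # Pausing: <break>.
    pause = ET.SubElement(voice, "break", {"time": "500ms"})
    pause.tail = " The farm was used to "
    # Phonetic Pronunciation: <phoneme> carries the IPA string from the slides.
    verb = ET.SubElement(voice, "phoneme", {"alphabet": "ipa", "ph": "prəˈd(j)us"})
    verb.text = "produce"
    verb.tail = " "
    # Emphasis: <emphasis>.
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "fresh"
    emphasis.tail = " "
    # Rate / Pitch / Volume: attributes of <prosody>.
    prosody = ET.SubElement(
        voice, "prosody", {"rate": "slow", "pitch": "low", "volume": "soft"}
    )
    prosody.text = "produce"
    prosody.tail = "."
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml_demo())
```

Whether a given speech engine honors each element is a separate question; the sketch only shows where each feature lives in the markup.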

Technical Approaches Considered

Two front runners:

In-line SSML within web content
Attribute-based model of SSML

Examples are on the last page of the slides.

Working Draft Publications

Explainer: Improving Spoken Presentation on the Web: an overview of the work
Pronunciation User Scenarios: provides examples of:
- End-users, including screen reader users
- Content providers, including educators
- Software developers, including content management systems
Pronunciation Gap Analysis and Use Cases: more detailed context, analysis of potential technical approaches, and more.

Start Here

Pronunciation Overview page:

https://www.w3.org/WAI/pronunciation/

Join the Task Forces

Interested? Join the Task Forces

Email Roy Ran: ran@w3.org

Your Input

Questions?
Comments?

Technical Examples

In-line SSML in an HTML fragment is shown below:

The farm was used to produce produce

The farm was used to <speak><phoneme ph="prəˈd(j)us">produce</phoneme></speak> <speak><phoneme ph="ˈproʊd(j)us">produce</phoneme></speak>
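
To show how a consumer might read in-line SSML, here is a minimal sketch that pulls each word and its phonetic string out of such a fragment using Python's standard `html.parser`. The class name `PhonemeCollector` and the constant `FRAGMENT` are hypothetical, and the sketch assumes `<phoneme>` elements are well formed and not nested. It is not how any particular screen reader works.

```python
from html.parser import HTMLParser

# Sample fragment with two well-formed <phoneme> elements (assumed input).
FRAGMENT = (
    'The farm was used to '
    '<speak><phoneme ph="prəˈd(j)us">produce</phoneme></speak> '
    '<speak><phoneme ph="ˈproʊd(j)us">produce</phoneme></speak>'
)


class PhonemeCollector(HTMLParser):
    """Collect (text, ph) pairs from <phoneme> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []
        self._ph: str | None = None  # ph value of the open <phoneme>, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "phoneme":
            self._ph = dict(attrs).get("ph")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <phoneme>.
        if self._ph is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "phoneme" and self._ph is not None:
            self.pairs.append(("".join(self._text), self._ph))
            self._ph = None


def collect_phonemes(html: str) -> list[tuple[str, str]]:
    parser = PhonemeCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for word, ph in collect_phonemes(FRAGMENT):
        print(word, "->", ph)
```

The two "produce" entries come back with different phonetic strings, which is the distinction the markup is meant to carry.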

Attribute-based model of SSML:

Train stopped at that station: "Two to two to two two"

<span data-ssml='{"say-as" : {"interpret-as":"time: 2 minutes to 2"}}'>two to two</span> to <span data-ssml='{"say-as" : {"interpret-as":"time: 2 past 2"}}'>two two</span>.
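
Because the attribute-based model stores its instructions as JSON inside a `data-ssml` attribute, a consumer can decode them with an ordinary HTML parser plus a JSON parser. The sketch below does this with Python's standard library. The names `DataSsmlCollector`, `collect_data_ssml`, and `ATTRIBUTE_FRAGMENT` are hypothetical, and the sketch assumes annotated elements are not nested and that every `data-ssml` value is valid JSON.

```python
import json
from html.parser import HTMLParser

# Sample fragment taken from the attribute-based example (assumed input).
ATTRIBUTE_FRAGMENT = (
    '<span data-ssml=\'{"say-as" : {"interpret-as":"time: 2 minutes to 2"}}\'>'
    'two to two</span> to '
    '<span data-ssml=\'{"say-as" : {"interpret-as":"time: 2 past 2"}}\'>'
    'two two</span>.'
)


class DataSsmlCollector(HTMLParser):
    """Collect (text, decoded data-ssml JSON) pairs from annotated elements."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[tuple[str, dict]] = []
        self._open_tag: str | None = None  # tag name of the annotated element
        self._ssml: dict = {}
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-ssml")
        if raw is not None and self._open_tag is None:
            self._open_tag = tag
            self._ssml = json.loads(raw)  # raises ValueError on malformed JSON
            self._text = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.items.append(("".join(self._text), self._ssml))
            self._open_tag = None


def collect_data_ssml(html: str) -> list[tuple[str, dict]]:
    parser = DataSsmlCollector()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for text, ssml in collect_data_ssml(ATTRIBUTE_FRAGMENT):
        print(repr(text), ssml)
```

Text outside the annotated spans, such as the middle "to", is left alone, so it would be spoken with the synthesizer's default pronunciation.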