Pronunciation Lexicon Specification (PLS) Version 1.0

W3C Working Draft 31 January 2006

This version: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Latest version: http://www.w3.org/TR/pronunciation-lexicon/
Previous version: http://www.w3.org/TR/2005/WD-pronunciation-lexicon-20050214/
Editor: Paolo Baggia, Loquendo

Abstract

This document defines the syntax for specifying pronunciation lexicons to be used by Automatic Speech Recognition and Speech Synthesis engines in voice browser applications.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Last Call Working Draft of the Pronunciation Lexicon specification (PLS) Version 1.0, and has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

The main changes in this version of the PLS are as follows: support for the IPA phonetic alphabet is now mandatory. There is also a new section on multiple pronunciations, clarifying the use of the "prefer" attribute. Much of the previous text has been corrected or clarified, and a glossary of terms has been added. Finally, schemas for PLS in XML Schema and RELAX NG are now provided.

This is a W3C Last Call Working Draft for review by W3C Members and other interested parties. Last Call means that the Working Group believes that this specification is technically sound and therefore wishes this to be the Last Call for comments. If the feedback is positive, the Working Group plans to submit it for consideration as a W3C Candidate Recommendation. Comments can be sent until 15 March 2006.

This document is for public review. Comments and discussion are welcomed on the public mailing list <www-voice@w3.org>. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). The archive for the list is accessible on-line.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Per section 4 of the W3C Patent Policy, Working Group participants have 150 days from the title page date of this document to exclude essential claims from the W3C RF licensing requirements with respect to this document series. Exclusions are with respect to the exclusion reference document, defined by the W3C Patent Policy to be the latest version of a document in this series that is published no later than 90 days after the title page date of this document.

1. Introduction to Pronunciation Lexicon Specification
- 1.1 How TTS Uses the PLS
- 1.2 How ASR Uses the PLS
- 1.3 How VoiceXML Applications Use the PLS
- 1.4 What PLS does not Support
- 1.5 Glossary of Terms
2. Pronunciation Alphabets
3. Conformance
4. Pronunciation Lexicon Markup Language Definition
- 4.1 <lexicon> Element
- 4.2 <meta> Element
- 4.3 <metadata> Element
- 4.4 <lexeme> Element
- 4.5 <grapheme> Element
- 4.6 <phoneme> Element
- 4.7 <alias> Element
- 4.8 <example> Element
- 4.9 Multiple Pronunciations for ASR and TTS
  - 4.9.1 Multiple Pronunciations for ASR
  - 4.9.2 Multiple Pronunciations for TTS
  - 4.9.3 Examples of Multiple Pronunciations
5. Examples
- 5.1 Simple Use Case
- 5.2 Multiple Pronunciations
- 5.3 Multiple Orthographies
- 5.4 Homophones
- 5.5 Homographs
- 5.6 Pronunciation by Orthography (Acronyms, Abbreviations, etc.)
6. References
- 6.1 Normative References
- 6.2 Informative References
7. Acknowledgements
Appendix A. Schema for Pronunciation Lexicon Specification (normative)
Appendix B. MIME Type and File Suffix (normative)

1. Introduction to Pronunciation Lexicon Specification

This section is informative.

The accurate specification of pronunciation is critical to the success of speech applications. Most Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines internally provide extensive high quality lexicons with pronunciation information for many words or phrases. To ensure maximum coverage of the words or phrases used by an application, application-specific pronunciations may be required. For example, these may be needed for proper nouns such as surnames or business names.

The Pronunciation Lexicon Specification (PLS) is designed to enable interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications. The language is intended to be easy to use by developers while supporting the accurate specification of pronunciation information for international use.

The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, using vendor-specific alphabets. Pronunciations are grouped together into a PLS document which may be referenced from other markup languages, such as the Speech Recognition Grammar Specification [SRGS] and the Speech Synthesis Markup Language [SSML].

1.1. How TTS Uses the PLS

A TTS engine aims to transform input content (either text or markup, such as SSML) into speech. This activity involves several processing steps:

Text normalization
Word pronunciation (lexical stress, phonetic transcription)
Sentence structure (intonation, rhythm)
Sentence level modification in phonetic transcription (co-articulation)
Computation of prosodic parameters
Generation of the acoustic signal

SSML enables a user to control and enhance TTS activity by acting through SSML elements on these levels of processing (see [SSML] for details).

The PLS is the standard format of the documents referenced by the <lexicon> element of SSML (see [SSML], section 3.1.4).

The following is a simple example of an SSML document. It includes an Italian movie title and the name of the director to be read in US English.

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" 
    xml:lang="en-US">
    
    The title of the movie is: "La vita e' bella" (Life is beautiful),
    which is directed by Roberto Benigni. 
</speak>

For the Italian title and the director's name to be pronounced correctly, their pronunciations might be given inline in the SSML document.

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" 
    xml:lang="en-US">
    
    The title of the movie is: 
    <phoneme alphabet="ipa" ph="&#x2C8;l&#x251; &#x2C8;vi&#x2D0;&#x27E;&#x259;
     &#x2C8;&#x294;e&#x26A; &#x2C8;b&#x25B;l&#x259;"> La vita e' bella </phoneme>
    <!-- The IPA pronunciation is "ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə" --> 
    (Life is beautiful),
    which is directed by 
    <phoneme alphabet="ipa" ph="&#x279;&#x259;&#x2C8;b&#x25B;&#x2D0;&#x279;&#x27E;o&#x28A;
     b&#x25B;&#x2C8;ni&#x2D0;nji">Roberto Benigni.</phoneme>
    <!-- The IPA pronunciation is "ɹəˈbɛːɹɾoʊ bɛˈniːnji" --> 
</speak>

With the use of the PLS, all the pronunciations can be factored out in an external PLS document which is referenced by the <lexicon> element of SSML (see [SSML], section 3.1.4).

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" 
    xml:lang="en-US">

    <lexicon uri="http://www.example.com/movie_lexicon.pls"/>

    The title of the movie is: "La vita e' bella" (Life is beautiful),
    which is directed by Roberto Benigni. 
</speak>

The PLS engine will load the external PLS document and transparently apply the pronunciations during the processing of the SSML document. An application may contain several distinct PLS documents to be used at different points of the application. Section 3.1.4 of [SSML] describes how to use more than one PLS document referenced in an SSML document.

1.2. How ASR Uses the PLS

An ASR engine transforms an audio signal into a recognized sequence of words or a semantic representation of the meaning of the utterance (see Semantic Interpretation for Speech Recognition [SISR] for a standard definition of Semantic Interpretation).

An ASR grammar is used to improve ASR performance by describing the possible words and phrases the ASR might recognize. SRGS is the standard definition of ASR grammars (see [SRGS] for details).

An ASR processor may use a PLS document to support multiple pronunciations of words and phrases, as well as text normalization such as the expansion of acronyms and abbreviations.

This is a very simple SRGS grammar that allows the recognition of sentences like "Boston Massachusetts" or "Miami Florida".

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en-US" version="1.0" root="city_state" mode="voice">

  <rule id="city" scope="public">
    <one-of> <item>Boston</item> 
             <item>Miami</item> 
             <item>Fargo</item> </one-of> 
  </rule>
  <rule id="state" scope="public">
    <one-of> <item>Florida</item>
             <item>North Dakota</item>
             <item>Massachusetts</item> </one-of>
  </rule> 
  
  <rule id="city_state" scope="public"> 
     <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>

If a pronunciation lexicon is referenced by an SRGS grammar, it can allow multiple pronunciations of the words in the grammar to accommodate different speaking styles. The same grammar is shown below with a reference to an external PLS document.

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en-US" version="1.0" root="city_state" mode="voice">
  
  <lexicon uri="http://www.example.com/city_lexicon.pls"/>

  <rule id="city" scope="public">
    <one-of> <item>Boston</item> 
             <item>Miami</item> 
             <item>Fargo</item> </one-of> 
  </rule>
  <rule id="state" scope="public">
    <one-of> <item>Florida</item>
             <item>North Dakota</item>
             <item>Massachusetts</item> </one-of>
  </rule> 
  
  <rule id="city_state" scope="public"> 
     <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>

An SRGS grammar might also reference multiple PLS documents.

1.3. How VoiceXML Applications Use the PLS

A VoiceXML 2.0 application ([VXML]) contains SRGS grammars for ASR and SSML prompts for TTS. The introduction of PLS in both SRGS and SSML will directly impact VoiceXML applications.

The benefits described in Section 1.1 and Section 1.2 are also available in VoiceXML applications. The application may use several contextual PLS documents at different points in the interaction, but may also use the same PLS document both in SRGS, to improve ASR, and in SSML, to improve TTS.

1.4. What PLS does not Support

The current specification is focused on the major features described in the requirements document [REQS]. The most complex features have been postponed to a future revision of this specification. Complex features not included are, for instance, morphological, syntactic and semantic information associated with pronunciations (such as tense, part of speech, word stems, etc.). Many of these features can be specified using RDF statements [RDF-XMLSYNTAX] that reference lexemes within one or more pronunciation lexicons.

1.5. Glossary of Terms

Acronym: A word formed from the initial letters of a name, such as PC for Personal Computer, or by combining initial letters or parts of a series of words, such as radar for radio detection and ranging, or variations, such as W3C for World Wide Web Consortium.
Acronym expansion: The action of replacing an acronym by the sequence of words it represents. Acronym expansion is typically performed to help TTS engines read acronyms and ASR to recognize them.
ASR, Automatic Speech Recognition, Speech Recognition: The process of using an automatic computation algorithm to analyze spoken utterances to determine what words and phrases or semantic information were present.
Grapheme: One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element.
Homophones: One of a set of words that are pronounced the same way but differ in meaning, origin, and sometimes spelling. E.g. night and knight in English.
Homographs: One of two or more words that have the same spelling but differ in origin, meaning, and sometimes pronunciation. For example, in English the noun refuse has a different pronunciation to the verb refuse, or fils (son) and fils (threads) in French.
International Phonetic Alphabet (IPA): The International Phonetic Alphabet [IPA] is a phonetic alphabet used by linguists to accurately and uniquely represent each of the wide variety of sounds (phones or phonemes) the human vocal apparatus can produce. It is intended as a notational standard for the phonetic representation of all languages.
Lexeme: An atomic unit in a language, like a word or a stem. In this specification, "lexeme" designates a collection of graphemic and phonetic representations of words or phrases.
Lexicon: A collection of words with additional information, such as definition or pronunciation.
Orthography: A notation for writing and displaying words. Orthography includes character sets, whitespace, case sensitivity, diacritics within languages such as Arabic or Persian, and accents within languages such as French. Example orthographies include Romaji, Kanji, and Hiragana.
Phoneme: One of the set of the smallest units of speech that can distinguish words: for example, the English language has over 40 phonemes (19 vowels and 24 consonants). In American English, /t/ and /p/ are phonemes that can distinguish the word tin from pin.
Phonetic alphabet: A set of symbols that represent the sounds in spoken languages, such as English, Chinese, or German, see also International Phonetic Alphabet.
Pronunciation lexicon: A pronunciation lexicon is a lexicon where the additional information relates to the pronunciation of words or phrases.
SAMPA: The Speech Assessment Methods Phonetic Alphabet [SAMPA]: a phonetic alphabet using only ASCII characters, rather than the extended character set used by the International Phonetic Alphabet.
Semantic Interpretation for Speech Recognition [SISR]: A W3C specification defining a process to produce a semantic result representing the meaning of a natural language utterance.
Speech Recognition Grammar Specification [SRGS]: A W3C specification defining a language to describe grammars (words or phrases) that an ASR engine can recognize.
Speech Synthesis Markup Language [SSML]: A W3C XML language for specifying the rendering of text by a TTS engine.
TTS, Text-To-Speech, Speech Synthesis: Converting text into sounds using speech synthesis techniques.

2. Pronunciation Alphabets

A phonemic/phonetic alphabet is used to specify a pronunciation. An alphabet in this context refers to a collection of symbols to represent the sounds of one or more human languages. In the PLS specification the pronunciation alphabet is specified by the alphabet attribute (see Section 4.1 and Section 4.6 for details on the use of this attribute). The only valid values for the alphabet attribute are "ipa" (see the next paragraph) and vendor-defined strings of the form "x-organization" or "x-organization-alphabet". For example, the Japan Electronics and Information Technology Industries Association [JEITA] might wish to encourage the use of an alphabet such as "x-jeita" or "x-jeita-2000" for their phoneme alphabet [JEIDAALPHABET]. Another example might be "x-sampa" [X-SAMPA] an extension of SAMPA phonetic alphabet [SAMPA] to cover the entire range of characters in the International Phonetic Alphabet [IPA].
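Informative sketch: the rule for alphabet values given above can be written as a small check. The Python code below is a minimal sketch and not part of this specification. The function name is_valid_alphabet is hypothetical. Because the text does not define which characters may appear in the organization and alphabet parts, the pattern assumes that each part is a non-empty run of characters containing no hyphen and no whitespace.

```python
import re

# Assumption: "organization" and "alphabet" are each a non-empty token
# without hyphens or whitespace. The specification text only gives the
# general shapes "x-organization" and "x-organization-alphabet".
_VENDOR_ALPHABET = re.compile(r"x-[^\s-]+(-[^\s-]+)?")


def is_valid_alphabet(value):
    """Return True for "ipa" or a vendor-defined "x-..." alphabet name."""
    return value == "ipa" or _VENDOR_ALPHABET.fullmatch(value) is not None


if __name__ == "__main__":
    for name in ("ipa", "x-jeita", "x-jeita-2000", "x-sampa", "sampa"):
        print(name, is_valid_alphabet(name))
```

Under these assumptions the names "x-jeita", "x-jeita-2000" and "x-sampa" mentioned above are accepted, while a bare "sampa" is rejected because it lacks the "x-" prefix.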

A compliant PLS processor must support "ipa" as the value of the alphabet attribute. This means that the PLS processor must support the Unicode representations of the phonetic characters developed by the International Phonetic Association [IPA]. In addition to an exhaustive set of vowel and consonant symbols, this character set supports a syllable delimiter, numerous diacritics, stress symbols, lexical tone symbols, intonational markers and more. For this alphabet, legal phonetic/phonemic values are strings of the values specified in Appendix 2 of [IPAHNDBK]. Informative tables of the IPA-to-Unicode mappings can be found at [IPAUNICODE1] and [IPAUNICODE2]. Note that not all of the IPA characters are available in Unicode. For processors supporting this alphabet,

The processor must syntactically accept all legal values.
The processor should produce output when given Unicode IPA codes that can reasonably be considered to belong to the current language.

Informative Note:

Currently there is no ready way for a blind or partially sighted person to read or interact with a lexicon containing IPA symbols. It is hoped that implementers will provide tools which will enable such an interaction.

3. Conformance

This section enumerates the conformance rules of this specification.

All sections in this specification are normative, unless otherwise indicated. The informative parts of this specification are identified by "Informative" labels within sections.

Individual conformance requirements or testable statements are identifiable in the PLS specification through imperative voice statements. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

4. Pronunciation Lexicon Markup Language Definition

The Pronunciation Lexicon markup language consists of the following elements and attributes:

Elements	Attributes	Description
`<lexicon>`	`version` `xml:base` `xmlns` `xml:lang` `alphabet`	root element for PLS
`<meta>`	`name` `http-equiv` `content`	meta data container element
`<metadata>`		meta data container element
`<lexeme>`	`xml:id`	the container element for a single lexical entry
`<grapheme>`	`orthography`	contains orthographic information for a lexeme
`<phoneme>`	`prefer` `alphabet`	contains pronunciation information for a lexeme
`<alias>`	`prefer`	contains acronym expansions and words' substitutions
`<example>`		contains an example of the usage for a lexeme

4.1 `<lexicon>` Element

The root element of the Pronunciation Lexicon markup language is the <lexicon> element. This element is the container for all other elements of the PLS language. A <lexicon> element may contain zero or more occurrences of <lexeme>, <meta> and <metadata> elements. There is no specified order for the children of the <lexicon> element.

The <lexicon> element must specify an alphabet attribute which indicates the pronunciation alphabet to be used within the PLS document. The values of the alphabet attribute are described in Section 2 and it may be overridden for a given lexeme using the <phoneme> element.

The required version attribute indicates the version of the specification to be used for the document and must have the value "1.0".

The required xml:lang attribute allows identification of the language for which the pronunciation lexicon is relevant. RFC 3066 [RFC3066] is the normative reference on the values of the xml:lang attribute.

Note that xml:lang specifies a single unique language for the entire PLS document. This does not limit the ability to create multilingual SRGS [SRGS] and SSML [SSML] documents. These documents may reference multiple pronunciation lexicons, possibly written for different languages.

The xml:base attribute defines a base URI for the PLS document, as described in XML Base [XML-BASE]. As in the HTML 4.01 specification [HTML], this is the URI that all relative references within the document take as their base.

The namespace URI for PLS is "http://www.w3.org/2005/01/pronunciation-lexicon". All PLS markup must be associated with the PLS namespace, using a Namespace Declaration as described in [XMLNS]. This can for instance be achieved by declaring an xmlns attribute on the <lexicon> element, as the examples in this specification show.

Example:

A simple PLS document for the word "tomato" and its pronunciation.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>t&#x259;mei&#x325;&#x27E;ou&#x325;</phoneme>
    <!-- This is an example of IPA phonetic string -->
    <!-- Because many platform/browser/text editor combinations do not
       correctly cut and paste Unicode text, this example uses the entity
       escape versions of the IPA characters.  Normally, one would directly
       use the UTF-8 representation of these symbols: 
       "təmei̥ɾou̥" -->
  </lexeme>
</lexicon>
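
Informative sketch: to illustrate how a processor might read such a document, the following Python code parses the "tomato" lexicon with the standard xml.etree.ElementTree module. It is a minimal sketch under stated assumptions, not a conforming PLS processor: the function name read_lexicon and the shape of the returned dictionary are hypothetical, and only the attribute requirements stated in this section (version "1.0", alphabet, xml:lang) are checked.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TOMATO_PLS = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>t&#x259;mei&#x325;&#x27E;ou&#x325;</phoneme>
  </lexeme>
</lexicon>"""


def read_lexicon(pls_text):
    """Parse a PLS document; return root attributes and lexeme contents."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    if root.tag != PLS_NS + "lexicon":
        raise ValueError("root element is not a PLS <lexicon>")
    if root.get("version") != "1.0":
        raise ValueError('the version attribute must be "1.0"')
    alphabet = root.get("alphabet")
    lang = root.get(XML_LANG)
    if alphabet is None or lang is None:
        raise ValueError("alphabet and xml:lang are required on <lexicon>")
    lexemes = []
    for lexeme in root.findall(PLS_NS + "lexeme"):
        graphemes = [g.text for g in lexeme.findall(PLS_NS + "grapheme")]
        # A <phoneme> may override the document-wide alphabet.
        phonemes = [(p.get("alphabet", alphabet), p.text)
                    for p in lexeme.findall(PLS_NS + "phoneme")]
        lexemes.append({"graphemes": graphemes, "phonemes": phonemes})
    return {"version": "1.0", "alphabet": alphabet,
            "lang": lang, "lexemes": lexemes}


if __name__ == "__main__":
    print(read_lexicon(TOMATO_PLS))
```

The XML parser resolves the character references, so the phoneme string is returned as Unicode IPA text paired with the alphabet that applies to it.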

4.2 `<meta>` Element

The <metadata> and <meta> elements are containers in which information about the document can be placed. The <metadata> element provides more general and powerful treatment of metadata information than <meta> by using a metadata schema.

A <meta> element associates a string to a declared meta property or declares http-equiv content. Either a name or http-equiv attribute is required. It is an error to provide both name and http-equiv attributes. A content attribute is also required. The only <meta> property defined by this specification is "seeAlso". It is used to specify a resource that might provide additional metadata information about the content. This property is modeled on the "seeAlso" property of "RDF Vocabulary Description Language 1.0: RDF Schema" [RDF-SCHEMA], section 5.4.1. The http-equiv attribute has a special significance when documents are retrieved via HTTP. Although the preferred method of providing HTTP header information is that of using HTTP header fields, the http-equiv content may be used in situations where the PLS document author is unable to configure HTTP header fields associated with their document on the origin server, for example, cache control information. Note that HTTP servers and caches are not required to inspect the contents of <meta> in PLS documents and thereby override the header values they would send otherwise.

The <meta> element is an empty element.
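
Informative sketch: the attribute rules for <meta> stated above (exactly one of name or http-equiv, a required content attribute, and an empty element) can be checked mechanically. The Python code below is a minimal sketch; the function name meta_errors and the wording of its messages are hypothetical and not defined by this specification.

```python
import xml.etree.ElementTree as ET


def meta_errors(meta):
    """Return a list of rule violations for one <meta> element."""
    errors = []
    has_name = "name" in meta.attrib
    has_http_equiv = "http-equiv" in meta.attrib
    if has_name and has_http_equiv:
        errors.append("name and http-equiv must not both be present")
    elif not has_name and not has_http_equiv:
        errors.append("either name or http-equiv is required")
    if "content" not in meta.attrib:
        errors.append("content is required")
    # <meta> is an empty element: no child elements and no text.
    if len(meta) > 0 or (meta.text or "").strip():
        errors.append("<meta> must be empty")
    return errors


if __name__ == "__main__":
    good = ET.fromstring(
        '<meta name="seeAlso" content="http://example.com/my-pls-metadata.xml"/>')
    print(meta_errors(good))
```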

Informative Note:

This section is modelled after the <meta> description in the HTML 4.01 Specification [HTML]. Despite the fact that the name/content model is now being replaced by better ways to include metadata, see for instance section 20.6 of XHTML 2.0 [XHTML2], and the fact that the http-equiv directive is no longer recommended in section 3.3 of XHTML Media Types [XHTML-MTYPES], the Working Group has decided to retain this for compatibility with the other specifications of the first version of the Voice Interface Framework (VoiceXML, SSML, SRGS, CCXML). Future versions of the framework will align with more modern metadata schemes.

Example:

This is an example of how <meta> elements can be included in a PLS document to specify a resource that provides additional metadata information.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
    <meta http-equiv="Cache-Control" content="no-cache"/>
    <meta name="seeAlso" content="http://example.com/my-pls-metadata.xml"/>
    <!--  If lexemes are to be added to this lexicon, they start below -->
</lexicon>

4.3 `<metadata>` Element

The <metadata> element is a container in which information about the document can be placed using metadata markup. The behavior of software processing the content of a <metadata> element is not described in this specification. Therefore, software implementing this specification is free to ignore that content.

Although any metadata markup can be used within <metadata>, it is recommended that the RDF/XML Syntax [RDF-XMLSYNTAX] be used, in conjunction with the general metadata properties defined by the Dublin Core Metadata Initiative [DC] (e.g., Title, Creator, Subject, Description, Rights, etc.)

Example:

This is an example of how metadata can be included in a PLS document using the "Dublin Core Metadata Element Set, Version 1.1" [DC-ES] describing general document information such as title, description, date, and so on:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <metadata>
    <rdf:RDF
       xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc  = "http://purl.org/dc/elements/1.1/">

     <!-- Metadata about the PLS document -->
     <rdf:Description rdf:about=""
       dc:title="Pronunciation lexicon for W3C terms"
       dc:description="Common pronunciations for many W3C acronyms and abbreviations, e.g. I18N or WAI"
       dc:publisher="W3C"
       dc:language="en-US"
       dc:date="2005-11-29"
       dc:rights="Copyright 2002 W3C"
       dc:format="application/pls+xml">
       <dc:creator>The W3C Voice Browser Working Group</dc:creator>
     </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!--  If lexemes are to be added to this lexicon, they start below -->
</lexicon>

4.4 `<lexeme>` Element

The <lexeme> element is a container for a lexical entry which may include multiple orthographies and multiple pronunciations.

The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements. The children of the <lexeme> element can appear in any order, but note that the order will have an impact on the treatment of multiple pronunciations, see Section 4.9.

The <lexeme> element has an optional xml:id [XML-ID] attribute, allowing the element to be referenced from other documents (through fragment identifiers or XPointer [XPOINTER], for instance). For example, developers may use external RDF statements [RDF-CONC] to associate metadata (such as part of speech or word relationships) with a lexeme.
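
Informative sketch: the content model described above (one or more <grapheme>, one or more of <phoneme> or <alias>, zero or more <example>, in any order) can be checked by counting children. The Python code below is a minimal sketch; the function names local_name and lexeme_errors are hypothetical, and the normative definition remains the schema in Appendix A.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip a "{namespace}" prefix from an ElementTree tag, if present."""
    return element.tag.rsplit("}", 1)[-1]


def lexeme_errors(lexeme):
    """Return a list of content-model violations for one <lexeme>."""
    counts = {"grapheme": 0, "phoneme": 0, "alias": 0, "example": 0}
    errors = []
    for child in lexeme:  # children may appear in any order
        name = local_name(child)
        if name in counts:
            counts[name] += 1
        else:
            errors.append("unexpected child <%s>" % name)
    if counts["grapheme"] == 0:
        errors.append("at least one <grapheme> is required")
    if counts["phoneme"] + counts["alias"] == 0:
        errors.append("at least one <phoneme> or <alias> is required")
    return errors


if __name__ == "__main__":
    entry = ET.fromstring(
        "<lexeme><grapheme>EU</grapheme><alias>Unione Europea</alias></lexeme>")
    print(lexeme_errors(entry))
```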

Example:

A pronunciation lexicon for the Italian language with two lexemes:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="it-IT">
  <lexeme>
    <grapheme>file</grapheme>
    <phoneme>fa&#x026A;l</phoneme>
    <!-- This is an example of IPA string for
      the pronunciation of the English word: "file"
      that may be present in an Italian text.
      This is the pronunciation: "faɪl" -->
  </lexeme>
  <lexeme>
    <grapheme>EU</grapheme>
    <alias>Unione Europea</alias>
    <!-- This is a substitution of the European
      Union acronym in Italian language.  -->
  </lexeme>
</lexicon>

4.5 `<grapheme>` Element

A <lexeme> contains at least one <grapheme> element. The <grapheme> element contains text describing the orthography of the <lexeme>. The <grapheme> element must not be empty, and must not contain subelements.

In more complex situations there may be alternative textual representations for the same word or phrase; this can arise due to a number of reasons, for example:

Regional spelling variations, e.g. "colour" and "color";
Free spelling variations, e.g. "judgment" and "judgement";
Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji) and phonemic spelling systems such as Katakana or Hiragana to represent the orthography of a word or phrase;
Traditional vs. modern spellings, e.g. in German it is common to replace "ö" with "oe".

In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may contain more than one <grapheme> element to define the base orthography and any variants which should share the pronunciations.

A <grapheme> may optionally contain an orthography attribute which identifies the script code used for writing the orthography. The script code must be compliant with ISO 15924 [ISO15924].

Examples:

An example of a single grapheme and a single pronunciation.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>s&#x0259;'p&#x028C;lv&#x026A;d&#x0259;</phoneme>
    <!-- IPA string is: "sə'pʌlvɪdə" -->
  </lexeme>
</lexicon>

Another example with more than one written form for a lexical entry, where the first orthography uses Latin characters ("Latn" code in ISO 15924 [ISO15924]) for "Romaji" orthography, the second one uses "Kanji" orthography ("Hani" code in ISO 15924 [ISO15924]) and the third one uses the "Hiragana" orthography ("Hira" code in ISO 15924 [ISO15924]):

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="ja">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Hira">にほんご</grapheme>
    <phoneme>n&#x026A;h&#x0252;&#x014B;&#x0252;</phoneme>
    <!-- IPA string is: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>
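
Informative sketch: because several <grapheme> elements in one <lexeme> share the same pronunciations, a processor might index every written form against the same list. The Python code below is a minimal sketch of that idea. The sample lexicon (combining the "colour"/"color" spellings mentioned above into one entry), the function name build_index and the dictionary layout are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Hypothetical sample: two regional spellings sharing one pronunciation.
SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>'k&#x028C;l&#x0259;</phoneme>
  </lexeme>
</lexicon>"""


def build_index(pls_text):
    """Map each grapheme string to the phoneme strings of its lexeme."""
    root = ET.fromstring(pls_text)
    index = {}
    for lexeme in root.findall("pls:lexeme", NS):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", NS)]
        for grapheme in lexeme.findall("pls:grapheme", NS):
            # Every written form points at the same pronunciations.
            index.setdefault(grapheme.text, []).extend(phonemes)
    return index


if __name__ == "__main__":
    print(build_index(SAMPLE))
```

With this layout the pronunciation is written once in the lexicon but can be found from either spelling.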

4.6 `<phoneme>` Element

A <lexeme> may contain one or more <phoneme> elements. The <phoneme> element contains text describing how the <lexeme> is pronounced. The <phoneme> element must not be empty, and must not contain subelements. A <phoneme> element may optionally have an alphabet attribute which indicates the pronunciation alphabet that is used for this <phoneme> element only. The legal values for the alphabet attribute are described in Section 2.

The optional prefer attribute indicates the preferred pronunciation to be used by a speech synthesis engine. The possible values are "true" and "false"; the default value is "false".

The prefer mechanism spans both the <phoneme> and <alias> elements; see the examples in Section 4.7. Section 4.9 describes how multiple pronunciations are specified in PLS for ASR and TTS, and gives many examples in Section 4.9.3.

Examples:

More than one pronunciation per lexical entry:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme prefer="true">hju:&#x02A4;</phoneme>
    <!-- IPA string is: "hju:ʤ" -->
    <phoneme>ju:&#x02A4;</phoneme>
    <!-- IPA string is: "ju:ʤ" -->
  </lexeme>
</lexicon>
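
Informative sketch: the following Python code illustrates one way a speech synthesis engine might choose among the pronunciations of the "huge" entry above. It is a minimal sketch, and the function name pick_for_tts is hypothetical. Returning the first entry marked prefer="true" follows the description of the attribute in this section; falling back to the first pronunciation in document order when none is marked is an assumption made here, and Section 4.9 is the authoritative description of the behavior.

```python
import xml.etree.ElementTree as ET

HUGE = """<lexeme>
  <grapheme>huge</grapheme>
  <phoneme prefer="true">hju:&#x02A4;</phoneme>
  <phoneme>ju:&#x02A4;</phoneme>
</lexeme>"""


def pick_for_tts(lexeme):
    """Return the <phoneme> or <alias> element a TTS engine might use."""
    # The prefer mechanism spans both <phoneme> and <alias> elements.
    candidates = [child for child in lexeme
                  if child.tag.rsplit("}", 1)[-1] in ("phoneme", "alias")]
    for candidate in candidates:
        # prefer defaults to "false" when the attribute is absent.
        if candidate.get("prefer", "false") == "true":
            return candidate
    # Assumption: with no preferred entry, use document order.
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(pick_for_tts(ET.fromstring(HUGE)).text)
```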

More than one written form and more than one pronunciation:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>theater</grapheme>
    <grapheme>theatre</grapheme>
    <phoneme prefer="true">'&#x03B8;&#x026A;&#x0259;t&#x0259;r</phoneme>
    <!-- IPA string is: "'θɪətər" -->
    <phoneme>'&#x03B8;i:j&#x0259;t&#x0259;r</phoneme>
    <!-- IPA string is: "'θi:jətər" -->
  </lexeme>
</lexicon>

An example of a <phoneme> that changes the pronunciation alphabet to a proprietary one.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>color</grapheme>
    <phoneme>'k&#x028C;l&#x0259;</phoneme>
    <!-- The above pronunciation is in IPA: "'kʌlə" -->
  </lexeme>
  <lexeme>
    <grapheme>XYZ</grapheme>
    <phoneme alphabet="x-example-alphabet">XYZ</phoneme>
    <!-- The above pronunciation is given in a proprietary alphabet 
      called: "x-example-alphabet" -->
  </lexeme>
</lexicon>
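
As an informative sketch of how the alphabet attribute is resolved, the following Python fragment reports the alphabet that applies to each <phoneme>: the value on the <phoneme> element itself if present, otherwise the value on the <lexicon> root. The function name phoneme_alphabets and the SAMPLE string are illustrative assumptions, not part of PLS; the parsing relies only on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative document, adapted from the example above.
SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>color</grapheme>
    <phoneme>'k&#x028C;l&#x0259;</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>XYZ</grapheme>
    <phoneme alphabet="x-example-alphabet">XYZ</phoneme>
  </lexeme>
</lexicon>"""


def phoneme_alphabets(pls_text):
    """Return (grapheme, phoneme text, effective alphabet) triples.

    Hypothetical helper: the alphabet on <phoneme> overrides the one
    declared on <lexicon> for that <phoneme> element only.
    """
    root = ET.fromstring(pls_text)
    default_alphabet = root.get("alphabet")
    rows = []
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = [g.text for g in lexeme.findall(f"{{{PLS_NS}}}grapheme")]
        for phoneme in lexeme.findall(f"{{{PLS_NS}}}phoneme"):
            alphabet = phoneme.get("alphabet", default_alphabet)
            for grapheme in graphemes:
                rows.append((grapheme, phoneme.text, alphabet))
    return rows


for row in phoneme_alphabets(SAMPLE):
    print(row)
```

In this sketch the entry for "color" is reported with the lexicon-wide alphabet "ipa", while the entry for "XYZ" is reported with "x-example-alphabet".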

4.7 `<alias>` Element

A <lexeme> element may contain one or more <alias> elements, which indicate the pronunciation of an acronym or an abbreviated term in terms of other orthographies, or apply other transformations as necessary; see the examples below and in Section 4.9.3. The <alias> element must not be empty, and must not contain subelements.

In a <lexeme> element, both <alias> elements and <phoneme> elements may be present. If authors want explicit control over the pronunciation, they may use the <phoneme> element instead of the <alias> element.

The <alias> element has an optional prefer attribute analogous to the prefer attribute for the <phoneme> element; see Section 4.6 for a normative description of the prefer attribute.

Recursion among <alias> elements is allowed, but cyclic dependencies are not allowed, even across lexicon documents.

Examples:

Acronym expansion using the <alias> element:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>

The following example illustrates a legal use of recursion. The entry "The B-52s" is defined in part in terms of "fifty-two", which is defined in turn in terms of "fifty" and "two".

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>The B-52s</grapheme>
    <alias>the b fifty-two zz_plural</alias>
  </lexeme>
  <lexeme>
    <grapheme>b</grapheme>
    <phoneme>bi:</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fifty-two</grapheme>
    <alias>fifty two</alias>
  </lexeme>
  <lexeme>
    <grapheme>zz_plural</grapheme>
    <phoneme>z</phoneme>
  </lexeme>
</lexicon>

The following example is not permitted, since "7" is defined by aliasing "seven", which is in turn defined by aliasing "7". This is a cyclic dependency, and is not allowed.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>7</grapheme>
    <alias>seven</alias>
  </lexeme>
  <lexeme>
    <grapheme>seven</grapheme>
    <alias>7</alias>
  </lexeme>
</lexicon>
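
A lexicon tool might check for such cycles before loading a document. The following Python fragment is a minimal sketch under stated assumptions: alias text is split on whitespace and each token is looked up as a grapheme (PLS does not define this tokenization), only a single document is inspected, and the names alias_graph and find_cycle are hypothetical. The cycle search is an ordinary depth-first traversal.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

CYCLIC = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme><grapheme>7</grapheme><alias>seven</alias></lexeme>
  <lexeme><grapheme>seven</grapheme><alias>7</alias></lexeme>
</lexicon>"""

LEGAL = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme><grapheme>The B-52s</grapheme>
    <alias>the b fifty-two zz_plural</alias></lexeme>
  <lexeme><grapheme>b</grapheme><phoneme>bi:</phoneme></lexeme>
  <lexeme><grapheme>fifty-two</grapheme><alias>fifty two</alias></lexeme>
  <lexeme><grapheme>zz_plural</grapheme><phoneme>z</phoneme></lexeme>
</lexicon>"""


def alias_graph(pls_text):
    """Map each grapheme to the tokens its <alias> elements refer to.

    Assumption: alias text is tokenized on whitespace.
    """
    root = ET.fromstring(pls_text)
    graph = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        targets = set()
        for alias in lexeme.findall(f"{{{PLS_NS}}}alias"):
            targets.update((alias.text or "").split())
        for grapheme in lexeme.findall(f"{{{PLS_NS}}}grapheme"):
            graph.setdefault(grapheme.text, set()).update(targets)
    return graph


def find_cycle(graph):
    """Return one cyclic chain of graphemes as a list, or None if acyclic."""
    in_progress, done = set(), set()

    def visit(node, path):
        in_progress.add(node)
        path.append(node)
        for target in sorted(graph.get(node, ())):
            if target in in_progress:
                # Back edge: the chain from target to node loops back.
                return path[path.index(target):] + [target]
            if target in graph and target not in done:
                found = visit(target, path)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for start in sorted(graph):
        if start not in done:
            found = visit(start, [])
            if found:
                return found
    return None


print(find_cycle(alias_graph(CYCLIC)))
print(find_cycle(alias_graph(LEGAL)))
```

For the cyclic document the sketch reports the chain from "7" through "seven" back to "7"; for the "The B-52s" document it reports no cycle. Tokens with no entry in the lexicon (such as "fifty" and "two") are ignored here, on the assumption that the processor resolves them by other means.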

4.8 `<example>` Element

The <example> element includes an example sentence that illustrates an occurrence of this lexeme. Because the examples are explicitly marked, automated tools can be used for regression testing and for generation of pronunciation lexicon documentation. The <example> element must not be empty, and must not contain subelements.

Zero, one or many <example> elements may be provided for a single <lexeme> element.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
    <example>My feet were as heavy as lead.</example>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>li:d</phoneme>
    <example>The guide once again took the lead.</example>
  </lexeme>
</lexicon>
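
As an informative illustration of the kind of automated tool mentioned above, the following Python fragment gathers every <example> sentence together with the graphemes and phonemes of its <lexeme>, which could then feed documentation or regression tests. The function name example_rows and the SAMPLE string are assumptions made for this sketch; only Python's standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
    <example>My feet were as heavy as lead.</example>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>li:d</phoneme>
    <example>The guide once again took the lead.</example>
  </lexeme>
</lexicon>"""


def example_rows(pls_text):
    """Return (graphemes, phonemes, sentence) for each <example> element."""
    root = ET.fromstring(pls_text)
    rows = []
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = [g.text for g in lexeme.findall(f"{{{PLS_NS}}}grapheme")]
        phonemes = [p.text for p in lexeme.findall(f"{{{PLS_NS}}}phoneme")]
        for example in lexeme.findall(f"{{{PLS_NS}}}example"):
            rows.append((graphemes, phonemes, example.text))
    return rows


for graphemes, phonemes, sentence in example_rows(SAMPLE):
    print(graphemes, phonemes, sentence)
```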

4.9 Multiple Pronunciations for ASR and TTS

This section describes the treatment of multiple pronunciations specified in a PLS document for ASR and TTS.

4.9.1 Multiple Pronunciations for ASR

If more than one pronunciation for a given <lexeme> is specified (either by <phoneme> elements or <alias> elements or a combination of both), an ASR processor must consider each of them as valid pronunciations for the word. See Example 2 and following examples in Section 4.9.3.

If more than one <lexeme> contains the same <grapheme>, all their pronunciations will be collected in document order and an ASR processor must consider all of them as valid pronunciations for the <grapheme>. See Example 7 and Example 8 in Section 4.9.3.
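
The collection rule for ASR can be sketched informally in Python. The fragment below gathers, for each grapheme, every <phoneme> and <alias> entry across all <lexeme> elements in document order. The name asr_pronunciations and the SAMPLE string are hypothetical; alias entries are kept as alias text rather than expanded, and duplicates are not removed.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
PRONUNCIATION_TAGS = (f"{{{PLS_NS}}}phoneme", f"{{{PLS_NS}}}alias")

SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>li:d</phoneme>
  </lexeme>
</lexicon>"""


def asr_pronunciations(pls_text):
    """Map each grapheme to all of its pronunciations in document order."""
    root = ET.fromstring(pls_text)
    collected = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        # Iterating over an element visits its children in document order.
        entries = [
            (child.tag.split("}", 1)[1], child.text)
            for child in lexeme
            if child.tag in PRONUNCIATION_TAGS
        ]
        for grapheme in lexeme.findall(f"{{{PLS_NS}}}grapheme"):
            collected.setdefault(grapheme.text, []).extend(entries)
    return collected


print(asr_pronunciations(SAMPLE))
```

Every entry in the resulting list is treated as a valid pronunciation; no use is made of the prefer attribute in this sketch.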

4.9.2 Multiple Pronunciations for TTS

If more than one pronunciation for a given <lexeme> is specified (either by <phoneme> elements or <alias> elements or a combination of both), a TTS processor must use the first one in document order that has the prefer attribute set to "true". If none of the pronunciations has prefer set to "true", the TTS processor must use the first one in document order. See Example 2 and following examples in Section 4.9.3.

If more than one <lexeme> contains the same <grapheme>, all their pronunciations will be collected in document order and a TTS processor must use the first one in document order that has the prefer attribute set to "true". If none of the pronunciations has prefer set to "true", the TTS processor must use the first one in document order. See Example 7 and Example 8 in Section 4.9.3.
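
The TTS selection rule can be sketched informally in Python: collect the pronunciations of a grapheme in document order, return the first one with prefer set to "true", and otherwise fall back to the first one. The name tts_choice and the two sample strings are assumptions made for this sketch; an <alias> result is returned as alias text and is not expanded further.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
GRAPHEME = f"{{{PLS_NS}}}grapheme"
PRONUNCIATION_TAGS = (f"{{{PLS_NS}}}phoneme", f"{{{PLS_NS}}}alias")

NO_PREFER = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>read</grapheme>
    <phoneme>red</phoneme>
    <phoneme>ri:d</phoneme>
  </lexeme>
</lexicon>"""

TWO_LEXEMES = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias>led</alias>
    <phoneme prefer="true">li:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme prefer="true">led</phoneme>
    <phoneme>li:d</phoneme>
  </lexeme>
</lexicon>"""


def tts_choice(pls_text, word):
    """Return (element name, text) of the pronunciation chosen for word."""
    root = ET.fromstring(pls_text)
    candidates = []
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        if any(g.text == word for g in lexeme.findall(GRAPHEME)):
            # Children are visited in document order.
            candidates.extend(c for c in lexeme if c.tag in PRONUNCIATION_TAGS)
    if not candidates:
        return None
    preferred = [c for c in candidates if c.get("prefer") == "true"]
    chosen = preferred[0] if preferred else candidates[0]
    return (chosen.tag.split("}", 1)[1], chosen.text)


print(tts_choice(NO_PREFER, "read"))    # ('phoneme', 'red')
print(tts_choice(TWO_LEXEMES, "lead"))  # ('phoneme', 'li:d')
```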

4.9.3 Examples of Multiple Pronunciations

The following examples illustrate the most common cases of multiple pronunciations. Both ASR and TTS behavior are described.

Example 1:

In the following example, there is only one pronunciation, and it must be used by both ASR and TTS processors.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>bead</grapheme>
    <phoneme>bi:d</phoneme>
  </lexeme>
</lexicon>

Example 2:

In the following example, there are two pronunciations. An ASR processor must recognize both pronunciations, whereas a TTS processor must only use the first one (because it is first in document order).

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>read</grapheme>
    <phoneme>red</phoneme>
    <phoneme>ri:d</phoneme>
  </lexeme>
</lexicon>

Example 3:

In the following example, there are two pronunciations. An ASR processor must recognize both pronunciations, whereas a TTS processor must only use the second one (because it has prefer set to "true").

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
    <phoneme prefer="true">li:d</phoneme>
  </lexeme>
</lexicon>

Example 4:

In the following example, "read" has two pronunciations. The first one is specified by means of an alias to "red", which is defined just below it. An ASR processor must recognize both pronunciations, whereas a TTS processor must only use the first one (because it is first in document order). In this example, the alias refers to a lexeme later in the lexicon, but in general, this order is not relevant.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>read</grapheme>
    <alias>red</alias>
    <phoneme>ri:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>red</grapheme>
    <phoneme>red</phoneme>
  </lexeme>
</lexicon>

Example 5:

In the following example, there are two pronunciations for "lead". Both are given with prefer set to "true". An ASR processor must recognize both pronunciations, whereas a TTS processor must only use the first one (because it is first in document order).

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias prefer="true">led</alias>
    <phoneme prefer="true">li:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>led</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
</lexicon>

Example 6:

In the following example, there are two pronunciations. An ASR processor must recognize both pronunciations, whereas a TTS processor must only use the second one (because it has prefer set to "true"). Note that the alias entry for "lead" as "led" does not inherit the preference set on the pronunciation of "led".

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias>led</alias>
    <phoneme prefer="true">li:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>led</grapheme>
    <phoneme prefer="true">led</phoneme>
  </lexeme>
</lexicon>

Example 7:

In the following example, "lead" has two different entries in the lexicon. An ASR processor must recognize both pronunciations given here, but a TTS processor must only use the "led" pronunciation, because it is the first one in document order.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>li:d</phoneme>
  </lexeme>
</lexicon>

Example 8:

In the following example, there are two pronunciations in each of two different lexeme entries in the same lexicon document. An ASR processor must recognize both pronunciations given here, but a TTS processor must only use the "li:d" pronunciation, because it is the first one in document order that has prefer set to "true".

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias>led</alias>
    <phoneme prefer="true">li:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme prefer="true">led</phoneme>
    <phoneme>li:d</phoneme>
  </lexeme>
</lexicon>

5. Examples

This section is informative.

5.1 Simple Case

In its simplest form the Pronunciation Lexicon language allows orthographies (the textual representation) to be associated with pronunciations (the phonetic/phonemic representation). A Pronunciation Lexicon document typically contains multiple entries. So, for example, to specify the pronunciation for proper names, such as "Newton" and "Scahill", the markup will look like the following.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>'nju:t&#x0259;n</phoneme>
    <!-- IPA string is: "'nju:tən" -->
  </lexeme>
  <lexeme>
    <grapheme>Scahill</grapheme>
    <phoneme>'sk&#x0251;h&#x026A;l</phoneme>
    <!-- IPA string is: "'skɑhɪl" -->
  </lexeme>
</lexicon>

Here we see the root element <lexicon>, which contains the two lexemes for the words "Newton" and "Scahill". Each <lexeme> is a composite element consisting of the orthographic and pronunciation representations for the entry. Each of the two <lexeme> elements contains a single <grapheme> element, which holds the orthographic text, and a single <phoneme> element, which holds the pronunciation. In this case the alphabet attribute of the <lexicon> element is set to "ipa", so the International Phonetic Alphabet [IPA] has to be used for all the pronunciations.

5.2 Multiple pronunciations for the same orthography

For ASR systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> element within the same <lexeme> element.

In the following example the word "Newton" has two possible pronunciations.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">'nju:t&#x0259;n</phoneme>
    <!-- IPA string is: "'nju:tən" -->
    <phoneme>'nu:t&#x0259;n</phoneme>
    <!-- IPA string is: "'nu:tən" -->
  </lexeme>
</lexicon>

Where only a single pronunciation needs to be selected from the pronunciations that are available (for example, where a pronunciation lexicon is also being used by a speech synthesis system), the prefer attribute on the <phoneme> element may be used to indicate the preferred pronunciation.

5.3 Multiple orthographies

In some situations there are alternative textual representations for the same word or phrase. This can arise for a number of reasons; see Section 4.5 for details.

Here are two simple examples of multiple orthographies: alternative spelling of an English word and multiple writings of a Japanese word.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- English entry showing how alternative spellings are handled -->
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>'k&#x028C;l&#x0259;</phoneme>
    <!-- IPA string is: "'kʌlə" -->
  </lexeme>
</lexicon>

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="ja">
  <!-- Japanese entry showing how multiple writing systems are handled
          romaji, kanji and hiragana orthographies -->
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Hira">にほんご</grapheme>
    <phoneme>n&#x026A;h&#x0252;&#x014B;&#x0252;</phoneme>
    <!-- IPA string is: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>

5.4 Homophones

In most languages there are homophones: words with different spellings and different meanings but the same pronunciation, for instance "seed" and "cede". In some cases the pronunciations overlap rather than being exactly the same; for example, the English names "Smyth" and "Smith" share one pronunciation, but "Smyth" also has a pronunciation that is only relevant to itself. Hence the multiple orthography mechanism cannot be used.

Pronunciations are explicitly bound to one or more orthographies within a <lexeme> element so homophones are easy to handle. See the following examples:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>si:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>si:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Smyth</grapheme>
    <phoneme>sm&#x026A;&#x03B8;</phoneme>
    <!-- IPA string is: "smɪθ" -->
    <phoneme>sma&#x026A;&#x00F0;</phoneme>
    <!-- IPA string is: "smaɪð" -->
  </lexeme>
  <lexeme>
    <grapheme>Smith</grapheme>
    <phoneme>sm&#x026A;&#x03B8;</phoneme>
    <!-- IPA string is: "smɪθ" -->
  </lexeme>
</lexicon>
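
As an informative sketch, a tool could find the homophones in a lexicon by grouping graphemes that share a pronunciation string. The Python fragment below does this for a document like the one above; the name homophones and the SAMPLE string are hypothetical, and the comparison assumes that all <phoneme> elements use the same alphabet.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme><grapheme>cede</grapheme><phoneme>si:d</phoneme></lexeme>
  <lexeme><grapheme>seed</grapheme><phoneme>si:d</phoneme></lexeme>
  <lexeme>
    <grapheme>Smyth</grapheme>
    <phoneme>sm&#x026A;&#x03B8;</phoneme>
    <phoneme>sma&#x026A;&#x00F0;</phoneme>
  </lexeme>
  <lexeme><grapheme>Smith</grapheme><phoneme>sm&#x026A;&#x03B8;</phoneme></lexeme>
</lexicon>"""


def homophones(pls_text):
    """Map each pronunciation shared by two or more graphemes to them."""
    root = ET.fromstring(pls_text)
    by_pronunciation = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = [g.text for g in lexeme.findall(f"{{{PLS_NS}}}grapheme")]
        for phoneme in lexeme.findall(f"{{{PLS_NS}}}phoneme"):
            spellings = by_pronunciation.setdefault(phoneme.text, [])
            for grapheme in graphemes:
                if grapheme not in spellings:
                    spellings.append(grapheme)
    # Keep only pronunciations bound to more than one spelling.
    return {p: s for p, s in by_pronunciation.items() if len(s) > 1}


for pronunciation, spellings in homophones(SAMPLE).items():
    print(pronunciation, spellings)
```

In this sketch "cede" and "seed" are grouped together, as are "Smyth" and "Smith" for their shared pronunciation; the second pronunciation of "Smyth" belongs to no group.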

5.5 Homographs

In most languages there are homographs: words with the same spelling but different pronunciations. For example, in English the noun "refuse" has a different pronunciation from the verb "refuse". If a pronunciation lexicon author does not want to distinguish between the two words, they can simply be represented as alternate pronunciations within the same <lexeme> element; otherwise two different <lexeme> elements need to be used. In both cases the processor will not be able to distinguish when to apply the first or the second transcription.

Informative Note:

Note that the current version of the specification cannot instruct the PLS processor how to distinguish the two pronunciations, because doing so would require contextual processing that is outside the scope of the current version of PLS (see Section 1).

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>r&#x026A;'fju:z</phoneme>
    <!-- IPA string is: "rɪ'fju:z" -->
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <!-- IPA string is: "'refju:s" -->
  </lexeme>
</lexicon>

5.6 Pronunciation by Orthography (Acronyms, Abbreviations, etc.)

For some words and phrases pronunciation can be quickly and conveniently expressed as a sequence of other orthographies. The developer is not required to have linguistic knowledge, but instead makes use of the pronunciations that are already expected to be available. To express pronunciations using other orthographies the <alias> element may be used.

This feature may be very useful to deal with acronym expansion.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- 
	Acronym expansion
  -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <!-- 
	number representation
  -->
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
  <!-- 
	crude pronunciation mechanism
  -->
  <lexeme>
    <grapheme>Thailand</grapheme>
    <alias>tie land</alias>
  </lexeme>
  <!-- 
	crude pronunciation mechanism and acronym expansion
  -->
  <lexeme>
    <grapheme>BBC 1</grapheme>
    <alias>be be sea one</alias>
  </lexeme>
</lexicon>

6. References

6.1 Normative References

[HTML]: HTML 4.01 Specification, Dave Raggett, et al., Editors. World Wide Web Consortium, 24 December 1999. This version of the HTML 4.01 Recommendation is http://www.w3.org/TR/1999/REC-html401-19991224/. The latest version of HTML is available at http://www.w3.org/TR/html/.
[IPAHNDBK]: Handbook of the International Phonetic Association, International Phonetic Association, Editors. Cambridge University Press, July 1999. Information on the Handbook is available at http://www.arts.gla.ac.uk/ipa/handbook.html.
[ISO15924]: Information and documentation - Codes for the representation of names of scripts, Michael Everson. ISO 15924:2004, January 2004. This is available at http://www.unicode.org/iso15924/standard/index.html.
[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Editor. IETF, March 1997. This RFC is available at http://www.ietf.org/rfc/rfc2119.txt.
[RFC3066]: Tags for the Identification of Languages, H. Alvestrand, Editor. IETF, January 2001. This RFC is available at http://www.ietf.org/rfc/rfc3066.txt.
[RFC4267]: The W3C Speech Interface Framework Media Types: application/voicexml+xml, application/ssml+xml, application/srgs, application/srgs+xml, application/ccxml+xml, and application/pls+xml, Max Froumentin, Editor. IETF, November 2005. This RFC is available at http://www.ietf.org/rfc/rfc4267.txt.
[SRGS]: Speech Recognition Grammar Specification Version 1.0, Andrew Hunt and Scott McGlashan, Editors. World Wide Web Consortium, 16 March 2004. This version of the SRGS 1.0 Recommendation is http://www.w3.org/TR/2004/REC-speech-grammar-20040316/. The latest version is available at http://www.w3.org/TR/speech-grammar/.
[SSML]: Speech Synthesis Markup Language (SSML) Version 1.0, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, 7 September 2004. This version of the SSML 1.0 Recommendation is http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/. The latest version is available at http://www.w3.org/TR/speech-synthesis/.
[XML-BASE]: XML Base, J. Marsh, editor. World Wide Web Consortium, 27 June 2001. This version of the XML Base Recommendation is http://www.w3.org/TR/2001/REC-xmlbase-20010627/. The latest version is available at http://www.w3.org/TR/xmlbase/.
[XML-ID]: xml:id Version 1.0, J. Marsh, D. Veillard, N. Walsh. World Wide Web Consortium, 9 September 2005. This version of the xml:id Recommendation is http://www.w3.org/TR/2005/REC-xml-id-20050909/. The latest version is available at http://www.w3.org/TR/xml-id/.
[XMLNS]: Namespaces in XML 1.1, T. Bray et al., Editors. World Wide Web Consortium, 4 February 2004. This version of the XML Namespaces Recommendation is http://www.w3.org/TR/2004/REC-xml-names11-20040204/. The latest version is available at http://www.w3.org/TR/xml-names11/.

6.2 Informative References

[DC]: Dublin Core Metadata Initiative.
See http://dublincore.org/.
[DC-ES]: Dublin Core Metadata Element Set, Version 1.1: Reference Description.
See http://dublincore.org/documents/dces/.
[IPA]: International Phonetic Association.
See http://www.arts.gla.ac.uk/ipa/ipa.html for the organization's website.
[IPAUNICODE1]: The International Phonetic Alphabet, J. Esling. This table of IPA characters in Unicode is available at http://web.uvic.ca/ling/resources/ipa/charts/unicode_ipa-chart.htm.
[IPAUNICODE2]: The International Phonetic Alphabet in Unicode, J. Wells. This table of Unicode values for IPA characters is available at http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm.
[JEIDAALPHABET]: JEIDA-62-2000 Phoneme Alphabet. JEITA. An abstract of this document (in Japanese) is available at http://it.jeita.or.jp/document/publica/standard/summary/JEIDA-62-2000.pdf.
[JEITA]: Japan Electronics and Information Technology Industries Association.
See http://www.jeita.or.jp/.
[RDF-CONC]: Resource Description Framework (RDF): Concepts and Abstract Syntax, G. Klyne and J.J. Carroll, Editors. World Wide Web Consortium, 10 February 2004. This version of the RDF Concepts and Abstract Syntax is http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/. The latest version is available at http://www.w3.org/TR/rdf-concepts/.
[RDF-SCHEMA]: RDF Vocabulary Description Language 1.0: RDF Schema, D. Brickley and R. Guha, Editors. World Wide Web Consortium, 10 February 2004. This version of the RDF Schema Recommendation is http://www.w3.org/TR/2004/REC-rdf-schema-20040210/. The latest version of RDF Schema is available at http://www.w3.org/TR/rdf-schema/.
[RDF-XMLSYNTAX]: RDF/XML Syntax Specification, D. Beckett, Editor. World Wide Web Consortium, 10 February 2004. This version of the RDF/XML Syntax Recommendation is http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/. The latest version of the RDF XML Syntax is available at http://www.w3.org/TR/rdf-syntax-grammar/.
[REQS]: Pronunciation Lexicon Specification (PLS) Version 1.0 Requirements, P. Baggia and F. Scahill, Editors. World Wide Web Consortium, 29 October 2004. This document is a work in progress. This version of the Pronunciation Lexicon Requirements is http://www.w3.org/TR/2004/WD-lexicon-reqs-20041029/. The latest version of the Pronunciation Lexicon Requirements is available at http://www.w3.org/TR/lexicon-reqs/.
[SAMPA]: SAMPA computer readable phonetic alphabet, J.C. Wells.
See http://www.phon.ucl.ac.uk/home/sampa/home.htm for information on it.
[SISR]: Semantic Interpretation for Speech Recognition (SISR) Version 1.0, Luc van Tichelen and Dave Burke, Editors. World Wide Web Consortium, 11 January 2006. This version of the SISR 1.0 Candidate Recommendation is http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/. The latest version is available at http://www.w3.org/TR/semantic-interpretation/.
[VXML]: Voice Extensible Markup Language (VoiceXML) Version 2.0, Scott McGlashan et al., Editors. World Wide Web Consortium, 16 March 2004. This version of the VoiceXML 2.0 Recommendation is http://www.w3.org/TR/2004/REC-voicexml20-20040316/. The latest version is available at http://www.w3.org/TR/voicexml20/.
[XHTML2]: XHTML 2.0, J. Axelsson et al., Editors. World Wide Web Consortium, 22 July 2004. This version of the XML XHTML 2.0 Working Draft is http://www.w3.org/TR/2004/WD-xhtml2-20040722/. The latest version is available at http://www.w3.org/TR/xhtml2/.
[XHTML-MTYPES]: XHTML Media Types, Ishikawa Masayasu, Editor. World Wide Web Consortium, 1 August 2002. This version of the W3C Note is http://www.w3.org/TR/xhtml-media-types/xhtml-media-types.html. The latest version is available at http://www.w3.org/TR/xhtml-media-types.
[XPOINTER]: XPointer Framework, P. Grosso, E. Maler, J. Marsh, N. Walsh. World Wide Web Consortium, 25 March 2003. This version of the XPointer Framework Recommendation is http://www.w3.org/TR/2003/REC-xptr-framework-20030325/. The latest version is available at http://www.w3.org/TR/xptr-framework/.
[X-SAMPA]: Computer-coding the IPA: a proposed extension of SAMPA, J.C. Wells, University College London, 28 April 1995. This version is available at http://www.phon.ucl.ac.uk/home/sampa/ipasam-x.pdf.

7. Acknowledgements

This specification was written with the help of the following people (listed in alphabetical order):

Jeff Adams, Nuance Communications

Kazuyuki Ashimura, W3C

Paolo Baggia, Loquendo (leading author)

Dan Burnett, Vocalocity

Debbie Dahl, Conversational Technologies

Ken Davies, HeyAnita

Ellen Eide, IBM

Max Froumentin, W3C

Will Gardella, SAP

Makoto Hirota, Canon

Jim Larson, Intel

Dave Pawson, RNIB

Dave Raggett, W3C/Canon

Luc Van Tichelen, Nuance Communications

The editor wishes to thank the first author of this document, Frank Scahill, BT.

Appendix A - Schema for Pronunciation Lexicon Specification

This section is normative.

There are two schemas which can be used to validate PLS documents:

The XML schema is located at "http://www.w3.org/2006/01/pronunciation-lexicon/pls.xsd".
The RELAX NG schema is located at "http://www.w3.org/2006/01/pronunciation-lexicon/pls.rng".

Appendix B - MIME Type and File Suffix

This section is normative.

The media type associated with Pronunciation Lexicon Specification documents is "application/pls+xml" and the filename suffix is ".pls" as defined in [RFC4267].
