Pronunciation Lexicon Specification (PLS) Version 1.0 Requirements

W3C Working Draft 29 October 2004

This version: http://www.w3.org/TR/2004/WD-lexicon-reqs-20041029/
Latest version: http://www.w3.org/TR/lexicon-reqs/
Previous versions: http://www.w3.org/TR/2001/WD-lexicon-reqs-20010312/
Editor: Paolo Baggia, Loquendo; Frank Scahill, BT

Abstract

The W3C Voice Browser Working Group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of requirements studies for voice browsers, and provides details of the requirements for markup used for specifying application specific pronunciation lexicons.

Application specific pronunciation lexicons are required in many situations where the default lexicon supplied with a speech recognition or speech synthesis processor does not cover the vocabulary of the application. A pronunciation lexicon is a collection of words or phrases together with their pronunciations specified using an appropriate pronunciation alphabet.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document describes the requirements for markup used for pronunciation lexicons. This new requirements list replaces the old requirements. The new requirements are now in line with the VoiceXML 2.0 Recommendation and with the requirements of other Voice Browser Working Group specifications. Changes between these two versions are described in a diff document. You are encouraged to subscribe to the public discussion list <www-voice@w3.org> and to mail us your comments. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). A public archive is available online.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

Patent disclosures relevant to this specification may be found on the Working Group's patent disclosure page. This document has been produced under the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1. Introduction
2. Interoperability Requirements
3. Lexicon Requirements
4. Orthographic Requirements
5. Pronunciation Requirements
6. Pronunciation alphabet Requirements
7. Future Study
8. References
9. Acknowledgements

1. Introduction

This document establishes a prioritized list of requirements for pronunciation lexicon markup which any proposed markup language should address. This document addresses both procedure and requirements for the specification development. The requirements are addressed in separate sections on Interoperability Requirements, Lexicon Requirements, Orthographic Requirements, Pronunciation Requirements, and Pronunciation alphabet Requirements, followed by Future Study, References, and Acknowledgements sections.

Why do we need such a markup language?

In voice browsing applications there is often a need to use proper nouns or other unusual words within speech recognition grammars and in text to be read out by Text-to-Speech processors. These words may not be present in the platforms' built-in lexicons. In such cases voice browsers typically resort to automatic pronunciation generation algorithms, whose results may be improved upon by manually specified pronunciations. The goal of the pronunciation lexicon markup is to provide a mechanism for application developers to supply high quality additional pronunciations in a platform independent manner.

In many cases application developers will need to provide only one or two additional pronunciations inline within other voice markup languages, but there are other cases where an application may make use of large pronunciation lexicons that cannot conveniently be specified inline and have to be provided as separate documents. The pronunciation lexicon markup will address both use cases.

The markup language for pronunciation lexicons will be developed within the following broad design criteria. They are ordered from higher to lower priority. In the event that two goals conflict, the higher priority goal takes precedence. Specific technical requirements are addressed in the following sections.

The pronunciation lexicon markup language will enable consistent, platform independent control of pronunciations for use by voice browsing applications.
The pronunciation lexicon markup language should be sufficient to cover the requirements of speech recognition and speech synthesis systems within a voice browser.
The pronunciation lexicon markup language will be an XML language and shall be interoperable with relevant W3C specifications (see section 2 Interoperability Requirements for details).
The pronunciation lexicon markup language will be usable in a large number of human languages (see the requirements 3.4 and 3.5).
It should be easy and computationally efficient to automatically generate and process documents using the pronunciation lexicon markup language.
All features of the pronunciation lexicon markup language should be implementable with existing, generally available technology. Anticipated capabilities should be considered to ensure future extensibility (but are not required to be covered in the specification).
The pronunciation lexicon markup language should be easy to author, where appropriate deriving from existing pronunciation lexicons formats and using existing pronunciation alphabets.

2. Interoperability Requirements

2.1 Integration with other Voice Browser Markup languages (must have)

The pronunciation lexicon markup must be interoperable with other relevant specifications developed by the W3C Voice Browser Working Group. In particular the pronunciation lexicon markup must be compatible with the Speech Synthesis Markup Language [SSML] and Speech Recognition Grammar Specification [SRGS].

2.2 Embeddable within other Voice Browser Markup languages (nice to have)

The pronunciation lexicon markup may be embedded in the Speech Synthesis Markup Language [SSML] and in Speech Recognition Grammar Specification [SRGS].

3. Lexicon Requirements

3.1 Multiple entries per lexicon (must have)

The pronunciation lexicon markup must support the ability to specify multiple entries within a document, each entry containing orthographic and pronunciation information.
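As a rough illustration of requirement 3.1, the sketch below models a lexicon as a container of entries, each pairing orthographic forms with pronunciations. The class names `Entry` and `Lexicon`, the `lookup` helper, and the transcriptions are hypothetical and are not part of any markup defined by this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One lexicon entry: orthographic forms plus their pronunciations."""
    orthographies: List[str]
    pronunciations: List[str]


@dataclass
class Lexicon:
    """A lexicon document holding any number of entries."""
    entries: List[Entry] = field(default_factory=list)

    def lookup(self, orthography: str) -> List[str]:
        """Return every pronunciation recorded for the given orthography."""
        found: List[str] = []
        for entry in self.entries:
            if orthography in entry.orthographies:
                found.extend(entry.pronunciations)
        return found


# Illustrative data only; the transcriptions are placeholders.
lexicon = Lexicon(entries=[
    Entry(["Loquendo"], ["loˈkwɛndo"]),
    Entry(["BT"], ["biː tiː"]),
])
```

An unknown orthography simply yields an empty list here; a real processor would presumably fall back to its built-in lexicon or to automatic pronunciation generation.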

3.2 Multiple lexicons per document (nice to have)

The pronunciation lexicon markup may provide named groupings of lexicon entries within a single lexicon document. This may be useful for separating lexicons into application specific classes of pronunciation e.g. all city names.

3.3 Pronunciation alphabet per lexicon (must have)

The pronunciation lexicon markup must provide the ability to specify the pronunciation alphabet for use by all entries within a document, such as the phonetic alphabet defined by the International Phonetic Association IPA [IPA].

3.4 Language identifier per lexicon (must have)

The pronunciation lexicon markup must provide the ability to specify language identifiers for use by all entries within a document. Each language identifier must be expressed following RFC 3066 [RFC3066].

3.5 Language identifier per Lexicon Entry (nice to have)

The pronunciation lexicon markup may support the ability to specify language identifiers for an individual entry within a document. Each language identifier must be expressed following RFC 3066 [RFC3066].
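Requirements 3.4 and 3.5 can be pictured with the following sketch. It checks only the syntax of an RFC 3066 tag (a primary subtag of 1 to 8 letters followed by hyphen-separated subtags of 1 to 8 letters or digits) and assumes that an entry-level identifier, when present, overrides the lexicon-level one. That override rule and the function names are assumptions made for illustration; the requirements do not prescribe them.

```python
import re
from typing import Optional

# RFC 3066 syntax: primary subtag of 1-8 letters, then zero or more
# subtags of 1-8 letters or digits, each preceded by a hyphen.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_wellformed_tag(tag: str) -> bool:
    """Check syntax only; no registry of valid tags is consulted."""
    return RFC3066_TAG.fullmatch(tag) is not None


def effective_language(lexicon_lang: str, entry_lang: Optional[str] = None) -> str:
    """Return the entry's own language if given, else the lexicon's.

    Assumption: an entry-level identifier overrides the lexicon-level one.
    """
    chosen = entry_lang if entry_lang is not None else lexicon_lang
    if not is_wellformed_tag(chosen):
        raise ValueError(f"not a well-formed RFC 3066 tag: {chosen!r}")
    return chosen
```

For example, `effective_language("en-GB")` returns the lexicon-level tag, while `effective_language("en-GB", "fr")` returns the entry-level one.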

3.6 Lexicon can import other lexicons (nice to have)

The pronunciation lexicon markup may support the ability to import other pronunciation lexicons written in the pronunciation lexicon markup.

3.7 Lexicon can import individual lexicon entries (nice to have)

The pronunciation lexicon markup may support the ability to import lexicon entries from other pronunciation lexicons.
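The difference between importing a whole lexicon (3.6) and importing individual entries (3.7) might look like the sketch below. Lexicons are modelled as plain dictionaries from orthography to a list of pronunciations, and local entries are assumed to take precedence over imported ones. Both the representation and the precedence rule are assumptions; this document does not define conflict resolution.

```python
from typing import Collection, Dict, List, Optional

LexiconDict = Dict[str, List[str]]


def import_lexicon(
    local: LexiconDict,
    imported: LexiconDict,
    only: Optional[Collection[str]] = None,
) -> LexiconDict:
    """Merge an imported lexicon into a copy of the local one.

    only=None imports every entry (requirement 3.6); otherwise only the
    listed orthographies are imported (requirement 3.7).
    Assumption: on a conflict the local entry wins.
    """
    merged: LexiconDict = {}
    for orthography, pronunciations in imported.items():
        if only is None or orthography in only:
            merged[orthography] = list(pronunciations)
    # Local entries are written last so that they override imported ones.
    for orthography, pronunciations in local.items():
        merged[orthography] = list(pronunciations)
    return merged
```

The inputs are left unmodified, so the same imported lexicon can be reused by several importing documents.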

3.8 Metadata information (should have)

The pronunciation lexicon markup should provide a mechanism for specifying metadata within pronunciation lexicon documents. This metadata can contain information about the document itself rather than the document content. For example, it can record the purpose of the lexicon document, the author, etc.

4. Orthographic Requirements

4.1 Multi word orthographies (must have)

The pronunciation lexicon markup must allow multi word orthographies. This is particularly important for natural speech applications where common phrases may have pronunciations that differ significantly from the concatenation of the individual word pronunciations, requiring a phrase level pronunciation. An example would be "how about", often pronounced "how 'bout".
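One way a processor could take advantage of multi word orthographies is a greedy longest-match pass over the input tokens, sketched below. The matching strategy, the `max_words` limit, and the placeholder transcriptions are assumptions for illustration; the requirement only states that phrase-level entries must be expressible.

```python
from typing import Dict, List, Optional, Tuple


def match_phrases(
    tokens: List[str],
    lexicon: Dict[str, str],
    max_words: int = 3,
) -> List[Tuple[str, Optional[str]]]:
    """Segment tokens, preferring the longest orthography found in the lexicon."""
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                result.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            # No entry covers this token: pass it through unpronounced.
            result.append((tokens[i], None))
            i += 1
    return result


# Placeholder transcriptions, for illustration only.
phrase_lexicon = {"how about": "haʊ baʊt", "how": "haʊ", "about": "əˈbaʊt"}
segments = match_phrases("how about lunch".split(), phrase_lexicon)
```

Here the phrase entry "how about" is chosen over the separate entries for "how" and "about", and "lunch" is left for the platform's default handling.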

4.2 Alternate orthographies (must have)

The pronunciation lexicon markup must provide the ability to indicate an alternative equivalent form of the orthography.

This is required to cover the following situations:

Regional spelling variations e.g. "colour" and "color"
Free spelling variations e.g. "judgment" and "judgement"
Alternate writing systems e.g. Japanese Kanji and Kana
Ancient vs Modern spellings e.g. German before and after the reform of the spelling system.
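Alternate orthographies can be thought of as several spellings that resolve to the same entry. The sketch below builds an index from every spelling variant to its entry; the tuple layout, the `build_index` helper, and the transcriptions are hypothetical.

```python
from typing import Dict, List, Tuple

# Each entry pairs a list of equivalent orthographies with its pronunciations.
EntryTuple = Tuple[List[str], List[str]]


def build_index(entries: List[EntryTuple]) -> Dict[str, int]:
    """Map every orthographic variant to the position of its entry."""
    index: Dict[str, int] = {}
    for position, (orthographies, _pronunciations) in enumerate(entries):
        for orthography in orthographies:
            index[orthography] = position
    return index


# Placeholder transcriptions, for illustration only.
variant_entries: List[EntryTuple] = [
    (["colour", "color"], ["ˈkʌlə"]),
    (["judgment", "judgement"], ["ˈdʒʌdʒmənt"]),
]
variant_index = build_index(variant_entries)
```

With this index, "colour" and "color" resolve to the same entry, so a pronunciation needs to be written only once.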

4.3 Handling of orthographic textual variability (must have)

The pronunciation lexicon markup must provide a mechanism to indicate the allowable textual variability in the orthography. Types of variability include, but are not limited to:

Whitespace handling
Case sensitivity
Unicode sequence variation
Valid character sets
Diacritics within languages such as Arabic or Farsi
Accent matching within languages such as French.

The definition of a standard text normalisation scheme is beyond the scope of this specification.
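To make some of the variability types listed above concrete, the following sketch shows how a processor might compare orthographies under configurable rules for whitespace, case, Unicode sequence variation, and accents. It is not a proposed standard normalisation scheme, which this document places out of scope; the function name and its options are assumptions.

```python
import unicodedata


def normalise(text: str, *, fold_case: bool = True, strip_accents: bool = False) -> str:
    """Reduce an orthography to a comparison key under the chosen rules."""
    # Unicode sequence variation: compose to NFC so that "e" + U+0301
    # and the precomposed U+00E9 compare equal.
    text = unicodedata.normalize("NFC", text)
    # Whitespace handling: trim the ends and collapse internal runs.
    text = " ".join(text.split())
    # Case sensitivity: optionally ignore case differences.
    if fold_case:
        text = text.casefold()
    # Accent matching: optionally drop combining marks after decomposition.
    if strip_accents:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text
```

Two orthographies would then match when their keys are equal, for instance `normalise("  New   York ") == normalise("new york")`. Whether accent stripping is acceptable depends on the language, which is why it is off by default in this sketch.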

4.4 Handling of homographs (nice to have)

The pronunciation lexicon markup may provide a mechanism to deal with the problem of specifying homographs (words with the same spelling, but potentially different meanings and pronunciations), within the same document.

5. Pronunciation Requirements

5.1 Single Pronunciations (must have)

The pronunciation lexicon markup must provide the ability to specify a single pronunciation for a given lexicon entry as a sequence of symbols according to the pronunciation alphabet selected.

5.2 Multiple pronunciations (must have)

The pronunciation lexicon markup must support the ability to specify multiple pronunciations for a given lexicon entry.

5.3 Dialect indication (nice to have)

The pronunciation lexicon markup may provide a mechanism for indicating the dialect or language variation for each pronunciation, as described in RFC 3066 [RFC3066], such as "en-scouse".

5.4 Pronunciation preference (must have)

The pronunciation lexicon markup must enable indication of which pronunciation is the preferred form for use by a speech synthesizer where there are multiple pronunciations for a lexicon entry. The pronunciation lexicon markup must define the default selection behaviour for the situations where there are multiple pronunciations but no indicated preference.
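The selection behaviour described in requirement 5.4 could be sketched as follows. The `preferred` flag and the fallback to the first listed pronunciation are assumptions chosen for illustration; the requirement leaves the actual default behaviour to be defined by the markup specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pronunciation:
    symbols: str
    preferred: bool = False


def select_for_synthesis(pronunciations: List[Pronunciation]) -> Pronunciation:
    """Pick the pronunciation a synthesizer should use for an entry.

    Assumption: with no indicated preference, the first one listed is used.
    """
    if not pronunciations:
        raise ValueError("a lexicon entry needs at least one pronunciation")
    for candidate in pronunciations:
        if candidate.preferred:
            return candidate
    return pronunciations[0]


# Placeholder transcriptions of "either", for illustration only.
marked = [Pronunciation("ˈiːðə"), Pronunciation("ˈaɪðə", preferred=True)]
unmarked = [Pronunciation("ˈiːðə"), Pronunciation("ˈaɪðə")]
```

A document-order default has the advantage that authors can control the outcome without any extra markup, but other defaults are conceivable.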

5.5 Pronunciation weighting (nice to have)

The pronunciation lexicon markup may allow for relative weightings to be applied to pronunciations. These weightings indicate the relative importance of the pronunciations within a single lexicon entry. This can be useful for speech recognition systems.
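A recognizer might turn relative weightings into values that sum to one within an entry, as in the sketch below. Treating the weightings as non-negative numbers and normalising them this way is an assumption; the requirement does not fix their scale or interpretation.

```python
from typing import Dict


def normalise_weights(weighted: Dict[str, float]) -> Dict[str, float]:
    """Scale the weightings of one entry so that they sum to 1."""
    if any(weight < 0 for weight in weighted.values()):
        raise ValueError("weightings must be non-negative")
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("at least one weighting must be positive")
    return {symbols: weight / total for symbols, weight in weighted.items()}
```

For example, weightings of 3 and 1 for two pronunciations become 0.75 and 0.25.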

5.6 Orthographic Specification of Pronunciation (should have)

The pronunciation lexicon markup should allow the specification of the pronunciation of an orthography in terms of other orthographies with previously defined pronunciations, for example, the pronunciation for "W3C" specified as the concatenation of pronunciations of the words "double you three see".
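The "W3C" example can be sketched as a lookup that concatenates previously defined pronunciations. The helper name `expand_alias`, the space used as a joiner, and the transcriptions are assumptions for illustration.

```python
from typing import Dict


def expand_alias(alias: str, lexicon: Dict[str, str]) -> str:
    """Build a pronunciation by concatenating those of the alias words."""
    parts = []
    for word in alias.split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation defined for {word!r}")
        parts.append(lexicon[word])
    return " ".join(parts)


# Placeholder transcriptions, for illustration only.
base_lexicon = {"double": "ˈdʌbəl", "you": "juː", "three": "θriː", "see": "siː"}
w3c_pronunciation = expand_alias("double you three see", base_lexicon)
```

Raising an error for an undefined word reflects the wording "previously defined pronunciations"; a real processor might instead fall back to automatic pronunciation generation.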

6. Pronunciation alphabet Requirements

6.1 Standard Pronunciation alphabets (must have)

We will standardize on at least one existing pronunciation alphabet, such as the phonetic alphabet defined by the International Phonetic Association IPA [IPA]. We do not plan to develop a new standard pronunciation alphabet.

6.2 Internationalization (must have)

The pronunciation alphabet must allow the specification of pronunciations for any language including tonal languages.

6.3 Suprasegmental annotations (must have)

The pronunciation alphabet must provide a mechanism for indicating suprasegmental structure, such as word and syllable boundaries and stress markings. The specification may address other types of suprasegmental structure.
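As an illustration of suprasegmental annotations, the sketch below reads syllable boundaries and stress from an IPA string. It assumes that every syllable boundary is written with a full stop and that the IPA stress marks (U+02C8 for primary, U+02CC for secondary) appear only at the start of a syllable. Real transcriptions often omit the full stop before a stress mark, so this is a simplification, and the function name and example transcription are hypothetical.

```python
from typing import List, Tuple

PRIMARY_STRESS = "\u02c8"    # IPA primary stress mark
SECONDARY_STRESS = "\u02cc"  # IPA secondary stress mark
SYLLABLE_BREAK = "."         # IPA syllable break


def syllables(ipa: str) -> List[Tuple[str, str]]:
    """Split a transcription into (segments, stress level) pairs.

    Assumption: boundaries are always explicit and stress marks are
    syllable-initial.
    """
    result: List[Tuple[str, str]] = []
    for chunk in ipa.split(SYLLABLE_BREAK):
        if chunk.startswith(PRIMARY_STRESS):
            result.append((chunk[1:], "primary"))
        elif chunk.startswith(SECONDARY_STRESS):
            result.append((chunk[1:], "secondary"))
        else:
            result.append((chunk, "none"))
    return result
```

Under these assumptions, `syllables("fə.ˈnɛ.tɪk")` yields three syllables with primary stress on the second.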

6.4 Interoperability (should have)

The choice of pronunciation alphabet should take into account the requirements of interoperability between platforms.

6.5 Vendor Specific Pronunciation Alphabets (must have)

The pronunciation lexicon markup must allow for vendor specific pronunciation alphabets to be used.

7. Future Study

This section contains issues that were identified during requirements capture but which have not been directly incorporated in the current set of requirements.

7.1 More powerful addressing for Lexicon Entries

It may be desirable to provide an addressing scheme for lexicon entries that is more flexible than the document and fragment URI schemes currently listed in the requirements. An example of a more powerful addressing mechanism could be XPath.

7.2 Prefix/Suffix morphological rules

In some situations the explicit specification of all the morphological variants of a word can lead to extremely large lexicons. A standard scheme for providing prefix and suffix morphological rules would enable more compact lexicon documents. However it is felt that the most common use of the pronunciation lexicon markup will be for proper nouns where morphological variance is less of an issue, and that standardisation of morphological rules will be too difficult to achieve in a first draft. Off-line tools may provide mechanisms for generating morphological variants.

7.3 Context Dependent orthographies

In some languages the pronunciation of an orthography and the orthography itself are dependent upon the context in which this orthography is used. The requirements do not address this issue. It may not be possible to resolve this issue in a vendor independent manner. It is possible that the additional information could be used to handle this situation in a platform dependent manner.

7.4 Compound words

In languages such as German and Dutch, words can occur as part of compound words and in some cases may only occur within compound words. In the future, the pronunciation lexicon markup should address handling compound words.

8. References

[IPA]: Handbook of the International Phonetic Association, International Phonetic Association, Editors. Cambridge University Press, July 1999. Information on the Handbook is available at http://www2.arts.gla.ac.uk/ipa/handbook.html.
[RFC3066]: Tags for the Identification of Languages, H. Alvestrand, Editor. IETF, January 2001. This RFC is available at http://www.ietf.org/rfc/rfc3066.txt.
[SRGS]: Speech Recognition Grammar Specification Version 1.0, Andrew Hunt and Scott McGlashan, Editors. World Wide Web Consortium, 16 March 2004. This version of the SRGS 1.0 Recommendation is http://www.w3.org/TR/2004/REC-speech-grammar-20040316/. The latest version is available at http://www.w3.org/TR/speech-grammar/.
[SSML]: Speech Synthesis Markup Language (SSML) Version 1.0, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, 7 September 2004. This version of the SSML 1.0 Recommendation is http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/. The latest version is available at http://www.w3.org/TR/speech-synthesis/.

9. Acknowledgements

The editor wishes to thank the previous author of this document, Frank Scahill, and the old and new members of the Voice Browser Working Group involved in this activity (listed in alphabetical order):

Paolo Baggia, Loquendo (current leading author)

Dan Burnett, Independent Consultant

Debbie Dahl, Conversational Technologies

Ken Davies, HeyAnita

Ellen Eide, IBM

Will Gardella, SAP

Andrew Hunt, ScanSoft

Jim Larson, Intel

Bruce Lucas, IBM

Dave Raggett, W3C/Canon

Frank Scahill, BT (previous author)

Linda Thibault, Locus Dialogue

Luc Van Tichelen, ScanSoft