Copyright ©2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines the syntax for specifying pronunciation lexicons to be used by speech recognition and speech synthesis engines in voice browser applications.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is the first public Working Draft of the Pronunciation Lexicon specification, and has been produced by the W3C Voice Browser Activity for review by W3C Members and other interested parties. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review. Comments should be sent to the public mailing list <www-voice@w3.org> (archive).
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.
Per section 4 of the W3C Patent Policy, Working Group participants have 150 days from the title page date of this document to exclude essential claims from the W3C RF licensing requirements with respect to this document series. Exclusions are with respect to the exclusion reference document, defined by the W3C Patent Policy to be the latest version of a document in this series that is published no later than 90 days after the title page date of this document.
<lexicon> element
<meta> element
<metadata> element
<lexeme> element
<grapheme> element
<phoneme> element
<alias> element
<example> element
This section is informative.
The accurate specification of pronunciation is critical to the success of speech applications. Most speech recognition and text-to-speech synthesis engines provide extensive, high-quality lexicons with pronunciation information for most words or phrases. To ensure maximum coverage of the words or phrases used by an application, application-specific pronunciations may be required. These are most commonly needed for proper nouns such as surnames or business names.
The Pronunciation Lexicon Specification (PLS) is designed to allow interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use.
The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, using vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages such as the Speech Recognition Grammar Specification [SRGS] and the Speech Synthesis Markup Language [SSML].
A Terminology Section will be included here.
In order to specify a pronunciation, a phonemic/phonetic alphabet has to be used. An alphabet in this context refers to a collection of symbols to represent the sounds of one or more human languages. In the Pronunciation Lexicon Specification the pronunciation alphabet is specified by the "alphabet" attribute (see Section 4.1 and Section 4.6 for details on the use of this attribute). The only valid values for the "alphabet" attribute are "ipa" (see the next paragraph) and vendor-defined strings of the form "x-organization" or "x-organization-alphabet". For example, the Japan Electronics and Information Technology Industries Association [JEITA] might wish to encourage the use of an alphabet such as "x-jeita" or "x-jeita-2000" for their phoneme alphabet [JEIDAALPHABET]. Another example might be "x-sampa" [X-SAMPA], an extension of the SAMPA (Speech Assessment Methods Phonetic Alphabet) phonetic alphabet [SAMPA] to cover the entire range of characters in the International Phonetic Alphabet [IPA].
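The value rule above can be sketched as a small check. This is an illustrative sketch only, not part of the specification: the function name `is_valid_alphabet` is hypothetical, and the assumption that the organization and alphabet tokens consist of ASCII letters and digits is ours, since the draft only gives the overall "x-organization" and "x-organization-alphabet" shapes.

```python
import re

# Assumption: each token after "x-" is made of ASCII letters and digits.
# The draft does not define the token syntax any further.
_ALPHABET_RE = re.compile(r"(?:ipa|x-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)?)")


def is_valid_alphabet(value):
    """Return True for "ipa" or a vendor-defined "x-..." string."""
    return _ALPHABET_RE.fullmatch(value) is not None


for candidate in ("ipa", "x-jeita", "x-jeita-2000", "x-sampa", "sampa"):
    print(candidate, is_valid_alphabet(candidate))
```

Under these assumptions the first four candidates are accepted and "sampa" is rejected, because it lacks the "x-" prefix.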
A processor should support a value for "alphabet" of "ipa", corresponding to Unicode representations of the phonetic characters developed by the International Phonetic Association [IPA]. In addition to an exhaustive set of vowel and consonant symbols, this character set supports a syllable delimiter, numerous diacritics, stress symbols, lexical tone symbols, intonational markers and more. For this alphabet, legal phonetic/phonemic values are strings of the values specified in Appendix 2 of [IPAHNDBK]. Informative tables of the IPA-to-Unicode mappings can be found at [IPAUNICODE1] and [IPAUNICODE2]. Note that not all of the IPA characters are available in Unicode. For processors supporting this alphabet,
It is the intention of the Working Group to fulfill Requirement "4.3 Handling of orthographic textual variability (must have)" [REQS], which states that:
The pronunciation lexicon markup must provide a mechanism to indicate the allowable textual variability in the orthography. Types of variability include, but are not limited to,
The definition of a standard text normalisation scheme is beyond the scope of this specification.
The Pronunciation Lexicon markup language consists of the following elements:
Elements | Attributes | Description |
---|---|---|
<lexicon> | version, xml:base, xmlns, xml:lang, alphabet | root element for PLS |
<meta> | name, http-equiv, content | metadata container element |
<metadata> | | metadata container element |
<lexeme> | | the container element for a single lexical entry |
<grapheme> | orthography | contains orthographic information for the lexeme |
<phoneme> | prefer, alphabet | contains pronunciation information for the lexeme |
<alias> | | contains substitution of acronyms and words |
<example> | | contains an example of the usage of the lexeme |
<lexicon> element
The root element of the Pronunciation Lexicon markup language is the <lexicon> element. This element is the container for all other elements of the PLS language. A <lexicon> element may contain zero or more occurrences of <lexeme>, <meta> and <metadata> elements.
The <lexicon> element must contain an "alphabet" attribute, which indicates the pronunciation alphabet to be used within the PLS document. The values of the "alphabet" attribute are described in Section 2, and the value may be overridden on an individual <phoneme> element.
The required "version" attribute indicates the version of the specification to be used for the document and must have the value "1.0".
The required "xml:lang" attribute allows identification of the language for which the pronunciation lexicon is relevant. RFC 3066 [RFC3066] is the normative reference for the values of the "xml:lang" attribute.
The current Working Draft limits the Pronunciation Lexicon to be monolingual, because a single "xml:lang" value applies to the entire document.
This solution does not prevent the creation of multilingual SRGS and SSML documents. These documents may reference multiple lexicons, possibly written for different languages.
The "xml:base" attribute defines a base URI for the PLS document, as specified in XML Base [XML-BASE]. As in the HTML 4.01 Specification [HTML], this is the URI that all relative references within the document take as their base.
The <lexicon> element can designate the PLS namespace. This can be achieved by declaring an "xmlns" attribute or an attribute with an "xmlns" prefix. See Namespaces in XML [XMLNS] for details. Note that when the "xmlns" attribute is used alone, it sets the default namespace for the element on which it appears and for any child elements. The namespace URI for PLS is: "http://www.w3.org/2005/01/pronunciation-lexicon".
A simple PLS document for the word "tomato" and its pronunciation.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>t&#x259;mei&#x325;&#x27E;ou&#x325;</phoneme>
    <!-- This is an example of an IPA phonetic string -->
    <!-- Because many platform/browser/text editor combinations do not
         correctly cut and paste Unicode text, this example uses the entity
         escape versions of the IPA characters. Normally, one would directly
         use the UTF-8 representation of these symbols:
         "təmei̥ɾou̥". -->
  </lexeme>
</lexicon>
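The attribute requirements on the root element can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names `local_name` and `check_lexicon_root` and the wording of the messages are hypothetical and not defined by this specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Strip a "{namespace}" prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_lexicon_root(xml_text):
    """Return a list of problems found on the root element (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if local_name(root.tag) != "lexicon":
        problems.append("root element is not <lexicon>")
    if root.get("version") != "1.0":
        problems.append('"version" must be present with the value "1.0"')
    if not root.get("alphabet"):
        problems.append('"alphabet" is required')
    if not root.get(XML_LANG):
        problems.append('"xml:lang" is required')
    return problems


print(check_lexicon_root('<lexicon version="1.0" alphabet="ipa"/>'))
```

The sketch only inspects the root element. It does not validate the "alphabet" value or the language tag itself.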
<meta> element
The <metadata> and <meta> elements are containers in which information about the document can be placed. The <metadata> element provides more general and powerful treatment of metadata information than <meta> by using a metadata schema.
A <meta> element associates a string to a declared meta property or declares "http-equiv" content. Either a "name" or "http-equiv" attribute is required. It is an error to provide both "name" and "http-equiv" attributes. A "content" attribute is also required.
The "seeAlso" property is the only defined <meta> property name. It is used to specify a resource that might provide additional metadata information about the content. This property is modelled on the "seeAlso" property of "RDF Vocabulary Description Language 1.0: RDF Schema" [RDF-SCHEMA §5.4.1]. The "http-equiv" attribute has a special significance when documents are retrieved via HTTP. Although the preferred method of providing HTTP header information is that of using HTTP header fields, the "http-equiv" content may be used in situations where the PLS document author is unable to configure HTTP header fields associated with their document on the origin server, for example, cache control information. Note that HTTP servers and caches are not required to introspect the contents of <meta> in PLS documents and thereby override the header values they would send otherwise.
The <meta> element is an empty element.
This is an example of how <meta>
elements can be included in a
PLS document to specify a resource that provides additional
metadata information.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" alphabet="ipa" xml:lang="en">
  <meta name="seeAlso" content="http://example.com/my-pls-metadata.xml"/>
</lexicon>
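The attribute rules for <meta> (exactly one of "name" and "http-equiv", plus a required "content") can be expressed as a short check. This is a sketch under our own assumptions: the function name `check_meta` and the message texts are hypothetical, and how a processor should report such errors is not defined here.

```python
import xml.etree.ElementTree as ET


def check_meta(meta):
    """Return a list of problems for a single <meta> element."""
    problems = []
    has_name = meta.get("name") is not None
    has_http_equiv = meta.get("http-equiv") is not None
    # Both present and both absent are errors, so the flags must differ.
    if has_name == has_http_equiv:
        problems.append('exactly one of "name" and "http-equiv" is required')
    if meta.get("content") is None:
        problems.append('"content" is required')
    return problems


element = ET.fromstring(
    '<meta name="seeAlso" content="http://example.com/my-pls-metadata.xml"/>'
)
print(check_meta(element))
```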
<metadata> element
The <metadata> element is a container in which information about the document can be placed using metadata markup. The behavior of software processing the content of a <metadata> element is not described in this specification. Therefore, software implementing this specification is free to ignore that content.
Although any metadata markup can be used within <metadata>, it is recommended that the RDF/XML Syntax [RDF-XMLSYNTAX] be used, in conjunction with the general metadata properties defined by the Dublin Core Metadata Initiative [DC] (e.g., Title, Creator, Subject, Description, Rights, etc.).
This is an example of how metadata can be included in a PLS document using the "Dublin Core Metadata Element Set, Version 1.1" [DC-ES] describing general document information such as title, description, date, and so on:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" alphabet="ipa" xml:lang="en">
  <metadata>
    <rdf:RDF
        xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc = "http://purl.org/dc/elements/1.1/">
      <!-- Metadata about the PLS document -->
      <rdf:Description rdf:about="http://www.example.com/meta.pls"
          dc:title="Hamlet-like Soliloquy"
          dc:description="Aldine's Soliloquy in the style of Hamlet"
          dc:publisher="W3C"
          dc:language="en-US"
          dc:date="2002-11-29"
          dc:rights="Copyright 2002 Aldine Turnbet"
          dc:format="application/pls+xml">
        <dc:creator>William Shakespeare</dc:creator>
        <dc:creator>Aldine Turnbet</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</lexicon>
<lexeme> element
The <lexeme> element is a container for a lexical entry, which may include multiple orthographies and multiple pronunciations.
The <lexeme> element must contain one or more <grapheme> elements, one for each written form. It may contain either one or more <phoneme> elements, one for each pronunciation, or an <alias> element for orthographic substitution. It may also contain one or more <example> elements.
A pronunciation lexicon for the Italian language with two lexemes:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="it-IT" alphabet="ipa">
  <lexeme>
    <grapheme>file</grapheme>
    <phoneme>faɪl</phoneme>
    <!-- This is an example of IPA string for
         the pronunciation of the English word: "file"
         that may be present in an Italian text.
         This is the pronunciation: "faɪl". -->
  </lexeme>
  <lexeme>
    <grapheme>EU</grapheme>
    <alias>Unione Europea</alias>
    <!-- This is a substitution of the European
         Union acronym in Italian language. -->
  </lexeme>
</lexicon>
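The content rules for <lexeme> stated in this draft (at least one <grapheme>, at most one <alias>, and no mixing of <phoneme> with <alias>) can be sketched as follows. The function name `check_lexeme` and its messages are hypothetical, and the sketch is not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET


def check_lexeme(lexeme):
    """Return a list of content-model problems for one <lexeme> element."""
    # Compare local names so the check also works with a default namespace.
    names = [child.tag.rsplit("}", 1)[-1] for child in lexeme]
    problems = []
    if names.count("grapheme") < 1:
        problems.append("at least one <grapheme> is required")
    if names.count("alias") > 1:
        problems.append("at most one <alias> is allowed")
    if "alias" in names and "phoneme" in names:
        problems.append("<phoneme> and <alias> are mutually exclusive")
    return problems


entry = ET.fromstring(
    "<lexeme><grapheme>EU</grapheme><alias>Unione Europea</alias></lexeme>"
)
print(check_lexeme(entry))  # prints []
```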
<grapheme> element
A <lexeme> contains at least one <grapheme> element. Each <grapheme> element contains CDATA specifying the orthography.
In more complex situations there may be alternative textual representations for the same word or phrase; this can arise due to a number of reasons, for example:
In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may contain more than one <grapheme> element to define the base orthography and any variants which should share the pronunciations.
A <grapheme> may optionally contain an "orthography" attribute, which identifies the script code used for writing the orthography. The script code must be compliant with ISO 15924 [ISO15924].
"orthography" attribute
The usefulness of the "orthography" attribute is under discussion in the Working Group. It will be revised in the future.
An example of a single grapheme and a single pronunciation.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>sə'pʌlvɪdə</phoneme>
    <!-- IPA string is: "sə'pʌlvɪdə" -->
  </lexeme>
</lexicon>
Another example with more than one written form for a lexical entry, where the first orthography uses Latin characters ("Latn" code in ISO 15924 [ISO15924]) for "Romaji" orthography, the second one uses "Kanji" orthography ("Hani" code in ISO 15924 [ISO15924]) and the third one uses the "Hiragana" orthography ("Hira" code in ISO 15924 [ISO15924]):
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Hira">にほんご</grapheme>
    <phoneme>nɪhɒŋɒ</phoneme>
    <!-- IPA string is: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>
<phoneme> element
A <lexeme> may contain one or more <phoneme> elements. Each <phoneme> element contains CDATA specifying the pronunciation. A <phoneme> element may optionally have an "alphabet" attribute, which indicates the pronunciation alphabet that is used for this <phoneme> element only. The legal values for the "alphabet" attribute are described in Section 2.
The optional "prefer" attribute indicates the preferred pronunciation to be used by a speech synthesizer among multiple pronunciations. The possible values are "true" and "false". The default value is "false".
Within a <lexeme>, <phoneme> elements and the <alias> element are mutually exclusive.
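The interaction between the "alphabet" attribute on <lexicon> and the optional "alphabet" attribute on <phoneme> can be illustrated with a short sketch. The function name `effective_alphabet` is hypothetical, and the sample document (including its "x-sampa" transcription) is made up for illustration.

```python
import xml.etree.ElementTree as ET


def effective_alphabet(lexicon, phoneme):
    """Alphabet that applies to one <phoneme>: its own, else the document's."""
    return phoneme.get("alphabet") or lexicon.get("alphabet")


# Illustrative document: the second pronunciation overrides the alphabet.
lexicon = ET.fromstring(
    '<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">'
    "<lexeme><grapheme>lead</grapheme>"
    "<phoneme>led</phoneme>"
    '<phoneme alphabet="x-sampa">lEd</phoneme>'
    "</lexeme></lexicon>"
)
for phoneme in lexicon.findall("lexeme/phoneme"):
    print(phoneme.text, effective_alphabet(lexicon, phoneme))
```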
More than one pronunciation per lexical entry:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme prefer="true">hju:ʤ</phoneme>
    <!-- IPA string is: "hju:ʤ" -->
    <phoneme>ju:ʤ</phoneme>
    <!-- IPA string is: "ju:ʤ" -->
  </lexeme>
</lexicon>
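A speech synthesizer that needs a single pronunciation might select it as sketched below. The function name `preferred_pronunciation` is hypothetical. Falling back to the first <phoneme> in document order when none is marked with prefer="true" is our assumption; this draft does not define that case. The sketch also assumes a document without a default namespace, as in the examples here.

```python
import xml.etree.ElementTree as ET


def preferred_pronunciation(lexeme):
    """Return the text of the preferred <phoneme>, or None if there is none."""
    phonemes = lexeme.findall("phoneme")
    for phoneme in phonemes:
        # "prefer" defaults to "false" when the attribute is absent.
        if phoneme.get("prefer", "false") == "true":
            return phoneme.text
    # Assumption: use document order when nothing is marked as preferred.
    return phonemes[0].text if phonemes else None


entry = ET.fromstring(
    "<lexeme><grapheme>lead</grapheme>"
    '<phoneme>led</phoneme><phoneme prefer="true">li:d</phoneme></lexeme>'
)
print(preferred_pronunciation(entry))  # prints li:d
```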
More than one written form and more than one pronunciation:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>theater</grapheme>
    <grapheme>theatre</grapheme>
    <phoneme prefer="true">'θɪətər</phoneme>
    <!-- IPA string is: "'θɪətər" -->
    <phoneme>'θi:jətər</phoneme>
    <!-- IPA string is: "'θi:jətər" -->
  </lexeme>
</lexicon>
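One way a processor might consume such entries is to build a lookup table in which every written form of a lexeme maps to the pronunciations of that lexeme. The sketch below is illustrative only; the function name `build_index` is hypothetical, and it assumes a document without a default namespace. The phoneme strings in the sample are simplified ASCII placeholders, not real IPA.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(lexicon):
    """Map each grapheme to the list of pronunciations it shares."""
    index = defaultdict(list)
    for lexeme in lexicon.findall("lexeme"):
        pronunciations = [p.text for p in lexeme.findall("phoneme")]
        for grapheme in lexeme.findall("grapheme"):
            # Entries with the same spelling accumulate their pronunciations.
            index[grapheme.text].extend(pronunciations)
    return dict(index)


sample = ET.fromstring(
    '<lexicon version="1.0" xml:lang="en-US" alphabet="ipa"><lexeme>'
    "<grapheme>theater</grapheme><grapheme>theatre</grapheme>"
    "<phoneme>p1</phoneme><phoneme>p2</phoneme>"
    "</lexeme></lexicon>"
)
print(build_index(sample))
```

Both spellings end up with the same two pronunciations, which is the sharing that multiple <grapheme> elements are meant to provide.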
<alias> element
A <lexeme> element may contain one <alias> element, which is used to indicate the pronunciation of an acronym or an abbreviated term in terms of other orthographies.
Inside a <lexeme> element, an <alias> element is mutually exclusive with <phoneme> elements. If authors want explicit control over the pronunciation, they should use the <phoneme> element instead of the <alias> element.
The need for recursion in the <alias> element (the processor re-applying the lexicon mechanism on the content of the <alias> element) is under discussion in the Working Group.
Substitution of an acronym using the <alias>
element:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
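Alias substitution can be sketched as a simple table lookup. Because recursion is still under discussion, this sketch applies the substitution in a single pass and does not look up the alias text again. The function names `alias_table` and `expand_aliases`, and the assumption that the input text has already been split into tokens matching the graphemes, are ours.

```python
import xml.etree.ElementTree as ET


def alias_table(lexicon):
    """Map each grapheme of an alias-bearing lexeme to its alias text."""
    table = {}
    for lexeme in lexicon.findall("lexeme"):
        alias = lexeme.find("alias")
        if alias is not None:
            for grapheme in lexeme.findall("grapheme"):
                table[grapheme.text] = alias.text
    return table


def expand_aliases(tokens, table):
    """Replace tokens that have an alias; single pass, no recursion."""
    return [table.get(token, token) for token in tokens]


lexicon = ET.fromstring(
    '<lexicon version="1.0" xml:lang="en-US" alphabet="ipa"><lexeme>'
    "<grapheme>W3C</grapheme><alias>World Wide Web Consortium</alias>"
    "</lexeme></lexicon>"
)
print(expand_aliases(["The", "W3C", "site"], alias_table(lexicon)))
```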
<example> element
The <example> element includes an example utterance that illustrates an occurrence of this lexeme. Because the examples are explicitly marked, automated tools can be used for regression testing and for generation of pronunciation lexicon documentation.
Zero, one or many <example> elements may be provided for a single <lexeme> element.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>led</phoneme>
    <example>My feet were as heavy as lead.</example>
  </lexeme>
  <lexeme>
    <grapheme>lead</grapheme>
    <phoneme>li:d</phoneme>
    <example>The guide once again took the lead.</example>
  </lexeme>
</lexicon>
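As one illustration of the automated checks mentioned above, a tool could verify that every <example> actually contains one of the written forms of its lexeme. This is a hypothetical sketch: the function name `examples_missing_grapheme` is ours, and the case-insensitive substring match is a simplifying assumption, not a text normalisation scheme defined by this specification.

```python
import xml.etree.ElementTree as ET


def examples_missing_grapheme(lexicon):
    """Return the example sentences that contain none of their graphemes."""
    failures = []
    for lexeme in lexicon.findall("lexeme"):
        graphemes = [g.text.lower() for g in lexeme.findall("grapheme")]
        for example in lexeme.findall("example"):
            sentence = example.text.lower()
            # Assumption: a plain substring match is good enough here.
            if not any(g in sentence for g in graphemes):
                failures.append(example.text)
    return failures


lexicon = ET.fromstring(
    '<lexicon version="1.0" xml:lang="en-US" alphabet="ipa"><lexeme>'
    "<grapheme>lead</grapheme><phoneme>led</phoneme>"
    "<example>My feet were as heavy as lead.</example>"
    "</lexeme></lexicon>"
)
print(examples_missing_grapheme(lexicon))  # prints []
```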
This section is informative.
In its simplest form the Pronunciation Lexicon language allows orthographies (the textual representation) to be associated with pronunciations (the phonetic/phonemic representation). A Pronunciation Lexicon document typically contains multiple entries. For example, to specify the pronunciation of proper names such as "Newton" and "Scahill", the markup will look like the following.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-GB" alphabet="ipa">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>'nju:tən</phoneme>
    <!-- IPA string is: "'nju:tən" -->
  </lexeme>
  <lexeme>
    <grapheme>Scahill</grapheme>
    <phoneme>'skɑhɪl</phoneme>
    <!-- IPA string is: "'skɑhɪl" -->
  </lexeme>
</lexicon>
Here we see the root element <lexicon>, which contains the two lexicon entries (<lexeme> elements) for the words "Newton" and "Scahill". Each <lexeme> is a composite element consisting of the orthographic and pronunciation representations for the entry. Each of the two <lexeme> elements has a single <grapheme> element, which includes the orthographic text, and a single <phoneme> element, which includes the pronunciation. In this case the "alphabet" attribute of the <lexicon> element is set to "ipa", so the International Phonetic Alphabet [IPA] has to be used for all the pronunciations.
For speech recognition systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> element within the same <lexeme> element.
In the following example the word "Newton" has two possible pronunciations.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-GB" alphabet="ipa">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">'nju:tən</phoneme>
    <!-- IPA string is: "'nju:tən" -->
    <phoneme>'nu:tən</phoneme>
    <!-- IPA string is: "'nu:tən" -->
  </lexeme>
</lexicon>
Where only a single pronunciation needs to be selected among the multiple pronunciations that are available (for example, where a pronunciation lexicon is also being used by a speech synthesis system), the "prefer" attribute on the <phoneme> element may be used to indicate the preferred pronunciation.
In some situations there are alternative textual representations for the same word or phrase; this can arise due to a number of reasons. See Section 4.5 for details.
Here are two simple examples of multiple orthographies: alternative spelling of an English word and multiple writings of a Japanese word.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <!-- English entry showing how alternative spellings are handled -->
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>'kʌlə</phoneme>
    <!-- IPA string is: "'kʌlə" -->
  </lexeme>
</lexicon>
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="ja" alphabet="ipa">
  <!-- Japanese entry showing how multiple writing systems are handled:
       romaji, kanji and hiragana orthographies -->
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Hira">にほんご</grapheme>
    <phoneme>nɪhɒŋɒ</phoneme>
    <!-- IPA string is: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>
In most languages there are homophones: words with different spellings and different meanings but the same pronunciation, for instance "seed" and "cede", or the English names "Smyth" and "Smith". In some cases the pronunciations overlap rather than being exactly the same; for example, the English names "Smyth" and "Smith" share one pronunciation, but "Smyth" also has a pronunciation that is only relevant to itself. Hence the multiple orthography mechanism cannot be used for such entries.
Pronunciations are explicitly bound to one or more orthographies within a <lexeme> element, so homophones are easy to cope with. See the following examples:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>si:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>si:d</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Smyth</grapheme>
    <phoneme>smɪθ</phoneme>
    <!-- IPA string is: "smɪθ" -->
    <phoneme>smaɪð</phoneme>
    <!-- IPA string is: "smaɪð" -->
  </lexeme>
  <lexeme>
    <grapheme>Smith</grapheme>
    <phoneme>smɪθ</phoneme>
    <!-- IPA string is: "smɪθ" -->
  </lexeme>
</lexicon>
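Because each pronunciation is bound to its own <lexeme>, a tool can recover the homophones in a lexicon by grouping written forms by pronunciation string. The sketch below is illustrative; the function name `homophones` is hypothetical, it compares pronunciation strings literally, and the sample uses simplified ASCII placeholders instead of IPA.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def homophones(lexicon):
    """Map each pronunciation shared by several spellings to those spellings."""
    by_pronunciation = defaultdict(set)
    for lexeme in lexicon.findall("lexeme"):
        for phoneme in lexeme.findall("phoneme"):
            for grapheme in lexeme.findall("grapheme"):
                by_pronunciation[phoneme.text].add(grapheme.text)
    # Keep only pronunciations that belong to more than one written form.
    return {p: sorted(g) for p, g in by_pronunciation.items() if len(g) > 1}


sample = ET.fromstring(
    '<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">'
    "<lexeme><grapheme>cede</grapheme><phoneme>si:d</phoneme></lexeme>"
    "<lexeme><grapheme>seed</grapheme><phoneme>si:d</phoneme></lexeme>"
    "</lexicon>"
)
print(homophones(sample))  # prints {'si:d': ['cede', 'seed']}
```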
In most languages there are words with the same spelling but different pronunciations. For example, in English the noun "refuse" has a different pronunciation from the verb "refuse". If a lexicon author does not need to distinguish between the two words, they can simply be represented as alternate pronunciations within the same <lexeme> element; otherwise two different <lexeme> elements need to be used.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <!-- IPA string is: "rɪ'fju:z" -->
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
For some words and phrases, pronunciation can be quickly and conveniently expressed as a sequence of other orthographies. This has the advantage of not requiring linguistic knowledge; instead it makes use of the pronunciations that are already expected to be available. To express pronunciations using other orthographies, the <alias> element may be used.
This feature may be very useful for dealing with acronym expansion.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xml:lang="en-US" alphabet="ipa">
  <!-- Acronym expansion -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <!-- number representation -->
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
  <!-- crude pronunciation mechanism -->
  <lexeme>
    <grapheme>Thailand</grapheme>
    <alias>tie land</alias>
  </lexeme>
  <!-- crude pronunciation mechanism and acronym expansion -->
  <lexeme>
    <grapheme>BBC 1</grapheme>
    <alias>be be sea one</alias>
  </lexeme>
</lexicon>
The editor wishes to thank the first author of this document, Frank Scahill, and the members of the Voice Browser Working Group involved in this activity (listed in alphabetical order):
This section is normative.
To be done!
This section is normative.
To be done!
This section is normative.
To be done!