W3C

PLS 1.0: Last Call Disposition of Comments

This version:
November 7, 2007
Editor:
Paolo Baggia, Loquendo

Abstract

This document details the responses made by the Voice Browser Working Group to issues raised during the first Last Call (beginning 31 January 2006 and ending 15 March 2006) and the second Last Call (beginning 26 October 2006 and ending 26 November 2006). Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

Status

This document of the W3C's Voice Browser Working Group describes the disposition of comments, as of October 1, 2007, on the first and second Last Call Working Drafts of the Pronunciation Lexicon Specification (PLS) Version 1.0. It may be updated, replaced or rendered obsolete by other W3C documents at any time.

For background on this work, please see the Voice Browser Activity Statement.

1. Introduction

This document describes the disposition of comments in relation to the first Last Call Working Draft of Pronunciation Lexicon Specification (PLS) Version 1.0 (http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/) and the second Last Call Working Draft of Pronunciation Lexicon Specification (PLS) Version 1.0 (http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/). Each issue is described by the name of the commenter, a description of the issue, and either the resolution or the reason that the issue was not resolved.

Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R100-1" is the first point in the change request number 100 for the specification.

2. Summary

Item | Commentator | Nature | Disposition
R100-1 | Paul Bagshaw (2006-02-03) | Feature Request | Accepted
R100-2 | Paul Bagshaw (2006-02-03) | Feature Request | Accepted
R100-3 | Paul Bagshaw (2006-02-03) | Clarification / Typo / Editorial | Accepted
R100-4 | Paul Bagshaw (2006-02-03) | Clarification / Typo / Editorial | Accepted
R101 | Mark Alexandre (2006-02-08) | Clarification / Typo / Editorial | No reply
R102 | Al Gilman (2006-03-15) | Feature Request | Accepted
R103-1 | Richard Ishida (2006-03-21) | Feature Request | Accepted
R103-2 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-3 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-4 | Richard Ishida (2006-03-21) | Technical Error | Accepted
R103-5 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-6 | Richard Ishida (2006-03-21) | Technical Error | Accepted
R103-7 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-8 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-9 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-10 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-11 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-12 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-13 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-14 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-15 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-16 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-17 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-18 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-19 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-20 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-21 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-22 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-23 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-24 | Richard Ishida (2006-03-21) | Change to Existing Feature | Accepted
R103-25 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-26 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-27 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-28 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-29 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-30 | Richard Ishida (2006-03-21) | Feature Request | Accepted
R103-31 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-32 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-33 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-34 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-35 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted
R103-36 | Richard Ishida (2006-03-21) | Feature Request | Accepted
R104 | Janina Sajka (2006-04-26) | Clarification / Typo / Editorial | Accepted
R105-1 | Deborah Dahl (2006-05-15) | Clarification / Typo / Editorial | Accepted
R105-2 | Deborah Dahl (2006-05-15) | Clarification / Typo / Editorial | Accepted
R106-1 | Kurt Fuqua (2006-11-20) | Clarification / Typo / Editorial | Accepted
R106-2 | Kurt Fuqua (2006-11-20) | Feature Request | Accepted
R106-3 | Kurt Fuqua (2006-11-20) | Clarification / Typo / Editorial | Implicitly accepted
R106-4 | Kurt Fuqua (2006-11-20) | Clarification / Typo / Editorial | Implicitly accepted
R106-5 | Kurt Fuqua (2006-11-20) | Feature Request | Accepted

2.1 Clarifications, Typographical, and Other Editorial

Issue R100-3

From Paul Bagshaw (2006-02-03):

3. Specification ambiguity

* Section 4.4 of PLS stipulates:

"The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements."

However, it appears to be possible to have BOTH <phoneme> AND <alias> elements in <lexeme>, as illustrated in example 4 and more clearly described in section 4.9.2

. . . either by <phoneme> elements or <alias> elements or a combination of both . . .

The either/or of section 4.4 needs correction (Proposition 3: add “or a combination of both”).

Resolution: Accepted

You are right: there is a difference in wording between Section 4.4 [1] and Section 4.9.2 [2]. We propose the following change to Section 4.4 [1]:

OLD:
"The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements."

NEW:
"The <lexeme> element contains one or more <grapheme> elements, one or more pronunciations (either by <phoneme> elements or <alias> elements or a combination of both), and zero or more <example> elements."

We think this will solve your concern on this issue.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S4.4
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S4.9.2
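To illustrate the corrected wording, a <lexeme> may carry both pronunciation mechanisms at once. The following sketch uses a hypothetical entry (the transcription and expansion are illustrative, not taken from the specification):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- an <alias> expansion and a <phoneme> transcription may coexist -->
    <alias>World Wide Web Consortium</alias>
    <phoneme>ˈdʌbəljuː θriː siː</phoneme>
  </lexeme>
</lexicon>
```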

Email Trail:

Issue R100-4

From Paul Bagshaw (2006-02-03):

4. Terminology

A final relatively minor comment: in section 4.5.

A <grapheme> may optionally contain an orthography attribute which identifies the script code used for writing the orthography.

The term ‘orthography’ has a double use: one as a glossary term and the other as an attribute name. Only the font makes the specification clear. Rewording of the glossary term should be envisaged.

Resolution: Rejected

As a consequence of another comment, we resolved to remove the 'orthography' attribute from the grapheme element because we do not see its value, and we recognize the benefits of supporting a mixture of script types within a grapheme element (which occurs in Japanese, for example).

Consequently, the double use you mention is no longer present in the specification, and the definition in the glossary may remain the same.

Email Trail:

Issue R101

From Mark Alexandre (2006-02-08):

In the Pronunciation Lexicon Specification (PLS) v1.0, as of Draft 31, the usage of the element tag <grapheme> reflects a misunderstanding of the meaning of the word "grapheme." The definition in the spec's Glossary of Terms is nevertheless quite accurate: "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element."

Thus, the letter "g" and the numeral "4" are both examples of widely used graphemes, as are the question mark "?" and the dollar sign "$". In the current draft of the PLS however, the so-called "grapheme" element is mistakenly applied to what in English is commonly called the "spelling." I believe this will lead to confusion unless the element is renamed. As to what it should be named, more on that below.

Before that, however, I wish to draw attention to an attribute of the grapheme element, the "orthography" attribute. The value of this attribute is, according to the draft spec, supposed to be a "script code" compliant with the ISO 15924 standard. The title of that standard is "Information and documentation — Codes for the representation of names of scripts." All of this naturally leads to the question: why not name this attribute "script" or "scriptcode"?

The word "orthography", derived from Greek roots meaning (roughly) "correct" and "writing", can present some ambiguity between two related meanings, but neither meaning is the same as "script."

Orthography can be used as a synonym for what most English speakers more commonly call the "spelling" of a particular word. Thus the examples "colour" and "color" are two different orthographies for the same word in the English language — the latter being the American orthography that was adopted following a set of spelling reforms which the rest of the English-speaking world declined to follow. The corresponding word in the French language is written with the orthography "couleur."

In a related sense, the word "orthography" can be used to refer to an entire system of conventions for writing, including such issues as spelling and punctuation, plus even such trivia as the direction of writing (such as left-to-right or vice versa), the spacing and/or divisions of words, etc. Some may also comprehend the word in this broader sense to include issues of penmanship or calligraphy, that is, the correct method to compose or draw the graphemes (or characters, or symbols, or glyphs, if you like) of the language. Note that, in this sense, conventions of orthography can differ even between cultures that use the same alphabet — even between the style guides used by differing editorial staffs in the same metropolis!

Neither meaning of "orthography" is to be conflated with the meaning of the word "script." The Greek, Latin and Cyrillic alphabets, as well as ancient cuneiform, Egyptian hieroglyphics, Chinese characters (Hanzi, or Kanji in Japan), etc., are all most precisely referred to as scripts, collectively. Perhaps the only alternative to "script" is the much more vague and expansive term, "writing system."

Finally, then, it would seem clear that the weight of the evidence clearly argues for the attribute in question to be called "script" (or "scriptcode" to be verbose), as indeed it is called in ISO 15924. Having thus liberated the word "orthography" from misapplication, we may consider that word a candidate for the element incorrectly labelled "grapheme."

In addition to "orthography," other candidates for the element now called grapheme in the draft spec might include "spelling" or "writing" or the almost comically long-winded "graphic presentation form." Any one of these four terms would be vastly preferable to "grapheme" (which, again, is simply wrong), but each does have certain shortcomings as well. I will briefly list the problems I am aware of.

The term "orthography" is almost ideal, except for its unavoidable connotation of "correctness." That is, were it ever desirable, for whatever reason, to list what may be deemed a "non-standard" written form of a word, then calling that an orthography for the word is misleading. On the other hand, obviously, if the PLS is specifically only intended to associate pronunciation with "correct" spellings (according to somebody's criteria of correctness), then orthography (in its narrower sense, vide supra) would be precisely accurate. An additional bonus is that this word is understood with pretty much the same meaning in other languages such as French and Spanish.

The term "spelling" is by far the more commonly used word by English speakers when referring to how to write out a particular word. Furthermore, it carries no connotation of correctness, since you can easily refer to "alternate spelling" or even "bad spelling." The downside, a minor one, is that the notion of spelling is strongly associated with alphabets; it is not at all clear what spelling means in the context of Chinese writing or similar non-alphabetic systems.

The term "writing" is just vague enough to mean anything you want. Since it applies to every aspect of, well, writing, it could be applied to any aspect of it. Put another way, its upside is its downside.

Finally, as for "graphic presentation form," or something similarly long and comically precise: one is tempted to wonder whether every XML parser out there really can handle sentence-length element names, as well as how many folks have access to an XML editor with a contextual auto-completion feature!

Just to throw out one last (off-the-wall) possibility, consider that the Spanish cognate of the English word orthography is "ortografía", which commonly gets shortened to just "grafía." [See for example, http://www.xtec.es/~faguile1/grafia/]. This suggests that this originally Greek root for "writing" suffices all by itself to communicate the idea we are talking about here. Perhaps an Anglicized (or Anglicised, if you prefer) coining such as "graphy" or a more internationally flavored "graphia" or "graphie" would actually be the least open to misuse and misinterpretation, since—to paraphrase Humpty-Dumpty— it would mean just what we chose it to mean.

Resolution: Rejected

We have summarised your comment and identified two requests.

#1. Rename the 'orthography' attribute to 'script' or 'scriptcode' within the grapheme element.

#2. Having liberated the name orthography, rename the grapheme element 'orthography' (preference), 'spelling', 'writing' or even 'graphia'.

We resolve to remove the 'orthography' attribute from the grapheme element because we do not see its value and recognize the benefits of supporting a mixture of script types within a grapheme element (which occurs in Japanese, for example). Consequently, the request to rename the attribute is inapplicable. The name 'orthography' is however liberated.

The element named 'grapheme' [1] almost always involves a sequence of graphemes. However, it is not a requirement for the element to contain a sequence of graphemes; only one grapheme (smallest orthographic unit) is permissible (minimum requirement). The grapheme or sequence of graphemes given in the 'grapheme' element corresponds to the phoneme or sequence of phonemes given in the 'phoneme' element. This is in accordance with the notion of "grapheme-to-phoneme conversion" (or, in layman's terms, letter-to-sound conversion). The name of the element 'grapheme' goes hand-in-hand with the name of the element 'phoneme', which has been borrowed from SSML 1.0 [2] because it has a similar usage.

Future revisions of PLS may wish to define the pronunciation of orthographic units larger than the grapheme, such as 'morpheme' or 'affix' (as is common in system internal lexicons). Grapheme, morpheme, affix, locution... are all terms that refer to orthographic units. A generic term such as 'orthography', 'spelling' or 'writing' etc. for this element seems inappropriate at this stage given that it would probably have to be changed to 'grapheme' in future. It is thus our opinion that the current name 'grapheme' is the best name for this element.

Please note that our reply is in accordance with two further public comments closely related to yours [3] [4].

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131//#S4.5
[2] http://www.w3.org/TR/speech-synthesis/#S3.1
[3] http://lists.w3.org/Archives/Public/www-voice/2006AprJun/0107.html
[4] http://lists.w3.org/Archives/Public/www-voice/2006AprJun/0120.html
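The grapheme-to-phoneme correspondence described in this resolution can be sketched as follows, reusing the commenter's "colour"/"color" example (the IPA transcription is illustrative):

```xml
<lexeme>
  <!-- each <grapheme> holds a sequence of graphemes (a written form);
       the <phoneme> holds the corresponding sequence of phonemes -->
  <grapheme>colour</grapheme>
  <grapheme>color</grapheme>
  <phoneme>ˈkʌlə</phoneme>
</lexeme>
```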

Email Trail:

Issue R103-2

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 2
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: Many
Owner: RI


Comment:

All examples show escapes in the code and IPA characters in comments. Please reverse this. It would be fine to say in one place that people could use escapes if it is difficult to type in characters (as you do at the end of 4.1) (though see the suggestion for overcoming that difficulty using a character picker, later). The current approach encourages the use of escapes and makes the examples difficult to read.

Resolution: Accepted

We accept your request. All the examples in the PLS specification will be changed by moving the numerical character references into the XML comments and the IPA Unicode characters into the PLS elements.

We originally followed the convention from the SSML 1.0 specification [1] (i.e. in Section 3.1.9 of SSML) to better support copy-and-paste from the spec to an SSML document. Two years later, this may no longer be necessary.

We will add an Informative Note mentioning the possible use of the "numerical character references" as shown in the comment.

Is the term "numerical character references" the correct one? We found a normative reference in [2]:

[[ C047 [I] [C] Escapes SHOULD only be used when the characters to be expressed are not directly representable in the format or the character encoding of the document, or when the visual representation of the character is unclear. ]]

[1] http://www.w3.org/TR/speech-synthesis/
[2] http://www.w3.org/TR/charmod/
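Under the accepted convention, an example would read as follows (a sketch with a hypothetical entry; the IPA characters appear in the element content and the numerical character references only in a comment):

```xml
<lexeme>
  <grapheme>tomato</grapheme>
  <!-- as escapes: t&#x259;&#x2C8;m&#x251;&#x2D0;t&#x259;&#x28A; -->
  <phoneme>təˈmɑːtəʊ</phoneme>
</lexeme>
```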

Email Trail:

Issue R103-3

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 3
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: General
Owner: FS


Comment:

Is a pronounciation lexicon embeddable in other formats, e.g. like MathML in HTML? Please address this question at someplace.

Resolution: Rejected

We believe this request is outside the current scope of the PLS specification. That stated, we believe that adding support for an embedded lexicon within SSML and SRGS documents is valuable and should be considered for future versions of those specifications.

Email Trail:

Issue R103-5

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 5
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: -
Location in reviewed document: General
Owner: RI


Comment:

Thank you for specifying the encoding on the XML declaration throughout!

Resolution: Accepted

Fine.

Email Trail:

Issue R103-7

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 7
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: General
Owner: RI


Comment:

Please make it clearer, throughout the document, when talking about multiple instances of grapheme or phoneme, whether this is useful for speech synthesis or speech recognition.

Resolution: Accepted

Email Trail:

Issue R103-8

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 8
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, example
Owner: RI


Comment:

Surely "La vita e' bella" should be "La vita è bella" ?

Resolution: Accepted

We accept your correction and will fix the specification accordingly.

Email Trail:

Issue R103-9

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 9
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, example
Owner: RI


Comment:

We expected an xml:lang attribute around the phrase "La vita e' bella".

Resolution: Rejected

In this example, a TTS synthesizer is rendering the text using the voice of an American English speaker (xml:lang="en-US"). The SSML specification contains the following warning about changing the language indication in midsentence [1]:

[[ xml:lang is a defined attribute for the voice, speak, p, and s elements. For vocal rendering, a language change can have an effect on various other parameters (including gender, speed, age, pitch, etc.) which may be disruptive to the listener. There might even be unnatural breaks between language shifts. For this reason authors are encouraged to use the voice element to change the language. xml:lang is permitted on p and s only because it is common to change the language at those levels. ]]

and continues:

[[ Specifying xml:lang does not imply a change in voice, though this may indeed occur. When a given voice is unable to speak content in the indicated language, a new voice may be selected by the processor. To avoid a potential incongruity, the language change was not indicated in this example. ]]

If you believe that the language in the SSML 1.0 specification differs in intent from the xml:lang definition in the XML 1.0 specification [2], the Voice Browser Working Group is currently collecting requirements for SSML 1.1.

[1] http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.1.2
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag

Email Trail:

Issue R103-10

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 10
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, 2nd example
Owner: RI


Comment:

The quotation marks have been removed in this version of the example. Is that on purpose, or an omission?

Resolution: Accepted

We accept your correction and will fix the specification accordingly.

Email Trail:

Issue R103-11

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 11
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1
Owner: RI


Comment:

It would be nice (though not essential) to include a short and simple PLS document at the end of section 1.1 just to complete the picture for the user. A simple example will probably be easy enough to understand on its own.

Resolution: Accepted

We will add an example lexicon to Section 1.1 [1].

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S1.1

Email Trail:

Issue R103-12

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 12
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, last para
Owner: RI


Comment:

s/then/than/

Resolution: Accepted

We accept your correction and will fix the specification accordingly.

Email Trail:

Issue R103-13

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 13
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.5
Owner: RI


Comment:

"Example orthographies include Romaji, Kanji, and Hiragana"

Are Romaji, Kanji and Hiragana separate orthographies, or just different scripts in the Japanese orthography? Certainly, although the examples in the spec usually use only one or the other of these alphabets per <grapheme>, mixtures are more usual for Japanese text.

Resolution: Accepted

Our proposal is to delete the sentence you mentioned.

An example of a mixture will be added to the PLS spec in response to issue R103-25.

Email Trail:

Issue R103-14

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 14
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 2, 2nd para
Owner: RI


Comment:

Is there an online location that repeats the information in the (hardcopy) IPA handbook? Is it the same information as is found at http://www.arts.gla.ac.uk/ipa/ipachart.html? If so, it might be helpful to include a note pointing to that.

Resolution: Rejected

To our knowledge, there is no online version of the IPA handbook so we must normatively reference the printed book. The link you mention is a page on the web that describes how to acquire the printed copy.

Email Trail:

Issue R103-15

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 15
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 2, 2nd para
Owner: RI


Comment:

In addition to IPAUNICODE1 and IPAUNICODE2, please point to the IPA Character Picker [http://people.w3.org/rishida/scripts/pickers/ipa/]. This was recently updated against the information on the IPA homepage, and allows people to easily create short strings of IPA text for insertion into their documents. (And will probably also be useful for creating this spec.)

Resolution: Accepted (w/modifications)

We asked Ian Jacobs, the Comm Team head, about the reference to software from a recommendation track specification.

He replied:

[[ Link to software from the group's public home page, not from the spec. Links to software from specs are likely to become outdated rapidly. ]]

On the basis of this comment, we decided to reject your request and instead add a link to the IPA Character Picker tool on the Voice Browser home page.

We double-checked with Ian Jacobs after your comments; see his answer:

[[ Thank you for the summary info. I think I still prefer the indirection. Here are some reasons why:

a) Because a different WG is managing the referenced page, there is an increased likelihood that they will make changes to their page(s) without realizing the impact of referring pages (in this case, TR documents). It is also more difficult for the Voice WG to track changes made by other groups.

b) In a year, there may be a tool far superior to picker, or picker may no longer work. Rather than have the TR specification refer to something out of date, the indirection allows the Voice WG to maintain (if it chooses) an up-to-date list of useful resources.

Therefore, I still prefer the indirection, especially for an informative reference. If the group feels very strongly that the direct reference will be a significant improvement to the specification, then perhaps including both links will be acceptable: "Try picker and <a>other tools</a>." or something like that. ]]

We will introduce a note about the presence of tools and an invitation to visit the VBWG page. On the VBWG page we will add the link to the IPA Character Picker.

Email Trail:

Issue R103-16

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 16
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 3
Owner: RI


Comment:

Please use some markup to clarify the locations of the normative usages of "must", "should", "must not" etc. in the text.

Resolution: Accepted

We will adopt the style you suggested for the RFC 2119 must/should/etc. keywords.

[[Check out: http://www.w3.org/2001/06/manual/#RFC ]]

Email Trail:

Issue R103-17

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 17
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4, table
Owner: RI


Comment:

Description says "Meta data container element"

Description for <meta> is misleading: it is not a container, but empty. For the typical reader, saying it is the same as HTML would be helpful.

May be better to say 'element containing meta data'

Resolution: Accepted

We accept the wording you suggested.

Email Trail:

Issue R103-18

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 18
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 2nd para
Owner: RI


Comment:

"which indicates the pronunciation alphabet".

Since the alphabet setting can be overridden on a phoneme element, the text should say "which indicates the default pronunciation alphabet".

Resolution: Accepted

We accept the wording you suggested.

Email Trail:

Issue R103-19

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 19
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 4th para
Owner: RI


Comment:

Please clarify why lexicons are separated by language?

Resolution: Deferred

We imagine that a future version of PLS will be multilingual. However, for the first version, we'd prefer to defer this request.

We believe that an implementation of a monolingual lexicon is simpler and perhaps more efficient. SSML and SRGS both support multiple lexicons for defining pronunciations for multiple languages. Para 5 in section 4.1 of PLS 1.0 LCWD [1] already says this. If you feel this is not clear, please suggest alternative wording.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.1

Email Trail:

Issue R103-20

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 20
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 4th para
Owner: RI


Comment:

s/RFC 3066/RFC 3066 or its successor/

(Note that 'its successor' has already been approved by the IETF and is just pending publication.)

Resolution: Accepted

We accept your request and will make the following changes:

* In fourth para, Section 4.1 [1]
We will replace 'RFC 3066 [RFC3066]' with 'IETF Best Current Practice 47 [BCP47]'

* In the references in Section 6.1 [2]
We will add a normative reference to [BCP47].

The reference will point to http://www.rfc-editor.org/rfc/bcp/bcp47.txt.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.1
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S6.1

Email Trail:

Issue R103-21

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 21
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.3, 1st example
Owner: RI


Comment:

How is dc:language="en-US" meant to be interpreted if it appears in a metadata element? How does it affect the xml:lang declaration on PLS elements?

Resolution: Accepted

We will remove dc:language from the example in Section 4.3 [1].

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.3

Email Trail:

Issue R103-22

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 22
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.3, 2nd example
Owner: RI


Comment:

It would be helpful to explain why this lexicon, labelled as xml:lang="it" contains English graphemes.

Resolution: Accepted

The example you mention is not the 2nd example in Section 4.3, but the example in Section 4.4.

We can add a clarification either before the example or in a comment. The example shows an Italian lexicon containing the loan word "file", which in technical discussions carries the same meaning as in English. This is distinct from the homograph "file", the plural form of "fila", meaning "queue".
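Such a clarification could accompany an entry along these lines (a hypothetical sketch, not the actual Section 4.4 example; the transcription is approximate):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="it">
  <lexeme>
    <!-- English loan word "file", as used in Italian technical discussions -->
    <grapheme>file</grapheme>
    <phoneme>ˈfail</phoneme>
  </lexeme>
</lexicon>
```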

Email Trail:

Issue R103-23

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 23
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.3, 2nd example
Owner: RI


Comment:

Unless there is some particular reason, it is better (and potentially less confusing for the reader) to use "it" rather than "it-IT".

Resolution: Accepted

We will use "it" instead of "it-IT". This example is not specific to the Italian spoken in Italy.

Email Trail:

Issue R103-25

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 25
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.5, 3rd bullet
Owner: RI


Comment:

"Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji), and phonemic spelling systems e.g. Katakana or Hiragana for representing the orthography of a word or phrase;"

The fact that Japanese mixes scripts is one thing, but i think the point here is that, for example, one sometimes writes the same word using hiragana and sometimes with kanji, according to preference or circumstance.

A good example might be 'shouyu' (soy sauce), which can be written using either kanji or hiragana: kanji 醤油; hiragana: しょうゆ

[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]

Resolution: Accepted

We accept your comment and will change the third bullet in Section 4.5 [1]. The proposed text, which includes an inline example of mixed scripts, is the following:

[[ Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji), and phonemic spelling systems (Katakana or Hiragana) for representing the orthography of a word or phrase, and such mixture sometimes has several variations as in kana suffixes following kanji stems (Okurigana) for example "okonau" (行なう vs. 行う); ]]

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.5
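A lexeme covering such okurigana variants might share one pronunciation across both spellings, along these lines (a sketch for illustration only; the transcription is approximate):

```xml
<lexeme>
  <!-- "okonau" (to carry out), written with either okurigana variant -->
  <grapheme>行なう</grapheme>
  <grapheme>行う</grapheme>
  <phoneme>okonaɯ</phoneme>
</lexeme>
```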

Email Trail:

Issue R103-26

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 26
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.5, 3rd para
Owner: RI


Comment:

"In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may"

Here is an example of where it might be good to distinguish between TTS and ASR. You could say: "In order to remove the need for duplication of pronunciation information to cope with the above variations during text-to-speech, the <lexeme> element may contain"

Resolution: Rejected

The following text appears in Section 4.5 [1]:
"In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may contain more than one <grapheme> element to define the base orthography and any variants which should share the pronunciations."

We believe that there is general utility, beyond text-to-speech, for supporting multiple graphemes. To illustrate one such case, the following lexicon might be used for US English:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
alphabet="ipa" xml:lang="en-US">

<lexeme>
<grapheme>judgment</grapheme>
<grapheme>judgement</grapheme>
<phoneme>ˈʤʌʤ.mənt</phoneme>
</lexeme>
<lexeme>
<grapheme>fiancé</grapheme>
<grapheme>fiance</grapheme>
<phoneme>fiˈɑ̃ːn.seɪ</phoneme>
<phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
</lexeme>
</lexicon>


This lexicon could be referenced in text-to-speech documents, as has been noted,

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xml:lang="en-US">

<lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>

<p> In the judgement of my fiancé, Las Vegas is the best place for a honeymoon.
I replied that I preferred Venice and didn't think the Venetian casino was an
acceptable compromise.</p>
</speak>


but also in speech recognition grammars,

<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
xml:lang="en-US" root="movies">

<lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>

<rule id="movies" scope="public">
<one-of>
<item>Terminator 2: Judgment Day</item>
<item>My Big Fat Obnoxious Fiance</item>
<item>Pluto's Judgement Day</item>
</one-of>
</rule>
</grammar>


Because this feature is useful for both TTS and ASR, we reject your proposal to restrict the wording to text-to-speech.

Please indicate whether you are satisfied with the VBWG's resolution, whether you think there has been a misunderstanding, or whether you wish to register an objection.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.5

Email Trail:

Issue R103-27

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 27
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5
Owner: RI


Comment:

What is the value of the orthography attribute?

We see no value, and its purpose is not expressed in the text.

Resolution: Accepted

We decided to remove the "orthography" attribute. We also do not see its value and recognize the benefits of supporting a mixture of script types within a grapheme element.

Email Trail:

Issue R103-28

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 28
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5, 2nd example
Owner: RI


Comment:

There are a number of problems with the use of the orthography attribute in this example for Japanese:

The kana label is incorrect - it should say hira, since this is hiragana, not katakana.

There is currently no label available for the extremely common form of Japanese words that mixes both kanji and hiragana, eg. 混じる 'to mix' (contains one kanji and two hiragana characters).

Is nɪhɒŋɒ an accurate phonemic/phonetic transcription?

[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]

Resolution: Accepted

As we will remove the "orthography" attribute, this comment reduces to the accuracy of the transcription in the 2nd example of Section 4.5 [1].

We are looking for experts in IPA for Japanese to check the transcription; alternatively, we will change the example to a simpler one.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.5

Email Trail:

Issue R103-29

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 29
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.7
Owner: RI


Comment:

What does transformation mean? Is it the first example, W3C? If so, please clarify briefly.

Resolution: Accepted

We will change the word 'transformations' to 'substitutions'.

Email Trail:

Issue R103-31

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 31
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.3
Owner: RI


Comment:

We don't see any value in the additional examples in 5.3, since all examples are instances of homographs or homophones (or expansions, which are not referred to here). Why not skip this and go straight into 5.4?

Resolution: Accepted (w/modifications)

The examples in section 5.3 [1] do not strictly contain homophones. A pair of homophones is two different *words* (thus, with two different *meanings*) that have the same pronunciation. Each example in 5.3 contains one word that can be written in different ways and that retains the same meaning and pronunciation.

We think that "Multiple Orthographies" is a common phenomenon and worth presenting in a separate section with examples and explanations.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.3
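The pattern discussed in Section 5.3 binds several spellings of one word to a single pronunciation, for instance (a sketch, not taken from the specification; transcription approximate):

```xml
<lexeme>
  <!-- one word, two orthographies, one pronunciation -->
  <grapheme>colour</grapheme>
  <grapheme>color</grapheme>
  <phoneme>ˈkʌl.ə</phoneme>
</lexeme>
```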

Email Trail:

Issue R103-32

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 32
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.4
Owner: RI


Comment:

I think the Smyth example just confuses things at the beginning of the section and in the example. It is an example of something that is both a homograph and homophone at the same time - for which there appears to be no good solution. I would just add a reference to the fact that such things exist after the example in 5.4, and perhaps use one of the examples in 5.3 rather than the Smyth one.

Resolution: Accepted (w/modifications)

We can clarify Section 5.4 [1] by splitting the example into two parts: first the seed/cede examples, then the Smyth/Smith one. We believe that PLS is most valuable for addressing the difficult cases that arise in human speech, and we see value in maintaining complex examples to illustrate how an author might address them.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.4
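The first part of the split could pair the two heterographs in separate lexemes that share one pronunciation, e.g. (a sketch; transcription approximate):

```xml
<!-- homophones: two different words, same pronunciation -->
<lexeme>
  <grapheme>cede</grapheme>
  <phoneme>siːd</phoneme>
</lexeme>
<lexeme>
  <grapheme>seed</grapheme>
  <phoneme>siːd</phoneme>
</lexeme>
```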

Email Trail:

Issue R103-33

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 33
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.4, 2nd para
Owner: RI


Comment:

"Pronunciations are explicitly bound to one or more orthographies within a <lexeme> element so homophones are easy to handle. See the following examples:"

This should say, "homophones are easy to handle for text-to-speech". They are not easy to handle in an ASR context, and there should be an informative note here like in 5.5, but referring to ASR rather than TTS!

Resolution: Accepted (w/modifications)

The two main uses of PLS are with SRGS (ASR) and SSML (TTS). In both cases, PLS entries are applied to graphemes to define the phonemes to be recognized (for ASR) or pronounced (for TTS). There are other uses of PLS, for instance in dictation or unconstrained ASR, which might not be covered by the current specification.

We accept this request and will expand Section 1.2 with the following wording:

"PLS entries are applied to the graphemes inside SRGS grammar rules to convert them into the phonemes to be recognized. See the example below and the example in Section 1.3 for a PLS document used for both ASR and TTS.
There might be other uses of PLS, for instance in a dictation system or for unconstrained ASR, which might be beyond the scope of this specification."

Email Trail:

Issue R103-34

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 34
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.5 example
Owner: RI


Comment:

Shouldn't the second 'refuse' be pronounced with a short e and a non-lengthened u and final z? (Note also that the comment is superfluous.)

There are other instances where the phonemic transcription seems strange (eg. use of 'e'). Please have them checked by phoneticians who are familiar with the languages.

Resolution: Accepted

We will check all the pronunciation transcriptions with phoneticians. For the example in Section 5.5 [1], we found this IPA transcription in both the Cambridge online dictionary [2] and in the book version of the Longman Pronunciation Dictionary by J.C. Wells; both resources show IPA pronunciations. Merriam-Webster (more US-centric) [3] uses a different pronunciation alphabet but gives a very similar pronunciation to the ones in the examples.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.5
[2] http://dictionary.cambridge.org/
[3] http://www.m-w.com/cgi-bin/dictionary?book=Dictionary

Email Trail:

Issue R103-35

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 35
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.5
Owner: RI


Comment:

This whole section seems strangely biased.

"In both cases the processor will not be able to distinguish when to apply the first or the second transcription."

The above statement only applies to the text-to-speech author. For ASR, this is a perfectly valid approach, and their solution will cause no problems.

"the current version of specification is not able to instruct the PLS processor how to distinguish the two pronunciations "should read "the current version of specification is not able to instruct the PLS processor *performing text-to-speech* how to distinguish the two pronunciations".

Resolution: Accepted

The resolution of issue #36 will significantly change Section 5.5 [1]. We will propose new wording and we will welcome your review.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.5

Email Trail:

Issue R104

From Janina Sajka (2006-04-26):

On behalf of the WAI Protocols and Formats Working Group action:

http://www.w3.org/2006/03/29-pf-minutes.html#action01

PF supports the use of pronunciation lexicons because they have proven effective mechanisms to support accessibility for persons with disabilities as well as greater usability for all users. We support:

http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

However, we would like to see pronunciation lexicons adopted more widely across the multimodal web as an available mechanism for any textual content that might be rendered through TTS by some user agent. We should not, in other words, conceive of this mechanism only in terms of voice browsers. It is not difficult to imagine how user agents might voice more than just SSML and SRGS markup. Indeed, this is already the case for persons who are blind using screen readers.

Screen readers have provided pronunciation lexicons for several decades now because correct pronunciation is a simple, highly effective mechanism for advancing comprehension. A W3C-defined mechanism could do the same for web content and allow content providers a standard way to ensure domain-specific terms will be correctly rendered by TTS engines where they otherwise would not have been. This mechanism could be used to pronounce names correctly (like mine), including geographic variants (like the capital city of the U.S. State of South Dakota). Other examples abound.

Resolution: Accepted

A possible resolution would be to add a paragraph to Section 1 [1] that describes the scenario you mentioned.

We think it would be best if you proposed a paragraph, which we will then review.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S1

Email Trail:

Issue R105-1

From Deborah Dahl (2006-05-15):

The W3C Multimodal Interaction Working Group has reviewed the Pronunciation Lexicon Specification [1] and has prepared the following comments. These are not specific requests for changes, but comments on how the PLS might fit into multimodal applications.

Usually PLS is used within a voice modality component and is not exposed as its own modality component. However, a TTS component (e.g. a prompt) might want to expose PLS events, for example when loading a lexicon module or when a specific pronunciation is unavailable. These events would be generated by the modality component that interprets and applies the PLS.

PLS might be useful for spelling correction as part of a multimodal application, but this isn't seen as an important use case. So for most purposes, PLS is transparent to MMI.

Pronunciation lexicons might be exposed through the Delivery Context Interfaces (DCI)[2]. In principle, you could use this to set the default lexicon and other configuration properties. The DCI models properties as a hierarchy of DOM nodes and could be used to expose capabilities and the means to adjust the corresponding properties, e.g. which languages are supported by the speech synthesiser, the default pitch, rate and many other properties.

Otherwise, no specific comments.

[1] http://www.w3.org/TR/pronunciation-lexicon/
[2] http://www.w3.org/TR/2005/WD-DPF-20051111/

Resolution: Deferred

As regards DCI: because this is a first-generation document, the immediate focus has been on providing the functionality necessary for existing specifications. We think DCI integration will be considered for future versions, which will be asked to serve broader classes of applications and different architectures.

As for spelling correction, if MMI has other requirements for specific uses, we can collect them for a future version of PLS.

Email Trail:

Issue R105-2

From Deborah Dahl (2006-05-15):

Current synthesizers are weak with respect to contextualized pronunciations and it is desirable that PLS provide a convenient means for application developers to work around that, i.e. more convenient than providing explicit pronunciations in SSML for each occurrence of a word that would otherwise be mispronounced.

Resolution: Accepted

This will be addressed by a new "role" attribute, to be added in the next draft.

Email Trail:

Issue R106-1

From Kurt Fuqua (2006-11-20):

The document does not explicitly state whether the pronunciations in the lexicon are to be phonemic or phonetic. The difference is significant. The tag names (<phoneme>) and the examples in the document imply that the representation is phonemic. In an example (4.6), the word “huge” is transcribed using the phonemic transcription “hjuːʤ” rather than the phonetic transcription “çjuːʤ”.

I believe that all the pronunciations should be given phonemically rather than phonetically. First, a phonetic transcription is generally beyond the capability of those not trained in linguistics. Even a trained linguist would have a difficult time creating consistent phonetic transcriptions. There is a second and far more compelling theoretical reason why it has been standard practice for lexicons to be transcribed phonemically. Phonology rules need to be applied to a phonemic transcription in order to render a phonetic transcription for a sentence to be synthesized or spoken. This requires the underlying phoneme representation. If the phoneme representation were not given, one would first have to work backwards to determine the phonemic transcription before the phonology rules could be applied. Several of the phonology rules apply across word boundaries. Therefore a phonetic transcription of individual words is counter-productive.

Thus, I recommend that the specification be explicit that the transcriptions are to be phonemic, not phonetic, and that this be required.

Resolution: Deferred

The VBWG understands your request as the ability to indicate whether an IPA pronunciation is intended to be phonemic or phonetic. We acknowledge that its resolution might require extending the PLS specification.

The IPA alphabet allows the user to specify either detailed (allophonic) or broad (phonemic) transcriptions, so it is open to diverse uses. In the meantime, the SSML 1.1 specification is addressing a number of related issues, for instance the creation of alternate pronunciation alphabets to be registered as standard alphabets, specifically to address issues related to the internationalization of SSML.

We think it would be better to address your request in a future release of the PLS specification, in conjunction with, and leveraging, the results of the SSML 1.1 work.

Email Trail:

Issue R106-3

From Kurt Fuqua (2006-11-20):

The IPA is a well developed and useful representation. However it does contain some significant ambiguities. I believe that the standard should recommend certain normalized forms.

Several consonant phonemes can be represented using alternate symbols under the official guidelines. This ambiguity means that comparing IPA symbols becomes quite complex. For example, the very common IPA symbol #110 is represented as /ɡ/ (x0261) and is logically equivalent to #210 /g/ (x0067). There are many such ambiguous IPA symbols. There are also several obsolete IPA symbols which are still frequently used (e.g. #214). (This obsolete symbol is even included as an example in the draft.)

I recommend that a recommendation be made for the normalization of these ambiguous and obsolete IPA symbols within the PLS.

Resolution: Accepted

We understand your request as a proposal to add an informative note in Section 2 [1] pointing out that Appendix 2 of the "Handbook of the International Phonetic Association" (Cambridge University Press), in Tables 3-6, describes equivalences among IPA symbols, symbols that are not IPA usage, and symbols that were once recommended but are no longer. The user of PLS should be made aware of this.

The proposal in [2] was very interesting and rich in examples, but too descriptive for the PLS specification. It would be very useful for a tutorial document on PLS; for the specification itself, a concise note seems more appropriate.

After checking Section 2 of PLS [1] again, where Appendix 2 of [IPAHNDBK] is already referenced in the second paragraph, the group decided to add the following informative note:

"Note that there are peculiarities in the IPA alphabet which might have implications for implementers, for instance equivalent, withdrawn and superseded IPA symbols; see Appendix 2 of [IPAHNDBK] for further details."

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S2
[2] http://lists.w3.org/Archives/Public/www-voice/2007JulSep/0017.html
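As a concrete instance of the note, a lexicon author should prefer the recommended LATIN SMALL LETTER SCRIPT G (U+0261) over ASCII g (U+0067) in transcriptions, roughly as follows (a sketch; transcription approximate):

```xml
<lexeme>
  <grapheme>go</grapheme>
  <!-- ɡ here is U+0261, the recommended IPA symbol, not ASCII g (U+0067) -->
  <phoneme>ɡəʊ</phoneme>
</lexeme>
```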

Email Trail:

Issue R106-4

From Kurt Fuqua (2006-11-20):

As its name implies, the IPA was created primarily as a phonetic alphabet, not a phonemic alphabet. It can be used for the representation of phonemes but unfortunately linguists often transcribe the identical language with slightly different IPA symbols and diacritics. There should be recommendations to normalize the transcription of phonemes using IPA symbols.

The central problem is that a phoneme is a set of allophones. IPA can transcribe allophones in a way that is generally unambiguous. The difficulty is selecting which symbol to use to represent the set. For example, some linguists would transcribe the English word ‘heat’ as /hiːt/ others would use /hit/. In English, vowel length is not contrastive, although the vowel /i/ is always long. The question is whether to include a diacritic, or suprasegmental such as length, if that feature is not contrastive in the language. This issue was resolved under SLAPI with the following three normalization guidelines:

1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for the phoneme in that language.

2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.

3) The phoneme symbol must be unique within the particular language.

While these do not resolve every issue of phonemic representation, they do resolve most such issues and allow for a standard normalization.

I recommend that the same phoneme normalization guidelines be adopted.

Resolution: Accepted (w/modifications)

Our resolution was to consider the addition of an informative note, but instead of including the guidelines we would prefer to add a reference to the SLAPI documentation. An alternative resolution might be to reference similar guidelines from the IPA Handbook, which is already cited by the specification, but we were unable to find any there.

The Informative Note below will be added in the PLS specification:
"When IPA symbols are used to represent the phonemes of a language, there can be an ambiguity concerning which allophonic symbol to select to represent a phoneme. Note that this may result in inconsistencies between lexicons which were composed for the identical language."

The group finally decided that the reference to the Scalable Language API (SLAPI) documentation [1] will not be part of the informative note.

[1] https://sourceforge.net/projects/slapi/

Email Trail:

2.2 Technical Errors

Issue R103-4

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 4
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: Schema
Owner: FS


Comment:

The description of the schema in the text does not match the schema itself in various places.

For (one) example: the text defines a sequence of the meta element, the metadata element, and a sequence of lexeme elements: meta.elt.type*, metadata.elt.type*, lexeme.elt.type*, but the schema says (lexeme.elt.type | meta.elt.type | metadata.elt.type)*

Resolution: Accepted

We agree that the schema needs to be fixed. Our intention was to enforce a strict order: meta elements, then the metadata element, then lexeme elements.

After reviewing this issue, our intention is to allow only one metadata element, because it can contain many subelements.
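In XML Schema terms, the intended content model might be sketched as follows (hypothetical; not the actual PLS schema text, and the type name is an assumption):

```xml
<xs:complexType name="lexicon.elt.type">
  <xs:sequence>
    <!-- strict order: meta*, at most one metadata, then lexeme* -->
    <xs:element ref="meta" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="metadata" minOccurs="0" maxOccurs="1"/>
    <xs:element ref="lexeme" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```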

Email Trail:

Issue R103-6

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 6
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E/S
Location in reviewed document: General
Owner: RI


Comment:

We have not seen the IPA Handbook, so cannot verify, but the examples in the spec use an apostrophe for primary word stress and a colon for vowel lengthening (eg. 5.1 example), whereas there are IPA characters for this, ˈ and ː.

eg. Newton is transcribed 'nju:tən rather than ˈnjuːtən

Section 2 does not mention alternate forms. Are the examples correct?

[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]

Resolution: Accepted

We accept your comment. All the examples will be revised to use consistent IPA characters in the transcriptions.
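For example, the Newton entry would use the IPA primary stress mark (U+02C8) and length mark (U+02D0) rather than the ASCII apostrophe and colon (a sketch of the revised transcription):

```xml
<lexeme>
  <grapheme>Newton</grapheme>
  <!-- ˈ is U+02C8 and ː is U+02D0, not ASCII ' and : -->
  <phoneme>ˈnjuːtən</phoneme>
</lexeme>
```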

Email Trail:

2.3 Requests for Change to Existing Features

Issue R103-24

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 24
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5
Owner: RI


Comment:

In the glossary of terms you define 'grapheme' as "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element." but then you use it as an element name to label content that almost always involves a *sequence* of graphemes.

Please find a better name for the element. How about 'text' or 'phrase' ?

Resolution: Rejected

The observation that the element named 'grapheme' [1] almost always involves a *sequence* of graphemes is quite true. However, it is not a requirement for the element to contain a *sequence* of graphemes; only one grapheme (the smallest orthographic unit) is permissible (the minimum requirement). This is why the element is named 'grapheme' rather than 'graphemes'. The grapheme or sequence of graphemes given in the 'grapheme' element corresponds to the phoneme or sequence of phonemes given in the 'phoneme' element. This is in accordance with the notion of "grapheme-to-phoneme conversion" (or, in layman's terms, letter-to-sound conversion). The name of the element 'grapheme' goes hand-in-hand with the name of the element 'phoneme', which has been borrowed from SSML 1.0 [2] because it has a similar usage.

Future revisions of PLS may wish to define the pronunciation of orthographic units larger than the grapheme, such as 'morpheme' or 'affix' (as is common in system internal lexicons). Grapheme, morpheme, affix, locution... are all terms that refer to orthographic units. A generic term such as 'text' or 'phrase' for this element seems inappropriate at this stage given that it would probably have to be changed to 'grapheme' in future.

It is thus our opinion that the current name 'grapheme' is the best name for this element.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.5
[2] http://www.w3.org/TR/speech-synthesis/#S3.1

Email Trail:

2.4 New Feature Requests

Issue R100-1

From Paul Bagshaw (2006-02-03):

Comments made below refer to the 31 January 2006 publication of the PLS last call working draft: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

They are presented here for your consideration and will be openly discussed during the WG teleconference scheduled for 9 February (and probably for some time to come).

The principal comment is related to the way homographs are handled by PLS. The homographs problem has already been subject to some discussion (see http://lists.w3.org/Archives/Member/w3c-voice-wg/2005Sep/0124.html and http://lists.w3.org/Archives/Member/w3c-voice-wg/2005Nov/0005.html) and has been deferred up to the LCWD. I propose that it is now necessary to address this problem before moving on to the next stage and hope that the comments below will initiate discussion as to how to resolve the problem.

With regards,
Paul Bagshaw

1. The homograph (heterophone) problem.

PLS 1.0 aims to address only the most important aspects of the requirements document (http://www.w3.org/TR/lexicon-reqs/).

* Section 4.9.2 of the LCWD stipulates:

"If more than one <lexeme> contains the same <grapheme>, all their pronunciations will be collected in document order and a TTS processor must use the first one in document order that has the prefer attribute set to "true". If none of the pronunciations has prefer set to "true", the TTS processor must use the first one in document order."

The requirement 4.2 classes handling of homophones (heterographs) as MUST HAVE (for ASR), but in contrary, requirement 4.4 for handling homographs (heterophones) is classed only as NICE TO HAVE (for TTS), and has thus not been considered as essential to the LCWD. It’s a shame that handling homographs is not also classed as MUST HAVE. In its current status, PLS just won’t be used for applications exploiting TTS where homographs can occur. Many, if not ALL, applications for many languages depend on homograph disambiguation. An application MUST HAVE a means of indexing unambiguously every pronunciation in the dictionary. It is not possible in the current version of the PLS proposal.

It must be possible to associate some additional information (other than the lexeme orthography, <grapheme>) with each pronunciation.

For example, in a simple case, associating a grammatical category to a particular pronunciation in a lexeme is sufficient to distinguish ‘does’ (verb, to do) from ‘does’ (noun, plural of doe). Consider the more complex case of reading an address book full of proper nouns (place and people names) in which the pronunciation of a person’s name depends upon the area from which they come (in the same country speaking the same language – yes, it happens at least in French where final consonants may be pronounced for names originating from the west and south of France, but not elsewhere in the country). The application may have knowledge of the origin of the request for information and instruct the TTS to reply with an according pronunciation. Note that this second example is independent of part-of-speech tags (or grammatical categories) and sentence semantics.

The nature of the additional information is open-ended and subject to (too) much discussion (semantics, part-of-speech tags) since there is no standard representation (there’s no universal set of multilingual grammatical categories, for example, and there never will be since there is no universal grammar). The information required can also be application-dependent (as illustrated above).

Proposition 1: add an interpret-as attribute to the <phoneme> and <alias> elements.

The problem of having multiple interpretations for a given orthography is equally addressed by the SSML <say-as> element. The proposition here is therefore to add the ‘interpret-as’ attribute with the same values as those of the SSML <say-as> element. <say-as interpret-as="noun">does</say-as> could thus be used to index the lexeme in:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>does</grapheme>
    <phoneme interpret-as="verb">dez</phoneme>
    <example>He does not like it.</example>
    <phoneme interpret-as="noun">dowz</phoneme>
    <example>The does hide behind the trees.</example>
  </lexeme>
</lexicon>


(sorry if the IPA phonemes are inexact)

The value of the ‘interpret-as’ attribute in the PLS element must exactly match that of the SSML <say-as> ‘interpret-as’ attribute when it is to be rendered by a TTS system.

The secondary consequences of this proposition are: (1) the editor of the SSML and PLS files controls the content of the interpret-as values; (2) any future standardisation of SSML interpret-as values can be tied in with PLS.

There is an analogy to this proposed attribute in the <grapheme> element; the ‘orthography’ attribute associates additional information with the <grapheme> content.

Resolution: Accepted

We accept your proposal to add an attribute in the PLS as a way of uniquely matching homographs to pronunciations.

Email Trail:

Issue R100-2

From Paul Bagshaw (2006-02-03):

2. The homophone (heterograph) problem

* Section 5.4 of the requirements document refers to “pronunciation preference” and has been successfully accommodated in the PLS by the ‘prefer’ attribute on the <phoneme> and <alias> elements. However, ASR currently has no means of indexing a unique orthography from a particular pronunciation. The following requirement is surprisingly not present:

The pronunciation lexicon markup must enable indication of which orthography is the preferred form for use by speech recognition where there are multiple orthographies for a lexicon entry. The pronunciation lexicon markup must define the default selection behaviour for the situations where there are multiple orthographies but no indicated preference.

If PLS is to be used equitably in ASR and TTS environments, then functionality available for grapheme-to-phoneme mapping should equally be made available for phoneme-to-grapheme mapping (and vice versa).

Proposition 2: add a prefer attribute to the <grapheme> element.

For example, spelling variations could thus be marked with a preference for dictation applications.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme prefer="true">theater</grapheme>
    <grapheme>theatre</grapheme>
    <phoneme>'θɪətər</phoneme>
    <!-- IPA string is: "&#x03B8;&#x026A;&#x0259;t&#x0259;r" -->
  </lexeme>
</lexicon>

Resolution: Deferred

PLS 1.0 was mainly conceived to address issues for SSML and SRGS. Your proposal extends the use of PLS to ASR for dictation.

Although your proposal is very interesting, we prefer to defer this request to a future version of PLS.

Email Trail:

Issue R102

From Al Gilman (2006-03-15):

1. Provide better discrimination in determining pronunciation preference.
The specification provides for one static 'preferred' pronunciation [2] per lexeme, which may have multiple graphemes associated with it, but none of them is at all aware of markup in the SRGS or SSML documents being processed.
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.6

1.1 problems with this situation
This limitation, which means that homographs cannot be given any sort of pronunciation selectivity, should not be accepted.
[3] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.5

a. It defeats the use of the Pronunciation Lexicon Specification in the production of talking books as audio media [4]. This is an important use case for access to information by people with disabilities, print disabilities in this case.
[4] http://lists.w3.org/Archives/Public/www-voice/2001JanMar/0020.html

b. It defeats the intended interoperability of lexicons between ASR and TTS functions [5]. Lexicons will serve ASR best with many pronunciations, and TTS best with few, unless the many pronunciations can be marked up as to when to use which.
[5] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S1.3

c. It fails to interoperate with the intelligence already in SSML in the say-as element [6].
[6] http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
While many functional limitations have been incorporated in the Voice Browser specifications in order to reach a platform of well-supported common markup, it does not seem to make sense to have say-as capability in SSML with QName ability to indicate terms defined outside the corpus of Voice Browser specifications, and not use this information in determining which pronunciation is preferred when.

1.2 opportunities to do better
As suggested above, there would seem to be a ready path to resolving homographs and other preferred-pronunciation subtleties by use of the say-as element and its interpret-as attribute in SSML to distinguish cases where the preferred pronunciation was one way or another.

1.2.1 Allow markup in <grapheme>
One way to do this would be to allow <say-as> markup inside the <grapheme> element wrapping the plain text of the token being pronounced.

1.2.2 XPath selectors
A second, probably better way, would be to use XPath selectors to distinguish the cases where one pronunciation is preferred as opposed to another. This markup would closely resemble the use of XPath selectors in DISelect [7].
[7] http://www.w3.org/2001/di/Group/di-selection/

In either case, the value of ssml:say-as.interpret-as could be used as a discriminant in choosing preferred pronunciations. This value in turn can, as a best practice, be reliably tied to semantic information which is precise enough to assure a single appropriate pronunciation.
There are more complicated approaches that could be integrated using SPARQL queries of the <metadata> contents, but a little XPath processing of guard expressions is so readily achievable that it is hard to believe something should not be done to afford this capability.
The QName value of this attribute allows plenty of extension room to create unique keys for the proper names of individual people, along with the ability to refer to WordNet nodes or dictionary entries for pronunciation variants of homographs.

Resolution: Accepted

Thank you for your insightful comments. The subject of pronunciation selection has been mentioned by several reviewers. The PLS team has given considerable attention to the topic and we agree that adding a mechanism greatly improves the specification.

Returning to the homograph issue, the English language provides different pronunciations for 'read' depending on whether the word is used in the present or past tense. This might argue for part of speech as a determinant. On the other hand, a term such as 'Lima' (bean or city in Peru) or 'bass' (fish or musical instrument) may require a different determinant. Instead of enumerating the options, which we believe to be extremely difficult if not impossible, we have adopted an alternative that permits both standard selection mechanisms and allows for easy extensions.

Your comment offered two different mechanisms: a QName based selection process or an XPath based one. We examined each and have been persuaded by the power and simplicity of the QName approach. The PLS specification will be updated to allow a 'role' attribute on the <lexeme> element. This will take a list of QNames. Users of PLS such as SRGS and SSML will then be able to select which role is desired.

An example may be helpful. Looking at the 'read' vs. 'read' case in SSML:

<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:claws="http://www.example.com/claws7tags" xml:lang="en">
  <voice gender="male" age="3">
    Can you <token role="claws:vvi">read</token> this book to me?</voice>
  <voice gender="male" age="35">
    I've already <token role="claws:vvn">read</token> it three times!</voice>
</speak>


Here the part of speech from the CLAWS tagger is used [1]. The corresponding lexicon might look like:

<?xml version="1.0"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         xmlns:claws="http://www.example.com/claws7tags" alphabet="ipa"
         xml:lang="en-US">
  <lexeme role="claws:vvi">
    <grapheme>read</grapheme>
    <phoneme>rid</phoneme>
  </lexeme>
  <lexeme role="claws:vvn">
    <grapheme>read</grapheme>
    <phoneme>rɛd</phoneme>
  </lexeme>
</lexicon>


Allowing a list of QNames in PLS lets lexicon entries be marked with multiple tags (e.g. CLAWS7 vs CLAWS5 vs SEC vs ...) when required. This addresses the pronunciation-tagging half of the issue; solving the other half will require changes to SSML and to SRGS. The requirements for the SSML 1.1 specification are already being collected and are expected to include a new <token> element on which a similar 'role' attribute might be specified.

[1] http://www.comp.lancs.ac.uk/ucrel/claws7tags.html
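The role-based matching described in this resolution can be sketched outside the specification. The following Python illustration (an editorial sketch, not normative; the lexeme records and namespace table are simplified stand-ins for real PLS parsing) expands QNames against in-scope namespace declarations and matches a token's role against each lexeme's role list:

```python
def expand_qname(qname, ns_in_scope):
    """Expand "prefix:local" into (namespace-URI, local) using the
    namespace declarations in scope, modelled here as a dict."""
    prefix, _, local = qname.partition(":")
    return (ns_in_scope[prefix], local)

def matching_lexemes(lexemes, token_role, ns_in_scope):
    """Return the lexemes whose space-separated role list contains the
    token's role; QNames are compared after expansion, so different
    prefixes bound to the same namespace still match."""
    want = expand_qname(token_role, ns_in_scope)
    return [lx for lx in lexemes
            if want in {expand_qname(q, ns_in_scope)
                        for q in lx.get("role", "").split()}]

ns = {"claws": "http://www.example.com/claws7tags"}
lexemes = [
    {"role": "claws:vvi", "grapheme": "read", "phoneme": "rid"},
    {"role": "claws:vvn", "grapheme": "read", "phoneme": "rɛd"},
]
hits = matching_lexemes(lexemes, "claws:vvn", ns)
print(hits[0]["phoneme"])  # → rɛd
```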

Email Trail:

Issue R103-1

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 1
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: Many
Owner: RI


Comment:

(Not clear whether this is a question about PLS or SSML.) Is it possible to choose a pronunciation dictionary on the basis of language? For example, in the case of


<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  'Chat' in English refers to a conversation, but '<s xml:lang="fr">chat</s>'
  in French is the word for 'cat'.
</speak>


If not, it would not be possible to distinguish between the two instances of 'chat' correctly.

It seems that lexicons are expected to be in one language or another.

Resolution: Rejected

This request is outside the current scope of the PLS specification, which describes the format of a PLS document; it does not describe how a PLS document is activated in another markup language.

Your request is more appropriate as an extension of SSML. If you like, we can forward this comment to the VBWG for the SSML specification.

For the PLS specification, we reject the request.

Email Trail:

Issue R103-30

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 30
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.8
Owner: RI


Comment:

If the <example> element can contain only text, it will not be possible to apply directional markup to bidirectional text. Since this text can be harvested for reading elsewhere, we propose that you allow, as a minimum, a span-like element within the <example> element that can support a dir=ltr|rtl|lro|rlo attribute to handle bidirectional text.

You could also allow xml:lang on the span-like element for language markup.

Resolution: Deferred

We will consider your request in a future version of PLS. The current version of PLS will allow the following.

The <example> element has XML Schema type 'string' [1], which allows embedding of directionality marks and overrides (e.g. U+200E, U+200F, U+202D, U+202E, U+202C). We've reviewed the I18N FAQ [2] and Unicode Technical Report #20 [3] and believe that embedded character codes are appropriate for the <example> element.
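As an editorial illustration of this resolution (not part of the specification), the directionality controls can be embedded as ordinary characters in the text content of <example>; the helper name below is hypothetical:

```python
# Unicode bidirectional control characters named in the resolution.
LRM, RLM = "\u200e", "\u200f"          # directionality marks
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"  # overrides and their terminator

def force_ltr(fragment):
    """Wrap a fragment in LEFT-TO-RIGHT OVERRIDE, terminated by POP
    DIRECTIONAL FORMATTING, as plain characters in the string."""
    return LRO + fragment + PDF

# The controls travel inside the string itself, so they survive as
# text content of an <example> element with no extra markup.
example_text = "The part number " + force_ltr("ABC-123") + " is read aloud."
print(example_text.count(PDF))  # → 1
```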

PLS documents cover a single language. We've assumed that the examples would be in the same language as the lexicon and that adding xml:lang to <example> was therefore unnecessary. In the case of 'borrowed' words such as 'hors d'oeuvres', the example would be written in the borrowing language as in:
"<example>As an appetizer, he prepared a wide selection of hors d'oeuvres such as cucumber sandwiches and garlic hummus with baked pita.</example>".

[1] http://www.w3.org/TR/xmlschema-2/#string
[2] http://www.w3.org/International/questions/qa-bidi-controls
[3] http://www.w3.org/TR/unicode-xml/#Charlist

Email Trail:

Issue R103-36

From Richard Ishida (2006-03-21):

Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Comment 36
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: General
Owner: RI


Comment:

The problem of homographs for TTS and homophones for ASR seems very limiting.

It might be possible to alleviate the problem of homographs for TTS by altering the SSML text, so that tokens are unique, but that would damage portability of the PLS, and, more importantly, cause problems for the use of the same PLS for ASR.

Would it not be possible to tag tokens in SSML with 'variant ids', using attributes that could be matched to ids in the PLS, as a way of uniquely matching homographs to pronunciations?

Resolution: Accepted

We accept your proposal to add an attribute in the PLS as a way of uniquely matching homographs to pronunciations.

This will significantly change Section 5.5. We propose the following wording: "A QName in the attribute content is expanded into an expanded-name using the namespace declarations in scope for the containing lexeme element. Thus, each QName provides a reference to a specific item in the designated namespace. In the second example below, the QName "claws:VVI" within the role attribute expands to the "VVI" item in the http://www.example.com/claws7tags namespace. This mechanism allows for referencing defined taxonomies of word classes, with the expectation that they are documented at the specified namespace URI."

Email Trail:

Issue R106-2

From Kurt Fuqua (2006-11-20):

Lexicons are notorious for containing inconsistent information. It is therefore very useful to include integrity checks within the lexicon, which allow for automated consistency checking. For more than a decade, the lexicons of the Scalable Language API have used a phonemic key: the lexicon contains, for each of its languages, a phonemic key that is simply the list of all the phonemes of that language. If any pronunciation contains a phoneme which is not in that phoneme set, there is a consistency error. The concept is very simple, and it catches many errors immediately after an edit. Analogously, there is also a grapheme key, which contains every grapheme used by that language. Several other integrity checks are possible, and SLAPI implements most of them; for the sake of brevity, I will emphasize only these two keys.

I recommend that both a phonemic key and a graphemic key be included for each language in the lexicon, and that these keys be required.

Resolution: Deferred

Your proposal to add phonemic and graphemic keys to enable integrity checks is very interesting, but it was judged to be an advanced feature. It is currently outside the high-priority features identified in the PLS requirements document [1].

We propose to address it in a future version of the PLS language. For PLS 1.0, there are elements such as <metadata> and <meta> (see Sections 4.2 [2] and 4.3 [3]) that might be used to experiment with these consistency keys, even though the PLS language doesn't include them directly in the specification.

If you think that an informative note should be added to the current specification, we suggest that you make a proposal to be reviewed by the VBWG.

[1] http://www.w3.org/TR/2004/WD-lexicon-reqs-20041029/
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S4.2
[3] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S4.3
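The phonemic-key check the commenter describes is simple to prototype outside the specification. The following sketch is illustrative only (the inventory, entries, and function name are hypothetical, and pronunciations are modelled as lists of phoneme symbols so that multi-character symbols stay intact):

```python
def check_pronunciations(phoneme_key, pronunciations):
    """Report entries whose pronunciation uses a phoneme outside the
    declared inventory (the 'phonemic key' of the comment)."""
    inventory = set(phoneme_key)
    errors = []
    for word, phonemes in pronunciations.items():
        bad = [ph for ph in phonemes if ph not in inventory]
        if bad:
            errors.append((word, bad))
    return errors

# Illustrative inventory and lexicon entries.
key = ["r", "i", "d", "ɛ"]
lex = {"read(vvi)": ["r", "i", "d"],
       "read(vvn)": ["r", "ɛ", "d"],
       "typo":      ["r", "x", "d"]}
print(check_pronunciations(key, lex))  # → [('typo', ['x'])]
```

A graphemic key would work identically, with the inventory drawn from the characters of every <grapheme> instead of the phoneme symbols.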

Email Trail:

Issue R106-5

From Kurt Fuqua (2006-11-20):

I understand that a decision was made not to include part-of-speech information. I think that the lack of this most basic form of grammatical information will fundamentally handicap the standard. Part-of-speech information is used to differentiate pronunciations: English and other languages have many pairs of words that are spelled identically but pronounced differently, and the draft includes such an example (4.9.3, example 2). The lexicon is the source of all word-level information for the applications. Without part-of-speech information in the lexicon, there is simply no way to determine which pronunciation to use.

I do recognize that this introduces another level of complexity in that the specification must include the possible parts of speech. However this has been addressed in other existing standards such as LexiconXML, the Scalable Language API and OSIS.

I recommend that part of speech information be included as a tag.

Resolution: Accepted (w/modifications)

As we mentioned in our previous answer, we partially accepted your request by adding the "role" attribute in Section 4.4 [1]. This is the current mechanism for specifying part of speech (POS) in PLS.

We believe that our group is not the right one to define a standard set of values that works for all languages of the world. Instead, we believe the existing mechanism, which allows a reference to an external POS list, is more flexible and permits reference to standards created by groups with expertise in this area. We may revisit this issue in a future version of the PLS specification, after the SSML 1.1 specification is defined, to account for requirements arising from the internationalization activity on SSML 1.1.

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/#S4.4

Email Trail: