Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
This document details the responses made by the Voice Browser Working Group to issues raised during the first Last Call (beginning 31 January 2006 and ending 15 March 2006). Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.
This document of the W3C's Voice Browser Working Group describes the disposition of comments as of 16th October 2006 on the first Last Call Working Draft of Pronunciation Lexicon Specification (PLS) Version 1.0. It may be updated, replaced or rendered obsolete by other W3C documents at any time.
For background on this work, please see the Voice Browser Activity Statement.
This document describes the disposition of comments in relation to Pronunciation Lexicon Specification (PLS) Version 1.0 (http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131). The goal is to allow readers to understand the background behind the modifications made to the specification. It also provides a useful checkpoint for the people who submitted comments to evaluate the resolutions applied by the W3C's Voice Browser Working Group.
This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review period. It includes issues that were submitted outside the official review period, up to July 2006.
The following table summarizes all the public comments received by the Voice Browser Working Group. The table includes the following information on each comment: item, commentator, nature, and disposition.
The subsection "2.1 Clarifications, Typographical, and Other Editorial" of this document describes the details of each comment, including the commentator and date, the text of the comment, the resolution, and the email trail.
Note: A disposition of "Waiting Response" means that we have not received a formal acceptance or rejection from the commentator, or that acceptance is pending on the resolution applied. A disposition of "Implicitly Accepted" means that the commentator agrees with the resolution but requires some additional modification, and formal acceptance has not yet been received.
| Item | Commentator | Nature | Disposition | 
|---|---|---|---|
| R100-1 | Paul Bagshaw (2006-02-03) | Feature Request | Accepted | 
| R100-2 | Paul Bagshaw (2006-02-03) | Feature Request | Accepted | 
| R100-3 | Paul Bagshaw (2006-02-03) | Clarification / Typo / Editorial | Accepted | 
| R100-4 | Paul Bagshaw (2006-02-03) | Clarification / Typo / Editorial | Accepted | 
| R101 | Mark Alexandre (2006-02-08) | Clarification / Typo / Editorial | Waiting Response | 
| R102 | Al Gilman (2006-03-15) | Feature Request | Waiting Response | 
| R103-1 | Richard Ishida (2006-03-21) | Feature Request | Accepted | 
| R103-2 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-3 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-4 | Richard Ishida (2006-03-21) | Technical Error | Waiting Response | 
| R103-5 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-6 | Richard Ishida (2006-03-21) | Technical Error | Accepted | 
| R103-7 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Waiting Response | 
| R103-8 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-9 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-10 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-11 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-12 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-13 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-14 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-15 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-16 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-17 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-18 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-19 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-20 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Waiting Response | 
| R103-21 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Waiting Response | 
| R103-22 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-23 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-24 | Richard Ishida (2006-03-21) | Change to Existing Feature | Accepted | 
| R103-25 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-26 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Implicitly Accepted | 
| R103-27 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-28 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-29 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-30 | Richard Ishida (2006-03-21) | Feature Request | Waiting Response | 
| R103-31 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-32 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-33 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Implicitly Accepted | 
| R103-34 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Accepted | 
| R103-35 | Richard Ishida (2006-03-21) | Clarification / Typo / Editorial | Waiting Response | 
| R103-36 | Richard Ishida (2006-03-21) | Feature Request | Waiting Response | 
| R104 | Janina Sajka (2006-04-26) | Clarification / Typo / Editorial | Waiting Response | 
| R105-1 | Deborah Dahl (2006-05-15) | Clarification / Typo / Editorial | Accepted | 
| R105-2 | Deborah Dahl (2006-05-15) | Clarification / Typo / Editorial | Accepted | 
From Paul Bagshaw (2006-02-03):
3. Specification ambiguity
* Section 4.4 of PLS stipulates:
"The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements."
However, it appears to be possible to have BOTH <phoneme> AND <alias> elements in <lexeme>, as illustrated in example 4 and more clearly described in section 4.9.2
. . . either by <phoneme> elements or <alias> elements or a combination of both . . .
The either/or of section 4.4 needs correction (Proposition 3: add “or a combination of both”).
Resolution: Accepted
Email Trail:
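The content model discussed in this comment can be sketched as a small check. This is a minimal sketch, not text from the specification: it assumes the PLS namespace `http://www.w3.org/2005/01/pronunciation-lexicon`, and the sample lexeme, its transcription, and the helper names `count` and `has_valid_content` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PLS 1.0 documents; check it against the draft in use.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: one lexeme carrying BOTH an <alias> and a <phoneme>.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
    <phoneme>ˈdʌbəljuː θriː siː</phoneme>
  </lexeme>
</lexicon>"""


def count(lexeme, tag):
    """Number of direct children of `lexeme` with the given PLS local name."""
    return len(lexeme.findall(f"{{{PLS_NS}}}{tag}"))


def has_valid_content(lexeme):
    """One or more graphemes, plus one or more phoneme/alias elements in any mix."""
    pronunciations = count(lexeme, "phoneme") + count(lexeme, "alias")
    return count(lexeme, "grapheme") >= 1 and pronunciations >= 1


root = ET.fromstring(SAMPLE.encode("utf-8"))
lexeme = root.find(f"{{{PLS_NS}}}lexeme")
print(has_valid_content(lexeme))
```

The check sums `<phoneme>` and `<alias>` counts instead of testing them separately, which is one way to express "either, or a combination of both".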
From Paul Bagshaw (2006-02-03):
4. Terminology
A final relatively minor comment: in section 4.5.
A <grapheme> may optionally contain an orthography attribute which identifies the script code used for writing the orthography.
The term ‘orthography’ has a double use: one as a glossary term and the other as an attribute name. Only the font makes the specification clear. Rewording of the glossary term should be envisaged.
Resolution: Rejected
Email Trail:
From Mark Alexandre (2006-02-08):
In the Pronunciation Lexicon Specification (PLS) v1.0, as of Draft 31, the usage of the element tag <grapheme> reflects a misunderstanding of the meaning of the word "grapheme." The definition in the spec's Glossary of Terms is nevertheless quite accurate: "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element."
Thus, the letter "g" and the numeral "4" are both examples of widely used graphemes, as are the question mark "?" and the dollar sign "$". In the current draft of the PLS however, the so-called "grapheme" element is mistakenly applied to what in English is commonly called the "spelling." I believe this will lead to confusion unless the element is renamed. As to what it should be named, more on that below.
Before that, however, I wish to draw attention to an attribute of the grapheme element, the "orthography" attribute. The value of this attribute is, according to the draft spec, supposed to be a "script code" compliant with the ISO 15924 standard. The title of that standard is "Information and documentation — Codes for the representation of names of scripts." All of this naturally leads to the question: why not name this attribute "script" or "scriptcode"?
The word "orthography", derived from Greek roots meaning (roughly) "correct" and "writing", can present some ambiguity between two related meanings, but neither meaning is the same as "script."
Orthography can be used as a synonym for what most English speakers more commonly call the "spelling" of a particular word. Thus the examples "colour" and "color" are two different orthographies for the same word in the English language — the latter being the American orthography that was adopted following a set of spelling reforms which the rest of the English-speaking world declined to follow. The corresponding word in the French language is written with the orthography "couleur."
In a related sense, the word "orthography" can be used to refer to an entire system of conventions for writing, including such issues as spelling and punctuation, plus even such trivia as the direction of writing (such as left-to-right or vice versa), the spacing and/or divisions of words, etc. Some may also comprehend the word in this broader sense to include issues of penmanship or calligraphy, that is, the correct method to compose or draw the graphemes (or characters, or symbols, or glyphs, if you like) of the language. Note that, in this sense, conventions of orthography can differ even between cultures that use the same alphabet — even between the style guides used by differing editorial staffs in the same metropolis!
Neither meaning of "orthography" is to be conflated with the meaning of the word "script." The Greek, Latin and Cyrillic alphabets, as well as ancient cuneiform, Egyptian hieroglyphics, Chinese characters (Hanzi, or Kanji in Japan), etc., are all most precisely referred to as scripts, collectively. Perhaps the only alternative to "script" is the much more vague and expansive term, "writing system."
Finally, then, it would seem clear that the weight of the evidence clearly argues for the attribute in question to be called "script" (or "scriptcode" to be verbose), as indeed it is called in ISO 15924. Having thus liberated the word "orthography" from misapplication, we may consider that word a candidate for the element incorrectly labelled "grapheme."
In addition to "orthography," other candidates for the element now called grapheme in the draft spec might include "spelling" or "writing" or the almost comically long-winded "graphic presentation form." Any one of these four terms would be vastly preferable to "grapheme" (which, again, is simply wrong), but each does have certain shortcomings as well. I will briefly list the problems I am aware of.
The term "orthography" is almost ideal, except for its unavoidable connotation of "correctness." That is, were it ever desirable, for whatever reason, to list what may be deemed a "non-standard" written form of a word, then calling that an orthography for the word is misleading. On the other hand, obviously, if the PLS is specifically only intended to associate pronunciation with "correct" spellings (according to somebody's criteria of correctness), then orthography (in its narrower sense, vide supra) would be precisely accurate. An additional bonus is that this word is understood with pretty much the same meaning in other languages such as French and Spanish.
The term "spelling" is by far the more commonly used word by English speakers when referring to how to write out a particular word. Furthermore, it carries no connotation of correctness, since you can easily refer to "alternate spelling" or even "bad spelling." The downside, a minor one, is that the notion of spelling is strongly associated with alphabets; it is not at all clear what spelling means in the context of Chinese writing or similar non-alphabetic systems.
The term "writing" is just vague enough to mean anything you want. Since it applies to every aspect of, well, writing, it could be applied to any aspect of it. Put another way, its upside is its downside.
Finally, as for "graphic presentation form," or something similarly long and comically precise: one is tempted to wonder whether every XML parser out there really can handle sentence-length element names, as well as how many folks have access to an XML editor with a contextual auto-completion feature!
Just to throw out one last (off-the-wall) possibility, consider that the Spanish cognate of the English word orthography is "ortografía", which commonly gets shortened to just "grafía." [See for example, http://www.xtec.es/~faguile1/grafia/]. This suggests that this originally Greek root for "writing" suffices all by itself to communicate the idea we are talking about here. Perhaps an Anglicized (or Anglicised, if you prefer) coining such as "graphy" or a more internationally flavored "graphia" or "graphie" would actually be the least open to misuse and misinterpretation, since, to paraphrase Humpty-Dumpty, it would mean just what we chose it to mean.
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 2
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: Many
Owner: RI
Comment:
All examples show escapes in the code and ipa characters in comments. Please reverse this. It would be fine to say in one place that people could use escapes if it is difficult to type in characters (as you do at the end of 4.1) (though see the suggestion for overcoming that difficulty using a character picker, later). The current approach encourages the use of escapes, and makes the examples difficult to read.
Resolution: Accepted
Email Trail:
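The equivalence behind this comment can be shown with a short check: a numeric character reference and the directly typed IPA character produce the same text once parsed, so the choice only affects readability of the source. This is a minimal sketch; the transcription `ˈvita` is an illustrative value, not taken from the specification's examples.

```python
import xml.etree.ElementTree as ET

# The same illustrative pronunciation written two ways: with a numeric
# character reference and with the IPA stress mark typed directly.
ESCAPED = "<phoneme>&#x2C8;vita</phoneme>"
LITERAL = "<phoneme>ˈvita</phoneme>"

escaped_text = ET.fromstring(ESCAPED).text
literal_text = ET.fromstring(LITERAL).text

# Both forms yield the same string after parsing.
print(escaped_text == literal_text)
```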
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 3
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: General
Owner: FS
Comment:
Is a pronunciation lexicon embeddable in other formats, e.g. like MathML in HTML? Please address this question somewhere.
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 5
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: -
Location in reviewed document: General
Owner: RI
Comment:
Thank you for specifying the encoding on the XML declaration throughout!
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 7
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: General
Owner: RI
Comment:
Please make it clearer, throughout the document, when talking about multiple instances of grapheme or phoneme, whether this is useful for speech synthesis or speech recognition.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 8
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, example
Owner: RI
Comment:
Surely "La vita e' bella" should be "La vita è bella" ?
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 9
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, example
Owner: RI
Comment:
We expected an xml:lang attribute around the phrase "La vita e' bella".
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 10
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, 2nd example
Owner: RI
Comment:
The quotation marks have been removed in this version of the example. Is that on purpose, or an omission?
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 11
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1
Owner: RI
Comment:
It would be nice (though not essential) to include a short and simple PLS document at the end of section 1.1 just to complete the picture for the user. A simple example will probably be easy enough to understand on its own.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 12
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.1, last para
Owner: RI
Comment:
s/then/than/
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 13
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 1.5
Owner: RI
Comment:
"Example orthographies include Romaji, Kanji, and Hiragana"
Are Romaji, Kanji and Hiragana separate orthographies, or just different scripts in the Japanese orthography? Certainly, although the examples in the spec are usually only one or other of these alphabets per <grapheme>, mixtures are more usual for Japanese text.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 14
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 2, 2nd para
Owner: RI
Comment:
Is there an online location that repeats the information in the (hardcopy) IPA handbook? Is it the same information as is found at http://www.arts.gla.ac.uk/ipa/ipachart.html? If so, it might be helpful to include a note pointing to that.
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 15
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 2, 2nd para
Owner: RI
Comment:
In addition to IPAUNICODE1 and IPAUNICODE2, please point to the IPA Character Picker [http://people.w3.org/rishida/scripts/pickers/ipa/]. This was recently updated against the information on the IPA homepage, and allows people to easily create short strings of IPA text for insertion into their documents. (And will probably also be useful for creating this spec.)
Resolution: Accepted (w/modifications)
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 16
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 3
Owner: RI
Comment:
Please use some markup to clarify the locations of the normative usages of "must", "should", "must not" etc. in the text.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 17
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4, table
Owner: RI
Comment:
Description says "Meta data container element"
Description for <meta> is misleading: it is not a container, but empty. For the typical reader, saying it is the same as HTML would be helpful.
May be better to say 'element containing meta data'
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 18
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 2nd para
Owner: RI
Comment:
"which indicates the pronunciation alphabet".
Since the alphabet setting can be overridden on a phoneme element, the text should say "which indicates the default pronunciation alphabet".
Resolution: Accepted
Email Trail:
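The "default" wording requested here can be sketched as a lookup in which a `<phoneme>` element's own `alphabet` attribute takes precedence over the one on `<lexicon>`. This is a minimal sketch under stated assumptions: the helper name `effective_alphabet`, the sample transcriptions, and the vendor-style alphabet name `x-sampa` are illustrative, not values defined by the specification.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PLS 1.0 documents.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: the second <phoneme> overrides the lexicon-wide alphabet.
SAMPLE = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
    <phoneme alphabet="x-sampa">t@"meItoU</phoneme>
  </lexeme>
</lexicon>"""


def effective_alphabet(lexicon, phoneme):
    """Alphabet on the phoneme if present, otherwise the lexicon default."""
    return phoneme.get("alphabet", lexicon.get("alphabet"))


root = ET.fromstring(SAMPLE)
alphabets = [effective_alphabet(root, p) for p in root.iter(f"{{{PLS_NS}}}phoneme")]
print(alphabets)
```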
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 19
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 4th para
Owner: RI
Comment:
Please clarify why lexicons are separated by language?
Resolution: Deferred
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 20
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.1, 4th para
Owner: RI
Comment:
s/RFC 3066/RFC 3066 or its successor/
(Note that 'its successor' has already been approved by the IETF and is just pending publication.)
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 21
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.3, 1st example
Owner: RI
Comment:
How is dc:language="en-US" meant to be interpreted if it appears in a metadata element? How does it affect the xml:lang declaration on PLS elements?
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 22
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.3, 2nd example
Owner: RI
Comment:
It would be helpful to explain why this lexicon, labelled as xml:lang="it" contains English graphemes.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 23
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.3, 2nd example
Owner: RI
Comment:
Unless there is some particular reason, it is better (and potentially less confusing for the reader) to use "it" rather than "it-IT".
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 25
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.5, 3rd bullet
Owner: RI
Comment:
"Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji), and phonemic spelling systems e.g. Katakana or Hiragana for representing the orthography of a word or phrase;"
The fact that Japanese mixes scripts is one thing, but I think the point here is that, for example, one sometimes writes the same word using hiragana and sometimes with kanji, according to preference or circumstance.
A good example might be 'shouyu' (soy sauce), which can be written using either kanji or hiragana: kanji 醤油; hiragana: しょうゆ
[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 26
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.5, 3rd para
Owner: RI
Comment:
"In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may"
Here is an example of where it might be good to distinguish between TTS and ASR. You could say: "In order to remove the need for duplication of pronunciation information to cope with the above variations during text-to-speech, the <lexeme> element may contain"
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 27
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5
Owner: RI
Comment:
What is the value of the orthography attribute?
We see no value, and its purpose is not expressed in the text.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 28
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5, 2nd example
Owner: RI
Comment:
There are a number of problems with the use of the orthography attribute in this example for Japanese:
The kana label is incorrect - it should say hira, since this is hiragana, not katakana.
There is currently no label available for the extremely common form of Japanese words that mixes both kanji and hiragana, eg. 混じる 'to mix' (contains one kanji and two hiragana characters).
Is nɪhɒŋɒ an accurate phonemic/phonetic transcription?
[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]
Resolution: Accepted
Email Trail:
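The labelling problem raised in this comment can be illustrated by classifying a grapheme string by the scripts it contains. This is a rough sketch under stated assumptions: it uses the four-letter codes `Hira`, `Kana`, `Hani`, `Hrkt`, and `Jpan` as listed in the ISO 15924 registry today (which may differ from what was available when the comment was written), it relies on simplified Unicode block ranges, and the function names are hypothetical.

```python
def scripts_in(text):
    """Return the set of script codes found in `text` (simplified block ranges)."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x309F:      # Hiragana block
            found.add("Hira")
        elif 0x30A0 <= cp <= 0x30FF:    # Katakana block
            found.add("Kana")
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs (kanji)
            found.add("Hani")
    return found


def script_code(text):
    """Pick a single code for the whole string, or None if no script is recognized."""
    found = scripts_in(text)
    if len(found) == 1:
        return next(iter(found))
    if found == {"Hira", "Kana"}:
        return "Hrkt"                   # both syllabaries, no kanji
    if found:
        return "Jpan"                   # any mixture that includes kanji
    return None


for word in ["しょうゆ", "醤油", "混じる"]:
    print(word, script_code(word))
```

The ranges ignore CJK extension blocks and treat shared marks in the Katakana block as katakana, so this is only a starting point for real text.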
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 29
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 4.7
Owner: RI
Comment:
What does transformation mean? Is it the first example, W3C? If so, please clarify briefly.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 31
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.3
Owner: RI
Comment:
We don't see any value in the additional examples in 5.3, since all examples are instances of homographs or homophones (or expansions, which are not referred to here). Why not skip this and go straight into 5.4?
Resolution: Accepted (w/modifications)
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 32
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.4
Owner: RI
Comment:
I think the Smyth example just confuses things at the beginning of the section and in the example. It is an example of something that is both a homograph and homophone at the same time - for which there appears to be no good solution. I would just add a reference to the fact that such things exist after the example in 5.4, and perhaps use one of the examples in 5.3 rather than the Smyth one.
Resolution: Accepted (w/modifications)
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 33
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.4, 2nd para
Owner: RI
Comment:
"Pronunciations are explicitly bound to one or more orthographies within a <lexeme> element so homophones are easy to handle. See the following examples:"
This should say, "homophones are easy to handle for text-to-speech". They are not easy to handle in an ASR context, and there should be an informative note here like in 5.5, but referring to ASR rather than TTS!
Resolution: Accepted (w/modifications)
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 34
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.5 example
Owner: RI
Comment:
Shouldn't the second 'refuse' be pronounced with a short e and a non-lengthened u and final z? (Note also that the comment is superfluous.)
There are other instances where the phonemic transcription seems strange (eg. use of 'e'). Please have them checked by phoneticians who are familiar with the languages.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 35
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E
Location in reviewed document: 5.5
Owner: RI
Comment:
This whole section seems strangely biased.
"In both cases the processor will not be able to distinguish when to apply the first or the second transcription."
The above statement only applies for the text-to-speech author. For ASR, this is a perfectly valid approach, and the solution will cause no problems.
"the current version of specification is not able to instruct the PLS processor how to distinguish the two pronunciations" should read "the current version of specification is not able to instruct the PLS processor *performing text-to-speech* how to distinguish the two pronunciations".
Resolution: Accepted
Email Trail:
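The TTS/ASR asymmetry described in this comment can be sketched by collecting every pronunciation listed for one grapheme. This is a minimal sketch, assuming the PLS namespace shown; the transcriptions of "refuse" and the helper name `pronunciations` are illustrative and are not the values used in the specification's example.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PLS 1.0 documents.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative homograph: one spelling, two pronunciations (verb and noun).
SAMPLE = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪˈfjuːz</phoneme>
    <phoneme>ˈrɛfjuːs</phoneme>
  </lexeme>
</lexicon>"""


def pronunciations(root, word):
    """All <phoneme> values from lexemes that list `word` as a grapheme."""
    result = []
    for lexeme in root.iter(f"{{{PLS_NS}}}lexeme"):
        graphemes = [g.text for g in lexeme.findall(f"{{{PLS_NS}}}grapheme")]
        if word in graphemes:
            result.extend(p.text for p in lexeme.findall(f"{{{PLS_NS}}}phoneme"))
    return result


root = ET.fromstring(SAMPLE)
candidates = pronunciations(root, "refuse")

# A recognizer can accept every candidate as a valid way of saying the word.
# A synthesizer must speak exactly one, and nothing in the lexeme says which.
print(len(candidates))
```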
From Janina Sajka (2006-04-26):
On behalf of the WAI Protocols and Formats Working Group action:
http://www.w3.org/2006/03/29-pf-minutes.html#action01
PF supports the use of pronunciation lexicons because they have proven effective mechanisms to support accessibility for persons with disabilities as well as greater usability for all users. We support:
http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
However, we would like to see pronunciation lexicons adopted more widely across the multimodal web as an available mechanism for any textual content that might be rendered through TTS by some user agent. We should not, in other words, conceive of this mechanism only in terms of voice browsers. It is not difficult to imagine how user agents might voice more than just SSML and SRGS markup. Indeed, this is already the case for persons who are blind using screen readers.
Screen readers have provided pronunciation lexicons for several decades now because correct pronunciation is a simple, highly effective mechanism for advancing comprehension. A W3C defined mechanism could do the same for web content and allow content providers a standard mechanism to ensure domain-specific terms will be correctly rendered by TTS engines where they otherwise would not have been correctly rendered. This mechanism could be used to pronounce names correctly (like mine), including geographic variants (like the capital city of the U.S. State of South Dakota). Other examples abound.
Resolution: Accepted
Email Trail:
From Deborah Dahl (2006-05-15):
The W3C Multimodal Interaction Working Group has reviewed the Pronunciation Lexicon Specification [1] and has prepared the following comments. These are not specific requests for changes, but comments on how the PLS might fit into multimodal applications.
Usually PLS is used within a voice modality component and is not exposed as its own modality component. However, a TTS component (e.g. a prompt) might want to expose PLS events. For example, loading a lexicon module, or when a specific pronunciation is unavailable. These events would be generated by the modality component that interprets and applies the PLS.
PLS might be useful for spelling correction as part of a multimodal application, but this isn't seen as an important use case. So for most purposes, PLS is transparent to MMI.
Pronunciation lexicons might be exposed through the Delivery Context Interfaces (DCI)[2]. In principle, you could use this to set the default lexicon and other configuration properties. The DCI models properties as a hierarchy of DOM nodes and could be used to expose capabilities and the means to adjust the corresponding properties, e.g. which languages are supported by the speech synthesiser, the default pitch, rate and many other properties.
Otherwise, no specific comments.
[1] http://www.w3.org/TR/pronunciation-lexicon/
[2] http://www.w3.org/TR/2005/WD-DPF-20051111/
Resolution: Deferred
Email Trail:
From Deborah Dahl (2006-05-15):
Current synthesizers are weak with respect to contextualized pronunciations and it is desirable that PLS provide a convenient means for application developers to work around that, i.e. more convenient than providing explicit pronunciations in SSML for each occurrence of a word that would otherwise be mispronounced.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 4
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: Schema
Owner: FS
Comment:
The description of the schema in the text does not match the schema itself in various places.
For (one) example: the text defines a sequence of the meta element, the metadata element, and a sequence of lexeme elements: meta.elt.type*, metadata.elt.type*, lexeme.elt.type*, but the schema says (lexeme.elt.type | meta.elt.type | metadata.elt.type)*
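The difference between the two content models can be illustrated with a minimal Python sketch. It does not load the actual PLS schema: the one-letter encoding, the regular expressions, and the helper names are assumptions made for illustration only.

```python
import re

# Each child element is encoded as one letter: m = meta, d = metadata, l = lexeme.
CODES = {"meta": "m", "metadata": "d", "lexeme": "l"}

# Content model described in the specification text: meta*, metadata*, lexeme*
SEQUENCE_MODEL = re.compile(r"m*d*l*")
# Content model found in the schema: (lexeme | meta | metadata)*
CHOICE_MODEL = re.compile(r"[mdl]*")


def encode(children):
    """Turn a list of child element names into a string of one-letter codes."""
    return "".join(CODES[name] for name in children)


def accepted_by(model, children):
    """Return True if the order of children matches the given content model."""
    return model.fullmatch(encode(children)) is not None


ordered = ["meta", "metadata", "lexeme", "lexeme"]
interleaved = ["lexeme", "meta", "lexeme"]

for name, children in (("ordered", ordered), ("interleaved", interleaved)):
    print(name,
          "sequence:", accepted_by(SEQUENCE_MODEL, children),
          "choice:", accepted_by(CHOICE_MODEL, children))
```

Under these assumptions, a document that interleaves meta and lexeme elements is accepted by the choice model but not by the sequence model, which is the mismatch the comment points out.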
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 6
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: E/S
Location in reviewed document: General
Owner: RI
Comment:
We have not seen the IPA Handbook, so cannot verify, but the examples in the spec use an apostrophe for primary word stress and a colon for vowel lengthening (e.g. the example in 5.1), whereas there are IPA characters for these, ˈ and ː.
E.g. Newton is transcribed 'nju:tən rather than ˈnjuːtən
Section 2 does not mention alternate forms. Are the examples correct?
[See the comment at http://www.w3.org/International/reviews/0603-pls10/ if non-ASCII characters are corrupted by the mail]
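As a rough illustration of the substitution being suggested, the following Python sketch replaces the two ASCII stand-ins with the corresponding IPA characters. The function name and the mapping table are hypothetical, and a blanket replacement like this is only safe for strings known to be transcriptions in which the apostrophe and colon carry no other meaning.

```python
# ASCII stand-ins used in the examples, mapped to the dedicated IPA characters.
ASCII_TO_IPA = {
    "'": "\u02C8",  # MODIFIER LETTER VERTICAL LINE (primary stress)
    ":": "\u02D0",  # MODIFIER LETTER TRIANGULAR COLON (length mark)
}


def to_ipa_marks(transcription):
    """Replace ASCII apostrophe and colon with the IPA stress and length marks."""
    return transcription.translate(str.maketrans(ASCII_TO_IPA))


print(to_ipa_marks("'nju:t\u0259n"))
```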
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 24
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.5
Owner: RI
Comment:
In the glossary of terms you define 'grapheme' as "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element." but then you use it as an element name to label content that almost always involves a *sequence* of graphemes.
Please find a better name for the element. How about 'text' or 'phrase' ?
Resolution: Rejected
Email Trail:
From Paul Bagshaw (2006-02-03):
Comments made below refer to the 31 January 2006 publication of the PLS last call working draft: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
They are presented here for your consideration and will be openly discussed during the WG teleconference scheduled for 9 February (and probably for some time to come).
The principal comment is related to the way homographs are handled by PLS. The homographs problem has already been subject to some discussion (see http://lists.w3.org/Archives/Member/w3c-voice-wg/2005Sep/0124.html and http://lists.w3.org/Archives/Member/w3c-voice-wg/2005Nov/0005.html) and has been deferred up to the LCWD. I propose that it is now necessary to address this problem before moving on to the next stage and hope that the comments below will initiate discussion as to how to resolve the problem.
With regards,
Paul Bagshaw
1. The homograph (heterophone) problem.
PLS 1.0 aims to address only the most important aspects of the requirements document (http://www.w3.org/TR/lexicon-reqs/).
* Section 4.9.2 of the LCWD stipulates:
"If more than one <lexeme> contains the same <grapheme>, all their pronunciations will be collected in document order and a TTS processor must use the first one in document order that has the prefer attribute set to "true". If none of the pronunciations has prefer set to "true", the TTS processor must use the first one in document order."
Requirement 4.2 classes the handling of homophones (heterographs) as MUST HAVE (for ASR); by contrast, requirement 4.4 for handling homographs (heterophones) is classed only as NICE TO HAVE (for TTS), and has thus not been considered essential to the LCWD. It’s a shame that handling homographs is not also classed as MUST HAVE. In its current state, PLS just won’t be used for applications exploiting TTS where homographs can occur. Many, if not ALL, applications for many languages depend on homograph disambiguation. An application MUST HAVE a means of unambiguously indexing every pronunciation in the dictionary. This is not possible in the current version of the PLS proposal.
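The selection rule quoted from Section 4.9.2 can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it considers only <phoneme> elements (not <alias>), the function name is hypothetical, and "p1", "p2" and "p3" are placeholder pronunciations rather than real IPA strings.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Placeholder pronunciations ("p1", "p2", "p3") stand in for real IPA strings.
LEXICON = """
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>does</grapheme>
    <phoneme>p1</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>does</grapheme>
    <phoneme>p2</phoneme>
    <phoneme prefer="true">p3</phoneme>
  </lexeme>
</lexicon>
"""


def select_pronunciation(root, word):
    """Pick one pronunciation for `word` following the quoted TTS rule."""
    # Collect the pronunciations of every lexeme listing the word, in document order.
    candidates = []
    for lexeme in root.iter(PLS_NS + "lexeme"):
        graphemes = [g.text for g in lexeme.findall(PLS_NS + "grapheme")]
        if word in graphemes:
            candidates.extend(lexeme.findall(PLS_NS + "phoneme"))
    # The first pronunciation marked prefer="true" wins ...
    for phoneme in candidates:
        if phoneme.get("prefer") == "true":
            return phoneme.text
    # ... otherwise the first one in document order, or None if the word is unknown.
    return candidates[0].text if candidates else None


root = ET.fromstring(LEXICON)
print(select_pronunciation(root, "does"))  # prints "p3"
```

Whatever the input text says about the word, the function always returns the same pronunciation, which is the limitation the comment objects to.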
It must be possible to associate some additional information (other than the lexeme orthography, <grapheme>) with each pronunciation.
For example, in a simple case, associating a grammatical category to a particular pronunciation in a lexeme is sufficient to distinguish ‘does’ (verb, to do) from ‘does’ (noun, plural of doe). Consider the more complex case of reading an address book full of proper nouns (place and people names) in which the pronunciation of a person’s name depends upon the area from which they come (in the same country speaking the same language – yes, it happens at least in French where final consonants may be pronounced for names originating from the west and south of France, but not elsewhere in the country). The application may have knowledge of the origin of the request for information and instruct the TTS to reply with an according pronunciation. Note that this second example is independent of part-of-speech tags (or grammatical categories) and sentence semantics.
The nature of the additional information is open-ended and subject to (too) much discussion (semantics, part-of-speech tags) since there is no standard representation (there’s no universal set of multilingual grammatical categories, for example, and there never will be since there is no universal grammar). The information required can also be application-dependent (as illustrated above).
Proposition 1: add an interpret-as attribute to the <phoneme> and <alias> elements.
The problem of having multiple interpretations for a given orthography is equally addressed in the SSML <say-as> element. The proposition here is therefore to add the ‘interpret-as’ attribute with the same values as those in the SSML <say-as> element. <say-as interpret-as="noun">does</say-as> could thus be used to index the lexeme in:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>does</grapheme>
    <phoneme interpret-as="verb">dez</phoneme>
    <example>He does not like it.</example>
    <phoneme interpret-as="noun">dowz</phoneme>
    <example>The does hide behind the trees.</example>
  </lexeme>
</lexicon>
(sorry if the IPA phonemes are inexact)
The value of the ‘interpret-as’ attribute in the PLS element must exactly match that of the SSML <say-as> ‘interpret-as’ attribute when it is to be rendered by a TTS system.
The secondary consequences of this proposition are: (1) the editor of the SSML and PLS files controls the content of the interpret-as values, and (2) any future standardisation of SSML interpret-as values can be tied in with PLS.
There is an analogy to this proposed attribute in the <grapheme> element; the ‘orthography’ attribute associates additional information with the <grapheme> content.
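The matching behaviour proposed above can be sketched in Python. Note that the ‘interpret-as’ attribute on <phoneme> is only the proposal made in this comment, not part of the PLS draft; the function name is hypothetical, and the phoneme strings are the commenter's own inexact transcriptions.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Lexicon from the proposal; interpret-as on <phoneme> is the PROPOSED attribute.
LEXICON = """
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>does</grapheme>
    <phoneme interpret-as="verb">dez</phoneme>
    <phoneme interpret-as="noun">dowz</phoneme>
  </lexeme>
</lexicon>
"""


def pronunciation_for(root, word, interpret_as):
    """Return the pronunciation whose interpret-as value exactly matches."""
    for lexeme in root.iter(PLS_NS + "lexeme"):
        graphemes = [g.text for g in lexeme.findall(PLS_NS + "grapheme")]
        if word not in graphemes:
            continue
        for phoneme in lexeme.findall(PLS_NS + "phoneme"):
            if phoneme.get("interpret-as") == interpret_as:
                return phoneme.text
    return None  # no entry with a matching interpret-as value


root = ET.fromstring(LEXICON)
# Value as it would arrive from <say-as interpret-as="noun">does</say-as> in SSML.
print(pronunciation_for(root, "does", "noun"))  # prints "dowz"
```

How a processor should behave when no value matches is not addressed by the proposal; returning None here is simply a choice made for the sketch.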
Resolution: Accepted
Email Trail:
From Paul Bagshaw (2006-02-03):
2. The homophone (heterograph) problem
* Section 5.4 of the requirements document refers to “pronunciation preference” and has been successfully accommodated in the PLS by the ‘prefer’ attribute in <phoneme> and <alias> elements. However, ASR currently has no means of indexing a unique orthography from a particular pronunciation. The following requirement is surprisingly not present:
The pronunciation lexicon markup must enable indication of which orthography is the preferred form for use by speech recognition where there are multiple orthographies for a lexicon entry. The pronunciation lexicon markup must define the default selection behaviour for the situations where there are multiple orthographies but no indicated preference.
If PLS is to be used equitably in ASR and TTS environments, then functionality available for grapheme to phoneme mapping should equally be made available for phoneme to grapheme mapping (and vice versa).
Proposition 2: add a prefer attribute to the <grapheme> element.
For example, spelling variations could thus be marked with a preference for dictation applications.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme prefer="true">theater</grapheme>
    <grapheme>theatre</grapheme>
    <phoneme>'θɪətər</phoneme>
    <!-- IPA string is: "θɪətər" -->
  </lexeme>
</lexicon>
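A minimal Python sketch of how an ASR-side lookup might use such an attribute follows. The ‘prefer’ attribute on <grapheme> is only the proposal made here, the function name is hypothetical, and falling back to the first grapheme in document order is an assumed default, chosen by analogy with the TTS rule. The two graphemes are listed in the opposite order from the example above so that the effect of the preference is visible.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# prefer on <grapheme> is the PROPOSED attribute, not part of the PLS draft.
LEXICON = """
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>theatre</grapheme>
    <grapheme prefer="true">theater</grapheme>
    <phoneme>'θɪətər</phoneme>
  </lexeme>
</lexicon>
"""


def preferred_orthography(root, pronunciation):
    """Map a recognised pronunciation back to a single spelling."""
    for lexeme in root.iter(PLS_NS + "lexeme"):
        phonemes = [p.text for p in lexeme.findall(PLS_NS + "phoneme")]
        if pronunciation not in phonemes:
            continue
        graphemes = lexeme.findall(PLS_NS + "grapheme")
        # A grapheme marked prefer="true" wins.
        for grapheme in graphemes:
            if grapheme.get("prefer") == "true":
                return grapheme.text
        # Assumed default: first grapheme in document order.
        if graphemes:
            return graphemes[0].text
    return None


root = ET.fromstring(LEXICON)
print(preferred_orthography(root, "'θɪətər"))  # prints "theater"
```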
Resolution: Deferred
Email Trail:
From Al Gilman (2006-03-15):
1. Provide better discrimination in determining pronunciation preference.
The specification provides for one, static 'preferred' pronunciation [2] for a lexeme, which may have multiple graphemes associated with it but none of them are at all aware of markup in the SRGS or SSML documents that are being processed.
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.6
1.1 problems with this situation
This limitation, which means that homographs cannot be given any sort of pronunciation selectivity, should not be accepted.
[3] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.5
a. It defeats the use of the Pronunciation Lexicon Specification in the production of talking books as audio media [4]. This is an important use case for access to information by people with disabilities, print disabilities in this case.
[4] http://lists.w3.org/Archives/Public/www-voice/2001JanMar/0020.html
b. It defeats the intended interoperability of lexicons between ASR and TTS functions [5]. Lexicons will serve ASR best with lots of pronunciations, and TTS best with few, unless the many pronunciations can be marked up as to when to use which.
[5] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S1.3
c. It fails to interoperate with the intelligence already in SSML in the say-as element [6].
[6] http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
While many functional limitations have been incorporated in the Voice Browser specifications in order to reach a platform of well-supported common markup, it does not seem to make sense to have say-as capability in SSML with QName ability to indicate terms defined outside the corpus of Voice Browser specifications, and not use this information in determining which pronunciation is preferred when.
1.2 opportunities to do better
As suggested above, there would seem to be a ready path to resolving homographs and other preferred-pronunciation subtleties by use of the say-as element and its interpret-as attribute in SSML to distinguish cases where the preferred pronunciation was one way or another.
1.2.1 Allow markup in <grapheme>
One way to do this would be to allow <say-as> markup inside the <grapheme> element wrapping the plain text of the token being pronounced.
1.2.2 XPath selectors
A second, probably better way, would be to use XPath selectors to distinguish the cases where one pronunciation is preferred as opposed to another. This markup would closely resemble the use of XPath selectors in DISelect [7].
[7] http://www.w3.org/2001/di/Group/di-selection/
In either case, the value of ssml:say-as.interpret-as could be used as a discriminant in choosing preferred pronunciations. This value in turn can, as a best practice, be reliably tied to semantic information which is precise enough to assure a single appropriate pronunciation.
There are more complicated approaches that could be integrated using SPARQL queries of the <metadata> contents, but a little XPath processing of guard expressions is so readily achievable that it is hard to believe something should not be done to afford this capability.
The QName value of this attribute allows plenty of extension room to create unique keys for the proper names of individual people, along with the ability to refer to WordNet nodes or dictionary entries for pronunciation variants of homographs.
Resolution: Accepted
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 1
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: Many
Owner: RI
Comment:
(Not clear whether this is a question about PLS or SSML.) Is it possible to choose a pronunciation dictionary on the basis of language? For example, in the case of
<?xml version="1.0" encoding="UTF-8"?><speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
'Chat' in English refers to a conversation, but '<s xml:lang="fr">chat</s>' in French is the word for 'cat'.
</speak>
If not, it would not be possible to distinguish between the two instances of 'chat' correctly.
It seems that lexicons are expected to be in one language or another.
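To make the question concrete, here is a minimal Python sketch of one way a processor might resolve the language in scope for each run of text and then consult a per-language lexicon. Neither PLS nor SSML is claimed to define this behaviour; the function name, the LEXICONS table, and its placeholder pronunciations are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SSML = """<speak version="1.0"
  xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
'Chat' in English refers to a conversation, but '<s xml:lang="fr">chat</s>' in French is the word for 'cat'.
</speak>"""

# Hypothetical per-language lexicons with placeholder pronunciations.
LEXICONS = {
    "en-US": {"chat": "<English pronunciation>"},
    "fr": {"chat": "<French pronunciation>"},
}


def text_runs(element, inherited_lang=None):
    """Yield (language, text) pairs, resolving xml:lang by inheritance."""
    lang = element.get(XML_LANG, inherited_lang)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from text_runs(child, lang)
        # Text following a child belongs to the parent's language again.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


runs = list(text_runs(ET.fromstring(SSML)))
for lang, text in runs:
    print(lang, "|", text)

# The token inside <s xml:lang="fr"> is looked up in the French lexicon.
print(LEXICONS["fr"]["chat"])
```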
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 30
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: 4.8
Owner: RI
Comment:
If the example element can contain only text, it will not be possible to apply directional markup to bidirectional text. Since this text can be harvested for reading elsewhere, we propose that you allow, as a minimum, a span-like element within the example element that can support a dir=ltr|rtl|lro|rlo attribute to handle bidirectional text.
You could also allow xml:lang on the span-like element for language markup.
Resolution: Rejected
Email Trail:
From Richard Ishida (2006-03-21):
Comment from the i18n review of: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/
Comment 36
At http://www.w3.org/International/reviews/0603-pls10/
Editorial/substantive: S
Location in reviewed document: General
Owner: RI
Comment:
The problem of homographs for TTS and homophones for ASR seems very limiting.
It might be possible to alleviate the problem of homographs for TTS by altering the SSML text, so that tokens are unique, but that would damage portability of the PLS, and, more importantly, cause problems for the use of the same PLS for ASR.
Would it not be possible to tag tokens in SSML with 'variant ids', using attributes that could be matched to ids in the PLS, as a way of uniquely matching homographs to pronunciations?
Resolution: Accepted
Email Trail: