I18n comments on Pronunciation Lexicon Specification (PLS) 1.0

Version reviewed

http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/

Main reviewers

Richard Ishida

Felix Sasaki fsasaki@w3.org

Notes

These are comments on behalf of the I18N Core WG. The Owner column indicates who has been assigned the responsibility of tracking discussions on a given comment.

We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads.

Comments

ID Location Subject Comment Owner Ed./Subs. Discussion threads
1 Many Possible to choose language specific lexicon?

(Not clear whether this is a question about PLS or SSML.) Is it possible to choose a pronunciation dictionary on the basis of language? For example, in the case of

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" 
    xml:lang="en-US">
    
    'Chat' in English refers to a conversation, but 
    '<s xml:lang="fr">chat</s>' in French is the word 
    for 'cat'. 
</speak>

If not, it would not be possible to distinguish between the two instances of 'chat' correctly.

It seems that lexicons are expected to be in one language or another.
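
To illustrate the behaviour we are asking about, here is a minimal sketch of language-scoped lookup over the SSML above. The `LEXICONS` table, its IPA strings, and the `lookup` helper are hypothetical and purely illustrative; neither PLS nor SSML is known to define this selection mechanism, which is the point of the question.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The SSML from this comment, without the XML declaration.
SSML = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
    xml:lang="en-US">
    'Chat' in English refers to a conversation, but
    '<s xml:lang="fr">chat</s>' in French is the word
    for 'cat'.
</speak>"""

# Hypothetical per-language lexicons; the transcriptions are illustrative only.
LEXICONS = {
    "en": {"chat": "tʃæt"},
    "fr": {"chat": "ʃa"},
}

def text_runs(elem, inherited=None):
    """Yield (language, text) for each text run, applying xml:lang inheritance."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text
    for child in elem:
        yield from text_runs(child, lang)
        # Tail text follows the child but is in the parent's language scope.
        if child.tail and child.tail.strip():
            yield lang, child.tail

def lookup(word, lang):
    """Choose a lexicon by primary language subtag, then look the word up."""
    primary = (lang or "").split("-")[0].lower()
    return LEXICONS.get(primary, {}).get(word.lower())

runs = list(text_runs(ET.fromstring(SSML)))
```

With a mechanism of this kind, the token inside `<s xml:lang="fr">` would resolve against the French lexicon while the surrounding text resolves against the English one.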

RI S -
2 Many Use characters rather than escapes in code

All examples show escapes in the code and IPA characters in comments. Please reverse this. It would be fine to say in one place that people could use escapes if it is difficult to type in characters (as you do at the end of 4.1), though see the later suggestion for overcoming that difficulty using a character picker. The current approach encourages the use of escapes, and makes the examples difficult to read.

RI E -
3 General Embeddable in other formats?

Is a pronunciation lexicon embeddable in other formats, e.g. like MathML in HTML? Please address this question somewhere in the specification.

FS E -
4 Schema Text out of synch with schema

The description of the schema in the text does not match the schema itself in various places.

For (one) example: the text defines a sequence of the meta element, the metadata element, and a sequence of lexeme elements: meta.elt.type*, metadata.elt.type*, lexeme.elt.type* but the schema says (lexeme.elt.type | meta.elt.type | metadata.elt.type)*
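
The difference between the two content models can be sketched with regular expressions over child-element names. The one-letter encoding is our own shorthand, not anything taken from the schema.

```python
import re

# One letter per child element: m = meta, d = metadata, l = lexeme.
SEQUENCE_MODEL = re.compile(r"m*d*l*")  # text: meta*, metadata*, lexeme*
CHOICE_MODEL = re.compile(r"[mdl]*")    # schema: (lexeme | meta | metadata)*

def accepted(model, children):
    """Return True if the whole child sequence fits the content model."""
    return model.fullmatch(children) is not None

# A lexeme followed by a meta element: allowed by the schema, not by the text.
in_text = accepted(SEQUENCE_MODEL, "lm")
in_schema = accepted(CHOICE_MODEL, "lm")
```

A document that places a lexeme before a meta element is therefore valid against the schema but contradicts the prose.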

FS S -
5 General encoding=utf8

Thank you for specifying the encoding on the XML declaration throughout!

RI -
6 General Alternate IPA characters ' and :

We have not seen the IPA Handbook, so cannot verify, but the examples in the spec use an apostrophe for primary word stress and a colon for vowel lengthening (e.g. the 5.1 example), whereas there are dedicated IPA characters for these, ˈ and ː.

For example, Newton is transcribed 'nju:tən rather than ˈnjuːtən.

Section 2 does not mention alternate forms. Are the examples correct?
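
The characters in question are distinct code points, as the following sketch shows. The `to_ipa` helper is hypothetical and assumes that every apostrophe and colon in a transcription is meant as a stress or length mark.

```python
import unicodedata

# ASCII look-alikes mapped to the dedicated IPA characters.
ASCII_TO_IPA = {
    0x0027: 0x02C8,  # APOSTROPHE -> MODIFIER LETTER VERTICAL LINE
    0x003A: 0x02D0,  # COLON -> MODIFIER LETTER TRIANGULAR COLON
}

def to_ipa(transcription):
    """Replace ASCII stress and length marks with their IPA counterparts."""
    return transcription.translate(ASCII_TO_IPA)

names = {ch: unicodedata.name(ch) for ch in "':ˈː"}
converted = to_ipa("'nju:tən")
```

A processor comparing phoneme strings code point by code point would treat the two spellings of Newton as different.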

RI E/S
7 General TTS or ASR?

Please make it clearer, throughout the document, when talking about multiple instances of grapheme or phoneme, whether this is useful for speech synthesis or speech recognition.

RI E
8 1.1, example Incorrect accent on Italian text

Surely "La vita e' bella" should be "La vita è bella" ?

RI E
9 1.1, example Add lang="it"

We expected an xml:lang attribute around the phrase "La vita e' bella".

RI E -
10 1.1, 2nd example Quotes missing

The quotation marks have been removed in this version of the example. Is that on purpose, or an omission?

RI E -
11 1.1 Show simple PLS doc in intro

It would be nice (though not essential) to include a short and simple PLS document at the end of section 1.1 just to complete the picture for the user. A simple example will probably be easy enough to understand on its own.

RI E -
12 1.1, last para s/then/than/

s/then/than/

RI E -
13 1.5 Definition of orthography

"Example orthographies include Romaji, Kanji, and Hiragana"

Are Romaji, Kanji and Hiragana separate orthographies, or just different scripts in the Japanese orthography? Certainly, although the examples in the spec usually use only one of these scripts per <grapheme>, mixtures are more usual for Japanese text.

RI E
14 2, 2nd para IPAHNDBK alternative ?

Is there an online location that repeats the information in the (hardcopy) IPA handbook? Is it the same information as is found at http://www.arts.gla.ac.uk/ipa/ipachart.html? If so, it might be helpful to include a note pointing to that.

RI E
15 2, 2nd para Point to IPA picker

In addition to IPAUNICODE1 and IPAUNICODE2, please point to the IPA Character Picker. This was recently updated against the information on the IPA home page, and allows people to easily create short strings of IPA text for insertion into their documents. (And will probably also be useful for creating this spec.)

RI E
16 3 Styling for MUST, SHOULD, etc

Please use some markup to clarify the locations of the normative usages of "must", "should", "must not" etc. in the text.

RI E
17 4, table Meta data container element

Description says "Meta data container element"

Description for <meta> is misleading: it is not a container, but empty. For the typical reader, saying it is the same as HTML would be helpful.

It may be better to say 'element containing meta data'.

RI E -
18 4.1, 2nd para Alphabet on lexicon is default

"which indicates the pronunciation alphabet".

Since the alphabet setting can be overridden on a phoneme element, the text should say "which indicates the default pronunciation alphabet".
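
A minimal sketch of the override behaviour we have in mind follows. It assumes the attribute is named `alphabet` on both elements, omits the PLS namespace for brevity, and uses "x-sampa" as a hypothetical vendor-specific alphabet name.

```python
import xml.etree.ElementTree as ET

PLS = """<lexicon alphabet="ipa">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme alphabet="x-sampa">"nju:t@n</phoneme>
  </lexeme>
</lexicon>"""

def effective_alphabets(root):
    """Pair each phoneme with its own alphabet, else the lexicon default."""
    default = root.get("alphabet")
    return [(p.text, p.get("alphabet", default)) for p in root.iter("phoneme")]

result = effective_alphabets(ET.fromstring(PLS))
```

The value on the lexicon element only applies where a phoneme does not set its own, which is why "default" is the more accurate word.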

RI E
19 4.1, 4th para Why one lexicon per language?

Please clarify why lexicons are separated by language.

RI E
20 4.1, 4th para 3066 or its successor

s/RFC 3066/RFC 3066 or its successor/

(Note that 'its successor' has already been approved by the IETF and is just pending publication.)

RI S
21 4.3, 1st example dc:language

How is dc:language="en-US" meant to be interpreted if it appears in a metadata element? How does it affect the xml:lang declaration on PLS elements?

RI S
22 4.3, 2nd example English in Italian lexicon

It would be helpful to explain why this lexicon, labelled as xml:lang="it", contains English graphemes.

RI E
23 4.3, 2nd example it-IT

Unless there is some particular reason, it is better (and potentially less confusing for the reader) to use "it" rather than "it-IT".
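
One reason the shorter tag can be less confusing: under prefix matching in the style of RFC 3066 language ranges, "it" covers every regional variant, while "it-IT" does not cover plain "it". Whether a PLS processor matches languages this way is an assumption on our part; the draft does not say.

```python
def matches(language_range, tag):
    """Prefix matching in the style of RFC 3066 language ranges."""
    r, t = language_range.lower(), tag.lower()
    # Match on equality, or on a prefix that ends at a subtag boundary.
    return t == r or t.startswith(r + "-")

lexicon_it_covers_it_it = matches("it", "it-IT")
lexicon_it_it_covers_it = matches("it-IT", "it")
```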

RI E
24 4.5 Name of grapheme element

In the glossary of terms you define 'grapheme' as "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element." but then you use it as an element name to label content that almost always involves a *sequence* of graphemes.

Please find a better name for the element. How about 'text' or 'phrase' ?

RI S
25 4.5, 3rd bullet Japanese mixtures

"Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji), and phonemic spelling systems e.g. Katakana or Hiragana for representing the orthography of a word or phrase;"

The fact that Japanese mixes scripts is one thing, but I think the point here is that, for example, one sometimes writes the same word using hiragana and sometimes with kanji, according to preference or circumstance.

A good example might be 'shouyu' (soy sauce), which can be written using either kanji or hiragana: kanji: 醤油; hiragana: しょうゆ

RI E
26 4.5, 3rd para TTS vs. ASR in 4.5

"In order to remove the need for duplication of pronunciation information to cope with the above variations, the <lexeme> element may"

Here is an example of where it might be good to distinguish between TTS and ASR. You could say: "In order to remove the need for duplication of pronunciation information to cope with the above variations during text-to-speech, the <lexeme> element may contain"

RI E
27 4.5 Why orthography attribute?

What is the value of the orthography attribute?

We see no value, and its purpose is not expressed in the text.

RI S
28 4.5, 2nd example Problems with 4.5 Japanese example

There are a number of problems with the use of the orthography attribute in this example for Japanese:

The kana label is incorrect - it should say hira, since this is hiragana, not katakana.

There is currently no label available for the extremely common form of Japanese words that mixes both kanji and hiragana, e.g. 混じる 'to mix' (contains one kanji and two hiragana characters).

Is nɪhɒŋɒ an accurate phonemic/phonetic transcription?
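
To make the mixed-script point concrete, the sketch below counts characters per script using Unicode character names. The category names are our own shorthand, not values defined for the orthography attribute.

```python
import unicodedata
from collections import Counter

def script_counts(text):
    """Count characters per Japanese script, based on Unicode names."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["kanji"] += 1
        elif name.startswith("HIRAGANA"):
            counts["hiragana"] += 1
        elif name.startswith("KATAKANA"):
            counts["katakana"] += 1
        else:
            counts["other"] += 1
    return dict(counts)

mixed = script_counts("混じる")
```

A word such as 混じる falls into two categories at once, so no single-script label can describe it.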

RI S
29 4.7 Transformation ?

What does transformation mean? Is it the first example, W3C? If so, please clarify briefly.

RI E
30 4.8 Sub-elements for example

If the example element can contain only text, it will not be possible to apply directional markup to bidirectional text. Since this text can be harvested for reading elsewhere, we propose that you allow, as a minimum, a span-like element within the example element that can support a dir=ltr|rtl|lro|rlo attribute to handle bidirectional text.

You could also allow xml:lang on the span-like element for language markup.
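
As a rough illustration of when such markup matters, this sketch flags example text that contains strong characters of both directions, using the Unicode bidirectional classes. The sample strings are our own, and the check is a simplification rather than an implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

def strong_directions(text):
    """Return the set of strong directions ('ltr', 'rtl') found in text."""
    dirs = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            dirs.add("ltr")
        elif bidi in ("R", "AL"):
            dirs.add("rtl")
    return dirs

def may_need_dir_markup(text):
    """Mixed-direction text is where directional markup is likely to help."""
    return strong_directions(text) == {"ltr", "rtl"}

mixed = may_need_dir_markup("שלום W3C")
```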

RI S
31 5.3 Section 5.3

We don't see any value in the additional examples in 5.3, since all examples are instances of homographs or homophones (or expansions, which are not referred to here). Why not skip this and go straight into 5.4?

RI E
32 5.4 Smyth

I think the Smyth example just confuses things at the beginning of the section and in the example. It is an example of something that is both a homograph and homophone at the same time - for which there appears to be no good solution. I would just add a reference to the fact that such things exist after the example in 5.4, and perhaps use one of the examples in 5.3 rather than the Smyth one.

RI E
33 5.4, 2nd para Not easy in ASR

"Pronunciations are explicitly bound to one or more orthographies within a <lexeme> element so homophones are easy to handle. See the following examples:"

This should say, "homophones are easy to handle for text-to-speech". They are not easy to handle in an ASR context, and there should be an informative note here like in 5.5, but referring to ASR rather than TTS!

RI E
34 5.5 example Phonemic examples

Shouldn't the second 'refuse' be pronounced with a short e and a non-lengthened u and final z? (Note also that the comment is superfluous.)

There are other instances where the phonemic transcription seems strange (e.g. use of 'e'). Please have them checked by phoneticians who are familiar with the languages.

RI E
35 5.5 Bias in 5.5

This whole section seems strangely biased.

"In both cases the processor will not be able to distinguish when to apply the first or the second transcription."

The above statement only applies for the text-to-speech author. For ASR, this is a perfectly valid approach, and the resolution will cause no problems.

"the current version of specification is not able to instruct the PLS processor how to distinguish the two pronunciations" should read "the current version of specification is not able to instruct the PLS processor *performing text-to-speech* how to distinguish the two pronunciations".

RI E
36 General Variant id suggestion for TTS homograph disambiguation

The problem of homographs for TTS and homophones for ASR seems very limiting.

It might be possible to alleviate the problem of homographs for TTS by altering the SSML text, so that tokens are unique, but that would damage portability of the PLS, and, more importantly, cause problems for the use of the same PLS for ASR.

Would it not be possible to tag tokens in SSML with 'variant ids', using attributes that could be matched to ids in the PLS as a way of uniquely matching homographs to pronunciations?
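
A minimal sketch of the idea, modelling the lexicon as a table keyed by grapheme and variant id. The variant names ("present", "past"), the transcriptions, and the `pronounce` helper are all hypothetical; no such attribute exists in the current SSML or PLS drafts.

```python
# Hypothetical lexicon: (grapheme, variant id) -> pronunciation.
LEXICON = {
    ("read", "present"): "riːd",
    ("read", "past"): "rɛd",
}

def pronounce(token, variant=None):
    """Resolve a token, using the variant id when the document supplies one."""
    if variant is not None:
        return LEXICON.get((token, variant))
    matches = [p for (g, _), p in LEXICON.items() if g == token]
    # Without a variant id a homograph stays ambiguous, so return None.
    return matches[0] if len(matches) == 1 else None

tagged = pronounce("read", "past")
untagged = pronounce("read")
```

Because the grapheme itself is unchanged, the same lexicon entries would remain usable for ASR.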

RI S

Version: $Id: Overview.html,v 1.5 2006/03/21 18:25:26 rishida Exp $