Session 8 - Words with multiple pronunciations and meanings

Presentation: Du-Seong Chang (Korea Telecom), "Usage of POS for Resolving Multiple Pronunciations in SSML" [proposing a POS attribute on the phoneme element (or a POS element) in SSML and PLS]

Jim: What are the parts of speech? How big is the set?
Korea Telecom: It depends on the company and on the phonologist. We're using 30-40 categories in our running system.
Jim: Can your system be used with any language?
KT: Yes, we think so.
Loquendo: The Penn tagset was mentioned, but all tagsets depend on the goal of the POS. There can be up to 300 categories. In our examples we use verbs, nouns, 5 or 6 categories, but there are many more. The problem is that it's simple in principle but difficult in practice. Each engine has its own set of POS tags, and even linguists don't agree on how to group things. I don't see a way of standardizing without the community complaining, even with a restrictive set. An attribute where each engine puts any value would be acceptable, but not interoperable.
KT: This is not about standardizing POS tags; it's about using POS information.
Jim: We have an example of a system using 30-40 POS tags, and it appears to work well.
Loquendo: One system, in Korea. Another Korean engine will be different. Something simple like "verb" is not enough. All systems use POS information internally, which they detect themselves.
Jim: Is it too early to standardize?
Loquendo: Either we define a POS attribute and do not standardize its content, or we open up standardization of the content. But I don't believe that standardization will work.
Nokia: I agree that it might be too early, but I wanted to mention LC-STAR, an EU project to build lexicons for speech synthesis, ASR, and speech-to-speech translation. They built lexicons and added POS information, including nouns (names, personal names, etc.). Because they made so many, they reached a compromise across the LC-STAR and TC-STAR projects. Maybe it's too early for PLS, but it could serve as inspiration. (Note: see also the ECESS project.)
Vocalocity: Language independent?
Nokia: A big set, with each language picking a subset.
Vocalocity: So it's like IPA. Something very, very general.
Loquendo: Which took 50 years...
Vocalocity: As long as all values would map to one particular syntax, that might work.
Paolo: Maybe. But it's different from phonemes, because POS values are keywords, symbols. It depends on the level of detail you want inside.
Vocalocity: It's not necessarily a requirement to define the meaning of the POS items.
Paolo: That's another problem. Like word splitting, it's semantic tagging, which is still under development.
iFlyTek: We use POS inside the system, but we don't have requirements for markup.
SinoVoice: We use POS to form the prosody structure. If included in SSML, it could help remove ambiguities.
IBM: We have categories and subcategories: 40-50 POS values. How to define POS depends on how your system works. The purpose of POS is to improve the quality of the synthesis. In our system, POS is defined one particular way, but others will have to do it differently.
Polish Telecom: Personally, I don't think we should advise the TTS engine when there are ambiguities; everything should come from context. Missing diacritics is a different case and does need more information. TTS engines should follow the behaviour of humans, and AI algorithms and text processing are getting better. Also, if we write SSML documents, we already have control over how to describe a word, e.g. via an element. With freeform text, you can't tag it and you need to infer.
JEITA: We use POS internally in the engine. Japanese TTS cannot resolve multiple pronunciations.

* SUMMARY: We note that synthesis processors can make use of POS information. However, there is disagreement on whether users of SSML and PLS should be able to provide this information to the processor (as opposed to the processor determining this information for itself). There may not be enough industry consensus to standardize what POS information to represent and how to represent it.
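For illustration, the proposal under discussion might look like the following sketch. Note that the `pos` attribute is hypothetical: SSML defines no such attribute on `<phoneme>` (which normally requires a `ph` attribute), and the tag values "verb"/"noun" assume one of the small tagsets mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical markup sketching the proposed POS attribute; not valid
     against the SSML 1.0 schema. The processor would use the pos hint to
     choose between pronunciations of the homograph "record". -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Please <phoneme pos="verb">record</phoneme>
  the new <phoneme pos="noun">record</phoneme> today.
</speak>
```

As the discussion notes, the open question is not the syntax but the value set: whether "verb"/"noun" would be drawn from a standardized tagset or left engine-specific (and therefore not interoperable).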