Session 8 - Words with multiple pronunciations and meanings

Presentation: Du-Seong Chang (Korea Telecom), "Usage of POS for Resolving Multiple Pronunciations in SSML" [proposing a POS attribute on the phoneme element (or a POS element) in SSML and PLS]

Jim: What are the parts of speech? How big is the set?
Korea Telecom: It depends on the company and on the phonologist. We're using 30-40 categories in our running system.
Jim: Can your system be used with any language?
KT: Yes, we think so.
Loquendo: The Penn tagset was mentioned, but all tagsets depend on the goal of the POS. There can be up to 300 categories. In our examples we use verbs, nouns, 5 or 6 categories, but there are many more. The problem is that it's simple in principle but difficult in practice. Each engine has its own set of POS tags, and even linguists don't agree on how to group things. I don't see a way of standardizing without the community complaining, even with a restrictive set. An attribute where each engine puts any value would be acceptable, but not interoperable.
KT: This is not about standardizing POS tags; it's about using POS information.
Jim: We have an example of a system using 30-40 POS tags, and it appears to work well.
Loquendo: One system, in Korea. Another Korean engine will be different. Something simple like "verb" is not enough. All systems use POS information internally, which they detect themselves.
Jim: Is it too early to standardize?
Loquendo: Either we define a POS attribute and do not standardize its content, or we open up standardization of the content. But I don't believe that standardization will work.
Nokia: I agree that it might be too early, but I wanted to mention LC-STAR, an EU project to build lexicons for speech synthesis, ASR, and speech-to-speech translation. They built lexicons and added POS information, including nouns (names, personal names, etc.). Because they made so many, they reached a compromise across the LC-STAR and TC-STAR projects. Maybe it's too early for PLS, but it could serve as inspiration. (Note: see also the ECESS project.)
Vocalocity: Language independent?
Nokia: A big set, with each language picking a subset.
Vocalocity: So it's like IPA. Something very, very general.
Loquendo: Which took 50 years...
Vocalocity: As long as all values would map to one particular syntax, that might work.
Paolo: Maybe. But it's different from phonemes, because POS values are keywords, symbols. It depends on the level of detail you want inside.
Vocalocity: It's not necessarily a requirement to define the meaning of the POS items.
Paolo: That's another problem. Like word splitting, it's semantic tagging, which is still under development.
iFlyTek: We use POS inside the system, but we don't have requirements for markup.
SinoVoice: We use POS to form the prosody structure. If included in SSML, it could help remove ambiguities.
IBM: We have categories and subcategories: 40-50 POS values. How to define POS depends on how your system works. The purpose of POS is to improve the quality of the synthesis. In our system, POS is defined one particular way, but others will have to do it differently.
Polish Telecom: Personally, I don't think we should advise the TTS engine when there are ambiguities; everything should come from context. Missing diacritics is a different case and does need more information. TTS engines should follow the behaviour of humans, and AI algorithms and text processing are getting better. Also, if we write SSML documents, we already have control over how to describe a word, e.g. via an element. With freeform text, you can't tag it and you need to infer.
JEITA: We use POS internally in the engine. Japanese TTS cannot resolve multiple pronunciations.

* SUMMARY: We note that synthesis processors can make use of POS information. However, there is disagreement on whether users of SSML and PLS should be able to provide this information to the processor (as opposed to the processor determining this information for itself). There may not be enough industry consensus to standardize what POS information to represent and how to represent it.
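For illustration, the proposal under discussion might look like the following sketch. Note that the `pos` attribute is hypothetical: SSML defines no such attribute on `<phoneme>` (which normally requires a `ph` attribute), and the tag values "verb"/"noun" assume one of the small tagsets mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical markup sketching the proposed POS attribute; not valid
     against the SSML 1.0 schema. The processor would use the pos hint to
     choose between pronunciations of the homograph "record". -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Please <phoneme pos="verb">record</phoneme>
  the new <phoneme pos="noun">record</phoneme> today.
</speak>
```

As the discussion notes, the open question is not the syntax but the value set: whether "verb"/"noun" would be drawn from a standardized tagset or left engine-specific (and therefore not interoperable).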