Session 9: Text with multiple languages

Presentation: SSML Extensions Aimed to Improve Asian languages TTS Rendering

It is natural to use syllables or tonal syllables as the basic unit of a TTS system.

Only retroflex uses two characters for one syllable. Otherwise a character corresponds to a syllable.

We need to find a word to refer to either a syllable or a phoneme. This will unify phonetic and syllabic languages.

Needed: multilingual extensions to SSML

An interesting example is “MP3”, where the “MP” is read in English but the “3” is read in Chinese.

The presented example uses token, word, pos, syllable, phonetic, duration, frequency elements.

The example could be used in a Lexicon.

Now we discuss how to deal with multiple languages in the same text.

Given a string of text with a foreign word, we need to (a) identify the language of the foreign word, and (b) explain how to render the word (either in the foreign language or in the local language).
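A minimal sketch of a first pass at step (a), assuming that the writing script is a usable hint for the language. This is only a heuristic: script does not determine language (Han characters are shared by Chinese and Japanese, Latin letters by many languages), and it was not proposed in the session. The function names `script_of` and `script_runs` and the coarse labels are hypothetical, and the Japanese greeting is a stand-in chosen for illustration.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Return a coarse script label for one character (heuristic only)."""
    if ch.isspace():
        return "space"
    # Unicode character names start with the script for these blocks,
    # e.g. "HIRAGANA LETTER KO" or "LATIN SMALL LETTER A".
    name = unicodedata.name(ch, "")
    for prefix, label in (
        ("CJK UNIFIED IDEOGRAPH", "han"),
        ("HIRAGANA", "kana"),
        ("KATAKANA", "kana"),
        ("LATIN", "latin"),
    ):
        if name.startswith(prefix):
            return label
    return "other"


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same script label."""
    return [(label, "".join(chars)) for label, chars in groupby(text, key=script_of)]


if __name__ == "__main__":
    for label, run in script_runs("I said こんにちは to him."):
        print(label, repr(run))
```

A run labelled "kana" inside otherwise Latin text is a candidate for a language switch; deciding which language it is, and how to render it, still needs a separate step.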

“Language switching” can occur in the middle of a sentence. There is a “dominant” language and a “code-switched” language. The code-switched language may cover multiple tokens, a single token, or (as in the case of MP3) part of a token.
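The part-of-a-token case can be sketched as follows, assuming a simple rule: runs of ASCII letters are read in English and everything else, digits included, is read in the dominant language. The rule, the `Part` type, and the `split_mixed_token` function are hypothetical illustrations, not something presented in the session.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    text: str  # a piece of the original token
    lang: str  # language code used to read this piece


def split_mixed_token(token: str, dominant: str = "zh", letter_lang: str = "en") -> list[Part]:
    """Split a token into letter runs, digit runs, and other runs, and assign a language to each."""
    parts = []
    for match in re.finditer(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+", token):
        piece = match.group()
        # Assumed rule: ASCII letters switch language, everything else stays dominant.
        is_ascii_letters = piece[0].isascii() and piece[0].isalpha()
        parts.append(Part(piece, letter_lang if is_ascii_letters else dominant))
    return parts


if __name__ == "__main__":
    print(split_mixed_token("MP3"))
```

With the defaults, "MP3" splits into "MP" tagged `en` and "3" tagged `zh`, which matches the reading described for that example.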

A more interesting example: I said #### to him. (where #### is a string of Japanese characters). We need to use xml:lang=“ja” for the #### and xml:lang=“en” for the English part.
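One way such markup could be generated is sketched below. It wraps the foreign span in a `lang` element, which is defined in the later SSML 1.1 specification; that choice is an assumption, since the notes only say that xml:lang should be used. The function name `build_ssml` is hypothetical, and the Japanese string is a stand-in because the original characters were not recorded.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(before: str, foreign: str, after: str,
               dominant: str = "en", foreign_lang: str = "ja") -> str:
    """Build a speak element whose foreign span carries its own xml:lang."""
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: dominant})
    speak.text = before
    # Assumption: the SSML 1.1 <lang> element marks the code-switched span.
    span = ET.SubElement(speak, f"{{{SSML_NS}}}lang", {XML_LANG: foreign_lang})
    span.text = foreign
    span.tail = after
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("I said ", "こんにちは", " to him."))
```

This only labels the language of the text. It does not say which language or voice should be used for output, which is the separate question raised next.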

We need a language output identifier (which may be different from Japanese).

You may come across (1) an entirely different language, (2) a language that you might know, or (3) a language that the TTS can attempt to pronounce.

If the target language appears only in a limited way, we can place it into the lexicon. Otherwise we may need:

Matrix language target language

Script script

Render render

Just insert an “another language” mark.

Summary: This is a hard problem. We want to separate scripting and rendering, but it’s not clear that this is necessary. We have a lot of things to work out.