Session 9: Text with multiple languages 

Presentation: SSML Extensions Aimed at Improving TTS Rendering of Asian Languages

For these languages, it is natural to use syllables (or tonal syllables) as the basic unit of a TTS system.

In Chinese, a character normally corresponds to one syllable. The only exception is the retroflex (erhua) suffix, where two characters are read as a single syllable.
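As a rough illustration of the one-character-one-syllable rule and its retroflex exception, the sketch below groups Chinese characters into syllables by attaching a following 儿 to the preceding character. This is a simplification assumed for illustration: the name `split_syllables` is hypothetical, the input is assumed to contain only Han characters, and a real system would need a lexicon, since in a word such as 女儿 (nǚ'ér) the 儿 is a full syllable.

```python
def split_syllables(text: str) -> list[str]:
    """Group Chinese characters into syllables (naive sketch).

    Assumption: every character is one syllable, except that a 儿
    directly following another character is treated as the retroflex
    (erhua) suffix and merged into the preceding syllable.
    """
    syllables: list[str] = []
    for ch in text:
        if ch == "儿" and syllables:
            # Retroflex suffix: two characters, one syllable.
            syllables[-1] += ch
        else:
            syllables.append(ch)
    return syllables


print(split_syllables("一点儿"))  # ['一', '点儿']
```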

We need a term that covers both a syllable and a phoneme. This would unify the treatment of phoneme-based and syllable-based languages.
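One way to read this point is as a single unit type that can hold either a phoneme or a (tonal) syllable, so that English and Chinese pronunciations are both sequences of the same type. The sketch below assumes that design; the names `Unit` and `UnitKind`, and the ARPAbet/pinyin-style symbols, are illustrative choices rather than anything proposed in the presentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UnitKind(Enum):
    PHONEME = "phoneme"
    SYLLABLE = "syllable"


@dataclass(frozen=True)
class Unit:
    """Hypothetical unified unit: either a phoneme or a (tonal) syllable."""
    kind: UnitKind
    symbol: str
    tone: Optional[int] = None  # only meaningful for tonal syllables

    def __post_init__(self) -> None:
        # Reject tones on phonemes so the two kinds stay distinguishable.
        if self.tone is not None and self.kind is not UnitKind.SYLLABLE:
            raise ValueError("only syllables can carry a tone")


# English "map" as three phonemes, Chinese 三 (sān) as one tonal syllable;
# both are plain lists of Unit.
english = [Unit(UnitKind.PHONEME, p) for p in ("M", "AE", "P")]
chinese = [Unit(UnitKind.SYLLABLE, "san", tone=1)]
```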

Needed: multilingual extensions to SSML 

An interesting example is “MP3”, where “MP” is read in English but “3” is read in Chinese.

The example presented uses token, word, pos, syllable, phonetic, duration, and frequency elements.
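The presentation's markup is not reproduced in these notes, so the sketch below only guesses at how the listed elements might nest: a `token` containing a `word`, which holds the surface text, a `pos`, and one `syllable` per syllable, each with `phonetic`, `duration`, and `frequency` children. The nesting, the helper `annotate_token`, and the values in the usage line are assumptions for illustration, not the proposal's actual schema or measured data.

```python
import xml.etree.ElementTree as ET


def annotate_token(surface, pos, syllables):
    """Build a hypothetical token/word/pos/syllable annotation.

    `syllables` is a list of (phonetic, duration, frequency) string tuples.
    Element names follow the presentation; the nesting is assumed.
    """
    token = ET.Element("token")
    word = ET.SubElement(token, "word")
    word.text = surface                    # surface form of the word
    ET.SubElement(word, "pos").text = pos  # part of speech
    for phonetic, duration, frequency in syllables:
        syl = ET.SubElement(word, "syllable")
        ET.SubElement(syl, "phonetic").text = phonetic
        ET.SubElement(syl, "duration").text = duration
        ET.SubElement(syl, "frequency").text = frequency
    return token


# The "3" of "MP3" read in Chinese as sān; duration and frequency
# are placeholder values, not measurements.
elem = annotate_token("3", "numeral", [("san1", "200ms", "220Hz")])
print(ET.tostring(elem, encoding="unicode"))
```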

The example could be used in a Lexicon. 
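In a lexicon, a mixed-language entry such as “MP3” could map one token to a sequence of parts, each with its own language and pronunciation. The layout and the function `lookup` below are hypothetical; pronunciations use ARPAbet for English and tone-numbered pinyin for Chinese purely as an example.

```python
from typing import NamedTuple, Optional


class Part(NamedTuple):
    text: str
    lang: str      # BCP 47 language tag
    phonetic: str  # pronunciation in a notation suited to that language


# Hypothetical lexicon: one token may expand to parts in different languages.
LEXICON: dict[str, list[Part]] = {
    "MP3": [
        Part("MP", "en", "EH M P IY"),
        Part("3", "zh", "san1"),
    ],
}


def lookup(token: str) -> Optional[list[Part]]:
    """Return the per-language parts for a token, or None if it is absent."""
    return LEXICON.get(token)


for part in lookup("MP3") or []:
    print(part.lang, part.text, part.phonetic)
```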

Now we discuss how to deal with multiple languages in the same text. 

Given a string of text containing a foreign word, we need to (a) identify the language of the foreign word and (b) decide how to render it (either in the foreign language or in the local language).
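The two steps can be sketched as a pair of functions. Step (a) here is a crude guess from Unicode script ranges, assumed for illustration only (real language identification needs more than the script, since Chinese and Japanese share Han characters). Step (b) renders the word in its own language when the engine supports it and otherwise falls back to the local language. The function names and the `supported` set are hypothetical.

```python
def guess_language(word: str, local: str) -> str:
    """(a) Guess a word's language from its script (heuristic sketch)."""
    for ch in word:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana or Katakana
            return "ja"
        if 0xAC00 <= cp <= 0xD7A3:  # Hangul syllables
            return "ko"
    # Assumption: a word made only of ASCII letters is English.
    if all(ch.isascii() and ch.isalpha() for ch in word):
        return "en"
    return local  # otherwise assume the local language


def choose_rendering(lang: str, local: str, supported: set[str]) -> str:
    """(b) Render in the foreign language if supported, else in the local one."""
    return lang if lang in supported else local


lang = guess_language("すし", local="zh")
print(lang, choose_rendering(lang, "zh", {"zh", "en"}))
```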

“Language switching” (code switching) can occur in the middle of a sentence. There is a “dominant” (matrix) language and a “code-switched” language. The code-switched part may span multiple tokens, a single token, or (as with MP3) part of a token.
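Because a switch can fall inside a token, segmenting at the character level is simpler than at the token level: classify each character, then merge adjacent characters of the same class into runs. The sketch below assumes a Chinese matrix language and treats ASCII letters as English and everything else (Han characters, digits, punctuation) as the matrix language, which is enough to split “MP3” as in the earlier example. The rule and the names `char_lang` and `split_runs` are illustrative assumptions.

```python
def char_lang(ch: str, matrix: str) -> str:
    """Assumed rule: ASCII letters are English, all else is the matrix language."""
    return "en" if ch.isascii() and ch.isalpha() else matrix


def split_runs(text: str, matrix: str = "zh") -> list[tuple[str, str]]:
    """Split text into (run, language) pairs at language boundaries."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        if ch.isspace() and runs:
            # Whitespace stays with the current run so multi-word
            # switches such as "New York" remain one segment.
            run, lang = runs[-1]
            runs[-1] = (run + ch, lang)
            continue
        lang = char_lang(ch, matrix)
        if runs and runs[-1][1] == lang:
            runs[-1] = (runs[-1][0] + ch, lang)
        else:
            runs.append((ch, lang))
    return runs


print(split_runs("MP3"))  # [('MP', 'en'), ('3', 'zh')]
```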

A more interesting example: “I said #### to him.”, where #### is a string of Japanese characters. We need xml:lang="ja" for the #### and xml:lang="en" for the English part.
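In markup, the example could look like the output of the sketch below, which wraps each run that is not in the matrix language in an element carrying `xml:lang`. It uses the `<lang>` element that SSML 1.1 defines for this purpose; the helper `to_ssml` is hypothetical, and the Japanese string is an arbitrary stand-in for ####.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(segments: list[tuple[str, str]], matrix: str) -> str:
    """Wrap runs not in the matrix language in <lang xml:lang="...">."""
    body = []
    for text, lang in segments:
        if lang == matrix:
            body.append(escape(text))
        else:
            body.append(f"<lang xml:lang={quoteattr(lang)}>{escape(text)}</lang>")
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            f"xml:lang={quoteattr(matrix)}>" + "".join(body) + "</speak>")


print(to_ssml([("I said ", "en"), ("こんにちは", "ja"), (" to him.", "en")], "en"))
```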

We also need an identifier for the output (rendering) language, which may differ from the language of the text (here, Japanese).

The foreign text you encounter may be (1) in an entirely unfamiliar language, (2) in a language you might know, or (3) in a language the TTS engine can attempt to pronounce.

If the target language appears only occasionally, we can put its words into the lexicon. Otherwise, we may need to specify both the script and the rendering for each language:

Matrix language    Target language
Script             Script
Render             Render
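The table suggests describing each of the matrix language and the target language by two separate properties: the script the text is written in and the language it is rendered (spoken) in. A minimal sketch of that separation follows; the class names, fields, and the `needs_second_voice` check are assumptions for illustration, using ISO 15924 script codes and BCP 47 language tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangSpec:
    """Hypothetical pairing of a script with a rendering language."""
    script: str  # ISO 15924 code for how the text is written, e.g. "Latn"
    render: str  # BCP 47 tag for the language it is spoken in, e.g. "en"


@dataclass(frozen=True)
class MixedText:
    """Script and rendering kept separate for matrix and target language."""
    matrix: LangSpec
    target: LangSpec

    def needs_second_voice(self) -> bool:
        # A voice switch is needed only when the rendering languages differ,
        # whether or not the scripts differ.
        return self.target.render != self.matrix.render


# Latin-script words in Chinese text, read in English ...
mixed = MixedText(LangSpec("Hans", "zh"), LangSpec("Latn", "en"))
# ... or read by the Chinese voice itself.
local_only = MixedText(LangSpec("Hans", "zh"), LangSpec("Latn", "zh"))
```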

Alternatively, just insert a mark meaning “another language”.

Summary: This is a hard problem. We would like to separate script and rendering, but it is not clear that this is necessary. Much remains to be worked out.