Session 9: Text with multiple languages
Presentation: SSML Extensions Aimed to Improve Asian languages TTS Rendering
It is natural to use syllables or tonal syllables as the basic unit of a TTS system.
Only retroflex uses two characters for one syllable. Otherwise a character corresponds to a syllable.
We need to find a word to refer to either a syllable or a phoneme. This will unify phonetic and syllabic languages.
Needed: multilingual extensions to SSML
An interesting example is “MP3”, where the “MP” is read in English but the “3” is read in Chinese.
The presented example uses token, word, pos, syllable, phonetic, duration, frequency elements.
The example could be used in a Lexicon.
Now we discuss how to deal with multiple languages in the same text.
Given a string of text containing a foreign word, we need to (a) identify the language of the foreign word and (b) specify how to render the word (either in the foreign language or in the local language).
“Language switching” can occur in the middle of a sentence. There is a “dominant” language and a “code-switched” language. The code-switched language may cover multiple tokens, a single token, or (as in the case of MP3) part of a token.
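The part-of-a-token case can be illustrated with a minimal Python sketch. The function name `split_token` and the rule it applies (ASCII letters belong to the code-switched language, everything else stays in the dominant language) are assumptions made for illustration only; they were not part of the presentation.

```python
from itertools import groupby

def split_token(token, dominant="zh", switched="en"):
    """Split one token into consecutive runs, each tagged with a language.

    Assumption (hypothetical rule): ASCII letters are read in the
    code-switched language; digits and all other characters are read
    in the dominant language.
    """
    def lang_of(ch):
        return switched if ch.isascii() and ch.isalpha() else dominant

    # groupby merges adjacent characters that map to the same language.
    return [("".join(run), lang) for lang, run in groupby(token, key=lang_of)]

print(split_token("MP3"))  # [('MP', 'en'), ('3', 'zh')]
```

A real system would need a much richer rule than a character-class test, but the output shape (runs of text, each with its own language) is the point here.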
A more interesting example: “I said #### to him.” (where #### is a string of Japanese characters). We need to use xml:lang = “ja” for the #### and xml:lang = “en” for the English part.
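As a rough sketch of that markup, the Python below builds the sentence with the standard library's ElementTree. It assumes the `lang` element defined in SSML 1.1 as the carrier for the embedded xml:lang; the Japanese string is a made-up stand-in for the ####, the helper name `mark_languages` is hypothetical, and the SSML namespace and version attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def mark_languages(before, foreign, after, matrix="en", embedded="ja"):
    """Wrap `foreign` in a <lang> element inside a <speak> root."""
    speak = ET.Element("speak", {XML_LANG: matrix})
    speak.text = before                      # text before the foreign span
    span = ET.SubElement(speak, "lang", {XML_LANG: embedded})
    span.text = foreign                      # the code-switched text
    span.tail = after                        # text after the foreign span
    return ET.tostring(speak, encoding="unicode")

# "こんにちは" is a placeholder, not the string used in the talk.
ssml = mark_languages("I said ", "こんにちは", " to him.")
print(ssml)
```

This only marks the language of the text; it says nothing about which language or voice should be used to speak it.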
We also need an identifier for the output language, which may differ from the language of the text (Japanese in this example).
You may come across (1) an entirely different language, (2) a language that you might know, or (3) a language that the TTS can attempt to pronounce.
If the target language appears only in a limited way, we can place it into the lexicon. Otherwise we may need:
Matrix language target language
Script script
Render render
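The lexicon route for a target language that appears only occasionally could look something like the sketch below. Everything in it is hypothetical: the `LEXICON` table, its placeholder pronunciation string (not a real phonetic transcription), and the `render_plan` helper are illustrations, not anything proposed in the session.

```python
# Hypothetical lexicon: foreign tokens mapped to a pronunciation written
# for the matrix-language voice. The value is a placeholder only.
LEXICON = {"MP3": "em pi san"}

def render_plan(token, lexicon=LEXICON):
    """Decide how a foreign token gets handled.

    Returns ("lexicon", pronunciation) when the token has an entry,
    otherwise ("markup", token) to signal that explicit language
    markup is needed instead.
    """
    if token in lexicon:
        return ("lexicon", lexicon[token])
    return ("markup", token)

print(render_plan("MP3"))
print(render_plan("Tokyo"))
```

The lexicon only scales to a small, known set of foreign words; anything outside it falls through to explicit markup.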
Just insert an « another language » mark.
Summary: This is a hard problem. We want to separate scripting and rendering, but it’s not clear that this is necessary. We have a lot of things to work out.