Session 4: Representing word boundaries
---------

14:00-14:20 iFLYTEK: Overview of Chinese Speech Synthesis Markup Language

iFlyTek is currently the only company that supports CSSML. They intend to update CSSML as SSML changes.

14:20-14:40 IBM: Chinese Romanization for Chinese Voice Browsing

14:40-15:00 Toshiba: Suggestions on Tone and Word Boundary of Mandarin for SSML

15:30-15:50 Panasonic: Position Paper of Panasonic Beijing Laboratory for W3C Workshop on Internationalizing the SSML

The presenter clarified that the translation value of the interpret-as attribute of say-as would be an optional feature.

15:50-16:20 Focused discussion: How to represent word boundaries?

Jim summarized the papers as representing three different approaches:

  iFlyTek/UofHK   Panasonic      Toshiba
  P               P              P
  S               S              S
  Phrase          L0,L1,L2,L3    W (with phrase capability)
  Word

Kazuyuki explained that Japanese has the same issues, but also needs to identify the morphemes that make up words.

Korean has spaces between word phrases. A Korean pronunciation dictionary is morpheme-based, not word-based, and the part of speech must also be known. At an absolute minimum, word boundaries must be marked; Korean may need something more, but it is not yet clear what. The dictionary would be too large if words rather than morphemes were used for lookup.

If we change "word" to "token", this would work for everyone.

Paolo suggested holding a workshop in Turkey or another region whose languages may present new word segmentation challenges.
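The proposal to replace "word" with "token" can be illustrated with a small sketch. SSML 1.1 later adopted a `<token>` element (with `<w>` as an alias) for marking such units. The Python below assumes that element and the standard SSML namespace. It also assumes the token strings come from some external segmenter, which is not shown. The function name `tokens_to_ssml` is hypothetical. The example only shows how explicit boundaries could be marked and does not describe how any of the presenting companies implemented them.

```python
import xml.etree.ElementTree as ET

# Standard SSML namespace URI.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def tokens_to_ssml(sentences, lang="zh-CN"):
    """Build an SSML document in which every token boundary is explicit.

    `sentences` is a list of sentences, each a list of token strings that a
    (hypothetical) external segmenter has already produced. Each sentence
    becomes an <s> inside a single <p>, and each token becomes a <token>.
    """
    # Plain attribute keys are serialized as written, so "xmlns" and
    # "xml:lang" appear literally on the root element.
    speak = ET.Element(
        "speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang}
    )
    para = ET.SubElement(speak, "p")
    for tokens in sentences:
        sent = ET.SubElement(para, "s")
        for tok in tokens:
            # One <token> element per segmentation unit (word or morpheme).
            ET.SubElement(sent, "token").text = tok
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(tokens_to_ssml([["我们", "讨论", "分词"]]))
```

Because each `<token>` carries its own boundary, the same markup could in principle hold Chinese words, Japanese or Korean morphemes, or space-delimited words. That flexibility is what the rename from "word" to "token" was meant to provide.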