1.  W3C organization by Max

a)         How does W3C work?

                         i.              A member organization

                       ii.              About 50 working groups.

                      iii.              A well-defined and efficient work process.

b)        The Voice Browser Working Group

                         i.              88 participants from 33 organizations.

c)        Recommendation Track

                         i.              Working Draft -> Last Call Working Draft (LCWD) -> Candidate Recommendation -> Proposed Recommendation (PR) -> Recommendation

d)        Intellectual Property

                         i.              A clear patent policy

                       ii.              Open, royalty-free standards

                      iii.              Unique in the standards world.

e)         Join W3C and help us build.

                         i.              W3.org/2005/talks/11-maxf-w3c

2.  Understanding say-as (by the chief editor of SSML)

a)         Background on this confusing feature of the language.

b)        Guiding principles of SSML

                         i.              Convenient annotation of existing text for audio rendering.

                       ii.              Control at all levels, from text structure and normalization to prosodic control and even voice characteristics.

                      iii.              Limited critical error conditions: “rendering must go on”.

c)        Guiding principles of say-as

                         i.              The primary purpose of say-as is to enable correct interpretation of text commonly written in human-readable documents.

                       ii.              “Intended for when the processor has insufficient context to interpret ambiguous text”.

                      iii.              Interpretation, not rendering.

d)        Interpretation, not rendering.

                         i.              Pronounce the contained text.

e)         Types limited in behavior.

                         i.              Say-as Note type (“interpret-as” value) inclusion criteria.

                       ii.              Difficult to write the orthography for by hand.

f)         ‘Date’, ‘Time’,

g)        Why did you remove type blah? Or why is there no type foo?

                         i.              Either the use case for it was about rendering rather than interpretation, or there was doubt on its importance.

h)        Summary, Conclusions, & 2 Questions.

                         i.              What is the best way to accomplish semantic category-based rendering control?

3.  Pronunciation Lexicon

a)         Standard way: IPA

b)        Other alphabets (these should be standardized):

                         i.              SAMPA (not yet standardized)

                       ii.              Pinyin, JEITA, etc.

c)        The current PLS is monolingual.

d)        The PLS language: <lexeme>

                         i.              The <lexeme> element is the container of a lexicon entry. It is composed of one or more <grapheme> elements and one or more <phoneme> or <alias> elements.

e)         The PLS language: <grapheme>

                         i.              Different spellings with the same pronunciation

f)         The PLS language: <phoneme>

                         i.              The alphabet can be changed per <phoneme>, overriding the lexicon default.

g)        The PLS language: <alias>

                         i.              Especially useful for ASR.

h)        Use cases/Future Issues

                         i.              Multiple pronunciations for ASR

                       ii.              Homographs

                      iii.              Abbreviations

i)          But it cannot deal with:

                         i.              Homophones

                       ii.              Part-of-speech annotations (and other contextual information).

j)          Quick demo of SSML + PLS

                         i.              GPS

k)        Standard lexicons?
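The PLS structure described above can be illustrated with a short Python sketch that uses only the standard library. In that structure, a <lexeme> holds one or more <grapheme> spellings plus <phoneme> and/or <alias> pronunciations. The sample lexicon below is invented for illustration. The namespace URI is assumed to be the one from the PLS 1.0 specification. Features such as <example> elements are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed from the PLS 1.0 specification.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Invented sample: two spellings sharing one IPA pronunciation,
# and an acronym resolved through <alias>.
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def load_lexicon(xml_text):
    """Map each grapheme to a list of (kind, alphabet, value) tuples."""
    root = ET.fromstring(xml_text)
    default_alphabet = root.get("alphabet")
    entries = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        prons = []
        for ph in lexeme.findall("pls:phoneme", PLS_NS):
            # A <phoneme> may override the lexicon-wide alphabet.
            prons.append(("phoneme", ph.get("alphabet", default_alphabet), ph.text))
        for al in lexeme.findall("pls:alias", PLS_NS):
            prons.append(("alias", None, al.text))
        # Every spelling in the lexeme shares the same pronunciations.
        for gr in lexeme.findall("pls:grapheme", PLS_NS):
            entries.setdefault(gr.text, []).extend(prons)
    return entries


lexicon = load_lexicon(SAMPLE)
print(lexicon["color"])  # [('phoneme', 'ipa', 'ˈkʌlər')]
print(lexicon["W3C"])    # [('alias', None, 'World Wide Web Consortium')]
```

The <grapheme> elements of a lexeme share one pronunciation list, so "colour" and "color" resolve identically. A lookup keyed only on spelling also shows why part-of-speech-dependent pronunciations are out of reach for this format.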

4.  Why internationalize SSML?

a)         Global users of the Web.

                         i.              The Web is not only for native English speakers but for everyone in the world.

b)        Extension of SSML ability.

                         i.              Enhancement for non-

c)        Problem to be solved: pronunciation ambiguity.

d)        Prosodic Controls

                         i.              Text analysis: <p>, <s>, <say-as>, <lexicon>, <phoneme>

                       ii.              Prosody analysis: duration and speech rate.

                      iii.              Fundamental frequency transitions: <prosody>, <emphasis>, <break>

e)         Goals & Scope of the workshop

                         i.              To identify and prioritize extensions and additions to SSML.

5.  Session 2:

a)         Polish Telecom

                         i.              The nature of the problem

1.         Diacritic: a mark (sometimes called an accent mark) added to a letter to alter a word’s pronunciation or to distinguish between similar words.

2.         Example: Polish letters with diacritics.

a)         35 letters = 26 basic + 9 with diacritics

3.         Letters with diacritics are pronounced differently.

4.         Why do Polish diacritics sometimes disappear?

a)         Sometimes there is no way to enter them while typing.

b)        Entering a single letter with a diacritic can take 5 key presses.

5.         Quasi-Polish text (without diacritics)

a)         Sometimes it is the only way to represent the text.
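Producing quasi-Polish text from correct text is a simple but lossy transformation. That is why restoring the diacritics is the hard direction for a TTS engine. The minimal Python sketch below uses Unicode NFD decomposition. It is an illustration, not a method presented in the session. Note that ł has no canonical decomposition, so it must be mapped by hand.

```python
import unicodedata


def strip_diacritics(text):
    """Produce quasi-Polish text by removing diacritics."""
    # Ł/ł has no Unicode canonical decomposition, so map it explicitly.
    text = text.replace("ł", "l").replace("Ł", "L")
    # NFD splits e.g. 'ą' into 'a' + U+0328 COMBINING OGONEK.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("zażółć gęślą jaźń"))  # zazolc gesla jazn
# Two different words collapse to the same quasi-Polish form.
print(strip_diacritics("łaska") == strip_diacritics("laska"))  # True
```

"łaska" (grace) and "laska" (cane) map to the same string. A TTS engine reading quasi-Polish text therefore needs context or a lexicon to restore the intended word.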

                       ii.              Similar problems in other languages

1.         Other languages:

a)         Czech, Slovak

b)        German

c)        Russian

d)        French, etc.

                      iii.              Possible solutions

1.         How to solve the problem?

a)         A new dialect?

b)        An alternative spelling (context dependent orthography)?

c)        An erroneous text that requires correction (jargon)?

2.         Solve it inside the TTS engine, or with external lexicons?

                     iv.              Discussion.

1.         Instant messages: invented words and phrases.

2.         Reduced or different character sets:

a)         zh_CN

                                                                   i.              Simplified Chinese: zh_Hans

                                                                 ii.              Traditional Chinese: zh_Hant

                                                                iii.              Chinese romanization: zh_Latn
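The identifiers above follow the shape of BCP 47 language tags. In these tags, a four-letter ISO 15924 script subtag (Hans, Hant, Latn) distinguishes writing systems of the same language. BCP 47 itself uses hyphens rather than underscores. The minimal Python sketch below handles only the language[-script][-region] subset and accepts either separator. Real tags can also carry extended-language, variant, and extension subtags, which this sketch rejects.

```python
import re

# Simplified subset of BCP 47: language[-script][-region].
# Underscores are accepted too, since many locale APIs use them.
TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_tag(tag):
    """Split a tag into (language, script, region); missing parts are None."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    language, script, region = m.group("language", "script", "region")
    return (
        language.lower(),
        script.title() if script else None,   # scripts are title case: Hans
        region.upper() if region else None,   # regions are upper case: CN
    )


for tag in ["zh_CN", "zh-Hans", "zh-Hant-TW", "zh_Latn"]:
    print(tag, "->", parse_tag(tag))
```

`parse_tag("zh_latin")` raises `ValueError`, because "latin" is not a four-letter script code.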

3.         To solve the problem

a)         We can use “slan’ce’” to describe diacritics.

b)        Jargon or broken text?

                                                                   i.              Jargon: no loss of information.

                                                                 ii.              Broken text: information is lost.

4.         Could we freely choose components from different vendors?

6.  Session 3:

a)         An Introduction to S3ML

                         i.              Background

1.         SSML & SinoVoice

2.         Pinyin in Phoneme attribute.

3.         <say-as> Definition

a)         name, address, math, net

4.         Domain Support

a)         <voice domain=""> element

b)        <domain name=""> element

c)        Domain-specific jargon.

b)        Chinese University of Hong Kong

                         i.              Characteristics of Chinese

1.         Rich in dialects.

2.         No explicit phrase and word boundaries.

3.         Monosyllabic and tonal.

                       ii.              Proposed attributes for existing elements

1.         “dialect-accent”

                      iii.              Proposed elements

1.         <phrase> and <word>

a)         If the <word> boundaries are known, there will be no homograph ambiguity.

2.         <tone>

                     iv.              Proposed attribute values

1.         Chinese-name

2.         fraction

3.         measure

4.         net

5.         percentage

6.         ratio

c)         Discussion

                         i.              Which of these are particularly necessary for Chinese?

1.         Chinese names: difficult to distinguish from other characters.

2.         URLs are distinguishable from Chinese characters.

7.  Session 4

a)         iFLYTEK

                         i.              Pinyin is widely used in China.

                       ii.              For words written in Latin letters, we need to distinguish Pinyin words from English words.

1.         English words: James, New York

2.         Pinyin words: Anhui, Hefei, Jiang Zemin

                      iii.              Chinese word segmentation

1.         Word and phrase elements

2.         What is the definition of a word and of a phrase?

                     iv.              Using background music.

1.         <environment repeat="yes" src="1.wav">Text…</environment>

                       v.              CSSML:

1.         iFLYTEK set up CSSML as an enterprise standard in 2002.

2.         Since 2003, CSSML has been supported by iFLYTEK products.

3.         CSSML was voted a candidate national standard.

4.         Does any other company support CSSML? No.

5.         Are there any patents (intellectual property claims) on CSSML? No.

b)        IBM

                         i.              Chinese Romanization for Chinese Voice Browsing.

c)        Toshiba

                         i.              Tone: (Special Attribute in phoneme element)

                       ii.              Pinyin:

                      iii.              Word boundary:

1.         Misunderstanding (word boundary ambiguity):

a)         上海是个 大都会: Shanghai is a metropolis.

b)        上海人 大都 会: most Shanghainese can …

                                                                   i.              <w detail='3'>上海人大都会</w>
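The example is ambiguous because the same character string can be split into dictionary words in several ways. Only explicit boundary markup settles which split is meant. The Python sketch below enumerates every segmentation against a tiny dictionary invented for the purpose. It then wraps a chosen split in <w> elements. The bare <w> markup is a hypothetical stand-in for the proposed element and does not model Toshiba's detail attribute.

```python
from functools import lru_cache

# Toy dictionary, invented for illustration only.
DICTIONARY = {"上海", "上海人", "人大", "大都", "大都会", "都会", "会"}


def segmentations(text, dictionary=DICTIONARY):
    """Return every way to split `text` into words from `dictionary`."""

    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return ((),)  # exactly one way to split the empty remainder
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in dictionary:
                results.extend((word,) + rest for rest in split_from(j))
        return tuple(results)

    return list(split_from(0))


def to_w_markup(words):
    """Wrap each word in a <w> element (hypothetical element usage)."""
    return "".join(f"<w>{w}</w>" for w in words)


for seg in segmentations("上海人大都会"):
    print(" / ".join(seg))
print(to_w_markup(("上海人", "大都", "会")))
```

With this dictionary the string has three readings: 上海 / 人大 / 都会, 上海人 / 大都 / 会, and 上海人 / 大都会. An engine without markup has to guess among them.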

d)        Panasonic

                         i.              Character Pronunciations*

1.         pronunciation

2.         Part of speech: POS

                       ii.              Word/Phrase Boundaries*

1.         L0: syllable boundary

2.         L1: prosodic word boundary

3.         L2: minor phrase boundary

4.         L3: major phrase boundary
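Suppose a processor does not support the proposed four-level boundary annotation. One conceivable fallback would be to approximate the levels with SSML 1.0 <break strength> values. The Python sketch below does that. The level-to-strength mapping is an arbitrary assumption for illustration, not something proposed in the session. The input format, with each text unit paired with the boundary that follows it, is likewise hypothetical.

```python
from enum import IntEnum


class Boundary(IntEnum):
    L0 = 0  # syllable boundary
    L1 = 1  # prosodic word boundary
    L2 = 2  # minor phrase boundary
    L3 = 3  # major phrase boundary


# Assumed mapping onto SSML 1.0 break strengths (None = no <break>).
BREAK_STRENGTH = {
    Boundary.L0: None,
    Boundary.L1: "weak",
    Boundary.L2: "medium",
    Boundary.L3: "strong",
}


def render(units):
    """units: list of (text, Boundary) pairs; the boundary follows the text."""
    parts = []
    for text, boundary in units:
        parts.append(text)
        strength = BREAK_STRENGTH[boundary]
        if strength is not None:
            parts.append(f'<break strength="{strength}"/>')
    return "".join(parts)


syllables = [
    ("上", Boundary.L0), ("海", Boundary.L1),
    ("是", Boundary.L0), ("个", Boundary.L1),
    ("大", Boundary.L0), ("都", Boundary.L0), ("会", Boundary.L3),
]
print(render(syllables))
```

This fallback is lossy in practice. A <break> requests a pause, while a prosodic word boundary need not be audible as a pause.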

                      iii.              Dialect

1.         <p xml:lang="zh-cn" ssml:lang2="cn-sc"></p>

                     iv.              Sound Effect

1.         <prosody post-filter="some-filter"></prosody>

                       v.              Speaking Style

1.         <p ssml:prosody-template="#1">***</p>

                     vi.              Macro

1.         <macro name="date">2005/10/20</macro>

                    vii.              Say-as Extension:

1.         Translation:

a)         If the engine can synthesize the foreign language, no translation is needed.

2.         Substitution

a)         Should allow multiple alternatives.

e)         Discussion

                         i.              Sentence Structure & Word Boundary

1.         Paragraph->Sentence->Phrase->Word

2.         Paragraph->Sentence->L0,L1,L2,L3

3.         Paragraph->Sentence->W (word)

4.         Japanese: morpheme and POS information are useful for pronunciation.

5.         Korean: words are separated by spaces.

                      ii.              Discussion

1.         Token

a)        Morpheme Dictionary

b)        Word boundary

2.         Part of Speech

3.         Phrase marking in general for all languages: not

8.  Session 5

a)         JEITA

                         i.              JEITA Speech Group:

1.         Expert Committee on Speech Input/Output

2.         First version: JEITA-62-2000, Revised version: JEITA-IT-4002

                       ii.              Japanese Pronunciation in phoneme element:

1.         “x-JEITA-IT-4002-kana”, etc.

                      iii.              How to specify speaking rate in Japanese

1.         The basic unit of Japanese timing is the mora, called “拍” (haku) in Japanese.

a)         ko N ni chi wa -> 5 moras

b)        sya si n -> 3 moras

c)        sya sin -> 2 syllables

2.         So the mora is a suitable unit for Japanese speaking rate.

a)         Japanese can specify, e.g., 4.5 moras per second.

b)        Moras can also be used to indicate break length.

c)        Chinese can specify ** syllables per minute.
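The mora arithmetic above can be sketched in Python. This is a simplified illustration that assumes the input is already a kana reading; converting kanji to kana is a separate problem. The default of 4.5 moras per second is taken from the example above. `break_tag`, which turns a pause measured in moras into an absolute SSML <break time>, is a hypothetical helper rather than part of any proposal.

```python
import unicodedata

# Small kana that merge with the preceding kana into one mora
# (e.g. シャ "sya" is a single mora). Small tsu (ッ/っ) is NOT listed:
# the geminate consonant counts as a mora of its own.
MERGING_SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def count_moras(kana):
    """Count moras in a kana string; non-kana characters are ignored."""
    moras = 0
    for ch in kana:
        if ch in MERGING_SMALL_KANA:
            continue
        name = unicodedata.name(ch, "")
        # Ordinary kana, ン/ん and ッ/っ each count once; so does the
        # long-vowel mark ー, which lengthens the previous vowel by a mora.
        if name.startswith(("HIRAGANA LETTER", "KATAKANA LETTER")) or ch == "ー":
            moras += 1
    return moras


def duration_seconds(kana, moras_per_second=4.5):
    """Approximate speaking time at a fixed mora rate."""
    return count_moras(kana) / moras_per_second


def break_tag(moras, moras_per_second=4.5):
    """Express a pause measured in moras as an SSML <break time>."""
    ms = round(1000 * moras / moras_per_second)
    return f'<break time="{ms}ms"/>'


print(count_moras("コンニチハ"))  # 5  (ko N ni chi wa)
print(count_moras("シャシン"))    # 3  (sya si n)
print(round(duration_seconds("コンニチハ"), 2))  # 1.11
print(break_tag(2))              # <break time="444ms"/>
```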

                     iv.              Ruby element

1.         Pronunciation annotation: 今日 -> kyō

a)         There is a W3C Recommendation, “Ruby Annotation” (31 May 2001).

b)        But the proposed element is simpler and sufficient.

2.         Can this be covered by phoneme?

a)         Is there any standard for Japanese Pronunciation Annotation?

                       v.              Extension of the say-as element

1.         Interpret as “wago”: a kind of Japanese date format

a)         There are several kinds of date formats in Japan.

b)        Korea

                         i.              Hanja: Chinese characters in Korean

1.         The pronunciations of Chinese characters differ in China, Korea, and Japan.

2.         Koreans basically read them the Korean way, but sometimes the Chinese or Japanese way.

                       ii.              Chinese Characters in Korean

1.         Chinese characters can be used.

2.         2,000 Chinese characters are in frequent use.

                      iii.              Annotations: