1. W3C organization by Max
a) How does W3C work?
i. A member organization
ii. About 50 working groups.
iii. A well-defined and efficient work process.
b) The voice browser working group
i. 88 participants in 33 organizations.
c) Recommendation track
i. Working Draft -> Last Call Working Draft (LCWD) -> Candidate Recommendation -> Proposed Recommendation (PR) -> Recommendation
d) Intellectual Property
i. A clear patent policy
ii. Open and Royalty Free Standards
iii. Unique in the standards world.
e) Join W3C and help us build.
i. W3.org/2005/talks/11-maxf-w3c
2. Understanding say-as: chief editor, SSML
a) Background on this confusing feature of the language.
b) Guiding principles of SSML
i. Convenient annotation of existing text for audio rendering.
ii. Control at all levels, from text structure and normalization to prosodic control and even voice characteristics.
iii. Limited critical error conditions: “rendering must go on”.
c) Guiding principles of say-as
i. The primary purpose of say-as is to correctly interpret text as it is commonly written in human-readable documents.
ii. “Intended for when the processor has insufficient context to interpret ambiguous text”.
iii. Interpretation, not rendering.
d) Interpretation, not rendering.
i. Pronounce the contained text.
e) Types limited in behavior.
i. Say-as Note type (“interpret-as” value) inclusion criteria.
ii. The orthography is difficult to write out by hand.
f) ‘Date’, ‘Time’,
g) Why did you remove type blah? Or why is there no type foo?
i. Either the use case for it was about rendering rather than interpretation, or there was doubt about its importance.
h) Summary, Conclusions, & 2 Questions.
i. What is the best way to accomplish semantic category-based rendering control?
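The "interpretation, not rendering" principle above can be sketched in Python. This is a minimal illustration, not part of the talk: the say-as element and its interpret-as and format attributes come from SSML 1.0, while the sample sentence, the "date"/"mdy" values, and the helper name interpretation_hints are assumptions chosen for the example. The markup only tells the processor what the text is; how it is spoken is left to the processor.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Sample document (assumed for illustration): the date string is ambiguous
# without a hint, so say-as states how it should be interpreted.
ssml = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  The meeting is on
  <say-as interpret-as="date" format="mdy">11/02/2005</say-as>.
</speak>"""


def interpretation_hints(document):
    """Return (interpret-as, format, text) for every say-as element."""
    root = ET.fromstring(document)
    hints = []
    for el in root.iter(f"{{{SSML_NS}}}say-as"):
        hints.append((el.get("interpret-as"), el.get("format"), el.text))
    return hints


hints = interpretation_hints(ssml)
print(hints)
```

The hint carries no prosody or voice information, which is why rendering-oriented types were kept out of the say-as Note.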
3. Pronunciation Lexicon
a) Standard way: IPA
b) Other alphabets (these should be standardized):
i. SAMPA (not standardized so far)
ii. Pinyin, JEITA, etc.
c) The current PLS is monolingual.
d) The PLS language: <lexeme>
i. The <lexeme> element is the container of a lexicon entry. It is composed of.
e) The PLS language: <grapheme>
i. Different written forms with the same pronunciation.
f) The PLS language: <phoneme>
i. Can change alphabet.
g) The PLS language: <alias>
i. Especially useful for ASR.
h) Use cases/Future Issues
i. Multiple pronunciations for ASR
ii. Homographs
iii. Abbreviations
i) But it cannot deal with:
i. Homophones
ii. Part-of-speech annotations (and other contextual information).
j) Quick demo of SSML + PLS
i. GPS
k) Standard lexicons?
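The PLS elements listed above can be sketched as a small lexicon plus a loader. This is a hedged illustration in Python, not the demo shown at the workshop: the element names and namespace follow the PLS specification, while the entries (colour/color, the GPS expansion), the IPA transcription, and the helper name load_lexicon are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: one entry with two graphemes sharing a phoneme,
# and one entry that uses <alias> to expand an abbreviation.
pls = f"""<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>GPS</grapheme>
    <alias>Global Positioning System</alias>
  </lexeme>
</lexicon>"""


def load_lexicon(document):
    """Map each grapheme to the phonemes and aliases of its lexeme."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(document)
    table = {}
    for lexeme in root.findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        aliases = [a.text for a in lexeme.findall("pls:alias", ns)]
        for grapheme in lexeme.findall("pls:grapheme", ns):
            table[grapheme.text] = {"phoneme": phonemes, "alias": aliases}
    return table


lexicon = load_lexicon(pls)
print(lexicon["GPS"]["alias"])
```

Because the lookup key is the grapheme alone, two homographs with different pronunciations would need additional context to be told apart.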
4. Why internationalize SSML?
a) Global users of the Web.
i. The Web is not only for native English speakers but for everyone in the world.
b) Extension of SSML ability.
i. Enhancement for non-
c) Problem to be solved: pronunciation ambiguity.
d) Prosodic Controls
i. Text Analysis: <p>,<s>, <say-as>, <lexicon>, <phoneme>
ii. Prosody Analysis: duration and speech rate.
iii. Fundamental frequency transition. <prosody> <emphasis> <break>
e) Goals & Scope of the workshop
i. To identify and prioritize extensions and additions to SSML.
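The text-structure and prosody elements named above can be seen together in one short SSML 1.0 document. The following Python sketch only checks that the document is well-formed and lists the elements it uses; the sentences and attribute values are assumptions picked for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# <p> and <s> mark text structure; <prosody>, <emphasis> and <break>
# carry the prosodic controls.
ssml = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <p>
    <s>Welcome to the <emphasis level="strong">workshop</emphasis>.</s>
    <s>
      <prosody rate="slow" pitch="high">Please take a seat.</prosody>
      <break time="500ms"/>
      We will begin shortly.
    </s>
  </p>
</speak>"""

root = ET.fromstring(ssml)
# Strip the namespace part of each tag and collect the distinct element names.
used = sorted({el.tag.split("}")[1] for el in root.iter()})
print(used)
```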
5. Session 2:
a) Polish Telecom
i. The nature of the problem
1. Diacritic: sometimes called an accent mark, a mark added to a letter to alter a word’s pronunciation or to distinguish between similar words.
2. Example: Polish letters with diacritics.
a) 35 letters = 26 basic + 9 with diacritics
3. Different pronunciation with diacritics.
4. Why do Polish diacritics sometimes disappear?
a) They cannot always be entered while typing.
b) Entering a single diacritic can take 5 key presses.
5. Quasi-Polish text (without diacritics)
a) Sometimes it is the only way to represent the text.
ii. Similarities among other languages
1. Other languages:
a) Czech, Slovak
b) German
c) Russian
d) French, etc.
iii. Possible solutions
1. How to solve the problem?
a) A new dialect?
b) An alternative spelling (context dependent orthography)?
c) An erroneous text that requires correction (jargon)?
2. Should the TTS engine solve it, or external lexicons?
iv. Discussion.
1. Instant message: invented words & phrases.
2. Reduced or different character sets:
a) zh_CN.
i. Simplified Chinese: zh_Hans
ii. Traditional Chinese: zh_Hant
iii. Chinese Romanization: zh_latin
3. To solve the problem
a) We can use “slan’ce’” to describe diacritics.
b) Jargon or broken
i. Jargon: no loss of information.
ii. Broken text: information is lost.
4. Can we have the possibility of freely choosing components from different vendors?
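Why quasi-Polish text loses information can be shown with a short Python sketch. The mapping covers the 9 Polish letters with diacritics mentioned above; the word pairs (sąd "court" vs. sad "orchard", łaska "grace" vs. laska "cane") and the function name are assumptions chosen for illustration, not examples from the talk.

```python
# Map each of the 9 Polish letters with diacritics (both cases) to its base letter.
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def strip_polish_diacritics(text):
    """Produce the quasi-Polish form that a limited keyboard would yield."""
    return text.translate(POLISH_TO_ASCII)


# Each pair holds two different Polish words that differ only by a diacritic.
pairs = [("sąd", "sad"), ("łaska", "laska")]

# After stripping, both members of a pair collapse to the same string,
# so a TTS engine can no longer tell which word was meant.
collapsed = [strip_polish_diacritics(a) == strip_polish_diacritics(b) for a, b in pairs]
print(collapsed)
```

Restoring the lost marks needs context or a lexicon, which is the choice between TTS-internal handling and external lexicons raised under possible solutions.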
6. Session 3:
a) An Introduction to S3ML
i. Background
1. SSML & SinoVoice
2. Pinyin in Phoneme attribute.
3. <say-as> Definition
a) name, address, math, net
4. Domain Support
a) <voice domain = “”> element.
b) <domain name = “”> element
c) Some jargon in the domain.
b)
i. Characteristics of Chinese
1. Rich in dialects:
2. No explicit phrase and word boundaries.
3. Monosyllabic and tonal.
ii. Proposed attributes for existing elements
1. “dialect-accent”
iii. Proposed elements
1. <phrase> and <word>
a) If we know the <word> boundaries, there won’t be homograph ambiguity.
2. <tone>
iv. Proposed attribute values
1. Chinese-name
2. fraction
3. measure
4. net
5. percentage
6. ratio
c) Discussion
i. Which do you believe to be particularly necessary for Chinese?
1. Chinese name: difficult to distinguish from other characters.
2. URL is distinguishable from Chinese Characters.
7. Session 4
a) iFLYTEK
i. Pinyin is widely used in
ii. For words composed of Latin letters, we need to distinguish Pinyin words from English words.
1. English words:
2. Pinyin words: Anhui, Hefei, Jiang Zemin
iii. Segmentation of Chinese words
1. Word and Phrase elements
2. What is the definition of word and phrase?
iv. Using background music.
1. <environment repeat = “yes” src = “1.wav”> Text…</environment>
v. CSSML:
1. iFLYTEK set up CSSML as an enterprise standard in 2002.
2. Since 2003, CSSML has been supported by iFLYTEK products.
3. CSSML was voted a candidate national standard.
4. Does any other company support CSSML? No.
5. Is there any intellectual property (patent) on CSSML? No.
b) IBM
i. Chinese Romanization for Chinese Voice Browsing.
c) Toshiba
i. Tone: (Special Attribute in phoneme element)
ii. Pinyin:
iii. Word boundary:
1. misunderstanding:
a) 上海是个 大都会: Shanghai is a metropolis
b) 上海人 大都 会:
i. <w detail = ‘3’>上海人大都会</w>
d) Panasonic
i. Character Pronunciations*
1. pronunciation
2. Part of speech: POS
ii. Word/Phrase Boundaries*
1. L0: syllable boundary
2. L1: prosodic word boundary
3. L2: minor phrase boundary
4. L3: major phrase boundary
iii. Dialect
1. <p xml:lang = “zh-cn” ssml:lang2 = “cn-sc”> </p>
iv. Sound Effect
1. <prosody post-filter = “some-filter”></prosody>
v. Speaking Style
1. <p ssml:prosody-template = “#1”>***</p>
vi. Macro
1. <macro name = “date”>2005/10/20</macro>
vii. Say-as Extension:
1. Translation:
a) If you can synthesize the foreign language, there is no need for translation.
2. Substitution
a) Should allow multiple choices.
e) Discussion
i. Sentence Structure & Word Boundary
1. Paragraph->Sentence->Phrase->Word
2. Paragraph->Sentence->L0,L1,L2,L3
3. Paragraph->Sentence->W (word)
4. Japanese: morphemes; POS is useful for pronunciation.
5. Korean: spaces separate words.
ii. Discussion
1. Token
a) Morpheme dictionary
b) Word boundary
2. Part of speech
3. Phrase marking in general for all languages: not
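The word-boundary problem from this session can be sketched in Python. The string 上海人大都会 is the one from the Toshiba example; the second segmentation, the bare <w> tag without attributes, and the helper name mark_words are assumptions for illustration only, since the element was a proposal and its final form was still under discussion.

```python
from xml.sax.saxutils import escape


def mark_words(words, tag="w"):
    """Wrap each word in a (hypothetical) word-boundary element."""
    return "".join(f"<{tag}>{escape(word)}</{tag}>" for word in words)


surface = "上海人大都会"

# Two candidate segmentations of the same character string.
seg_a = ["上海人", "大都", "会"]   # the reading given in the Toshiba example
seg_b = ["上海", "人", "大都会"]   # assumed alternative, built on 大都会 (metropolis)

# Without markup the two readings are indistinguishable ...
assert "".join(seg_a) == surface == "".join(seg_b)

# ... with explicit boundaries the author's intent is unambiguous.
marked_a = mark_words(seg_a)
marked_b = mark_words(seg_b)
print(marked_a)
print(marked_b)
```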
8. Session 5
a) JEITA
i. JEITA Speech Group:
1. Expert Committee on Speech Input/Output
2. First version: JEITA-62-2000, Revised version: JEITA-IT-4002
ii. Japanese Pronunciation in phoneme element:
1. “x-JEITA-IT-4002-kana”, etc.
iii. How to specify speaking rate in Japanese
1. The basic unit in Japanese is the mora; “mora” is called “拍” (haku) in Japanese.
a) ko N ni chi wa -> 5 moras
b) sya si n -> 3 moras
c) sya sin -> 2 syllables (the same word counted in syllables rather than moras)
2. So the mora is a suitable unit for Japanese speaking rate.
a) Japanese can specify: 4.5 moras per second.
b) Moras can also be used to indicate break length.
c) Chinese can specify: ** syllables per minute.
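The arithmetic behind a mora-based speaking rate can be sketched in Python. This is a minimal illustration that assumes the input is already segmented into moras separated by spaces, as in the examples above; the function names are hypothetical, and 4.5 moras per second is the rate quoted in the talk.

```python
def count_moras(segmented):
    """Count moras in a string whose moras are separated by spaces."""
    return len(segmented.split())


def duration_seconds(segmented, moras_per_second=4.5):
    """Estimate speaking time from the mora count and a mora-based rate."""
    return count_moras(segmented) / moras_per_second


greeting = "ko N ni chi wa"
print(count_moras(greeting))                   # 5 moras
print(round(duration_seconds(greeting), 2))    # 5 / 4.5 seconds
```

The same division would work for a break length given in moras, since a pause of n moras at this rate lasts n / 4.5 seconds.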
iv. Ruby element
1. Pronunciation Annotation: 今日 kyowa (may be wrong)
a) There is a “Ruby Annotation, W3C Recommendation 31 May 2001”.
b) But the proposed one is simpler and sufficient.
2. Can this be covered by phoneme?
a) Is there any standard for Japanese Pronunciation Annotation?
v. Expansion of the say-as element
1. Interpret as “wago”: some kind of date format in
a) Several kinds of date formats in
b)
i. Hangul: Chinese characters in
1. Pronunciations in
2. Korean people basically read them in the Korean way, but sometimes in the Chinese or Japanese way.
ii. Chinese characters in Korean
1. Chinese characters can be used.
2. 2000 Chinese characters are frequently used.
iii. Annotations: