W3C Workshop on Internationalizing the Speech Synthesis Markup Language III — Minutes

13-14 January 2007

The International Institute of Information Technology (IIIT), Gachibowli, Hyderabad, 500 032, India

[Photo from the third workshop]

Each session includes the presentation of one or two papers, followed by a discussion about at least one item presented in the papers. Some discussions refer to items from several previously presented papers.

 

Attendees

Dan Burnett (Nuance)
Kazuyuki Ashimura (W3C)
Samuel Thomas (IBM)
Raghunath Joshi (CDAC, Mumbai)
Paolo Baggia (Loquendo)
Sheth Raxit
Muralidhara Reddy (Infosys)
Sai Chand Chintala (Infosys)
Anuraag Agrawal (Motorola)
Shiv Gupta (Motorola)
Ruvini Ramanayake (University of Moratuwa)
Paul Bagshaw (France Telecom)
Richard Ishida (W3C)
Kishore Prahallad (IIIT)
Nixon Patel (Bhrigus)
Bill Whisenhunt (Bhrigus)
Varalwar Madhavi (Bhrigus)
Anand Arokia Raj (Bhrigus)
G. Raghavendra Rao (Bhrigus)
Veera Raghavendra Eluru (IIIT)
Sarmad Hussain (National University of Computer and Emerging Sciences; through Skype)

 

Saturday, 13 January 2007

Session 1. Introduction

Moderator:
Dan Burnett
Scribe:
Kishore Prahallad & Kazuyuki Ashimura
Welcome to the workshop by Dan Burnett
Host welcome and logistics by Nixon Patel
Introduction to W3C & VBWG by Kazuyuki Ashimura
Internationalization and new language tags by Richard Ishida
Paul: potential problem
... particular language tags
... language should be compatible between PLS & SSML
SSML 1.1 Activities by Dan Burnett
- Requirements
  = "xml:lag" problem

Richard: The content is "written text"
Pronunciation Lexicon Background by Paolo Baggia
Paolo: role attribute

Kishore: lexical information?

Paolo: stress?
... IPA or proprietary (vendor-dependent) specification
... not an interoperable part
Question & Answer
(No specific discussion)
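Paolo's interoperability point can be sketched in SSML 1.0 markup: the alphabet attribute of <phoneme> accepts "ipa" or vendor-defined values, and only the IPA form is portable across engines (the "x-acme" alphabet name and its notation below are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- interoperable: IPA transcription -->
  <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>
  <!-- not interoperable: hypothetical vendor-specific alphabet "x-acme" -->
  <phoneme alphabet="x-acme" ph="t ah m aa t ow">tomato</phoneme>
</speak>
```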

Session 2. Language-specific (or Indic-specific) issues, part I

Moderator:
Richard Ishida
Scribe:
Kishore Prahallad & Kazuyuki Ashimura
Overview of Indian languages by Richard Ishida [ Handout / Slides (TBD)]
- gemination of 'l' (l:) or 'a' (a:)
CHARACTERISTICS OF INDIAN LANGUAGES, Madhavi Varalwar [ Paper / Slides ]
- conjunction
- different graphemes between dialects
- reduplication: onomatopoeic, echo
A Note on Interpreting Text for Indian Language TTS, Kalika Bali [ Paper ] (Cancelled)
Focused discussion: Echo expressions, word compounding
Dan: we don't care about meaning changes, but we do care about pronunciation changes
... because SSML is for speech synthesis

Paolo: coarticulation in Italian

Dan: in English as well e.g. "New York"
Summary
Issue:
There are combinations of lexical items which must be treated as a
single unit in order to have proper prosody and pronunciation.  The
items in a combination may be used in the same sequence and *not* be
treated as a single unit.  These combinations occur frequently enough
that using the role attribute may be cumbersome or impractical.

A combination may still be ambiguous in pronunciation (and possibly
meaning) and require, for example, homograph disambiguation.
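As a sketch of the issue, the <w>/<token> element proposed in the SSML 1.1 draft could mark a compound as a single unit for prosody and pronunciation; the example below is illustrative only, not a resolution agreed at the workshop:

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- compound treated as one unit: "New York" gets compound prosody -->
  I'm flying to <w>New York</w> tomorrow.
  <!-- same word sequence NOT marked: each word treated separately -->
  They bought a new York stone fireplace.
</speak>
```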

Session 3. Language-specific (or Indic-specific) issues, part II

Moderator:
Nixon Patel
Scribe:
Paolo Baggia
Characteristics of Sinhala Pronunciation, Ruvini Ramanayake [ Slides ]
Presentation of
Ruvini Ramanayake, Univ. of Moratuwa, Sri Lanka

Characteristics of Sinhala Pronunciation

- Research project on a TTS engine compliant with SSML

Sinhala: mother tongue of majority of Sri Lanka
2,500 years old

Sinhala is a phonetic language

Sound system:
- 14 vowel sounds, seven are short and seven long.
  two are unique to this language, not present
  in other Indo-Aryan or Dravidian languages
- 26 consonants, including pre-nasalized stops

Sinhala localization initiative
- encoding Sinhala characters
- development of Sinhala fonts
- standardization of keyboards

Sinhala encoding
- encoded using consonants, vowels and vowel modifiers
- based on written representation

Unicode encoding:
- Sinhala Unicode defined 1997

Sinhala pronunciation
- the vowel schwa is not represented
- others are marked with diacritics
- exceptions for consonants
- few exceptions for vowels
  examples
- there are paired words that should be pronounced
  together to get a proper sentence
- not much regional variation

Pronunciation exceptions
- differences based on part of speech (ex.)
- pronunciation of some words depends on word
  origin (ex.)
- some words are written differently from their
  pronunciations

Impact of loan-words
- modern Sinhala is more Spoken than Literary
- added vowels (open back vowel and long central vowels)

Representation of Phonological phenomena that are unique
to Sinhala
- segmental and suprasegmental features to express
  emotions
  - length of consonants and vowels
  - relative timings
- style of reading

Conclusions
- lack of linguistic resources and analysis (no work
  on stress for Sinhala)
- little documentation on Spoken Sinhala
- romanization not covered

Richard: some specific things we need to think about

Ruvini: 
- emotions and speaking styles not covered in SSML
- transliteration: there are many issues for Latin
  words (text messages); no support in SSML

Richard: In Sinhala there are many differences
  between formal and informal speech. More than in
  English, and it occurs in other languages too

Ruvini: prosody, the way you are expressing words

Paul: comparison with liaisons
SSML for Urdu Speech Synthesis, Sarmad Hussain [ Paper / Slides ]
Presentation of Sarmad Hussain (by Skype)
SSML for Urdu Speech Synthesis

Comments based on experience in Urdu TTS system

Using SSML in Urdu
- the SSML standard works well for Urdu
- SSML needs to be enhanced to handle some
  Urdu specific problems

Multilingual text
- need to identify one of these strategies
  - switch to English
  - transliterate and read
  - spell-out

Example English text in Urdu
- few words
- Newspaper
- English paragraph

Digits:
- should be able to read out Urdu digits
- two strategies for English digits
  - read in English
  - read in Urdu

Date Formats:
- cover other date formats in Urdu (a list can be
  provided)
- cover two calendars (the lunar-based Islamic
  calendar and the regular calendar)


Diacritics:
- optional; often incompletely or incorrectly given
- examples
- lexicon look-up:
  - either ignore the diacritics and look up the base form
  - or require a match with the given diacritics
- If lexicon look-up fails, it uses a pronunciation
  guessing engine

Word Segmentation
- no concept of space
- space used for visual output
- should mark-up indicate word segmentation?
  (use the default, or use the engine: accuracy
  vs. performance)

Dan: why do you need a tag to indicate the
  strategy for multilingual text instead of
  tags that implement the strategy?

Sarmad: ...

Dan: Hold discussion for tomorrow morning, not
  today.
Focused discussion: optional/missing diacritics, space is a bad word delimiter, lack of vowel modifier in traditional orthography
Discussion on missing diacritics
Dan: in the presentation, one of the things is that
  sometimes the diacritics are missing.
  This topic was requested in another workshop
  by people from Poland (for instance for text messages)

Richard: Isn't this the standard Arabic problem you have?

Paul: I don't think it is the same problem as
  re-voweling Arabic text.

Dan: It seems we have trouble understanding

Paul: You need to understand if it is voweled, but
  what if it is partial? That will not apply to vocalization.

Sarmad:
- engine to process Urdu text, two possibilities:
  - ignore the diacritics and access the base form
    lexicon
  - if they are given, you need to look for them
- either ignore or use them
- lexicon look-up phase

Dan: on a word-by-word basis or for a whole document?

Sarmad: perhaps at both levels
- one generic strategy for document level

Dan: These are interesting strategies, but in general
  SSML does not provide strategy indicators, only
  information to the processor to do good TTS.
  What information do you need?

Sarmad:
- Processor can have all these possibilities
  
Dan: You need to know when diacritics are stripped
  or missing? Is that sufficient?

Sarmad: no, the diacritics are almost always missing

Joshi: observations - diacritics in context of Indian
  languages. In Sanskrit we have diacritics. These mark
  for intonation. 

Dan: It is not a problem for other languages?

Joshi: No

Kishore: There are models for Arabic to insert vowels,
  for instance based on HMMs

Dan: Is it a performance issue?

Sarmad: For the most part, it is.

Dan: What is not a performance issue?

Sarmad: calendar, multilingual text

Dan: It is a separate topic to be discussed tomorrow.

Sarmad: ok

Richard: When we will discuss it?

Dan: we can revisit if there is time at the end of
  the workshop

Topic: Space is a bad word-delimiter

Dan: Word segmentation from Sarmad

Dan: We are already considering a word boundary
  mark-up. Do you need anything more than that?

Sarmad: Or are there other aspects that we haven't
  understood in Urdu?
Summary
ISSUE:

Missing diacritics: Algorithms to recover missing
diacritics in Urdu can be costly. It would be
helpful for the speech synthesizer to know
whether or not it must use an algorithm to recover
diacritics (rather than just trusting what's there).

Word Segmentation: Word segmentation algorithms
for Urdu can be costly. It would be helpful for the
speech synthesizer to know whether or not it must
do word segmentation (other than using the
whitespace given).
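A sketch of the kind of hint described in this issue statement; neither attribute exists in SSML, and the names x-diacritics and x-segmented are purely hypothetical:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="ur">
  <!-- hypothetical hint: diacritics are complete, skip the recovery algorithm -->
  <p x-diacritics="complete">...</p>
  <!-- hypothetical hint: whitespace is a reliable word delimiter here -->
  <p x-segmented="true">...</p>
</speak>
```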
Discussion on lack of vowel modifier
Dan: Discussion on lack of vowel modifier

Ruvini: there are exceptions in Sinhala.

Vibhu: Given a language, there are rules for
  modifications. The issue is whether there are
  rules for that.

Joshi: ???

Vibhu: The processor might do that. The person
  may choose to switch it on or off

Kishore: yes

Richard: These are homographs, but is there
  a huge number of them?

Ruvini: there might be many

Dan: We need to make the author's work easy, not
  engine creation easy.
  How prevalent is this problem of homographs?
  Are there other ways to modify it?

Ruvini: There is no way to modify the script.

Dan: Is it enough?

Ruvini: Yes

Dan: In my opinion there is not enough material
  for an issue.

Kishore: Does it happen often?

Ruvini: 1 in 50 words

Dan/Richard: That is a significant figure.

Kishore: What would you use?

Session 4. Alternative/mixed-language support, Part I

Moderator:
Paul Bagshaw
Scribe:
Kazuyuki Ashimura
Using SSML for Indian Languages Text to Speech Synthesis, Vibhu Agarwal [ Paper / Slides ]
vibhu: (starts his talk)
... page2: common difficulties...
... page3: Handling encoding...
... mentioning <transliterate> tag and its example
... page4: Example of a transliteration rule
... page5: Handling encoding issues...
... rule specification using URL
... page6: Encoding Map
... page7: Handling mixed...
... specify hint to processor

richard: how can the processor tell the appropriate encoding?
dan: we'll have transliteration session tomorrow in fact
... but this encoding question is interesting

richard: it's a mapping question

dan: yes. this discussion is interesting
... but he has to leave now
... any quick clarification question?

paul: one question on mixed lang
... on <break domain="...">

vibhu: this idea originally came from the mixture of Indian words and
... English words

dan: how about "read" [ri:d] vs. "read" [red]

shiv: how about specifying lang?

dan: you can specify lang and encoding

richard: but what about a mixture?
SSML Extensions for Indian Languages, Samuel Thomas [ Paper / Slides ]
samuel: page 2: outline
... 2 issues
... page4: interpreting the input by SSML
... - unicode as is vs. various transliterations
... page5: interpreting the TTS vocabulary
... page6: interpreting the input
... page7: proposed tag for correct interpretation of input
... - codepage
... - URI
... page8: an example of <transliterate> tag
... - specify how to transliterate using codepage and URI
... page10: handling foreign language words
... - loan words
... - which need to be pronounced differently
... slide11: handling foreign language words (existing solution #1)

dan: <voice> element?
... which can specify local lang

samuel: page12: (existing solution #2)
... using "<phoneme>"
... page13: (existing solution #3)
... using "<lexicon>"
... slide14: proposed tag for handling foreign language words
... -lang
... - uri
... slide15: an example of <foreign> tag
... slide16: conclusions
... - two extensions: <transliterate> and <foreign>

dan: any clarification question?

anand: on existing solution #3 lexicon

paul: any other?

raxit: what about 1.1?

paolo: should we refer 1.1?

sarmad: I have a question
... where we are using lang tags

samuel: 1.1 is just published, and so not referred

sarmad: considering we are sensitive to pronunciations
... why not use locale instead of lang
... to identify the dialectal details as well

kishore: how about encoding?
... like utf-8

sarmad: I want to withdraw my question
... if the file is mentioned, then locale may not be necessary

kaz: if using utf-8, would it be better?

sarmad: but then, is lang tag necessary?
Focused discussion: broader language/dialect/script support, loan words
dan: using inter-roman notation?

sarmad: (hard for me to jump in, as I do not have video and I would
... interrupt somebody when I speak)

richard: (goes to the blackboard to explain something)

sarmad: (has question:
... if we are specifying the dictionary, then why lang is needed)

kaz: (will interrupt them after richard's explanation
... is that ok with you?)

sarmad: let me know when to speak

kaz: ok

richard: several kinds of interpretation:
... - translation
... - transliteration
... - transcription
... - transcoding

paolo: shows IPA handbook

sarmad: URI should be sufficient, lang may not be required
kaz: (is consulting paul about when sarmad should join in the discussion)

kishore: (comes to the blackboard and starts some explanation)
... (picture)

sarmad: if we are specifying the dictionary, then why lang is needed?

richard: there was a discussion in W3C

dan: thinks it's required for the distinction between transcription
... and transliteration
... related to pronunciation registry issue
... my preference is using "xml:lang" option

richard: but the question is "why lang is necessary"
... (in short, sometimes, somebody wants to use a combination of
... various words from various languages in *one* file)

sarmad: then lang should be replaced by transcription_scheme
... to interpret the lexicon

kaz: (e.g. room, kmra, kamra and room number like 203, 468, etc.)
... (IMHO, it's possible idea)
... (at least, it's not that everybody agrees "lang" is the nicest name :)
... please join the discussion itself

sarmad: why do we need to know if it is hindi if pronunciation is
... given on how to speak the word

kaz: because I have to scribe all the discussion...

sarmad: ok

kaz: tx

samuel: if we use an English synthesizer, we may have some parts in some
... other languages

dan: to me, this is script question

sarmad: let me know when I can jump in

paul: if we use pinyin, why should we specify lang?

kaz: (to sarmad: whenever you want. please feel free to join. like
... ordinary telephone call ;-)
sarmad: ok

dan: (goes to blackboard and tries to explain a Russian example)

richard: let me try

dan: thanks
... (continue)
... (picture)
... (picture)
... in some languages, a certain character is actually used not as a
... word but as punctuation

richard: what about "to you => 2 U" ?

joshi: let me show more examples
... (goes to blackboard)
... (picture)

dan: the difference of orthography, transcription and transliteration,
... though not sure how to define orthography in this case
... a bit different from codepage issue

samuel: but I don't change the engine
... keep using same engine

dan: how to indicate: chinese lang + korean voice, etc. (possibly
... various combinations of options)?
... it's an underspecified issue
... is this common issue in Indian languages?
... richard?

sarmad: could I comment?
kaz: sure

sarmad: we are talking about "speech locale" as an extension of text
... locale we use
kaz: please speak after dan's talk :)

sarmad: speech locale = country+lang+script

sarmad: would like to suggest a "speech locale" which is a combination
of country, lang and script
... necessary information from a speech synthesis viewpoint

dan: interesting suggestion
... several thoughts on this
... possibly needing target text which includes several languages 
... from a text content viewpoint, there is a possible specifier, BCP 47

richard: clarification question
... do you want to point some name to transcription, samuel?

samuel: my proposal (interpreting the input by SSML) is different
... from xml:lang
... (is this your voice not for this call but the ordinary phone?)

dan: what does "codepage" mean?
... could be converted to xml:lang?
... "mapping table" is a bit confusing to me...

richard: another example on blackboard
... (picture)
... (btw, xml:lang can specify all the combinations of country, lang and script)
... just saying "mapping" is ambiguous...
... example of difficult parsing given simple mapping
... (picture)

dan: if we need some new script, we can continue the discussion in session 6 
... many Indian participants mentioned this issue as "transliteration issue"
... it's interesting and should be discussed
... let's wrap up today's discussion. Is anybody concerned there is
... something not covered in today's sessions?
... (though of course not necessarily all are resolved)
... (adds items to "Discussion list")
... [Discussion list]
... - Emotion
... - Speaking styles
... - Formal/Informal distinction (Sinhala, French)
... - Non-pronunciation syllabic control
... - Error handling
... [Say-as issues]
... - Date-multiple calendars
... - Digits

 

Sunday, 14 January 2007

 

Session 5. Alternative/mixed-language support, Part II

Moderator:
Richard Ishida
Scribe:
Kazuyuki Ashimura
Indic extensions Accent marks and Concrete text, R. K. Joshi [ Paper / Slides ]
page0: title page
- construction of syllable

page1: Indian languages: written and spoken

page2: Single word with different pronunciations in different Indian languages

page3: Proper name: written and spoken
- between writing and speech
- phonemic construction

page4: Sentences: written and spoken

page6: The following 4 extensions are being proposed:
- say-as-if
- say-to-self
- say-bil
- pnps

page7: say-as-if
- co-relationship between writing and speech
- speak something as if it were something else

page8: say-to-self
- thinking aloud

Q: monologue???

page9: say-bil
- sudden and frequent change of lang
- e.g. bilingual speaker

page11: pnps

page13: Position Statement
- Identification of phonemic strings
- additional phonemic information
Internationalizing W3C's Speech Synthesis Markup Language Workshop III, Raxit Sheth [ Slides ]
page2: Important...!!!
- mixed languages contents

page3: example

page4: alternate content

page3': demo
(showing the example SSML)
1. <p xml:lang="en-US">English</p>
2. <p xml:lang="HI">Hindi</p>
3. <p xml:lang="en-US">English</p>

raxit: possible action if engine can't synth?
dan: just skip and no output


page 5: Behaviour of SSML Engine for Non-Supported Languages
- several possible actions...

(revisiting the example SSML)

raxit: so just synth the number in the Hindi sentence?
... if forcefully rendered, junk may be contained...
... will show you 2 demos

(demo speeches)
("junk" played...)

richard: what was the encoding for the "HI" sentences
... is 'latin' forcefully specified?

dan: any additional implicit specification on encoding?

raxit: no, nothing added. the engine processed this input as is.

page7: SSML 1.1

page8: SSML Application Developer is not in Control, But the platform is
page9: Nice to have error reporting/handling mechanism in SSML
       Not only for mixed lang

page10: Possibilities for following
- <alt>
- <log>
- <SkipTag>
(revisit the example SSML again)
- <SkipContent>
- <stopOn*Error>

page11: <error>
- example notation of prospective <error>

page12: Some More Rules...
- the info of these proposed tags should be inherited by children

page13: in addition <catch>

page14: some thoughts

page15: Points to Consider...
- "1.3 Document Generation, Applications and Contexts" SSML 1.1
- fit for next VB/MI platform

page16: Error handling for Language support

page17: Application Developer knows more about
features/functionalities of SSML processor
- Better User Experience

(revisit the example SSML again)

paul: needs one clarification
... compatibility between 1.0 and 1.1?
... is it still possible to specify "1.0"?

dan: if the vendor supports "1.1" they should/must support "1.0"
Focused discussion: support for mixed-language text
dan: let's restrict the discussion to the "error handling"
... because handling all the topics proposed by this talk is 
... beyond this "mixed-languages" session

(add "Behavior when language unsupported" to the "Discussion list")

dan: those related to this "Alternative/mixed-language" session:
4: Madhavi
9: Samuel
11: Vibhu
8: Joshi
10: Kalika
13: Raxit

paul: dominant language
(goes to blackboard and explains some example)
(picture)

[[
<lang EN> ... </lang>
(what happens here???)
<lang HI> ... </lang>
]]

[[
<lang EN> (as lang1)
  <lang HI> (as lang2) </lang>
  English
</lang>
]]
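Paul's blackboard sketches correspond roughly to this well-formed markup, using the <lang> element proposed in the SSML 1.1 draft (content elided):

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- lang2 nested inside lang1; what applies between the two? -->
  <lang xml:lang="hi-IN">...Hindi text...</lang>
  English text continues here.
</speak>
```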

dan: there was similar mixed lang discussion in previous workshop

richard: so what's the "issue" in the end???
... what should SSML support?

dan: if US-English, Indian-English and Hindi are mixed, what would be
... expected? or Italian script vs. American phoneme set

Q: proper name, number etc.?

raxit: (goes to blackboard and explains some example)
[[
<p lang="en-US">
My Name is SANJAY
]]

dan: (also provides an example)
[[
<p lang="en-US">
My Name is Raxit
]]

kishore: it's a question of "user's phoneset vs. engine support"

dan: the synth engine just has limited phonesets
... two separate attributes were proposed in the previous workshop on
... this topic: a combination of "voice", "lang", "country" and "script"

samuel: what should happen if an Indian guy mainly uses English, like
... I do? the voice is of course mine, though

raxit: (goes to blackboard and starts another example)
[[
<p lang="en-US">
My Name is <w lang="HI">Raxit</w>
</p>
]]

dan: how about e.g. Japanese text that should be pronounced as if spoken
... in the US by an English speaker?

richard: you have to add a variant to specify it

ruvini: why is a separate tag needed?

kaz: can't we revisit this topic in session 7 as the question of
... "pronunciation of proper names"?
dan+richard: no

richard: shall we take break now and continue discussion in the break?

paolo: we have 2 things: "speaker + lang" vs. "text + lang"

kishore: <w> can have "lang" in it?
dan: yes

dan: SSML markup is not control but annotation
... "lang" just specifies the language but doesn't specify
...  "what to output"

richard: let's take break now and continue discussion in the break!
Summary
When languages are not supported by a platform, platforms vary in
their behavior and can behave quite badly (from a UI perspective).
Developers should have control over what happens.  Raxit Sheth has
some examples and suggestions.
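Raxit's fallback suggestion might look like the following sketch; the <alt> element is not part of any SSML specification, and the behaviour shown is purely illustrative:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p xml:lang="hi">...Hindi text...
    <!-- hypothetical: spoken only if the engine cannot synthesize Hindi -->
    <alt xml:lang="en-US">Sorry, Hindi output is not available.</alt>
  </p>
</speak>
```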

Session 6. Pronunciation Alphabets

Moderator:
Paolo Baggia
Scribe:
Paul Bagshaw
IT-3 Transliteration, M Nageshwara Rao [ Slides ]
Indian languages share a common phonemic base.
In Indian Transliteration (IT-3), each character is mapped to no more
than 3 letters, and it is case insensitive.

Clarifications: IT-3 is not a standard, people use different
schemes. A single "latin" transliteration in IT-3 can map to many
Indian language scripts, and can give, for example, a Tamil script
rendering of a Hindi word.
Focused discussion: Transliteration
[Dan] Comment: This transliteration might appear to be similar to
Pinyin.

[Richard] The mapping is sometimes not so simple.

[Samuel Thomas] Yes, sometimes you need to drop a schwa.

[Richard] Why do we need transliteration in speech synthesis?

Use of Latin symbols serves as a common basis in the synthesis of the
many scripts.

A pointer to the transliteration scheme may be required because the
author might have devised their own transliteration mapping (a
non-standardised way of representing sounds).

[Dan] Is this a generalisation of the alias element operating at a
character level?

[Paolo] This is like vowelisation in Arabic, which may be done outside
of TTS. Is this transliteration process really necessary inside TTS?

[Samuel] It is necessary for a TTS system to be able to process both
text in native script and when it has been transliterated. When it is
transliterated we need to know what scheme has been used to perform
the transliteration.

Discussion leads to clarification that text normalisation (<sub> for
example) is applied to transliterated script. Transliterated "latin"
script (not the native script) is used throughout the TTS processing
(linguistic/syntactic analysis on the input text, etc.)

Transliterated script may be interleaved with native script, that's
why mark-up is necessary within an SSML document.

Discussion on the content of the transliteration scheme file at the
end of the URI. No standard format. Idea floats around of using a PLS
lexicon to define the transliteration, but this is not a use of PLS
that should be encouraged and it may not be entirely appropriate.

[Dan] Presents * ISSUE STATEMENT *

Those present OK with statement.
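The <transliterate> tag proposed in the papers might look roughly like this; the element, its scheme attribute, and the URI are all illustrative sketches, not part of SSML:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="hi">
  <!-- hypothetical tag: "scheme" points at the author's own mapping file,
       since there is no standard transliteration scheme format -->
  <transliterate scheme="http://example.org/maps/it3-devanagari.map">
    namaste duniyaa
  </transliterate>
</speak>
```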
Focused discussion: non-IPA and syllable-based pronunciation alphabets
[Dan] Given that there may be a pronunciation alphabet registry and
that the scripts are essentially phonetic, is there an issue here?

[Kishore] Wants to specify the pronunciation by specifying the
syllables rather than specifying the phonemes.

The problem seems to be in the wording of the element <phoneme> and
that clarification is required stating that this element may be used
to give information about the pronunciation (it may be partial,
e.g. just the stress pattern of a word; it may be a syllable-based
pronunciation alphabet).
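In practice, such a clarified <phoneme> might permit markup like this sketch; the alphabet name x-indic-syl and its syllable notation are hypothetical vendor-style values:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="te">
  <!-- partial pronunciation info: syllables rather than phonemes -->
  <phoneme alphabet="x-indic-syl" ph="hai-da-ra-bad">Hyderabad</phoneme>
</speak>
```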

Session 7. Miscellaneous

Moderator:
Kishore Prahallad
Scribe:
Paolo Baggia
Pronunciation of Nouns in Text to Speech systems, Lavanya Prahallad [ Paper / Slides ]
Dan: Review the final wording on: Transliteration
Everybody happy!

Pronunciation of Nouns in TTS systems
Veera Raghavendra, IIIT

Agenda:

Nature of Indian language script
- originate from Brahmi
- basic units

Convergence & Divergence
- 21 official languages, 1652 dialects
etc

Particles:
- Hindi adds particles, for instance "ji"
  to give respect
- Examples

Proposal: <particle> tag
  type "ji"

Use of Loanwords
- 33% of TTS errors are due to loan words
- example "cancer" "kaansar" no 

Use of mention
- more emphasis on the first occurrence
  of a proper name

Duration prediction using Mention Information
- duration modeling
- example <mention>

Conclusions:
- issues on Indian Script

Paul: can't the occurrences in <mention> be counted?

Kishore: it is indicating the 

Paul: You go in the direction of theme/rheme.
  I'm not convinced that occurrence is the key.
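The tags proposed in this talk might be written as follows; <particle> and <mention> are proposals only, and the attribute names (the "type" value is from the slides, "occurrence" is a guess for illustration) are hypothetical:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="hi">
  <!-- hypothetical <particle>: honorific "ji" attached to a name -->
  <particle type="ji">Sharma</particle>
  <!-- hypothetical <mention>: first occurrence gets more emphasis/duration -->
  <mention occurrence="first">Lavanya</mention>
</speak>
```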
Discussions
Dan: Two discussions:
- proper names
- generic issues 

Proper Names:

Dan: Veera mentioned particles, why do you do it?

Veera: Not spoken all the time

Dan: Why do you need the mark-up?

Raxit: You can write in the text

Dan: Why do you need mark-up for that?

Kishore: We don't need specific markup, but it is
  at the choice of the developer

Dan: There isn't a rule on this in SSML; very
  minor changes should be written in the orthography.
  SSML tags are hints on how to interpret.

Kishore: There might be difference in politeness

Richard: How do you give that to the TTS?
  Second comment: there are many other features
  you should change to give more respect

Dan: I agree. Were there other presentations on
  proper names?

Kishore: The Joshi example was on differences of
  pronunciation. Eliminate the pause between
  name and surname.

Paolo: It seems to me a case of <say-as>

Dan: Let's focus on proper name. Is there something
  specific?

Kishore: There are no capital letters, you understand
  from the context.

Veera: There are also other words: "good morning"

Dan: I'd like to understand how it is used in the language.
  Is there a difference between proper names and the word
  in this last example?

Kishore: no

Dan: There are examples in English, "water bottle"

Paul: This is related to the language

Paolo: This applies to all the names in India

Kishore: Many names are single word.

Raxit: You can use a <break> of zero length or no break

(discussion on changes of pronunciation of proper name
 in different languages)

Dan: This seems the case to mark the name with <say-as>

*** Check the requirements document

Section 7
- better support for Chinese and Korean
- no capitalization
- mark the name to change the pronunciation

RESOLUTION:

7.2 Seems to satisfy this requirement

General agreement.

Dan: The problem in this area is that there is no
  agreement on a W3C note. Critical area.
  This is the only reason.

  We are interested in any requirement that
  improves SSML.

================================================
Proposal:

We know this is a proposal for Indian languages
too.

We may wish to modify the Requirements document
to accommodate the issues.

================================================

Paul: You can send comments on the Requirements
  document.

Kaz: This is why it is a Working
  Draft

Dan: There might be another change you might
  ask for: to remove "A future version".

Dan: Alternate say-as types. Joshi suggested
  some. Are there any others?

Samuel: What about the loan word 

Raxit will send the final version of his presentation.

================================================

Dialects:

Samuel: How to register the dialects?

Richard: Go to http://www.w3.org/International/

- "say-as-if"

Dan: Highlights troubles with "sub"

Paul: ??

- "say-bil" 

Dan: this is the mixed-language issue

Paul: I thought it was more related to language
  predominance. The difference between a bilingual speaker
  and a person speaking each language

Dan: I agree there is a subtle difference

Remaining features:
Say-as: there is a request from Sarmad
for say-as on dates in multiple calendars

Dan: Closed discussion on say-as.
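Sarmad's calendar request might build on the existing <say-as> attributes; interpret-as, format and detail exist in SSML 1.0, but the "islamic" detail value naming a calendar is hypothetical:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="ur">
  <!-- hypothetical detail value: read the date per the Islamic calendar -->
  <say-as interpret-as="date" format="dmy" detail="islamic">25/12/1427</say-as>
</speak>
```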

================================================

Homographs in Sinhala

Ruvini: discussion on short phrases that are
  pronounced in different ways

Dan: you can address it like homographs in Sinhala

================================================

Complete Discussion List: 
- Emotions
- Speaking style
- Formal/informal distinction
- Non-pronunciation syllabic control
- Behaviour when language unsupported
- say-as-if
- Homographs in Sinhala

Pruned list:
4 Emotions (2)
6 Speaking style (5)
3 Formal/informal distinction (0)
5 Behaviour when language unsupported (1)

Final list:
- Speaking Style 15 min
- Emotions 5 min
- Behaviour when language unsupported 5 min
- Formal/informal ?

===============

Speaking Style:

Paul: Is it related to the voice or to marking up the content?

Paolo: In my opinion to the content

Dan: Are there particular issues for Indian languages?

Paul: It might be useful to know the origin of the
  text. For some languages ...

Kishore: I agree

Dan: Why is it language-dependent?

Discussion

N.Rao: Reading religious texts: Gita

Paul: Not sure if it is more about speaking style or adding
  more information on content

Dan: not sure why it is relevant to internationalization

Richard: Proposal to write a proposal after getting together

Dan: Good but in general

===============

Emotions:

Paul: In speech you have expressivity, not emotions

Raxit: a mode element, e.g. "funny", to render funny content;
  it should be up to the SSML engine. Not for all engines,
  not embedded.

Paolo: mentions the Emotion Incubator Group and its
  proposal

Madhavi: In Indian languages there are specific words
  that indicate emotions.

===============

Behaviour when language not supported

Dan: Two main approaches:
- so far: define and document
- another: provide control at a higher level
  - report error
  - don't report error and keep going

Dan: If you give control, you need to have it implemented,
  and easily

Raxit: This is important for SCXML, CCXML, VoiceXML

Dan: They have control flow, SSML does not

Raxit: It should have.

Dan: Markup has syntax and a certain control, but
  I was speaking of control flow.

Dan: Can you explain why it will help internationalization?

Raxit: The application developer can do more if the
  strategy is simple. It is all the application scenario.
  MMI requirement

Paul: On documentation of languages supported per voice

Dan: Some variable may be needed for caching control.
  To control it is more complex

Samuel: Throw error

===============

Conclusion:

There is significant interest in throwing errors
and some interest in catching errors in SSML.

=================================================

Session 8. Conclusions and next steps

Moderator:
Dan Burnett
Scribe:
Richard Ishida
Finish writing problem statements
Paul remembers developing a table about the use of xml:lang and voice.

Having acknowledged that xml:lang describes document content,
and that it is independent of the selection of voice, when you specify
a voice in a particular language context you want to be able to define
the behaviour.


Voice selects acoustic elements (sometimes prosodic) - every
synthesiser has to do this (eg. this is my GB male voice...)

It is also possible to set xml:lang to various values.

"Long discussion about table" photographed by Kaz.

It would be helpful to build a table that has xml:lang and voice as
the two variables, where xml:lang is set to something and where voice
describes the languages for which there are acoustic units and the
accent for them.

We are not convinced that the text in the first working draft of SSML
1.1 adequately captures the desires of application developers for
control of tts processing. We do not know what application developers
need to control. In order to better understand what should be done we
need to construct tables contrasting values for voice and values for
xml:lang. For xml:lang we believe we need to consider language, region
and script. For voice we need to capture descriptions of acoustic
unit support by language and possibly accent for the unit support.
These tables should help us understand what the developers want to
control.

We need to consider which text to acoustic unit rules are to be used.

In addition to the tables, we may wish to separately list practical
combinations of voices and text to acoustic unit rules desired by
authors. We will likely wish to break down text to acoustic unit into
multiple levels.
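One cell of such a table might correspond to markup like this sketch, combining xml:lang with a voice (using the SSML 1.1 draft's <lang> element; the voice name gb-male-1 is hypothetical):

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-GB">
  <!-- hypothetical voice: British English acoustic units -->
  <voice name="gb-male-1">
    The text is English, but this fragment is
    <lang xml:lang="hi-IN">Hindi</lang> rendered with English acoustic units.
  </voice>
</speak>
```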
W3C participation recap by Kazuyuki Ashimura

 

[Workshop ends]

 

 


The Call for Participation, the Logistics, the Agenda, and the Presentation guidelines are also available.


Dan Burnett and Kazuyuki Ashimura, Workshop Co-chairs
Max Froumentin, Voice Activity Lead

$Id: minutes.html,v 1.20 2007/02/05 14:04:00 ashimura Exp $