W3C

SSML 1.0: Candidate Recommendation Disposition of Comments

This version:
June 29, 2004
Editor:
Daniel C. Burnett, Nuance

Abstract

This document details the responses made by the Voice Browser Working Group to issues raised during the Candidate Recommendation review period (18 December 2003 to 18 February 2004) for Speech Synthesis Markup Language (SSML) Version 1.0. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document of the W3C's Voice Browser Working Group describes the disposition of comments as of June 29, 2004 on Speech Synthesis Markup Language (SSML) Version 1.0 Candidate Recommendation. It may be updated, replaced or rendered obsolete by other W3C documents at any time.

Comments on this document and requests for further information should be sent to the Working Group's public mailing list www-voice@w3.org (archive). Note that, as a precaution against spam, you should first subscribe to this list by sending an email to <www-voice-request@w3.org> with the word subscribe in the subject line (use the word unsubscribe instead if you want to unsubscribe).

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

1. Introduction

This document describes the disposition of comments in relation to the Speech Synthesis Markup Language (SSML) Version 1.0 (http://www.w3.org/TR/2003/CR-speech-synthesis-20031218/). Each issue is described by the name of the commenter, a description of the issue, and either the resolution or the reason that the issue was not resolved.

Notation: Each original comment is tracked by a "Candidate Recommendation Public Comment" [CRPC] designator. Each point within that original comment is identified by a point number. For example, "CRPC5-1" is the first point in the fifth CR public comment for the specification.

2. Comments

Item        Commenter              Proposed disposition    Status
CRPC1-1     David Descamps         N/A (Question)          Implicitly accepted
CRPC1-2     David Descamps         N/A (Question)          Implicitly accepted
CRPC2-1     David Descamps         N/A (Question)          Implicitly accepted
CRPC3-1     Roopa Trivedi          N/A (Question)          Implicitly accepted
CRPC4-1     Susan Lesch            Accepted                Accepted
CRPC4-2     Susan Lesch            Partially accepted      Accepted
CRPC5-1     Roopa Trivedi          N/A (Question)          Implicitly accepted
CRPC6-72    I18N Interest Group    Accepted                Accepted

Issue CRPC1-1

From David Descamps

quote:

"...

Relative changes in prosodic parameters should be carried across voice changes. However, different voices have different natural defaults for pitch, speaking rate, etc. because they represent different personalities, so absolute values of the prosodic parameters may vary across changes in the voice.

..."

If I understand that correctly, the synthesis processor must distinguish between, for example, a baseline pitch change and a relative pitch change?

In the "prosody" element: when you change your pitch by

- a number followed by "Hz", you change the baseline pitch

- and a relative change or "x-low", "low", "medium", "high", "x-high", or "default", you change the relative pitch.

Is that right?

Proposed disposition: N/A (Question)

Yes, the processor must differentiate between absolute changes of the baseline and changes relative to the baseline.

A number followed by "Hz", as well as the labels "x-low" through "x-high", etc., sets the absolute (baseline) pitch. Only relative changes alter the pitch relative to that baseline.
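
The distinction drawn in this response can be sketched as a small classifier. This is only an illustration, not part of the specification: the function name `classify_pitch` is hypothetical, and the relative-change syntax (a signed number with an optional "Hz", "st", or "%" unit) is an assumption about how relative values are written. The label "default" is treated as absolute here, on the reading that it falls under the "etc." above.

```python
import re

# Labels listed in the question above; treated here as absolute settings.
PITCH_LABELS = {"x-low", "low", "medium", "high", "x-high", "default"}

# Absolute: an unsigned number followed by "Hz", for example "120Hz".
ABSOLUTE_HZ = re.compile(r"^\d+(\.\d+)?Hz$")

# Relative (assumed syntax): a signed number with an optional Hz, st, or % unit.
RELATIVE = re.compile(r"^[+-]\d+(\.\d+)?(Hz|st|%)?$")


def classify_pitch(value: str) -> str:
    """Return "absolute" or "relative" for a prosody pitch value."""
    value = value.strip()
    if value in PITCH_LABELS or ABSOLUTE_HZ.match(value):
        return "absolute"
    if RELATIVE.match(value):
        return "relative"
    raise ValueError(f"unrecognized pitch value: {value!r}")


for sample in ("120Hz", "x-high", "+10Hz", "-2st", "+5%"):
    print(sample, "->", classify_pitch(sample))
```

The only thing separating "120Hz" from "+10Hz" in this sketch is the leading sign, which is why the processor has to inspect the value rather than just the unit.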

Email Trail:



Issue CRPC1-2

From David Descamps

If my understanding of the previous point is correct, does a baseline change in a "prosody" element cancel a previous relative change?

Proposed disposition: N/A (Question)

Yes. Note that it would only cancel relevant relative changes. For example, setting the baseline pitch would not reset relative pitch *range* changes.

Although it is easy to construct silly or bizarre example combinations of absolute and relative changes, most of which will hopefully be ignored by intelligent processors, the goal of this separation was to simplify the case where an author has increased the tempo/pitch/etc. of a voice and wishes that same relative change to apply when a small amount of text in another language (and voice) is embedded in the stream.
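
The behavior described in this response can be modeled with a small state object. This is a sketch under stated assumptions, not how any particular processor is implemented: the class `ProsodyState` and its fields are hypothetical, relative changes are simplified to offsets in Hz, and the sketch assumes that an explicitly set baseline is kept when the voice changes (the response does not address that case).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProsodyState:
    # None means "use the current voice's natural default" (assumption).
    baseline_pitch_hz: Optional[float] = None
    relative_pitch_hz: float = 0.0  # accumulated relative pitch change
    relative_range_hz: float = 0.0  # accumulated relative pitch range change

    def set_baseline_pitch(self, hz: float) -> None:
        # An absolute setting cancels earlier relative pitch changes only;
        # relative range changes are left untouched.
        self.baseline_pitch_hz = hz
        self.relative_pitch_hz = 0.0

    def change_pitch(self, delta_hz: float) -> None:
        self.relative_pitch_hz += delta_hz

    def change_range(self, delta_hz: float) -> None:
        self.relative_range_hz += delta_hz

    def effective_pitch(self, voice_default_hz: float) -> float:
        # The relative change is carried across voices; the starting point
        # is the explicit baseline if one was set, else the voice default.
        if self.baseline_pitch_hz is not None:
            base = self.baseline_pitch_hz
        else:
            base = voice_default_hz
        return base + self.relative_pitch_hz


state = ProsodyState()
state.change_pitch(10.0)
state.change_range(20.0)
print(state.effective_pitch(100.0))  # first voice, default 100 Hz
print(state.effective_pitch(180.0))  # embedded voice, default 180 Hz

state.set_baseline_pitch(150.0)
print(state.relative_pitch_hz, state.relative_range_hz)
```

In this model the same +10 Hz change applies to both voices, and setting the baseline resets the relative pitch change while the relative range change survives.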

Email Trail:



Issue CRPC2-1

From David Descamps

quote:
"...
Relative changes in prosodic parameters should be carried across voice changes. However, different voices have different natural defaults for pitch, speaking rate, etc. because they represent different personalities, so absolute values of the prosodic parameters may vary across changes in the voice.
..."

How should the synthesis processor deal with a relative pitch change in Hertz or in semitones?

/******

For example, you have a male voice (baseline pitch: +/- 100Hz) with a relative change of 10Hz. You change the voice to a female one (baseline pitch: +/- 180Hz) and you keep the relative pitch change:

male : 100 -> 110 Hz : +10.0%
female : 180 -> 190 Hz : + 5.5%

the proportion of your relative pitch change in Hertz has been corrupted!
*******/

How should the synthesis processor deal with a relative change in Hz or st across a voice change: keep the value or keep the proportion?

Proposed disposition: N/A (Question)

You would keep the value. It is already possible to make relative changes in percentage terms. It is the author's responsibility to specify the relative change in the terms desired.
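
The arithmetic behind the question and this answer can be checked with a few lines of code. The function names below are hypothetical and the baselines are the commenter's approximate figures; the sketch only contrasts a fixed offset in Hz with a percentage change.

```python
def apply_hz_offset(baseline_hz: float, delta_hz: float) -> float:
    """Keep the value: add the same number of Hz to any baseline."""
    return baseline_hz + delta_hz


def apply_percent(baseline_hz: float, percent: float) -> float:
    """Keep the proportion: scale the change with the baseline."""
    return baseline_hz + baseline_hz * percent / 100.0


def as_percent(baseline_hz: float, delta_hz: float) -> float:
    """Express a Hz offset as a percentage of the baseline."""
    return 100.0 * delta_hz / baseline_hz


for name, baseline in (("male", 100.0), ("female", 180.0)):
    shifted = apply_hz_offset(baseline, 10.0)
    share = as_percent(baseline, 10.0)
    scaled = apply_percent(baseline, 10.0)
    print(f"{name}: +10Hz gives {shifted:.0f} Hz ({share:+.1f}%), "
          f"+10% gives {scaled:.0f} Hz")
```

A +10Hz change is 10% of a 100 Hz baseline but only about 5.6% of a 180 Hz baseline, whereas a +10% change yields 110 Hz and 198 Hz respectively. An author who wants the proportion preserved across voices would therefore write the change as a percentage.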

Email Trail:



Issue CRPC3-1

From Roopa Trivedi

SSML 1.0 spec says that

"gender: optional attribute indicating the preferred gender of the voice to speak the contained text. Enumerated values are: "male", "female", "neutral""

Does this mean that "neutral" is yet another type of gender supported by some TTS vendors? Or does it mean that the user does not wish to specify a gender and thus uses "neutral" to leave the gender selection elsewhere?

Proposed disposition: N/A (Question)

When you use the "<voice>" element, you are asking the synthesis processor for the voice that best fits the needs of your application. So if you ask for a "neutral" voice, the engine will do its best to find the voice that best suits your request.

Email Trail:



Issue CRPC4-1

From Susan Lesch

The document is served ISO-8859-1 as far as I can tell from .htaccess
but the change notes say "Changed examples to use utf-8." So somewhere
in production there is an encoding mismatch. For example:

     &Acirc;&sect;

looks like this:
Image showing how the section symbol is displaying as a combination of characters in section 1.3.
     &auml;&raquo;&Scaron;&aelig;&mdash;&yen;&atilde;&macr;
[etc.]

looks like this:
Image showing how Japanese text in section 1.2, bullet 3, is displaying as characters that are most definitely not written Japanese.
Analysis:
2004-03-09: Max proposed a response.
2004-05-18: SSML was written to be UTF-8. There still seems to be a problem with the document not being served as UTF-8. Dan will send email to Dave and Max asking them to address this problem. Assigned to Dave and Max.
2004-05-18: Dan sent email to Dave and Max.
2004-05-25: Max requests the most recent version of the document. Dan sends his final SSML draft of last year to Max and Dave.
2004-06-01: We agree to the following public response: "The original document was in UTF-8, but an error occurred somewhere in the publication process. It is our understanding that all specifications are now being served in UTF-8, so this should not be a problem for the Proposed Recommendation and Recommendation."

Proposed disposition: Accepted

The original document was in UTF-8, but an error occurred somewhere in the publication process. It is our understanding that all specifications are now being served in UTF-8, so this should not be a problem for the Proposed Recommendation and Recommendation.
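
The symptom reported in this issue can be reproduced by encoding text as UTF-8 and decoding the bytes with a single-byte encoding. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the browser treated the ISO-8859-1 label as Windows-1252 (Python's "cp1252"), which is consistent with the character sequences quoted in the comment. The Japanese sample text is an assumption inferred from those sequences.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode as UTF-8, then decode with the wrong single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo misdecode(), provided no bytes were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


section_sign = "\u00a7"        # the section symbol
japanese = "\u4eca\u65e5"      # two kanji, assumed sample text

for original in (section_sign, japanese):
    garbled = misdecode(original)
    print(repr(original), "->", repr(garbled), "->", repr(repair(garbled)))
```

Each non-ASCII character occupies two or three bytes in UTF-8, and every one of those bytes is shown as its own character when the page is decoded with a single-byte encoding. That is why one section symbol turns into two characters and each kanji into three.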

Email Trail:



Issue CRPC4-2

From Susan Lesch

These caps can be lowercase to match your RFC 2119 convention:

     the processor MUST render
     The processor SHOULD also
     text MUST be rendered

But because of this use of must:

     Defining a comprehensive set of text format types is difficult
     because of the variety of languages that must be considered and
     because of the innate flexibility of written languages.

RFC 2119 markup would help. There is example XHTML and CSS in the
Manual of Style (can be adapted).
Analysis:
2004-03-09: Max proposed a partial response.
2004-05-18: Group accepts Max's suggestion. Group accepts Susan's suggestion to lowercase the three keyword instances listed. Style changes suggested by Susan are not considered necessary and will only be done if the Editor has spare time.

Proposed disposition: Partially accepted

We will convert the three keyword instances you list to lower case.
We will change the offending "must" to "have to" in the sentence you quote.
Thank you for the style suggestion. We may or may not implement this, as time permits.

Email Trail:



Issue CRPC5-1

From Roopa Trivedi

The SSML 1.0 spec says in the following section http://www.w3.org/TR/speech-synthesis/#S3.1.8

that

"SSML only specifies the say-as element, its attributes, and their purpose. It does not enumerate the possible values for the attributes. The Working Group expects to produce a separate document that will define standard values and associated normative behavior for these values."

Is there a separate document already that we can refer to for the values of say-as attributes? If yes, where can we find that? If not, would these values be vendor dependent for now?

Proposed disposition: N/A (Question)

Proposed:

Values for these attributes are not yet defined and are therefore effectively vendor-dependent for now. We are working on the document mentioned above.

Email Trail:



Issue CRPC6-72

From I18N Interest Group

This item was originally point 145-72 in the Last Call Disposition of Comments. I18N wished to see further clarification in the next process stage for the specification.

Proposed disposition: Accepted

We will remove the offending line.

Email Trail: