SSML 1.0 Implementation Report

Version: 15 July 2004

Contributors:

Laura Ricotti, Loquendo (Chief Editor)
Paolo Baggia, Loquendo (co-editor)
An Buyle, Scansoft
Dave Burke, VoxPilot
Daniel Burnett, Nuance
Jerry Carter, Independent Consultant
Sasha Caskey, IBM
William Gardella, SAP
Frederic Gavignet, France Telecom
Edouard Hinard, France Telecom
Jeff Kusnitz, IBM
Paul Lamere, Sun
Rob Marchand, VoiceGenie
Sheyla Militello, Loquendo
Luc Van Tichelen, Scansoft

1. Introduction
- 1.1 Implementation Report Objectives
- 1.2 Implementation Report Non-objectives
2. Work During the Candidate Recommendation Period
3. Participating in the Implementation Report
4. Entrance Criteria for the Proposed Recommendation phase
5. Implementation Report Requirements
- 5.1 Detailed requirements for Implementation Report
- 5.2 Notes on Testing
- 5.3 Out of Scope
6. Systems
7. Test Classification
- 7.1 Introduction
- 7.2 Method
- 7.3 Test assertions labels
8. Test Results
9. References
Appendices
- Appendix A. Test assertion XML API definition
  - A.1 Instruction
  - A.2 Reference Markup
  - A.3 Test Markup
  - A.4 Document Type Definition
  - A.5 Test examples
- Appendix B. Downloading tests
  - B.1 The Manifest
  - B.2 The Report Submission Template
  - B.3 The Stylesheet
- Appendix C. Acknowledgements

1. Introduction

The SSML 1.0 Specification entered the Candidate Recommendation period on 18 December 2003.

The planned date for entering Proposed Recommendation is 15 July 2004. Preparation of an Implementation Report is a key criterion for moving beyond the Candidate Recommendation phase. This document describes the requirements for the Implementation Report and the process that the Voice Browser Working Group will follow in preparing the report.

1.1 Implementation Report Objectives

Must verify that the specification is implementable.
Must demonstrate interoperability of implementations of the specification.

1.2 Implementation Report Non-objectives

The IR does not attempt conformance testing of implementations.

2. Work During the Candidate Recommendation Period

During the CR period, the Working Group will carry out the following activities:

Clarification and improvement of the exposition of the specification.
Disposing of Comments that are communicated to the WG during the CR period.
Preparation of an Implementation Report meeting the criteria outlined in this document.

3. Participating in the Implementation Report

You are invited to contribute to the assessment of the W3C SSML 1.0 Specification by participating in the Implementation Report process.

Deadline for submission of a SSML Implementation Report is 18 February 2004.
All the tests reported in the table below and the report submission format are provided with this document. The whole test suite including the report submission format is also contained in ssml-ir-20040715.zip, which is a revised version after the comments of SSML implementers enumerated in Section 6, Systems. The changes applied to the test suite are detailed here. The content of the zip file is described in Appendix B.
Comments on this document and the test suite or requests for further information should be sent to the Working Group's public mailing list www-voice@w3.org (archive). If an updated version of this document or the test set is published, a notification will be sent to this mailing list.

4. Entrance Criteria for the Proposed Recommendation phase

The Voice Browser Working Group established the following entrance criteria for the Proposed Recommendation phase in the Request for CR:

Sufficient reports of implementation experience have been gathered to demonstrate that synthesis processors based on the specification are implementable and have compatible behavior.
Specific Implementation Report Requirements (outlined below) have been met.
The Working Group has formally addressed and responded to all public comments received by the Working Group.

5. Implementation Report Requirements

5.1 Detailed requirements for Implementation Report

Testimonials from implementers will be included in the IR when provided to document the utility and implementability of the specification.
IR must cover all specified features in the specification. For each feature the IR should indicate:
- Feature status: required, optional, other
- Feature utility/usefulness based on feedback from implementers
- Implementability of the feature specification
- Interoperability of multiple implementations of the feature
Feature status is a factor in test coverage in the report:
- Required specification features must have at least two implementations. Implementations that do not implement a required specification feature must document the reason for not implementing the feature.
- Optional specification features must have either at least one or at least two implementations, depending on whether the feature's conformance requirements have an impact on interoperability. Implementations that do not implement an optional specification feature should document the reason for not implementing the feature. The following criteria were used to decide whether an optional test assertion required one or two tests:
  1. If the specification text contains the word "should", then the associated test assertion will be optional and will be characterized by 1 test;
  2. If the specification text is of the form "if an implementation claims to support X, it must do Y", then the associated test assertion (for feature "Y") will be optional and will be characterized by 2 tests;
  This categorization provides a distinction between optional test assertions that demonstrate portability among vendors who support the feature (2 test cases) and optional test assertions for a feature for which no portability is truly to be expected (1 test case).

5.2 Notes on Testing

A test report must indicate the outcome of each test. Possible outcomes are "pass", "fail" or "not-impl". "pass" requires output of the synthesis processor that has been judged valid by one or more test validators (see below). Note that the evaluation criteria for some tests are subjective. A report must document the way test output was verified. "not-impl" means the synthesis processor has not implemented the specific feature required by a test.
A test report may contain an additional comment for each test. If a test fails, a comment should be added (see also Detailed requirements for Implementation Report).
Every attempt has been made to keep the tests language-neutral through the use of the Test API described in Appendix A. Tests are written in US English, with the exception of some tests, which that are language-dependent.
Some tests contain notes that should be read before executing them. These notes are contained in the instructions inside the tests. See Appendix A for a detailed description of the coding rules.

5.4 Out of Scope

SSML Implementation Report will not cover:

Integration with other markup languages (SMIL/ACSS/VoiceXML)

6. Systems

France Telecom

Exec Summary

As a global telecommunications carrier, the France Telecom group believes that the SSML 1.0 Candidate Recommendation establishes a comprehensive solution facilitating the contribution to projects with technologies that will be part of many people's daily life in the near future.

Committed to customer care, responsibility and Innovation, France Telecom is therefore happy to contribute to this Recommendation by submitting the following SSML 1.0 Implementation Report and to support the activities of the W3C Voice Browser working group.

Loquendo S.p.A.

Exec Summary

As a leading player in speech technologies and voice platforms, Loquendo believes that SSML 1.0 Candidate Recommendation has been an essential step to simplify the use of text-to-speech. Indeed, this will allow the content creators to deliver a much richer user experience by exploiting the full potentiality of the text-to-speech. Moreover the adoption of SSML 1.0 will allow a higher degree of usability for people with disabilities (especially blind people to interact with the Web).

Loquendo is very pleased to contribute by submitting the SSML 1.0 Implementation Report and will continue to give a strong support to the activities of the W3C Voice Browser and Multimodal Interaction Working Groups, and of the VoiceXML Forum for actively contributing to the evolution of this and other related specifications.

ScanSoft

Exec Summary

ScanSoft is pleased to have been an active participant in the W3C Voice Browser Working Group, and in the development and proliferation of the SSML specification. It is clear that SSML, together with the related SRGS, VoiceXML and SI specifications, is integral to the development of advanced technologies that change the way we communicate, from interactive voice response solutions to in-vehicle automotive applications. Businesses and consumers alike will benefit from the support of SSML in ScanSoft's range of RealSpeak synthesis products to add the most natural speech output to their applications.

Some general comments on the tests

The system used for this implementation report does not handle protocol aspects and by consequence doesn't handle the meta element.
Tests that involve setting pitch, contour or range targets do not necessarily lead to audible differences as the system may elect ignoring these targets for realizing optimal natural speech output as per section 3.2.4. To ease the testing for this implementation report we examined whether the synthesis engine received the correct targets from the SSML parser rather than whether the resulting speech output actually realized these targets.

Voxpilot

Exec Summary

Voxpilot is delighted that the W3C Voice Browser Working Group has published the Speech Synthesis Markup Language (SSML) 1.0 as a W3C Candidate Recommendation. SSML plays an integral part in the W3C Speech Interface Framework, enabling developers to create compelling voice applications by facilitating rich control of audio rendering via an open standards, markup-based approach. Voxpilot is pleased to have contributed to the W3C Voice Browser Working Group's effort to develop the SSML 1.0 specification.

This implementation report is based on the updated version of the SSML test suite. Voxpilot's implementation of its SSML Processor interfaces through an abstraction layer with a number of leading TTS engine vendors and Voxpilot's own engine for streaming local and remote audio files.

W3C

Exec Summary

The SSML specification refers to text-only synthesisers (in 3.3.1 and 3.3.3), and describes their behaviour in some cases. A few tests in the SSML test suite cover this behaviour, and this implementation was written to demonstrate that a text-only implementation can be written to implement them.

This implementation processes a SSML file by first checking it against the SSML XML Schema, and then by running the file through an XSLT transform that extracts the text and handles the cases described in sections 3.3.1 and 3.3.3

Because this is quite a special case of an implementation of SSML, it was often not obvious whether to decide if it passed some test, or whether it implemented them. For instance, many tests require that the output of the test and that of the reference file be similar. It was often the case with this implementation (as the text output was indeed the same). However it could also be argued that this text synthesize does not implement the feature, as it does not apply to text output. The author does not consider the distinction critical in this case, as the actual purpose of this implementation is to show that assertions 109 and 268 are implementable.

Many tests require that an attribute value be "accepted". This implementation passes those tests as it recognises the values as valid. However they are not actually processed: the output is not affected by the attribute value.

7. Test Classification

7.1 Introduction

The aim of this section is to describe the taxonomy of tests developed for the SSML 1.0 Specification. Some basic assumptions (described in the sections below) led to the development of criteria that were used to categorize different tests according to what would be needed to structure, run, and pass the test. This categorization approach was then verified against each element and attribute described in the SSML 1.0 Specification. The overall work has resulted in a classification of the tests outlined below, where each test assertion has to be labelled.

7.2 Method

It is assumed that most effects generated by SSML testing would be reflected only subjectively verifiable via human perception of a given quality or characteristic of the TTS output. Consequently, the evaluation approach is based on subjective assessment methods present in the literature; each test is to be performed by presenting to testers the outputs of one or more SSML documents fed into a synthesis processor. The outputs are to be judged with respect to the perceived (or just detected) subjective effect. This approach is an adaptation of the classical "Paired Comparisons" (PC) method (see [PM], [PC]). Normally, in PC the respondents are presented with two objects at a time and asked to pick the one they prefer or the one that has the higher level of a given attribute. In this case, the task of ranking objects with respect to one feature is not difficult, since only a few characteristics are compared at a time. The simplifications applied were in line with the general recommendation to keep the features to be compared as isolated as possible, since as the number of items increases, the number of comparisons increases geometrically (n*(n -1)/2), and if the number of comparisons is too great, testers may be stressed and no longer carefully discriminate among them.

There are four separate assumptions:

The tests fall into three categories based on how they are to be executed: by giving an "absolute" answer to a single test; by comparing a reference test (raw text or not marked-up text rendered by a synthesis processor) and a marked test (marked-up text with SSML elements); and by comparing two or more marked tests with different features to be assessed. The third category requires by far the most demanding setup and care in the articulation of the tasks; it has therefore been used for only the minimum number of test cases possible in order to avoid misleading or out of scope effects (i.e. a possible nuisance effect on the scale of value for some specific synthesis processors or for a given language).
In the interest of semplicity, the evaluation results are expressed as either "pass" or "fail"
The tests are classified according to the difficulty of evaluation: a) The ones that generate an effect that is easily detected by a single tester and b) The ones that measure the effect of features which are subject to individual differences. This reduced perception reliability may also result in a single tester generating a less precise answer. In these cases, repeated test sessions from a single user or the use of a panel of testers is suggested. There are also tests that produce effects on processor output that only an expert can assess properly: hus the use of expert or well trained testers is recommended.
Almost all the test cases simply result in a series of comparisons, including tests which are intended to distinguish among an ordinal scale of values (i.e. monotonically non-decreasing scale).

These assumptions are reflected in the implementation test table (see below) in the form of additional labels for each test assertion.

7.3 Test assertions labels

Based on the method described above, the following dimensions of classification were applied to define a two-fold test classification:

7.3.1 Test Class

Test class identifies a classification of the test assertion based on its testing complexity. A test assertion may belong to one of three possible cases:

Absolute Rating : A single SSML document is sufficient to test the implementation of a particular feature (to highlight a marked-up behavior of the Specification);
Simple Paired comparison : A comparison between a raw text and a marked-up SSML document (to highlight the marked-up behavior)is needed;
Multiple Paired comparison : Multiple comparisons are needed on the marked-up SSML document. In the actual implementation this has been restricted to a binary comparison between two marked-up SSML documents, so that a scale is tested by binary tests of increasing values.

7.3.2 Test Level

Test level characterises a test assertion according to the easiness of discrimination of the audio feature under test.

Simple: The feature under test generates an effect that is easily detected by a single tester;
Medium: The feature under test generates an effect that may be difficult to perceive. It is suggested a panel of evaluators or repeat test sessions from a single tester may be required to produce an accurate result;
Complex: The feature under test requires expert assessment.

8. Test Results

The following table lists all the assertions that were derived from the SSML 1.0 Specification.

The Assert ID column uniquely identifies the assertion and is linked to the corresponding test. The Spec column identifies the section of the SSML 1.0 Specification from which the assertion was derived.

The Required column indicates whether or not the SSML 1.0 Specification requires the synthesis processor to implement the feature described by the test assertion. If a test assertion is optional, then in the Required column the indicated value is "No", followed by the string "(1 test)" or "(2 tests)" if the test assertion can be successfully tested by one or two tests.

The Manual column indicates whether or not the associated test requires an adaptation to the testing environment. If the test assertion is considered manual, then in the instructions section of the associated test more details about how the test must be modified are reported (see Appendix A.1).

Test Class and Test Level are described in the Test Assertions Label paragraph. Finally the Assertion column describes the assertion. The Results column tabulates the results submitted by the SSML implementers enumerated in Section 6, Systems.

Assert ID	Spec	Required	Manual	Test Class	Test Level	Assertion	Results
							Pass	Fail	N/I
290	[2.1]	Yes	No	Abs_Rating	Simple	The meta element must occur before all other elements and text contained within the root speak element.	4	1	0
291	[2.1]	Yes	No	Abs_Rating	Simple	metadata elements must occur before all other elements and text contained within the root speak element.	4	1	0
292	[2.1]	Yes	Yes	Abs_Rating	Simple	lexicon elements must occur before all other elements and text contained within the root speak element.	4	1	0
63	[3.1.1]	Yes	No	Abs_Rating	Simple	The version number for this specification is 1.0.	5	0	0
79	[3.1.1]	Yes	No	Abs_Rating	Simple	The xml:lang attribute is required on the element.	5	0	0
80	[3.1.1]	Yes	Yes	Abs_Rating	Simple	The xml:base attribute may be present on the element.	5	0	0
81	[3.1.1]	Yes	No	Abs_Rating	Simple	The version attribute must be present on the element.	5	0	0
22	[3.1.10]	Yes	No	Abs_Rating	Simple	The sub element is employed to indicate that the specified text replaces the contained text for pronunciation.	5	0	0
23	[3.1.10]	Yes	No	Abs_Rating	Simple	The alias attribute is required	5	0	0
24	[3.1.10]	Yes	No	Abs_Rating	Simple	The alias attribute specifies the string to be substituted for the enclosed string	5	0	0
25	[3.1.10]	No (1 test)	No	Simple_Pair_Comp	Simple	The processor should apply text normalization to the alias value	4	0	1
26	[3.1.10]	Yes	No	Abs_Rating	Simple	No elements can occur within the content of the sub	5	0	0
85	[3.1.2]	Yes	No	Abs_Rating	Simple	Language information is inherited down the document hierarchy.	4	0	1
86	[3.1.2]	Yes	Yes	Abs_Rating	Simple	Language information nests, i.e. inner attributes overwrite outer attributes.	4	0	1
87	[3.1.2]	No (1 test)	No	Abs_Rating	Simple	No change in the voice or prosody should occur if the xml:lang value is the same as the inherited value.	4	0	1
88	[3.1.2]	No (1 test)	Yes	Abs_Rating	Simple	All elements should process their contents specific to the enclosing language.	4	0	1
91	[3.1.3]	Yes	Yes	Abs_Rating	Simple	The base URI declaration affects the interpretation of a relative URI specified by the audio element's source attribute.	3	0	2
92	[3.1.3]	Yes	Yes	Mult_Pair_Comp	Simple	The base URI declaration affects the interpretation of a relative URI specified by the lexicon element's uri attribute.	3	0	2
94	[3.1.3]	Yes	Yes	Abs_Rating	Simple	When both are available, the base URI is defined by xml:base instead of by meta data discovered during a protocol interaction.	3	0	2
95	[3.1.3]	Yes	Yes	Abs_Rating	Simple	When both are available, the base URI is defined by xml:base instead of by the current document.	3	0	2
96	[3.1.3]	Yes	Yes	Abs_Rating	Simple	When both are available, the base URI is defined by meta data discovered during a protocol interaction instead of by the current document.	2	0	3
98	[3.1.4]	Yes	Yes	Mult_Pair_Comp	Simple	The pronunciation information contained within a lexicon document is used for words defined within the enclosing document.	3	0	2
100	[3.1.4]	Yes	Yes	Mult_Pair_Comp	Simple	Any number of lexicon elements may occur as immediate children of the speak element.	3	0	2
271	[3.1.5]	Yes	Yes	Abs_Rating	Simple	The seeAlso property of name attribute is used to specify a resource that might provide additional metadata information about the content.	5	0	0
274	[3.1.5]	Yes	No	Abs_Rating	Simple	Either a name or http-equiv attribute is required.	3	1	1
276	[3.1.5]	Yes	No	Abs_Rating	Simple	It is an error to provide both name and http-equiv attributes.	2	1	2
27	[3.1.6]	Yes	Yes	Abs_Rating	Simple	The metadata element is a container in which information about the document can be placed using a metadata schema	5	0	0
28	[3.1.6]	Yes	No	Abs_Rating	Simple	Any metadata schema can be used with metadata, but it is recommended that the Resource Description Format (RDF) schema [RDF-SCHEMA] be used in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative [DC]	3	1	1
29	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	A p element represents the paragraph structure in text	4	0	1
30	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	A s element represents the sentence structure in text	4	0	1
33	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The attribute "xml:lang" of paragraph or p element specifies in which language the paragraph must be performed	4	0	1
34	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The attribute xml:lang of s element specifies in which language the paragraph must be performed	4	0	1
35	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The audio element can occur into p element	4	0	1
36	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The break element can occur into p element	4	0	1
44	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The emphasis element can occur into a p element	4	0	1
45	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The mark element can occur in a p element	4	0	1
46	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The phoneme element can occur into a p element	4	0	1
47	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The say-as element can occur into a p element	4	0	1
48	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The sub element can occur into a p element	5	0	0
49	[3.1.7]	Yes	No	Simple_Pair_Comp	Medium	The s element can occur into p element	5	0	0
50	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The voice element element can occur into a p element	4	0	1
51	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The audio element can occur in a s element	4	0	1
52	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The break element can occur in a s element	4	0	1
54	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The emphasis element can occur in a s element	4	0	1
55	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The mark element can occur in a s element	4	0	1
56	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The phoneme element can occur in a s element	4	0	1
57	[3.1.7]	Yes	Yes	Simple_Pair_Comp	Simple	The say-as element can occur in a s element	4	0	1
58	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The sub element can occur in a s element	5	0	0
60	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The voice element can occur in a s element	4	0	1
255	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The prosody element can occur in a s element	4	0	1
256	[3.1.7]	Yes	No	Simple_Pair_Comp	Simple	The prosody element can occur in a p element	4	0	1
3	[3.1.8]	Yes	Yes	Simple_Pair_Comp	Simple	When the value for the interpret-as attribute is unknown or unsupported by a processor, it must render the contained text as if no interpret-as value were specified.	4	0	1
4	[3.1.8]	Yes	Yes	Mult_Pair_Comp	Simple	When the value for the format attribute is unknown or unsupported by a processor, it must render the contained text as if no format value were specified.	5	0	0
5	[3.1.8]	Yes	Yes	Abs_Rating	Simple	The interpret-as attribute is always required.	5	0	0
6	[3.1.8]	No (1 test)	Yes	Abs_Rating	Simple	The 'format' attribute is optional.	5	0	0
8	[3.1.8]	Yes	Yes	Abs_Rating	Simple	When the content of the element contains other text in addition to the indicated content type, the synthesis processor must attempt to render such text.	5	0	0
9	[3.1.8]	No (1 test)	Yes	Abs_Rating	Simple	A synthesis processor should pronounce the contained text in a manner in which such content is normally produced for the language.	3	0	2
10	[3.1.8]	Yes	Yes	Simple_Pair_Comp	Simple	The detail attribute can be used for all say-as interpret-as types.	4	0	1
11	[3.1.8]	Yes	Yes	Simple_Pair_Comp	Simple	Every value of the detail attribute must render all of the informational content in the contained text.	3	0	2
12	[3.1.8]	Yes	Yes	Simple_Pair_Comp	Simple	If the detail attribute is not specified, the level of detail that is produced by the synthesis processor depends on the text content and the language.	3	0	2
13	[3.1.8]	Yes	Yes	Mult_Pair_Comp	Simple	When the value for the detail attribute is unknown or unsupported by a processor, it must render the contained text as if no value were specified for the detail attribute	4	0	1
142	[3.1.8]	Yes	Yes	Abs_Rating	Simple	No elements can occur within the content of element	5	0	0
294	[3.1.8]	Yes	Yes	Abs_Rating	Simple	When the content of the say-as element contains additional text next to the content that is in the indicated format and interpret-as type, then this additional text MUST be rendered.	4	0	1
295	[3.1.8]	Yes	Yes	Abs_Rating	Simple	When the content of the say-as element contains no content in the indicated interpret-as type or format, the processor must render the content as if the attributes are not present.	5	0	0
296	[3.1.8]	No (1 test)	Yes	Abs_Rating	Simple	When the content of the say-as element contains no content in the indicated interpret-as type or format the processor should notify the environment of the mismatch.	3	1	1
14	[3.1.9]	Yes	Yes	Mult_Pair_Comp	Simple	The phoneme element may be empty.	4	0	1
15	[3.1.9]	Yes	Yes	Simple_Pair_Comp	Simple	The phoneme element provides a phonetic pronunciation for the contained text.	4	0	1
16	[3.1.9]	Yes	Yes	Abs_Rating	Simple	The ph attribute is a required attribute that specifies the phoneme string.	5	0	0
18	[3.1.9]	Yes	Yes	Mult_Pair_Comp	Simple	The alphabet attribute is an optional attribute that specifies the phonetic alphabet. The default value is processor-specific.	5	0	0
19	[3.1.9]	No (1 test)	No	Simple_Pair_Comp	Simple	Synthesis processors should support a value for alphabet of "ipa", corresponding to characters composing the International Phonetic Alphabet [IPA].	3	0	2
20	[3.1.9]	Yes	Yes	Abs_Rating	Simple	It is an error if a value for alphabet is specified that is not known or cannot be applied by a synthesis processor.	5	0	0
21	[3.1.9]	Yes	Yes	Abs_Rating	Simple	No elements can occur within the content of the phoneme element	5	0	0
297	[3.1.9]	Yes	Yes	Abs_Rating	Simple	The only valid values for the alphabet attribute are "ipa" and vendor-defined strings of the form "x-organization" or "x-organization-alphabet"	5	0	0
298	[3.1.9]	No (2 tests)	Yes	Abs_Rating	Simple	For processors that support IPA, the processor must syntactically accept all legal ph values.	4	0	1
299	[3.1.9]	No (1 test)	No	Abs_Rating	Simple	For processors supporting the IPA alphabet, the processor should produce output when given unicode IPA codes that can reasonably be considered to belong to the current language.	3	0	2
129	[3.2.1]	Yes	No	Abs_Rating	Simple	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "female".	4	0	1
130	[3.2.1]	Yes	No	Abs_Rating	Simple	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "male".	4	0	1
131	[3.2.1]	Yes	No	Abs_Rating	Simple	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "neutral".	3	0	2
132	[3.2.1]	Yes	No	Abs_Rating	Simple	#age: attribute indicating the preferred age of the voice to speak the contained text. integer value : 5	3	0	2
133	[3.2.1]	Yes	Yes	Mult_Pair_Comp	Simple	variant: attribute indicating a preferred variant of the other voice characteristics to speak the contained text. Integer value : manual	2	0	3
134	[3.2.1]	Yes	Yes	Abs_Rating	Simple	name: attribute indicating a platform-specific voice name to speak the contained text. value : manual.	4	0	1
135	[3.2.1]	Yes	Yes	Abs_Rating	Simple	name: attribute indicating a platform-specific voice name to speak the contained text. The value may be a space-separated list of names. Value : manual.	3	0	2
137	[3.2.1]	Yes	Yes	Abs_Rating	Simple	xml:lang: If a voice is available for a requested xml:lang, a synthesis processor must use it.	4	0	1
138	[3.2.1]	No (1 test)	Yes	Abs_Rating	Simple	If there is no voice available for the requested xml:lang, the processor should select a voice that is closest to the requested language (e.g. a variant or dialect of the same language).	3	0	2
139	[3.2.1]	Yes	Yes	Abs_Rating	Simple	xml:lang: It is an error if the processor decides it does not have a voice that sufficiently matches the language criteria.	3	1	1
278	[3.2.1]	Yes	No	Abs_Rating	Simple	Although each attribute individually is optional, at least one must be specified any time the voice element is used.	4	0	1
279	[3.2.1]	No (1 test)	Yes	Mult_Pair_Comp	Simple	Relative changes in prosodic parameters should be carried across voice changes. Test with pitch attribute.	3	1	1
280	[3.2.1]	No (1 test)	Yes	Mult_Pair_Comp	Simple	Relative changes in prosodic parameters should be carried across voice changes. Test with range attribute.	3	1	1
281	[3.2.1]	No (1 test)	Yes	Mult_Pair_Comp	Simple	Relative changes in prosodic parameters should be carried across voice changes. Test with rate attribute.	3	1	1
282	[3.2.1]	No (1 test)	Yes	Mult_Pair_Comp	Simple	Relative changes in prosodic parameters should be carried across voice changes. Test with volume attribute.	4	0	1
110	[3.2.2]	Yes	No	Simple_Pair_Comp	Complex	level: the level attribute indicates the strength of emphasis to be applied. Value : "moderate".	4	0	1
123	[3.2.2]	Yes	No	Mult_Pair_Comp	Complex	level: the level attribute indicates the strength of emphasis to be applied. The default level is "moderate"	4	0	1
124	[3.2.2]	Yes	No	Mult_Pair_Comp	Complex	level: the level attribute indicates the strength of emphasis to be applied. "strong" >= "moderate"	3	0	2
125	[3.2.2]	Yes	No	Simple_Pair_Comp	Complex	level: the optional level attribute indicates the strength of emphasis to be applied. Value : "strong".	4	0	1
126	[3.2.2]	Yes	No	Mult_Pair_Comp	Simple	level: The "none" level is used to prevent the speech synthesis processor from emphasizing words that it might typically emphasize.	3	0	2
127	[3.2.2]	Yes	Yes	Simple_Pair_Comp	Simple	comparison with/without 'none' emphasis on specified sentences. must be customized by hand : IR participant must write a sentence where their TTS automatically puts emphasis	2	0	3
128	[3.2.2]	Yes	Yes	Simple_Pair_Comp	Complex	level: The "reduced" level is effectively the opposite of emphasizing a word.	3	0	2
277	[3.2.2]	Yes	No	Mult_Pair_Comp	Complex	In this test we compare the effect of emphasis/none versus emphasis/moderate.	4	0	1
39	[3.2.3]	Yes	No	Mult_Pair_Comp	Simple	A break with no attributes, must produce a break with a prosodic strength greater than that which the processor would otherwise have used if no break element was supplied.	4	0	1
40	[3.2.3]	Yes	No	Abs_Rating	Simple	time and strength: The time and strength attributes are optional for the break element.	4	0	1
61	[3.2.3]	Yes	No	Abs_Rating	Simple	The break element must always be empty.	5	0	0
242	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "x-strong"	4	0	1
243	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "strong"	4	0	1
244	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "medium"	4	0	1
245	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "weak"	4	0	1
246	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "x-weak"	4	0	1
247	[3.2.3]	Yes	No	Abs_Rating	Simple	strength: legal value: "none"	4	0	1
248	[3.2.3]	Yes	No	Mult_Pair_Comp	Simple	strength: default value == medium	3	0	2
249	[3.2.3]	Yes	No	Abs_Rating	Simple	time: legal value in seconds "s"	5	0	0
250	[3.2.3]	Yes	No	Abs_Rating	Simple	time: legal value in milliseconds "ms"	5	0	0
251	[3.2.3]	Yes	No	Mult_Pair_Comp	Medium	strength: comparative test, "weak" equal to or stronger than "x-weak"	3	0	2
252	[3.2.3]	Yes	No	Mult_Pair_Comp	Medium	strength: comparative test, "medium" equal to or stronger than "weak"	3	0	2
253	[3.2.3]	Yes	No	Mult_Pair_Comp	Medium	strength: comparative test, "strong" equal to or stronger than "medium"	3	0	2
254	[3.2.3]	Yes	No	Mult_Pair_Comp	Medium	strength: comparative test, "x-strong" equal to or stronger than "strong"	3	0	2
293	[3.2.3]	Yes	No	Abs_Rating	Simple	If both 'strength' and 'time' are supplied, the processor will insert a break with a duration as specified by the time attribute, with other prosodic changes in the output based on the value of the strength attribute.	4	0	1
300	[3.2.3]	No (1 test)	Yes	Simple_Pair_Comp	Simple	strength: comparative test, the value "none" indicates that no prosodic break boundary should be outputted, which can be used to prevent a prosodic break which the processor would otherwise produce.	4	0	1
62	[3.2.4]	Yes	No	Abs_Rating	Simple	Although each attribute individually is optional, at least one must be specified.	4	1	0
143	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, number followed by "Hz"	5	0	0
144	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative positive change, "+" number followed by "Hz"	5	0	0
145	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative negative change, "-" number followed by "Hz"	5	0	0
146	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative negative semitone change, "-" number followed by "st"	5	0	0
147	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative positive semitone change, "+" number followed by "st"	5	0	0
148	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "x-high"	5	0	0
149	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "high"	5	0	0
150	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "medium"	5	0	0
151	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "low"	5	0	0
152	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "x-low"	5	0	0
153	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value: "default"	5	0	0
154	[3.2.4]	Yes	No	Simple_Pair_Comp	Simple	pitch: comparative test, no pitch is equal to "default"	5	0	0
155	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	pitch: comparative test, "low" higher or equal then "x-low"	4	0	1
156	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	pitch: comparative test, "medium" higher or equal then "low"	4	0	1
157	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	pitch: comparative test, "high" higher than or equal to "medium"	4	0	1
158	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	pitch: comparative test, "x-high" higher than or equal to "high"	4	0	1
159	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, number followed by "Hz"	5	0	0
160	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative positive change, "+" number followed by "Hz"	5	0	0
161	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative negative change, "-" number followed by "Hz"	5	0	0
162	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative negative semitone change, "-" number followed by "st"	5	0	0
163	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative positive semitone change, "+" number followed by "st"	5	0	0
164	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "x-high"	5	0	0
165	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "high"	5	0	0
166	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "medium"	5	0	0
167	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "low"	5	0	0
168	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "x-low"	5	0	0
169	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value: "default"	5	0	0
170	[3.2.4]	Yes	No	Simple_Pair_Comp	Complex	range: comparative test, no range is equal to "default"	5	0	0
171	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	range: comparative test, "low" higher than or equal to "x-low"	4	0	1
172	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	range: comparative test, "medium" higher than or equal to "low"	4	0	1
173	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	range: comparative test, "high" higher than or equal to "medium"	4	0	1
174	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	range: comparative test, "x-high" higher than or equal to "high"	4	0	1
175	[3.2.4]	Yes	No	Simple_Pair_Comp	Medium	rate: legal value, fast multiplier (number greater than 1)	4	0	1
178	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "x-fast"	5	0	0
179	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "fast"	5	0	0
180	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "medium"	5	0	0
181	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "slow"	5	0	0
182	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "x-slow"	5	0	0
183	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value: "default"	5	0	0
184	[3.2.4]	Yes	No	Simple_Pair_Comp	Medium	rate: comparative test, no rate is equal to "default"	5	0	0
185	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	rate: comparative test, "slow" faster than or equal to "x-slow"	4	0	1
186	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	rate: comparative test, "medium" faster than or equal to "slow"	4	0	1
187	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	rate: comparative test, "fast" faster than or equal to "medium"	4	0	1
188	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	rate: comparative test, "x-fast" faster than or equal to "fast"	4	0	1
189	[3.2.4]	Yes	No	Abs_Rating	Simple	duration: legal value in seconds	4	0	1
190	[3.2.4]	Yes	No	Abs_Rating	Simple	duration: legal value in milliseconds	4	0	1
191	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, a number	5	0	0
192	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, relative positive change, "+" number	5	0	0
193	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, relative negative change, "-" number	5	0	0
194	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "x-loud"	5	0	0
195	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "loud"	5	0	0
196	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "medium"	5	0	0
197	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "soft"	5	0	0
198	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "x-soft"	5	0	0
199	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "silent"	5	0	0
200	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value: "default"	5	0	0
201	[3.2.4]	Yes	No	Simple_Pair_Comp	Medium	volume: comparative test, no volume is equal to "default"	5	0	0
202	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	volume: comparative test, "default" is equal to one hundred	4	1	0
203	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	volume: comparative test, "silent" is equal to zero	5	0	0
204	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	volume: comparative test, "soft" louder than or equal to "x-soft"	4	0	1
205	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	volume: comparative test, "medium" louder than or equal to "soft"	4	0	1
206	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	volume: comparative test, "loud" louder than or equal to "medium"	4	0	1
207	[3.2.4]	Yes	No	Mult_Pair_Comp	Medium	volume: comparative test, "x-loud" louder than or equal to "loud"	4	0	1
208	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative negative percentage change, "-" number followed by "%"	5	0	0
209	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, number followed by "Hz"	5	0	0
210	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative percentage change, number followed by "%"	5	0	0
211	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative positive percentage change, "+" number followed by "%"	5	0	0
212	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative positive change, "+" number followed by "Hz"	5	0	0
213	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative negative change, "-" number followed by "Hz"	5	0	0
215	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative negative semitone change, "-" number followed by "st"	5	0	0
216	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, relative positive semitone change, "+" number followed by "st"	5	0	0
217	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "x-high"	5	0	0
218	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "high"	5	0	0
219	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "medium"	5	0	0
220	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "low"	5	0	0
221	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "x-low"	5	0	0
222	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value: "default"	5	0	0
223	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	contour: comparative test, time positions less then 0% are ignored	4	0	1
224	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	contour: comparative test, time positions greater then 100% are ignored	4	0	1
225	[3.2.4]	Yes	No	Abs_Rating	Complex	contour: comparative test, relative values for pitch are relative to the pitch just before the contained text	4	0	1
228	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	contour: comparative test, contour takes precedence over pitch	4	0	1
229	[3.2.4]	Yes	No	Mult_Pair_Comp	Complex	contour: comparative test, contour takes precedence over range	4	0	1
230	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative percentage change, number followed by "%"	5	0	0
231	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative positive percentage change, "+" number followed by "%"	5	0	0
232	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, relative negative percentage change, "-" number followed by "%"	5	0	0
233	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative percentage change, number followed by "%"	5	0	0
234	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative positive percentage change, "+" number followed by "%"	5	0	0
235	[3.2.4]	Yes	No	Abs_Rating	Complex	range: legal value, relative negative percentage change, "-" number followed by "%"	5	0	0
236	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value, relative percentage change, number followed by "%"	5	0	0
237	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value, relative positive percentage change, "+" number followed by "%"	5	0	0
238	[3.2.4]	Yes	No	Abs_Rating	Simple	rate: legal value, relative negative percentage change, "-" number followed by "%"	5	0	0
239	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, relative percentage change, number followed by "%"	5	0	0
240	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, relative positive percentage change, "+" number followed by "%"	5	0	0
241	[3.2.4]	Yes	No	Abs_Rating	Simple	volume: legal value, relative negative percentage change, "-" number followed by "%"	5	0	0
269	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, "Hz" is case-sensitive	5	0	0
283	[3.2.4]	Yes	No	Abs_Rating	Simple	pitch: legal value, "st" is case-sensitive	5	0	0
284	[3.2.4]	Yes	No	Abs_Rating	Simple	range: legal value, "Hz" is case-sensitive	5	0	0
285	[3.2.4]	Yes	No	Abs_Rating	Simple	range: legal value, "st" is case-sensitive	5	0	0
286	[3.2.4]	Yes	No	Simple_Pair_Comp	Medium	rate: legal value, slow multiplier (number lower than 1)	5	0	0
287	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, "Hz" is case-sensitive	5	0	0
288	[3.2.4]	Yes	No	Abs_Rating	Simple	contour: legal value, "st" is case-sensitive	5	0	0
289	[3.2.4]	Yes	No	Mult_Pair_Comp	Simple	duration: comparative test, duration takes precedence over rate	3	0	2
301	[3.2.4]	No (1 test)	No	Abs_Rating	Medium	rate: the default rate for a voice should be such that it is experienced as a normal speaking rate for the voice when reading aloud text.	4	0	1
102	[3.3.1]	Yes	No	Abs_Rating	Simple	If text only output is not required, the processor must try to play the referenced audio document (Raw (headerless) 8kHz 8-bit mono mu-law [PCM] single channel)	4	0	1
103	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may be empty.	5	0	0
104	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the alternate content contains an audio element that cannot be played, the processor must recursively attempt to find its alternate content to render.	4	0	1
106	[3.3.1]	No (1 test)	No	Abs_Rating	Simple	If the audio element is not successfully rendered, the synthesis processor should continue processing and should notify the hosting environment.	4	1	0
118	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain text.	5	0	0
119	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain other markup.	5	0	0
120	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain desc elements.	5	0	0
121	[3.3.1]	Yes	No	Mult_Pair_Comp	Simple	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain other audio elements.	4	0	1
257	[3.3.1]	Yes	No	Abs_Rating	Simple	If text only output is not required, the processor must try to play the referenced audio document (Raw (headerless) 8kHz 8-bit mono A-law [PCM] single channel)	4	0	1
258	[3.3.1]	Yes	No	Abs_Rating	Simple	If text only output is not required, the processor must try to play the referenced audio document (WAV (RIFF) 8kHz 8-bit mono mu-law [PCM] single channel)	4	0	1
259	[3.3.1]	Yes	No	Abs_Rating	Simple	If text only output is not required, the processor must try to play the referenced audio document (WAV (RIFF) 8kHz 8-bit mono A-law [PCM] single channel)	4	0	1
107	[3.3.2]	Yes	No	Simple_Pair_Comp	Simple	The mark element does not affect the speech output process. Test of markers at word boundaries.	5	0	0
108	[3.3.2]	Yes	Yes	Abs_Rating	Simple	When processing a mark element, the synthesis processor must do one or both of the following: (1) inform the hosting environment with the value of the name attribute and with information allowing the platform to retrieve the corresponding position in the rendered output or (2) when audio output of the SSML document reaches the mark, issue an event that includes the value of the name attribute. The processor must send the event to the destination specified by the hosting environment.	4	0	1
266	[3.3.2]	Yes	No	Simple_Pair_Comp	Simple	The mark element does not affect the speech output process Test of markers at the boundaries of the input text.	5	0	0
109	[3.3.3]	No (1 test)	Yes	Mult_Pair_Comp	Simple	If text only output is required, the content of the desc element(s) should be rendered instead of other alternative content.	1	0	4
268	[3.3.3]	No (1 test)	Yes	Mult_Pair_Comp	Simple	If text only output is required, the content of the desc element(s) should be rendered instead of other alternative content. The xml:lang attribute can be used to indicate that the content of the element is in a different language from that of the content surrounding the element.	1	0	4

9. References

[PM]: Psychometric methods, Guilford J.P., 1954, New York, McGraw-Hill.
[PC]: The method of paired comparisons for social values, Thurstone L.L., in "Journal of Social Psychology", 1927, nr. 11, pp. 384-400.

Appendices

Appendix A - Test assertion XML API definition

This appendix describes a lightweight framework for authoring SSML tests. The framework encourages a consistent format for writing tests by facilitating Absolute Rating, Simple Paired Comparison and Multiple Paired Comparison tests to be authored in a straightforward manner. By employing a stylesheet, vendors may adapt the framework to their own test infrastructure. For example, a test infrastructure may present the instruction for a test visually instead of via synthesised text.

The Test API consists of a superset of the schema specified in the SSML 1.0 Specification with the addition of a set of four elements in their own namespace (http://www.w3.org/2002/ssml-conformance). The main element of the test API is <conf:test>, it can contain the following elements containers:

A.1 Instruction
A.2 Reference markup
A.3 Test markup

A.1 Instruction

This <conf:instruction> element marks the instructions for the tester to successfully evaluate the test. For example, for a Simple Paired Comparison test, the instruction might describe the expected differences between the audio produced for the reference and markup test to be assessed as "pass". The instruction is written in plain text and is compulsory. If the test assertion is labelled as Manual, then the instruction will specify the adaptation of the testing environment that is required to execute the test. There are three kinds of possible adaptations:

The test contains the absolute URI: "http://www.loquendocafe.com/ssml/uploadfilessml/res/", that has to be adapted at the testing location. These tests are marked in the <conf:instruction> by "Manual='ABS_URI'".
The test is about a processor-specific feature. These tests are marked in the <conf:instruction> by "Manual='PLAT_DEP'".
The test is language dependent. These tests are marked in the <conf:instruction> by "Manual='LANG_DEP'".

A.2 Reference markup

This (optional) <conf:reference_markup> element is used to indicate the reference SSML document for Multiple Paired Comparison tests. If the element contains no children, the raw text from the test markup is used for the reference and may be used for comparison, i.e. for a Simple Paired Comparison. Otherwise the contained SSML markup is employed. Note that the SSML markup must include the <speak> element.

A.3 Test markup

The <conf:test_markup> element indicates the SSML test markup. This element always contains SSML markup and is compulsory. Note that (as with the reference) the SSML markup must include the <speak> element.

A.4 Document Type Definition

This section contains the DTD for the Test API markup. The Test API DTD imports the SSML 1.0 DTD and hence test documents may be validated directly against the Test API DTD.

<!-- Bring in the SSML DTD -->
<!ENTITY % synthesis.dtd
     PUBLIC "-//W3C//DTD SYNTHESIS 1.0//EN"
            "http://www.w3.org/TR/speech-synthesis/synthesis.dtd" >
%synthesis.dtd;

<!-- Control prefixing - can be 'switched off' in internal subset -->
<!ENTITY % Conf.prefixed "INCLUDE" >

<!-- Declare the actual namespace -->
<!ENTITY % Conf.xmlns "http://www.w3.org/2002/ssml-conformance" >

<!-- Declare the prefix -->
<!ENTITY % Conf.prefix "conf" >

<![%Conf.prefixed;[
<!ENTITY % Conf.pfx "%Conf.prefix;:" >
]]>
<!ENTITY % Conf.pfx "" >

<![%Conf.prefixed;[
<!ENTITY % Conf.xmlns.attrib
    "xmlns:%Conf.prefix; CDATA #FIXED '%Conf.xmlns;'"
>
]]>
<!ENTITY % Conf.xmlns.attrib
     "xmlns CDATA  #FIXED '%Conf.xmlns;'"
>

<!-- Qualified names -->
<!ENTITY % Conf.test.qname "%Conf.pfx;test" >
<!ENTITY % Conf.instruction.qname "%Conf.pfx;instruction" >
<!ENTITY % Conf.reference_markup.qname "%Conf.pfx;reference_markup" >
<!ENTITY % Conf.test_markup.qname "%Conf.pfx;test_markup" >

<!-- Define the content model -->
<!ELEMENT %Conf.test.qname;
    (%Conf.instruction.qname;,
    (%Conf.reference_markup.qname;)?,
    %Conf.test_markup.qname;) >

<!ATTLIST %Conf.test.qname; %Conf.xmlns.attrib; >

<!ELEMENT %Conf.instruction.qname; (#PCDATA) >

<!ELEMENT %Conf.reference_markup.qname; (speak)? >

<!ELEMENT %Conf.test_markup.qname; (speak) >

A.5 Test examples

The following examples illustrate the use of the Test API. Other different kind of test are reported in the table above. These examples were written to help validate the stylesheet (see section A.6) used to generate the tests.

Example 1 - Simple Paired Comparison

The following test illustrates a Simple Paired Comparison where the reference is constructed from the raw text generated from the test markup:

<?xml version="1.0" encoding="UTF-8"?>
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
   <conf:instruction>
      In order for the test to pass, the test audio
      should be louder than the reference audio
   </conf:instruction>

   <conf:reference_markup/>

   <conf:test_markup>
      <speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <prosody volume="loud">The cat jumped over the moon</prosody>
      </speak>
   </conf:test_markup>
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   In order for the test to pass, the test audio
   should be louder than the reference audio
</speak>

After transformation, the reference markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="In order for the test to pass, the test audio
                          should be louder than the reference audio"/>
         </rdf:RDF>
   </metadata>
   The cat jumped over the moon
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="In order for the test to pass, the test audio
                          should be louder than the reference audio"/>
      </rdf:RDF>
   </metadata>
   <prosody volume="loud">The cat jumped over the moon</prosody>
</speak>

Example 2 - Multiple Paired Comparison

The following test illustrates a Multiple Paired Comparison where the reference is constructed from the supplied SSML markup:

<?xml version="1.0" encoding="UTF-8"?>
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
   <conf:instruction>
      In order for the test to pass, the test audio
      should be louder than the reference audio
   </conf:instruction>

   <conf:reference_markup>
      <speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <prosody volume="soft">The cat jumped over the moon</prosody>
      </speak>
   </conf:reference_markup>

   <conf:test_markup>
      <speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <prosody volume="loud">The cat jumped over the moon</prosody>
      </speak>
   </conf:test_markup>
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   In order for the test to pass, the test audio
   should be louder than the reference audio
</speak>

After transformation, the reference markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
           dc:Description="In order for the test to pass, the test audio
                           should be louder than the reference audio"/>
      </rdf:RDF>
   </metadata>
   <prosody volume="soft">The cat jumped over the moon</prosody>
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="In order for the test to pass, the test audio
                          should be louder than the reference audio"/>
      </rdf:RDF>
   </metadata>
   <prosody volume="loud">The cat jumped over the moon</prosody>
</speak>

Example 3 - Absolute Rating

The following test illustrates an Absolute Rating test where no reference markup is required:

<?xml version="1.0" encoding="UTF-8"?>
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
   <conf:instruction>
      This test should count from one to four
   </conf:instruction>

   <conf:test_markup>
      <speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <p>
            1
         </p>
         <p>
            <s>2</s>
         </p>

         <p>
            <s>3</s>
            <s>4</s>
         </p>
      </speak>
   </conf:test_markup>
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis"
 version="1.0" xml:lang="en-US" >
   This test should count from one to four
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="This test should count from one to four"/>
      </rdf:RDF>
   </metadata>
   <p>
      1
   </p>
   <p>
      <s>2</s>
   </p>

   <p>
      <s>3</s>
      <s>4</s>
   </p>
</speak>

A.6 Sample XSLT Template Definition

The following is a listing of an XSLT that can be used to transform the Test API into valid SSML. The XSLT is parameterizable: the parameter "mode" may be set to one of "instruction", "reference", or "test".

<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright 1998-2004 W3C (MIT, ERCIM, Keio), All Rights Reserved. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ssml="http://www.w3.org/2001/10/synthesis"
                xmlns:conf="http://www.w3.org/2002/ssml-conformance"
                xmlns="http://www.w3.org/2001/10/synthesis"
                exclude-result-prefixes="ssml conf">

<!-- ################### -->
<!-- P a r a m e t e r s -->
<!-- ################### -->
<xsl:param name="mode" select="'test'"/>
   <!-- select = 'instruction', 'reference', or 'test' -->

<!-- ################ -->
<!-- T o p  L e v e l -->
<!-- ################ -->
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/">
    <xsl:choose>
        <xsl:when test="$mode = 'instruction'">
            <xsl:apply-templates select="//conf:instruction"/>
        </xsl:when>
        <xsl:when test="$mode = 'reference'">
            <xsl:apply-templates select="//conf:reference_markup"/>

            <!-- For consistency, always return a valid SSML document -->
            <xsl:if test="count(//conf:reference_markup) = 0">
                <speak version="1.0">
                    <xsl:call-template name="meta"/>
                </speak>
            </xsl:if>
        </xsl:when>
        <xsl:when test="$mode = 'test'">
            <xsl:apply-templates select="//conf:test_markup"/>
        </xsl:when>
        <xsl:otherwise>
            Error - unknown mode type: <xsl:value-of select="$mode"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- ##################### -->
<!-- I n s t r u c t i o n -->
<!-- ##################### -->
<xsl:template match="conf:instruction">
    <speak version="1.0" xml:lang="en-US">
        <xsl:value-of select="."/>
    </speak>
</xsl:template>

<!-- #################  -->
<!-- R e f e r e n c e  -->
<!-- #################  -->
<xsl:template match="conf:reference_markup">
    <xsl:choose>
        <xsl:when test="0 = count(child::*)">
            <speak version="1.0">
                <xsl:apply-templates
                 select="//conf:test_markup/ssml:speak/@xml:lang"/>
                <xsl:call-template name="meta"/>
                <xsl:value-of
                 select="normalize-space(//conf:test_markup/ssml:speak)"/>
            </speak>
        </xsl:when>
        <xsl:otherwise>
            <xsl:call-template name="copy_speak"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- ################ -->
<!-- T e s t  S S M L -->
<!-- ################ -->
<xsl:template match="conf:test_markup">
    <xsl:call-template name="copy_speak"/>
</xsl:template>

<!-- ############################## -->
<!-- H e l p e r  T e m p l a t e s -->
<!-- ############################## -->
<!-- Copy the speak element -->
<xsl:template name="copy_speak">
          <xsl:element name="speak">
                  <xsl:apply-templates select="ssml:speak/@*"/>
                  <xsl:call-template name="meta"/>
                  <xsl:apply-templates select="ssml:speak/node()"/>
          </xsl:element>
</xsl:template>

<!-- Do copy without the namespace information duplicated -->
<xsl:template match="*">
        <xsl:element 
         name="{name()}"><xsl:apply-templates select="@* | node()"/>
        </xsl:element>
</xsl:template>
<xsl:template match="@*">
        <xsl:attribute name="{name()}">
                <xsl:value-of select="."/>
        </xsl:attribute>
</xsl:template>
<xsl:template match="text()">
        <xsl:value-of select="."/>
</xsl:template>

<!-- Meta data -->
<xsl:template name="meta">
    <metadata>
        <rdf:RDF
         xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"
         xmlns:dc = "http://purl.org/metadata/dublin_core#">
            <xsl:element name="rdf:Description">
                <xsl:attribute name="dc:Description">
                    <xsl:copy-of
                     select="normalize-space(//conf:instruction)"/>
                </xsl:attribute>
            </xsl:element>
        </rdf:RDF>
    </metadata>
</xsl:template>

</xsl:stylesheet>

Appendix B - Downloading tests

The "ssml-ir-20040715.zip" archive contains a number of resources. The SSML tests are ordered by test assertion id and are organised into folders where the folder name corresponds to the assertion id. In addition the archive includes the following:

B.1 The Manifest
B.2 The Report Submission Template
B.3 The Stylesheet

B.1 The Manifest

"manifest.xml" is a file containing the complete information about test assertions written in the SSML Implementation Report project. The structure of the Manifest presents a root element called <tests> ; this is the container of all the Test Assertions. Every Test Assertion is represented by an <assertion> element containing CDATA that represents the description of the test assertion. At the end of the file is the <contribs> element; this lists all the people who have contributed to the Implementation Report preparation. The <assertion> element must contain a <start> element that references the main test file and may optionally contain several <dep> element that identify the other tests useful to complete the test case. Here's the DTD for the Manifest:

<!ELEMENT assertion (#PCDATA)>
<!ATTLIST assertion
    id CDATA #REQUIRED
    spec CDATA #REQUIRED
    test_level (Complex | Medium | Simple) #REQUIRED
    test_class (Abs_Rating | Mult_Pair_Comp | Simple_Pair_Comp) #REQUIRED
    exec_manual (PLAT_DEP | LANG_DEP | ABS_URI) #IMPLIED
    conf_level (Optional | Required) 'Required' #IMPLIED
>
<!ELEMENT contrib EMPTY>
<!ATTLIST contrib
    usr_fname CDATA #REQUIRED
    usr_lname CDATA #REQUIRED
    usr_email CDATA #REQUIRED
    usr_comp_name CDATA #REQUIRED
    editor CDATA #IMPLIED
>
<!ELEMENT contribs (contrib+)>
<!ELEMENT dep EMPTY>
<!ATTLIST dep
    uri CDATA #REQUIRED
    type CDATA #REQUIRED
>
<!ELEMENT start EMPTY>
<!ATTLIST start
    uri CDATA #REQUIRED
    type CDATA #REQUIRED
>
<!ELEMENT test (assertion, start, dep*)>
<!ELEMENT tests (test+, contribs)>

Test assertion typology is defined by several attributes on the <assertion> element. These attributes allow for a more complete identification of the nature of the current assertion and an idea of related tests structure.

id : Test Assertion Identifier;
spec: reference to paragraph belonging to the SSML 1.0 Specification;
test_class: information about the test class. In particular it can assume one of the possible values: “Abs_Rating”, “Simple_Pair_Comp”, “Mult_Pair_Comp”;
test_level: information about the test level. In particular it assumes one of the possible values: “Complex”, “Medium”, “Simple”;
exec_manual: this is present only if the test must be manually executed; in this case it can assume one of the following possible values: ABS_URI, LANG_DEP, PLAT_DEP, for details see Appendix A.1;
conf_level: information about the optional or required nature of the current test assertion; it can assume one of the possible values: “Optional”, “Required”.

<start> and <dep> elements are characterized by the following attributes:

uri: relative URI that links the referenced file;
type: referenced file mime type.

For instance here’s a fragment of the manifest.xml document:

<tests>
[…]
 <test>
  <assertion id="63" 
        spec="3.1.1" 
        test_level="Simple"
        test_class="Abs_Rating">
      The version number for this specification is 1.0.
  </assertion>
  <start uri="63/63.txml" type="text/x-txml"/>
 </test>
[…]
</tests>

Here’s another fragment of the manifest.xml document to show the use of the <dep> element:

<tests>
[…]
 <test>
  <assertion id="257" 
        spec="3.3.1" 
        test_level="Simple" 
        test_class="Abs_Rating">
      If text only output is not required, the processor must try
      to play the referenced audio document 
      (Raw (headerless) 8kHz 8-bit mono A-law [PCM] single channel)     
  </assertion>
  <start uri="257/ta_257.txml" type="text/x-txml"/>
  <dep uri="257/beep_a.raw" type="audio/basic"/>
 <test>
[…]
</tests>

The <tests> element contains the <contribs> element too: here you can find the generalities this contains the list of all the contributors that have participated in the Implementation Report activity.

B.2 The Report Submission Template

The template has to be filled by the company following the rules described in Section 3. An excerpt of the Template is shown below.

<system-report name="YOUR-SYSTEM-NAME-HERE">
<testimonial> YOUR-WELL-FORMED-TESTIMOMIAL-CONTENT-HERE</testimonial>
<assert id="Id1" res="pass|fail|not-impl">OPTIONAL-NOTES-HERE</assert>
        [...]
<assert id="Idn" res="pass|fail|not-impl">OPTIONAL-NOTES-HERE</assert>
</system-report>

B.3 The Stylesheet

A specific stylesheet transforms the meta markup language used to write the tests into valid SSML documents (the stylesheet structure is described in Appendix A.6). The output of the stylesheet presents three valid SSML documents containing respectively the instructions, the reference text and the test itself. Parameterization is used so that a single stylesheet may be used for producing all three documents.

Appendix C - Acknowledgements

The Voice Browser Working Group would like to acknowledge the contributions of several individuals:

Paolo Baggia (Loquendo) to manage the entire SSML-IR work activity.
Dave Burke (Voxpilot) for generating the stylesheet used to author SSML tests.
Sheyla Militello (Loquendo) for developing the evaluation criteria to classify the tests.
Gianluca Gagliati (Loquendo) for building and maintaining the Loquendo Web site that allowed contributors to manage and execute the assertions and tests.
All the contributors that allowed for the realization of this activity.

Thanks to Dave Raggett, Jim Larson, Scott Mc Glashan, Max Froumentin for the important management support.