Version reviewed: http://www.w3.org/TR/2008/WD-speech-synthesis11-20080317/
Lead reviewer and date of initial review: Richard Ishida, Apr 2008
Subject lead in: [SSML11]
These are comments on behalf of the Internationalization Core WG, unless otherwise stated. The "Owner" column indicates who has been assigned the responsibility of tracking discussions on a given comment.
We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads. Click on the icons in the right-most column to see email discussions.
|1||3.2.1||Language priority list?||
No mention is made of the concept of a 'language priority list' per RFC 4647. We suspect that this is an oversight, since we expect that a processor needs to choose one item from the list that best fits, and will need some help in making that choice.
Furthermore, the text says "A voice satisfies the languages feature if, for each language/accent pair in the list...". We suspect that that should read 'if, for one or more language/accent pairs in the list...' The word 'each' implies that all items in the list must match.
If we are mistaken here, please make it clearer in the spec why this approach is used.
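To illustrate the kind of selection we have in mind, here is a minimal Python sketch of the Lookup scheme from RFC 4647 (section 3.4) applied to a language priority list. The function name `lookup` and the voice tags are hypothetical and chosen only for illustration; nothing here is defined by the SSML draft.

```python
def lookup(priority_list, available_tags, default=None):
    """Return the single best-fitting tag for a language priority list.

    Sketch of RFC 4647 Lookup: each range is tried in priority order and
    progressively truncated from the end until it equals an available tag.
    """
    by_lower = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard range is skipped in lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag left at the end is removed as well.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical set of tags describing the installed voices.
voices = ["en-US", "fr", "de-CH"]
best = lookup(["fr-CA", "en-GB"], voices)
```

In this sketch only one range in the list needs to produce a match, and the order of the list decides which voice wins.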
|2||3.2.1||Whose description as a language tag||
The phrase "whose description as a language tag" is misleading. Perhaps you mean to say "a voice can read/speak a language whose language tag matches".
|3||3.2.1||Ignoring script and extension subtags||
The text about script and extension subtags is probably unnecessary. As long as the range in the attribute has no script subtag, any voice whose tag includes a script subtag will still match it. If a script subtag is specified in the range, only voices that have that script subtag will match. Even though script has no meaning in an auditory context, what the spec proposes requires that the tags and ranges be parsed. Matching is defined in such a way that parsing the language tags is not necessary. It would be better to recommend that these subtags simply not be used, since they are not relevant to an auditory selection.
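The matching behaviour described above can be sketched as follows, assuming the extended filtering scheme of RFC 4647 (section 3.3.2) is the one applied. The function name `extended_filter_match` is hypothetical, and the sketch compares subtags by position only, without interpreting any of them as a script, region or extension.

```python
def extended_filter_match(language_range, tag):
    """Sketch of RFC 4647 extended filtering for one range and one tag."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    # The first subtags must be equal unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # wildcard matches zero or more subtags
        elif t >= len(tag_subtags):
            return False                # range asks for more than the tag has
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip past a singleton
        else:
            t += 1                      # skip an unmatched subtag in the tag
    return True


# A range without a script subtag still matches a voice tag that has one.
no_script_in_range = extended_filter_match("zh-TW", "zh-Hant-TW")
# A range with a script subtag only matches tags carrying that script.
script_in_range = extended_filter_match("zh-Hans-CN", "zh-CN")
```

No subtag is classified at any point, which is why special handling for script subtags should not be needed.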
zh-CN-HK is an illegal language tag (in one of the examples). It might be better to avoid a Chinese example, at least initially. If you want control over which *language* is used, you should use the cmn or yue tags rather than zh-CN etc.
Addison Phillips has taken an action to propose an alternative paragraph or two for the example.
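As a rough check of why zh-CN-HK is rejected, the following Python sketch tests a tag against a simplified form of the `langtag` production in RFC 5646. It is an assumption-laden approximation: grandfathered and private-use-only tags are not handled, and subtags are not checked against the IANA registry. The names `_LANGTAG` and `is_well_formed` are hypothetical.

```python
import re

# Simplified langtag syntax from RFC 5646 (no grandfathered tags).
_LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}            # language, optional extlangs
      |[a-z]{4}                                # reserved for future use
      |[a-z]{5,8})                             # registered language subtag
    (?:-[a-z]{4})?                             # at most one script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                # at most one region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*   # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*       # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                 # private use
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_well_formed(tag):
    """Return True if the tag fits the simplified langtag syntax above."""
    return _LANGTAG.fullmatch(tag) is not None


# Two region subtags in a row cannot be parsed, so this tag is rejected.
two_regions = is_well_formed("zh-CN-HK")
```

Tags such as cmn, yue-HK or zh-Hant-HK pass the same check, since each has at most one region subtag.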
|5||3.2.1||RFC 4647 references||
It might be better to reference BCP 47, "Matching of Language Tags", rather than RFC 4647, or at least to reference "RFC 4647 or its successor". This is to guard against issues if RFC 4647 is replaced by a new RFC number. We do not expect future changes to that RFC to cause problems for the SSML spec.
|6||3.2.1||RFC 3066 reference||
RFC 3066 is a stale reference, and should be removed from the references section.
|7||3.2.1||BCP 47 as a normative reference||BCP 47 is only an informative reference. We suggest that the sentence about BCP 47 be changed to say that values should conform to language tags in BCP 47, and this would make BCP 47 a normative reference.||RI||S|
|8||3.2.1||Interaction of voice and xml:lang||
It is not clear to us from reading the spec how the voice element and xml:lang settings interact. We assume that, in the absence of a voice element, the xml:lang values should be used to determine the appropriate voice. It would be helpful to clarify that in the spec.
If a range of text within a voice element has an xml:lang attribute, is it expected that this would affect the voice if the values are different from those specified on the voice element? Again, we would like to see that clarified in the spec.
Note that we found the paragraph that begins "voice attributes are inherited down the tree... " hard to understand, and it doesn't seem to imply any precedence.