PLS 1.0: Last Call Disposition of Comments

This document details the responses made by the Voice Browser Working Group to issues raised during the first Last Call (beginning 31 January 2006 and ending 15 March 2006). Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

1. Introduction

This document describes the disposition of comments relating to the Pronunciation Lexicon Specification (PLS) Version 1.0 (http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131). Its goal is to help readers understand the background behind the modifications made to the specification. It also gives the people who submitted comments a useful checkpoint for evaluating the resolutions applied by the W3C's Voice Browser Working Group.

This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review period. It includes issues that were submitted outside the official review period, up to July 2006.

2. Summary

The following table summarizes all the public comments received by the Voice Browser Working Group. For each comment, the table gives:

the Item ID
the Commentator's Name and the date of the comment
the Nature of each comment (Feature Request, Change to Existing Feature, Technical Error, Clarification / Typo / Editorial)
the final Disposition

Sections 2.1 to 2.4 of this document describe each comment in detail, including:

the original Comment
the Resolution proposed by the Voice Browser Working Group ("Accepted", "Rejected" or "Deferred")
the Email Trail on the discussion

Note: A Disposition of "Waiting Response" means that we have not received a formal acceptance or rejection from the Commentator, or that acceptance was pending on the resolution being applied. A Disposition of "Implicitly Accepted" means that the Commentator agrees with the resolution but has asked for some additional modification, and formal acceptance has not yet been received.

Item	Commentator	Nature	Disposition
R100-1	Paul Bagshaw (2006-02-03)	Feature Request	Accepted
R100-2	Paul Bagshaw (2006-02-03)	Feature Request	Accepted
R100-3	Paul Bagshaw (2006-02-03)	Clarification / Typo / Editorial	Accepted
R100-4	Paul Bagshaw (2006-02-03)	Clarification / Typo / Editorial	Accepted
R101	Mark Alexandre (2006-02-08)	Clarification / Typo / Editorial	Waiting Response
R102	Al Gilman (2006-03-15)	Feature Request	Waiting Response
R103-1	Richard Ishida (2006-03-21)	Feature Request	Accepted
R103-2	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-3	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-4	Richard Ishida (2006-03-21)	Technical Error	Waiting Response
R103-5	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-6	Richard Ishida (2006-03-21)	Technical Error	Accepted
R103-7	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Waiting Response
R103-8	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-9	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-10	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-11	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-12	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-13	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-14	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-15	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-16	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-17	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-18	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-19	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-20	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Waiting Response
R103-21	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Waiting Response
R103-22	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-23	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-24	Richard Ishida (2006-03-21)	Change to Existing Feature	Accepted
R103-25	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-26	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Implicitly Accepted
R103-27	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-28	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-29	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-30	Richard Ishida (2006-03-21)	Feature Request	Waiting Response
R103-31	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-32	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-33	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Implicitly Accepted
R103-34	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Accepted
R103-35	Richard Ishida (2006-03-21)	Clarification / Typo / Editorial	Waiting Response
R103-36	Richard Ishida (2006-03-21)	Feature Request	Waiting Response
R104	Janina Sajka (2006-04-26)	Clarification / Typo / Editorial	Waiting Response
R105-1	Deborah Dahl (2006-05-15)	Clarification / Typo / Editorial	Accepted
R105-2	Deborah Dahl (2006-05-15)	Clarification / Typo / Editorial	Accepted

2.1 Clarifications, Typographical, and Other Editorial

Issue R100-3

From Paul Bagshaw (2006-02-03):

3. Specification ambiguity

* Section 4.4 of PLS stipulates:

"The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements."

However, it appears to be possible to have BOTH <phoneme> AND <alias> elements in <lexeme>, as illustrated in example 4 and more clearly described in section 4.9.2:

. . . either by <phoneme> elements or <alias> elements or a combination of both . . .

The either/or of section 4.4 needs correction (Proposition 3: add “or a combination of both”).

Resolution: Accepted

You are right: there is a difference in wording between Section 4.4 [1] and Section 4.9.2 [2]. We propose the following change to Section 4.4 [1]:

OLD:
"The <lexeme> element contains one or more <grapheme> elements, one or more of either <phoneme> or <alias> elements, and zero or more <example> elements."

NEW:
"The <lexeme> element contains one or more <grapheme> elements, one or more pronunciations (either by <phoneme> elements or <alias> elements or a combination of both), and zero or more <example> elements."

We think this change resolves your concern.
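The revised Section 4.4 wording amounts to a simple content model for <lexeme>. The following Python sketch checks one <lexeme> against it. It is illustrative only: the function name lexeme_is_valid and the sample lexicon are hypothetical, the namespace URI is assumed from the final PLS 1.0 Recommendation rather than taken from the 2006 draft, and element order is not checked.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the final PLS 1.0 Recommendation.
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

def lexeme_is_valid(lexeme: ET.Element) -> bool:
    """Hypothetical check of the revised Section 4.4 content model:
    one or more <grapheme>, one or more pronunciations (<phoneme>,
    <alias>, or a combination of both), zero or more <example>.
    Element order is deliberately not checked in this sketch."""
    graphemes = pronunciations = 0
    for child in lexeme:
        tag = child.tag.replace(PLS_NS, "", 1)
        if tag == "grapheme":
            graphemes += 1
        elif tag in ("phoneme", "alias"):
            pronunciations += 1
        elif tag != "example":
            return False  # any other child element is rejected
    return graphemes >= 1 and pronunciations >= 1

# Hypothetical lexicon: the first lexeme combines <alias> and <phoneme>,
# the second has no pronunciation at all.
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
    <phoneme>ˈdʌbəljuː θriː siː</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>PLS</grapheme>
  </lexeme>
</lexicon>"""

root = ET.fromstring(SAMPLE)
print([lexeme_is_valid(lx) for lx in root.iter(PLS_NS + "lexeme")])
```

Under the old "either <phoneme> or <alias>" reading, the first lexeme would be questionable; under the revised wording it is valid, while the second lexeme fails because it has no pronunciation.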

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.4
[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.9.2

Email Trail:

Original Comment Paul Bagshaw (2006-02-03)
Confirmation of receipt VBWG (2006-04-20)
VBWG official response to last call issue VBWG (2006-07-27)
Final response to VBWG official response Paul Bagshaw (2006-07-27)

Issue R100-4

From Paul Bagshaw (2006-02-03):

4. Terminology

A final relatively minor comment: in section 4.5.

A <grapheme> may optionally contain an orthography attribute which identifies the script code used for writing the orthography.

The term ‘orthography’ has a double use: once as a glossary term and once as an attribute name. Only the font makes the specification clear. Rewording of the glossary term should be envisaged.

Resolution: Rejected

As a consequence of another comment, we resolved to remove the 'orthography' attribute from the grapheme element, because we do not see its value and we recognize the benefits of supporting a mixture of script types within a grapheme element (which occurs in Japanese, for example).

Consequently, the double use you mention is no longer present in the specification, and the definition in the glossary may remain the same.
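The removal of the attribute rests on the observation that a single written form can mix scripts. The rough Python sketch below lists the ISO 15924 script codes found in a string. The helper name scripts_in and the name-prefix table are hypothetical; classifying characters by their Unicode names is a crude stand-in for the Unicode Script property and covers only the four scripts listed.

```python
import unicodedata

# Crude stand-in for the Unicode Script property: map a prefix of the
# Unicode character name to an ISO 15924 code. Only four scripts covered.
NAME_PREFIX_TO_ISO15924 = {
    "CJK UNIFIED IDEOGRAPH": "Hani",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "LATIN": "Latn",
}

def scripts_in(text: str) -> list[str]:
    """Return the ISO 15924 codes found in text, in order of first appearance."""
    found: list[str] = []
    for ch in text:
        name = unicodedata.name(ch, "")
        for prefix, code in NAME_PREFIX_TO_ISO15924.items():
            if name.startswith(prefix):
                if code not in found:
                    found.append(code)
                break
    return found

# 食べる ("to eat") mixes Han and Hiragana; Tシャツ ("T-shirt") mixes Latin and Katakana.
print(scripts_in("食べる"))
print(scripts_in("Tシャツ"))
```

Neither word can be described by a single script code, which is the case a single-valued script attribute on <grapheme> could not express.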

Email Trail:

Original Comment Paul Bagshaw (2006-02-03)
Confirmation of receipt VBWG (2006-04-20)
VBWG official response to last call issue VBWG (2006-07-27)
Final response to VBWG official response Paul Bagshaw (2006-07-27)

Issue R101

From Mark Alexandre (2006-02-08):

In the Pronunciation Lexicon Specification (PLS) v1.0, as of Draft 31, the usage of the element tag <grapheme> reflects a misunderstanding of the meaning of the word "grapheme." The definition in the spec's Glossary of Terms is nevertheless quite accurate: "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element."

Thus, the letter "g" and the numeral "4" are both examples of widely used graphemes, as are the question mark "?" and the dollar sign "$". In the current draft of the PLS however, the so-called "grapheme" element is mistakenly applied to what in English is commonly called the "spelling." I believe this will lead to confusion unless the element is renamed. As to what it should be named, more on that below.

Before that, however, I wish to draw attention to an attribute of the grapheme element, the "orthography" attribute. The value of this attribute is, according to the draft spec, supposed to be a "script code" compliant with the ISO 15924 standard. The title of that standard is "Information and documentation — Codes for the representation of names of scripts." All of this naturally leads to the question: why not name this attribute "script" or "scriptcode"?

The word "orthography", derived from Greek roots meaning (roughly) "correct" and "writing", can present some ambiguity between two related meanings, but neither meaning is the same as "script."

Orthography can be used as a synonym for what most English speakers more commonly call the "spelling" of a particular word. Thus the examples "colour" and "color" are two different orthographies for the same word in the English language — the latter being the American orthography that was adopted following a set of spelling reforms which the rest of the English-speaking world declined to follow. The corresponding word in the French language is written with the orthography "couleur."

In a related sense, the word "orthography" can be used to refer to an entire system of conventions for writing, including such issues as spelling and punctuation, plus even such trivia as the direction of writing (such as left-to-right or vice versa), the spacing and/or divisions of words, etc. Some may also comprehend the word in this broader sense to include issues of penmanship or calligraphy, that is, the correct method to compose or draw the graphemes (or characters, or symbols, or glyphs, if you like) of the language. Note that, in this sense, conventions of orthography can differ even between cultures that use the same alphabet — even between the style guides used by differing editorial staffs in the same metropolis!

Neither meaning of "orthography" is to be conflated with the meaning of the word "script." The Greek, Latin and Cyrillic alphabets, as well as ancient cuneiform, Egyptian hieroglyphics, Chinese characters (Hanzi, or Kanji in Japan), etc., are all most precisely referred to as scripts, collectively. Perhaps the only alternative to "script" is the much more vague and expansive term, "writing system."

Finally, then, it would seem clear that the weight of the evidence clearly argues for the attribute in question to be called "script" (or "scriptcode" to be verbose), as indeed it is called in ISO 15924. Having thus liberated the word "orthography" from misapplication, we may consider that word a candidate for the element incorrectly labelled "grapheme."

In addition to "orthography," other candidates for the element now called grapheme in the draft spec might include "spelling" or "writing" or the almost comically long-winded "graphic presentation form." Any one of these four terms would be vastly preferable to "grapheme" (which, again, is simply wrong), but each does have certain shortcomings as well. I will briefly list the problems I am aware of.

The term "orthography" is almost ideal, except for its unavoidable connotation of "correctness." That is, were it ever desirable, for whatever reason, to list what may be deemed a "non-standard" written form of a word, then calling that an orthography for the word is misleading. On the other hand, obviously, if the PLS is specifically only intended to associate pronunciation with "correct" spellings (according to somebody's criteria of correctness), then orthography (in its narrower sense, vide supra) would be precisely accurate. An additional bonus is that this word is understood with pretty much the same meaning in other languages such as French and Spanish.

The term "spelling" is by far the more commonly used word by English speakers when referring to how to write out a particular word. Furthermore, it carries no connotation of correctness, since you can easily refer to "alternate spelling" or even "bad spelling." The downside, a minor one, is that the notion of spelling is strongly associated with alphabets; it is not at all clear what spelling means in the context of Chinese writing or similar non-alphabetic systems.

The term "writing" is just vague enough to mean anything you want. Since it applies to every aspect of, well, writing, it could be applied to any aspect of it. Put another way, its upside is its downside.

Finally, as for "graphic presentation form," or something similarly long and comically precise: one is tempted to wonder whether every XML parser out there really can handle sentence-length element names, as well as how many folks have access to an XML editor with a contextual auto-completion feature!

Just to throw out one last (off-the-wall) possibility, consider that the Spanish cognate of the English word orthography is "ortografía", which commonly gets shortened to just "grafía." [See for example, http://www.xtec.es/~faguile1/grafia/]. This suggests that this originally Greek root for "writing" suffices all by itself to communicate the idea we are talking about here. Perhaps an Anglicized (or Anglicised, if you prefer) coining such as "graphy" or a more internationally flavored "graphia" or "graphie" would actually be the least open to misuse and misinterpretation, since, to paraphrase Humpty Dumpty, it would mean just what we chose it to mean.

Resolution: Rejected

We have summarised your comment and identified two requests.

#1. Rename the 'orthography' attribute to 'script' or 'scriptcode' within the grapheme element.

#2. Having liberated the name 'orthography', rename the grapheme element to 'orthography' (preferred), 'spelling', 'writing' or even 'graphia'.

We resolved to remove the 'orthography' attribute from the grapheme element, because we do not see its value and we recognize the benefits of supporting a mixture of script types within a grapheme element (which occurs in Japanese, for example). Consequently, the request to rename the attribute no longer applies. The name 'orthography' is, however, liberated.

The element named 'grapheme' [1] almost always contains a sequence of graphemes. However, a sequence is not required: a single grapheme (the smallest orthographic unit) is the minimum permissible content. The grapheme or sequence of graphemes given in the 'grapheme' element corresponds to the phoneme or sequence of phonemes given in the 'phoneme' element. This is in accordance with the notion of "grapheme-to-phoneme conversion" (or, in layman's terms, letter-to-sound conversion). The name of the element 'grapheme' goes hand-in-hand with the name of the element 'phoneme', which has been borrowed from SSML 1.0 [2] because it has a similar usage.
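To make the grapheme-to-phoneme correspondence concrete, the Python sketch below builds a lookup table from each <grapheme> string to the pronunciations given in its <lexeme>. The function name build_lookup, the 'alias:' prefix used to keep the two kinds of pronunciation apart, and the sample entries are hypothetical; the namespace URI is assumed from the final PLS 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the final PLS 1.0 Recommendation.
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

def build_lookup(lexicon_xml: str) -> dict[str, list[str]]:
    """Map each <grapheme> text to the pronunciations listed in its <lexeme>.
    <alias> text is tagged with a hypothetical 'alias:' prefix."""
    root = ET.fromstring(lexicon_xml)
    lookup: dict[str, list[str]] = {}
    for lexeme in root.iter(PLS_NS + "lexeme"):
        prons = []
        for child in lexeme:
            if child.tag == PLS_NS + "phoneme":
                prons.append(child.text or "")
            elif child.tag == PLS_NS + "alias":
                prons.append("alias:" + (child.text or ""))
        # Every <grapheme> in the lexeme shares the same pronunciations.
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            lookup.setdefault(grapheme.text or "", []).extend(prons)
    return lookup

# Hypothetical lexicon: two spellings share one phoneme string.
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

print(build_lookup(SAMPLE))
```

Here each 'grapheme' entry is a whole written form (a sequence of graphemes) mapped to a phoneme sequence, which is the correspondence the element names are meant to convey.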

Future revisions of PLS may define the pronunciation of orthographic units larger than the grapheme, such as the 'morpheme' or 'affix' (as is common in internal system lexicons). Grapheme, morpheme, affix, locution... are all terms that refer to orthographic units. A generic term such as 'orthography', 'spelling' or 'writing' for this element seems inappropriate at this stage, given that it would probably have to be changed to 'grapheme' in the future. It is thus our opinion that the current name 'grapheme' is the best name for this element.

Please note that our reply is in accordance with two further public comments closely related to yours [3] [4].

[1] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.5
[2] http://www.w3.org/TR/speech-synthesis/#S3.1
[3] http://lists.w3.org/Archives/Public/www-voice/2006AprJun/0107.html
[4] http://lists.w3.org/Archives/Public/www-voice/2006AprJun/0120.html

Email Trail:

Original Comment Mark Alexandre (2006-02-08)
Confirmation of receipt VBWG (2006-04-20)
VBWG official response to last call issue VBWG (2006-07-27)
