Disposition of Comments for SRGS

Covering Messages sent to www-voice during the CR period starting 26 June 2002

scope of root and meaning of public

The candidate recommendation states in Section 4.7:
"The rule declared as the root rule may be scoped as either public or
private."

This is actually a misleading statement because even if the root is
declared private, it always behaves like a public rule (which is
good!):

1. the following sentence of Section 4.7 states that the root is
always activable:
"A rule reference to the root rule of a grammar is legal."
It seems clear this holds even if the rule is private;  this
impression is confirmed by Section 5.4:
"A conforming Grammar Processor (...) must be able to activate the
root, any single public rule, or any set of public rules or roots".

2. the next paragraph of Section 4.7 states that the root is always
activable:
"The root rule may be activated for recognition."

Now the purpose of the scope of a rule is to set two properties:
activation and exportation, which is confirmed by Appendix I:
"- Possible distinction of "activable" and "exported" rules (currently
merged as "public")".

In fact, I think that setting the scope of the root should not be
allowed because it can only be public.

About the point of Appendix I, I regret that the addressed
distinction is not effective right now but I can live with it; after
all, being public can be seen from two perspectives: the grammars and
the engine.

However I am much more surprised by another confusion: being activable
is considered as being both on/off switchable at engine run-time, and
also being the "start symbol" (as defined in [HU79]:
http://www.w3.org/TR/speech-grammar/#ref-hu79).
Section 3.2 actually reads:

"Rules with public scope may be activated for recognition. That is
they may define the top-level syntax of spoken input." ("top-level
syntax" meaning "start symbol" I guess)

Now if you consider the grammar:
$root = I want to be connected to $people [please];
$people = $people_1st_floor | $people_2nd_floor ;

When you want to deactivate $people_2nd_floor, you have to declare it
public, but then, it also becomes a start symbol, which allows for
undesired spoken utterances.

I find this confusion much more a problem than exportation/activation.
This issue is not even addressed in Appendix I.  In the same vein, I
think that the status of the start symbol naturally belongs to the root and 
only to it.

WG Response

Your mail contains two comments. The first one regards
the scope of root rules, the second one the meaning of "activable".

1. Scope of root rules.
The group does not think that the sentence in section 4.7
("The rule declared as the root rule may be scoped as either public or
private.") is misleading.

The spec clearly distinguishes between a reference to the root rule
of a grammar and a reference to a named rule.
According to section 2.2 of the SRGS spec
<ruleref uri="grammarURI"/>
is a reference to a root rule, and
<ruleref uri="grammarURI#rulename"/>
is a reference to a named rule, whether or not the referenced rule is
declared as the root. (Further details are explained in section 2.2.2.)

A reference to a named rule is valid only if the referenced
rule is declared to be public. A reference to a grammar without a
fragment identifier, i.e. a reference to a root rule, is valid even
if the root rule of the referenced grammar is declared to be private.
(See Section 4.7 for details.)

The same reasoning applies to the activation of rules.
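
The distinction between the two kinds of reference can be sketched as a
small check. This is a toy model and not text from the spec: the Grammar
class, its fields, and the function name are hypothetical, and a grammar
is reduced to a root name plus a map from rule name to scope.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urldefrag


@dataclass
class Grammar:
    """Toy stand-in for a parsed grammar (hypothetical structure)."""
    root: Optional[str]      # name of the rule declared as root, if any
    scopes: Dict[str, str]   # rule name -> "public" or "private"


def external_reference_is_valid(uri: str, grammar: Grammar) -> bool:
    """Apply the reading of Section 4.7 given in the response above."""
    _, fragment = urldefrag(uri)
    if not fragment:
        # Reference to the grammar itself, i.e. to its root rule:
        # valid whenever a root is declared, whatever its scope.
        return grammar.root is not None
    # Reference to a named rule: valid only if that rule is public,
    # even when the named rule happens to be the root.
    return grammar.scopes.get(fragment) == "public"
```

Under this model a private root can be reached through the bare grammar
URI but not through a URI that names it with a fragment identifier.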

However, the wording of the second and third paragraphs in Section 3.2
seems to be a little confusing:
 "A rule with 'private' scope is visible only within its containing
  grammar. A private rule may be referenced only by other rules within
  the same grammar.
  One exception is that a rule declared as the root may be referenced
  externally even if it is a private rule. See Section 4.7 for details."
  
This wording will be changed in order to make the difference in semantics
between a reference to a rule and a reference to a grammar clearer. 

2. Meaning of "activable".
Your interpretation of what the spec says about the activation of
rules seems to be not quite correct. Being "activable" is not on/off
switchable at engine run-time. The grammar defines which rules can be
activated. At run-time an application may choose which subset of the
"activable" (i.e. public or root) rules are active for recognition as
stated in section 3.2. If a rule is active then the recognizer may
apply each alternative that is defined for this rule. The same holds
for any (directly or indirectly referenced) rule that the recognizer
encounters on its way down the tree. There is no mechanism to
dynamically restrict a given grammar. Specifically, when the root rule
in your example is activated, both the '$people_1st_floor' and the
'$people_2nd_floor' rules may be used during recognition, even though
these rules are not active by themselves.
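
As a rough illustration of the two notions involved, the sketch below
separates "which rules may be activated" from "which rules the recognizer
may use once a rule is active". It is only a model under stated
assumptions: the function names and the dictionary representation are
hypothetical, and the rules of the example grammar are taken to be
private because the comment declares none of them public.

```python
from typing import Dict, List, Optional, Set


def activable_rules(scopes: Dict[str, str], root: Optional[str]) -> Set[str]:
    """Rules an application may activate: public rules plus the root."""
    rules = {name for name, scope in scopes.items() if scope == "public"}
    if root is not None:
        rules.add(root)
    return rules


def rules_used_when_active(start: str, refs: Dict[str, List[str]]) -> Set[str]:
    """Rules reachable from an active rule by direct or indirect reference."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        rule = stack.pop()
        if rule in seen:
            continue
        seen.add(rule)
        stack.extend(refs.get(rule, []))
    return seen


# The commenter's grammar, with every rule left at private scope.
scopes = {"root": "private", "people": "private",
          "people_1st_floor": "private", "people_2nd_floor": "private"}
refs = {"root": ["people"],
        "people": ["people_1st_floor", "people_2nd_floor"]}
```

Here only the root is activable, yet activating it makes all four rules
available to the recognizer, which matches the explanation above.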

declaration section of ABNF

1. Order of declarations. It was fixed in the previous Draft and now
   (in the candidate recommendation) declarations may appear in any
   order.  Actually I find this freedom harmful for readability: if
   you're used to always finding base, language, mode, root, tagFormat,
   lexicon, and meta, then getting the declarations of a new grammar
   is just easier; and writing a new grammar is just as simple.  From
   a software engineering point of view, it's like parameters for a
   function.

   To me, making the order free helps only beginners as grammar
   writers; it's useless if you're not a beginner and it's confusing
   if you're not a writer but a reader.

2. Tag format.  The candidate recommendation states that, for a given
   grammar, lexicon and meta declarations may appear any number of
   times, that language must appear exactly once, but it doesn't state
   that tag-format can appear at most once (which is stated for base
   and root) and I don't think that several tag-format declarations
   make any sense.
   [this can be solved syntactically: by reintroducing the fixed order
   of declarations and making tag-format optional at its place, as it
   was in the previous version of the document!-)]

   Furthermore, the tag-format should be like the mode: you cannot
   have a grammar in voice mode that uses a rule from a grammar in
   dtmf mode.  Similarly, I don't think you can mix two grammars that
   have different tag formats. Well, you could but it would require
   that further specification of tags is provided in the document.
   Currently, the tag language is defined externally: not only the
   syntax of this language but also its semantics, the kind of
   computed information (character strings, objects, raw data,
   attributes...), the data format, the way information is processed
   along the Logical Parse Structure (Appendix H), i.e. along the
   parse-tree (top-down, bottom-up, depth-first traversal,...), and
   everything I'm forgetting...

   I'd suggest that mixing tag formats be forbidden, like mixing modes.

WG Response

Your mail contains two comments on the ABNF form.
The first one regards the order of declarations, the second one
tag format declarations.

1. Order of declarations.
This issue has already been discussed during the Last Call period of
SRGS. Further details can be found in the disposition of comments
at http://www.w3.org/2002/06/speech-grammar-comments.html,
specifically in the sections on GC05-1
http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-1 
and GC05-2
http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-2

2. Tag Format.
You say the spec "doesn't state that tag-format can appear at most once".
This is wrong. In Section 4.8 the spec says:
"The ABNF header may contain one tag format declaration."
Moreover, the (normative) Appendix D, which formally specifies the syntax
of the ABNF form, is quite precise on this issue.

However, it might be confusing that the spec uses a different wording
in Sections 4.5 - 4.9 to express that a certain declaration must not
appear more than once. This will be changed and the following wording
will be used in these sections:
"The ABNF header must contain zero or one ... declaration.",
e.g. in Section 4.8:
"The ABNF header must contain zero or one tag format declaration."

You are right that SRGS does not impose any restrictions on tag formats
across grammars. The issue of mixing different tag formats will be
dealt with in a separate specification on "Semantic Interpretation".
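
The "zero or one" constraint discussed in point 2 could be checked along
the following lines. This is a minimal sketch, assuming the header has
already been split into declaration keywords. The function name is
hypothetical, and the set of keywords is limited to the three that the
comment names as restricted to one occurrence. The normative list is in
the spec itself.

```python
from collections import Counter
from typing import Iterable, List

# Declarations the comment above names as limited to one occurrence.
# Illustrative only; not a complete or normative list.
AT_MOST_ONCE = {"base", "root", "tag-format"}


def repeated_declarations(keywords: Iterable[str]) -> List[str]:
    """Return, sorted, the at-most-once keywords that occur more than once."""
    counts = Counter(keywords)
    return sorted(k for k in AT_MOST_ONCE if counts[k] > 1)
```

A header with two tag-format declarations would be reported, while
repeated meta or lexicon declarations would not.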

alias attribute of the ruleref element

In reading over the latest candidate version of the SRGS spec for 
clarification on the alias attribute of the ruleref element, I failed 
to see any mention of alias at all. However, it was included in the 
SRGS Working Draft 20 August 2001.

WG Response

Alias names were removed from the spec because the group had the
impression that this feature didn't add any value. In fact, if you
want to have a shorthand name for a uri you can just define a rule
that does nothing more than declare an "alias name", for example:
 
<rule id="myAlias">
    <ruleref uri="http://www.example.com/some-very-long-path#rule"/>
</rule>

If you put this "alias declaration" into your grammar you can replace
each occurrence of
<ruleref uri="http://www.example.com/some-very-long-path#rule"/>
with
<ruleref uri="#myAlias"/>

Whatever syntax you use for an alias declaration, you will always
have to write at least the full uri and its alias name. In the group's
opinion the (very small) overhead caused by the usage of the rule and
the ruleref elements didn't justify the introduction of a separate
element and attribute for aliases.

Metadata containing both " and ' in ABNF format

According to the 06/27/02 SRGS specification, the ABNF production for 'meta' and 'http-equiv' allows either double or single quoted strings for the name and value field.  However, there is no provision for escaping the corresponding quote characters.  Thus, it is not possible to provide strings which contain both double quotes (") and single quotes (') as meta-data.  As the XML format does not impose such a restriction, it is not possible to guarantee a translation from an XML format grammar to an ABNF grammar.

I consider this a problem and suggest adding support for C-style escape sequences in the "DoubleQuotedCharacters" production of the lexical grammar of ABNF.

The same applies to the tag format, even though the alternate delimiter format makes a collision somewhat unlikely.  What was the reason to use an alternate tag delimiter in favor of escape sequences as supported by JSpeech?

WG Response

Your mail contains two comments on the ABNF format. The first one
regards delimiters of 'meta' and 'http-equiv' strings, the second
one tag delimiters.

1. Delimiters of 'meta' and 'http-equiv' strings.
Support for escape sequences was not added to the ABNF format because
this would make the syntax more complex while there seem to be no
important use cases for it.
There are several issues regarding the conversion to and from ABNF Form
and XML Form. Section 1.3 of SRGS
(http://www.w3.org/TR/speech-grammar/#S1.3)
contains a list of these issues and states that ABNF Form and XML Form
are specified to ensure that the two representations are semantically
mappable. This means that the semantic performance of a grammar does
not change when it is converted to and from ABNF and XML.
The fact that quote characters in 'meta' and 'http-equiv' strings
cannot be escaped does not affect the semantic performance.
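
The limitation raised in the comment can be made concrete with a small
sketch of how a converter might choose a delimiter for an ABNF 'meta'
string. The function name is hypothetical, and the logic assumes only
what is stated above: strings may be single or double quoted, and there
is no escape mechanism.

```python
def abnf_quote(value: str) -> str:
    """Wrap value in whichever quote character it does not contain."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # No escape mechanism exists, so neither delimiter can be used.
    raise ValueError("value contains both quote characters")
```

A value containing only one kind of quote converts cleanly. A value
containing both cannot be written as an ABNF string, which is the case
the comment describes.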

2. Tag Delimiters.
This issue has already been discussed during the Last Call period of
SRGS. Further details can be found in the disposition of comments
http://www.w3.org/2002/06/speech-grammar-comments.html,
specifically in the section on GC09-14
http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-14.

ABNF srgs

http://www.ietf.org/internet-drafts/draft-porter-srgs-media-reg-01.txt

I'm not sure that I'm opposed to this I-D, though it is certainly
unusual for a non-XML MIME type to heavily reference RFC 3023.  However,
could you please explain the reasoning behind encoding the same
on-the-wire grammar with two different (but cross-convertible) syntaxes:
ABNF and XML?

It seems like a huge amount of work for little or no gain.  Certainly in
the context of MIME, there is much better experience in transporting XML
documents (and dealing with related i18n and encoding issues) than the
ABNF grammars that this I-D registers.  Plus, to quote RFC 1958,
Architectural Principles of the Internet, Section 3.2:

  "If there are several ways of doing the same thing, choose one."

I assume this has been thoroughly debated but I cannot find the thread
at <http://lists.w3.org/Archives/Public/www-voice/>.  I see at
<http://www.w3.org/TR/voice-intro/#gram>:  "We anticipate that
development tools will be constructed that provide the familiar ABNF
format to developers, and enable XML software to manipulate the XML
grammar format."  I can understand that developers find ABNF easier to
read and write.

However, is it really necessary for the ABNF format to be released into
the wild (i.e., sent over MIME protocols)?  Wouldn't it improve
simplicity and interoperability to say that ABNF MUST first be converted
to XML before transport?  More strongly, wouldn't the document be more
clear, straightforward and interoperable to say that XML is the
on-the-wire syntax for the grammar, and then separately to specify
reversible conversion back and forth between XML and ABNF?

In that case, this I-D would not need to be registered.

WG Response

Developing two forms of grammar for transport was debated
internally and determined to be requisite to address the needs of the
development community in this space.  The best summary of that
discussion is addressed in the "2nd Last Call Disposition of
Comments".

http://www.w3.org/2002/06/speech-grammar-comments.html#GC08-1

Certainly pre-transformation before transport is an option developers
have (as is the use of any source format if it is transformed into
the XML or ABNF form of SRGS before transport), but not an option we
wanted to enforce upon the developer community.  In particular, the
belief is that the XML format is verbose to the point that it impinges
on developers doing hand-authoring for prototyping or even full
application development.  We very much did not want to require use of
transformative tools on the development side for simple
prototyping.

ABNF: "public" does too much

In the Speech Recognition Grammar Specification, "public" means so much that it 
ends up being very confusing.

In ABNF (and its XML counterpart) a rule with public scope is altogether:
- a rule that can be used from an external grammar--let's call this feature 
"export";
- a rule that can be activated and deactivated for speech recognition--let's 
call this "activation";
- a rule that is the top-level syntax of spoken input--let's call it 
"top-level".

The 3 notions are completely different but they are merged into "public" 
which creates problems.

1. top-level is not activation

In http://www.w3.org/TR/speech-grammar/#S3.2 and also in 
http://www.w3.org/TR/speech-grammar/#S4.7, those notions are explicitly 
considered as synonyms, but now look at the following grammar:

root $r;
$r = I want to go to $cities;
$cities = $florida | $california | ...;
public $florida = tampa | miami |...;
public $california = SF | LA | ...;

The only sentences you want to allow start with "I want to go to".
Also you want to activate and deactivate Floridian/Californian cities in 
order to lower perplexity. However you don't want your sentences to be 
limited to bare cities.

This little toy example can have very many different applications in 
telephony or in web browsing (in general you want the activation to apply 
only to a sub-sentence that appears in a carrier phrase, not to the whole 
sentence). It shows there are rules that may be activated while they should 
not be at top-level.

Confusing top-level and activation can be a handicap for some applications.

2. top-level is not export

It is not because you want a rule to be exported that you want it to be at 
top-level as well.
For example say you have a sub-grammar of numbers that you use in several 
applications: it's very unlikely that you wish numbers to be uttered 
alone--outside any carrier phrase.

As a consequence when you import a rule from a grammar G1 into a grammar G2, 
you have at top-level: all sentences at top-level of G2 plus the rule(s) you 
import from G1. This looks very awkward to me.

3. export is not activation

This issue is touched in http://www.w3.org/TR/speech-grammar/#AppI as a 
"Consideration for Future Versions". I fully agree but I consider this less 
of a problem than both the previous ones.

WG Response

As you mentioned, the SRGS spec does not distinguish between the
notions "top-level" and "activation". Therefore, "public" has just
the following two meanings:
- A public rule may be referenced by other grammars;
- A public rule may be activated for recognition (as top-level syntax
  of spoken input)

A possible distinction of these two meanings is one of the "Features
under Consideration for Future Versions" listed in Appendix I of the
SRGS spec.

link metadata cannot override server media type

This spec can't override the URI/HTTP specs this way...

"A URI reference may be accompanied by a media type that indicates the
content type of the resource identified by the URI. When specified, this
type value takes precedence over other possible sources of the media
type (for instance, the "Content-type" field in an HTTP exchange, or the
file extension)."
  -- http://www.w3.org/TR/2002/CR-speech-grammar-20020626/#S2.2.2

I suggest you fix it to work like in other specs...

"This attribute gives an advisory hint as to the content type of the
content available at the link target address."
 -- http://www.w3.org/TR/html401/struct/links.html#adef-type-A

Please add a test case to clarify how this works and let me know when
it's available.

WG Response

This issue has been discussed many times within the group. It is
explicitly discussed in the Last Call Disposition of Comments -
http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20 - where
we took into account the W3C Director's requested modification. Since
the SMIL 2.0 Recommendation uses 'type' in a similar way, we believe
there are other specs which set the precedence and, at this stage, no
'fix' is required. However, if further evidence comes to light, please
let us know.

The SRGS testsuite already contains tests for this feature.

Follow-up

On Fri, 2003-02-07 at 07:07, Scott McGlashan wrote:
> Hi Dan,
> 
> Thank you for your public comments on SRGS 1.0. 
> 
> This issue has been discussed many times within the group. It is
> explicitly discussed in the Last Call Disposition of Comments -
> http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20 - where
> we took into account the W3C Director's requested modification. Since
> the SMIL 2.0 Recommendation uses 'type' in a similar way, we believe
> there are other specs which set the precedence and, at this stage, no
> 'fix' is required.

I disagree; I don't find this a satisfactory justification for
declining my request. In fact, I don't see any technical
justification for the way the spec is at all.

> However, if further evidence comes to light, please
> let us know.

No, the burden is on you to (attempt to) satisfy me.

"5.2.4 Proposed Recommendation (PR)

Entrance criteria. Before advancing a technical report to Proposed
Recommendation, the Director must be satisfied that:

[...]
   2. the Working Group has formally addressed issues raised during the
previous review or implementation period (possibly modifying the
technical report)"
 -- http://www.w3.org/Consortium/Process-20010719/tr.html#RecsCR


If you aren't interested in negotiating further based
on the information I sent, be sure to note this as
outstanding dissent when you request Proposed
Rec status.


> The SRGS testsuite already contains tests for this feature.

Pointer, please?

Follow-up

From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003

HTML and SMIL are in clear conflict on their use of the type
attribute.  Other specifications do not make a clear statement either
way.  I have not seen a clear statement from the TAG yet.  I have seen
substantial email threads debating this issue in different working
groups without clear consensus.

As is documented in the comments, we did work to address this question
with Martin.  The working group did choose to follow the language and
use from SMIL for the reason that practically speaking not all web
servers return the right MIME type for the content.  If you are not
satisfied with the details provided in the response, we would
certainly be happy to discuss it further.

I personally would welcome the TAG addressing this issue and I would
be very willing to participate in such a discussion.

Follow-up

From: Chris Lilley <chris@w3.org> Date: 7 Feb 2003

BP> Hopefully you didn't intend your comments to sound as inflammatory
BP> as they might be interpreted.

I am sure Dan did not intend to be inflammatory, any more than the
initial response intended to be dismissive.

BP> HTML and SMIL are in clear conflict on their use of the type attribute.

Further, SMIL is in conflict with itself on the type attribute,
depending on what element it is used and what the transport protocol
is.

SVG also uses a type attribute, as an informative hint and as a way to
allow client-side selection from available media.

BP>  Other specifications do not make a clear statement either way.

They do, in fact.

BP> I have not seen a clear statement from the TAG yet.

No, but you will and I hope you will take part in the preceding
discussion.

Dan's statement was a first heads up, as a matter of courtesy, that the
TAG has an open issue on this subject.

BP> I have seen
BP> substantial email threads debating this issue in different working
BP> groups without clear consensus.

I would appreciate pointers to such, particularly those that
considered retyping was desirable.

BP> As is documented in the comments, we did work to address this
BP> question with Martin. The working group did choose to follow the
BP> language and use from SMIL for the reason that practically
BP> speaking not all web servers return the right MIME type for the
BP> content.

Aha. We suspected that might be the reason. The problem is that this
transparent fixup (and sniffing in general) has a number of
undesirable knock-on effects.

BP> If you are not satisfied with the details provided in the
BP> response, we would certainly be happy to discuss it further.

I would encourage you to do this.

BP> I personally would welcome the TAG addressing this issue and I
BP> would be very willing to participate in such a discussion.

Thanks, this is appreciated.

Follow-up

From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003

By the way, just to be clear, I personally completely agree with the
architectural impurity of having local type information take precedence.
Though I also completely agree with the fact that unregistered mime types exist and
web server configurations are not always correct.  Which is why I'm very keen
to have a definitive statement and a practical plan to make the choice stick,
as when it comes to a choice of being conformant or working with a wider range
of content providers, business motivations can prevail.

(comments and links embedded below)

--Brad


Chris Lilley wrote:

> On Friday, February 7, 2003, 8:59:40 PM, Brad wrote:
>
> BP> Dan,
>
> BP> Hopefully you didn't intend your comments to sound as inflammatory
> BP> as they might be interpreted.
>
> I am sure Dan did not intend to be inflammatory, any more than the
> initial response intended to be dismissive.
>
> BP> HTML and SMIL are in clear conflict on their use of the type attribute.
>
> Further, SMIL is in conflict with itself on the type attribute,
> depending on what element it is used and what the transport protocol
> is.
>
> SVG also uses a type attribute, as an informative hint and as a way to
> allow client-side selection from available media.
>
> BP>  Other specifications do not make a clear statement either way.
>
> They do, in fact.

In SMIL, local value takes precedence.  In HTML 4.01, type is a hint.  In XHTML
2.0 type is a definition of the allowable mime types for that resource
(http://www.w3.org/TR/xhtml2/mod-attribute-collections.html#adef_attribute-collections_type).

I find the statement of precedence ambiguous here as even though the type is
"advisory" it doesn't specify the precedence.
http://www.w3.org/TR/xmldsig-core/#sec-o-SignatureProperty

> BP> I have not seen a clear statement from the TAG yet.
>
> No, but you will and I hope you will take part in the preceding
> discussion.
>
> Dan's statement was a first heads up, as a matter of courtesy, that the
> TAG has an open issue on this subject.
>
> BP> I have seen
> BP> substantial email threads debating this issue in different working
> BP> groups without clear consensus.
>
> I would appreciate pointers to such, particularly those that
> considered retyping was desirable.

Here are some threads with relevant discussion that I found quickly.  There may
be more:

http://lists.w3.org/Archives/Public/www-html/1999Aug/0035.html
http://lists.w3.org/Archives/Public/www-html/1998Jan/0076.html
http://lists.w3.org/Archives/Public/www-html/1999May/0011.html
http://lists.w3.org/Archives/Public/www-html/2002Aug/0346.html
http://lists.w3.org/Archives/Public/w3c-wai-er-ig/2000Jan/0123.html

Follow-up

From: Dan Connolly <connolly@w3.org> Date: 07 Feb 2003

On Fri, 2003-02-07 at 11:59, Brad Porter wrote:
> Dan,
> 
> Hopefully you didn't intend your comments to sound as inflammatory as they might be interpreted.

Hmm... yes, sorry to be curt. I'm typing with one good hand; broke my
finger and had surgery earlier this week...

> HTML and SMIL are in clear conflict on their use of the type attribute.

Yes; HTML is right and SMIL is wrong. 1/2 ;-)

Formats can't override protocols that carry them (nor can protocols
screw with data formats).

Dunno how SMIL got to be that way. I don't think it justifies
the SRGS design, though reasonable people may disagree...

>  Other specifications do not make a clear statement either way.  I have not seen a
> clear statement from the TAG yet.

I think maybe the TAG may look at this, but regardless, my
comment stands.

>  I have seen substantial email threads debating this issue in different working groups without clear consensus.

OK, but where we are right now is me asking you to change your spec or
tell me why not, in technical terms.


> As is documented in the comments, we did work to address this question with Martin.  The working group did choose to follow the language and use from SMIL
> for the reason that practically speaking not all web servers return the right MIME type for the content.

I see.

Well, that is a technical justification; I appreciate that, but...

>  If you are not satisfied with the details provided
> in the response,

Looking over
  http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20

in detail, no I'm not satisfied. I still think the spec
should change.

> we would certainly be happy to discuss it further.

I'd like to study the test case. I'd appreciate
a pointer, though perhaps I can find it myself.

> 
> I personally would welcome the TAG addressing this issue and I would be very willing to participate in such a discussion.

Meanwhile, as I said, my request/comment stands.

WG Response

Following several teleconferences

This makes several minor changes such as adding a dash to 'media type'
and two editorial changes motivated by responses from W3C TAG members
and questions during the 7 July 2003 call.

OLD:  Upon delivery, the resource indicated by a URI reference
       may be considered in terms of two types.

NEW:  The resource representation delivered by dereferencing the
       URI reference may be considered in terms of two types.

----

OLD:  Whenever a type is returned, it is treated as authoritative.
       The declared media type is determined by the value returned
       by the resource owner or, if none is returned, by the
       preferred media type given in the grammar.

NEW:  The declared media-type is the value returned by the
       resource owner or, if none is returned, the preferred media
       type given in the grammar.  There may be no declared
       media-type if the resource owner does not return a value and
       no preferred type is specified.  Whenever specified, the
       declared media-type is authoritative.

This draft closes all known issues and is hopefully final.
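
The NEW wording for the declared media-type can be read as a simple
fallback rule. The sketch below is one reading of that wording and not
an implementation taken from the spec: the function and parameter names
are hypothetical, and None stands for "no value".

```python
from typing import Optional


def declared_media_type(returned: Optional[str],
                        preferred: Optional[str]) -> Optional[str]:
    """Declared media-type as described by the NEW wording above.

    returned  -- type returned by the resource owner, or None
    preferred -- preferred type given in the grammar, or None
    """
    if returned is not None:
        # A value returned by the resource owner always wins.
        return returned
    # Otherwise fall back to the grammar's preferred type, which may
    # itself be absent, leaving no declared media-type at all.
    return preferred
```

Under this reading the type given in the grammar is consulted only when
the resource owner returns nothing.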

necessity of the <token> tag

I just wondered why there is the <token> tag in the SRGS
specification.  Where is the difference to include PCDATA in <item>
.. </item>?

WG Response

The difference is that the content of the
<token> element is treated as a single token,
e.g.:
<token>San Francisco</token> is a single token, whereas
<item>San Francisco</item> is a sequence of two tokens.
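
The difference can be illustrated with a deliberately simplified token
counter. It is a sketch under stated assumptions: the function name is
hypothetical, the fragment is a single element with plain text content
and no namespace, and tokens inside an item are taken to be separated by
white space only. Quoted tokens and nested elements are ignored.

```python
import xml.etree.ElementTree as ET


def count_tokens(fragment: str) -> int:
    """Count tokens in a single <token> or <item> element (simplified)."""
    element = ET.fromstring(fragment)
    text = (element.text or "").strip()
    if element.tag == "token":
        # The whole content is one token, whatever white space it contains.
        return 1 if text else 0
    # Plain text in <item>: white space separates tokens.
    return len(text.split())
```

With this model the token element containing "San Francisco" counts as
one token and the item element with the same text counts as two.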