Disposition of Comments for SRGS

Covering Messages sent to www-voice during the CR period starting 26 June 2002

scope of root and meaning of public

The candidate recommendation states in Section 4.7:
"The rule declared as the root rule may be scoped as either public or
private."

This is actually a misleading statement because even if the root is
declared private, it always behaves like a public rule (which is
good!):

1. the following sentence of Section 4.7 states that the root is
always activable:
"A rule reference to the root rule of a grammar is legal."
It seems clear this holds even if the rule is private;  this
impression is confirmed by Section 5.4:
"A conforming Grammar Processor (...) must be able to activate the
root, any single public rule, or any set of public rules or roots".

2. the next paragraph of Section 4.7 states that the root is always
activable:
"The root rule may be activated for recognition."

Now the purpose of the scope of a rule is to set two properties:
activation and exportation, which is confirmed by Appendix I:
"- Possible distinction of "activable" and "exported" rules
(currently merged as "public")".

In fact, I think that setting the scope of the root should not be
allowed because it can only be public.

About the point of Appendix I, I regret that the addressed
distinction is not effective right now but I can live with it; after
all, being public can be seen from two perspectives: the grammars and
the engine.

However I am much more surprised by another confusion: being activable
is considered as being both on/off switchable at engine run-time, and
also being the "start symbol" (as defined in [HU79]:
http://www.w3.org/TR/speech-grammar/#ref-hu79).
Section 3.2 actually reads:

"Rules with public scope may be activated for recognition. That is
they may define the top-level syntax of spoken input." ("top-level
syntax" meaning "start symbol" I guess)

Now if you consider the grammar:
$root = I want to be connected to $people [please];
$people = $people_1st_floor | $people_2nd_floor ;

When you want to deactivate $people_2nd_floor, you have to declare it
public, but then, it also becomes a start symbol, which allows for
undesired spoken utterances.

I find this confusion much more a problem than exportation/activation.
This issue is not even addressed in Appendix I.  In the same vein, I
think that the status of the start symbol naturally belongs to the
root and only to it.

WG Response

Your mail contains two comments. The first one regards
the scope of root rules, the second one the meaning of "activable".

1. Scope of root rules.
The group does not think that the sentence in section 4.7
("The rule declared as the root rule may be scoped as either public or
private.") is misleading.

The spec clearly distinguishes between a reference to the root rule
of a grammar and a reference to a named rule.
According to section 2.2 of the SRGS spec
<ruleref uri="grammarURI"/>
is a reference to a root rule, and
<ruleref uri="grammarURI#rulename"/>
is a reference to a named rule, whether or not the referenced rule is
declared as the root. (Further details are explained in section 2.2.2.)

A reference to a named rule is valid only if the referenced
rule is declared to be public. A reference to a grammar without a
fragment identifier, i.e. a reference to a root rule, is valid even
if the root rule of the referenced grammar is declared to be private.
(See Section 4.7 for details.)

The same reasoning applies to the activation of rules.
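
The distinction between the two kinds of reference can be sketched as a
small check. This is only an illustration of the rule described above,
not text from the spec: the Rule and Grammar classes and the function
name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Rule:
    name: str
    scope: str = "private"  # "public" or "private"


@dataclass
class Grammar:
    root: Optional[str]  # name of the declared root rule, if any
    rules: Dict[str, Rule] = field(default_factory=dict)


def external_reference_is_valid(grammar: Grammar, fragment: Optional[str]) -> bool:
    """Check a reference made from another grammar.

    fragment is None for <ruleref uri="grammarURI"/> (reference to the root)
    and a rule name for <ruleref uri="grammarURI#rulename"/>.
    """
    if fragment is None:
        # Reference to the grammar itself: valid whenever a root is declared,
        # regardless of the scope of the root rule.
        return grammar.root is not None and grammar.root in grammar.rules
    rule = grammar.rules.get(fragment)
    # Named reference: the rule must exist and be public, even if it is the root.
    return rule is not None and rule.scope == "public"


# Hypothetical grammar with a private root and one public rule.
g = Grammar(root="main", rules={"main": Rule("main", "private"),
                                "city": Rule("city", "public")})
```

With this sketch, a reference without a fragment succeeds even though
"main" is private, while a named reference to "main" does not.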

However, the wording of the second and third paragraphs in Section 3.2
seems to be a little confusing:
 "A rule with 'private' scope is visible only within its containing
  grammar. A private rule may be referenced only by other rules within
  the same grammar.
  One exception is that a rule declared as the root may be referenced
  externally even if it is a private rule. See Section 4.7 for details."
  
This wording will be changed in order to make the difference in semantics
between a reference to a rule and a reference to a grammar clearer. 

2. Meaning of "activable".
Your interpretation of what the spec says about the activation of
rules seems to be not quite correct. Being "activable" is not on/off
switchable at engine run-time. The grammar defines which rules can be
activated. At run-time an application may choose which subset of the
"activable" (i.e. public or root) rules are active for recognition as
stated in section 3.2. If a rule is active then the recognizer may
apply each alternative that is defined for this rule. The same holds
for any (directly or indirectly referenced) rule that the recognizer
encounters on its way down the tree. There is no mechanism to
dynamically restrict a given grammar. Specifically, when the root rule
in your example is activated, both the '$people_1st_floor' and the
'$people_2nd_floor' rules may be used during recognition, even though
these rules are not active by themselves.
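
As a rough sketch of this point, the set of rules a recognizer may use
once a rule is activated is simply everything reachable from it. The
dictionary below is a hypothetical encoding of the example grammar, and
the function name is illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical representation: each rule maps to the rules it references.
REFERENCES: Dict[str, List[str]] = {
    "root": ["people"],
    "people": ["people_1st_floor", "people_2nd_floor"],
    "people_1st_floor": [],
    "people_2nd_floor": [],
}


def rules_used_when_active(start: str, references: Dict[str, List[str]]) -> Set[str]:
    """Return every rule reachable (directly or indirectly) from an active rule."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Follow references regardless of the scope of the referenced rule.
        stack.extend(references.get(name, []))
    return seen
```

Activating "root" reaches both floor rules; neither has to be public
or separately active for that.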

declaration section of ABNF

1. Order of declarations. It was fixed in the previous Draft and now
   (in the candidate recommendation) declarations may appear in any
   order.  Actually I find this freedom harmful for readability: if
   you're used to always finding base, language, mode, root, tagFormat,
   lexicon, and meta, then getting the declarations of a new grammar
   is just easier; and writing a new grammar is just as simple.  From
   a software engineering point of view, it's like parameters for a
   function.

   To me, making the order free helps only beginners as grammar
   writers; it's useless if you're not a beginner and it's confusing
   if you're not a writer but a reader.

2. Tag format.  The candidate recommendation states that, for a given
   grammar, lexicon and meta declarations may appear any number of
   times, that language must appear exactly once, but it doesn't state
   that tag-format can appear at most once (which is stated for base
   and root) and I don't think that several tag-format declarations
   make any sense.
   [this can be solved syntactically: by reintroducing the fixed order
   of declarations and making tag-format optional at its place, as it
   was in the previous version of the document!-)]

   Furthermore, the tag-format should be like the mode: you cannot
   have a grammar in voice mode that uses a rule from a grammar in
   dtmf mode.  Similarly, I don't think you can mix two grammars that
   have different tag formats. Well, you could but it would require
   that further specification of tags is provided in the document.
   Currently, the tag language is defined externally: not only the
   syntax of this language but also its semantics, the kind of
   computed information (character strings, objects, raw data,
   attributes...), the data format, the way information is processed
   along the Logical Parse Structure (Appendix H), i.e. along the
   parse-tree (top-down, bottom-up, depth-first traversal,...), and
   everything I'm forgetting...

   I'd suggest mixing tag formats is forbidden like mixing modes.

WG Response

Your mail contains two comments on the ABNF form.
The first one regards the order of declarations, the second one
tag format declarations.

1. Order of declarations.
This issue has already been discussed during the Last Call period of
SRGS. Further details can be found in the disposition of comments
at http://www.w3.org/2002/06/speech-grammar-comments.html,
specifically in the sections on GC05-1
http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-1 
and GC05-2
http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-2

2. Tag Format.
You say the spec "doesn't state that tag-format can appear at most once".
This is wrong. In Section 4.8 the spec says:
"The ABNF header may contain one tag format declaration."
Moreover, the (normative) Appendix D, which formally specifies the syntax
of the ABNF form, is quite precise on this issue.

However, it might be confusing that the spec uses a different wording
in Sections 4.5 - 4.9 to express that a certain declaration must not
appear more than once. This will be changed and the following wording
will be used in these sections:
"The ABNF header must contain zero or one ... declaration.",
e.g. in Section 4.8:
"The ABNF header must contain zero or one tag format declaration."

You are right that SRGS does not impose any restrictions on tag formats
across grammars. The issue of mixing different tag formats will be
dealt with in a separate specification on "Semantic Interpretation".
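
The "zero or one" constraint discussed above can be sketched as a simple
count over the declarations found in a header. The set of declaration
kinds below is an assumption taken from the comment (base, root,
tag-format), not an exhaustive list from the spec, and the function name
is hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Assumption: declaration kinds that may appear at most once, as named in
# the comment above. Other kinds (e.g. meta, lexicon) may repeat.
AT_MOST_ONCE = {"base", "root", "tag-format"}


def repeated_declarations(kinds: Iterable[str]) -> List[str]:
    """Return the at-most-once declaration kinds that occur more than once."""
    counts = Counter(kinds)
    return sorted(kind for kind in AT_MOST_ONCE if counts[kind] > 1)
```

A header listing tag-format twice would be reported, while repeated meta
declarations would not.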

alias attribute of the ruleref element

In reading over the latest candidate version of the SRGS spec for 
clarification on the alias attribute of the ruleref element, I failed 
to see any mention of alias at all. However, it was included in the 
SRGS Working Draft 20 August 2001.

WG Response

Alias names were removed from the spec because the group had the
impression that this feature didn't add any value. In fact, if you
want to have a shorthand name for a uri you can just define a rule
that does nothing more than declaring an "alias name", for example:
 
<rule id="myAlias">
    <ruleref uri="http://www.example.com/some-very-long-path#rule"/>
</rule>

If you put this "alias declaration" into your grammar you can replace
each occurrence of
<ruleref uri="http://www.example.com/some-very-long-path#rule"/>
with
<ruleref uri="#myAlias"/>

Whatever syntax you use for an alias declaration, you will always
have to write at least the full uri and its alias name. In the
group's opinion the (very small) overhead caused by the usage of
the rule and the ruleref elements didn't justify the introduction
of a separate element and attribute for aliases.

Metadata containing both " and ' in ABNF format

According to the 06/27/02 SRGS specification, the ABNF production
for 'meta' and 'http-equiv' allows either double or single quoted
strings for the name and value field.  However, there is no
provision for escaping the corresponding quote characters. Thus, it
is not possible to provide strings which contain both double quotes
(") and single quotes (') as meta-data.  As the XML format does not
impose such a restriction, it is not possible to guarantee a
translation from an XML format grammar to an ABNF grammar.  

I consider this a problem and suggest adding support for C-style
escape sequences in the "DoubleQuotedCharacters" production of the
lexical grammar of ABNF.

The same applies to the tag format, even though the alternate
delimiter format makes a collision somewhat unlikely.  What was
the reason to use an alternate tag delimiter in favor of escape
sequences as supported by JSpeech?

WG Response

Your mail contains two comments on the ABNF format. The first one
regards delimiters of 'meta' and 'http-equiv' strings, the second
one tag delimiters.

1. Delimiters of 'meta' and 'http-equiv' strings.
Support for escape sequences was not added to the ABNF format because
this would make the syntax more complex while there seem to be no
important use cases for it.
There are several issues regarding the conversion to and from ABNF Form
and XML Form. Section 1.3 of SRGS
(http://www.w3.org/TR/speech-grammar/#S1.3)
contains a list of these issues and states that ABNF Form and XML Form
are specified to ensure that the two representations are semantically
mappable. This means that the semantic performance of a grammar does
not change when it is converted to and from ABNF and XML.
The fact that quote characters in 'meta' and 'http-equiv' strings
cannot be escaped does not affect the semantic performance.
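
The limitation under discussion can be illustrated with a small helper
that picks a delimiter for a 'meta' or 'http-equiv' string. It assumes,
as described in the comment, that quoted strings in the ABNF form have no
escape sequences; the function name is hypothetical.

```python
from typing import Optional


def quote_for_abnf(value: str) -> Optional[str]:
    """Wrap value in double or single quotes, or return None if neither works.

    Assumes there is no way to escape a quote character inside the string.
    """
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # The value contains both kinds of quote, so no delimiter is safe.
    return None
```

Only values containing both quote characters fall through to None; these
are the strings that cannot be carried over from the XML form.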

2. Tag Delimiters.
This issue has already been discussed during the Last Call period of
SRGS. Further details can be found in the disposition of comments
http://www.w3.org/2002/06/speech-grammar-comments.html,
specifically in the section on GC09-14
http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-14.

ABNF srgs

http://www.ietf.org/internet-drafts/draft-porter-srgs-media-reg-01.txt

I'm not sure that I'm opposed to this I-D, though it is certainly
unusual for a non-XML MIME type to heavily reference RFC 3023.  However,
could you please explain the reasoning behind encoding the same
on-the-wire grammar with two different (but cross-convertible) syntaxes:
ABNF and XML?

It seems like a huge amount of work for little or no gain.  Certainly in
the context of MIME, there is much better experience in transporting XML
documents (and dealing with related i18n and encoding issues) than the
ABNF grammars that this I-D registers.  Plus, to quote RFC 1958,
Architectural Principles of the Internet, Section 3.2:

  "If there are several ways of doing the same thing, choose one."

I assume this has been thoroughly debated but I cannot find the thread
at <http://lists.w3.org/Archives/Public/www-voice/>.  I see at
<http://www.w3.org/TR/voice-intro/#gram>:  "We anticipate that
development tools will be constructed that provide the familiar ABNF
format to developers, and enable XML software to manipulate the XML
grammar format."  I can understand that developers find ABNF easier to
read and write.

However, is it really necessary for the ABNF format to be released into
the wild (i.e., sent over MIME protocols)?  Wouldn't it improve
simplicity and interoperability to say that ABNF MUST first be converted
to XML before transport?  More strongly, wouldn't the document be more
clear, straightforward and interoperable to say that XML is the
on-the-wire syntax for the grammar, and then separately to specify
reversible conversion back and forth between XML and ABNF?

In that case, this I-D would not need to be registered.

WG Response

Developing two forms of grammar for transport was debated
internally and determined to be requisite to address the needs of the
development community in this space.  The best summary of that
discussion is addressed in the "2nd Last Call Disposition of
Comments".

http://www.w3.org/2002/06/speech-grammar-comments.html#GC08-1

Certainly pre-transformation before transport is an option developers
have (as is the use of any source format if it is transformed in to
the XML or ABNF form of SRGS before transport), but not an option we
wanted to enforce upon the developer community.  In particular, the
belief is that the XML format is verbose to the point that it impinges
on developers doing hand-authoring for prototyping or even full
application development.  We very much did not want to require use of
transformative tools on the development side for simple
prototyping.

ABNF: "public" does too much

In Speech Recognition Grammar Specification, "public" means
so much that it  ends up with being very confusing.

In ABNF (and its XML counterpart) a rule with public scope is
altogether:

- a rule that can be used from an external grammar--let's call
  this feature "export";
- a rule that can be activated and deactivated for speech
  recognition--let's call this "activation";
- a rule that is the top-level syntax of spoken input--let's call
  it "top-level".

The 3 notions are completely different but they are merged
into "public" which creates problems.

1. top-level is not activation

In http://www.w3.org/TR/speech-grammar/#S3.2 and also in 
http://www.w3.org/TR/speech-grammar/#S4.7, those notions are
explicitly considered as synonyms, but now look at the
following grammar:

root $r;
$r = I want to go to $cities;
$cities = $florida | $california | ...;
public $florida = tampa | miami |...;
public $california = SF | LA | ...;

The only sentences you want to allow start with "I want to
go to". Also you want to activate and deactivate Floridian /
Californian cities in order to lower perplexity. However you
don't want your sentences to be limited to bare cities.

This little toy example can have very many different
applications in telephony or in web browsing (in general
you want the activation to apply only to a sub-sentence that
appears in a carrier phrase, not to the whole sentence).
It shows there are rules that may be activated while they
should not be at top-level.

Confusing top-level and activation can be a handicap for
some applications.

2. top-level is not export

It is not because you want a rule to be exported that you
want it to be at top-level as well.

For example say you have a sub-grammar of numbers that you
use in several applications: it's very unlikely that you
wish numbers to be uttered alone--outside any carrier phrase.

As a consequence when you import a rule from a grammar G1
into a grammar G2, you have at top-level: all sentences at
top-level of G2 plus the rule(s) you import from G1.
This looks very awkward to me.

3. export is not activation

This issue is touched in http://www.w3.org/TR/speech-grammar/#AppI
as a "Consideration for Future Versions". I fully agree but I
consider this less of a problem than both the previous ones.

WG Response

As you mentioned, the SRGS spec does not distinguish between the
notions "top-level" and "activation". Therefore, "public" has just
the following two meanings:
- A public rule may be referenced by other grammars;
- A public rule may be activated for recognition (as top-level
  syntax of spoken input)

A possible distinction of these two meanings is one of the "Features
under Consideration for Future Versions" listed in Appendix I of the
SRGS spec.

This spec can't override the URI/HTTP specs this way...

"A URI reference may be accompanied by a media type that
indicates the content type of the resource identified by
the URI. When specified, this type value takes precedence
over other possible sources of the media type (for instance,
the "Content-type" field in an HTTP exchange, or the 
file extension)."
  -- http://www.w3.org/TR/2002/CR-speech-grammar-20020626/#S2.2.2

I suggest you fix it to work like in other specs...

"This attribute gives an advisory hint as to the content
type of the content available at the link target address."
 -- http://www.w3.org/TR/html401/struct/links.html#adef-type-A

Please add a test case to clarify how this works and let me
know when it's available.

WG Response

This issue has been discussed many times within the group. It is
explicitly discussed in the Last Call Disposition of Comments -
http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20 -
where we took into account the W3C Director's requested modification.
Since the SMIL 2.0 Recommendation uses 'type' in a similar way, we
believe there are other specs which set the precedence and, at this
stage, no 'fix' is required. However, if further evidence comes to
light, please let us know.

The SRGS testsuite already contains tests for this feature.

Follow-up

On Fri, 2003-02-07 at 07:07, Scott McGlashan wrote:
> Hi Dan,
> 
> Thank you for your public comments on SRGS 1.0. 
> 
> This issue has been discussed many times within the group.
> It is explicitly discussed in the Last Call Disposition of
> Comments -
> http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20
> - where we took into account the W3C Director's requested
> modification. Since the SMIL 2.0 Recommendation uses 'type'
> in a similar way, we believe there are other specs which set
> the precedence and, at this stage, no 'fix' is required.

I disagree; I don't find this a satisfactory justification for
declining my request. In fact, I don't see any technical
justification for the way the spec is at all.

> However, if further evidence comes to light, please
> let us know.

No, the burden is on you to (attempt to) satisfy me.

"5.2.4 Proposed Recommendation (PR)

Entrance criteria. Before advancing a technical report to Proposed
Recommendation, the Director must be satisfied that:

[...]
   2. the Working Group has formally addressed issues raised during
the previous review or implementation period (possibly modifying the
technical report)"
 -- http://www.w3.org/Consortium/Process-20010719/tr.html#RecsCR


If you aren't interested in negotiating further based
on the information I sent, be sure to note this as
outstanding dissent when you request Proposed
Rec status.


> The SRGS testsuite already contains tests for this feature.

Pointer, please?

Follow-up

From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003

HTML and SMIL are in clear conflict on their use of the type
attribute.  Other specifications do not make a clear statement either
way.  I have not seen a clear statement from the TAG yet.  I have seen
substantial email threads debating this issue in different working
groups without clear consensus.

As is documented in the comments, we did work to address this question
with Martin.  The working group did choose to follow the language and
use from SMIL for the reason that practically speaking not all web
servers return the right MIME type for the content.  If you are not
satisfied with the details provided in the response, we would
certainly be happy to discuss it further.

I personally would welcome the TAG addressing this issue and I would
be very willing to participate in such a discussion.

Follow-up

From: Chris Lilley <chris@w3.org> Date: 7 Feb 2003

BP> Hopefully you didn't intend your comments to sound as inflammatory
BP> as they might be interpreted.

I am sure Dan did not intend to be inflammatory, any more than the
initial response intended to be dismissive.

BP> HTML and SMIL are in clear conflict on their use of the type
BP> attribute.

Further, SMIL is in conflict with itself on the type attribute,
depending on which element it is used on and what the transport
protocol is.

SVG also uses a type attribute, as an informative hint and as a way to
allow client-side selection from available media.

BP>  Other specifications do not make a clear statement either way.

They do, in fact.

BP> I have not seen a clear statement from the TAG yet.

No, but you will and I hope you will take part in the preceding
discussion.

Dan's statement was a first heads up, as a matter of courtesy, that the
TAG has an open issue on this subject.

BP> I have seen
BP> substantial email threads debating this issue in different
BP> working groups without clear consensus.

I would appreciate pointers to such, particularly those that
considered retyping was desirable.

BP> As is documented in the comments, we did work to address this
BP> question with Martin. The working group did choose to follow the
BP> language and use from SMIL for the reason that practically
BP> speaking not all web servers return the right MIME type for the
BP> content.

Aha. We suspected that might be the reason. The problem is that this
transparent fixup (and sniffing in general) has a number of
undesirable knock-on effects.

BP> If you are not satisfied with the details provided in the
BP> response, we would certainly be happy to discuss it further.

I would encourage you to do this.

BP> I personally would welcome the TAG addressing this issue and I
BP> would be very willing to participate in such a discussion.

Thanks, this is appreciated.

Follow-up

From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003

By the way, just to be clear, I personally completely agree with
the architectural impurity of having local type information take
precedence. Though I also completely agree with the fact that
unregistered mime types exist and web server configurations are not
always correct.  Which is why I'm very keen to have a definitive
statement and a practical plan to make the choice stick, as when it
comes to a choice of being conformant or working with a wider range
of content providers, business motivations can prevail.

(comments and links embedded below)

--Brad


Chris Lilley wrote:

> On Friday, February 7, 2003, 8:59:40 PM, Brad wrote:
>
> BP> Dan,
>
> BP> Hopefully you didn't intend your comments to sound as
> BP> inflammatory as they might be interpreted.
>
> I am sure Dan did not intend to be inflammatory, any more than the
> initial response intended to be dismissive.
>
> BP> HTML and SMIL are in clear conflict on their use of
> the type attribute.
>
> Further, SMIL is in conflict with itself on the type attribute,
> depending on which element it is used on and what the transport
> protocol is.
>
> SVG also uses a type attribute, as an informative hint and as
> a way to allow client-side selection from available media.
>
> BP>  Other specifications do not make a clear statement
> either way.
>
> They do, in fact.

In SMIL, local value takes precedence.  In HTML 4.01, type is a hint.
In XHTML 2.0 type is a definition of the allowable mime types for
that resource, see XHTML2 mod attribute collections type.

I find the statement of precedence ambiguous here as even though
the type is "advisory" it doesn't specify the precedence.
http://www.w3.org/TR/xmldsig-core/#sec-o-SignatureProperty

> BP> I have not seen a clear statement from the TAG yet.
>
> No, but you will and I hope you will take part in the
> preceding discussion.
>
> Dan's statement was a first heads up, as a matter of courtesy,
> that the TAG has an open issue on this subject.
>
> BP> I have seen
> BP> substantial email threads debating this issue in
> BP> different working groups without clear consensus.
>
> I would appreciate pointers to such, particularly those that
> considered retyping was desirable.

Here are some threads with relevant discussion that I found quickly.
There may be more:

http://lists.w3.org/Archives/Public/www-html/1999Aug/0035.html
http://lists.w3.org/Archives/Public/www-html/1998Jan/0076.html
http://lists.w3.org/Archives/Public/www-html/1999May/0011.html
http://lists.w3.org/Archives/Public/www-html/2002Aug/0346.html
http://lists.w3.org/Archives/Public/w3c-wai-er-ig/2000Jan/0123.html

Follow-up

From: Dan Connolly <connolly@w3.org> Date: 07 Feb 2003

On Fri, 2003-02-07 at 11:59, Brad Porter wrote:
> Dan,
> 
> Hopefully you didn't intend your comments to sound as
> inflammatory as they might be interpreted.

Hmm... yes, sorry to be curt. I'm typing with one good hand; broke my
finger and had surgery earlier this week...

> HTML and SMIL are in clear conflict on their use of the type
> attribute.

Yes; HTML is right and SMIL is wrong. 1/2 ;-)

Formats can't override protocols that carry them (nor can protocols
screw with data formats).

Dunno how SMIL got to be that way. I don't think it justifies
the SRGS design, though reasonable people may disagree...

>  Other specifications do not make a clear statement either way.
> I have not seen a clear statement from the TAG yet.

I think maybe the TAG may look at this, but regardless, my
comment stands.

>  I have seen substantial email threads debating this issue in
> different working groups without clear consensus.

OK, but where we are right now is me asking you to change your
spec or tell me why not, in technical terms.


> As is documented in the comments, we did work to address this
> question with Martin.  The working group did choose to follow
> the language and use from SMIL for the reason that practically
> speaking not all web servers return the right MIME type for
> the content.

I see.

Well, that is a technical justification; I appreciate that, but...

>  If you are not satisfied with the details provided
> in the response,

Looking over
  http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20

in detail, no I'm not satisfied. I still think the spec
should change.

> we would certainly be happy to discuss it further.

I'd like to study the test case. I'd appreciate
a pointer, though perhaps I can find it myself.

> I personally would welcome the TAG addressing this issue and
> I would be very willing to participate in such a discussion.

Meanwhile, as I said, my request/comment stands.

WG Response

Following several teleconferences with the TAG:

If a media type is returned by the protocol, then it is authoritative:
it cannot be overridden by the grammar processor even if it does not
match the actual media type of the resource or cannot be processed as
a grammar. The value of the 'type' attribute may be used to influence
content type negotiation (in HTTP 1.1 for example) and, only if no
media type is returned by the protocol, becomes the authoritative media
type for the resource.


http://lists.w3.org/Archives/Member/w3c-voice-wg/2003Jul/0085.html.
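
The precedence described in this response can be sketched as follows.
This is a minimal illustration, not an implementation from the spec; the
function and parameter names are hypothetical.

```python
from typing import Optional


def authoritative_media_type(protocol_type: Optional[str],
                             type_attribute: Optional[str]) -> Optional[str]:
    """Pick the authoritative media type for a referenced grammar.

    protocol_type: media type returned by the protocol (for instance an
    HTTP Content-Type header), or None if the protocol returned none.
    type_attribute: value of the 'type' attribute on the reference, or None.
    """
    if protocol_type is not None:
        # The protocol's answer is authoritative, even if it does not match
        # the actual content; the grammar processor may not override it.
        return protocol_type
    # Only when the protocol is silent does the 'type' attribute decide.
    return type_attribute
```

The sketch leaves out content negotiation, where the 'type' value may
still be used to influence what the server is asked for.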

necessity of the <token> tag

I just wondered why there is the <token> tag in the SRGS
specification.  Where is the difference to include PCDATA in <item>
.. </item>?

WG Response

The difference is that the content of the
<token> element is treated as a single token,
e.g.:
<token>San Francisco</token> is a single token, whereas
<item>San Francisco</item> is a sequence of two tokens.
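
The difference can be sketched with two toy functions. They are a
simplification that assumes plain white-space splitting and white-space
normalization inside <token>, and they ignore quoted strings and other
tokenization details of the spec; the function names are hypothetical.

```python
from typing import List


def tokens_of_item(text: str) -> List[str]:
    """Simplified view of <item> text: one token per white-space-separated word."""
    return text.split()


def tokens_of_token_element(text: str) -> List[str]:
    """Simplified view of <token> text: the whole content is a single token."""
    # Assumption: runs of white space are collapsed to single spaces.
    return [" ".join(text.split())]
```

For the text "San Francisco", the first function yields two tokens and
the second yields one.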