The candidate recommendation states in Section 4.7: "The rule declared as the root rule may be scoped as either public or private." This is actually a misleading statement because, even if the root is declared private, it always behaves like a public rule (which is good!):

1. The following sentence of Section 4.7 states that the root is always activable: "A rule reference to the root rule of a grammar is legal." It seems clear this holds even if the rule is private; this impression is confirmed by Section 5.4: "A conforming Grammar Processor (...) must be able to activate the root, any single public rule, or any set of public rules or roots".

2. The next paragraph of Section 4.7 states that the root is always activable: "The root rule may be activated for recognition."

Now the purpose of the scope of a rule is to set two properties, activation and exportation, which is confirmed by Appendix I: "- Possible distinction of "activable" and "exported" rules (currently merged as "public")". In fact, I think that setting the scope of the root should not be allowed, because it can only be public.

About the point of Appendix I, I regret that the addressed distinction is not effective right now, but I can live with it; after all, being public can be seen from two perspectives: the grammars' and the engine's. However, I am much more surprised by another confusion: being activable is considered as both being on/off switchable at engine run-time and being the "start symbol" (as defined in [HU79]: http://www.w3.org/TR/speech-grammar/#ref-hu79). Section 3.2 actually reads: "Rules with public scope may be activated for recognition. That is they may define the top-level syntax of spoken input." ("top-level syntax" meaning "start symbol", I guess.)

Now consider the grammar:

    $root = I want to be connected to $people [please];
    $people = $people_1st_floor | $people_2nd_floor;

If you want to deactivate $people_2nd_floor, you have to declare it public, but then it also becomes a start symbol, which allows undesired spoken utterances. I find this confusion much more of a problem than exportation/activation. This issue is not even addressed in Appendix I. In the same vein, I think that the status of start symbol naturally belongs to the root and only to the root.
Your mail contains two comments. The first one regards the scope of root rules, the second one the meaning of "activable".

1. Scope of root rules. The group does not think that the sentence in Section 4.7 ("The rule declared as the root rule may be scoped as either public or private.") is misleading. The spec clearly distinguishes between a reference to the root rule of a grammar and a reference to a named rule. According to Section 2.2 of the SRGS spec, <ruleref uri="grammarURI"/> is a reference to a root rule, and <ruleref uri="grammarURI#rulename"/> is a reference to a named rule, whether or not the referenced rule is declared as the root. (Further details are explained in Section 2.2.2.) A reference to a named rule is valid only if the referenced rule is declared to be public. A reference to a grammar without a fragment identifier, i.e. a reference to a root rule, is valid even if the root rule of the referenced grammar is declared to be private. (See Section 4.7 for details.) The same reasoning applies to the activation of rules.

However, the wording of the second and third paragraphs in Section 3.2 seems to be a little confusing: "A rule with 'private' scope is visible only within its containing grammar. A private rule may be referenced only by other rules within the same grammar. One exception is that a rule declared as the root may be referenced externally even if it is a private rule. See Section 4.7 for details." This wording will be changed in order to make the difference in semantics between a reference to a rule and a reference to a grammar clearer.

2. Meaning of "activable". Your interpretation of what the spec says about the activation of rules does not seem to be quite correct. Being "activable" does not mean being on/off switchable at engine run-time. The grammar defines which rules can be activated; at run-time an application may choose which subset of the "activable" (i.e. public or root) rules is active for recognition, as stated in Section 3.2.
If a rule is active then the recognizer may apply each alternative that is defined for this rule. The same holds for any (directly or indirectly referenced) rule that the recognizer encounters on its way down the tree. There is no mechanism to dynamically restrict a given grammar. Specifically, when the root rule in your example is activated, both the '$people_1st_floor' and the '$people_2nd_floor' rules may be used during recognition, even though these rules are not active by themselves.
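The two rules described in this response can be sketched in Python under a deliberately simplified, hypothetical grammar model (rule scopes, a root name, and finite alternatives of tokens or `$rule` references). This is an illustration, not an SRGS processor: the names `Grammar`, `is_valid_external_reference`, `activable_rules` and `expand` are made up, we assume an external reference to a grammar that declares no root is an error, and `[please]` is written out as two alternatives.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional, Set


# Hypothetical, simplified grammar model (not an SRGS API): each rule has a
# scope and a list of alternatives; an alternative is a sequence of tokens
# or "$name" references to other rules of the same grammar.
@dataclass
class Grammar:
    root: Optional[str]
    scopes: Dict[str, str]  # rule name -> "public" or "private"
    rules: Dict[str, List[List[str]]] = field(default_factory=dict)


def is_valid_external_reference(g: Grammar, fragment: Optional[str]) -> bool:
    """fragment=None models <ruleref uri="grammarURI"/>;
    a rule name models <ruleref uri="grammarURI#rulename"/>."""
    if fragment is None:
        # Reference to the grammar's root: legal whatever the root's scope
        # (assumption: an error if the grammar declares no root).
        return g.root is not None
    # Reference to a named rule: legal only if that rule is public,
    # even when the named rule is also the root.
    return g.scopes.get(fragment) == "public"


def activable_rules(g: Grammar) -> Set[str]:
    """Rules an application may choose to activate: public rules plus the root."""
    names = {name for name, scope in g.scopes.items() if scope == "public"}
    if g.root is not None:
        names.add(g.root)
    return names


def expand(g: Grammar, name: str) -> List[str]:
    """Every sentence a finite, non-recursive rule matches. All referenced
    rules are used, whether or not they are active themselves."""
    sentences: List[str] = []
    for alternative in g.rules[name]:
        parts = [expand(g, item[1:]) if item.startswith("$") else [item]
                 for item in alternative]
        sentences.extend(" ".join(combo) for combo in product(*parts))
    return sentences


# The grammar from the comment; "Alice" and "Bob" are made-up entries and
# [please] is written out as two alternatives.
carrier = "I want to be connected to".split()
example = Grammar(
    root="root",
    scopes={"root": "private", "people": "private",
            "people_1st_floor": "private", "people_2nd_floor": "public"},
    rules={
        "root": [carrier + ["$people"], carrier + ["$people", "please"]],
        "people": [["$people_1st_floor"], ["$people_2nd_floor"]],
        "people_1st_floor": [["Alice"]],
        "people_2nd_floor": [["Bob"]],
    },
)

print(activable_rules(example))  # the root and people_2nd_floor
print(expand(example, "root"))   # both floors reachable through $people
```

In this model, activating only the root accepts "I want to be connected to Bob" but never a bare "Bob", while activating the public `$people_2nd_floor` on its own would accept exactly `["Bob"]`, which is the commenter's concern. Nothing in the model switches off one branch of an active rule, matching the statement that there is no mechanism to dynamically restrict a given grammar.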
1. Order of declarations. The order was fixed in the previous Draft, and now (in the candidate recommendation) declarations may appear in any order. Actually, I find this freedom harmful for readability: if you are used to always finding base, language, mode, root, tag-format, lexicon, and meta in that order, then reading the declarations of a new grammar is just easier, and writing a new grammar is just as simple. From a software engineering point of view, it's like the parameters of a function. To me, making the order free helps only beginning grammar writers; it's useless if you're not a beginner, and it's confusing if you're not a writer but a reader.

2. Tag format. The candidate recommendation states that, for a given grammar, lexicon and meta declarations may appear any number of times and that language must appear exactly once, but it doesn't state that tag-format can appear at most once (which is stated for base and root), and I don't think that several tag-format declarations make any sense. [This could be solved syntactically by reintroducing the fixed order of declarations and making tag-format optional at its place, as it was in the previous version of the document!-)]

Furthermore, the tag-format should be like the mode: you cannot have a grammar in voice mode that uses a rule from a grammar in dtmf mode. Similarly, I don't think you can mix two grammars that have different tag formats. Well, you could, but it would require that a further specification of tags be provided in the document. Currently, the tag language is defined externally: not only the syntax of this language but also its semantics, the kind of computed information (character strings, objects, raw data, attributes...), the data format, the way information is processed along the Logical Parse Structure (Appendix H), i.e. along the parse tree (top-down, bottom-up, depth-first traversal,...), and everything I'm forgetting... I suggest that mixing tag formats be forbidden, like mixing modes.
Your mail contains two comments on the ABNF form. The first one regards the order of declarations, the second one tag format declarations.

1. Order of declarations. This issue has already been discussed during the Last Call period of SRGS. Further details can be found in the disposition of comments at http://www.w3.org/2002/06/speech-grammar-comments.html, specifically in the sections on GC05-1 (http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-1) and GC05-2 (http://www.w3.org/2002/06/speech-grammar-comments.html#GC05-2).

2. Tag Format. You say the spec "doesn't state that tag-format can appear at most once". This is not correct. In Section 4.8 the spec says: "The ABNF header may contain one tag format declaration." Moreover, the (normative) Appendix D, which formally specifies the syntax of the ABNF form, is quite precise on this issue. However, it might be confusing that the spec uses different wording in Sections 4.5 - 4.9 to express that a certain declaration must not appear more than once. This will be changed, and the following wording will be used in these sections: "The ABNF header must contain zero or one ... declaration.", e.g. in Section 4.8: "The ABNF header must contain zero or one tag format declaration."

You are right that SRGS does not impose any restrictions on tag formats across grammars. The issue of mixing different tag formats will be dealt with in a separate specification on "Semantic Interpretation".
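As an illustration of the "zero or one" constraint, here is a minimal, hypothetical Python check that counts declarations in an ABNF header and reports keywords occurring more than once. It is deliberately naive: it splits on `;`, so it would mis-handle a `meta` string containing a semicolon, and it checks only the three declarations this exchange names as at-most-once (`base`, `root`, `tag-format`). The function name and the second tag format URI are made up.

```python
from collections import Counter
from typing import List

# Declarations discussed above as allowed at most once in an ABNF header.
AT_MOST_ONCE = ("base", "root", "tag-format")


def header_violations(header: str) -> List[str]:
    """Return the at-most-once keywords that appear more than once.

    Naive sketch: declarations are assumed to be separated by ';' and to
    start with their keyword (e.g. "tag-format <semantics/1.0>;").
    """
    counts: Counter = Counter()
    for declaration in header.split(";"):
        words = declaration.split()
        if words:
            counts[words[0]] += 1
    return [keyword for keyword in AT_MOST_ONCE if counts[keyword] > 1]


# Hypothetical header with two tag-format declarations; lexicon may repeat.
header = """#ABNF 1.0;
language en-US;
root $main;
tag-format <semantics/1.0>;
tag-format <example-tags/1.0>;
lexicon <http://www.example.com/a.pls>;
lexicon <http://www.example.com/b.pls>;
"""
print(header_violations(header))  # ['tag-format']
```

A real checker would work from the normative grammar in Appendix D rather than from string splitting; the sketch only shows which counts the "zero or one" wording constrains.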
In reading over the latest candidate version of the SRGS spec for clarification on the alias attribute of the ruleref element, I failed to see any mention of alias at all. However, it was included in the SRGS Working Draft 20 August 2001.
Alias names were removed from the spec because the group had the impression that this feature didn't add any value. In fact, if you want a shorthand name for a URI, you can simply define a rule that does nothing more than declare an "alias name", for example:

    <rule id="myAlias"> <ruleref uri="http://www.example.com/some-very-long-path#rule"/> </rule>

If you put this "alias declaration" into your grammar, you can replace each occurrence of <ruleref uri="http://www.example.com/some-very-long-path#rule"/> with <ruleref uri="#myAlias"/>. Whatever syntax you use for an alias declaration, you will always have to write at least the full URI and its alias name. In the group's opinion, the (very small) overhead caused by the usage of the rule and ruleref elements didn't justify the introduction of a separate element and attribute for aliases.
According to the 06/27/02 SRGS specification, the ABNF production for 'meta' and 'http-equiv' allows either double- or single-quoted strings for the name and value fields. However, there is no provision for escaping the corresponding quote characters. Thus, it is not possible to provide strings which contain both double quotes (") and single quotes (') as meta-data. As the XML format does not impose such a restriction, it is not possible to guarantee a translation from an XML format grammar to an ABNF grammar. I consider this a problem and suggest adding support for C-style escape sequences in the "DoubleQuotedCharacters" production of the lexical grammar of ABNF. The same applies to the tag format, even though the alternate delimiter format makes a collision somewhat unlikely. What was the reason for using an alternate tag delimiter instead of escape sequences, as supported by JSpeech?
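The limitation can be made concrete with a small Python sketch of the quoting step a hypothetical XML-to-ABNF converter would need for a `meta` value. It assumes, as the comment states, that ABNF quoted strings have no escape sequences, so the converter can only pick a delimiter that does not occur in the value; a value containing both quote characters has no ABNF form. The function names are illustrative only.

```python
def abnf_quote(value: str) -> str:
    """Quote a meta/http-equiv string for ABNF, assuming no escape sequences exist."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Both delimiters occur in the value, so neither quoting style can hold it.
    raise ValueError("value contains both ' and \"; it cannot be quoted in ABNF")


def meta_declaration(name: str, content: str) -> str:
    """Build an ABNF meta declaration of the form: meta "name" is "content";"""
    return f"meta {abnf_quote(name)} is {abnf_quote(content)};"


print(meta_declaration("author", "Stephanie"))  # meta "author" is "Stephanie";
print(meta_declaration("note", 'say "hi"'))     # meta "note" is 'say "hi"';
```

Calling `meta_declaration("note", 'it\'s "quoted"')` raises `ValueError`, which is exactly the XML value that cannot be carried over to ABNF.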
Your mail contains two comments on the ABNF format. The first one regards delimiters of 'meta' and 'http-equiv' strings, the second one tag delimiters. 1. Delimiters of 'meta' and 'http-equiv' strings. Support for escape sequences was not added to the ABNF format because this would make the syntax more complex while there seem to be no important use cases for it. There are several issues regarding the conversion to and from ABNF Form and XML Form. Section 1.3 of SRGS (http://www.w3.org/TR/speech-grammar/#S1.3) contains a list of these issues and states that ABNF Form and XML Form are specified to ensure that the two representations are semantically mappable. This means that the semantic performance of a grammar does not change when it is converted to and from ABNF and XML. The fact that quote characters in 'meta' and 'http-equiv' strings cannot be escaped does not affect the semantic performance. 2. Tag Delimiters. This issue has already been discussed during the Last Call period of SRGS. Further details can be found in the disposition of comments http://www.w3.org/2002/06/speech-grammar-comments.html, specifically in the section on GC09-14 http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-14.
http://www.ietf.org/internet-drafts/draft-porter-srgs-media-reg-01.txt I'm not sure that I'm opposed to this I-D, though it is certainly unusual for a non-XML MIME type to heavily reference RFC 3023. However, could you please explain the reasoning behind encoding the same on-the-wire grammar with two different (but cross-convertible) syntaxes: ABNF and XML? It seems like a huge amount of work for little or no gain. Certainly in the context of MIME, there is much better experience in transporting XML documents (and dealing with related i18n and encoding issues) than the ABNF grammars that this I-D registers. Plus, to quote RFC 1958, Architectural Principles of the Internet, Section 3.2: "If there are several ways of doing the same thing, choose one." I assume this has been thoroughly debated, but I cannot find the thread at <http://lists.w3.org/Archives/Public/www-voice/>. I see at <http://www.w3.org/TR/voice-intro/#gram>: "We anticipate that development tools will be constructed that provide the familiar ABNF format to developers, and enable XML software to manipulate the XML grammar format." I can understand that developers find ABNF easier to read and write. However, is it really necessary for the ABNF format to be released into the wild (i.e., sent over MIME protocols)? Wouldn't it improve simplicity and interoperability to say that ABNF MUST first be converted to XML before transport? More strongly, wouldn't the document be clearer, more straightforward and more interoperable if it said that XML is the on-the-wire syntax for the grammar, and then separately specified reversible conversion back and forth between XML and ABNF? In that case, this I-D would not need to be registered.
Developing two forms of grammar for transport was debated internally and determined to be requisite to address the needs of the development community in this space. The best summary of that discussion can be found in the "2nd Last Call Disposition of Comments": http://www.w3.org/2002/06/speech-grammar-comments.html#GC08-1 Certainly pre-transformation before transport is an option developers have (as is the use of any source format if it is transformed into the XML or ABNF form of SRGS before transport), but not an option we wanted to enforce upon the developer community. In particular, the belief is that the XML format is verbose to the point that it impinges on developers doing hand-authoring for prototyping or even full application development. We very much did not want to require the use of transformation tools on the development side for simple prototyping.
In the Speech Recognition Grammar Specification, "public" means so much that it ends up being very confusing. In ABNF (and its XML counterpart) a rule with public scope is at once:
- a rule that can be used from an external grammar (let's call this feature "export");
- a rule that can be activated and deactivated for speech recognition (let's call this "activation");
- a rule that is the top-level syntax of spoken input (let's call it "top-level").
The three notions are completely different, but they are merged into "public", which creates problems.

1. Top-level is not activation. In http://www.w3.org/TR/speech-grammar/#S3.2 and also in http://www.w3.org/TR/speech-grammar/#S4.7, these notions are explicitly considered synonyms, but now look at the following grammar:

    root $r;
    $r = I want to go to $cities;
    $cities = $florida | $california | ...;
    public $florida = tampa | miami | ...;
    public $california = SF | LA | ...;

The only sentences you want to allow start with "I want to go to". You also want to activate and deactivate Floridian/Californian cities in order to lower perplexity. However, you don't want your sentences to be limited to bare cities. This little toy example has many different applications in telephony or in web browsing (in general you want activation to apply only to a sub-sentence that appears in a carrier phrase, not to the whole sentence). It shows there are rules that may be activated while they should not be at top level. Confusing top-level and activation can be a handicap for some applications.

2. Top-level is not export. Wanting a rule to be exported does not mean you want it to be at top level as well. For example, say you have a sub-grammar of numbers that you use in several applications: it's very unlikely that you want numbers to be uttered alone, outside any carrier phrase. As a consequence, when you import a rule from a grammar G1 into a grammar G2, you have at top level all sentences at top level of G2 plus the rule(s) you import from G1. This looks very awkward to me.

3. Export is not activation. This issue is touched on in http://www.w3.org/TR/speech-grammar/#AppI as a "Consideration for Future Versions". I fully agree, but I consider this less of a problem than both of the previous ones.
As you mentioned, the SRGS spec does not distinguish between the notions "top-level" and "activation". Therefore, "public" has just the following two meanings: - A public rule may be referenced by other grammars; - A public rule may be activated for recognition (as top-level syntax of spoken input) A possible distinction of these two meanings is one of the "Features under Consideration for Future Versions" listed in Appendix I of the SRGS spec.
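The distinction at stake can be sketched with three hypothetical flags in Python. This models the discussion and is not an SRGS API: it encodes the response's reading that, for a non-root rule, `public` grants export and activation together and that activation means top-level use, so the combination the commenter wants for `$florida` (activable but not top-level) cannot be expressed by any scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visibility:
    """Hypothetical model of the three notions separated in the comment."""
    exported: bool   # may be referenced from another grammar
    activable: bool  # may be switched on or off for recognition
    top_level: bool  # may match a whole utterance on its own


def srgs_visibility(scope: str) -> Visibility:
    """SRGS 1.0 as described in the response, for a non-root rule:
    'public' grants everything, and activation implies top-level use."""
    public = scope == "public"
    return Visibility(exported=public, activable=public, top_level=public)


def expressible(wanted_activable: bool, wanted_top_level: bool) -> bool:
    """Can some SRGS 1.0 scope give this activable/top-level combination?"""
    return any(
        v.activable == wanted_activable and v.top_level == wanted_top_level
        for v in (srgs_visibility(s) for s in ("public", "private"))
    )


# $florida in the example: activable (to lower perplexity) but not top-level.
print(expressible(wanted_activable=True, wanted_top_level=False))  # False
```

Separating `exported` from `activable` in such a model corresponds to the Appendix I item; separating `activable` from `top_level` is the further distinction the comment asks for.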
This spec can't override the URI/HTTP specs this way... "A URI reference may be accompanied by a media type that indicates the content type of the resource identified by the URI. When specified, this type value takes precedence over other possible sources of the media type (for instance, the "Content-type" field in an HTTP exchange, or the file extension)." -- http://www.w3.org/TR/2002/CR-speech-grammar-20020626/#S2.2.2 I suggest you fix it to work like in other specs... "This attribute gives an advisory hint as to the content type of the content available at the link target address." -- http://www.w3.org/TR/html401/struct/links.html#adef-type-A Please add a test case to clarify how this works and let me know when it's available.
This issue has been discussed many times within the group. It is explicitly discussed in the Last Call Disposition of Comments - http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20 - where we took into account the W3C Director's requested modification. Since the SMIL 2.0 Recommendation uses 'type' in a similar way, we believe there are other specs which set the precedence and, at this stage, no 'fix' is required. However, if further evidence comes to light, please let us know. The SRGS testsuite already contains tests for this feature.
On Fri, 2003-02-07 at 07:07, Scott McGlashan wrote:
> Hi Dan,
>
> Thank you for your public comments on SRGS 1.0.
>
> This issue has been discussed many times within the group.
> It is explicitly discussed in the Last Call Disposition of
> Comments -
> http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20
> - where we took into account the W3C Director's requested
> modification. Since the SMIL 2.0 Recommendation uses 'type'
> in a similar way, we believe there are other specs which set
> the precedence and, at this stage, no 'fix' is required.

I disagree; I don't find this a satisfactory justification for declining my request. In fact, I don't see any technical justification for the way the spec is at all.

> However, if further evidence comes to light, please
> let us know.

No, the burden is on you to (attempt to) satisfy me.

"5.2.4 Proposed Recommendation (PR)
Entrance criteria. Before advancing a technical report to Proposed Recommendation, the Director must be satisfied that: [...] 2. the Working Group has formally addressed issues raised during the previous review or implementation period (possibly modifying the technical report)"
-- http://www.w3.org/Consortium/Process-20010719/tr.html#RecsCR

If you aren't interested in negotiating further based on the information I sent, be sure to note this as outstanding dissent when you request Proposed Rec status.

> The SRGS testsuite already contains tests for this feature.

Pointer, please?
From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003
HTML and SMIL are in clear conflict on their use of the type attribute. Other specifications do not make a clear statement either way. I have not seen a clear statement from the TAG yet. I have seen substantial email threads debating this issue in different working groups without clear consensus. As is documented in the comments, we did work to address this question with Martin. The working group did choose to follow the language and use from SMIL for the reason that practically speaking not all web servers return the right MIME type for the content. If you are not satisfied with the details provided in the response, we would certainly be happy to discuss it further. I personally would welcome the TAG addressing this issue and I would be very willing to participate in such a discussion.
From: Chris Lilley <chris@w3.org> Date: 7 Feb 2003
BP> Hopefully you didn't intend your comments to sound as inflammatory
BP> as they might be interpreted.

I am sure Dan did not intend to be inflammatory, any more than the initial response intended to be dismissive.

BP> HTML and SMIL are in clear conflict on their use of the type
BP> attribute.

Further, SMIL is in conflict with itself on the type attribute, depending on what element it is used on and what the transport protocol is.

SVG also uses a type attribute, as an informative hint and as a way to allow client-side selection from available media.

BP> Other specifications do not make a clear statement either way.

They do, in fact.

BP> I have not seen a clear statement from the TAG yet.

No, but you will, and I hope you will take part in the preceding discussion. Dan's statement was a first heads-up, as a matter of courtesy, that the TAG has an open issue on this subject.

BP> I have seen
BP> substantial email threads debating this issue in different
BP> working groups without clear consensus.

I would appreciate pointers to such, particularly those that considered retyping desirable.

BP> As is documented in the comments, we did work to address this
BP> question with Martin. The working group did choose to follow the
BP> language and use from SMIL for the reason that practically
BP> speaking not all web servers return the right MIME type for the
BP> content.

Aha. We suspected that might be the reason. The problem is that this transparent fixup (and sniffing in general) has a number of undesirable knock-on effects.

BP> If you are not satisfied with the details provided in the
BP> response, we would certainly be happy to discuss it further.

I would encourage you to do this.

BP> I personally would welcome the TAG addressing this issue and I
BP> would be very willing to participate in such a discussion.

Thanks, this is appreciated.
From: Brad Porter <brad@tellme.com> Date: 07 Feb 2003
By the way, just to be clear, I personally completely agree about the architectural impurity of having local type information take precedence. Though I also completely agree with the fact that unregistered MIME types exist and web server configurations are not always correct. Which is why I'm very keen to have a definitive statement and a practical plan to make the choice stick, as when it comes to a choice between being conformant and working with a wider range of content providers, business motivations can prevail.

(comments and links embedded below)

--Brad

Chris Lilley wrote:
> On Friday, February 7, 2003, 8:59:40 PM, Brad wrote:
>
> BP> Dan,
>
> BP> Hopefully you didn't intend your comments to sound as
> BP> inflammatory as they might be interpreted.
>
> I am sure Dan did not intend to be inflammatory, any more than the
> initial response intended to be dismissive.
>
> BP> HTML and SMIL are in clear conflict on their use of the type attribute.
>
> Further, SMIL is in conflict with itself on the type attribute,
> depending on what element it is used on and what the transport
> protocol is.
>
> SVG also uses a type attribute, as an informative hint and as
> a way to allow client-side selection from available media.
>
> BP> Other specifications do not make a clear statement either way.
>
> They do, in fact.

In SMIL, the local value takes precedence.
In HTML 4.01, type is a hint.
In XHTML 2.0, type is a definition of the allowable MIME types for that resource; see XHTML2 mod attribute collections type.
I find the statement of precedence ambiguous here, as even though the type is "advisory" it doesn't specify the precedence: http://www.w3.org/TR/xmldsig-core/#sec-o-SignatureProperty

> BP> I have not seen a clear statement from the TAG yet.
>
> No, but you will and I hope you will take part in the
> preceding discussion.
>
> Dan's statement was a first heads-up, as a matter of courtesy,
> that the TAG has an open issue on this subject.
>
> BP> I have seen
> BP> substantial email threads debating this issue in
> BP> different working groups without clear consensus.
>
> I would appreciate pointers to such, particularly those that
> considered retyping desirable.

Here are some threads with relevant discussion that I found quickly. There may be more:
http://lists.w3.org/Archives/Public/www-html/1999Aug/0035.html
http://lists.w3.org/Archives/Public/www-html/1998Jan/0076.html
http://lists.w3.org/Archives/Public/www-html/1999May/0011.html
http://lists.w3.org/Archives/Public/www-html/2002Aug/0346.html
http://lists.w3.org/Archives/Public/w3c-wai-er-ig/2000Jan/0123.html
From: Dan Connolly <connolly@w3.org> Date: 07 Feb 2003
On Fri, 2003-02-07 at 11:59, Brad Porter wrote:
> Dan,
>
> Hopefully you didn't intend your comments to sound as
> inflammatory as they might be interpreted.

Hmm... yes, sorry to be curt. I'm typing with one good hand; broke my finger and had surgery earlier this week...

> HTML and SMIL are in clear conflict on their use of the type attribute.

Yes; HTML is right and SMIL is wrong. 1/2 ;-)

Formats can't override protocols that carry them (nor can protocols screw with data formats). Dunno how SMIL got to be that way. I don't think it justifies the SRGS design, though reasonable people may disagree...

> Other specifications do not make a clear statement either way.
> I have not seen a clear statement from the TAG yet.

I think the TAG may look at this, but regardless, my comment stands.

> I have seen substantial email threads debating this issue in
> different working groups without clear consensus.

OK, but where we are right now is me asking you to change your spec or tell me why not, in technical terms.

> As is documented in the comments, we did work to address this
> question with Martin. The working group did choose to follow
> the language and use from SMIL for the reason that practically
> speaking not all web servers return the right MIME type for
> the content.

I see. Well, that is a technical justification; I appreciate that, but...

> If you are not satisfied with the details provided
> in the response,

Looking over http://www.w3.org/2002/06/speech-grammar-comments.html#GC09-20 in detail, no, I'm not satisfied. I still think the spec should change.

> we would certainly be happy to discuss it further.

I'd like to study the test case. I'd appreciate a pointer, though perhaps I can find it myself.

> I personally would welcome the TAG addressing this issue and
> I would be very willing to participate in such a discussion.

Meanwhile, as I said, my request/comment stands.
Following several teleconferences with the TAG
If a media type is returned by the protocol, then it is authoritative: it cannot be overridden by the grammar processor, even if it does not match the actual media type of the resource or cannot be processed as a grammar. The value of the 'type' attribute may be used to influence content type negotiation (in HTTP 1.1, for example) and, only if no media type is returned by the protocol, becomes the authoritative media type for the resource. http://lists.w3.org/Archives/Member/w3c-voice-wg/2003Jul/0085.html.
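The precedence rule in this resolution can be sketched as two small Python functions. Both function names are hypothetical; `protocol_type` stands for whatever media type the transport returned (for example an HTTP `Content-Type`), or `None` if it returned none, and the negotiation step is shown only as passing the `type` value along as an `Accept` hint.

```python
from typing import Dict, Optional


def request_headers(type_attribute: Optional[str]) -> Dict[str, str]:
    """The 'type' value may influence content negotiation, e.g. as an HTTP Accept header."""
    return {"Accept": type_attribute} if type_attribute else {}


def authoritative_media_type(protocol_type: Optional[str],
                             type_attribute: Optional[str]) -> Optional[str]:
    """Return the media type the grammar processor must honor."""
    if protocol_type is not None:
        # The protocol-supplied type wins, even if it is wrong or the
        # resource cannot then be processed as a grammar.
        return protocol_type
    # Only when the protocol supplies no type does 'type' become authoritative.
    return type_attribute


print(authoritative_media_type("text/plain", "application/srgs+xml"))  # text/plain
print(authoritative_media_type(None, "application/srgs"))              # application/srgs
```

The first call shows the case debated above: a misconfigured server returning `text/plain` is no longer silently corrected by the `type` attribute.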
I wonder why there is a <token> element in the SRGS specification. What is the difference from including PCDATA in <item> ... </item>?
The difference is that the content of the <token> element is treated as a single token, e.g.: <token>San Francisco</token> is a single token, whereas <item>San Francisco</item> is a sequence of two tokens.
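A minimal Python sketch of this tokenization difference, using the standard library's `xml.etree.ElementTree`. The function `tokens_of` is hypothetical and deliberately simplified: it handles only a single element with plain text content (no namespace, no nested elements) and assumes whitespace inside `<token>` is normalized to single spaces.

```python
import xml.etree.ElementTree as ET
from typing import List


def tokens_of(fragment: str) -> List[str]:
    """Return the tokens contributed by a single <token> or <item> with text content."""
    element = ET.fromstring(fragment)
    text = element.text or ""
    if element.tag == "token":
        # The whole content is one token; runs of whitespace collapse to one space.
        return [" ".join(text.split())]
    # Plain text inside <item> is split on whitespace into separate tokens.
    return text.split()


print(tokens_of("<token>San Francisco</token>"))  # ['San Francisco']
print(tokens_of("<item>San Francisco</item>"))    # ['San', 'Francisco']
```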