This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
A.1 EBNF

(sectioning)
I think that the EBNF productions and the explanation of the EBNF notation should each be split into a separate section.

----------------------------------------
preamble

"with the following minor differences."
You've removed the phrase "except that grammar symbols always have initial capital letters" even though it's still true, still different from the notation used in the XML spec, and still unexplained. [Leftover from qt-2004Feb0317-01]

"a grouping of terminals that together may help disambiguate the individual symbols."
This (along with its repeat in A.2) is another (but I hope the last) misuse of the word "disambiguate" in its technical sense. Instead, you might say "... may help a parser differentiate various constructs", or just "... may help a parser do its job". (And similarly for the repeat of this sentence in A.2.)

You should make clear that angle-bracket groups have no definitional significance. That is, their presence in the EBNF has no effect on the set of syntactically legal queries defined by the grammar. (Assuming that's true. If not, then you've got some explaining to do.)

"To help readability, this "< ... >" notation is absent in the EBNF in the main body of this document. This appendix is the normative version of the EBNF."
You could say the same of comments on grammar productions.

"Comments on grammar productions"
Note that the XML spec's grammar has production comments, so it's not their *presence* here that's different, but rather their normative power.

"clarification for parsing rules"
Some grammar notes are not mere clarification; they actually affect the set of legal queries.

Pulling some of this together, how about restructuring the preamble into something like this:

The following grammar .... differences:
o All named symbols have a name that begins with an uppercase letter (unlike the XML spec, where some names began with lowercase letters to draw a distinction ...)
o It adds a notation for referring to productions in external specs.
o (...angle-bracket groups...)
o Production comments are normative.
These features are described in more detail in [the Notation section]. To increase readability, the EBNF in the main body of this document omits some of these notational features. This appendix is the normative version of the EBNF.

----------------------------------------
productions

"[66] PragmaContents"
"[146] Digits"
"[155] CommentContents"
These should be marked "ws: explicit".

[96] DirAttributeValue ::= ('"' (EscapeQuot | QuotAttrValueContent)* '"') | ("'" (EscapeApos | AposAttrValueContent)* "'")
[97] QuotAttrValueContent ::= QuotAttrContentChar | CommonContent
[98] AposAttrValueContent ::= AposAttrContentChar | CommonContent
I think these would be clearer if you didn't split each of the choices over two productions. Instead, how about:
[96] DirAttributeValue ::= ('"' QuotAttrValueContent* '"') | ("'" AposAttrValueContent* "'")
[97] QuotAttrValueContent ::= EscapeQuot | QuotAttrContentChar | CommonContent
[98] AposAttrValueContent ::= EscapeApos | AposAttrContentChar | CommonContent

[142] StringLiteral
Change ('"' '"') to EscapeQuot.
Change ("'" "'") to EscapeApos.

(*ContentChar)
If you factor out the overlap of ElementContentChar, QuotAttrContentChar, and AposAttrContentChar, and push it over to CommonContent, I think the result is simpler. That is, change this:
[97] QuotAttrValueContent ::= EscapeQuot | QuotAttrContentChar | CommonContent
[98] AposAttrValueContent ::= EscapeApos | AposAttrContentChar | CommonContent
[99] DirElemContent ::= DirectConstructor | CDataSection | ElementContentChar | CommonContent
[100] CommonContent ::= ...
[151] ElementContentChar ::= Char - [{}<&]
[152] QuotAttrContentChar ::= Char - ["{}<&]
[153] AposAttrContentChar ::= Char - ['{}<&]
to this:
[97] QuotAttrValueContent ::= EscapeQuot | "'" | CommonContent
[98] AposAttrValueContent ::= EscapeApos | '"' | CommonContent
[99] DirElemContent ::= DirectConstructor | CDataSection | '"' | "'" | CommonContent
[100] CommonContent ::= [^"'{}<&] | ...
[151] ElementContentChar delete
[152] QuotAttrContentChar delete
[153] AposAttrContentChar delete
Just a thought.

(explicits)
I wonder if it would help the reader if the "ws: explicit" productions (and the intervening ones that don't care whether they're "ws: explicit" or not) were put together in a group. Specifically:
[65-66]
[79]
[93-106]
[138-159]
Then, instead of "productions marked with 'ws: explicit'", you might say "productions in the [whatever] group".
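The (*ContentChar) suggestion can be checked mechanically at the single-character level. The following is a minimal Python sketch, not anything from the spec: it models each "Char - [...]" class as a predicate (the helper names are hypothetical), assumes CommonContent is referenced only from [97]-[99], and compares the original and factored definitions over a sample of code points rather than the full XML Char production. Because both sides exclude the same characters from Char, restricting the domain does not affect the comparison.

```python
def char_minus(excluded):
    """Model 'Char - [excluded]' as a predicate over single characters."""
    return lambda c: c not in excluded

# Original productions [151]-[153].
element_content_char = char_minus("{}<&")
quot_attr_content_char = char_minus('"{}<&')
apos_attr_content_char = char_minus("'{}<&")

# Proposed: the shared part [^"'{}<&] moves into CommonContent.
common_content_char = char_minus("\"'{}<&")

def proposed_elem_char(c):
    # [99] DirElemContent ::= ... | '"' | "'" | CommonContent
    return c in "\"'" or common_content_char(c)

def proposed_quot_char(c):
    # [97] QuotAttrValueContent ::= EscapeQuot | "'" | CommonContent
    return c == "'" or common_content_char(c)

def proposed_apos_char(c):
    # [98] AposAttrValueContent ::= EscapeApos | '"' | CommonContent
    return c == '"' or common_content_char(c)

# Sample of XML Chars: tab, LF, CR, then #x20 up to just below the surrogates.
SAMPLE = [chr(cp) for cp in (0x9, 0xA, 0xD)] + [chr(cp) for cp in range(0x20, 0xD800)]

def mismatches(original, proposed):
    """Characters accepted by one definition but not the other."""
    return [c for c in SAMPLE if original(c) != proposed(c)]

for name, original, proposed in [
    ("ElementContentChar", element_content_char, proposed_elem_char),
    ("QuotAttrContentChar", quot_attr_content_char, proposed_quot_char),
    ("AposAttrContentChar", apos_attr_content_char, proposed_apos_char),
]:
    print(name, "mismatches:", mismatches(original, proposed))
```

An empty mismatch list for each pair means the factored productions accept exactly the same single characters as the originals; escapes such as EscapeQuot are untouched by the change.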
> You've removed the phrase "except that grammar symbols always have initial
> capital letters" even though it's still true, still different from the
> notation used in the XML spec, and still unexplained.

MSM and I should have a conversation about this. I'm curious as to why, in the XML spec, there is:

[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

vs.

[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')

Section 6 states "Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter." But it seems like a fuzzy line.

I would like to be as consistent with the XML spec as possible. How much trouble would it cause to change capitalization on some symbol names?

> Note that the XML spec's grammar has production comments, so it's not their
> *presence* here that's different, but rather their normative power.

I think they're normative in the XML spec too, though they're not used to help define the grammar itself. In any case, the paragraph explaining the comments was not meant to be comparative with the XML spec.
Scott Boag writes:
> I'm curious as to why, in the XML spec, there is:
> [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
> vs.
> [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
> Section 6 states "Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter." But it seems like a fuzzy line.

The XML WG may have made errors in drawing the line, but whether a particular language over the alphabet of Unicode characters is regular or not, in the technical sense, should not be fuzzy.

The language defined by a non-terminal in a regular right-part grammar is regular if and only if non-terminals on the right-hand side can be replaced (iteratively) until there is nothing there but terminal symbols (in this case, Unicode characters or expressions like [a-zA-Z]). This, in turn, is the case if there is no recursion in the grammar rules (no left-hand symbol turns up directly or indirectly in its own right-hand side). If all the symbols in a right-hand side are known to denote regular languages, then the symbol on the left-hand side also denotes a regular language; if any symbol on the right denotes a non-regular language, then the language of the left-hand side symbol is non-regular.

Consider the examples above. The language defined by using the XML 1.0 grammar with 'doctypedecl' as start symbol (I'll just call this 'the language of doctypedecl' or 'the language denoted by doctypedecl' in what follows) is not regular, because a doctypedecl can contain an internal subset (intSubset), which can contain element declarations (via markupdecl and elementdecl), which can contain content models for elements with element content (via contentspec and children). Such content models are not regular, because they require that opening and closing parentheses match; there is indirect recursion in both choice and seq, through cp.
(Content models for mixed content are regular because they can't have nested groups.) So 'doctypedecl' is spelled with an initial lowercase letter. 'VersionInfo', by contrast, has an initial uppercase letter because it denotes a regular language: it can be written

[24] VersionInfo ::= (#x20 | #x9 | #xD | #xA)+ 'version' (#x20 | #x9 | #xD | #xA)* '=' (#x20 | #x9 | #xD | #xA)* ("'" '1.0' "'" | '"' '1.0' '"')

which has no non-terminals on the right-hand side.

That may not be the 'why' you had in mind, though. The distinction between regular and non-regular non-terminals was the result of a compromise. Someone (I'll leave the protagonists anonymous) proposed that it would be easier to see how to write an XML parser if we distinguished the lexical level and the grammar level explicitly, so that interested parties could see at a glance where one might plausibly draw the line between a lexer and a parser. Even if an implementor later decided to move that line, it would be convenient to have an initial suggestion.

Someone else objected that different implementors might choose to draw the lexer/parser line in different places, and that trying to prescribe it, or even making a specific suggestion, was a waste of time.

In the end, we agreed to distinguish regular from non-regular sublanguages, on the theory that conventional lexers typically recognize only regular languages. The initial capital letter effectively says "If you want to, you can conveniently treat this non-terminal as a terminal symbol recognized by the lexer"; perhaps even more important, the initial lowercase letter says "If you were thinking of treating this as a terminal symbol, using a conventional lexer design, then forget it".

I gather that when XPath 1.0 was done, the XSL WG had no one who thought that this was a worthwhile way to help implementors or readers. Myself, I find it helpful but not essential.
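The no-recursion test described in this reply can be sketched in a few lines of Python. The grammar below is a simplified, hypothetical excerpt of XML 1.0 that records only which non-terminals each right-hand side mentions: terminals are dropped, some alternatives (ExternalID, DeclSep, AttlistDecl, and so on) are omitted, and leaf symbols such as S, Name, and VersionNum are given empty right-hand sides for brevity. The function names are illustrative, not from any spec or library.

```python
# Simplified excerpt of XML 1.0: each non-terminal maps to the
# non-terminals referenced on its right-hand side (terminals omitted).
GRAMMAR = {
    "doctypedecl": ["S", "Name", "intSubset"],
    "intSubset": ["markupdecl"],
    "markupdecl": ["elementdecl"],
    "elementdecl": ["S", "Name", "contentspec"],
    "contentspec": ["Mixed", "children"],
    "Mixed": ["S", "Name"],
    "children": ["choice", "seq"],
    "cp": ["Name", "choice", "seq"],
    "choice": ["S", "cp"],
    "seq": ["S", "cp"],
    "VersionInfo": ["S", "Eq", "VersionNum"],
    "Eq": ["S"],
    "S": [],
    "Name": [],
    "VersionNum": [],
}

def reachable(grammar, start):
    """Non-terminals reachable from start in one or more rewriting steps."""
    seen = set()
    stack = list(grammar[start])
    while stack:
        sym = stack.pop()
        if sym not in seen:
            seen.add(sym)
            stack.extend(grammar[sym])
    return seen

def substitutable(grammar, start):
    """True if start can be rewritten until only terminals remain,
    i.e. no non-terminal reachable from start (or start itself) is recursive."""
    for sym in {start} | reachable(grammar, start):
        if sym in reachable(grammar, sym):
            return False
    return True

for sym in GRAMMAR:
    kind = "regular" if substitutable(GRAMMAR, sym) else "non-regular"
    print(f"{sym:12} {kind}")
```

On this excerpt, every symbol the check classifies as regular (Mixed, VersionInfo, Eq, S, Name, VersionNum) has an initial capital, and every recursive or recursion-reaching symbol (doctypedecl, children, cp, ...) is lowercase, which matches the Section 6 convention.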
I agree that the change in capitalization should occur. However, it will need to be done during a period of spec freeze, as it will affect many of the documents. I think the other comments in this list are pretty much editorial (I agree with most of them), and I will apply them at the editor's discretion.
Re changing capitalization: Note that, with the XQuery grammar, there's a subtlety in determining whether a symbol derives a regular language. For example, consider ModuleDecl:

ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator

All the symbols on the RHS derive regular languages, and this rule merely concatenates them, so it would appear to derive a regular language also. However, because implicit whitespace is allowed/required between those symbols, and because that can include comments, and because comments nest, the language derived by ModuleDecl is actually non-regular.
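The nesting is what breaks regularity: skipping a comment requires an unbounded depth counter, which no finite automaton has. The following is a minimal Python sketch, assuming XQuery's "(:" ... ":)" comment delimiters and simplifying whitespace to space, tab, CR, and LF; the function names are hypothetical.

```python
def skip_comment(text, start):
    """Return the index just past the (possibly nested) comment at text[start]."""
    if not text.startswith("(:", start):
        raise ValueError(f"no comment starts at index {start}")
    depth = 0  # unbounded counter: the part a finite automaton cannot track
    i = start
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")

def skip_ignorable(text, i):
    """Skip the implicit whitespace and comments allowed between two tokens."""
    while i < len(text):
        if text[i] in " \t\r\n":
            i += 1
        elif text.startswith("(:", i):
            i = skip_comment(text, i)
        else:
            break
    return i

source = "module (: outer (: inner :) still outer :) namespace"
print(source[skip_ignorable(source, len("module")):])  # namespace
```

Between "module" and "namespace" the sketch must match each "(:" with its own ":)", so the gap between two otherwise regular tokens already needs counting.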
Editorial change notes:

> Instead, you might say
> "... may help a parser differentiate various constructs"

Done.

> I think that the EBNF productions and the explanation of the EBNF notation
> should each be split into a separate section.

Notation section moved to a subsection following the EBNF.

> Pulling some of this together, how about restructuring the preamble into
> something like this:

Done.

> I think these would be clearer if you didn't split each of the choices

I don't think the improvement warrants a change at this point.

> [142] StringLiteral
> Change ('"' '"') to EscapeQuot.
> Change ("'" "'") to EscapeApos.

Done.

> If you factor out the overlap of ElementContentChar, QuotAttrContentChar,
> and AposAttrContentChar, and push it over to CommonContent, I think the
> result is simpler.

I don't think the improvement warrants a change at this point.

> I wonder if it would help the reader if the "ws: explicit" productions (and
> the intervening ones that don't care whether they're "ws: explicit" or not)
> were put together in a group.

I don't think the improvement warrants a change at this point.
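With the [142] StringLiteral change, a doubled delimiter inside a literal is an EscapeQuot ('""') or EscapeApos ("''"). The following is a minimal Python sketch of recognizing and decoding such literals. It assumes a simplified StringLiteral that leaves out any entity or character references the real production may also allow; the names are hypothetical.

```python
import re

# Simplified StringLiteral: inside "...", '""' is EscapeQuot;
# inside '...', "''" is EscapeApos. Entity/character references omitted.
QUOT_LITERAL = r'"(?:""|[^"])*"'
APOS_LITERAL = r"'(?:''|[^'])*'"
STRING_LITERAL = re.compile(QUOT_LITERAL + "|" + APOS_LITERAL)

def string_literal_value(token):
    """Return the value denoted by a (simplified) StringLiteral token."""
    if STRING_LITERAL.fullmatch(token) is None:
        raise ValueError(f"not a string literal: {token!r}")
    quote = token[0]
    # Strip the delimiters, then collapse each escaped (doubled) delimiter.
    return token[1:-1].replace(quote * 2, quote)

print(string_literal_value('"say ""hi"""'))  # say "hi"
print(string_literal_value("'it''s'"))       # it's
```

Because a successful match guarantees that every delimiter inside the literal appears as part of a doubled pair, a plain left-to-right replacement recovers the value.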
(In reply to comment #5)
> derived by ModuleDecl is actually non-regular.

Based on this, which is absolutely true, and having thought about it a bit more, I withdraw my support for the proposal that the capitalization be changed. I don't think it would be an improvement.
> I withdraw my support for the proposal that the capitalization be changed.

I'll just point out that I didn't propose that the capitalization be changed, merely that the difference from the XML spec be acknowledged and explained.
A joint meeting of the Query and XSLT working groups considered this comment on July 20, 2005. The WGs agreed to resolve these editorial issues as listed in my previous comment. If you do not agree with this resolution, please add a comment explaining why. If you wish to appeal the WG's decision to the Director, then change the Status of the record to Reopened. If we do not hear from you in the next two weeks, we will assume you agree with the WG decision.