E.1.4.2 Erases

Notation
  "serialization followed by construction a new data model value"
  Insert "of" before "a".

Notation / rule 1 / judgment_form_declaration 1
  "statEnv |- Value1 erases to Value2"
  It seems like statEnv isn't needed for these judgments.

Sem / rule 4
  "statEnv |- Value simply erases to String"
  "attribute AttributeName of type TypeName { Value }"
  s/Value/SimpleValue/

Sem / rule 8 / conclusion
  "processing-instruction QName"
  s/QName/NCName/g

Sem / rule 9 / conclusion
  "statEnv |- SimpleValue erases to text { String }"
  Grammatically, a SimpleValue isn't actually a Value, and so this doesn't conform to the judgment form declaration.
#1 Notation
// Fixed.

#2 Notation / rule 1 / judgment_form_declaration 1
// Removed the unnecessary statEnv in those judgments.

#3 Sem / rule 4
// Fixed.

#4 Sem / rule 8 / conclusion
// Fixed.

#5 Sem / rule 9 / conclusion
Looking at the grammar in 2.3.1 Formal values, a simple value is indeed a value.

- Jerome
(In reply to comment #1)
> #5 Sem / rule 9 / conclusion. Looking at the grammar in 2.3.1 Formal values,
> a simple value is indeed a value.

Has the grammar changed since CR?

While it's true that the language generated by SimpleValue is a subset of the language generated by Value, this doesn't justify saying that every SimpleValue is a Value. Consider that, e.g., the language generated by VarName is a subset of the language generated by NameTest, but you wouldn't say that every VarName is a NameTest.

However, this is mostly beside the point. The question is, if a judgment form is declared as

    statEnv |- Value1 erases to Value2

then, in judgments of that form, what can appear in the 'Value1' position? The FS doc isn't so formal as to say, but it seems to me that the only sensible answer is: phrases that can be derived from the 'Value' non-terminal. Since Value doesn't derive SimpleValue, I'm saying that

    statEnv |- SimpleValue erases to text { String }

doesn't conform to the declaration of the judgment form.
Michael,

As you point out yourself, the language generated by SimpleValue is a subset of the language generated by Value. When you instantiate the rule, there is no such thing as a 'SimpleValue'; you will get a value in the language generated by SimpleValue. So I really do not understand what the problem is here.

- Jerome
(In reply to comment #3)
> When you instantiate the rule, there is no such thing as a 'SimpleValue',

I disagree. I suppose it comes down to how you interpret section 2.1.2: what is the nature of the "objects" that are bound to the italicized non-terminals in judgments, and which ones can be "substituted legally" for a given non-terminal?

My interpretation is that:

-- all such objects are syntactic objects (derivation trees, roughly speaking): objects defined by, and structured according to, the various grammars of the spec; and

-- the objects that can be substituted legally for a given non-terminal are those whose root symbol is, or can be derived from, that non-terminal. (E.g., an italicized SimpleValue1 can be bound to any syntactic object derivable from the non-terminal SimpleValue.)

I'll admit that this interpretation raises questions. (In fact, I posed such a question in Bug 1535. I don't think it's been answered yet.) However, it's simple, consistent, and seems to fit the available data fairly well.

I believe your interpretation is something like the following:

-- The objects that can be bound to italicized non-terminals: some of these objects are syntactic, but some aren't (even though the FS has grammar rules that appear to define them as such). Instead, they are abstract entities defined by external specs like the XQuery/XPath Data Model and XML Schema.

-- The particular objects that can be substituted legally for a given non-terminal: if the non-terminal belongs to the XQuery/XPath or Core grammars (I'm guessing), then the legal objects are perhaps something like I described in my interpretation; if the non-terminal belongs to the Formal grammar, the legal objects are those abstract entities that satisfy the definition of some term similar in spelling to the non-terminal. (E.g., an italicized SimpleValue1 can be bound to any entity that satisfies section 2.3.1's definition of "simple value".)

The FS does a lot to reinforce my interpretation that Formal objects are syntactic:

-- It gives productions for Formal non-terminals, in exactly the same way as for other non-terminals. (If a Formal non-terminal were to be treated as a reference to a definition of abstract entities, why supply a production for it?)

-- Section 2.3.2 gives nice concrete phrases apparently derived from the Value non-terminal. Various sections in 2.4 do likewise for type-related non-terminals.

-- Many judgments contain patterns that use syntax from Formal productions, in the same way that they use syntax from Core productions. (If you had a premise judgment in which a Formal non-terminal were bound to an abstract entity, how would you match that against conclusion judgments that have patterns involving Formal syntax?)

As far as I can tell, saying that Formal objects aren't syntactic will make the inference process a lot more complicated, and with no compensating benefit.

> you will get a value in the language generated by SimpleValue.

This is an interesting statement. The phrase "the language generated by [a non-terminal]" would normally have a purely syntactic meaning, where each member of that language is a sequence of symbols or tokens or characters (depending on how one defined the boundary of the grammar). But you seem to be saying that each member is a value, by which I'm pretty sure you mean an XDM value.
Michael,

I admit to some confusion here. The grammar for formal values starts like this:

  Value ::= Item | (Value "," Value) | ("(" ")")
  Item  ::= NodeValue | AtomicValue
  ...

And the grammar for simple values looks like this:

  SimpleValue ::= AtomicValue | (SimpleValue "," SimpleValue) | ("(" ")")

So with the simple (and traditional) definition you have in mind, that still results in all simple values also being formal values. Concretely, (1,"3",4) can be derived by both grammars.

I guess I still may be missing the problem you are trying to raise.

- Jerome
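P.S. If it helps, here is a rough sketch of that subset claim. It is only a toy model of my own (nothing like the FS machinery): I treat the value as the comma-separated word 1,"3",4 over a token list, and I assume atomic values are just numeric and string literals, with NodeValue left out entirely.

  # Toy recognizers for the two quoted non-terminals, over a token list.
  # Assumption: AtomicValue covers only integer and string literals here,
  # and NodeValue is omitted, so Item reduces to AtomicValue in this model.
  def is_atomic(tok):
      return tok.isdigit() or (tok.startswith('"') and tok.endswith('"'))

  def is_item(tok):
      return is_atomic(tok)          # Item ::= NodeValue | AtomicValue (NodeValue omitted)

  def in_language(tokens, leaf_test):
      # X ::= Leaf | (X "," X) | ("(" ")")
      # i.e. the empty sequence "()" or a comma-separated list of leaves.
      if tokens == ["(", ")"]:
          return True
      return (len(tokens) % 2 == 1 and
              all(leaf_test(t) if i % 2 == 0 else t == ","
                  for i, t in enumerate(tokens)))

  word = ["1", ",", '"3"', ",", "4"]           # the word  1,"3",4
  print(in_language(word, is_item))            # True: in L(Value)
  print(in_language(word, is_atomic))          # True: in L(SimpleValue)

Since the Item alternative includes AtomicValue, any word accepted as a SimpleValue word is also accepted as a Value word, which is all the subset claim needs.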
(In reply to comment #5)
> So with the simple (and traditional) definition you have in mind, that still
> results in all simple values also being formal values.

No, I covered that in the first para of Comment #2: the fact that L(SimpleValue) is a subset of L(Value) doesn't justify saying that every SimpleValue is a Value.

But that's beside the point. The pertinent question is, what things can legally be bound to an italicized non-terminal (e.g., SimpleValue) in a judgment?

My answer is: any syntactic object that is actually derived from the (SimpleValue) non-terminal.

I believe your answer is: anything that could have been derived from the (SimpleValue) non-terminal, whether or not it actually was so derived. (Perhaps each "thing" is a member of the language generated by the (SimpleValue) non-terminal [a series of tokens or characters, which doesn't know how it was derived], or perhaps each "thing" is an abstract value from a value-space.)

Because Value does not derive SimpleValue (or vice versa), my answer says that you can't bind a SimpleValue object to a Value italicized word (or vice versa). Thus, a Value italicized word in one judgment will never unify with a SimpleValue italicized word in another judgment. (So, e.g., Sem / rule 9 / conclusion can't match Sem / rule 3 / premise 1, though you presumably want it to.)

With (what I believe to be) your answer, there are things (strings of tokens, or abstract values) that can be legally bound to both Value and SimpleValue italicized words, and so such words can unify, and the judgments containing them can match. My problems with that answer are:

-- it's a complication without a counterbalancing benefit, and

-- there's lots of evidence that the inference process does indeed operate on syntactic objects, not strings of tokens or abstract values.
We may be talking at cross purposes here...

The interpretation I have for a grammar is that it derives a (possibly infinite) language of "words" that are accepted by that grammar. Here each of those "words" corresponds to a data model value. For instance:

  (1,2,3) is a word that corresponds to a sequence of 3 integers 1, 2 and 3, and is a valid instance of the formal value grammar.

Can you give an example of a simple value which won't be a formal value?

- Jerome
(In reply to comment #7)
> The interpretation I have for a grammar is that it derives a (possibly
> infinite) language of "words" that are accepted by that grammar.

Yup. (You could also call them "sentences".)

> Here each of those "words" corresponds to a data model value.

Yes, if you start with the Value non-terminal. (For "correspond to", one might also say "denotes", "represents", or "means".)

> For instance:
>
> (1,2,3) is a word that corresponds to a sequence of 3 integers 1, 2 and 3,
> and is a valid instance of the formal value grammar.

Actually, you can't derive (1,2,3) from Value, because of the parentheses, but you can derive 1,2,3 *provided* you assume that the Decimal symbol can derive 1, 2, and 3. (There's no production for Decimal, so the grammar would appear to be treating it as a terminal symbol. However, examples like 42 of type xs:integer are clearly treating it as a non-terminal that can derive 42 (etc).) But yes, I agree that the Value 1,2,3 corresponds to the value you described.

> Can you give an example of a simple value which won't be a formal value?

Nope. Given the definitions of "value" and "simple value" in Section 2.3.1, it's clear that every simple value is a value. (However, that's an assertion about values in XS value spaces, whose only connection to Formal syntactic objects is via the (not actually defined) "corresponds to" relation.)
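(To spell out the kind of derivation I mean above, and hedging on the productions I'm eliding through AtomicValue, assuming AtomicValue can expand to a bare Decimal and that Decimal derives the digit strings, the chain would run roughly:

  Value => Value , Value
        => Item , Value
        => AtomicValue , Value   =>*  1 , Value
        => 1 , Value , Value
        => 1 , Item , Value      =>*  1 , 2 , Value
        =>* 1 , 2 , 3

An analogous chain starting from SimpleValue yields the same terminal word, but goes through AtomicValue directly rather than through Item.)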
Hi Michael,

So are we on the same page? I don't see a problem in using a SimpleValue in the rules for that case.

Please let me know if you agree, and if we can close bugs #1804 and #1803 on that ground.

Thanks a lot,
- Jerome
No. I answered your question (from comment #7), but I think the question is irrelevant.

It seems like you're thinking of 'simple value' and SimpleValue as more-or-less the same thing, so that assertions about the former are ipso facto true of the latter. Well, they aren't (as I indicated in the last sentence of comment #8). If you really do think they are, please say so.

In comment #6, I asked a pertinent question and supplied two different answers, with conflicting results for the inference rule in question. Please do me the courtesy of indicating whether the second answer correctly outlines your position. Maybe then we can make some progress.
Dear Michael,

I did not mean to be discourteous. I apologize if I came across that way. I was really hoping to pinpoint what I thought would take us out of this pickle.

Let me go back to #6 and your question. I am not sure I understand all of it, so I will start with some comments and questions. You write:

<<
The pertinent question is, what things can legally be bound to an italicized non-terminal (e.g., SimpleValue) in a judgment?
>>

Yes, I agree.

<<
My answer is: any syntactic object that is actually derived from the (SimpleValue) non-terminal.
>>

Yes, I think I agree, although I am not completely sure what you mean by 'derive'. More precisely, a non-terminal defines a language of syntactic objects. Every syntactic object in that language can be bound to that non-terminal in a judgment.

<<
I believe your answer is: anything that could have been derived from the (SimpleValue) non-terminal, whether or not it actually was so derived.
>>

I am not sure I understand the distinction with the previous statement. There may be many different non-terminals that describe the same language, in which case syntactic objects in that language could be bound to either non-terminal. (E.g., MyIntList1 ::= Integer* and MyIntList2 ::= Integer+ | () describe the same set of syntactic objects.)

<<
Because Value does not derive SimpleValue (or vice versa), my answer says that you can't bind a SimpleValue object to a Value italicized word (or vice versa).
>>

I do not understand what you mean by "derive" here. In your previous text, it seems that derive is a relationship between objects and non-terminals. Here you are talking about two non-terminals. My point in comments #5 and #7 is that every syntactic object that "derives from" (or can be bound to) SimpleValue also derives from Value. In my view, this observation is the ground for why using SimpleValue here is a proper use.

<<
With (what I believe to be) your answer, there are things (strings of tokens, or abstract values) that can be legally bound to both Value and SimpleValue italicized words, and so such words can unify, and the judgments containing them can match. My problems with that answer are:
-- it's a complication without a counterbalancing benefit, and
-- there's lots of evidence that the inference process does indeed operate on syntactic objects, not strings of tokens or abstract values.
>>

Here again, I am not sure I understand the connection. I do believe my interpretation as stated above works on 'syntactic objects' (e.g., 1,2,3 is a syntactic object). And I believe the formal semantics is consistent here on that interpretation.

I sincerely hope this more detailed set of responses will help us identify where we are not on the same page.

All the best,
- Jerome
Okay, let me see if I can clarify some things.

Re my use of "derive": formal language theory defines the "derive" relation on sequences of symbols; A derives B (within a grammar G) iff (roughly speaking) you can get from A to B by applying productions from G. I'm using exactly that sense when I say that 'Value' does not derive 'SimpleValue': you can't start with the 'Value' symbol, apply productions from the Formal grammar, and end up with a 'SimpleValue' symbol.

I also use "derive" in a closely allied sense: a (non-terminal) symbol 'derives' its derivation trees (i.e., the syntax trees that have that symbol at their root).

Which brings me to my next clarification. As I said in comment #4, by "syntactic object", I mean 'derivation tree' (or 'syntax tree'). Whereas it looks like you're taking it to mean the end result of the derivation, i.e., a "word" (sequence of tokens or characters) in the language generated by some symbol. So I'll avoid using the term "syntactic object".

Given those clarifications, we go back to the question: what things can legally be bound to an italicized non-terminal (e.g., SimpleValue) in a judgment?

My answer is: any syntax tree that is derived from the (SimpleValue) non-terminal. Your answer is: any sequence of tokens/characters that can be derived from the (SimpleValue) non-terminal.

My answer says that there is nothing that can bind to both italicized words 'SimpleValue' and 'Value' (and so those two italicized words cannot unify); your answer says that there are lots of things that can bind to both (and so the two words *can* unify).

My objections to your answer are:

-- The FS is fairly clear that it's dealing with syntax trees, not strings of tokens or characters. See 3.2.1 and 3.2.3.

-- With your answer, I believe it's harder to prove that the rules are complete and consistent (not that anyone is trying to do such a thing), and also harder to be certain that the rules are saying what you want them to say.
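To make the difference concrete, here is a throwaway sketch. It is a toy representation of my own, invented only for illustration (it is not anything from the spec, and it elides the Decimal step): a syntax tree keeps its root symbol, while a "word" is just the leaf sequence.

  # Toy trees: (root symbol, children); a leaf is a terminal with no children.
  def tree(symbol, *children):
      return (symbol, children)

  def frontier(t):
      # The "word" a tree derives: its leaves, read left to right.
      symbol, children = t
      if not children:
          return [symbol]
      return [tok for child in children for tok in frontier(child)]

  # Two derivation trees over the same two-item sequence.
  as_value = tree("Value",
                  tree("Value", tree("Item", tree("AtomicValue", tree("1")))),
                  tree(","),
                  tree("Value", tree("Item", tree("AtomicValue", tree("2")))))

  as_simple = tree("SimpleValue",
                   tree("SimpleValue", tree("AtomicValue", tree("1"))),
                   tree(","),
                   tree("SimpleValue", tree("AtomicValue", tree("2"))))

  print(frontier(as_value) == frontier(as_simple))   # True: same word 1,2
  print(as_value == as_simple)                       # False: distinct trees

Under my answer, the italicized word Value binds only to trees whose root is Value, so the second object can never take its place; under yours, both patterns can bind the same token sequence, which is exactly what lets the two judgments unify.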
Michael:

Thanks for the response. I am now convinced we have distinct interpretations of what the meaning of non-terminals in judgments is.

Just to make sure we are on the same page, it seems that the main difference between our two interpretations is that you assume that the non-terminal names which are used in the grammar to describe the syntactic objects end up being part of the syntactic objects themselves. On the other hand, I assume that the syntactic objects are defined independently of the non-terminal names being used, and that only terminals are part of those syntactic objects.

Now, I would like to argue that my interpretation is actually consistent with the current Formal Semantics draft, and that it does not raise any specific additional issues. A few specific points on this:

About ASTs. As you point out, the processing model mentions a notion of AST. However, the specification does not make any specific assumption about what such an AST might be. In particular it does not assume that non-terminal names should be part of that AST. I believe this terminology is used since it is quite familiar to developers, but that this does not enforce any specific way to represent the syntactic objects resulting from grammar productions in the formal semantics.

The interpretation of non-terminals in the grammar is actually described in Section 2.1.2 Notations for judgments, as follows:

<<
Patterns are used to represent objects that can be constructed from a given grammar production. In patterns, italicized words correspond to non-terminals in the grammar. The name of those non-terminals is significant, and may be instantiated only to an "object" (a value, a type, an expression, etc.) that can be substituted legally for that non-terminal. For example, 'Expr' is a pattern that stands for every [XPath/XQuery] expressions, 'Expr1 + Expr2' is a pattern that stands for every addition expression, 'element a { Value }' is a pattern that stands for every value in the [XPath/XQuery] data model that is an 'a' element.

When instantiating the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the value '3' (on the right-hand side of the => symbol).
>>

I don't believe that those descriptions indicate that non-terminals are part of the objects themselves. I agree that maybe some more clarification is needed about what we mean by objects (syntactic?) here and how they relate to actual objects (values, expressions). No matter what, I think there is no completely formal way to make that relationship explicit, but if you want to suggest some text that you believe would make it clearer, that would sure be greatly appreciated.

On the technical front, I am worried that your interpretation will not work with most of the auxiliary judgments that are introduced in the spec. For instance, the section on FLWORs starts as follows:

4.8.1 FLWOR expressions

Notation

For convenience, we introduce the following auxiliary grammar productions.

[87 (Formal)] OptTypeDeclaration ::= TypeDeclaration?
[88 (Formal)] OptPositionalVar ::= PositionalVar?

And those non-terminals are used in judgments. It seems that an expression Expr considered with your interpretation will never match with those formal non-terminals, since they are not part of the main grammar.
I think this is used all over the place, and switching to your interpretation will require some fairly involved changes (including probably to the original XQuery grammar, which will be hard at this point).

Finally, I am not sure I see what current problem we are trying to fix. Is there really anything broken currently? It seems to me just a matter of style. Did I miss another technical problem?

Cheers,
- Jerome
(In reply to comment #13)
> Just to make sure we are on the same page, it seems that the main difference
> between our two interpretations is that you assume that the non-terminal names
> which are used in the grammar to describe the syntactic objects end up being
> part of the syntactic objects themselves. On the other hand, I assume that the
> syntactic objects are defined independently of the non-terminal names being
> used, and that only terminals are part of those syntactic objects.

I would quibble over some terminology, but I think that's basically it, yes.

> About ASTs. As you point out, the processing model mentions a notion
> of AST. However, the specification does not make any specific assumption
> about what such an AST might be. In particular it does not assume that
> non-terminal names should be part of that AST.

Oh, please. Can you point to a usage of "syntax tree" anywhere in which each node of the tree is *not* labelled with a symbol from the grammar? What would be the point of such a tree?

> I believe this terminology is used since it is quite familiar to developers,
> but that this does not enforce any specific way to represent the syntactic
> objects resulting from grammar productions in the formal semantics.

Well, as I've shown, if you don't "enforce a representation" for objects at least to the extent of distinguishing between syntax trees and terminal-seqs, you get conflicting interpretations of the FS. So I don't think you have the luxury of saying you'll just leave that up to the reader/implementor. (So actually, I guess this is a technical issue, not an editorial one.)

> The interpretation of non-terminals in the grammar is actually described in
> Section 2.1.2 Notations for judgments, ...
> I don't believe that those descriptions indicate that non-terminals are part
> of the objects themselves.

I'm not sure I agree. For example, it says:

    'Expr1 + Expr2' is a pattern that stands for every addition expression

Now, under your interpretation, the pattern 'Expr1 + Expr2' can be bound to any series of terminals that can be derived from the phrase 'Expr + Expr'; but such a series of terminals can occur other than as an addition expression. E.g., in the FunctionDecl

    declare function foo () as T+ external

the series of terminals

    T + external

(which can be derived from 'Expr + Expr', and thus matches the pattern in question) is definitely not an addition expression. So I think the phrasing fits with my interpretation better.

Anyway, I certainly don't believe that 2.1.2 indicates that non-terminal symbols *aren't* part of the objects either. Really, it's pretty vague on that score. (Which is why I didn't include 2.1.2 in the set of sections that indicate the FS is operating on syntax trees.)

> I agree that maybe some more clarification is needed about what we mean by
> objects (syntactic?) here and how they relate to actual objects
> (values, expressions).

I'd say more clarification is warranted. (By the way, you don't think expressions are syntactic objects?)

> No matter what, I think there is no completely formal way to make that
> relationship explicit, but if you want to suggest some text that you
> believe would make it clearer, that would sure be greatly appreciated.

Well, I think first there would have to be a decision about which interpretation is correct for the FS.

> On the technical front, I am worried that your interpretation will not work
> with most of the auxiliary judgments that are introduced in the spec.
I don't think it's a problem with auxiliary judgments per se, but the example you raise is a pertinent one. It's vexing that the XQuery grammar doesn't accommodate the FS better. Mind you, my response would be to fix the XQuery grammar rather than abandon my interpretation.

I quite agree that changing the official XQuery grammar would be pretty much impossible at this point. However, I don't have a problem with the FS using a different-but-equivalent grammar. Note that really, it already *is* using such a grammar, but it's defined implicitly in the left-hand sides of normalization rules, so nobody notices.

> Finally, I am not sure I see what current problem we are trying to fix. Is
> there really anything broken currently?

Yes. But what is broken depends on your interpretation.
Michael,

I still do not understand what's broken with the current formal semantics spec. I think the interpretation used is pretty clear. I do not believe there is any indication that non-terminals are part of the various objects being processed through inference rules, and I think that interpretation is used consistently throughout the document.

Is there any specific part of the spec that you believe is broken on this? Or any specific change that you think would improve the situation?

Thanks,
- Jerome
(In reply to comment #15)
> I still do not understand what's broken with the current formal semantics spec.

The primary thing that's broken is that the intended interpretation of "objects" (that can be bound to italicized words) isn't specified.

If (as you appeared to indicate in comment #13) you think the reader is free to choose an interpretation ("this does not enforce any specific way to represent the syntactic objects"), I refer you to the conflicting results of comment #6.

If, on the other hand, you think that the reader is *not* free to choose an interpretation, and must use the one that you favour, then I would argue that it's a poor choice. Under that interpretation, when you write a judgment, it's difficult to be certain that it only applies when you want it to, since it might unintentionally apply to some string of terminals that happens to be in the language generated by one of the judgment's nonterminals, even though those tokens aren't actually derived from that nonterminal in the syntax tree. And it's similarly difficult for me to determine whether a judgment applies when you don't want it to. So I can't currently point to a rule that would break under your interpretation, but that's because it's prone to subtle, difficult-to-find breakage.

If, on the third hand, the FS embraces the interpretation I favour, then E.1.4.2 / Sem / rule 9 is broken, and probably a few others. (And I would work on ways to fix them.)

> I think the interpretation used is pretty clear.

Given this conversation, I don't understand how you can think that. Can you indicate a place where you think it's clearly conveyed? (The only text you've quoted is from 2.1.2, and you have my response to that.)

> I do not believe there is any indication that non-terminals are part
> of the various objects being processed through inference rules,

And I believe that use of the phrase "syntax tree" *is* such an indication.

> and I think that interpretation is used consistently throughout the document.

That's difficult to disprove, since it's a much less tractable interpretation.
Michael,

Again, trying to find some common ground on this thread. Here are a few additional thoughts.

(a) I want to confirm that indeed the Formal Semantics specification has been written with the intent that non-terminals do not participate in the 'syntactic object' they describe.

(b) We do not have any concrete evidence that the current interpretation is broken.

(c) I believe I understand your concern that some inference rules may be triggered by being matched with a syntactic object that may not be relevant for that rule. That said, I am not quite as concerned as you may be. The formal semantics processes essentially 3 kinds of objects: values, types, and expressions. I believe judgments are never such that they may bind two of those kinds of objects for the same pattern, so there should not be any case where an object of one kind can be matched instead of an item of another kind (e.g., the expression 1,2,3 will never be confused with the value 1,2,3). Then for one such kind of objects, say values, the syntactic objects are independent of the grammar production, and the grammar can be seen as a constraint on which subset of the objects can be matched. Obviously this reasoning does not represent a formal proof, just the basic intuition.

(d) Whatever interpretation is chosen, addressing your concern fully will require some significant work (a formal proof and a detailed review in either case).

Now let me try and propose a plan to move us forward, in two steps.

First step:

* For now, and considering (a) and (b) above, I would suggest keeping the intended interpretation, since this is the intended semantics and the spec is based on that interpretation.

* We should clarify the intended semantics by explaining the role/semantics of non-terminals in the corresponding notation section. We can then close bugs 1804 and 1803.

Second step, assuming someone has the interest and resources to do this:

* Review the current interpretation to see if anything is broken with it, and if so, post a technical item about this on bugzilla.

Let me know how that sounds,
- Jerome
(In reply to comment #17)
> (a) I want to confirm that indeed the Formal Semantics specification has been
> written with the intent that non-terminals do not participate in the
> 'syntactic object' they describe.

(I can't help thinking that whoever wrote "syntax tree" in 3.2.1 and 3.2.3 did not intend that interpretation.)

> (b) We do not have any concrete evidence that the current interpretation is
> broken.

Okay, here's some. Consider the query body:

    1 + 2 * 3

Under your interpretation, this matches the left-hand side of 4.4 / Norm / rule 1:

    Expr1 ArithOp Expr2

in two different ways:

    Expr1 = 1        ArithOp = +    Expr2 = 2 * 3
    Expr1 = 1 + 2    ArithOp = *    Expr2 = 3

leading (ultimately) to two different values, only the first of which is correct.
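If it helps to see the double match spelled out, here is a throwaway sketch. It is my own toy approximation of L(Expr) over a token list (nothing like the real grammar: a literal, or Expr ArithOp Expr, with only three operators), but it is enough to exhibit both decompositions.

  # Toy approximation of L(Expr): a numeric literal, or Expr ArithOp Expr.
  ARITH_OPS = {"+", "-", "*"}        # just enough of ArithOp for the example

  def is_expr(tokens):
      if len(tokens) == 1:
          return tokens[0].isdigit()
      return any(tokens[i] in ARITH_OPS
                 and is_expr(tokens[:i]) and is_expr(tokens[i + 1:])
                 for i in range(1, len(tokens) - 1))

  def matches(tokens):
      # Every way of instantiating the pattern  Expr1 ArithOp Expr2
      # against the bare token sequence, with no syntax tree to consult.
      return [(tokens[:i], tokens[i], tokens[i + 1:])
              for i in range(1, len(tokens) - 1)
              if tokens[i] in ARITH_OPS
              and is_expr(tokens[:i]) and is_expr(tokens[i + 1:])]

  print(matches(["1", "+", "2", "*", "3"]))
  # [(['1'], '+', ['2', '*', '3']), (['1', '+', '2'], '*', ['3'])]

Under the syntax-tree reading there is only the first decomposition, because the XQuery grammar has already committed to the shape in which * binds tighter than +.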
(By the way, in F.1.4.2 / Sem / rule 2 / premise 2, you should delete "statEnv |-".)
Revert apparent vandalism (status change).