This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 1535 - [FS] editorial: 2.3.1 Formal values
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Formal Semantics 1.0
Version: Last Call drafts
Hardware: All All
Importance: P2 minor
Target Milestone: ---
Assignee: Jerome Simeon
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2005-07-10 06:32 UTC by Michael Dyck
Modified: 2007-01-16 17:21 UTC

Description Michael Dyck 2005-07-10 06:32:03 UTC
2.3.1 Formal values

[14] ProcessingInstructionValue ::=
"processing-instruction" QName "{" String "}"
    Change "QName" to "NCName".

[17] NamespaceBinding ::=
"namespace" NCName "{" String "}"
    I think you'll be better off if you change "String" to "AnyURI".

'In that grammar, "String" indicates the value space of xs:string,
"Decimal" indicates the value space of xs:decimal, etc.'
    Please clarify.
    Are you saying that, for example, the symbol 'String' "derives" a
    language of (abstract, non-syntactic) values (namely, the values in
    the value space of xs:string)?
    Or rather, that it derives (though by unspecified productions) a
    conventional language of character-sequences, each of which you
    identify with a value from that value space?

    Examples like these:
        text { "42" }             # i.e., String derives '"42"'
        10 of type xs:integer     # i.e., Decimal derives '10'
    certainly appear to assume the latter. But it seems to just forestall
    the inevitable matter of the syntactic-to-abstract mapping. And
    presumably we are to assume that the standard functions (on which
    the dynamic semantics are built) take and return abstract values
    rather than syntactic denoters thereof.

element weight of type xs:integer { text { "42" } } {}
    This does not conform to production "[9 (Formal)] ElementValue":
    if the final braces are there, then there has to be at least
    one NamespaceBinding between them.

"The same rule about constructing sequences apply"
    s/rule/rules/ or s/apply/applies/
    Which rule(s)? (Give a cross ref?)

(10, (1, 2), (), (3, 4))
    Actually, this isn't a "value described by that grammar", because the
    grammar has no production for parenthesized values.
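
A minimal sketch of the point, assuming nested sequences are modelled as Python tuples: XQuery sequences are flat, so the parenthesized notation denotes a five-item sequence, and the parentheses belong to the expression rather than to the value. The `flatten` helper is hypothetical and not part of the FS.

```python
def flatten(seq):
    """Flatten arbitrarily nested tuples into one flat list of items."""
    items = []
    for item in seq:
        if isinstance(item, tuple):
            # A nested (possibly empty) sequence contributes its items, not itself.
            items.extend(flatten(item))
        else:
            items.append(item)
    return items

nested = (10, (1, 2), (), (3, 4))
flat = flatten(nested)
print(flat)
```

The flat list is what the value grammar can actually describe; the nested tuple is only a way of writing it.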

"When the context is clear, we may omit the type annotation on literal
values."
    Hm. You've already said:
        "Atomic values without type annotations are assumed to have a type
        annotation which is the base type for the corresponding value."
    Does this new sentence add something?
Comment 1 Michael Dyck 2006-04-14 02:23:49 UTC
I don't think the third point ("Please clarify") has been addressed in the CR.
Comment 2 Jerome Simeon 2006-04-17 00:41:23 UTC
Added the following sentence to clarify:

<<
In that grammar, "String" represents the value space of xs:string, "Decimal" represents the value space of xs:decimal, etc. In each case, those non-terminals stand for a set of syntactic objects each of which correspond to a value in the corresponding value space. For instance "String" stand for all string values, and are written in this specification as "", "a", "John", etc.
>>

Thanks for pointing out that we overlooked that comment.
- Jerome
Comment 3 Michael Dyck 2006-04-17 05:49:20 UTC
Well, the added text helps, I guess, but it's still pretty loose, especially the last sentence. Consider the following (quasi-ER) diagram, established by the first two sentences:

      Syntactic Realm                         Abstract Realm
    -------------------                     ------------------
    
        non-terminal     --- represents -->    value space
       (e.g. String)                  (e.g., value space of xs:string)
             |                                      |
             |                                      |
         stands for                            [contains]
             |                                      |
             *                                      *
             V                                      V
       syntactic object  --- corresponds to -->   value

At the end of the second sentence, the phrase "the corresponding value space" breaks this somewhat, using the verb "corresponds" where the first sentence had set up "represents". In the last sentence, you say "string values", which I would take to mean "values in the xs:string value space", but you almost certainly don't mean that; rather you mean something like "String syntactic objects".

So I suggest that you:

-- Change "represents" to "corresponds to". (This will help delineate the syntactic/abstract divide.)

-- Change "stands for" to "derives". Or if that's too technical, then "generates".

-- Change "syntactic object" to something else, since, as we've seen elsewhere, it can be misinterpreted. Or else define it.

-- Change the last sentence to:
      For instance, the non-terminal 'String' derives a set of syntactic
      objects, which appear in examples as "", "a", "John", etc.; each one
      corresponds to a string value in the xs:string value space. 

.....

Re the "stands for/derives/generates" relation in this particular context: it's odd that this relation isn't defined by a set of productions. Why aren't there productions something like
    String ::= '"' Char* '"'
    Decimal ::= Digit+ ( "." Digit+ )?
Maybe because nobody wanted to supply all 19 rules. Maybe because, although String derives things that look like StringLiterals, and Decimal derives things that look like DecimalLiterals or IntegerLiterals, what about symbols like Base64Binary and NOTATION? What do the things they derive look like?

My advice is to drop the "looks like something familiar" approach, as it skirts too close to circularity. Instead, use an unfamiliar notation for things derived from AtomicValueContent. E.g.
    String  ::= '`' {Any literal in the ·lexical space· of xs:string}  '`'
    Decimal ::= '`' {Any literal in the ·lexical space· of xs:decimal} '`'
etc. (And you could just say "etc." since the pattern is pretty obvious.) (Here, "literal" is used in the XML Schema sense.)

Note that the particular appearance of such things has almost no effect on the spec; I think you'd only have to make some minor tweaks to the examples in 2.3.2. And of course, it has no effect on the semantics of the XQuery language.  But it would reassure readers that everything derived from AtomicValueContent (not just String and Decimal) is a syntactic object, and has a straightforward "concrete" presentation. Which would answer the question that I asked in the original comment. Moreover, because they are defined in terms of literals in a lexical space, the "corresponds to" relation is clearer: each thing-derived-from-AtomicValueContent corresponds to whatever value is denoted by the literal it contains.

(Someone might ask: what if the literal contains an occurrence of the delimiting character '`'? I think it would be sufficient to answer that escaping mechanisms exist, and it isn't really necessary for the FS to specify a particular one.)
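
To illustrate the "corresponds to" relation under the backtick notation, here is a minimal Python sketch. The names `denote` and `LEXICAL_TO_VALUE` are hypothetical, only two of the primitive types are modelled, and Python's `Decimal` constructor accepts more than the xs:decimal lexical space, so this is an approximation rather than a faithful implementation. Escaping of the delimiter is not handled.

```python
from decimal import Decimal

# Hypothetical lexical-to-value mappings for two primitive types.
LEXICAL_TO_VALUE = {
    "String": lambda literal: literal,
    "Decimal": lambda literal: Decimal(literal),
}

def denote(symbol, formal_object):
    """Map a backtick-delimited formal object to the abstract value it corresponds to."""
    if len(formal_object) < 2 or formal_object[0] != "`" or formal_object[-1] != "`":
        raise ValueError("not a backtick-delimited formal object")
    literal = formal_object[1:-1]  # the literal between the delimiters
    return LEXICAL_TO_VALUE[symbol](literal)

string_value = denote("String", "`foo`")
decimal_value = denote("Decimal", "`1.50`")
```

The formal object (a character sequence) and the value it denotes stay on opposite sides of the syntactic/abstract divide: `1.50` and `1.5` are different formal objects that correspond to the same decimal value.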
Comment 4 Jerome Simeon 2006-04-17 16:49:48 UTC
Michael:

Thanks very much for all those suggestions, which can go directly into the draft. I have some reservation about the last suggestion however, about an alternative grammar & approach. More specifically, I am not sure that using `1` instead of xs:integer("1") is that much clearer. It introduces new notations which seem really unnecessary in that case and will need some extra explanations.

Here is an interim suggestion, which instead relies more explicitly on XQuery's syntax and is closer to the first proposal you are hinting at. It would look like the following:

AtomicValueContent ::= StringLiteral
                    |  BooleanLiteral
                    |  DecimalLiteral
                    |  DoubleLiteral
                    |  ConstructedLiteral

    StringLiteral  ::= '"' Char* '"'
    BooleanLiteral ::= 'true()' | 'false()'
    DecimalLiteral ::= Digit+ ( "." Digit+ )?
    DoubleLiteral  ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits
    ConstructedLiteral ::= AtomicTypeName "(" StringLiteral ")"

That would go along with some additional text to note that:

 * This notation uses XQuery syntax to write 'syntactic objects' (not sure what
better word to use here) that correspond to values in the corresponding atomic type value space.
 * In the ConstructedLiteral form, the corresponding string literal must
be a valid lexical representation for the corresponding value in the specified atomic type.

Would that address your last concern?

Best,
- Jerome
Comment 5 Michael Dyck 2006-04-18 03:13:45 UTC
(In reply to comment #4)
> 
> I have some reservation about the last suggestion however, about
> an alternative grammar & approach. More specifically, I am not sure
> that using `1` instead of xs:integer("1") is that much clearer.

Where did xs:integer("1") come from? As far as I can tell, xs:integer("1") cannot be derived from AtomicValueContent, so is not under discussion. I was suggesting using `1` instead of 1, and `foo` instead of "foo", (etc.) *only* in the context of AtomicValueContent.

> It introduces new notations

One new notation.

> which seem really unnecessary in that
> case and will need some extra explanations.

You think you can reuse old notation without extra explanations?

> Here is an interim suggestion, which instead relies more explicitly on
> XQuery's syntax and is closer to the first proposal you are hinting at.

That wasn't a proposal, it was more of a rhetorical device. I think it's a bad idea to use XQuery syntax, because:
-- it makes it look like XQuery is being defined in terms of itself;
-- I think some readers would be confused about the different uses of StringLiteral etc.; and
-- it's possible that some inference rules might get "confused" too.

> It would look like the following:
> 
> AtomicValueContent ::= StringLiteral
>                     |  BooleanLiteral
>                     |  DecimalLiteral
>                     |  DoubleLiteral
>                     |  ConstructedLiteral

You'd have to go to each occurrence of 'String' and change it to 'StringLiteral' if appropriate. Similarly for Boolean, Decimal, Double. I don't think the result would be an improvement. Consider what would happen to the rules in 4.1.1.

Plus, you'd have to go to each occurrence of 'AnyURI' and 'expanded-QName' and change them to 'ConstructedLiteral' if appropriate, which I doubt you want to do. (The other symbols on the rhs of AtomicValueContent's current production don't seem to occur in the FS.)

People might wonder "Why no FloatLiteral? Or DateLiteral? Why do String/Boolean/Decimal/Double get preferred treatment?" (And really, the answer is that there's no reason for them to get preferred treatment here, other than that the examples in 2.3.2 were written in a particular way.)

Given ConstructedLiteral, {String,Boolean,Decimal,Double}Literal are redundant here. 

>     DecimalLiteral ::= Digit+ ( "." Digit+ )?
>     DoubleLiteral  ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits

Note that these don't provide a literal for every value in the decimal or double value spaces.
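
A quick way to see the gap, as a sketch only: the two quoted productions transcribed into Python regular expressions, assuming `Digit` means `[0-9]` and `Digits` means `[0-9]+`. The function names are hypothetical.

```python
import re

# Transcriptions of the quoted DecimalLiteral and DoubleLiteral productions.
DECIMAL_LITERAL = re.compile(r"[0-9]+(\.[0-9]+)?")
DOUBLE_LITERAL = re.compile(r"((\.[0-9]+)|([0-9]+(\.[0-9]*)?))[eE][+-]?[0-9]+")

def is_decimal_literal(text):
    """True if the whole text is derivable from DecimalLiteral."""
    return DECIMAL_LITERAL.fullmatch(text) is not None

def is_double_literal(text):
    """True if the whole text is derivable from DoubleLiteral."""
    return DOUBLE_LITERAL.fullmatch(text) is not None

# Candidate spellings for values that do exist in the two value spaces.
for candidate in ["10", "3.14", "-1"]:
    print(candidate, is_decimal_literal(candidate))
for candidate in ["1e0", "1.5E-3", "-1e0", "INF", "NaN"]:
    print(candidate, is_double_literal(candidate))
```

Negative numbers have no derivation under either production, and neither do the xs:double special values INF, -INF and NaN.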

>     ConstructedLiteral ::= AtomicTypeName "(" StringLiteral ")"
>
>  * In the ConstructedLiteral form, the corresponding string literal

Don't use "corresponding" here if that's the word chosen to signal the syntactic/abstract divide. (Also don't say "string literal" if you mean "StringLiteral".)

> must be a valid lexical representation for the corresponding value
> in the specified atomic type.

The phrasing is somewhat backwards, since it suggests that you already know the corresponding value, and thus what its valid lexical representations are. Maybe change "the corresponding value" to "some value" or "a value".

And it's not quite correct to say that the string literal is a valid lex rep; instead, the *content* of the string literal (roughly, the stuff between the quotes) is.
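
As a rough sketch of that distinction, the check below validates the content between the quotes, not the quoted StringLiteral as a whole, against a lexical space. Everything here is an assumption made for illustration: only xs:integer and xs:boolean are modelled, the type-name pattern is a simplified prefixed name, whitespace normalization is ignored, and escaped quotes inside the StringLiteral are not handled.

```python
import re

# Lexical-space patterns for two XML Schema types; only these two are modelled.
LEXICAL_SPACE = {
    "xs:integer": re.compile(r"[+-]?[0-9]+"),
    "xs:boolean": re.compile(r"true|false|1|0"),
}

# AtomicTypeName "(" StringLiteral ")", capturing the StringLiteral's content
# separately from its delimiting quotes.
CONSTRUCTED = re.compile(r'([A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*)\("([^"]*)"\)')

def check_constructed_literal(text):
    """True if text has the ConstructedLiteral shape and its content is in the type's lexical space."""
    match = CONSTRUCTED.fullmatch(text)
    if match is None:
        return False
    type_name, content = match.group(1), match.group(2)
    pattern = LEXICAL_SPACE.get(type_name)
    return pattern is not None and pattern.fullmatch(content) is not None

print(check_constructed_literal('xs:integer("1")'))
print(check_constructed_literal('xs:integer("one")'))
```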

....

On a separate note, it occurred to me that it might make more sense if 'AtomicValueContent' were named 'PrimitiveValue'.
Comment 6 Jerome Simeon 2006-04-18 12:54:25 UTC
Michael,

I don't believe that writing "hello world!" for string values, and 1 for integer values will confuse many people. Since your original comment was about clarifying the notation, can you live with the original grammar along with the clarifications you suggest?

Thanks,
- Jerome
Comment 7 Michael Dyck 2006-04-18 21:13:10 UTC
(In reply to comment #6)
> 
> I don't believe that writing "hello world!" for string values, and 1 for
> integer values will confuse many people.

Probably not. What I meant was that some readers would be confused about the different uses of the *term* 'StringLiteral' (etc) in inference rules.

> Since your original comment was about clarifying the notation,

Actually, my original comment was about clarifying the object model.

> can you live with the original grammar along with the
> clarifications you suggest?

Yes, I think I can.

How about this:
---
In the production for AtomicValueContent, each symbol in the right-hand side corresponds to one of the primitive datatypes. For example, "String" corresponds to xs:string, and "Boolean" corresponds to xs:boolean. (The mapping is obvious, except that "expanded-QName" corresponds to xs:QName.) Although there are no explicit productions for these symbols, we assume that each is a non-terminal that derives a set of syntactic objects, each of which corresponds to a value in the value space of the corresponding datatype. For instance, the non-terminal 'String' derives a set of syntactic objects, which appear in examples as "", "a", "John", etc.; each one corresponds to a string value in the xs:string value space.  (For familiarity, these objects have been given the same appearance as StringLiterals from the XQuery and Core grammars; however, these are formal objects, with a distinct role in the FS.)
---
Comment 8 Jerome Simeon 2006-04-21 14:56:04 UTC
I like it. Kept the original grammar and added your text.
Thanks for the suggestion.
- Jerome