This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3161 - [XPath] string value not always derivable from typed value
Summary: [XPath] string value not always derivable from typed value
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Data Model 1.0
Version: Candidate Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Norman Walsh
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2006-04-27 16:45 UTC by Klaus Bosse
Modified: 2006-10-17 14:22 UTC

Description Klaus Bosse 2006-04-27 16:45:07 UTC
In the situation of element-only content there exists a string-value
of the content but no typed-value. So I think xpath20, 2.5.2 should
be changed slightly to avoid misunderstanding:

original:

"An implementation may store both the typed value and the string
value of a node, or it may store only one of these and derive the
other from it as needed. The string value of a node must be a valid
lexical representation of the typed value of the node, but the node
is not required to preserve the string representation from the
original source document. For example, if the typed value of a node
is the xs:integer value 30, its string value might be "30" or
"0030"."

changed (changes in capital letters):

"An implementation may store both the typed value and the string
value of a node, or it may store only one of these and derive the
other from it as needed. The string value of a node must be a valid
lexical representation of the typed value of the node, IF THIS
EXISTS, but the node is not required to preserve the string
representation from the original source document, SO THE STRING
VALUE MAY DIFFER SLIGHTLY. For example, if the typed value of a node
is the xs:integer value 30, its string value might be "30" or
"0030"."

The most important word in the second change is 'slightly'. It
means that a string value derived in this way must not differ much
from a string value constructed 'directly'.

--------------------------------

An additional comment would also be helpful on [DM] 3.3.1.3
Relationship Between Typed-Value and String-Value:

"In order to permit these various implementation strategies, some
variations in the string value of a node are defined as
insignificant. Implementations that store only the typed value of a
node are permitted to return a string value that is different from
the original lexical form of the node content."


Klaus Bosse
Comment 1 Michael Kay 2006-04-27 17:00:51 UTC
A better way to improve this paragraph might be simply to precede it with "In the case of a node with no children, ...". This seems to deal with the problem that the choice described in this paragraph isn't available for all nodes.

I think the other change "so the string value may differ slightly" is unnecessary and undesirable. We already say that the string value must be a valid lexical representation of the typed value, and that's a much more precise statement than saying it may vary "slightly". It's a matter of opinion whether "1" differs only slightly from "true", but it's a matter of fact that they are both valid lexical representations of the boolean value TRUE.
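
The distinction between "differs slightly" and "is a valid lexical representation" can be sketched in a few lines of Python. This is only an illustration: the function names below are hypothetical and come from no specification or library, and the sketch assumes an implementation that keeps only the typed value and hands back the XML Schema canonical form when a string value is requested.

```python
# Hypothetical helpers for two XML Schema simple types. The names are
# illustrative assumptions, not part of any specification or library.

XML_WHITESPACE = " \t\r\n"
ASCII_DIGITS = "0123456789"


def parse_xs_boolean(lexical: str) -> bool:
    """Map an xs:boolean lexical form to the value it represents."""
    text = lexical.strip(XML_WHITESPACE)
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean lexical form: {lexical!r}")


def parse_xs_integer(lexical: str) -> int:
    """Map an xs:integer lexical form (optional sign, then digits) to its value."""
    text = lexical.strip(XML_WHITESPACE)
    digits = text[1:] if text[:1] in ("+", "-") else text
    if not digits or any(ch not in ASCII_DIGITS for ch in digits):
        raise ValueError(f"not a valid xs:integer lexical form: {lexical!r}")
    return int(text)


def canonical_boolean(value: bool) -> str:
    """Canonical xs:boolean form, used when only the typed value was stored."""
    return "true" if value else "false"


def canonical_integer(value: int) -> str:
    """Canonical xs:integer form: no leading zeros and no '+' sign."""
    return str(value)


# "1" and "true" are different strings but denote the same boolean value.
print(parse_xs_boolean("1") == parse_xs_boolean("true"))

# A node written as "0030" may come back as "30" if only the typed value is kept.
print(canonical_integer(parse_xs_integer("0030")))
```

Under these assumptions the derived string is not guaranteed to resemble the original text, only to parse back to the same typed value, which is the testable property the existing wording already requires.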

Michael Kay
personal response
Comment 2 Klaus Bosse 2006-04-27 17:37:56 UTC
(In reply to comment #1)
> A better way to improve this paragraph might be simply to precede it with "In
> the case of a node with no children, ...". This seems to deal with the problem
> that the choice described in this paragraph isn't available for all nodes.
> 
> I think the other change "so the string value may differ slightly" is
> unnecessary and undesirable. We already say that the string value must be a
> valid lexical representation of the typed value, and that's a much more precise
> statement than saying it may vary "slightly". It's a matter of opinion whether
> "1" differs only slightly from "true", but it's a matter of fact that they are
> both valid lexical representations of the boolean value TRUE.
> 
> Michael Kay
> personal response
> 

First, to the second part of your comment: fully agreed.

Now to the first:

This would indeed avoid confusion and would be fine with me, but then it excludes not only the case of element-only content but also that of mixed content. Is this intended?
Comment 3 Michael Kay 2006-04-27 18:13:53 UTC
For mixed content (a) the typed value and the string value consist of the same sequence of Unicode characters, and (b) it's also equal to the concatenation of the string-values of the descendant text nodes: so I think it's fairly obvious that you can store things any way you like and you get exactly the same result. It's with simple-valued content, e.g. numbers and dates, that results can vary depending on the implementation strategy, so that's the case we want to talk about.
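
Point (b) can be illustrated with a minimal Python sketch. It assumes that the `text` and `tail` strings of the standard library's `xml.etree.ElementTree` are an acceptable stand-in for XDM text nodes, and that the document is not schema-validated; the helper name `string_value` is hypothetical.

```python
import xml.etree.ElementTree as ET


def string_value(element: ET.Element) -> str:
    """Concatenate all descendant text of `element` in document order."""
    # itertext() walks the subtree and yields each piece of character data
    # (element text and the tails of its descendants) in document order.
    return "".join(element.itertext())


# Mixed content: character data interleaved with child elements.
mixed = ET.fromstring("<p>The answer is <b>42</b>, as <i>expected</i>.</p>")
print(string_value(mixed))  # The answer is 42, as expected.
```

Because the result is built purely from the stored character data, no choice of storage strategy can change it, which is why the mixed-content case needs no special wording.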

Michael Kay
(personal response)
Comment 4 Klaus Bosse 2006-04-27 20:15:09 UTC
(In reply to comment #3)

Yes, my fault ([DM] 6.2.4 typed-value, 4th item). I somehow mixed it up with the typed-value of a list.

I think there is even more need to change the first sentence in [DM] '3.3.1.3 Relationship Between Typed-Value and String-Value' along the lines you suggested in comment #1. Should this be reported separately against [DM]?
Comment 5 Don Chamberlin 2006-05-09 18:08:37 UTC
Klaus,
Thank you for your comment, which was considered by the joint Query and XSLT working groups on May 9, 2006. The working groups agreed with your observation that an element node that has an element-only complex type does not have a typed value, and that this fact is pertinent to the following sentence in XPath/XQuery Section 2.5.2:

"An implementation may store both the typed value and the string value of a node, or it may store only one of these and derive the other from it as needed."

The working groups agreed to remove the words "from it" from this sentence, reflecting the fact that, in the case of an element node with an element-only complex type, the string value of the node is derived from its descendants rather than from its typed value. This change will be reflected in the next version of the XPath and XQuery specifications.

Section 3.3.1.2 of the Data Model document states that "Implementations are allowed some flexibility in how [the typed-value and string-value properties] are stored." It then briefly outlines some possible strategies, subject to the constraint that the relationship between the string value of a node and its typed value must be consistent with schema validation. The working groups feel that this explanation is adequate and does not need to be changed.

If you are satisfied with this resolution of your issue, please close this Bugzilla entry. If you take no action the entry will be closed by the working groups in two weeks.

Regards,
Don Chamberlin (for the joint working groups)
Comment 6 Don Chamberlin 2006-05-09 18:11:57 UTC
Sorry, the above comment should reference Data Model Section 3.3.1.3, not 3.3.1.2.
--Don Chamberlin
Comment 7 Klaus Bosse 2006-05-09 19:37:33 UTC
[XPath] and [DM]

I agree with your suggestion on the first point ([XPath] Section
2.5.2). This puts the focus on the flexibility of implementations
and not on the asymmetric relation between string-value and
typed-value, but that is OK here.

I see now that in [DM] Section 3.3.1.3 the first sentence is correct
because it says "typed-value and string-value properties" (and not
only "typed-value and string-value") as you emphasized.

But I would prefer a note here, like the one for string-values ("If
an implementation stores only the string-value of a node, the
following considerations apply:..."), saying that the relation
between string-values and typed-values is not symmetric, because
this may not be obvious to the reader (me) at this point in the
document (--> [DM] 6.2.4 typed-value).

But, OK, this is no tutorial. So I am reopening the bug, but if you
close it you will see no protest.

Regards

Klaus Bosse
Comment 8 Norman Walsh 2006-08-25 16:02:31 UTC
Proposed resolution:

Having reviewed the bug again and attempted to reconstruct mentally
the discussions we had back in May, my best effort to resolve this
issue is to add the following to the end of 3.3.1.3 in XDM:

First, a new bullet at the end of the existing bulleted list:

  * Where an element with a complex type and element-only content
    occurs, it is an error to attempt to access the typed-value
    of the node.

And the following paragraph below the list:

  If an implementation stores only the typed-value of a node, it must
  be prepared to construct string values from not only the node, but
  in some cases also the descendants of that node. For example, an
  element with a complex type and element-only content has no
  typed-value but does have a string-value that is the concatenation
  of the string-values of all its Text Node descendants in document
  order.
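
The behaviour described in this proposed text can be sketched with a tiny node model in Python. Everything here is an assumption made for illustration: the `Text` and `Element` classes, the `element_only` flag, and the `NoTypedValueError` exception are hypothetical and do not correspond to XDM accessor names or error codes. Nodes without the flag are treated as untyped for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Element:
    name: str
    children: List[Union["Element", Text]] = field(default_factory=list)
    # Hypothetical flag: True for a complex type with element-only content.
    element_only: bool = False


class NoTypedValueError(Exception):
    """Raised when a node has no typed value (illustrative name only)."""


def string_value(node: Union[Element, Text]) -> str:
    """Concatenate the values of all text node descendants in document order."""
    if isinstance(node, Text):
        return node.value
    return "".join(string_value(child) for child in node.children)


def typed_value(node: Union[Element, Text]) -> str:
    """Return the typed value, or raise for element-only content."""
    if isinstance(node, Element) and node.element_only:
        raise NoTypedValueError(f"<{node.name}> has element-only content")
    # Simplification: every other node is treated as untyped, so its typed
    # value is taken to be its string value.
    return string_value(node)


order = Element(
    "order",
    [Element("item", [Text("pen")]), Element("qty", [Text("30")])],
    element_only=True,
)

print(string_value(order))  # pen30
try:
    typed_value(order)
except NoTypedValueError as exc:
    print("error:", exc)
```

The sketch shows the asymmetry in one place: `string_value` always succeeds because it is built from descendants, while `typed_value` has nothing to return for the element-only node.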