3771 – [FS] technical: interleaved with empty text nodes

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Formal Semantics 1.0
Version:	Candidate Recommendation
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Dyck
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2006-09-27 07:10 UTC by Michael Dyck
Modified:	2008-08-25 02:35 UTC
CC List:	1 user

Description Michael Dyck 2006-09-27 07:10:36 UTC

4.7.1 / Norm
"we normalize each unit individually and construct a sequence of the
normalized results interleaved with empty text nodes."
    This appears to cause failures in STA. Consider:
        <e>{ attribute a1 {"v1"} }{ attribute a2 {"v2"} }</e>
    which is normalized to (roughly):
        element e {
            fs:item-sequence-to-node-sequence((
                attribute a1 {"v1"},
                text {""},
                attribute a2 {"v2"}
            ))
        } {}
    The problem here is that there is an attribute node after a text node.
    This isn't a problem for DEv, because empty text nodes are deleted
    before checking for misplaced attribute nodes, but it is a problem for
    STA.  Currently, STA fails for the call to
        fs:item-sequence-to-node-sequence()
    If (as I suggest in Bug 3760) we drop that call from normalization,
    then the failure happens when we try to typecheck the element
    constructor.

    Are such failures intentional?
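
The mismatch can be illustrated with a small Python model. This is only a sketch, not the FS rules: nodes are represented as hypothetical (kind, value) tuples, dynamic_check mimics the order of operations described above for DEv (delete empty text nodes, then look for misplaced attributes), and static_check sees only the node kinds, as a type-level check would.

```python
# Hypothetical model: a node is a (kind, value) tuple.
# This is the content sequence of the normalized example above.
content = [("attribute", "v1"), ("text", ""), ("attribute", "v2")]


def attributes_come_first(kinds):
    """Return True if no attribute kind follows a non-attribute kind."""
    seen_non_attribute = False
    for kind in kinds:
        if kind != "attribute":
            seen_non_attribute = True
        elif seen_non_attribute:
            return False
    return True


def dynamic_check(nodes):
    """Mimic DEv: delete empty text nodes first, then check placement."""
    kept = [node for node in nodes if node != ("text", "")]
    return attributes_come_first([kind for kind, _ in kept])


def static_check(nodes):
    """Mimic a type-level check: only kinds are visible, not values."""
    return attributes_come_first([kind for kind, _ in nodes])


print(dynamic_check(content))  # True
print(static_check(content))   # False
```

The static side cannot tell that the text node is empty, so it rejects a sequence that evaluates without error.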

Comment 1 Jerome Simeon 2006-10-10 15:04:16 UTC

I don't think this is the intended semantics. This looks like a pretty serious problem for statically typed implementations. Considering how late we are in the process, I would recommend an ugly fix: let a combination of attribute and text nodes type-check at the beginning of the content type. Something like the following:

statEnv |-  Type <: (attribute|text)*, (element|document|text|processing-instruction|comment|xs:string|xs:float| ...|xs:NOTATION)*
--------------------------------------------------------------------------
statEnv |-  (FS-URI,"item-sequence-to-node-sequence") (Type) : attribute*, (element|text|processing-instruction|comment)*

Comment 2 Jerome Simeon 2006-10-10 15:27:46 UTC

The following rule may look a bit more natural:


statEnv |-  Type <: (attribute*, (element|document|processing-instruction|comment|xs:string|xs:float| ...|xs:NOTATION)*) & text*
--------------------------------------------------------------------------
statEnv |-  (FS-URI,"item-sequence-to-node-sequence") (Type) : attribute*, (element|text|processing-instruction|comment)*

Comment 3 Jerome Simeon 2006-10-19 19:37:44 UTC

The XSLT and XML Query WGs have decided to adopt the proposed change in comment #2, addressing that bug.
Best,
- Jerome, On Behalf of the WGs

Comment 4 Michael Dyck 2007-05-13 03:11:10 UTC

The Rec does not reflect the change proposed in Comment #2 (i.e., loosening the input type for fs:item-sequence-to-node-sequence in 7.1.5). Instead, the loosening was applied to the type of the content expression of computed element constructors in 4.7.3.1 / STA. But that doesn't solve the original problem, because it's the call to the function, not the constructor, that sees the interleaved text nodes.

Comment 5 Michael Dyck 2007-10-02 08:02:32 UTC

This issue has been entered as FS erratum E012. As a fix, I have undone the changes to 4.7.3.1 / STA and committed the change to 7.1.5 given in Comment #2. Consequently, I'm marking this issue resolved-FIXED, and CLOSED.

Comment 6 Michael Dyck 2008-03-10 04:32:44 UTC

It occurs to me that this approach is unsound. For instance, consider:
    element e { text {"foo"}, attribute a {"x"} }
The rules given in comment #1 or comment #2 will accept this and assign it a static type, but we know that dynamically it will raise a type error, because an attribute node follows a non-attribute node in the element constructor's content sequence.
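
A rough Python model of the gap, under stated assumptions: nodes are hypothetical (kind, value) tuples, loosened_static_check approximates the premise of the comment #2 rule by letting text kinds appear anywhere (the "& text*" interleaving), and dynamic_check approximates the dynamic rule that deletes only empty text nodes before checking attribute placement. It is a sketch of the argument, not the FS judgments themselves.

```python
# Hypothetical model: a node is a (kind, value) tuple.
def attributes_come_first(kinds):
    """Return True if no attribute kind follows a non-attribute kind."""
    seen_non_attribute = False
    for kind in kinds:
        if kind != "attribute":
            seen_non_attribute = True
        elif seen_non_attribute:
            return False
    return True


def loosened_static_check(nodes):
    """Approximate the comment #2 premise: text may be interleaved anywhere."""
    kinds = [kind for kind, _ in nodes if kind != "text"]
    return attributes_come_first(kinds)


def dynamic_check(nodes):
    """Approximate evaluation: only *empty* text nodes are deleted."""
    kept = [node for node in nodes if node != ("text", "")]
    return attributes_come_first([kind for kind, _ in kept])


# element e { text {"foo"}, attribute a {"x"} }
content = [("text", "foo"), ("attribute", "x")]
print(loosened_static_check(content))  # True: accepted statically
print(dynamic_check(content))          # False: dynamic type error
```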

Comment 7 Michael Dyck 2008-03-10 06:22:10 UTC

Moreover, the other case of interleaving empty text nodes, in the normalization of direct attribute constructors (4.7.1.1), doesn't work. Consider the direct attribute constructor:
    a="{4}{2}"
which is currently normalized to
    attribute a { fs:item-sequence-to-untypedAtomic(( (4), text {""}, (2) )) }
At evaluation time, the value passed to the function is
        4, text {""}, 2
Section 7.1.7 says that it applies the rules in XQuery 3.7.3.2, so:
(1) Atomize the sequence, yielding:
        4, "", 2
(2) Each of these atomic values is cast into a string:
        "4", "", "2"
(3) Merge these strings by concatenation with a single space between each pair:
        "4  2"
    (A space between "4" and "", and another between "" and "2".)
    The resulting string becomes the string-value of the new attribute node.

But in fact the value of the attribute is "42".
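
Steps (1) to (3) can be traced with a minimal Python sketch. The representation is an assumption made for illustration: a text node is a hypothetical ("text", string) tuple, atomic values are plain Python ints, and str() stands in for the cast to xs:string (which is only faithful for simple integers).

```python
def atomize(item):
    """Hypothetical atomization: a ("text", s) node yields its string value."""
    if isinstance(item, tuple) and item[0] == "text":
        return item[1]
    return item


def attribute_value(items):
    """Apply steps (1)-(3): atomize, cast to string, join with one space."""
    strings = [str(atomize(item)) for item in items]
    return " ".join(strings)


# Value passed to the function for a="{4}{2}" under the current normalization.
value = attribute_value([4, ("text", ""), 2])
print(repr(value))  # '4  2'
```

The empty string contributed by the text node still takes part in the space-separated join, which is where the two spaces come from.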

Comment 8 Michael Dyck 2008-03-11 03:19:56 UTC

Re the type-unsoundness problem of comment #6, here is my proposed solution.
It eliminates the interleaved text{""} nodes from the normalization of direct element constructors. To "distinguish" enclosed expressions, each is instead normalized to a separate function call.

Roughly speaking, we split the (intended) semantics of
    fs:item-sequence-to-node-sequence
into two fs functions, called fs:A and fs:B here for brevity.  With respect to XQuery 3.7.1.3, fs:A represents step 1e (the processing of enclosed expressions, including node-copying and all that that entails), and fs:B represents steps 2 through 4 (the processing of the constructor's whole content sequence).

(I'd suggest that fs:A inherit the name fs:item-sequence-to-node-sequence, and fs:B get the name fs:element-content-sequence.)

The specific changes to rules would be as follows. (Of course there would be collateral changes to the prose and examples.)

4.7.1 Direct Element Constructors / Norm / rule 3
    Change fs:item-sequence-to-node-sequence to fs:B
    Delete the interleaved text{""} items.

4.7.1 Direct Element Constructors / Norm / rule 8
    Change to:
        [[ { Expr } ]]_ElementContentUnit
        ==
        fs:A(( [[ Expr ]]_Expr ))

4.7.3.1 Computed Element Constructors / Norm / rule 2+3
    Change
        fs:item-sequence-to-node-sequence(...)
    to
        fs:B( fs:A(...) )

7.1.5 The fs:item-sequence-to-node-sequence function
    Split it into sections for fs:A and fs:B as follows.

    For brevity here, I leave out the "statEnv |-" and use the following
    abbreviations:
        Child_Type -> (element*|text|processing-instruction*|comment)
        A(Type)    -> (FS-URI,"A")(Type)
        B(Type)    -> (FS-URI,"B")(Type)

    The STA rules for fs:A would be:

        Type   <: attribute**
        ---------------------
        A(Type) : attribute**

        Type   <:              (Child_Type|document|xs:anyAtomicType)*
        --------------------------------------------------------------
        A(Type) :              (Child_Type|document)*

        Type   <: attribute**, (Child_Type|document|xs:anyAtomicType)*
        --------------------------------------------------------------
        A(Type) : attribute**, (Child_Type|document)*

    The STA rule for fs:B would be:
        
        Type   <: attribute**, (Child_Type|document)*
        ---------------------------------------------
        B(Type) : attribute**, Child_Type*

Comment 9 Michael Kay 2008-03-11 12:45:54 UTC

Concerning comment #7, I was wondering why Saxon doesn't have this problem. The answer is that it only uses the "empty text node" trick for element content, not for attribute content. For attribute content, xx="{a}{b}c{d}" is translated into

attribute {'xx'} {concat(string-join(a, ' '), string-join(b, ' '), 'c', string-join(d, ' '))}

which I think is perfectly sound.

Comment 10 Michael Dyck 2008-03-18 05:49:06 UTC

(In reply to comment #9)
> For attribute content, xx="{a}{b}c{d}" is translated into
> 
> attribute {'xx'} {concat(string-join(a, ' '), string-join(b, ' '), 'c',
> string-join(d, ' '))}

Consider the case where a (or b or d) yields a sequence of integers. The latter translation would raise a type error re string-join's first argument, whereas the direct constructor would cast the integers into strings.

Comment 11 Michael Dyck 2008-03-18 09:08:17 UTC

To address the problem with attribute constructors shown in comment #7, I propose a similar fix to that outlined for element constructors in comment #8.

Here, we split fs:item-sequence-to-untypedAtomic into two functions, fs:C and fs:D. With respect to XQuery 3.7.1.1, fs:C represents step 2, and fs:D represents step 3.

(For real names, fs:C could be fs:item-sequence-to-string-attr, and fs:D could be fs:attribute-content-sequence, or maybe just fn:string-join(_,'').)

4.7.1.1 Attributes / Norm / rule 4:
    Change fs:item-sequence-to-untypedAtomic to fs:D.
    Delete the interleaved text{""} items.

4.7.1.1 Attributes / Norm / rule 6:
    Change to:
        [[ { Expr } ]]_AttributeContentUnit
        ==
        fs:C(( [[ Expr ]]_Expr ))

4.7.3.2 Computed Attribute Constructors / Norm / rule 2+3:
    Change
        fs:item-sequence-to-untypedAtomic(...)
    to
        fs:D( fs:C(...) )

7.1.7 The fs:item-sequence-to-untypedAtomic function
    Split into:
        fs:C( $items as item()* ) as xs:string
        fs:D( $strings as xs:string* ) as xs:untypedAtomic
    Both are typed as declared, no special rules.
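
The intended division of labour between the two functions can be sketched in Python. The names fs_C and fs_D are stand-ins for the fs:C and fs:D placeholders above; the sketch assumes atomization has already happened, so items are plain Python ints, and str() stands in for the cast to xs:string.

```python
def fs_C(items):
    """Model of fs:C: one enclosed expression's atomic values -> one string.

    Values are cast to strings and joined with a single space.
    """
    return " ".join(str(item) for item in items)


def fs_D(strings):
    """Model of fs:D: concatenate the per-unit strings with no separator."""
    return "".join(strings)


# a="{4}{2}": each enclosed expression gets its own fs:C call.
print(fs_D([fs_C([4]), fs_C([2])]))          # 42

# xx="{(1, 2)}c{3}": spaces appear only inside a single enclosed expression.
print(fs_D([fs_C([1, 2]), "c", fs_C([3])]))  # 1 2c3
```

Because the units are kept apart by separate calls rather than by interleaved text{""} items, no spurious spaces are introduced between adjacent enclosed expressions.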

Comment 12 Michael Dyck 2008-03-18 17:16:13 UTC

At meeting 360, the WG endorsed the changes proposed in comments #8 and #11. This will eventually be reflected by an erratum on the FS spec.

Comment 13 Tim Mills 2008-03-20 11:01:04 UTC

Document construction is normalized with the rule:

[document { Expr }]Expr
==
document { fs:item-sequence-to-node-sequence-doc(( [Expr]Expr)) }

In FS 7.1.6, the fs:item-sequence-to-node-sequence-doc function is described as "applying the normative rules numbered 1, 2, 3 after the sentence 'Processing of the document node constructor then proceeds as follows:' in Section 3.7.3.3 Document Node Constructors."

However, in XQ 3.7.3.3, the preceding paragraph states that the "content expression of a document node constructor is processed in exactly the same way as an enclosed expression in the content of a direct element constructor, as described in Step 1e of 3.7.1.3 Content."

This application of Step 1e isn't captured by the current fs:item-sequence-to-node-sequence-doc. The function fs:A (the 'new' fs:item-sequence-to-node-sequence) is now defined to perform this step.

Therefore I suggest that document constructors should normalize to:

[document { Expr }]Expr
==
document { fs:E(fs:A(( [Expr]Expr))) }

where fs:E might be called fs:document-content-sequence for consistency with Michael's changes.

I realise that fs:item-sequence-to-node-sequence-doc could just be redefined to apply Step 1e, but since there is already a function defined to apply that step, it does seem sensible and consistent to describe the normalization with two functions.

Comment 14 Michael Dyck 2008-03-20 19:22:10 UTC

(In reply to comment #13)
> 
> This application of Step 1e isn't captured by the current
> fs:item-sequence-to-node-sequence-doc.

Yes, this was pointed out in Bug 3655 comment #1, which is awaiting processing. 

> Therefore I suggest that document constructors should normalize to:
> 
> [document { Expr }]Expr
> ==
> document { fs:E(fs:A(( [Expr]Expr))) }
> 
> where fs:E might be called fs:document-content-sequence for consistency with
> Michael's changes.
> 
> I realise that fs:item-sequence-to-node-sequence-doc could just be redefined
> to apply Step 1e,

(and that was the original plan for resolving Bug 3655)

> but since there is already a function defined to apply that step, it does
> seem sensible and consistent to describe the normalization with two functions.

I agree.

Comment 15 Michael Dyck 2008-05-14 08:10:02 UTC

In order to properly support the uses of fs:item-sequence-to-node-sequence
in sections 4.4.1 Insert and 4.4.3 Replace of the XQuery Update CR (see
Bug 5666 comment #0), I propose the following tweak to the above fixes.

Recall that the semantics of fs:B are:
  1) Replace each document node by its children.
  2) Merge adjacent text nodes, delete empty text nodes.
  3) Raise an error if an attribute node follows a non-attribute node.
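
The three steps above can be sketched in Python as follows. The representation is hypothetical: a node is a (kind, value) tuple, and a document node's value is assumed to be the list of its children. Raising Python's TypeError is just a stand-in for the XQuery type error.

```python
def fs_B(nodes):
    """Model of the three fs:B steps on a list of (kind, value) nodes."""
    # 1) Replace each document node by its children.
    expanded = []
    for kind, value in nodes:
        if kind == "document":
            expanded.extend(value)  # value holds the child nodes
        else:
            expanded.append((kind, value))

    # 2) Merge adjacent text nodes, then delete empty text nodes.
    merged = []
    for kind, value in expanded:
        if kind == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    merged = [node for node in merged if node != ("text", "")]

    # 3) Raise an error if an attribute node follows a non-attribute node.
    seen_non_attribute = False
    for kind, _ in merged:
        if kind != "attribute":
            seen_non_attribute = True
        elif seen_non_attribute:
            raise TypeError("attribute node follows a non-attribute node")
    return merged


result = fs_B([
    ("attribute", "v1"),
    ("text", ""),
    ("document", [("text", "a")]),
    ("text", "b"),
])
print(result)  # [('attribute', 'v1'), ('text', 'ab')]
```

In this model, step 1 does not depend on the neighbouring items, so performing it earlier on each unit gives the same expanded sequence.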

The starting point for the tweak is to move step 1 from fs:B up to fs:A.
(This is valid because fs:B encounters a document node if and only if
fs:A emits one, and because the replacement is context-independent.)
So fs:A's output type no longer includes 'document', which means
that fs:A's static typing now achieves all the type-transforms that
fs:B(fs:A(...)) used to. Which means that fs:B now has no real need
to appear in the normalized query, and can instead dissolve into the
dynamic semantics of computed element constructors. Note that those
semantics already enforce step 3, so the only thing left is for them
to handle step 2.

So, here is the tweaked version of the fix in comment #8.  (I assume
that fs:A inherits the name fs:item-sequence-to-node-sequence.)

4.7.1 Direct Element Constructors / Norm / rule 3
    Delete fs:item-sequence-to-node-sequence and all the parens.
    Delete the interleaved text{""} items.

4.7.1 Direct Element Constructors / Norm / rule 8
    Change to:
        [[ { Expr } ]]_ElementContentUnit
        ==
        fs:item-sequence-to-node-sequence(( [[ Expr ]]_Expr ))

4.7.3.1 Computed Element Constructors / Dyn Ev / rule 1+2
    After
        statEnvn; dynEnv |-  Expr0 => Value0
    insert
        Value0 with text nodes prepared is Value1
    and change subsequent occurrences of Value0 to Value1.

    The latter is an informally defined auxiliary judgment that implements:
        Merge adjacent text nodes; delete empty text nodes.

7.1.5 The fs:item-sequence-to-node-sequence function
    Change its STA as follows:

    For brevity here, I leave out the "statEnv |-" and use the following
    abbreviations:
        Child_Type -> (element*|text|processing-instruction*|comment)
        A(Type)    -> (FS-URI,"item-sequence-to-node-sequence")(Type)

    The STA rules would be:

        Type   <: attribute**
        ---------------------
        A(Type) : attribute**

        Type   <:              (Child_Type|document|xs:anyAtomicType)*
        --------------------------------------------------------------
        A(Type) :               Child_Type*

        Type   <: attribute**, (Child_Type|document|xs:anyAtomicType)*
        --------------------------------------------------------------
        A(Type) : attribute**,  Child_Type*


And here's the tweaked version of the fix for document constructors in
comment #13:

4.7.3.3 Document Node Constructors / Norm / rule 1
    Change
        fs:item-sequence-to-node-sequence-doc
    to
        fs:item-sequence-to-node-sequence

4.7.3.3 Document Node Constructors / Dyn Ev / rule 1+2
    Insert a premise of the form
        ValueX with text nodes prepared is ValueY
    as appropriate.

7.1.6 The fs:item-sequence-to-node-sequence-doc
    Drop the section.

========================================================================

With the above changes, the references to fs:item-sequence-to-node-sequence
in XQuery Update sections 4.4.1 Insert and 4.4.3 Replace would become correct,
both statically and dynamically.

Comment 16 Michael Dyck 2008-07-07 18:37:46 UTC

The proposal in the preceding comment was approved by the WGs at meeting #369 on 2008-06-03.

Comment 17 Michael Dyck 2008-08-22 23:12:00 UTC

The element aspect of this issue has been entered as FS erratum E029, and
the fix from comment #15 has been committed to the source files for the
next edition of the FS document.

Comment 18 Michael Dyck 2008-08-25 02:35:21 UTC

The attribute aspect of this issue has been entered as FS erratum E031, and
the fix from comment #11 has been committed to the source files for the
next edition of the FS document.
Consequently, I'm marking this issue CLOSED.