This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6255 - [XPath2] Base URI after validation
Summary: [XPath2] Base URI after validation
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 2.0
Version: Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-11-29 00:01 UTC by Michael Kay
Modified: 2009-11-13 15:42 UTC

Description Michael Kay 2008-11-29 00:01:03 UTC
I've tagged this as XPath for administrative convenience, but it's actually an issue that affects XSLT and XQuery rather than XPath itself.

Both XSLT and XQuery define the process of validation in terms of a number of steps, starting with serializing the XDM and then reparsing it to construct an Infoset, which is then validated as described in the XML Schema specification.

If this process is followed literally, the base URI of the nodes in the validated document is undefined. This is unfortunate, since it makes any relative URI references in the document unusable.

I suggest we add a rule that the base URI of the document node of the Infoset that is constructed must be the same as the base URI of the XDM node being validated.

As a future enhancement, there would seem to be a case for allowing the required base URI of a document node to be specified explicitly as an option on a document node constructor (document{} in XQuery, xsl:document in XSLT). It's easy to define syntax for this in XSLT of course, much harder in XQuery.
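
Editorial illustration of the problem and the suggested rule described above. This is a minimal sketch in Python, not XSLT or XQuery: `DocNode`, `validate_literal`, `validate_with_rule` and `resolve` are hypothetical names, the example URIs are made up, schema validation itself is omitted, and any `xml:base` attributes are ignored. It only shows that a literal serialize-and-reparse round trip has nowhere to carry the base URI, whereas copying it across keeps relative references resolvable.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class DocNode:
    """Hypothetical stand-in for an XDM document node: a tree plus a base URI."""
    root: ET.Element
    base_uri: Optional[str]


def validate_literal(doc: DocNode) -> DocNode:
    """Serialize then reparse; the text carries no base URI, so none survives."""
    text = ET.tostring(doc.root, encoding="unicode")
    return DocNode(ET.fromstring(text), None)


def validate_with_rule(doc: DocNode) -> DocNode:
    """Same round trip, but the rebuilt document inherits the input's base URI."""
    text = ET.tostring(doc.root, encoding="unicode")
    return DocNode(ET.fromstring(text), doc.base_uri)


def resolve(doc: DocNode, relative: str) -> Optional[str]:
    """Resolve a relative reference, or return None when no base URI is known."""
    return urljoin(doc.base_uri, relative) if doc.base_uri else None


# Assumed example document and base URI, for illustration only.
source = DocNode(ET.fromstring('<doc><link href="chapter2.xml"/></doc>'),
                 "http://example.com/books/index.xml")
href = source.root.find("link").get("href")

print(resolve(validate_literal(source), href))    # None
print(resolve(validate_with_rule(source), href))  # http://example.com/books/chapter2.xml
```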
Comment 1 Henry Zongaro 2008-12-01 21:27:15 UTC
I see that section 19.2.1.3 of XSLT 2.0[1] defines validation in terms of serialization, so I think this is a valid bug against XSLT.  However, section 3.13 of XQuery[2] seems to skip serialization, and uses the mapping of XDM to Infoset directly.  I think the base URI will be preserved for XQuery through the entire trip from XDM to Infoset, to PSVI augmentations and back to XDM again.  My apologies if I've missed something.

[1] http://www.w3.org/TR/xslt20/#validation-process
[2] http://www.w3.org/TR/xquery/#id-validate
Comment 2 Michael Kay 2008-12-01 21:41:32 UTC
It's true that XQuery doesn't mention serialization and reparsing directly, but it refers to XDM, presumably section 4, which does. On rereading this brief section 4, though, I see that it actually claims to describe two mappings, only one of which uses serialization and reparsing. I imagine that the other is the one described in Appendix K, though that isn't referenced explicitly (and it would be difficult to do so, since it is non-normative). But the mapping in Appendix K does retain the base URI of an element node.
Comment 3 Henry Zongaro 2008-12-02 10:59:28 UTC
Interesting.  I had missed section 4 of XDM, titled "Infoset Mapping" and fixed my sight on sections 6.1.5, 6.2.5, 6.3.5, 6.4.5, 6.5.5, 6.6.5 and 6.7.5 of XDM, each of which is also titled "Infoset Mapping".  I took the reference in XQuery 1.0 to be a direct reference to those sections of XDM (summarized non-normatively in appendix K), rather than a reference to section 4 of XDM.
Comment 4 Jonathan Robie 2008-12-16 20:38:38 UTC
In XQuery, this is quite possibly not broken. I started here:

http://www.w3.org/TR/xquery-11/#id-validate

The first step is to convert the operand node to an Information Set using the rules found in the Data Model. For elements, this rule preserves the Base URI:

<snip from="http://www.w3.org/TR/xpath-datamodel/#const-infoset-element">
Element Node properties are derived from the infoset as follows:

base-uri

    The value of the [base URI] property. Note that the base URI property is always an absolute URI (if an absolute URI can be computed) though it may contain Unicode characters that are not allowed in URIs. These characters, if they occur, are present in the base-uri property and will have to be encoded and escaped by the application to obtain a URI suitable for retrieval, if retrieval is required.

</snip>

After that, the resulting Infoset is validated as per XML Schema. This does not lose the Base URIs from the Infoset.

So I don't think there is a real problem here. Did I get that wrong?

Jonathan
Comment 5 Jonathan Robie 2008-12-16 20:43:29 UTC
(In reply to comment #4)

> <snip from="http://www.w3.org/TR/xpath-datamodel/#const-infoset-element">
> Element Node properties are derived from the infoset as follows:
> 
> base-uri
> 
>     The value of the [base URI] property. Note that the base URI property is
> always an absolute URI (if an absolute URI can be computed) though it may
> contain Unicode characters that are not allowed in URIs. These characters, if
> they occur, are present in the base-uri property and will have to be encoded
> and escaped by the application to obtain a URI suitable for retrieval, if
> retrieval is required.
> 

Oh bother, wrong snip. What I meant was this:

<snip from="http://www.w3.org/TR/xpath-datamodel/#infoset-mapping-element">
[base URI]

    The value of dm:base-uri.
</snip>

But the upshot is the same: the URI is there in the mapped Infoset elements, and it is preserved when these elements are validated.

Jonathan
Comment 6 Jonathan Robie 2008-12-17 14:47:07 UTC
(In reply to comment #3)
> Interesting.  I had missed section 4 of XDM, titled "Infoset Mapping" and fixed
> my sight on sections 6.1.5, 6.2.5, 6.3.5, 6.4.5, 6.5.5, 6.6.5 and 6.7.5 of XDM,
> each of which is also titled "Infoset Mapping".  I took the reference in XQuery
> 1.0 to be a direct reference to those sections of XDM (summarized
> non-normatively in appendix K), rather than a reference to section 4 of XDM.


Henry,

First off, apologies for not reading your remarks before I responded to Mike Kay. 

I take Section 4 of XDM to be a forward reference to sections 6.1.5, 6.2.5, 6.3.5, 6.4.5, 6.5.5, 6.6.5 and 6.7.5. Perhaps it should be more explicit about this, but I think the information is there, and I agree with you that the Base URI is preserved.

Jonathan
Comment 7 Michael Kay 2009-02-12 17:26:27 UTC
Reclassified as XSLT because the only outstanding parts of the problem are in XSLT.
Comment 8 Michael Kay 2009-03-06 09:52:23 UTC
I propose adding after the numbered list of steps in XSLT 2.0 section 19.2.1.3 the paragraph:

<add>
The above process must be done in such a way that the base URI property of every node in the resulting XDM tree is the same as the base URI property of the corresponding node in the input tree.

Note: As an alternative to steps 1 and 2, the XDM tree may be converted to an Infoset directly, using the mapping rules given for each kind of node in [Data Model] (Section 6).
</add>
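
Editorial illustration of the alternative mentioned in the note: mapping the tree to an Infoset directly rather than through serialization. This is a hedged Python sketch with hypothetical, much-reduced types (`XdmElement`, `InfosetElement`); it models only the base URI property and skips schema validation entirely. The point it shows is that when [base URI] is set from dm:base-uri on the way out and base-uri is set from [base URI] on the way back, every node in the resulting tree has the same base URI as the corresponding input node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XdmElement:
    """Hypothetical, much-reduced XDM element node."""
    name: str
    base_uri: Optional[str]
    children: List["XdmElement"] = field(default_factory=list)


@dataclass
class InfosetElement:
    """Hypothetical, much-reduced element information item."""
    local_name: str
    base_uri: Optional[str]  # stands for the [base URI] property
    children: List["InfosetElement"] = field(default_factory=list)


def to_infoset(node: XdmElement) -> InfosetElement:
    """XDM to Infoset: [base URI] takes the value of dm:base-uri."""
    return InfosetElement(node.name, node.base_uri,
                          [to_infoset(c) for c in node.children])


def to_xdm(item: InfosetElement) -> XdmElement:
    """Infoset to XDM: base-uri takes the value of [base URI]."""
    return XdmElement(item.local_name, item.base_uri,
                      [to_xdm(c) for c in item.children])


def base_uris(node: XdmElement) -> List[Optional[str]]:
    """Base URIs in document order, used to compare two trees."""
    return [node.base_uri] + [u for c in node.children for u in base_uris(c)]


# Assumed example tree; the child has a different base URI from its parent.
tree = XdmElement("doc", "http://example.com/a.xml",
                  [XdmElement("part", "http://example.com/parts/b.xml")])
round_tripped = to_xdm(to_infoset(tree))

# The requirement from the proposed paragraph, checked node by node.
assert base_uris(round_tripped) == base_uris(tree)
```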
Comment 9 Michael Kay 2009-11-13 15:42:37 UTC
The proposal in comment #8 was accepted at the WG telcon on 12 Nov 2009 and will appear as erratum XT.E38 (drafted).