This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 29608 - Fragment identifiers: fn:doc vs. fn:unparsed-text
Summary: Fragment identifiers: fn:doc vs. fn:unparsed-text
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-04 05:15 UTC by Christian Gruen
Modified: 2016-07-21 15:03 UTC

Description Christian Gruen 2016-05-04 05:15:13 UTC
I noticed that fragment identifiers are only allowed for fn:doc, but not for fn:unparsed-text. I assume there is a specific reason for that, but I could not find any hint in the XQFO specification. Would it be possible to add an explanatory phrase to one of the function definitions?
Comment 1 Michael Kay 2016-05-04 10:49:51 UTC
The semantics of a fragment identifier depend on the media type. There is a well-defined interpretation of fragment identifiers for application/xml, but I'm not aware of any for text/plain. Am I mistaken?
Comment 2 Christian Gruen 2016-05-04 11:03:52 UTC
> Am I mistaken?

I don’t think so. It was mostly my knowledge gap (and my vague assumption that others could have similar gaps) that led to this question. By the well-defined interpretation, do you mean the XPointer working draft?
Comment 3 Abel Braaksma 2016-05-08 19:46:09 UTC
The meaning of a fragment identifier is defined by the MIME type of the document and may depend on the serving application.

W3C says this:
(https://www.w3.org/DesignIssues/Fragment.html)
"The significance of the fragment identifier is a function of the MIME type of the object. [...] The fragment ID spec for a new MIME type should be part of the MIME type registration process."

And RFC 3986 says: "The fragment's format and resolution is [...] dependent on the media type [RFC2046] of a potentially retrieved representation. [...] Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications."

Here are some examples of such MIME types:

- RFC 5147 defines fragment ids for text/plain through "char" and "line"
- RFC 7111 defines them for text/csv through "col" and "row"
- RFC 3778 defines them for application/pdf (page no, section, ref, etc.)
- W3C has a Recommendation for Media Fragments: https://www.w3.org/TR/media-frags/
- Some language ecosystems that support package downloads (Python, for example) append an MD5 hash to the URI as a fragment identifier; dereferencing then yields either the whole document or an error (if the MD5 does not match). This could well be used with any resource.
- The same MD5 check, or a length check, is also part of RFC 5147.
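
To make the media-type dependence concrete, here is a minimal Python sketch of how a client might interpret a fragment for text/plain using the RFC 5147 "line" scheme. It is only an illustration under stated assumptions: the function names `select_lines` and `resolve` are hypothetical, only the "line=" scheme is handled, and the optional integrity checks (length, md5) are skipped rather than verified.

```python
from urllib.parse import urldefrag


def select_lines(text, fragment):
    """Apply a simplified RFC 5147 'line=' fragment to a text/plain body."""
    scheme, _, rest = fragment.partition("=")
    if scheme != "line":
        raise ValueError("unsupported fragment scheme: %r" % scheme)
    # Drop any integrity checks (";length=..." or ";md5=..."); this sketch
    # does not verify them.
    positions = rest.split(";", 1)[0]
    start_s, comma, end_s = positions.partition(",")
    if not comma:
        # A single position identifies a point between lines, so no text
        # is selected.
        return ""
    lines = text.splitlines(keepends=True)
    # Positions count the gaps between lines, starting at 0 before the
    # first line, so "line=1,3" covers the second and third lines.
    start = int(start_s) if start_s else 0
    end = int(end_s) if end_s else len(lines)
    return "".join(lines[start:end])


def resolve(uri, media_type, text):
    """Interpret the fragment of `uri` according to `media_type` (hypothetical helper)."""
    _, fragment = urldefrag(uri)
    if not fragment:
        return text
    if media_type == "text/plain":
        return select_lines(text, fragment)
    # Other media types (text/csv, application/pdf, ...) would need their
    # own rules; none are implemented in this sketch.
    raise ValueError("no fragment semantics implemented for " + media_type)


body = "a\nb\nc\nd\n"
print(repr(resolve("http://example.org/notes.txt#line=1,3", "text/plain", body)))
```

The same URI with a different media type would need a different interpreter, which is the point the quotations above make.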

So I think Christian has a good point and perhaps we should say something about it, and at the very least allow it with unparsed-text etc. as well (in fact, I think we should allow it with any external resource).
Comment 4 Michael Kay 2016-05-10 15:43:43 UTC
Noted in discussion:

(a) this is a new feature to introduce at a late stage, and because of the interaction with the environment there's a lot of scope for quibbling about the exact spec; the chance of getting it right first time is small.

(b) doc() currently allows a fragment id and defines no semantics for it. Almost any attempt to define semantics for it would cause a compatibility problem for some implementations (or for their users...)

(c) unparsed-text() currently disallows a fragment id. If we were to allow it, we would probably have to leave the semantics implementation-defined if we want similar behaviour to doc(). It's not clear that's desirable.
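
The current asymmetry described in (b) and (c) can be sketched as a small pre-check on the supplied URI. This is an illustrative Python sketch, not how any particular processor implements it: `DynamicError` and `check_href` are hypothetical names, and FOUT1170 is assumed to be the error code that F&O 3.1 assigns to an invalid $href argument of fn:unparsed-text.

```python
from urllib.parse import urldefrag


class DynamicError(Exception):
    """Hypothetical stand-in for an XPath dynamic error carrying an error code."""

    def __init__(self, code, message):
        super().__init__(code + ": " + message)
        self.code = code


def check_href(function_name, href):
    """Return the URI to dereference, applying the fragment rule for the named function."""
    _, fragment = urldefrag(href)
    if not fragment:
        return href
    if function_name == "unparsed-text":
        # unparsed-text() disallows a fragment identifier outright.
        raise DynamicError("FOUT1170", "fragment identifier not allowed in $href")
    # doc() allows a fragment identifier but assigns it no fixed meaning,
    # so it is passed through for the implementation to interpret.
    return href


print(check_href("doc", "http://example.org/book.xml#c2"))
```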
Comment 5 Michael Kay 2016-05-10 15:56:03 UTC
We resolved to make no change to what's allowed/disallowed but to add editorial notes explaining the effect of fragment identifiers in the doc() function.
Comment 6 Michael Kay 2016-06-07 15:02:14 UTC
fn:doc has a list of things where the behaviour of the function is implementation-defined, and I have added to this list:

<p diff="add" at="E">The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child.</p>
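
The "one possible interpretation" in the added note can be sketched in Python as follows. This is a rough illustration only: `doc_with_fragment` is a hypothetical name, the XML is passed in as a string instead of being fetched, and any attribute named `xml:id` or `id` is assumed to be an ID attribute (a real processor would rely on DTD or schema information).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def doc_with_fragment(uri, xml_source):
    """Return a tree whose root is the element selected by the URI fragment.

    Without a fragment, the whole document is returned.
    """
    _, fragment = urldefrag(uri)
    root = ET.fromstring(xml_source)
    if not fragment:
        return ET.ElementTree(root)
    for elem in root.iter():
        # Assumption: xml:id and id both count as ID attributes.
        if elem.get(XML_ID) == fragment or elem.get("id") == fragment:
            # New document with the selected element as its only child.
            return ET.ElementTree(elem)
    raise LookupError("no element with ID " + repr(fragment))


source = '<book><chapter xml:id="c1"><title>One</title></chapter><chapter xml:id="c2"/></book>'
tree = doc_with_fragment("http://example.org/book.xml#c2", source)
print(tree.getroot().tag)
```

Because the effect is implementation-defined, another processor could equally ignore the fragment or reject it.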