This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 15317 - absence of functions for reading binary data in XPath 3.0 Working Draft
Status: CLOSED LATER
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.0
Version: Last Call drafts
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-12-22 22:05 UTC by mgac23
Modified: 2012-01-13 11:53 UTC

Description mgac23 2011-12-22 22:05:09 UTC
This comment is in regard to the current (last call) working draft for XPath 3.0.
 
What irritates me somewhat is the continued absence of functions for reading binary data stored in files!
 
The function unparsed-text() was introduced with XSLT 2.0 (I believe) for reading arbitrary text files, but no one has taken this a step further and standardized a function for reading binary files in XPath or XSLT. This is surprising for standards that are so powerful and, by now, mature. There are established encodings for binary data (hex, base64) that allow the data to be embedded in any valid XML document, yet there is no function to read binary data from a file. Does this make sense?
 
There are scenarios where it is necessary, or at least convenient, to embed binary data in XML documents. For example, think of the (pre-docx) XML data format of the Microsoft Word text processor: images or OLE objects are embedded as encoded text like this:
 
    <w:docOleData><w:binData w:name="oledata.mso">0M8R4KGxGuEAAAAAAAAAA....

Another example (one that we use a lot when generating documentation for software development projects) is the embedding of CSS icons or images in XHTML documents. This is done by using the data URI scheme in the src attributes of the XHTML img elements:
 
    <img src="...
 
This might not be so elegant, but we have reasons to do so.
 
Currently, we have to write a Java extension function for the Saxon XSLT engine to convert the contents of a binary file to a base64-encoded string. A standard XPath function as a replacement would be greatly appreciated!
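
For readers wondering what such an extension amounts to, here is a minimal sketch of the Java side only. It is not the reporter's actual code: the class name BinaryFileReader and both method names are hypothetical, it assumes Java 8 or later for java.util.Base64, and how the method is registered with a particular XSLT processor is processor-specific and not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

/** Hypothetical helper; the class and method names are illustrative only. */
final class BinaryFileReader {

    private BinaryFileReader() {}

    /** Reads the whole file at the given path and returns its bytes as a base64 string. */
    static String readAsBase64(String path) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            return Base64.getEncoder().encodeToString(bytes);
        } catch (IOException e) {
            // Wrapped so that callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a data URI (data:<mediaType>;base64,<payload>) suitable for an img src attribute. */
    static String toDataUri(String path, String mediaType) {
        return "data:" + mediaType + ";base64," + readAsBase64(path);
    }
}
```

The whole file is read into memory, which is acceptable for icons and small images but would need rethinking for large files.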
 
In short, my appeal: please add functions for reading binary data files to the XPath 3.0 standard! For example, they might be called unparsed-binary() or base64-binary() or hex-binary(), or whatever you like!
Comment 1 Michael Kay 2011-12-29 15:16:36 UTC
For the record, another requirement for bitwise operations appears in an xsl-list posting today by Dave Pawson: essentially the ability to test whether bit N is set in a bit array expressed in xs:hexBinary lexical form. It's by no means the first time this has come up.
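
As an illustration of that requirement, the following Java sketch tests a single bit of a value given in xs:hexBinary lexical form. The class and method names are hypothetical, and the bit numbering is an assumption: bit 0 is taken to be the most significant bit of the first octet. The posting may have intended a different ordering.

```java
/** Hypothetical helper; names and bit ordering are assumptions made for illustration. */
final class HexBits {

    private HexBits() {}

    /**
     * Tests bit n of a value in xs:hexBinary lexical form.
     * Assumption: bit 0 is the most significant bit of the first octet.
     * Only the hex digit that contains bit n is validated.
     */
    static boolean isBitSet(String hexBinary, int n) {
        if (hexBinary.length() % 2 != 0) {
            throw new IllegalArgumentException("hexBinary needs an even number of digits");
        }
        if (n < 0 || n >= hexBinary.length() * 4) {
            throw new IndexOutOfBoundsException("bit index out of range: " + n);
        }
        // Each hex digit carries four bits, so bit n lives in digit n / 4.
        int digit = Character.digit(hexBinary.charAt(n / 4), 16);
        if (digit < 0) {
            throw new IllegalArgumentException("not a hex digit at position " + (n / 4));
        }
        // Within the digit, position 0 is the high bit (value 8).
        int mask = 8 >> (n % 4);
        return (digit & mask) != 0;
    }
}
```

For example, with the value A5 (binary 1010 0101) this ordering reports bits 0, 2, 5 and 7 as set.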
Comment 2 Mike Sokolov 2012-01-03 15:31:23 UTC
More use cases: 

1) Test for the existence of a binary document.  This is useful when processing documents with references to image files.  When they're not present (this often happens due to differences between copyright for print and for online distribution), the requirement may be to process the XML differently, e.g. by stripping all references to the images, including captions, etc.  Currently, trying to do this with doc-available fails. I forget why, but perhaps that can be patched up as an implementation issue. 

2) Read in and write out a binary document unchanged, in an efficient manner (e.g. without the need to transcode to some text encoding).  This would really be useful if it were also practical to process entire directories or archives (zip files, etc.) in the context of xslt/xquery.  At the moment we do directory processing externally, in Java, so for us this use case is mostly theoretical. But it still could be useful to, say, pass every file in a folder through xslt, transforming uris on the way, and for the xml, also transforming the content.
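
Both use cases are simple to express outside XSLT/XQuery, which is presumably what the external Java processing mentioned above does. The sketch below is only an illustration under that assumption: the class BinaryPassThrough and its method names are hypothetical and do not correspond to any proposed XPath function.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper covering the two use cases; names are illustrative only. */
final class BinaryPassThrough {

    private BinaryPassThrough() {}

    /** Use case 1: reports whether a readable regular file exists, without parsing it. */
    static boolean isAvailable(String path) {
        Path p = Paths.get(path);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    /** Use case 2: copies the bytes unchanged, with no transcoding to a text encoding. */
    static void copyUnchanged(String source, String target) {
        try {
            Files.copy(Paths.get(source), Paths.get(target),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Wrapped so that callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

A stylesheet could then branch on the availability check to strip image references and captions when the file is missing.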

I think if this were to go forward, it'd be best to add a binary type to the XDM if there isn't one already.  Then it would be possible to call out to extension functions to perform binary processing as well.
Comment 3 Michael Kay 2012-01-10 18:52:53 UTC
The Working Group discussed this today. Having recently gone to last call, the timing is not right to start adding new requirements or features. Nevertheless many members of the WG understand the need and use cases for improved support for functions and operators that manipulate binary data, and the WG agreed to add this request to the "shopping list" that it will examine when it starts studying requirements for the next release after 3.0.

mgac23, as the originator, we would be grateful if you would mark the report as closed to indicate that you accept this disposition. This does not lose it, of course; we will routinely review bug reports classified as "LATER" when we start on the next round of requirements capture.
Comment 4 mgac23 2012-01-13 11:53:21 UTC
I understand that a last call draft is not the right point at which to add new functionality to a standard, so I have closed this bug for now!

Nonetheless, I am glad that the editors recognize the need to extend the ability to handle binary data. I am looking forward to seeing new functions for this in future releases.