27782 – [SER future] Serialization of query results for transmission and reconstruction of the XDM value

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.


Status:	NEW

Alias:	None

Product:	TextTracks CG
Classification:	Unclassified
Component:	WebVTT Extension: Regions for rendering cue groups
Version:	unspecified
Hardware:	All All

Importance:	P5 blocker
Target Milestone:	---
Deadline:	2018-01-14
Assignee:	This bug has no owner yet - up for the taking
QA Contact:	Web Media Text Tracks CG

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-01-07 19:53 UTC by Jim Melton
Modified:	2018-01-13 19:19 UTC
CC List:	2 users

See Also:

Attachments

Description Jim Melton 2015-01-07 19:53:43 UTC

Christian Gruen 2015-01-06 11:42:50 UTC

See Bug 27498, Comments 5-9 for the initial discussion.

Comment 1 Jim Melton 2015-01-07 19:48:25 UTC

In their joint teleconference of 2015-01-06, the XML Query WG and XSLT WG agreed that the feature described in the referenced bug 27498 (repeated below for organizational purposes) is highly desirable, but also agreed that the requirement became known too late in the process for Serialization 3.1. This bug is being marked RESOLVED/LATER now, and will be immediately re-opened as a bug against future versions.

Bug 27498, comments 5-10 are copied here:

Comment 5 Michael Kay 2015-01-01 11:24:07 UTC

I've just had a request from a Saxon user which suggests an additional requirement: they are interested in serializing the query result (an arbitrary XDM value) not for human consumption, but for transmission to a client application that can reconstruct the XDM value from its serialized form. This suggests additional requirements such as including the type of an atomic value, not just its string value.
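
To make the requirement concrete, here is a minimal Python sketch of a wire format that carries a type name alongside the string form of each atomic value. The JSON envelope, the `serialize` and `deserialize` names, and the small type table are all assumptions made for illustration; none of this is a format proposed in this thread or defined by the Serialization specification.

```python
import json
from datetime import date
from decimal import Decimal

# Hypothetical table: XSD type name -> (Python type, encoder, decoder).
# Only a handful of atomic types are covered; a real design needs far more.
_CODECS = {
    "xs:boolean": (bool, lambda v: "true" if v else "false", lambda s: s == "true"),
    "xs:integer": (int, str, int),
    "xs:decimal": (Decimal, str, Decimal),
    "xs:date": (date, date.isoformat, date.fromisoformat),
    "xs:string": (str, str, str),
}


def serialize(values):
    """Encode a sequence of atomic values as JSON, keeping each item's type."""
    items = []
    for v in values:
        for name, (py_type, encode, _decode) in _CODECS.items():
            # Exact type match, so that True is not mistaken for an integer.
            if type(v) is py_type:
                items.append({"type": name, "value": encode(v)})
                break
        else:
            raise TypeError(f"unsupported item: {v!r}")
    return json.dumps(items)


def deserialize(text):
    """Rebuild the sequence, using the recorded type name to pick a decoder."""
    return [_CODECS[item["type"]][2](item["value"]) for item in json.loads(text)]
```

Without the `type` field, the items `1`, `1.0` and `"1"` could not be told apart reliably on the receiving side, which is the gap the request points at.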

Comment 6 Hans-Juergen Rennau 2015-01-02 22:04:07 UTC

(In reply to Michael Kay from comment #5)
> I've just had a request from a Saxon user which suggests an additional
> requirement: they are interested in serializing the query result (an
> arbitrary XDM value) not for human consumption, but for transmission to a
> client application that can reconstruct the XDM value from its serialized
> form. This suggests additional requirements such as including the type of an
> atomic value, not just its string value.

Incidentally, this additional requirement (as well as the need for XDM serialization in general) has been advocated for several years by David Lee, e.g. on the XQuery talk list and at two Balisage conferences, alas without receiving any attention or response.

Comment 7 C. M. Sperberg-McQueen 2015-01-05 18:56:09 UTC

If we adopt round-trippability as a requirement (as implicitly suggested at least for arrays and maps in comment 5 and endorsed in comment 6), does the requirement also apply to XML data?

One story that would be simple to tell would be:  serialize it using a new serialization method, and then you will be able to reconstitute an isomorphic collection of XDM data from the serialization.  We seem to be missing a couple of things here:

1. a way to annotate XML nodes with type information that can be reliably reconstituted (as long as all the appropriate in-scope schema information is available) -- remember that revalidating with the in-scope schema starting at the root of each maximal XDM tree is not guaranteed to produce the same results;

2. a way to read the serialized data and re-type everything the same way.

It's not clear at first glance how best to add the type annotations required for reliable write + read round tripping for either JSON or XML, without getting in the way of non-XDM systems.
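
One possible shape for such annotations, sketched in Python under heavy assumptions: the type name is written into an attribute in a dedicated namespace and stripped again on reading. The namespace URI `urn:example:xdm-annotation`, the helper names, and the tag-to-type lookup are hypothetical, and the sketch only records type names as strings; it does not perform real schema typing. A consumer that ignores the namespace still sees one extra attribute per element, so the concern about non-XDM systems is reduced rather than removed.

```python
import xml.etree.ElementTree as ET

ANNOT_NS = "urn:example:xdm-annotation"  # hypothetical namespace
TYPE_ATTR = f"{{{ANNOT_NS}}}type"        # ElementTree's {uri}local notation


def annotate(root, types):
    """Write a type name onto each element whose tag appears in `types`."""
    for elem in root.iter():
        if elem.tag in types:
            elem.set(TYPE_ATTR, types[elem.tag])
    return ET.tostring(root, encoding="unicode")


def read_annotations(xml_text):
    """Parse the text, collect the recorded type names, and strip them again."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        type_name = elem.attrib.pop(TYPE_ATTR, None)
        if type_name is not None:
            found[elem.tag] = type_name
    return root, found
```

Keying the annotations by tag is a simplification for the example; a full design would have to annotate individual nodes, attributes included.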

And it would be nice to be able to serialize the entire collection of data without loss; but that involves being able to handle parentless attributes and functions (and possibly other things I'm forgetting at the moment).  Or is there a plausible subset of XDM for which reliable write + read round-tripping can be easily defined and which will suffice for all imaginable purposes?  all rational imaginable purposes?  all rational purposes that don't involve meta-programming or other unusual or unnatural acts?  most rational purposes?  many purposes?  

I thank Hans-Jürgen Rennau for pointing to some earlier discussions that have not been raised as bugs or enhancement requests in Bugzilla; I'll have to refresh my memory to see if solutions have already been suggested for these problems.

Comment 8 Michael Kay 2015-01-05 23:14:30 UTC

I don't think it's difficult to define an XML representation of the full XDM model, but I doubt it would be very human-readable, so it's a very different objective from the original requirement of this thread.

Parsing that XML representation to reconstitute the XDM would not be possible using pure XSLT and XQuery programs because the only way we allow type annotations to be set is by using validate expressions. But we could define a magic function to do it.

Comment 9 Christian Gruen 2015-01-06 11:47:37 UTC

I agree that a serialization method that allows users to reconstruct original query results would be helpful. As the "adaptive" serialization method is probably not the best target for all that, I have just added a new bug entry for further discussion (Bug 27763).

Comment 10 David Lee 2015-01-06 20:55:13 UTC

A few years back I started a discussion on this and created a wiki with quite a few of these issues.

http://xml.calldei.com/XDMSerialize


Mike encouraged me to start with Use Cases ... and I definitely agree.
For example, a primary use case I have is "streaming" XDM producers, such as producing log messages or long-lived sessions.
The Efficient XML group (while I was on it) had several real-world 'customers' who needed this as well (but for efficient XML), and the solution is non-ideal. One example was for "Instant Message" applications. Each message is an XML element, but the entire stream is a long-lasting document because there is no standardized way of representing streams of XML documents ... The requirement is to get each message without over-reading the socket ... That may be an edge case, but consider typical "feeds" ... twitter, facebook, stocks, news, message queues.
Or simply log files ... how to parse a log file before it's "done" ...
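
As an illustration of the "no over-reading" requirement, here is a small Python sketch that uses length-prefixed framing: each message is preceded by its byte length, so a reader consumes exactly one message at a time. This is a generic framing technique chosen for the example, not the approach taken by the Efficient XML group or anything standardized for XDM; the function names and the 4-byte big-endian header are assumptions.

```python
import io
import struct


def write_message(stream, payload: bytes):
    """Write one message: a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_exact(stream, n):
    """Read exactly n bytes; socket-like streams may return short reads."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("stream closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_message(stream):
    """Return the next payload, or None at a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        header += read_exact(stream, 4 - len(header))
    (length,) = struct.unpack(">I", header)
    return read_exact(stream, length)
```

With an in-memory `io.BytesIO` standing in for the socket, writing two messages and calling `read_message` once leaves the stream positioned exactly at the start of the second message.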

This site is still up and discusses many of the use cases I considered.  I put this on hold when I realized I didn't have a clean solution to item types like maps or functions, and that some use cases have contradictory requirements such as full fidelity vs minimal output.
An example: do you really need to expose node identity? Without it you can't reconstruct the XDM perfectly, but is that needed? For what cases?


It's good to see some renewed interest in this topic.

If we can't get XDM (of some sort ...) in and out of our XDM tools using some format that has a reasonable chance of being recognized by another tool set ... that's a big barrier ... To me, the "human readable text output" is interesting but not that problematic, as any vendor can solve that differently ... (humans are tolerant of differences).