This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29206 - [xslt30] Streamed validation
Summary: [xslt30] Streamed validation
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-16 09:41 UTC by Michael Kay
Modified: 2015-10-29 12:42 UTC
CC List: 2 users

See Also:



Description Michael Kay 2015-10-16 09:41:26 UTC
It has been pointed out that we ought to say something about streamed schema validation. I proposed some text to add to section 2.10 at https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0011.html; Michael Sperberg-McQueen commented on this at https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0012.html. Taking these comments into account, I propose to add the following text:


Streaming can be combined with schema-aware processing: that is, the streamed input to a transformation can be subjected to on-the-fly validation, a process which typically accepts an input stream from the XML parser and delivers an output stream (of type-annotated nodes) to the transformation processor. The XSD specification is designed so that validation is, with one or two exceptions, a streamable process. The exceptions include:

* Memory may need to be allocated to hold key values in order to enforce uniqueness and referential integrity constraints (xs:unique, xs:key, xs:keyref). The amount needed grows with the number of key values in the document.

* In XSD 1.1, assertions can be defined by means of XPath expressions. These are not constrained to be streamable; in the general case, any subtree of the document that is validated using an assertion may need to be buffered in memory while the assertion is processed.

Applications that need to run in finite memory may therefore need to avoid these XSD features, or to use them with care.
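To illustrate the first exception, here is a minimal Python sketch of a streaming check in the spirit of xs:unique. It uses only the standard library's `xml.etree.ElementTree.XMLPullParser`. The element name `invoice`, the attribute `id`, and the function `check_unique_ids` are hypothetical. Each finished element is cleared as soon as its end tag is processed, but the set of key values already seen must be kept until the end of the input.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def check_unique_ids(chunks: Iterable[str], element: str = "invoice",
                     attr: str = "id") -> int:
    """Stream the input, enforcing that `attr` is unique across all `element`s.

    Hypothetical helper for illustration. Returns the number of key values
    retained in memory; raises ValueError at the first duplicate.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    seen: set[str] = set()  # grows with the number of distinct keys in the input

    def process() -> None:
        for event, elem in parser.read_events():
            if event == "start" and elem.tag == element:
                key = elem.get(attr)
                if key is None:
                    continue  # like xs:unique, nodes with no key value are skipped
                if key in seen:
                    raise ValueError(f"duplicate key {key!r}")
                seen.add(key)
            elif event == "end":
                elem.clear()  # release the finished element's children and attributes

    for chunk in chunks:
        parser.feed(chunk)
        process()
    parser.close()
    process()  # drain any events produced when the input is closed
    return len(seen)
```

In this sketch the `seen` set is the part whose size depends on the document rather than on a single element. A real validator would also hold pending xs:keyref references until they can be resolved.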

XSD is designed so that the intended type of an element (the "governing type") can be determined as soon as the start tag of the element is encountered: the process of validation checks whether the content of the element actually conforms to this type, and by the time the end tag is encountered, the process will have established either that the element is valid against the governing type, or that it is invalid. 
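A toy Python illustration of this property follows. It assumes a hypothetical schema (`CHILD_TYPES`, `REQUIRED`, `GLOBAL_ELEMENTS`) in which each complex type lists its permitted child elements with their types, together with the children that must appear. The governing type is chosen at the start tag from the element's name and its parent's governing type alone. The valid/invalid verdict is produced at the end tag. Element order, occurrence limits, attributes (including xsi:type), and simple-type content are all ignored here, though a real XSD validator would check them.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Hypothetical toy schema: each complex type maps permitted child names to
# their types; REQUIRED lists children that must appear at least once.
CHILD_TYPES: dict[str, dict[str, str]] = {
    "InvoiceType": {"header": "HeaderType", "line": "LineType"},
    "HeaderType": {"date": "DateType"},
    "LineType": {},
    "DateType": {},
}
REQUIRED: dict[str, set[str]] = {"InvoiceType": {"header"}, "HeaderType": {"date"}}
GLOBAL_ELEMENTS: dict[str, str] = {"invoice": "InvoiceType"}


def assess(chunks: Iterable[str]) -> list[tuple[str, str, bool]]:
    """Return (name, governing type, valid?) for each element, in end-tag order."""
    parser = ET.XMLPullParser(events=("start", "end"))
    stack: list[tuple[str, set[str]]] = []  # (governing type, child names seen)
    verdicts: list[tuple[str, str, bool]] = []

    def process() -> None:
        for event, elem in parser.read_events():
            if event == "start":
                # Governing type fixed at the start tag, from the element's
                # name and its parent's governing type only.
                if stack:
                    parent_type, children_seen = stack[-1]
                    children_seen.add(elem.tag)
                    gov = CHILD_TYPES[parent_type].get(elem.tag)
                else:
                    gov = GLOBAL_ELEMENTS.get(elem.tag)
                if gov is None:
                    raise ValueError(f"no declaration for <{elem.tag}> here")
                stack.append((gov, set()))
            else:
                # At the end tag all content has been seen: the verdict is final.
                gov, children_seen = stack.pop()
                valid = REQUIRED.get(gov, set()) <= children_seen
                verdicts.append((elem.tag, gov, valid))
                elem.clear()

    for chunk in chunks:
        parser.feed(chunk)
        process()
    parser.close()
    process()  # drain any events produced when the input is closed
    return verdicts
```

For example, `<invoice><line/></invoice>` is assigned `InvoiceType` as soon as `<invoice>` is read. It is reported invalid only when `</invoice>` arrives without a `header` child having appeared.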

By default, dynamic errors occurring during streamed processing are fatal: they typically cause the transformation to fail immediately. XSLT 3.0 introduces the ability to catch dynamic errors and recover from them (using xsl:try and xsl:catch). Schema invalidity, however, is treated as a dynamic error occurring in the instruction that processes an entire input stream, so after a validation failure, no further processing of that input stream is possible.

Consequently, a streamed validator running in tandem with a streamed transformation can present the transformer with element nodes that carry a provisional type annotation: the type the element will have if it turns out to be valid. As soon as the validator encounters a node that violates this assumption, it should stop the flow of data to the transformer, so that the transformer never sees invalid data. This allows the stylesheet code to be compiled on the assumption of type safety: at run time, every node seen by the transformation conforms to its XSLT-declared type (for example, a type declared implicitly by match="schema-element(invoice)" on an xsl:template element).
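The validator-as-filter arrangement described here can be sketched in Python under simplifying assumptions. The toy schema `SIMPLE_TYPES` is hypothetical: it gives only `amount` a simple type, a stand-in for xs:decimal checked against that type's lexical form. The generator `validated` plays the validator and the function `total` plays the transformer; both names are hypothetical. Each start event carries the provisional annotation. The typed value is handed on only at the end tag, after it has been checked, and the first invalid value raises an exception instead of being passed on.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Callable, Iterable, Iterator


class InvalidDocument(Exception):
    """Schema invalidity: fatal for the whole input stream."""


# Lexical space of xs:decimal (after whitespace collapsing): no exponent, no NaN.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def _to_decimal(text: str) -> Decimal:
    if not _DECIMAL.fullmatch(text):
        raise InvalidDocument(f"{text!r} is not a valid xs:decimal")
    return Decimal(text)


# Hypothetical toy schema: element name -> (type name, lexical-to-value converter).
SIMPLE_TYPES: dict[str, tuple[str, Callable[[str], object]]] = {
    "amount": ("xs:decimal", _to_decimal),
}


def validated(chunks: Iterable[str]) -> Iterator[tuple[str, str, object]]:
    """Validator as a filter: yields (event, element name, annotation or value)."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            entry = SIMPLE_TYPES.get(elem.tag)
            if event == "start":
                # Provisional annotation: the type this element will have if valid.
                # Undeclared elements are labelled "untyped" (a toy convention).
                yield ("start", elem.tag, entry[0] if entry else "untyped")
            else:
                # Check before passing anything on; an exception stops the flow here.
                value = entry[1]((elem.text or "").strip()) if entry else None
                yield ("end", elem.tag, value)
                elem.clear()
    parser.close()


def total(events: Iterable[tuple[str, str, object]]) -> Decimal:
    """Transformer: sums the typed values of <amount> elements."""
    result = Decimal(0)
    for event, name, value in events:
        if event == "end" and name == "amount":
            result += value  # relies on the validator delivering only checked values
    return result
```

Because the exception terminates the generator, the natural place to catch it is around the call that consumes the whole stream, for example `total(validated(chunks))`. This matches the rule that processing of an input stream cannot resume after a validation failure.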

A streamed transformation that only accesses part of the input document (for example, a header at the start of a document) is not required to read the entire document once the data it requires has been read. This means that XML well-formedness or validity errors occurring in the unread part of the input stream may go undetected.
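A small Python sketch of this behaviour follows, using the standard library's `XMLPullParser` and a hypothetical `read_header` function. Input arrives in chunks, and the consumer stops as soon as it has the `<header>` element. A well-formedness error in a later chunk is therefore never parsed or reported.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def read_header(chunks: Iterable[str], tag: str = "header") -> Optional[str]:
    """Return the text of the first `tag` element, parsing no more input than needed.

    Hypothetical helper for illustration.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                # Stop here: later chunks are never requested, so any
                # well-formedness or validity errors in them go unnoticed.
                return elem.text
    return None
```

If the chunks are `'<doc><header>Q3 report</header><body>'` followed by the malformed `'<unclosed></body></doc>'`, the function returns `"Q3 report"` without ever requesting the second chunk.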
Comment 1 Michael Kay 2015-10-16 09:51:27 UTC
This text has been applied to the spec, but the bug remains open pending WG review.
Comment 2 C. M. Sperberg-McQueen 2015-10-23 01:51:14 UTC
The proposed text satisfies all my concerns; my thanks to the editor.