xsi:type in strict mode

Dear all,

This is feedback related to the EXI specification.

Currently, according to http://www.w3.org/TR/exi/#addingProductionsStrict, a @xsi:type production is added only when named subtypes are known to the EXI processor.
The intention, AIUI, is that as few @xsi:type productions as possible are actually added to grammars so as to get some compression gain.

I see a practical drawback with the current approach.
Some XML documents that are valid according to the XML Schema components used to generate the grammars will not be encodable by EXI encoders.
Consider the following schema and instance:
<xs:schema>
    <xs:element name="test" type="xs:base64Binary"/>
</xs:schema>
<test xsi:type="xs:base64Binary" .../>
The instance is valid as per the schema but, according to our current interpretation, is not encodable in strict mode.
If that interpretation is not correct, could you clarify the specification?
If it is correct, this is clearly not practical, since one of the design goals of strict mode was to encode at least all schema-valid XML documents.
This case does not occur often today. However, applications may increasingly add @xsi:type information to ensure value typing, even in schemaless mode.

An additional issue is that, in some environments, it may be tempting to use a global schema at the application level and a subset of it for the EXI transmission (to fit the lightest devices).
In those cases, this issue may cause trouble, since documents that are perfectly valid against the application schema will not be encodable in strict mode, depending of course on the exact EXI schema subset. One would expect that documents valid according to a particular schema could be encoded in strict mode with a subset of that schema.
Also, adding new grammars to a given grammar set could require modifying the existing grammars themselves.

Getting back to the actual compression gain of the current approach, the benefit is not high.
At most, it saves 1 bit per element (in bit-packed mode only; there is no difference in compression mode).
In practice, I doubt that the compression gain is even that high:
    A @xsi:type production is often added to simple-typed elements anyway (e.g. "xs:string"-typed elements).
    A @xsi:type production has a minor impact on the code length as soon as the element content has optional attribute/child items.
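
The "at most 1 bit" figure can be checked with a small Python sketch. It assumes that an event code chosen among n alternatives is written as an n-bit unsigned integer of ceil(log2(n)) bits in bit-packed mode, and that the @xsi:type production simply adds one alternative to the same group. The helper names event_code_bits and extra_bits are hypothetical.

```python
def event_code_bits(n_productions: int) -> int:
    """Bits needed to pick one of n alternatives: ceil(log2(n)), 0 when n == 1."""
    if n_productions < 1:
        raise ValueError("a grammar state needs at least one production")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (n_productions - 1).bit_length()


def extra_bits(n_without_xsi_type: int) -> int:
    """Cost of adding one more production (here, @xsi:type) to the group."""
    return event_code_bits(n_without_xsi_type + 1) - event_code_bits(n_without_xsi_type)


for n in range(1, 9):
    print(n, "productions ->", extra_bits(n), "extra bit(s)")
```

In this model the extra production costs a bit only when the count crosses a power of two, so states that already offer several optional attribute or child productions often pay nothing.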

At the very least, the rule could be changed so that a @xsi:type production is added for all elements whose content is defined by a global type definition.
This seems more in line with the XML Schema specification. In addition, schema writers who want to squeeze out as many bits as possible could inline their schema type definitions.
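
As a rough illustration of the proposed change, the Python sketch below puts the two rules side by side. It is only a model under my own assumptions: ElementDecl, current_rule and proposed_rule are hypothetical names, and an anonymous (inlined) type is represented by a missing type name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementDecl:
    """Hypothetical element declaration as seen by a grammar generator."""
    name: str
    type_name: Optional[str]        # None models an anonymous (inlined) type
    known_named_subtypes: int = 0   # subtypes known to the EXI processor


def current_rule(decl: ElementDecl) -> bool:
    # Current reading: add @xsi:type only when named subtypes are known.
    return decl.known_named_subtypes > 0


def proposed_rule(decl: ElementDecl) -> bool:
    # Proposal: add @xsi:type whenever the type is a global (named) definition.
    return decl.type_name is not None


global_typed = ElementDecl("test", "xs:base64Binary")
inlined = ElementDecl("compact", None)

for decl in (global_typed, inlined):
    print(decl.name, current_rule(decl), proposed_rule(decl))
```

With the proposed rule the "test" element accepts xsi:type, while a schema writer who inlines the type definition still gets a grammar without the extra production.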

Regards,
youenn

Received on Friday, 29 January 2010 14:14:53 UTC