EXI LC Comments

Dear EXI WG,
Please find below some comments and questions regarding the EXI specification Last Call Working Draft.
Regards,
                youenn fablet

1) Some facets, such as minInclusive and maxExclusive, are supported.
What about the length, minLength and maxLength facets, which could be used to encode string and list sizes more compactly?
Supporting them should not be too difficult given the current facet support.
Is there a rationale for not including these facets?
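
As a rough illustration of the potential gain, the sketch below compares the size of a length prefix encoded as a variable-length Unsigned Integer (7 value bits per octet) with a hypothetical n-bit length field derived from minLength/maxLength. The n-bit length field is an assumption made for this comment, not something the draft defines, and the function names are invented for the example.

```python
def unsigned_integer_bits(value: int) -> int:
    """Bits used by a variable-length unsigned integer: 7 value bits per octet."""
    octets = 1
    while value >= 128:
        value >>= 7
        octets += 1
    return 8 * octets


def bounded_length_bits(min_length: int, max_length: int) -> int:
    """Bits for a hypothetical n-bit length field covering [min_length, max_length]."""
    choices = max_length - min_length + 1
    # (choices - 1).bit_length() equals ceil(log2(choices)); it is 0 for a fixed length.
    return (choices - 1).bit_length()


# A list restricted to 1..16 items (hypothetical facets), holding 10 items.
variable_cost = unsigned_integer_bits(10)
bounded_cost = bounded_length_bits(1, 16)
fixed_cost = bounded_length_bits(8, 8)
print(variable_cost, bounded_cost, fixed_cost)
```

Under these assumptions, a list restricted to 1..16 items would need 4 bits for its length instead of 8, and a fixed-length list would need none.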

2) Guidelines for schema modeling
Is there any guideline regarding the relationship between EXI and schema modeling?
Guidelines would be useful to understand the impact of some schema modeling decisions on EXI encoding/decoding in terms of efficiency and compression.
For instance, it seems that the more global constructs (elements, types, attributes) a schema has, the bigger the generated grammars will be, since all global schema constructs need to be kept (right?).
Having a lot of xs:all or maxOccurs="999" may also hurt efficiency.
See also question 3)

3) DataTypeRepresentationType question
I would like a confirmation of the current DataTypeRepresentationType behaviour.
Let's have a schema with the following attribute definition:
                <xs:attribute name="test" type="xs:string"/>
In that case, the only way to change the encoding of @test values with the DataTypeRepresentationType feature
is to redefine xs:string, which may have a large impact.
If we only want to change the encoding of @test values with the DataTypeRepresentationType feature, we would need to
change the schema as follows:
                <xs:simpleType name="mystring">
                  <xs:restriction base="xs:string"/>
                </xs:simpleType>
                <xs:attribute name="test" type="mystring"/>
DataTypeRepresentationType could then be used to redefine mystring.
Is this correct?
If so, interoperability will generally be lost, since interoperable use of DataTypeRepresentationType is currently limited to redefining the predefined types of XML Schema Part 2 (end of section 7.4).
What about extending that behaviour to all simple types gathered by consuming the schema in use?
Is there a rationale behind that specific constraint?

4) Typed encoding in schema-less mode
EXI enables limited typed encoding support in schema-less encoding.
Since only predefined types are supported, xsi:type seems mainly useful for encoding base64 chunks with the binary encoding.
Even in that case, usability is limited: in some cases, elements whose content is base64 also have attributes. For instance, ds:SignatureValue has an optional Id attribute.
Of course, one could still use xsi:type=base64Binary in deviation mode, but interoperability may be poor, and declaring a wrong xsi:type for the sake of compression seems broken.
Note also that:
                - Attribute values cannot be type-encoded with schema-less grammars.
                - Other useful types, such as "list of float" or "list of integers", cannot be used without external schema knowledge.
Improved out-of-the-box support for this use case would be very helpful.

5) EXI schema-less/schema-informed modes
Based on internal discussions and feedback, there is a general assumption that the EXI specification somehow defines two separate modes (schema-less and schema-informed).
While it is clearly stated in the specification that both modes can easily coexist in a single EXI stream,
additional advertisement of that feature (maybe in the primer) may be good for adoption.
The latest published primer (December 2007) could perhaps be improved in that respect.

Additionally, while EXI provides great flexibility in the amount of schema information put in grammars,
the schemaID mechanism seems very minimal.
It seems that interoperable uses of schema-informed EXI will greatly restrict the use of this flexibility.
Is there additional work in that area that could or will be conducted?

6) Is it conformant to ignore the attribute order for an element encoded with a schema-informed grammar in deviation mode?
According to section 6, it does not seem to be conformant.
In some cases, grammars can support attributes in no particular order, such as the example below (correct me if I got something wrong).
<xs:complexType name="test">
    <xs:attribute name="name" type="xs:string"/>
    <xs:anyAttribute namespace="##any"/>
</xs:complexType>
<xs:element name="test" type="test"/>

While the benefit of ordering the attributes at the grammar level and the general compression benefit for encoders that follow the given order are obvious, I do not see a compelling reason to include this constraint in the format itself.
On the encoder side, the encoder may decide whether or not to order attributes.
If encoding fails due to bad ordering (in strict mode) or if the compression ratio is bad, the encoder can always decide to order the attributes.
On the decoder side, the decoder only follows the grammars, so it does not really care about the ordering.
There is even a drawback, as this is one (major?) difference between schema-informed and schema-less processing.
Am I missing something obvious?
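
To make the encoder-side option concrete, here is a minimal sketch of an encoder sorting attributes before emitting them. It assumes the required order is lexicographic by local name and then by namespace URI, which is how I read the draft; the tuple layout, the function name and the example namespaces are invented for the illustration.

```python
from typing import List, Tuple

# An attribute is modelled as (namespace URI, local name, value).
Attribute = Tuple[str, str, str]


def order_attributes(attributes: List[Attribute]) -> List[Attribute]:
    """Sort by local name first, then by namespace URI (assumed ordering rule)."""
    return sorted(attributes, key=lambda a: (a[1], a[0]))


attrs = [
    ("", "name", "n1"),
    ("http://example.org/b", "id", "x"),  # hypothetical namespace
    ("http://example.org/a", "id", "y"),  # hypothetical namespace
]
ordered = order_attributes(attrs)
for uri, local_name, value in ordered:
    print(local_name, uri, value)
```

The sort is cheap and entirely local to the encoder, which is why it seems it could be left as an encoder decision rather than a format constraint.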

7) RDF/XMP use case
This is more of a general comment on specific XML/EXI use cases, notably RDF or XMP documents, for which
no standard, well-defined XML schemas are available.
These documents generally have some defined structures and types (RDF schema, XMP schemas…) but no
well-defined XML schemas.
What would the WG recommend to enable good interoperable EXI compression? Stick with schema-less encoding? Create an XML schema, publish it and use it?

8) Through careful checking of the published EXI encoded streams
(thanks again for publishing these encoded examples, by the way!),
Herve found some potential differences between the streams and the specification (see below).



9) Section 8.5.4.4.1
When adding the production
                AT (qname) [schema-invalid value] Element?,?
to Element i,j, which next symbol should be used?
The spec says Element i,j.
It would be more logical to use the symbol from the production:
                AT (qname) [schema-valid value] Element i,k



10) Section 9.3
"Value channels that contain no more than 100 values" seems to mean: with *strictly* fewer than 100 values.
In this paragraph, all comparisons should be made clearer by using 'greater than or equal to' and 'strictly greater than'.
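
The two readings only differ for a channel holding exactly 100 values, as the small sketch below shows. The threshold comes from section 9.3; the function name, channel names and sizes are made up for the example.

```python
from typing import Dict, List

THRESHOLD = 100  # value quoted in section 9.3


def small_channels(channel_sizes: Dict[str, int], inclusive: bool) -> List[str]:
    """Return the channels treated as 'small' under one reading of the threshold."""
    if inclusive:
        # Reading 1: "no more than 100" means count <= 100.
        return [name for name, count in channel_sizes.items() if count <= THRESHOLD]
    # Reading 2: strictly fewer than 100 values.
    return [name for name, count in channel_sizes.items() if count < THRESHOLD]


sizes = {"a": 99, "b": 100, "c": 101}  # hypothetical channels
inclusive_reading = small_channels(sizes, inclusive=True)
strict_reading = small_channels(sizes, inclusive=False)
print(inclusive_reading, strict_reading)
```

Channel "b" is grouped with the small channels under the first reading only, so an encoder and a decoder that disagree on the reading would lay out the compressed streams differently.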



11) Section 8.4.3
In schema-less mode, EE productions should be promoted to event code 0 when used (if no EE production with an event code length of 1 already exists).



12) Section 8.4.3
In schema-less mode, when using the SE(*) production, should the SE(qname) production be created before the element content is evaluated?
In most cases, this has no impact. In the case of recursive elements, it leads to better compaction.
Moreover, in the case of recursive elements, the current specification seems to imply creating several SE(qname) productions.
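
The effect on recursive elements can be seen with a much simplified model of grammar learning. The sketch below is an assumption-laden toy, not the algorithm of the specification: it keeps a single list of learned SE(qname) productions per element qname, ignores all other event types, and the names `encode` and `run` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An element is modelled as (qname, list of child elements).
Element = Tuple[str, list]


def encode(element: Element, add_before_content: bool,
           grammars: Dict[str, List[str]], stats: Dict[str, int]) -> None:
    """Walk the children of `element`, learning SE(qname) productions."""
    qname, children = element
    grammar = grammars[qname]  # one shared grammar per qname (simplification)
    for child in children:
        child_qname = child[0]
        if child_qname in grammar:
            stats["SE(qname) hits"] += 1
            encode(child, add_before_content, grammars, stats)
            continue
        stats["SE(*) uses"] += 1
        if add_before_content:
            grammar.append(child_qname)  # learn first, then descend
            encode(child, add_before_content, grammars, stats)
        else:
            encode(child, add_before_content, grammars, stats)
            grammar.append(child_qname)  # descend first, then learn


def run(document: Element, add_before_content: bool):
    grammars = defaultdict(list)
    stats = defaultdict(int)
    encode(document, add_before_content, grammars, stats)
    return dict(grammars), dict(stats)


# <a><a><a/></a></a>: a recursive element nested three deep.
doc = ("a", [("a", [("a", [])])])
early = run(doc, add_before_content=True)
late = run(doc, add_before_content=False)
print(early)
print(late)
```

In this toy model, creating the production before evaluating the content gives one SE(*) use, one SE(qname) hit and a single learned production, whereas creating it afterwards gives two SE(*) uses and two identical SE(a) productions in the same grammar.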



13) Section 8.4.3
xsi:schemaLocation attributes seem to be removed from the infoset before encoding in the AgileDelta streams.
Is this by design, or is it implementation-related?

14) Section 7.3.3
Empty strings can occur as attribute values.
Section 7.3.3 suggests that these empty strings are to be added to the indexing tables.
Since the current literal EXI encoding is compact enough, it is reasonable not to add them to the tables.

Received on Thursday, 6 November 2008 16:17:05 UTC