W3C

Disposition of comments for the Efficient Extensible Interchange Working Group

In the table below, red in the WG decision column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.

In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.

Commentor | Comment | Working Group decision | Commentor reply
LC-2104 Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment)
The format for double seems richer than IEEE 754 float and double that
are respectively used for xs:float and xs:double

But the fact that base 10 is used instead of base 2 will imply
rounding issues for float and double

How do you consider to work around that ?
One of the advantages of using a base 10 representation is that it avoids rounding issues when moving floating point data between EXI and text XML and between EXI and an application that uses the standard XML interfaces. XML, EXI and the standard XML interfaces all use a base 10 representation for floating point numbers, so no rounding issues will occur in these circumstances.

You are correct that rounding issues may occur when moving floating point data between EXI and a base 2 representation. These rounding issues will be identical to those that occur moving floating point data between text XML and a base 2 representation, so EXI maintains the same behavior as XML in these cases. As such we avoid introducing any *new* rounding issues. Any work-arounds developed to address rounding issues for text XML will continue to work for EXI.
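
The point can be illustrated with a minimal sketch. It assumes a value is carried as an integer mantissa and a base-10 exponent; the function names are hypothetical, and special values such as NaN and INF are not handled.

```python
import struct
from decimal import Decimal
from typing import Tuple


def to_mantissa_exponent(lexical: str) -> Tuple[int, int]:
    """Split a decimal literal into an integer mantissa and a base-10 exponent."""
    sign, digits, exponent = Decimal(lexical).as_tuple()
    mantissa = int("".join(map(str, digits)))
    return (-mantissa if sign else mantissa), exponent


def from_mantissa_exponent(mantissa: int, exponent: int) -> Decimal:
    """Rebuild the exact decimal value mantissa * 10**exponent."""
    return Decimal(mantissa).scaleb(exponent)


def through_binary32(lexical: str) -> Decimal:
    """Round-trip the literal through an IEEE 754 single-precision value."""
    packed = struct.pack(">f", float(lexical))
    return Decimal(struct.unpack(">f", packed)[0])


pair = to_mantissa_exponent("0.1")
print(pair)                                       # (1, -1)
print(from_mantissa_exponent(*pair))              # 0.1
print(through_binary32("0.1") == Decimal("0.1"))  # False
```

The base-10 pair reproduces the text value exactly, while the base-2 detour changes it. The same change would occur when going from text XML to a base-2 value.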
no
LC-2175 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
4) Typed encoding in schema-less mode
EXI enables limited typed encoding support in schema-less encoding.
Since only predefined types are supported, xsi:type seems mainly useful to encode base64 chunks with the binary encoding.
Even in that case, the usability is not so good : in some cases, elements whose content is base64 have also attributes. For instance ds:SignatureValue has an optional ID attribute.
Of course, one could still use xsi:type=base64Binary in deviation mode but interoperability may be pretty bad and putting a wrong xsi:type for the purpose of compression seems broken.
Also to be noted that:
- Attribute values cannot be typed encoded with schema-less grammars.
- Other useful types like "list of float","list of integers" cannot be used without external schema knowledge.
Improved out-of-the-box support of this use case would be very helpful.
We feel that EXI should fit smoothly into the XML stack. To avoid any
problems EXI supports type information in the same fashion as XML
does. The group's intention is not to super-set XML. XML allows one to
specify typed-values with the attribute xsi:type. EXI provides the
same mechanism and both work on elements only.

EXI makes use of XML Schema and its types form the basis for typed
values in the entire W3C environment. Introducing new types or another
type system is not the scope of our group and might affect XML
processing in general.

A newly invented typing system would be processed in the context of
EXI only. Such type information would be lost when documents are
converted from EXI to XML then back to EXI. This may cause an issue,
since it may look like XML cannot preserve what EXI is describing.

The solution for arbitrary types in EXI (as well as in XML) is using
schema information.
Please note that partial schema information is sufficient for EXI. In
the use case you provided, a schema that declares the ds:SignatureValue
element with the corresponding type information for content and
attributes achieves the desired result. An EXI processor picks up the
given type information for the element content as well as the attributes.
tocheck
LC-2164 ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment)
Dear EXI members,

In chapter 5.4(EXI Options), alignment option and EXI compression
option can be unified.
Then option values are
- bit-packed, byte-alignment, pre-compression, compression

If doing so, the implementation will be more simple.

Best Regards,
Tooru Ishizaki.
You are correct in that the four options
you present are the only possible ones from the alignment-compression
pair.

The reason why alignment and compression are separated as options is
to achieve better compactness for the EXI Options header. It is
expected that EXI compression is used often, so it is placed in the
"common" part of the Options header, whereas the non-default alignment
options are expected to be used only in very special circumstances and
are therefore placed in the "uncommon" part. This reduces the size of
the Options header in the usual case when the default alignment is
used.

But none of this prescribes any particular implementation strategy. An
implementation is free to adopt the approach that you suggest,
including in its API, as long as it produces and is able to consume
correct Options headers.
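
As a rough illustration of that freedom, the sketch below exposes a single unified value in a hypothetical API and maps it onto the two header parts described above. `CodingMode`, `header_fields` and the dictionary layout are assumptions made for this example, not names from the specification.

```python
from enum import Enum
from typing import Dict


class CodingMode(Enum):
    """Hypothetical unified API value, following the commenter's four choices."""
    BIT_PACKED = "bit-packed"
    BYTE_ALIGNMENT = "byte-alignment"
    PRE_COMPRESSION = "pre-compression"
    COMPRESSION = "compression"


def header_fields(mode: CodingMode) -> Dict[str, Dict[str, object]]:
    """Return only the non-default options, grouped by Options header part."""
    fields: Dict[str, Dict[str, object]] = {"common": {}, "uncommon": {}}
    if mode is CodingMode.COMPRESSION:
        # Compression is expected to be frequent, so it sits in the "common" part.
        fields["common"]["compression"] = True
    elif mode is not CodingMode.BIT_PACKED:
        # Non-default alignments are rare, so they sit in the "uncommon" part.
        fields["uncommon"]["alignment"] = mode.value
    return fields


print(header_fields(CodingMode.COMPRESSION))
print(header_fields(CodingMode.BYTE_ALIGNMENT))
```

With the default alignment, the "uncommon" part stays empty, which is the usual case the header layout is optimized for.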
tocheck
LC-2166 ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment)
Dear EXI members,

I have a feedback of EXI specification.
In chapter 7.2(Enumerations), why can't the enumeration type
be applied for QName?

Best Regards,
Tooru Ishizaki.
Value qnames are represented as strings in EXI format. Because prefixes used
in an instance and its corresponding schema are declared in each document
independently from the other, enumerated values in schema are of little use for
types derived from xsd:QName since value qnames in EXI streams are strings.
Note, however, that markup qnames which occur as the names of elements
and attributes are represented using built-in QName datatype representation [1].

The rationale for the use of strings as the representation of value qnames
is primarily so as to avoid the anticipated processing efficiency overhead that
is likely to be involved if we used built-in QName datatype representation for
value qnames. This concern of processing efficiency overhead is particular to
value qnames, and is not foreseen for markup qnames. Therefore we use built-in
QName datatype representation for markup qnames.
yes
LC-2103 Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment)
In 7.1.2 Boolean

Please consider instead of

[[
Otherwise, when pattern facets are available in the schema datatype,
Boolean datatype representation is able to distinguish values not only
arithmetically (0 or 1) but also between lexical variances ("0", "1",
"false" and "true"), and values typed as Boolean are represented as
n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is two
(2) and the value zero (0), one (1), two (2) and three (3) each
represents value "false", "0", "true" and "1".
]]

having

[[
the value zero (0), one (1), two (2) and three (3) each represents
value "0", "1", "false", "true".
]]

such as bittesting the last bit could give the right result

Regards,

Mohamed ZERGAOUI
--
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 €
Thank you for your suggestion regarding the order of the bits in the 2-bit unsigned Integer representation used to distinguish between lexical variances of the Boolean type. The current representation allows bit-testing of the first bit to provide the value of the Boolean. It is not clear that changing the representation to allow bit-testing of the second bit rather than the first bit would provide a notable benefit.

Both representations seem equally capable, so our current plan is to keep the current representation. Please let us know if we've misunderstood some aspect of your suggestion.
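
The two orderings can be compared with a small sketch. The code tables are taken from the text quoted in the comment; the function names are hypothetical.

```python
# 2-bit codes as described in the current text of section 7.1.2.
CURRENT = {0: "false", 1: "0", 2: "true", 3: "1"}
# 2-bit codes as proposed in the comment.
PROPOSED = {0: "0", 1: "1", 2: "false", 3: "true"}


def value_current(code: int) -> bool:
    """Current ordering: the first (high) bit carries the Boolean value."""
    return bool(code & 0b10)


def value_proposed(code: int) -> bool:
    """Proposed ordering: the second (low) bit carries the Boolean value."""
    return bool(code & 0b01)


for code in range(4):
    print(code, CURRENT[code], value_current(code), PROPOSED[code], value_proposed(code))
```

Either way, a single bit test recovers the value, and the remaining bit distinguishes the lexical form.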
tocheck
LC-2106 Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment)
Dear

In 7.3.3 Partitions Optimized for Frequent use of String Literals

It is said :
[[
Editorial note
String values representing value content items are never added to the
string table once valueAmount reaches valuePartitionCapacity. The
working group is still looking at other alternatives to cap the amount
of memory used for value partitions that can result in more compact
representation of string values overall, including those that involve
reassignment of compact identifiers using some sort of round-robin
selection method, and the expected effect on processing efficiency of
each alternative.
]]

Please consider using Move to front method [1]

Regards,

Mohamed ZERGAOUI

[1] http://en.wikipedia.org/wiki/Move-to-front_transform
When initially discussing the valuePartitionCapacity option, the
working group analyzed several different potential methods for
treating new strings when the maximum capacity is reached. These
methods included Move to front, which you proposed.

The group does not believe that Move to front can be sufficiently
efficient to serve as the replacement method, based on analysis of the
possible algorithms in this context. The string table is used constantly
when processing an EXI document, so it needs to be as efficient as
possible, and we do not believe that the potential improvement in
compactness is worth the amount of decreased processing efficiency that
Move to front is estimated to result in.
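
The trade-off can be sketched as follows. This is an illustration only, not the analysis the group performed. The first class follows the behaviour in the quoted editorial note, where nothing is added once capacity is reached. The second is a naive Move to front table. Both class names are hypothetical.

```python
from typing import Dict, List, Optional


class CappedValuePartition:
    """Stops adding strings once capacity is reached; identifiers never change."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.ids: Dict[str, int] = {}

    def lookup_or_add(self, value: str) -> Optional[int]:
        hit = self.ids.get(value)  # constant-time lookup
        if hit is None and len(self.ids) < self.capacity:
            self.ids[value] = len(self.ids)
        return hit


class MoveToFrontPartition:
    """Naive Move to front: every access reorders the table."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.values: List[str] = []

    def lookup_or_add(self, value: str) -> Optional[int]:
        try:
            index: Optional[int] = self.values.index(value)  # linear scan
        except ValueError:
            index = None
        else:
            del self.values[index]
        self.values.insert(0, value)     # shifts the identifier of every other entry
        del self.values[self.capacity:]  # evict the least recently used entries
        return index


capped, mtf = CappedValuePartition(2), MoveToFrontPartition(2)
stream = ["a", "b", "c", "c"]
print([capped.lookup_or_add(v) for v in stream])  # [None, None, None, None]
print([mtf.lookup_or_add(v) for v in stream])     # [None, None, None, 0]
```

Move to front gains a hit on the repeated "c", but it pays for that with a scan and a reshuffle on every access. That per-access cost is what the string table cannot easily afford.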
tocheck
LC-2187 TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment)
EXI has a grammar learning mechanism (ref. Section 8).
So, an EXI parser uses a large amount of memory to keep grammar data, if the XML data
has many kinds of nodes (ex. elements).
For small devices, this is a serious problem.
I think the limitation mechanism of the grammar learning is needed in
the specification (ex. MAX number of the kind of event code).
We understand the concern you have raised about the effect that grammar
learning may have on the amount of memory an EXI processor consumes and
requires.

While acknowledging that this is a potential issue, we stopped short of
adding a mechanism to restrain grammar learning when we discussed the
feature a while back. That decision was based on implementation reports
shared by WG members, which suggested that the issue did not surface in
real deployments.

One trait of documents used in exchange is that, in general, grammars
are learned rapidly up front, at a pace close to linear, after which
learning becomes less and less frequent and eventually saturates. This
observation helps explain why the issue is not very likely to emerge in
practice.

Based on these observations and this analysis, and considering the cost
in both compactness and runtime efficiency that a restraining mechanism
would incur, we remain cautious and may require further arguments
involving specific use cases before we consider a way to alleviate the
potential issue.
yes
LC-2190 TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment)
EXI has Self-Contained option (ref. S.5.4).
And the EXI parser can parse Self-Contained data without parsing other parts of the EXI document.
But how does the EXI Parser find the position (ex. bytes offset) in the EXI data?
If the EXI parser wants to parse Self-Contained data first, can it get a start position of Self-Contained data from the EXI documents?
The EXI format itself offers only a hook to support selfContained
subtrees, meaning that an EXI processor may access such an independent
fragment without reading what came first.

The working group expects many different use cases for this feature
and does not restrain its use to a specific mechanism which is likely
not to be suitable for many environments. It is therefore up to an
application or system to make use of this feature and build an index
or any other solution on top of it.
tocheck
LC-2171 Thomas Hornig <t.hornig@highQ.de> (archived comment)
we would like you to take into consideration to introduce the boolean field
"EXIBodyEncrypted" to the definition of the EXI header. This field could indicate that the
EXI body of an EXI stream is encrypted with the implication that a standard EXI parser
should reject such data with a corresponding notification code.

The encryption-parameters itself could then be optionally described in the "user defined" section of
the EXI header in conformance to the W3C recommendation "XML Signature and Encryption".
In this way, it could be up to the single application itself to handle encryption of the EXI body after
EXI encoding respectively decryption before EXI decoding and to set the proposed
EXIBodyEncrypted-Flag appropriate.
By this we believe that maximum efficiency and flexibility in encryption could be achieved, while
keeping EXI conformity at the same time.

We have noticed that this point has been largely discussed, but again, in terms of efficiency it would
open up the chance to follow the rule "first compression, then encryption" with minimum overhead and
without the demand of another MIME-type-definition in an maybe otherwise necessary, additional
envelope to cope with alternative, efficient XML encryption.
The group believes that
EXI should integrate well with the family of XML technologies, and
therefore we would rather not define EXI-specific encryption or
digital signature methods but use the existing XML Encryption and
Signature specifications, applying them to EXI by using their defined
extension points.

As you note, the recommended rule is to first compress (or encode in
EXI) and then encrypt. Accomplishing this with XML Encryption requires
some additional specification work, but very little. We have already
talked with the XML Security working group on the best way to
accomplish this and believe we understand how to do it in a way that
fits into the XML Encryption specification. After this work is done,
we will publish it in one of our documents (we have not yet determined
which one).
tocheck
LC-2168 UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment)
Dear EXI WG,

In 7.1 Built-in EXI Datatype Representations,
what do you think about the addition of IEEE float ?
For examples, it takes more time to decode EXI integer
because it isn't same to int type of general programing languages.

--
Regards,
Hitoshi Uchida
<uchida.hitoshi@canon.co.jp>
We understand that the IEEE float representation is one of the most frequently
requested changes to the EXI specification. We would like to use this
opportunity to better explain our position on this issue and share with you
the rationale that supports it. As you will find outlined below, the WG had
assessed both its advantages and disadvantages in the context of EXI, as well
as alternative ways to assuage the concerns.

The WG came to believe that IEEE float fits better with the
user-defined datatype representation capability [1] of EXI than as an
additional mandatory physical representation of character information
items that are typed as xsd:float or xsd:double. Some aspects of
IEEE float do not lend themselves naturally to the primary role of
EXI as an efficient exchange format for XML, particularly when EXI is required
to remain intrinsically compatible with the existing XML family of standards
written to XML infoset, such as XML signatures. The issues the WG looked at
in particular are reduced compactness and reduced amenability to compression,
as well as rounding issues in serialization (base-2 to base-10) and the cost
of the stringification required when used with the typical XML APIs (SAX, DOM, etc.),
which are undeniably predominantly text-based.

However, in saying this, it is not our intention to detract from the value of
IEEE float, nor to neglect its widespread use in a diverse range of applications. We do
not intend to dispute the merit IEEE float brings to some applications, and
understand that EXI might be seen as the perfect opportunity to realize that merit
in a way that fits well with XML technologies. Though the EXI spec calls for
deliberate assessment when considering the use of user-defined datatype
representation, IEEE float is clearly one of the cases that warrant the use
of the facility.

It was not an easy discussion within the WG that led to the position
mentioned above. Some argued that EXI is not XML enough, and others contended
that EXI is not binary enough. We respected both views and sought a difficult
balance that better serves both needs. We do not see EXI as static;
rather, it is a core foundation on which things can be further built to suit
use cases, which will make EXI more useful. The two notable
points of innovation provided by EXI are user-defined datatype representation,
and the provision that permits the definition of schema binding for other
schema languages.

We therefore encourage you to seek the best way to leverage user-defined
datatype definitions to represent float and double data as IEEE floats.
If you find any suggestions that would make it easier to use IEEE float
through user-defined datatype representation, we would love to hear them.
Such suggestions may well find their way into collateral documents such as
an FAQ or Best Practices.

[1] http://www.w3.org/TR/exi/#datatypeRepresentationMap
tocheck
LC-2169 UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment)
Dear EXI WG,

In 7.1.10.1 Restricted Character Sets,
it is better that this feature allows optional
because it takes much time to compute the restricted
characters based on pattern facets of a schema beforehand.

--
Regards,
Hitoshi Uchida
<uchida.hitoshi@canon.co.jp>
http://lists.w3.org/Archives/Public/public-exi-comments/2009Sep/0003.html
tocheck
LC-2170 UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment)
Dear EXI WG,

In 7.3 String Table,
what do you think about a function to share the string table
between two documents ?

After an EXI processor finished encoding a first document,
it uses the string table of the first document to encode
a second document.
This function can make the size of the second document smaller.

In decoding the second document, the processor needs the first document
because the second one depends on the string table of the first one.
It may be better to add an ID to the header of the second document
to identify the depending document information like schemaID of EXI
Option.

--
Regards,
Hitoshi Uchida
<uchida.hitoshi@canon.co.jp>
EXI already provides one feature that can be used to achieve the
functionality you request. Namely, it is possible to encode a series
of XML documents as an EXI fragment, which will retain both the string
table and any learned grammar content between individual documents of
the series.

In general, the working group believes that being able to preserve any
learned content from one encoded document to the next is primarily
applicable to situations where there is already an externally-defined
ordering among the documents, such as over a communication channel
that guarantees in-order delivery. For such situations, the solution
using fragments and relying on the external ordering seems adequate,
and thus we have no intention of defining an ordering internal to EXI.
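
A simplified sketch of the effect is shown below. It models only a single value table and ignores grammars and partitions. The event tuples and the `encode_values` helper are hypothetical and do not reflect the actual EXI stream layout.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, object]


def encode_values(values: Iterable[str], table: Dict[str, int]) -> List[Event]:
    """Emit a literal on first occurrence and a compact identifier afterwards."""
    events: List[Event] = []
    for value in values:
        if value in table:
            events.append(("hit", table[value]))
        else:
            table[value] = len(table)
            events.append(("literal", value))
    return events


doc1 = ["open", "closed", "open"]
doc2 = ["closed", "open", "pending"]

# Second document encoded on its own: every value is a literal.
separate = encode_values(doc2, {})

# Both documents encoded as one series: the table carries over.
shared: Dict[str, int] = {}
encode_values(doc1, shared)
as_fragment = encode_values(doc2, shared)

print(separate)     # [('literal', 'closed'), ('literal', 'open'), ('literal', 'pending')]
print(as_fragment)  # [('hit', 1), ('hit', 0), ('literal', 'pending')]
```

The second document only benefits because it is decoded after the first. This is why an external ordering of the documents is needed.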
yes
LC-2133 Yuri Delendik <yury_exi@yahoo.com> (archived comment)
Hello,

The section 8.5.4 Schema-informed Element and Type Grammars describes how to build EXI schema-informed grammar from W3C XML Schema. Relax NG or other schema languages have different constructs (e.g. interleave or value based choices) that may not be expressed in XML Schema terms, but they can be expressed in EXI grammar terms.

The whole 8.5.4 looks like example rather than specification; and has to be moved into appendix or separate document. The section that explains EXI grammar terms may contain explanation of schema-valid values. Also, it will be better to express EXI options schema in EXI grammar terms in addition to the XML schema to illuminate variations in implementations.

Creating grammar based on “strict” option state is more like creation of two independent schemas. Instead using of this option, it is better to use two different schemaIDs, e.g. urn:my-schema-strict and urn:my-schema-loose. The spec is already saying “…The parties involved in the exchange are free to agree on the scheme of schemaID field that is appropriate for their use to uniquely identify the schema information.”

Thanks.
The XML Schema mapping to EXI grammar terms described in section 8.5.4
[2] is a normative mapping. A conformant EXI decoder must support this
specified mapping, as stated in the conformance section [3]. As noted
in your comment, it is possible to specify mappings from other
schema languages. Requiring all conformant EXI processors to implement the
W3C XML Schema mapping feature will facilitate interoperability, and
we picked XML Schema because it is a W3C Recommendation. We do not
intend to express any judgement on, or preference among, the various schema languages.
Other mappings might be defined by the EXI Working Group (or another
group) in the future, but none is planned at this time.
For those reasons, and because we do not see a real semantic benefit in
the proposed changes, we feel that this section should stay in the main body of
the specification.

A second part of your comment suggests that we use EXI grammar terms
to describe EXI Options. We will make it clear that the XML schema for
Options is provided for clarity of the specification, and it is not required
for the EXI processors to read an Options schema. Therefore we think
that adding an equivalent EXI grammar, which would be more verbose, is
not needed.

The last part of your issue is about the use of "strict" mode. The Working
Group believes that the use of the strict vs. non-strict mode is not
equivalent to the use of a different schema. The processor uses the same
schema but handles deviations from that schema when in non-strict mode.
This enables a use case where sender and receiver agree on a schema,
and the encoder is still free to use the "strict" option on an
instance-by-instance basis, without a pre-agreed schemaID.
tocheck
LC-2108 <pub@upokecenter.com> (archived comment)
I want to make a suggestion on the section 'Deriving Character Sets from XML Schema Regular Expressions':

I want to propose that datatypes with a regular expression containing a "charClassSub" should have no restricted character set. The reason is that all the remaining parts of the regular expression derivation expect only a union of characters, which is very efficient in determining whether the expression contains a restricted character set or not. Having a 'charClassSub' as part of the derivation process may complicate this, as the program now has to subtract portions of the character set as well as add to them, which may be a problem if the character set contains a large number of characters, like this:

[&#x20;-&#xFF00;-[&#x60;-&#xFF00]]

That regular expression above would yield a restricted character set of 64 characters; however the implementation may require storing thousands of characters (a naive implementation, yes) before it must exclude them in the 'charClassSub' portion of the regular expression. Another problem is nested 'charClassSub' sets. For example, the following regular expression is allowed:

[A-Z-[B-Z-[C-Z-[D-Z-[E-Z-[...]]]]]]

Both problems make 'charClassSub' problematic in restricted character set derivation. I thank you for your time.
We excluded those regular expressions that contain either a wildcard (".") or
negative character groups, since those expressions tend to result in
large sets of characters. Even in the rare cases where they do not, there are
often better alternative ways to specify the same effect; for example,
[^&#x30;-&#x10FFFF;] can be expressed simply as [&#x00;-&#x02F;].

On the other hand, character class subtraction is retained because,
unlike wildcard or negative character groups, the operation never
results in more characters than the first operand contains.
We also found that character class subtraction does not add much
computational burden if it is properly implemented.

Please also note that it is our expectation that schema authors can
provide some help by being aware of the general cost of each operation
and specifying patterns in ways more friendly to EXI processing.
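
One way such an implementation might look is sketched below. It works on inclusive code point ranges and never enumerates individual characters. The helper names are hypothetical, and the ranges within each operand are assumed to be disjoint.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive code point range


def subtract(base: List[Range], removed: List[Range]) -> List[Range]:
    """Return the parts of `base` that are not covered by `removed`."""
    result: List[Range] = []
    for lo, hi in base:
        pieces = [(lo, hi)]
        for r_lo, r_hi in removed:
            next_pieces: List[Range] = []
            for p_lo, p_hi in pieces:
                if r_hi < p_lo or r_lo > p_hi:
                    next_pieces.append((p_lo, p_hi))      # no overlap, keep as is
                    continue
                if p_lo < r_lo:
                    next_pieces.append((p_lo, r_lo - 1))  # part left of the removed range
                if r_hi < p_hi:
                    next_pieces.append((r_hi + 1, p_hi))  # part right of the removed range
            pieces = next_pieces
        result.extend(pieces)
    return result


def size(ranges: List[Range]) -> int:
    """Number of characters covered by the ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


# The commenter's first example: characters x20..xFF00 minus x60..xFF00.
first = subtract([(0x20, 0xFF00)], [(0x60, 0xFF00)])
print(first, size(first))  # [(32, 95)] 64

# A nested subtraction, [A-Z-[B-Z-[C-Z]]], evaluated from the inside out.
inner = subtract([(ord("B"), ord("Z"))], [(ord("C"), ord("Z"))])
outer = subtract([(ord("A"), ord("Z"))], inner)
print(outer, size(outer))  # [(65, 65), (67, 90)] 25
```

The cost depends on the number of ranges rather than the number of characters, so the large intermediate set in the first example is never materialized.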
tocheck
LC-2172 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
1) Some facets are supported like minInclusive or maxExclusive.
What about the support of the length, minLength and maxLength facets which could be useful to better encode string or list sizes.
It should not be too difficult to support them based on current facet support.
Is there a rationale to not include these facets?
The EXI encoding of simple type data, and the use or non-use of particular facets,
was decided on the basis of both empirical observation of their merits and their
implications.

We looked at the length-related facets to see whether EXI should be aware of
them in order to improve simple type value encodings. The results of implementation
experiments shared by WG members indicated that the effect of leveraging those
facets was not substantial enough to justify including the function
in the format specification, given that doing so would require EXI
processors to check for the presence of those facets at every occurrence
of a schema-bound string before determining how to represent the length
field. In this particular instance, the concerns about the potential processing
efficiency penalty outweighed the benefit observed. Encoding rules associated
with strings, such as string tables, are deliberately kept simple
because string handling is a known hot spot in processor execution, and any subtle overhead
can accrue into a noticeable cost in performance.

Also, please note that at least for repeated strings, string tables kick in after the
first occurrence of that string value. Therefore, the effect would be limited to the
first occurrence of a particular string value only. We presume that this is one of
the reasons why the length-related facets did not deliver the level of benefit that
we, like you, originally expected.
tocheck
LC-2173 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
2) Guidelines for schema modeling
Is there any guideline regarding the relationship between EXI and schema modeling?
Guidelines would be useful to understand the impact of some schema modeling decisions on EXI encoding/decoding in terms of efficiency and compression.
For instance, it seems that the more global constructs (elements, types, attributes), the bigger will be the generated grammars since all global schema constructs need to be kept (right?),
having a lot of xs:all or maxOccurs="999" may also hurt efficiency.
See also question 3)
It seems that there are two aspects to your question. One pertains to design issues involved in creating XML Schemas for optimizing EXI encoding compactness. The other relates to the internal representation that an EXI implementation might use to perform schema-based encoding.

Concerning designing a schema to get more compact EXI encodings, the general rule would be, "More precise content constraints and more deterministic structure lead to more compact EXI encodings." Examples of applying this general rule are:

-Carefully think about minOccurs and maxOccurs of particles, and requirement (vs. optionality) of attributes. The more you exploit those properties, the more compactness you can achieve.

-Use substitutionGroup, instead of depending on xsi:type mechanism, whenever possible.

-Use wildcard only when absolutely necessary. Use concrete elements and attributes if possible.

The above are simple guidelines. However, some questions related to the general rule are not so simple. An example of such a consideration is: Where is the right location for a required element in a model group? For example, in the first schema fragment below, element "D" comes first. But placing "D" after "C" (as shown in the second schema fragment) would increase determinism. The reason for this relates to the number of possible alternatives for elements which may follow "B". In the first example, the elements "C", "W", "X", "Y" and "Z" may each appear after "B". In the second example, only elements "C" and "D" may follow "B". The second example is more deterministic and should result in a more compact encoding.

<sequence>
  <element name="D" />
  <element name="B" minOccurs="0" />
  <element name="C" minOccurs="0" />
  <choice>
    <element name="W" />
    <element name="X" />
    <element name="Y" />
    <element name="Z" />
  </choice>
</sequence>

<sequence>
  <element name="B" minOccurs="0" />
  <element name="C" minOccurs="0" />
  <element name="D" />
  <choice>
    <element name="W" />
    <element name="X" />
    <element name="Y" />
    <element name="Z" />
  </choice>
</sequence>
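
The difference in determinism between the two fragments can be checked with a short sketch. It models each particle as a set of accepted names plus an optional flag, and assumes maxOccurs is 1 throughout. It also assumes that n alternatives cost ceil(log2(n)) bits, which ignores any additional productions a real grammar may carry, for example in non-strict mode. All names are hypothetical.

```python
from math import ceil, log2
from typing import List, Set, Tuple

Particle = Tuple[Set[str], bool]  # (element names accepted here, is it optional?)


def followers(sequence: List[Particle], name: str) -> Set[str]:
    """Element names that may appear immediately after `name` in the sequence."""
    index = next(i for i, (names, _) in enumerate(sequence) if name in names)
    result: Set[str] = set()
    for names, optional in sequence[index + 1:]:
        result |= names
        if not optional:
            break  # a required particle ends the look-ahead
    return result


def bits(alternatives: int) -> int:
    """Bits needed to distinguish the given number of alternatives."""
    return ceil(log2(alternatives)) if alternatives > 1 else 0


CHOICE: Particle = ({"W", "X", "Y", "Z"}, False)
first = [({"D"}, False), ({"B"}, True), ({"C"}, True), CHOICE]
second = [({"B"}, True), ({"C"}, True), ({"D"}, False), CHOICE]

for label, seq in (("first", first), ("second", second)):
    for name in ("B", "C"):
        after = followers(seq, name)
        print(label, name, sorted(after), bits(len(after)))
```

Under these assumptions, the first fragment offers five alternatives after "B" and four after "C". The second offers two after "B" and one after "C", so fewer bits are needed at each of those points.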


Regarding the grammar size relative to schema size (which is one of the issues we believe you raise), this is the nature of schema-processing. EXI is no different from schema validation by XML parsers using XML schemas. In principle, grammar size is proportional to the number of definitions and constructs used in the schema. This should be a consideration when generating schema-informed EXI encodings. The question of how much schema information should be stored and applied is important, because in some cases there are two conflicting goals. The first is the desire to limit grammar footprint and the second is the desire to increase determinism of the grammar.

So this is a practical issue that each use case has to cope with, rather than an inherent issue with EXI. We have not, as a group, discussed this, nor was it in our plans to develop these kinds of implementation guidelines. While such instances of schemas may cause footprint issues for a straight-forward implementation of the specification, there have been implementation reports from within the WG that there are implementation techniques that resolve such footprint issues.

We may look into schema modeling guidelines, but only after other charter-driven tasks are completed, and if time allows. It seems that guidelines addressing issues in implementing an XML schema validation parser would be pertinent to your questions (especially with respect to your question about grammar size). Considering this, the XML Schema group may have better insight than the EXI group can provide at this time.
tocheck
LC-2227 Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp> (archived comment)
Hello,

I have an question about Document Grammars.

In 8.5.1 Schema-Informed Document Grammar, there is 'SE(*)' event
which is evaluated by 'Built-in' Element Grammar.

But this rule (perhaps) can be applied under strict mode.

I think the principle of strict mode is that if XML instance has
an element which isn't defined in XML schema, encoder should stop
with some error.

I feel lacking of consistency between Schema-Informed Document
Grammar specification and strict mode.

How do you think about it?
Or were there any arguments?

Regards,

//---------------------------------------------------------------
NTT Cyber Space Laboratories
Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp>
TEL: +81-46-859-3412 FAX: +81-46-859-2768
----------------------------------------------------------------//
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0002.html
tocheck
LC-2165 ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment)
Dear EXI members,

I have a feedback of EXI specification.
In chapter 4, what's the advantage of local-element-ns flag?

Best Regards,
Tooru Ishizaki.
The advantage of the local-element-ns flag is improved compactness for
representing element QNames when Preserve.prefixes is true. Without the
local-element-ns flag, the prefix encoding for each element QName would have
to account for the possibility that the prefix is declared by an NS event
that follows the associated SE event (i.e., a prefix that has not yet been
encountered in the stream). In the common case, where there is only one
prefix declared for each namespace, the prefix encoding would increase from
zero bits per element to one bit per element. If there were two prefixes
declared for each namespace, the prefix encoding would increase from one bit
per element to two bits per element. Since there can be a lot of SE events
in a document, this can have a significant impact on compactness.
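The bit counts quoted in this reply can be reproduced with a short sketch. It assumes that a prefix is written as an n-bit unsigned integer wide enough to distinguish the candidate values, and that, without the local-element-ns flag, one extra value would be needed to signal "prefix declared by a later NS event". The function names are hypothetical and used only for illustration.

```python
def bits_for(distinct_values: int) -> int:
    """Smallest n such that an n-bit unsigned integer can hold `distinct_values` values."""
    return (distinct_values - 1).bit_length()


def prefix_bits_per_element(prefixes_for_namespace: int, has_local_element_ns: bool) -> int:
    # Assumption: without the flag, one additional value is reserved for a
    # prefix that has not yet been encountered in the stream.
    extra = 0 if has_local_element_ns else 1
    return bits_for(prefixes_for_namespace + extra)


for prefixes in (1, 2):
    with_flag = prefix_bits_per_element(prefixes, True)
    without_flag = prefix_bits_per_element(prefixes, False)
    print(f"{prefixes} prefix(es): {with_flag} bit(s) with flag, {without_flag} without")
```

Under these assumptions the sketch gives zero versus one bit for a single prefix and one versus two bits for two prefixes, matching the figures in the reply.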
tocheck
LC-2167 ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment)
Dear EXI members,

I have some feedback on the EXI specification.
In chapter 9 (EXI Compression), I would like to ask you to add
an example of the decoding algorithm.

Best Regards,
Tooru Ishizaki
http://lists.w3.org/Archives/Public/public-exi-comments/2009Mar/0001.html tocheck
LC-2192 Jochen Darley <joda@upb.de> (archived comment)
Hallo EXI WG,

I'll just start with an example scenario:

Let's assume www.markmail.org wants to use XML compression for their
services. Markmail offers personalized feeds of news and mailing lists
to its customers (as RSS/Atom feeds). The goal is to allow customers to
receive their personalized feed as a compressed XML stream.

If markmail implemented the compressed streams by compressing each
personalized stream by itself, they would need a lot of resources. My
assumption is that they will have to use a separate EXI compressor for
each (of the thousand) compressed customer streams.

The solution would be to pre-compress the feed's entries and just copy
them into the customized streams. Markmail can't use a single continuous
EXI stream because:

1) EXI has a global string table which can't be reset per block
2) EXI enforces a fixed blocksize "n" except for the last block

My solution would be to pre-compress multiple XML fragments and then
copy the compressed fragments into the customer's personalized stream.

My questions:

1) How will EXI support such a compressed, streaming
scenario?

2) Should EXI support this scenario ?

3) What are the design intentions for the fixed blocksize?

4) Is it acceptable to remove the fixed blocksize?

5) Can a mode be added to EXI which resets the string table
for each block?

6) What are design choices/constraints which require a global
string table or the fixed blocksize?

Regards,
Jochen Darley
The blocksize of EXI compression is constant for a single EXI stream so that
an EXI decoder can get an approximate idea of how much memory is necessary for
buffering events before it starts decompression. One of the design goals of
EXI was to enable devices with limited hardware resources, often
with very limited network resources, to successfully tap into the XML family of
technologies.

For the scenario you described, the best solution may be to feed a flow of
separate EXI documents or EXI fragments, each of which has been compressed
independently.

It may also be worth mentioning that EXI
compression is designed to cost far fewer CPU cycles than gzip, while
consistently producing better results.

This trait is known to allow EXI compression to be applied to output feeds
in scenarios where gzip has been considered prohibitive because
the high processing cost of compression exhausts server CPU resources.
tocheck
LC-2186 SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment)
According to the specification, if I use a user-defined datatype
representation, I have to specify a datatype representation map. How do
I use my original encoding only for the specific element or attribute?

For example, I want to use my original float encoding only for b
elements.

<a>
<b>1.2</b> <!-- xsd:float -->
<c>3.4</c> <!-- xsd:float -->
</a>

If I define a datatype representation map as follows, the c element will
also be encoded with the myfloat encoding.

<datatypeRepresentationMap xmlns:myenc="http://example.org/myenc">
<xsd:float/>
<myenc:myfloat/>
</datatypeRepresentationMap>
For only the content of element "b", and not "c", to be encoded
with your original representation, please define a new type derived from
xsd:float and use that type for element "b", so that the type can be
referenced in the datatype representation map as follows.

<datatypeRepresentationMap xmlns:mytypes="http://example.org/mytypes" xmlns:myenc="http://example.org/myenc">
<mytypes:myfloat/>
<myenc:myfloat/>
</datatypeRepresentationMap>

where the type mytypes:myfloat would be defined as follows.

<xsd:simpleType name="myfloat">
<xsd:restriction base="xsd:float"/>
</xsd:simpleType>

<xsd:element name="b" type="mytypes:myfloat"/>


The first item in each datatype representation map entry pair identifies a type.
Types are in general considered more inherent to data than elements are.
For example, each atomic data value in a list has its own immediate datatype,
whereas the containing element only indicates the list datatype. Thus, types
allow for finer control over the use of custom datatype representations. The
use of types in datatype representation maps also has the benefit of insulating
element names from the details of datatype representations. For example,
you can define two local elements of the same name, yet allow them to
be encoded using different datatype representations by associating them with
different datatypes. This all comes with the slight burden of having to
define appropriate types in the schema. It is worth noting that when
you have a good reason for using different representations for two elements,
they in general carry somewhat different semantics or expectations at some
level, which should justify the effort of defining a new type for
use in the datatype representation map to differentiate them.
tocheck
LC-2188 SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment)
The EXI specification has a lot of features that can reduce document size.
However, it seems too complex for small embedded devices, and I will have
to implement a partial implementation. Of course, that will lose
interoperability. I think an additional conformance level or a tiny profile
would be useful for small devices and interoperability. Is there a plan to
define something like that?
The importance of a small footprint has been kept in mind while
developing the EXI format. The working group has considered the
number and complexity of mandatory features (affecting code size) and
the initial data that must be available to support the format
(affecting initialized data segment size). Additionally, some of our
members have successful EXI implementation experience in
resource-constrained environments.

We concluded, however, that we should not define different levels of
conformance to accommodate subsets of EXI capability. Defining
the right profiles depends very much upon the use case(s) in mind.
The decision as to which capabilities/features can be omitted
and which should be retained is best left to the user/implementer.
We have tried, though, to make sure that the terminology defined in
the specification is rigorous enough to discuss EXI features. This
should help in discussing which EXI functionality is critical in
a given environment and which is not.

But we still maintain that a conformant EXI processor must implement the EXI
specification in its entirety. We do understand, however, that there may
be partial implementations of the EXI specification in use. We want to strongly
caution that any restricted profiles for EXI functionality be used sparingly,
and in closed environments in which all participants are aware of the supported
and unsupported EXI capabilities. It is also critically important that any
profiles be compatible with an implementation conformant to the EXI spec.
In other words, a standard EXI processor should be able to handle any encoded
document that a given EXI profile implementation generates.
tocheck
LC-2189 SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment)
EXI documents do not include a data type identifier for each value. Thus
all data types other than string can be used only in schema-informed
documents. Is it impossible to encode attribute values as integers
without a schema? A Fast Infoset document has a data type identifier for each
value, and I think that is a good approach.
EXI allows one to provide the same typing information as XML. In XML
documents you can provide hints about the type by using the attribute
xsi:type. EXI makes use of this information in the same fashion.

For EXI to be type-aware with respect to attributes, or to any type other than the
built-in types provided by XML Schema, external information
such as an XML Schema document is required.
For a more detailed explanation, please take a look at our response to
a related question (see also LC-2175).
tocheck
LC-2185 TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment)
EXI does not support the XML declaration - character encoding scheme,
standalone, version. (ref. B.1).
But why does it not support the XML declaration?
I think "character encoding scheme" is not necessary, but I cannot
understand why "standalone" and "version" are not supported.
The version of XML that occurs in the XML declaration is for indicating the
slightly different syntax rules implied by each XML version (i.e. XML 1.0
vs XML 1.1 as of this writing).

The EXI format is a representation of the XML Information Set [1]. We are aware that
the Document Information Item [2] in the Infoset provides a "version" property
that corresponds to the XML version. However, the value of that property does
not imply different semantics that need to be captured at the Infoset level.
These are the reasons why EXI is, and should be, agnostic
about the version of XML.

In some anticipated scenarios of EXI use, application programs are concerned
only with the infoset, with no involvement of serialization in XML at any point
of the processing and communication chains. In such applications, the "version"
property of the Document Information Item would not provide any benefit.

Also, in applications where serialization of the infoset in XML is involved in
conjunction with EXI somewhere along the processing chain, preserving
the original XML version is rarely a concern. This is because the programs
that consume the data are, more often than not, concerned only with
the infoset, not with the subtle differences in XML 1.x syntax.
The recent publication of XML 1.0 5th edition [3] has in a sense made this
argument stronger, given that the single most significant
difference between XML 1.0 and 1.1, the repertoire of
characters, has now essentially disappeared.

Yet, we understand that there are use cases where the use of a particular
version of XML is required when serializing an infoset into XML. On such
occasions, it is the program that subsequently consumes the serialized XML
that calls for a particular XML version. We consider the XML version to be an
artifact of XML serialization, and therefore a matter for XML
serializer implementations, rather than something that has to be
inherited from the source XML, if any, that was fed into the processing chain
as input.

As described above, we do not foresee critical issues arising from
not providing a placeholder field in the EXI format for carrying text XML version
numbers. On the other hand, there could be a substantive cost if EXI supported
XML version numbers in the grammar system, because doing so would
cause every EXI stream to grow slightly in size even when the
XML version value is absent. One of the major uses of EXI, the frequent
exchange of tiny documents, could suffer from this, because
such tiny documents are typically designed very carefully to save bits and maximize
efficiency. Weighing these considerations, we decided to forgo the "version"
property of the Document Information Item of the Infoset.

[1] http://www.w3.org/TR/xml-infoset/
[2] http://www.w3.org/TR/xml-infoset/#infoitem.document
[3] http://www.w3.org/TR/2008/REC-xml-20081126/
tocheck
LC-2194 TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment)
Dear W3C EXI WG members,

I have a question about this draft specification.

EXI does not support the XML declaration - character encoding scheme,
standalone, version. (ref. B.1).
But why does it not support the XML declaration?
I think "character encoding scheme" is not necessary, but I cannot
understand why "standalone" and "version" are not supported.

Regards,
Keisuke Tamiya (tamiya.keisuke@canon.co.jp)
http://lists.w3.org/Archives/Public/public-exi-comments/2009Oct/0004.html tocheck
LC-2193 Youenn Fablet <Youenn.Fablet@crf.canon.fr> (archived comment)
5) ..............

Additionally, while EXI provides great flexibility in the amount of schema put in grammars,
the schemaID mechanism seems very minimal.
It seems that interoperable uses of schema-informed EXI will greatly restrain the use of this flexibility.
Is there some additional work in that area that could or will be further conducted?
The spec intentionally does not prescribe how schemaID is to be
used. It is up to the use cases, applications,
or other specifications that leverage the EXI format to define the syntax
and semantics of the schemaID field.

For example, there are cases where strings only a couple of characters
long would be used as the schemaID, whereas URIs may be suited to
other use cases. In addition, the schema identified by schemaID may be
described in a schema language other than XML Schema, such as Relax NG,
as long as a well-known schema-binding method has been defined
for the schema language in use.

The specification also stops short of defining any mechanism to ensure
that instances and schemas match. Again, it is up to use cases and
applications to define schema identity in connection with their own
schemaID semantics, or even to determine whether such a mechanism is
required at all. Either metadata managed out of band or the [user defined]
header options field could be used to ensure the level of schema identity
that each use case requires for addressing integrity issues such as
false positives.
yes
LC-2130 Yuri Delendik <yury_exi@yahoo.com> (archived comment)
Hello,

From 7.1.10.1 Restricted Character Sets:
"... If the restricted character set for a datatype contains at least 255 characters or contains non-BMP characters, the character set of the datatype is not restricted and can be omitted from further consideration..."

Appendix E Deriving Character Sets from XML Schema Regular Expressions explains how to build character sets. It enumerates the character groups that, if contained in a regular expression atom, cause the charset of the whole expression to be defined as the entire set of XML characters. One of the exceptions is the multi-character escape "\d". By the XSD definition it is equivalent to the category escape "\p{Nd}". But according to the UnicodeData.txt data file of Unicode 5.0.0, this category contains 290 characters (230 BMP and 60 non-BMP).

The exception for "\d" (and "\p{Nd}") is incorrect: after all processing, the expression "\d" becomes unsuitable for datatype encoding using a restricted character set, since the set has more than 255 characters and contains non-BMP characters.

Here are the totals from UnicodeData.txt:
Category  BMP   non-BMP  Total chars  Excl. in EXI
\p{Cc}    65    0        65
\p{Cf}    33    105      138          ?
\p{Co}    2     4        6            X
\p{Cs}    6     0        6
\p{Ll}    1102  532      1634         X
\p{Lm}    167   0        167
\p{Lo}    6009  1954     7963         X
\p{Lt}    31    0        31
\p{Lu}    836   484      1320         X
\p{Mc}    167   8        175          ?
\p{Me}    10    0        10
\p{Mn}    602   278      880          X
\p{Nd}    230   60       290          ?
\p{Nl}    51    159      210          ?
\p{No}    252   84       336          ?
\p{Pc}    10    0        10
\p{Pd}    18    0        18
\p{Pe}    65    0        65
\p{Pf}    9     0        9
\p{Pi}    11    0        11
\p{Po}    260   18       278          ?
\p{Ps}    66    0        66
\p{Sc}    41    0        41
\p{Sk}    99    0        99
\p{Sm}    904   10       914          X
\p{So}    2350  608      2958         X
\p{Zl}    1     0        1
\p{Zp}    1     0        1
\p{Zs}    18    0        18
Regards,
Yuri Delendik
We will mention in the spec that only BMP characters indicated by
each category are included in the set of characters for use in restricted
character set computation. This should make '\d' still relevant, because
the category "Nd" contains only 230 BMP characters.

Shown below is the number of characters (both total and BMP) contained
in each category, derived from version 5.0.0 of Unicode. Based on this,
the category names that cause the computation to stop should now be
the following.

'L'[ulo]?, 'M'[n]?, 'N', 'P'[o]?, 'S'[mo]? or 'C'[o]? .

Thanks!


65 characters in Cc (65 BMP chars)
138 characters in Cf (33 BMP chars)
137468 characters in Co (6400 BMP chars) x
2048 characters in Cs (2048 BMP chars) x
1634 characters in Ll (1102 BMP chars) x
167 characters in Lm (167 BMP chars)
89344 characters in Lo (44681 BMP chars) x
31 characters in Lt (31 BMP chars)
1320 characters in Lu (836 BMP chars) x
175 characters in Mc (167 BMP chars)
10 characters in Me (10 BMP chars)
880 characters in Mn (602 BMP chars) x
290 characters in Nd (230 BMP chars)
210 characters in Nl (51 BMP chars)
336 characters in No (252 BMP chars)
10 characters in Pc (10 BMP chars)
18 characters in Pd (18 BMP chars)
65 characters in Pe (65 BMP chars)
9 characters in Pf (9 BMP chars)
11 characters in Pi (11 BMP chars)
278 characters in Po (260 BMP chars) x
66 characters in Ps (66 BMP chars)
41 characters in Sc (41 BMP chars)
99 characters in Sk (99 BMP chars)
914 characters in Sm (904 BMP chars) x
2958 characters in So (2350 BMP chars) x
1 characters in Zl (1 BMP chars)
1 characters in Zp (1 BMP chars)
18 characters in Zs (18 BMP chars)
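The "x" marks in the list above can be cross-checked against the threshold quoted from section 7.1.10.1 ("at least 255 characters"). The sketch below is only an illustration: it copies the BMP counts from this reply and applies the threshold to each category on its own, which is a simplification of the rule for whole restricted character sets. The dictionary and function names are hypothetical.

```python
# BMP character counts per category, copied from the list in this reply
# (derived by the Working Group from Unicode 5.0.0).
BMP_COUNTS = {
    "Cc": 65, "Cf": 33, "Co": 6400, "Cs": 2048, "Ll": 1102, "Lm": 167,
    "Lo": 44681, "Lt": 31, "Lu": 836, "Mc": 167, "Me": 10, "Mn": 602,
    "Nd": 230, "Nl": 51, "No": 252, "Pc": 10, "Pd": 18, "Pe": 65,
    "Pf": 9, "Pi": 11, "Po": 260, "Ps": 66, "Sc": 41, "Sk": 99,
    "Sm": 904, "So": 2350, "Zl": 1, "Zp": 1, "Zs": 18,
}

# "at least 255 characters" as quoted from section 7.1.10.1.
UNRESTRICTED_THRESHOLD = 255


def categories_too_large(bmp_counts, threshold=UNRESTRICTED_THRESHOLD):
    """Categories whose BMP characters alone already reach the threshold."""
    return sorted(cat for cat, count in bmp_counts.items() if count >= threshold)


print(categories_too_large(BMP_COUNTS))
print("Nd" in categories_too_large(BMP_COUNTS))
```

With these counts, "Nd" stays below the threshold, which is why '\d' remains usable once only BMP characters are considered.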
tocheck
LC-2177 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
6) Is it conformant to not follow the attribute order in the case of a schema-informed grammar encoded element in deviation mode?
As stated in section 6, it seems not conformant.
In some cases, grammars can support attributes in no particular order, such as the example below (correct me if I got something wrong).
<xs:complexType name="test">
<xs:attribute name="name" type="xs:string"/>
<xs:anyAttribute namespace="#any"/>
</xs:complexType>
<xs:element name="test" type="test"/>

While the benefit of ordering the attributes at the grammar level and the general compression benefit for encoders to follow the given order are obvious, I do not see compelling reasons of including this constraint in the format itself.
At the encoder side, the encoder may decide to order attributes or not.
If encoding fails due to bad ordering (in strict mode) or if the compression ratio is bad, the encoder can always decide to order the attributes.
At the decoder side, the decoder is only following the grammars so it does not really care about the ordering.
There is even a drawback as this is one (major ?) difference between schema-informed and schema-less processing.
Am I missing something obvious?
You are not missing anything. There is indeed no reason, when seen
purely from the interoperability point of view, to require a specific
ordering of the attributes when strict is false. When the encoder does
not order the attributes, the out-of-place attributes will just get
encoded as deviations, which is understandable by the decoder, as you
say.

We will change the specification so that it no longer requires the
attributes to be sorted in either kind of stream, excepting naturally
the case when strict is true. Also, the xsi:type and xsi:nil
attributes still have to come first, since their presence affects the
grammar used for the rest of the element content.

The specification will still strongly recommend that the attributes be
ordered for any element that is encoded with a schema-informed
grammar, as not ordering them will hurt compactness. In particular,
when there are multiple mandatory attributes, not ordering them may
also cause the content of that element, not just the attribute list,
to be encoded as deviations.
tocheck
LC-2179 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
9)

Section 8.5.4.4.1:

When adding production:

AT (qname) [schema-invalid value] Element?,?

to Elementi,j

Which next Symbol should be used?

Spec says Elementi,j

It would be more logical to use the symbol from the production:

AT (qname) [schema-valid value] Elementi,k
You are quite right. This is a good catch and will be fixed in the next version of the specification. The following specification revision has been proposed to address this issue and is being reviewed by the working group.

----------------------------------
For each non-terminal Element i, j , such that 0 ≤ j ≤ content , with zero or more productions of the following form:

Element i, j :
AT (qname 0 ) NonTerminal 0
AT (qname 1 ) NonTerminal 1
⋮
AT (qname x-1 ) NonTerminal x-1

where x represents the number of attributes declared in the schema for this context, add the following productions:

Syntax                                                          Event Code

Element i, j :
AT (*) Element i, j                                             n.m
AT (qname 0 ) [schema-invalid value] NonTerminal 0              n.(m+1).0
AT (qname 1 ) [schema-invalid value] NonTerminal 1              n.(m+1).1
⋮                                                               ⋮
AT (qname x-1 ) [schema-invalid value] NonTerminal x-1          n.(m+1).(x-1)
AT (*) [schema-invalid value] Element i, j                      n.(m+1).(x)


where n.m represents the next available event code with length 2.
tocheck
LC-2181 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
11)

Section 8.4.3

In Schema-less mode, EE productions should be promoted to event code 0 when used (if no EE production with an event code length of 1 already exists).
This is a good point and you are absolutely right. The next version of the
specification will include this semantic.
tocheck
LC-2182 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
12)

Section 8.4.3

In Schema-less mode, when using the SE(*) production, should the creation of the SE(qname) production be done before the evaluation of the element content?



In most cases, this has no impact. In the case of recursive elements, this leads to better compaction.

Moreover, in the case of recursive elements, the current specification seems to imply creating several SE(qname) productions.
Yes, you make a very good point. Thank you for catching this. The next
version of the specification will be updated to add the SE(qname) production
before evaluating the element content.
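The effect on recursive elements can be illustrated with a deliberately simplified simulation. It assumes one learned-production list per element qname and ignores event codes and the distinction between start-tag and element-content grammars. All names are hypothetical; this is not the grammar machinery of the specification.

```python
def encode_children(element, grammars, add_before_content, events):
    """Walk element = (qname, [children]) and record the SE event used per child."""
    qname, children = element
    learned = grammars.setdefault(qname, [])  # learned SE(qname) productions
    for child in children:
        child_qname = child[0]
        if child_qname in learned:
            # A specific production already exists, so it is used directly.
            events.append(f"SE({child_qname})")
            encode_children(child, grammars, add_before_content, events)
            continue
        events.append("SE(*)")
        if add_before_content:
            learned.append(child_qname)  # updated behaviour described in the reply
        encode_children(child, grammars, add_before_content, events)
        if not add_before_content:
            learned.append(child_qname)  # earlier behaviour questioned in the comment


def simulate(add_before_content):
    # <a><a><a/></a></a>: an element that recursively contains itself.
    doc = ("a", [("a", [("a", [])])])
    grammars, events = {}, []
    encode_children(doc, grammars, add_before_content, events)
    return events, grammars["a"]


print(simulate(add_before_content=False))
print(simulate(add_before_content=True))
```

In this model, adding the production after the content leaves two identical SE(a) entries and uses SE(*) twice, while adding it first lets the nested element reuse the single SE(a) production.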
tocheck
LC-2184 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
14)
Section 7.3.3
Empty strings can occur as attribute values.
Section 7.3.3 suggests that these empty strings are to be added in indexing tables.
The current literal EXI encoding being compact enough, it is reasonable not to add them to the table.
Yes, this is another very good point. Thanks again for your very thorough
review of the specification. The next version of the specification will be
updated to avoid adding the empty string to the string tables.
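A minimal sketch of the resulting behaviour, assuming a single flat table rather than the separate partitions of the actual format; the class and the "hit"/"literal" labels are hypothetical and only illustrate that an empty string is always written literally and never indexed.

```python
class StringTable:
    """Toy value table: non-empty strings are indexed on first use."""

    def __init__(self):
        self._index = {}

    def encode(self, value):
        if value in self._index:
            return ("hit", self._index[value])
        if value != "":
            # Empty strings are skipped: their literal form is already compact.
            self._index[value] = len(self._index)
        return ("literal", value)

    def __len__(self):
        return len(self._index)


table = StringTable()
print(table.encode(""))     # written literally, not indexed
print(table.encode("abc"))  # written literally, then indexed
print(table.encode("abc"))  # found in the table
print(len(table))
```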
tocheck
LC-2105 Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment)
In 7.1.8 Date-Time

You chose to encode Date-Time in the following way:

Year            Offset from 2000                         Integer (7.1.5 Integer)
MonthDay        Month * 32 + Day                         9-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time            ((Hour * 60) + Minutes) * 60 + seconds   17-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer)
FractionalSecs  Fractional seconds                       Unsigned Integer (7.1.6 Unsigned Integer) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone        TZHours * 60 + TZMinutes                 11-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) representing a signed integer offset by 840 ( = 14 * 60 )
presence        Boolean presence indicator               Boolean (7.1.2 Boolean)




Since you have aligned MonthDay on 5 bits with <<Month * 2^5 + Day>>
I propose to align the Hour/Minute/Seconds on 6 bits

with

Time : (Hour * 2^6 + Minutes) * 2^6 + seconds so as to replace
multiplication by 60 with SHL
and
TimeZone : TZHours * 2^6 + TZMinutes
We found your points valid. At the same time, however, we believe that
the suggested change would achieve only a negligible performance
gain in the context of EXI processing as a whole.

Since we expect no noticeable performance
difference from the end user's point of view, and thus no compelling
benefit for everyone, we are
reluctant to make the requested change for now, given its impact on
generated test data and implementations. Having said that,
we would like to leave the issue open, and may consider including
the change in the future if we find a good opportunity to do so.

On Jan. 7th the WG resolved to make the change.
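The two Time packings discussed in this comment can be compared with a short sketch. The function names are hypothetical; the formulas are the ones quoted in the comment, with the proposed form written using 6-bit shifts.

```python
def pack_time_base60(hour, minute, second):
    """Time as quoted from 7.1.8: ((Hour * 60) + Minutes) * 60 + seconds."""
    return (hour * 60 + minute) * 60 + second


def pack_time_shift6(hour, minute, second):
    """Proposed form: (Hour * 2^6 + Minutes) * 2^6 + seconds, using shifts."""
    return (((hour << 6) | minute) << 6) | second


def unpack_time_shift6(value):
    """Inverse of pack_time_shift6: fields are recovered with shifts and masks."""
    return value >> 12, (value >> 6) & 0x3F, value & 0x3F


packed60 = pack_time_base60(23, 59, 59)
packed64 = pack_time_shift6(23, 59, 59)
print(packed60, packed60.bit_length())
print(packed64, packed64.bit_length())
print(unpack_time_shift6(packed64))
```

Both packings of 23:59:59 fit in the same 17-bit field, so in this sketch the proposed form changes only the arithmetic needed to build and split the value, not the field width.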
tocheck
LC-2191 TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment)
In the future, the version number will be greater than 1.
When an EXI parser reads EXI data having a version number other than
1, how should it process the EXI data?
- Stop or continue parsing?
- Pass an error or warning event to the application?
- Not defined in the specification?
Text relating to this case when considering potential future final EXI
versions was indeed missing from the format specification. Currently,
the specification requires an EXI processor to process any final
version that it understands. For preview versions, the specification
already says that the behavior is implementation-dependent.

As the working group cannot know what potential final EXI versions
will look like in the future, we have decided not to constrain the
processors in any manner in this case. We will therefore add text to
the specification stating that the behavior on seeing a version number
unknown to the processor (either preview or final) depends on the
implementation, and possible behaviors include rejecting the stream.
yes
LC-2132 Yuri Delendik <yury_exi@yahoo.com> (archived comment)
Hello,

The Complex Ur-type Grammar (8.5.4.1.3.3) and Schema-informed Element Fragment Grammar (Section 8.5.3) sections define element grammars, but do not define grammars for the corresponding TypeEmpty_i. Section 8.5.4.4 Undeclared Element uses the empty grammars in explaining how the xsi:nil form is evaluated.

It is unclear how the xsi:nil form is evaluated when the TypeEmpty_i grammar is not present.

Thanks
You are right. That's an omission in the specification which will be
fixed with the next update.

e.g. TypeEmpty for ur-Type is going to look like

Ur-Type_0 :
AT(*) Ur-Type_0
EE
tocheck
LC-2174 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
3) DataTypeRepresentationType question
I would like a confirmation of the current DataTypeRepresentationType behaviour.
Let's have a schema with the following attribute definition:
<xs:attribute name="test" type="xs:string"/>
In that case, the only way to change the encoding for @test values with the DataTypeRepresentationType feature
is to redefine xs:string which may have great impact.
If we only want to change the @test values with the DataTypeRepresentationType feature, we would need to
change the schema as follows:
<xs:simpleType name="mystring">
<xs:restriction base="xs:string"/>
</xs:simpleType>
<xs:attribute name="test" type="mystring"/>
DataTypeRepresentationType could then be used to redefine mystring.
Is it correct?
If so, the interoperability will generally be lost, since interoperable DataTypeRepresentationType use is currently limited to XML Schema part 2 predefined types redefinition (end of section 7.4).
What about extending that behaviour to all simple types that have been gathered by consuming the schema in use?
Is there any rationale behind that specific constraint?
Any named simple type can be used as the name of the first element of a pair in
a datatype representation map, as long as it is defined in the schema.
So, the "mystring" type defined in your example schema snippet can be used in
datatype representation maps in a manner no less interoperable than the built-in
xsd datatypes.

The note given at the end of section 7.4 is about the mechanism of sharing
user-defined datatype representations, not about the types. Schemas are
supposed to be shared among the parties before exchanging documents, so
the types in the schemas are shared knowledge at that point, just as the xsd built-in
types are. The note only warns that extra care should be taken before
sending documents that use a user-defined datatype representation, to make sure
the recipient knows how to decode values that were encoded using that
custom representation. We plan to improve the language there to make
it clear that it is about user-defined datatype representations.
tocheck
LC-2176 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
5) EXI schema-less/schema-informed modes
Based on internal discussions and internal feedback, there is a general assumption that the EXI specification somehow defines two separate modes (schema-less and schema-informed).
While this is clearly stated in the specification that both modes easily coexist in a single EXI stream,
additional advertisement (maybe in the primer) of that feature may be good for adoption.
The latest published primer (Dec 2007) could maybe be improved in that respect.
You are right in assuming that schema-informed and built-in grammars
may coexist in the same EXI stream. The first published EXI primer
document [1] uses incorrect terminology in that regard.
A revised version will be available soon and will integrate your comments,
along with other improvements and fixes for spec consistency issues.
tocheck
LC-2178 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
7) RDF/XMP use case
This is more a general comment on specific XML/EXI use cases, notably RDF or XMP documents where
no standard, well defined XML schemas are available.
These documents generally have some defined structures and types (RDF schema, XMP schemas…) but no
well defined XML schemas.
What would be the recommendation from the WG to enable good interoperable EXI compression? Stick with schema-less encoding? Create an XML schema, publish it and use it?
If you wish to use EXI's schema-informed capabilities in compressing
these documents, the best short-term solution does indeed appear to be
to create an XML Schema for such documents. Longer term, it should be
possible to define another mapping to EXI grammars from whichever
schema language is being used for such documents, and use that in
compression. The existing mapping for XML Schema will undoubtedly
prove useful in such work, by showing how different constructs map to
grammars. Note, though, that the EXI Working Group has no intention of
defining such a mapping for any other schema language than XML Schema.
tocheck
LC-2180 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
10)

Section 9.3

"Value channels that contain no more than 100 values" seems to mean: with *strictly* less than 100 values.

In this paragraph, all comparisons should be made clearer using 'greater or equal' and 'strictly greater'.
In this case, the spec. currently describes the desired behavior. In
particular "no more than 100 values" means <= 100 values. To reduce future
confusion, we've updated the specification to use the terminology "at most"
and "more than" to mean <= and > respectively throughout this section.
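The boundary described in this reply can be pinned down with a small sketch. The function, the channel names and the grouping into two dictionaries are illustrative assumptions; only the "at most 100" (<=) and "more than 100" (>) comparisons come from the reply.

```python
SMALL_CHANNEL_LIMIT = 100


def split_value_channels(channels):
    """Split channels into those with at most 100 values and those with more."""
    small = {name: vals for name, vals in channels.items()
             if len(vals) <= SMALL_CHANNEL_LIMIT}   # "at most 100"
    large = {name: vals for name, vals in channels.items()
             if len(vals) > SMALL_CHANNEL_LIMIT}    # "more than 100"
    return small, large


channels = {
    "title": ["t"] * 99,
    "author": ["a"] * 100,   # exactly 100 values still counts as "at most 100"
    "body": ["b"] * 101,
}
small, large = split_value_channels(channels)
print(sorted(small), sorted(large))
```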
tocheck
LC-2183 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
13)

Section 8.4.3

xsi:schemaLocation attributes seem to be removed from the infoset before encoding in agile delta streams.
Is it by design or is it implementation related?
Good question. The xsi:schemaLocation and xsi:noNamespaceSchemaLocation
attributes are not permitted in EXI streams when the strict option is set to
true. Based on your comments, we are including some additional text in the
next version of the specification to clarify this. However, this does not
completely explain the issue you are seeing in the EXI reference encodings.
This was due to a problem with the encoder configuration used to generate
the encodings. The correct configuration will be used next time the group
updates the reference encodings. Thank you for reporting this!
tocheck
LC-2198 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
Dear EXI WG,

I would like to have some clarification on two cases regarding SE(* ) grammar selection.

0) A schema with several element definitions for the same QName.
We can have a schema with several local element definitions and at most one global element definition with the same QName.
I assume that we generate as many grammars as needed for the same QName element and that the selection of the right grammar in schema-informed mode is done using scope information. Is that assumption right or is a different approach being used?

1) Wildcard SE(*).
Which grammar should I pick for a SE(*) belonging to a wildcard term?

- If I have a global element definition and one or more local element definitions, should I pick the global element grammar?

- If I have only one local element definition, should I pick the local element grammar or pick/create a built-in grammar?
I did not find much description in the wildcard section related to that. Some guidance would be good there.

2) Built in SE(*).
Which grammar should I pick for a SE(*) belonging to a built-in grammar?
If I have a global element definition (plus maybe local element definitions), should I pick/create a built-in grammar or the global element grammar?
If I have a local element definition, should I pick a built-in grammar or the local element grammar?
My understanding of the current spec (see the semantics section of 8.4.3) is that a SE(*) belonging to a built-in grammar may only lead to a built-in grammar for its content but my understanding may be too restrictive?
Since we can go from built-in grammar to schema-informed grammar using xsi:type, I would hope that at least when we have a GED grammar, we are able to go from built-in to schema-informed grammar directly through the SE mechanism.

Regards,
Youenn
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0000.html

There is a follow-up comment, which is processed
separately as LC-2248.
yes
LC-2248 FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment)
In the same spirit, it may be good to tighten the wording
concerning the typing of global attributes.

Currently, section 8.5.4.4.1 (strict = false section) states
that:

"when using schemas [...] If a global attribute definition
exists for qname, represent the value of the attribute
according to its datatype"

First, it seems that only section 8.5.4.4.1 is dealing with
this, while this seems quite applicable to strict mode in
the case of attribute wildcards.

Second, the "when using schemas" wording seems vague to me,
at least for that particular sentence.

A quick reading made me think that this meant "when some
schema information is available to the EXI processor", but
I think it actually means "when using a schema-informed
grammar".

The last interpretation would also lead to the fact that a
schema containing only global attribute definitions would be
useless for typing attribute values.
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0007.html
tocheck
LC-2197 Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp> (archived comment)
Hello,

I have a question about the Datatype Representation Map.

In section 7.4 of the Working Draft, Example 7-2 shows a representation
map for a single type. Below the example, an outline of how to implement
it is described, and I can imagine how to use it.

Example 7-3 shows (perhaps) a representation map for a complex type.
There is no detailed explanation of it.

I feel that it is not trivial to implement a complex type map, because
in the complex type case the event code determination rules for
child types are unclear. The mapping influences grammar production.

Is this the responsibility of the implementor of the EXI processor?
Or can a datatype representation map be used only in the single type
case?

Sincerely,

//---------------------------------------------------------------
NTT Cyber Space Laboratories
Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp>
TEL: +81-46-859-3412 FAX: +81-46-859-2768
----------------------------------------------------------------//
Thank you for providing feedback that allows the WG to improve
the language of the spec and make it clearer to all readers.

As you alluded to in your comment, the only eligible types are
simple types. Therefore, implementors of user-defined
datatype representations not only do not need to be concerned with
event codes, but in fact have no access to them.

We acknowledge that some of the sentences in that section can be
improved for clarity, and that Example 7-3 was misleading because the
type name "geo:geometricSurface" might have suggested that it represented
a complex type. We will make changes to make both the language and
the example unequivocal.
tocheck
LC-2107 Rick Jelliffe <rjelliffe@allette.com.au> (archived comment)
I think the terminology used in Appendix E can be improved.

From the title, you would expect this appendix to give a function for
guessing the character set: e.g. ISO 8859-1, or Latin1 or whatever.

It is much better to use the industry jargon from the Unicode Character
Encoding Model: http://unicode.org/reports/tr17/

Using that terminology, the appendix should substitute "character
repertoire" where it currently uses "character set", and "repertoire"
instead of "charset".

Flowing through, 7.1.10.1 Restricted Character Sets should have "coded
character set (CCS) for restricted character repertoire" rather than
"restricted character set".

Cheers
Rick Jelliffe
Regarding the title of appendix E "Deriving Character Sets from XML Schema
Regular Expressions", I agree that the use of "Character Sets" can certainly
be misleading as you pointed out. Since appendix E depends on XSD regex
which in turn depends on Unicode, it might be worthwhile to reuse the same
language that is used by XSD regex to indicate the same. It appears that
XSD uses the term "set of characters" to indicate a collection of characters
with associated UCS code points. Therefore, we intend to change "character
set" in appendix E to "set of characters" to align with XSD regex description,
which makes the title "Deriving a set of characters from an XML Schema
Regular Expression".

On the other hand, we think the use of the term "Character Set" in section
7.1.10.1 Restricted Character Sets is accurate. Unlike appendix E,
which computes a set of characters with UCS code points, this section
creates a new character set with its own code points.
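
The distinction can be sketched as follows. The function names are illustrative. Two details are assumptions based on one reading of section 7.1.10.1 and should be checked against the specification: that codes are assigned in UCS code point order, and that the code width is ceil(log2(N + 1)) bits.

```python
def build_restricted_charset(chars):
    """Map each character in the derived set to a new, compact code.

    Assumption: codes are assigned by ordinal position after sorting
    the characters by UCS code point.
    """
    ordered = sorted(set(chars), key=ord)
    return {ch: index for index, ch in enumerate(ordered)}


def bits_per_code(table):
    """Width of one code in bits.

    Assumption: ceil(log2(N + 1)) bits for N characters, leaving one
    spare value for characters outside the set. For N >= 0 this equals
    N.bit_length(), which avoids floating-point rounding.
    """
    return len(table).bit_length()
```

For example, a set holding the ten decimal digits gets the codes 0 through 9 and, under the stated assumption, a 4-bit width, whereas appendix E would only report the ten characters with their original UCS code points.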
tocheck
LC-2196 Simon Parker <simon.parker@polarlake.com> (archived comment)
5.4 EXI Options
Last paragraph, definition of valuePartitionCapacity option
Replace "enitiries" with "entries"

_Last paragraph before 6.1
Replace "effect " by "affect"

_Introductory paragraph of 8.5 Schema-informed grammars
Replace "schema-deviated" with "schema-derived"

_8.5.4.1.6 Element Terms
Replace "itself if was" with "itself if it was"

_10 Conformance
Review use of English throughout this section, particularly:
plural and singular forms, and
the phrase "that EXI stream decoders are prepared with"

_B.1 Document Information Item
[unparsed entities] Delete the last occurrence of "to".

_B.11 Namespace Information Item
Replace "ismaps toa NS event" with "maps to a NS event"

_E Deriving Character Sets from XML Schema Regular Expressions
In sentence 2 replace "lexically matches" with "lexically match"

_E Deriving Character Sets from XML Schema Regular Expressions
In the notes on rules [3] and [4] replace "causes to conclude the
charset of the regExp"
with "causes the charset of the regExp"
The EXI WG is grateful for your interest in the EXI documents, your attention to
the language details in particular, and the time and care you took to report them
back to us. It will surely make a difference, and such a report is valuable to the
continuous effort of improving the quality of EXI deliverables.

We have worked on the suggested changes in our internal editor's draft copy of
the documents. You will see the changes in the public draft when the documents
are published next time around.

There is one suggestion in your report that might have been a result of
confusion. The wording of "Schema-deviated" in section "8.5 Schema-informed grammars"
was indeed meant to be phrased that way. To make the intent clearer, we plan to
modify the sentence as follows.

"Of particular note is that built-in grammars that are invoked for schema-invalid
occurrences of elements or the elements that matched either SE(*) or SE(uri:*)
but are not declared in the schema are still subject to dynamic grammar learning
during the rest of the EXI stream processing as is described in 8.4.2 Built-in
Fragment Grammar. "
tocheck
LC-2109 Yuri Delendik <yury_exi@yahoo.com> (archived comment)
Hello,

In some instances, during elimination of productions with no terminal symbol (8.5.4.2.1), infinite loops can appear in the following forms:
G_(i,j):
G_(i,j)

Or
G_(i,j):
G_(i,k)
G_(i,k):
G_(i,l)
G_(i,l):
G_(i,k)

Eliminating them using only the algorithm is not trivial and admits variations, and may therefore produce different grammars for the same XSD schema in different implementations.

The source of those productions is a particle with {max occurs} = unbounded.

Also, in the paragraph where an additional copy of Term_0 is generated for an unbounded particle, the restriction “k > 0” is missing. Without it,
G_({min occurs}, 0):
EE

is replaced by:
G_({min occurs}, 0):
G_({min occurs}, 0)

which is a circular production with no terminal symbol, and I cannot find a well-documented way to eliminate it.

Could you illustrate how to convert the following schema to EXI normalized grammars?

<xsd:element name="el1">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="el1_1" minOccurs="1" maxOccurs="unbounded">
<xsd:complexType />
</xsd:element>
</xsd:sequence>
<xsd:attribute name="at1" type="xsd:string" />
</xsd:complexType>
</xsd:element>

Thank you.
In 8.5.4.2.1, we will specify that a production
'G_i,j -> RHS(G_i,k)_h' must be removed when it would introduce either a
self-loop or a production that has previously been replaced.
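
The rule above can be sketched as follows. This is a simplified illustration, not the normative algorithm: the grammar representation (a dict of (event, target) pairs, with event set to None for a production that has no terminal symbol) and the function name are assumptions, and the ordering of spliced-in productions is not meant to match the specification.

```python
def eliminate_no_terminal_productions(grammar):
    """Expand productions that have no terminal symbol.

    grammar maps a non-terminal name to a list of (event, target) pairs;
    event is None when the production has no terminal symbol.
    """
    result = {lhs: list(prods) for lhs, prods in grammar.items()}
    for lhs, prods in result.items():
        replaced = set()  # targets already expanded for this left-hand side
        i = 0
        while i < len(prods):
            event, target = prods[i]
            if event is not None:
                i += 1  # production has a terminal symbol: keep it
                continue
            del prods[i]  # remove "lhs : target"
            if target == lhs or target in replaced:
                continue  # self-loop or repeated replacement: nothing to add
            replaced.add(target)
            # Splice in the right-hand sides of the target non-terminal.
            for cand in list(result[target]):
                c_event, c_target = cand
                if c_event is None and (c_target == lhs or c_target in replaced):
                    continue  # would reintroduce a self-loop or an old production
                if cand not in prods:
                    prods.append(cand)
    return result
```

With the cyclic form from the comment, for instance `{"G1": [(None, "G2"), ("EE", None)], "G2": [(None, "G1"), ("SE(b)", "G2")]}`, the loop terminates and both non-terminals end up with only the EE and SE(b) productions.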

http://lists.w3.org/Archives/Public/public-exi-comments/2008Nov/0019.html
tocheck
LC-2110 Yuri Delendik <yury_exi@yahoo.com> (archived comment)
Hello,
Event code assignment section (8.5.4.3) describes sorting of the events in normalized EXI grammar. It does not show where SE(*), SE(uri:*) and AT(uri:*) events will be in sorting order.
Thanks.
You are absolutely right.

I have added text to section 8.5.4.3 Event Code Assignment to address this
problem. The event order is now specified as:

1. all productions with AT(qname) on the right hand side sorted
lexically by qname localName, then by qname uri, followed by

2. all productions with AT(uri_x : *) on the right hand side sorted
lexically by uri, followed by

3. any production with AT(*) on the right hand side, followed by

4. all productions with SE(qname) on the right hand side sorted in
schema order, followed by

5. all productions with SE(uri_x : *) on the right hand side sorted in
schema order, followed by

6. any production with SE(*) on the right hand side, followed by

7. any production with EE on the right hand side, followed by

8. any production with CH on the right hand side.

This change will show up in the next draft of the EXI specification.
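
The eight-step order above can be expressed as a single sort key. The sketch below is illustrative only: the Event class, the kind labels, and the schema_order field are hypothetical names, and "sorted lexically" is approximated by Python's code point string comparison.

```python
from dataclasses import dataclass

# Rank of each event kind, following steps 1 to 8 of the list above.
RANK = {
    "AT_QNAME": 0, "AT_URI": 1, "AT_ANY": 2,
    "SE_QNAME": 3, "SE_URI": 4, "SE_ANY": 5,
    "EE": 6, "CH": 7,
}


@dataclass(frozen=True)
class Event:
    kind: str              # one of the RANK keys
    uri: str = ""          # namespace URI, where applicable
    local_name: str = ""   # local name, for qname events
    schema_order: int = 0  # position in the schema, for SE events


def sort_key(event):
    """Uniform 4-tuple so events of any two kinds compare cleanly."""
    if event.kind == "AT_QNAME":
        return (RANK[event.kind], event.local_name, event.uri, 0)
    if event.kind == "AT_URI":
        return (RANK[event.kind], event.uri, "", 0)
    if event.kind in ("SE_QNAME", "SE_URI"):
        return (RANK[event.kind], "", "", event.schema_order)
    return (RANK[event.kind], "", "", 0)


def order_events(events):
    """Return the events in event-code assignment order."""
    return sorted(events, key=sort_key)
```

Note that attributes sort by local name first and URI second, while element events keep their schema order regardless of name.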
tocheck

Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: single.html,v 1.1 2017/08/11 06:44:24 dom Exp $
Please send bug reports and requests for enhancements to w3t-sys.org