Disposition of Comments
Efficient Extensible Interchange Working Group
In the table below, red in the WG decision column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.
In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.
Commentor | Comment | Working Group decision | Commentor reply |
---|---|---|---|
LC-2104
Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment) |
|
One of the advantages of using a base 10 representation is that it avoids rounding issues when moving floating point data between EXI and text XML, and between EXI and an application that uses the standard XML interfaces. XML, EXI and the standard XML interfaces all use a base 10 representation for floating point numbers, so no rounding issues will occur in these circumstances. You are correct that rounding issues may occur when moving floating point data between EXI and a base 2 representation. These rounding issues will be identical to those that occur when moving floating point data between text XML and a base 2 representation, so EXI maintains the same behavior as XML in these cases. As such, we avoid introducing any *new* rounding issues. Any work-arounds developed to address rounding issues for text XML will continue to work for EXI. |
no |
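The difference can be illustrated with a small sketch. It assumes that a base 10 float is carried as an integer mantissa plus an integer power-of-ten exponent, and compares that with a detour through an IEEE 754 single-precision value. The function names are hypothetical and this is not an EXI codec; it handles finite values only.

```python
import struct
from decimal import Decimal


def to_base10_pair(lexical: str) -> tuple[int, int]:
    """Split a finite decimal lexical form into (mantissa, exponent), base 10."""
    sign, digits, exponent = Decimal(lexical).as_tuple()
    mantissa = int("".join(map(str, digits)))
    return (-mantissa if sign else mantissa), exponent


def from_base10_pair(mantissa: int, exponent: int) -> Decimal:
    """Rebuild the exact decimal value mantissa * 10**exponent."""
    return Decimal(mantissa).scaleb(exponent)


def via_binary32(lexical: str) -> float:
    """Round-trip the value through an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
```

With the text value 0.1, the base 10 pair reproduces the value exactly, whereas the single-precision detour yields a nearby but different number, which is the same rounding that text XML would experience on the same path.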
LC-2175
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
We feel that EXI should fit smoothly into the XML stack. To avoid any problems, EXI supports type information in the same fashion as XML does. The group's intention is not to super-set XML. XML allows one to specify typed values with the attribute xsi:type. EXI provides the same mechanism, and both work on elements only. EXI makes use of XML Schema, and its types form the basis for typed values in the entire W3C environment. Introducing new types or another type system is not within the scope of our group and might affect XML processing in general. A newly invented typing system would be processed in the context of EXI only. Such type information would be lost when documents are converted from EXI to XML and then back to EXI. This may cause an issue, since it may look like XML cannot preserve what EXI is describing. The solution for arbitrary types in EXI (as well as in XML) is to use schema information. Please note that partial schema information is sufficient for EXI. In the use case you provided, a ds:SignatureValue element with the appropriate type information for content and attributes achieves the desired result. An EXI processor picks up the given type information for the element content as well as the attributes. |
tocheck |
LC-2164
ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment) |
|
You are correct in that the four options you present are the only possible ones from the alignment-compression pair. The reason why alignment and compression are separated as options is to achieve better compactness for the EXI Options header. It is expected that EXI compression is used often, so it is placed in the "common" part of the Options header, whereas the non-default alignment options are expected to be used only in very special circumstances and are therefore placed in the "uncommon" part. This reduces the size of the Options header in the usual case when the default alignment is used. But none of this prescribes any particular implementation strategy. An implementation is free to adopt the approach that you suggest, including in its API, as long as it produces and is able to consume correct Options headers. |
tocheck |
LC-2166
ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment) |
|
Value qnames are represented as strings in EXI format. Because prefixes used in an instance and its corresponding schema are declared in each document independently from the other, enumerated values in schema are of little use for types derived from xsd:QName since value qnames in EXI streams are strings. Note, however, that markup qnames which occur as the names of elements and attributes are represented using built-in QName datatype representation [1]. The rationale for the use of strings as the representation of value qnames is primarily so as to avoid the anticipated processing efficiency overhead that is likely to be involved if we used built-in QName datatype representation for value qnames. This concern of processing efficiency overhead is particular to value qnames, and is not foreseen for markup qnames. Therefore we use built-in QName datatype representation for markup qnames. |
yes |
LC-2103
Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment) |
|
Thank you for your suggestion regarding the order of the bits in the 2-bit unsigned Integer representation used to distinguish between lexical variances of the Boolean type. The current representation allows bit-testing of the first bit to provide the value of the Boolean. It is not clear that changing the representation to allow bit-testing of the second bit rather than the first bit would provide a notable benefit. Both representations seem equally capable, so our current plan is to keep the current representation. Please let us know if we've misunderstood some aspect of your suggestion. |
tocheck |
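As a sketch of the bit test described above, assume the 2-bit unsigned integer assigns the codes 0, 1, 2 and 3 to the lexical forms "false", "0", "true" and "1", in that order. This mapping is an assumption made for illustration, and the function name is hypothetical.

```python
# Assumed mapping of the 2-bit code to the lexical forms of xsd:boolean.
LEXICAL_FORMS = ("false", "0", "true", "1")


def decode_boolean(code: int) -> tuple[bool, str]:
    """Return (value, lexical form) for a 2-bit boolean code."""
    if not 0 <= code <= 3:
        raise ValueError("expected a 2-bit unsigned integer")
    # The first (most significant) of the two bits carries the Boolean value.
    value = bool(code & 0b10)
    return value, LEXICAL_FORMS[code]
```

Under this assumed mapping, a consumer that only needs the value can test a single bit and ignore the lexical variant.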
LC-2106
Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment) |
|
When initially discussing the valuePartitionCapacity option, the working group analyzed several different potential methods for treating new strings when the maximum capacity is reached. These methods included Move to front, which you proposed. The group does not believe that Move to front can be sufficiently efficient to serve as the replacement method, based on analysis of the possible algorithms in this context. The string table is used constantly when processing an EXI document, so it needs to be as efficient as possible, and we do not believe that the potential improvement in compactness is worth the amount of decreased processing efficiency that Move to front is estimated to result in. |
tocheck |
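To illustrate the cost difference, the sketch below contrasts two bounded tables. The first overwrites slots in circular order and is used here only as an assumed stand-in for a constant-time replacement method, since the response does not spell out the adopted one. The second is a move-to-front list, where every access searches and reshuffles the list and shifts the positions of other entries. Class and method names are hypothetical.

```python
class CircularTable:
    """Bounded table; a new string overwrites the oldest slot in O(1).

    Callers are expected to call lookup() first and add() only on a miss.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots: list[str] = []
        self.index: dict[str, int] = {}
        self.next_id = 0

    def add(self, s: str) -> int:
        slot = self.next_id
        if slot < len(self.slots):
            del self.index[self.slots[slot]]  # evict the previous occupant
            self.slots[slot] = s
        else:
            self.slots.append(s)
        self.index[s] = slot
        self.next_id = (slot + 1) % self.capacity
        return slot

    def lookup(self, s: str):
        return self.index.get(s)  # IDs of surviving entries never move


class MoveToFrontTable:
    """Bounded table kept in most-recently-used order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: list[str] = []

    def access(self, s: str) -> int:
        """Return the position before the move (-1 on a miss), then move s to the front."""
        try:
            pos = self.items.index(s)  # O(n) search
            self.items.pop(pos)        # O(n) shift
        except ValueError:
            pos = -1
            if len(self.items) == self.capacity:
                self.items.pop()       # drop the least recently used entry
        self.items.insert(0, s)        # O(n) shift
        return pos
```

In the circular variant a hit is a single dictionary lookup and existing identifiers stay stable, while the move-to-front variant pays linear work on every access.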
LC-2187
TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment) |
|
We understand the concern you have raised about the effect that grammar learning may have on the amount of memory an EXI processor consumes. While acknowledging that it is a potential issue, we stopped short of adding a mechanism to restrain grammar learning when we discussed the feature a while back, because implementation reports shared by WG members suggested that the issue did not surface in real deployments. One trait of documents used in exchange is that grammars are learned mostly up front, at a pace close to linear, after which learning becomes less and less frequent and eventually converges. This observation helps explain why the issue is not very likely to emerge in practice. Based on these observations and analysis, and considering the cost in both compactness and runtime efficiency that a restraining mechanism would incur, we remain cautious and may require further arguments involving specific use cases before we consider a way to alleviate the potential issue. |
yes |
LC-2190
TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment) |
|
The EXI format itself offers only a hook to support selfContained subtrees, meaning that an EXI processor may access such an independent fragment without reading what came before it. The working group expects many different use cases for this feature and does not restrict its use to a specific mechanism, which would likely not be suitable for many environments. It is therefore up to an application or system to make use of this feature and build an index or any other solution on top of it. |
tocheck |
LC-2171
Thomas Hornig <t.hornig@highQ.de> (archived comment) |
|
The group believes that EXI should integrate well with the family of XML technologies, and therefore we would rather not define EXI-specific encryption or digital signature methods but use the existing XML Encryption and Signature specifications, applying them to EXI by using their defined extension points. As you note, the recommended rule is to first compress (or encode in EXI) and then encrypt. Accomplishing this with XML Encryption requires some additional specification work, but very little. We have already talked with the XML Security working group on the best way to accomplish this and believe we understand how to do it in a way that fits into the XML Encryption specification. After this work is done, we will publish it in one of our documents (we have not yet determined which one). |
tocheck |
LC-2168
UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment) |
|
We understand that the IEEE float representation is one of the most frequently requested changes to the EXI specification. We would like to use this opportunity to better explain our position on this issue and share with you the rationale that supports it. As you will find outlined below, the WG assessed both its advantages and disadvantages in the context of EXI, as well as alternative ways to assuage the concerns. The WG came to believe that IEEE float fits better with the user-defined datatype representation capability [1] of EXI than as an additional mandatory physical representation of character information items that are typed as xsd:float or xsd:double. There are some aspects of IEEE float that do not lend themselves naturally to the primary role of EXI as an efficient exchange format for XML, particularly when EXI is required to remain intrinsically compatible with the existing XML family of standards written to the XML Infoset, such as XML Signature. The issues the WG looked at in particular are lower compactness and less amenability to compression, as well as rounding issues in serialization (base-2 to base-10) and the cost of stringification required when used with the typical XML APIs (SAX, DOM, etc.), which are predominantly text-based. In saying this, however, it is not our intention to diminish the value of IEEE float, nor to neglect its widespread use in a diverse range of applications. We do not intend to dispute the merit IEEE float brings to some applications, and we understand that EXI might be seen as the perfect opportunity to realize that merit in a way that fits well with XML technologies. Though the EXI spec calls for deliberate assessment when considering the use of user-defined datatype representation, IEEE float is clearly one of the cases that warrant the use of the facility. It was not an easy discussion within the WG that led to the position mentioned above. 
Some argued that EXI is not XML enough, and others contended that EXI is not binary enough. We respected both views in trying to find a difficult balance and better serve both needs. We do not see EXI as static; rather, it is a core foundation on which things can be further built to suit particular use cases. The two notable points of innovation provided by EXI are user-defined datatype representation, and the provision that permits the definition of schema bindings for other schema languages. We therefore encourage you to seek the best way to leverage user-defined datatype representations to represent float and double data as IEEE floats. If you have any suggestions that would make it easier to use IEEE float through user-defined datatype representation, we would love to hear them. Such suggestions may well find their way into collateral documents such as an FAQ or Best Practices. [1] http://www.w3.org/TR/exi/#datatypeRepresentationMap |
tocheck |
LC-2169
UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Sep/0003.html | tocheck |
LC-2170
UCHIDA Hitoshi <uchida.hitoshi@canon.co.jp> (archived comment) |
|
EXI already provides one feature that can be used to achieve the functionality you request. Namely, it is possible to encode a series of XML documents as an EXI fragment, which will retain both the string table and any learned grammar content between individual documents of the series. In general, the working group believes that being able to preserve any learned content from one encoded document to the next is primarily applicable to situations where there is already an externally-defined ordering among the documents, such as over a communication channel that guarantees in-order delivery. For such situations, the solution using fragments and relying on the external ordering seems adequate, and thus we have no intention of defining an ordering internal to EXI. |
yes |
LC-2133
Yuri Delendik <yury_exi@yahoo.com> (archived comment) |
|
The XML Schema mapping to EXI grammar terms described in section 8.5.4 [2] is a normative mapping. A conformant EXI decoder must support this specified mapping, as stated in the conformance section [3]. As noted in your comment, it is possible to specify mappings from other schema languages. Requiring all conformant EXI processors to implement the W3C XML Schema mapping feature will facilitate interoperability, and we picked XML Schema because it is a W3C Recommendation. We do not intend to express any judgement on, or preference among, the various schema languages. Other mappings might be done by the EXI Working Group (or another group) in the future, but this is not planned at this time. For those reasons, and since we do not see a real semantic benefit from the changes, we feel that this section should stay in the main body of the specification. A second part of your comment suggests that we use EXI grammar terms to describe EXI Options. We will make it clear that the XML schema for Options is provided for clarity of the specification, and that EXI processors are not required to read an Options schema. Therefore we think that adding an equivalent EXI grammar, which would be more verbose, is not needed. The last part of your issue is about the use of "strict" mode. The Working Group believes that the use of strict vs. non-strict mode is not equivalent to the use of a different schema. The processor uses the same schema but handles deviations from that schema when in non-strict mode. This enables a use case where sender and receiver agree on a schema, and the encoder is still at liberty to use the "strict" option on an instance-by-instance basis, without a pre-agreed schemaID. |
tocheck |
LC-2108
<pub@upokecenter.com> (archived comment) |
|
We excluded those regular expressions that contain either a wildcard (".") or negative character groups, since those expressions tend to result in large sets of characters. Even in the rare cases where they do not, there is often a better alternative way to specify the same effect: for example, a negative character group that excludes everything from "0" up to the highest character can be written instead as a positive range ending at "/". On the other hand, character class subtraction is retained because, unlike wildcards or negative character groups, the operation always results in a number of characters smaller than that of the first operand. We also found that character class subtraction does not add much computational burden if it is properly implemented. Please also note that it is our expectation that schema authors can help by being aware of the general cost of each operation and specifying patterns in ways more friendly to EXI processing. |
tocheck |
LC-2172
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
The EXI encoding of simple type data, and the use or non-use of certain facets, has been decided on the basis of both empirical observation of their merits and their implications. We looked at facets related to length to see whether EXI should be aware of those facets to improve the simple type value encodings. The results of implementation experiments shared by WG members indicated that the effects of leveraging those facets were not substantial enough to justify including the function in the format specification, given that adding it would require EXI processors to check for the presence of those facets for every occurrence of schema-bound strings before determining the method of representing the length field. In this particular case, the concerns about the potential processing efficiency penalty outweighed the benefit observed. Encoding rules associated with strings, such as string tables, are deliberately kept simple because string handling is a known hot spot in processor execution, and any subtle overhead can accrue into a noticeable cost in performance. Also, please note that, at least for repeated strings, string tables kick in after the first occurrence of a string value. Therefore, the effect would be limited to the first occurrence of a particular string value only. We presume that this is one of the reasons why we did not see the level of benefit from the length-related facets that we, like you, originally expected. |
tocheck |
LC-2173
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
It seems that there are two aspects to your question. One pertains to design issues involved in creating XML Schemas for optimizing EXI encoding compactness. The other relates to the internal representation that an EXI implementation might use to perform schema-based encoding. Concerning designing a schema to get more compact EXI encodings, the general rule would be, "More precise content constraints and more deterministic structure lead to more compact EXI encodings." Examples of applying this general rule are:
- Carefully think about minOccurs and maxOccurs of particles, and whether attributes are required or optional. The more you exploit those properties, the more compactness you can achieve.
- Use substitutionGroup, instead of depending on the xsi:type mechanism, whenever possible.
- Use wildcards only when absolutely necessary. Use concrete elements and attributes if possible.
The above are simple guidelines. However, some questions related to the general rule are not so simple. An example of such a consideration is: Where is the right location for a required element in a model group? For example, in the first schema fragment below, element "D" comes first. But placing "D" after "C" (as shown in the second schema fragment) would increase determinism. The reason for this relates to the number of possible alternatives for elements which may follow "B". So, in the first example, the elements "C", "W", "X", "Y" and "Z" may each appear after "B". However, in the second example, only element "D" may follow "B". The second example is more deterministic and should result in a more compact encoding.
First schema fragment:
    <sequence>
      <element name="D" />
      <element name="B" minOccurs="0" />
      <element name="C" minOccurs="0" />
      <choice>
        <element name="W" />
        <element name="X" />
        <element name="Y" />
        <element name="Z" />
      </choice>
    </sequence>
Second schema fragment:
    <sequence>
      <element name="B" minOccurs="0" />
      <element name="C" minOccurs="0" />
      <element name="D" />
      <choice>
        <element name="W" />
        <element name="X" />
        <element name="Y" />
        <element name="Z" />
      </choice>
    </sequence>
Regarding the grammar size relative to schema size (which is one of the issues we believe you raise), this is the nature of schema processing. EXI is no different from schema validation in XML parsers using XML schemas. In principle, grammar size is proportional to the number of definitions and constructs used in the schema. This should be a consideration when generating schema-informed EXI encodings. The question of how much schema information should be stored and applied is important, because in some cases there are two conflicting goals. The first is the desire to limit grammar footprint and the second is the desire to increase determinism of the grammar. So this is a practical issue that each use case has to cope with, rather than an inherent issue with EXI. We have not, as a group, discussed this, nor was it in our plans to develop these kinds of implementation guidelines. While such schemas may cause footprint issues for a straightforward implementation of the specification, there have been implementation reports from within the WG of techniques that resolve such footprint issues. We may look into schema modeling guidelines, but only after other charter-driven tasks are completed, and if time allows. It seems that guidelines addressing issues in implementing an XML schema validation parser would be pertinent to your questions (especially with respect to your question about grammar size). Considering this, the XML Schema group may have better insight than the EXI group can provide at this time. |
tocheck |
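As a rough worked example of the determinism argument, assume that a choice among m equally coded alternatives costs ceil(log2(m)) bits. This is a simplification for illustration (actual event-code assignment depends on the grammar), and the function name is hypothetical. The counts used are the ones stated in the response: five alternatives after "B" in the first fragment and one in the second.

```python
def bits_for_alternatives(m: int) -> int:
    """Bits needed to pick one of m alternatives: ceil(log2(m))."""
    if m < 1:
        raise ValueError("at least one alternative is required")
    return (m - 1).bit_length()


# Alternative counts after "B", as stated in the response above.
first_fragment_bits = bits_for_alternatives(5)   # C, W, X, Y or Z
second_fragment_bits = bits_for_alternatives(1)  # only D
```

Under this assumption the first fragment spends 3 bits at that point and the second spends none, and the saving repeats for every occurrence of the sequence in a document.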
LC-2227
Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0002.html | tocheck |
LC-2165
ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment) |
|
The advantage of the local-element-ns flag is improved compactness for representing element QNames when Preserve.prefixes is true. Without the local-element-ns flag, the prefix encoding for each element QName would have to account for the possibility that the prefix is declared by an NS event that follows the associated SE event (i.e., a prefix that has not yet been encountered in the stream). In the common case, where there is only one prefix declared for each namespace, the prefix encoding would increase from zero bits per element to one bit per element. If there were two prefixes declared for each namespace, the prefix encoding would increase from one bit per element to two bits per element. Since there can be a lot of SE events in a document, this can have a significant impact on compactness. |
tocheck |
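The bit counts quoted above follow if a prefix is coded as a choice among the known prefixes using ceil(log2(n)) bits, with one extra alternative needed when a not-yet-declared prefix must also be representable. The sketch below works under that assumption; the function names and the sample document size are hypothetical.

```python
def prefix_bits(declared_prefixes: int, allow_undeclared: bool) -> int:
    """Bits per element prefix, assuming a ceil(log2(choices))-bit code."""
    if declared_prefixes < 1:
        raise ValueError("at least one declared prefix is required")
    choices = declared_prefixes + (1 if allow_undeclared else 0)
    return (choices - 1).bit_length()


def extra_bytes(elements: int, declared_prefixes: int) -> float:
    """Added size over a document when the extra alternative is needed."""
    delta = prefix_bits(declared_prefixes, True) - prefix_bits(declared_prefixes, False)
    return elements * delta / 8
```

For instance, a document with 10,000 SE events and one prefix per namespace would grow by 1,250 bytes under these assumptions.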
LC-2167
ISHIZAKI Tooru <ishizaki.tooru@canon.co.jp> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Mar/0001.html | tocheck |
LC-2192
Jochen Darley <joda@upb.de> (archived comment) |
|
The blocksize of EXI compression is constant for a single EXI stream so that an EXI decoder can get an approximate idea of how much memory is necessary for buffering events before it starts decompression. One of the aims of the EXI design was to enable devices with limited hardware resources, often with very limited network resources, to successfully tap into the XML family of technologies. For the scenario you described, the best solution may be to feed a flow of separate EXI documents or EXI fragments, each of which has been compressed independently. The other thing that may be relevant and worth mentioning is that EXI compression is designed to cost far fewer CPU cycles than gzip does, while consistently giving better results. This trait is known to enable certain scenarios to apply EXI compression to their output feeds where the use of gzip has been considered prohibitive because the high processing cost of compression exhausts server CPU resources. |
tocheck |
LC-2186
SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment) |
|
In order for only the content of element "b" but not "c" to be encoded using your own representation, please define a new type derived from xsd:float and use that type for element "b", so that the type can be captured in the datatype representation map as follows.
    <datatypeRepresentationMap xmlns:mytypes="http://example.org/mytypes" xmlns:myenc="http://example.org/myenc">
      <mytypes:myfloat/>
      <myenc:myfloat/>
    </datatypeRepresentationMap>
where the type mytypes:myfloat would be defined as follows.
    <xsd:simpleType name="myfloat">
      <xsd:restriction base="xsd:float"/>
    </xsd:simpleType>
    <xsd:element name="b" type="mytypes:myfloat"/>
The first item in each datatype representation map entry pair identifies a type. Types are in general considered more inherent to data than elements are. For example, each atomic data value in a list has its immediate datatype, whereas the containing element only indicates the list datatype. Thus, types allow for finer control over the use of custom datatype representations. The use of types in the datatype representation map also has the benefit of insulating element names from the details of datatype representations. For example, you can define two local elements with the same name, yet allow them to be encoded using different datatype representations by associating them with different datatypes. This all comes with the slight burden of needing to define appropriate types in the schema. It is worth noting that when you have a good reason for using different representations for two elements, they generally carry somewhat different semantics or expectations at some level, which should justify the effort of defining a new type for use in the datatype representation map to differentiate them. |
tocheck |
LC-2188
SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment) |
|
The importance of a small footprint has been kept in mind while developing the EXI format. The working group has considered the number and complexity of mandatory features (affecting code size) and the initial data that must be available to support the format (affecting initialized data segment size). Additionally, some of our members have successful EXI implementation experience in resource-constrained environments. We concluded, however, that we should not define different levels of conformance to accommodate subsets of EXI capability. Defining the right profiles depends very much upon the use case(s) in mind. The decision as to what capabilities/features could be omitted and which should be retained was best left to the user/implementer. We have tried, though, to make sure that terminology defined in the specification is rigorous enough to discuss EXI features. This should help in discussing what EXI functionality is critical in a given environment and which are not. But we still maintain that a conformant EXI processor must implement the EXI specification in its entirety. We do understand, however, that there may be partial implementations of the EXI specification in use. We want to strongly caution that any restricted profiles for EXI functionality be used sparingly, and in closed environments in which all participants are aware of the supported and unsupported EXI capabilities. It is also critically important that any profiles be compatible with an implementation conformant to the EXI spec. In other words, a standard EXI processor should be able to handle any encoded document that a given EXI profile implementation generates. |
tocheck |
LC-2189
SHIMIZU Wataru <shimizu.wataru@canon.co.jp> (archived comment) |
|
EXI allows one to provide the same typing information as XML. In XML documents you can provide hints about the type by using the attribute xsi:type. EXI makes use of this information in the same fashion. For EXI to be type-aware for attributes, or for any type other than the built-in types provided by XML Schema, external information such as an XML schema document is required. For a more detailed explanation please take a look at our response to a related question (see also LC-2175). |
tocheck |
LC-2185
TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment) |
|
The version of XML that occurs in the XML declaration indicates the slightly different syntax rules implied by each XML version (i.e. XML 1.0 vs XML 1.1 as of this writing). The EXI format is a representation of the XML Information Set [1]. We are aware that the Document Information Item [2] in the Infoset provides a "version" property that corresponds to the XML version. However, the value of that property does not imply different semantics that need to be captured at the Infoset level. These are the reasons why EXI is, and should be, agnostic about the version of XML. In some anticipated scenarios of EXI use, application programs are concerned only with the infoset, with no involvement of serialization in XML at any point of the processing and communication chains. In such applications, the "version" property of the Document Information Item would not provide any benefit. Also, in applications where serialization of the infoset in XML is involved in conjunction with EXI along the processing chain, preserving the original XML version is rarely a concern. This is because the programs that consume the data are, more often than not, concerned only with the infoset, not with the subtle differences in XML 1.x syntax. The recent publication of XML 1.0 5th edition [3] has in a sense strengthened this argument, given that the single most prominent difference between XML 1.0 and 1.1, the repertoire of characters, has now essentially disappeared. Yet, we understand that there are use cases where the use of a particular version of XML is required when serializing the infoset into XML. On such occasions, it is the program that subsequently consumes the serialized XML that calls for a particular XML version. 
We consider the XML version an artifact of XML serialization, and therefore a function of XML serializer implementations, rather than something that has to be inherited from the source XML, if any, that was fed into the processing chain as input. As described above, we do not foresee critical issues arising from not providing a placeholder field in the EXI format for carrying text XML version numbers. On the other hand, there could be a substantive cost if EXI supported XML version numbers in the grammar system, because doing so would cause every EXI stream to grow slightly in size even when the XML version value is absent. One of the major uses of EXI, the frequent exchange of tiny documents, could suffer from this, because such documents are typically designed very carefully to save bits and maximize efficiency. Considering this balance, we decided to forgo the "version" property of the Document Information Item of the Infoset. [1] http://www.w3.org/TR/xml-infoset/ [2] http://www.w3.org/TR/xml-infoset/#infoitem.document [3] http://www.w3.org/TR/2008/REC-xml-20081126/ |
tocheck |
LC-2194
TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Oct/0004.html | tocheck |
LC-2193
Youenn Fablet <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
The spec intentionally abstracts away from how schemaID is used. The approach presupposes that it is up to use cases, applications, or other specifications that leverage the EXI format to define the syntax and semantics of the schemaID field. For example, there are cases where strings a couple of characters long would be used as schemaID, whereas URIs may be suited to some other use cases. In addition, the schema identified by schemaID may be described in a schema language other than XML Schema, such as Relax NG, as long as a well-known schema-binding method has been defined for the schema language in use. The specification also stops short of defining any mechanism to ensure that instances and schemas match. Again, it is up to use cases and applications to define schema identity in connection with their own schemaID semantics, or even to determine whether such a mechanism is required at all. Either metadata managed out of band or a [user defined] header options field could be used to ensure the level of schema identity that each use case requires for addressing integrity issues such as false positives. |
yes |
LC-2130
Yuri Delendik <yury_exi@yahoo.com> (archived comment) |
|
We will mention in the spec that only BMP characters indicated by each category are included in the set of characters for use in restricted character set computation. This should make '\d' still relevant, because the category "Nd" contains only 230 BMP characters. Shown below is the number of characters (both total and BMP) contained in each category, derived from version 5.0.0 of Unicode. Based on this, the category names that cause the computation to stop should now consist of the following: 'L'[ulo]?, 'M'[n]?, 'N', 'P'[o]?, 'S'[mo]? or 'C'[o]?. Thanks!
65 characters in Cc (65 BMP chars)
138 characters in Cf (33 BMP chars)
137468 characters in Co (6400 BMP chars) x
2048 characters in Cs (2048 BMP chars) x
1634 characters in Ll (1102 BMP chars) x
167 characters in Lm (167 BMP chars)
89344 characters in Lo (44681 BMP chars) x
31 characters in Lt (31 BMP chars)
1320 characters in Lu (836 BMP chars) x
175 characters in Mc (167 BMP chars)
10 characters in Me (10 BMP chars)
880 characters in Mn (602 BMP chars) x
290 characters in Nd (230 BMP chars)
210 characters in Nl (51 BMP chars)
336 characters in No (252 BMP chars)
10 characters in Pc (10 BMP chars)
18 characters in Pd (18 BMP chars)
65 characters in Pe (65 BMP chars)
9 characters in Pf (9 BMP chars)
11 characters in Pi (11 BMP chars)
278 characters in Po (260 BMP chars) x
66 characters in Ps (66 BMP chars)
41 characters in Sc (41 BMP chars)
99 characters in Sk (99 BMP chars)
914 characters in Sm (904 BMP chars) x
2958 characters in So (2350 BMP chars) x
1 characters in Zl (1 BMP chars)
1 characters in Zp (1 BMP chars)
18 characters in Zs (18 BMP chars) |
tocheck |
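The per-category BMP counts quoted in the reply above can be approximated with a short script. This is only a sketch: it relies on Python's standard `unicodedata` module, which reflects whatever Unicode version ships with the interpreter rather than version 5.0.0, so counts for categories that have grown since then will differ from the figures in the reply. The function name `bmp_category_counts` is illustrative and not part of any EXI tooling.

```python
import unicodedata
from collections import Counter


def bmp_category_counts():
    """Count BMP code points (U+0000..U+FFFF) per Unicode general category.

    Unassigned code points are reported by unicodedata under 'Cn'.
    """
    return Counter(unicodedata.category(chr(cp)) for cp in range(0x10000))


counts = bmp_category_counts()
for category in sorted(counts):
    print(f"{counts[category]} BMP chars in {category}")
```

Categories whose BMP size is fixed by the structure of Unicode (for example Cc, Cs and Co) should match the reply regardless of the Unicode version in use.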
LC-2177
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
You are not missing anything. There is indeed no reason, seen purely from the interoperability point of view, to require a specific ordering of the attributes when strict is false. When the encoder does not order the attributes, the out-of-place attributes will simply be encoded as deviations, which the decoder can understand, as you say. We will change the specification so that it no longer requires the attributes to be sorted in either kind of stream, except, naturally, when strict is true. Also, the xsi:type and xsi:nil attributes still have to come first, since their presence affects the grammar used for the rest of the element content. The specification will still strongly recommend ordering the attributes of any element that is encoded with a schema-informed grammar, as not ordering them will hurt compactness. In particular, when there are multiple mandatory attributes, not ordering them may also cause the content of that element, not just the attribute list, to be encoded as deviations. |
tocheck |
LC-2179
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
You are quite right. This is a good catch and will be fixed in the next version of the specification. The following specification revision has been proposed to address this issue and is being reviewed by the working group.
----------------------------------
For each non-terminal Element_i,j, such that 0 ≤ j ≤ content, with zero or more productions of the following form:
Element_i,j :
    AT(qname_0) NonTerminal_0
    AT(qname_1) NonTerminal_1
    ⋮
    AT(qname_x-1) NonTerminal_x-1
where x represents the number of attributes declared in the schema for this context, add the following productions:
Syntax                                                    Event Code
Element_i,j :
    AT(*) Element_i,j                                     n.m
    AT(qname_0) [schema-invalid value] NonTerminal_0      n.(m+1).0
    AT(qname_1) [schema-invalid value] NonTerminal_1      n.(m+1).1
    ⋮                                                     ⋮
    AT(qname_x-1) [schema-invalid value] NonTerminal_x-1  n.(m+1).(x-1)
    AT(*) [schema-invalid value] Element_i,j              n.(m+1).(x)
where n.m represents the next available event code with length 2. |
tocheck |
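The event code pattern in the proposed revision above (n.m for the AT(*) production, then n.(m+1).k for each schema-invalid attribute production) can be illustrated with a small generator. This is a sketch under stated assumptions: the function name `deviation_productions`, its parameters, and the string rendering of productions are hypothetical and chosen only for illustration; they are not taken from the specification or from any EXI implementation.

```python
def deviation_productions(element, qnames, nonterminals, n, m):
    """Return (production, event code) pairs following the proposed pattern.

    element      -- label of the non-terminal, e.g. "Element_i,j" (illustrative)
    qnames       -- attribute qnames declared in the schema for this context
    nonterminals -- right-hand-side non-terminal for each declared attribute
    n, m         -- the parts of the next available length-2 event code n.m
    """
    # AT(*) takes the next available two-part event code.
    productions = [(f"AT(*) {element}", f"{n}.{m}")]
    # Each declared attribute gets a schema-invalid variant at n.(m+1).k.
    for k, (qname, nonterminal) in enumerate(zip(qnames, nonterminals)):
        productions.append(
            (f"AT({qname}) [schema-invalid value] {nonterminal}", f"{n}.{m + 1}.{k}")
        )
    # The schema-invalid wildcard comes last, at index x = number of attributes.
    productions.append(
        (f"AT(*) [schema-invalid value] {element}", f"{n}.{m + 1}.{len(qnames)}")
    )
    return productions


for production, code in deviation_productions(
    "Element_i,j", ["a", "b"], ["NonTerminal_0", "NonTerminal_1"], 1, 2
):
    print(f"{production:55} {code}")
```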
LC-2181
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
This is a good point and you are absolutely right. The next version of the specification will include this semantic. |
tocheck |
LC-2182
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
Yes, you make a very good point. Thank you for catching this. The next version of the specification will be updated to add the SE(qname) production before evaluating the element content. |
tocheck |
LC-2184
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
Yes, this is another very good point. Thanks again for your very thorough review of the specification. The next version of the specification will be updated to avoid adding the empty string to the string tables. |
tocheck |
LC-2105
Mohamed ZERGAOUI <innovimax@gmail.com> (archived comment) |
|
We found your points valid. At the same time, we believe that the suggested change would achieve only a negligible performance gain in the context of EXI processing as a whole. Since we expect no noticeable performance difference from the end user's point of view, and thus no compelling benefit for everyone, we are reluctant to make the requested change for now, given its impact on generated test data and implementations. Having said that, we would like to leave the issue open, and may consider including the change in the future if a good opportunity arises. On Jan. 7th the WG resolved to make the change. |
tocheck |
LC-2191
TAMIYA Keisuke <tamiya.keisuke@canon.co.jp> (archived comment) |
|
Text relating to this case when considering potential future final EXI versions was indeed missing from the format specification. Currently, the specification requires an EXI processor to process any final version that it understands. For preview versions, the specification already says that the behavior is implementation-dependent. As the working group cannot know what potential final EXI versions will look like in the future, we have decided not to constrain the processors in any manner in this case. We will therefore add text to the specification stating that the behavior on seeing a version number unknown to the processor (either preview or final) depends on the implementation, and possible behaviors include rejecting the stream. |
yes |
LC-2132
Yuri Delendik <yury_exi@yahoo.com> (archived comment) |
|
You are right. That's an omission in the specification which will be fixed with the next update. e.g. TypeEmpty for ur-Type is going to look like Ur-Type_0 : AT(*) Ur-Type_0 EE |
tocheck |
LC-2174
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
Any named simple type can be used as the name of the first element in a datatype representation map, as long as it is defined in the schema. So, the "mystring" type defined in your example schema snippet can be used in datatype representation maps, in no less interoperable a manner than built-in xsd datatypes. The note given at the end of section 7.4 is about the mechanism of sharing user-defined datatype representations, not about the types. Schemas are supposed to be shared among the parties before documents are exchanged, so the types in the schemas are shared knowledge at that point, just as xsd built-in types are. The note only warns that extra care should be taken before sending documents that use a user-defined datatype representation, to make sure the recipient knows how to decode values that were encoded using that custom representation. We plan to improve some of the language there to make it clear that it is about user-defined datatype representations. |
tocheck |
LC-2176
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
You are right in assuming that schema-informed and built-in grammars may coexist in the same EXI stream. The first published EXI primer document [1] uses incorrect terminology in that regard. A revised version will be available soon and will integrate your comments, along with other improvements and fixes for spec consistency issues. |
tocheck |
LC-2178
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
If you wish to use EXI's schema-informed capabilities in compressing these documents, the best short-term solution does indeed appear to be to create an XML Schema for such documents. Longer term, it should be possible to define another mapping to EXI grammars from whichever schema language is being used for such documents, and use that in compression. The existing mapping for XML Schema will undoubtedly prove useful in such work, by showing how different constructs map to grammars. Note, though, that the EXI Working Group has no intention of defining such a mapping for any other schema language than XML Schema. |
tocheck |
LC-2180
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
In this case, the spec. currently describes the desired behavior. In particular "no more than 100 values" means <= 100 values. To reduce future confusion, we've updated the specification to use the terminology "at most" and "more than" to mean <= and > respectively throughout this section. |
tocheck |
LC-2183
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
Good question. The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes are not permitted in EXI streams when the strict option is set to true. Based on your comments, we are including some additional text in the next version of the specification to clarify this. However, this does not completely explain the issue you are seeing in the EXI reference encodings. This was due to a problem with the encoder configuration used to generate the encodings. The correct configuration will be used next time the group updates the reference encodings. Thank you for reporting this! |
tocheck |
LC-2198
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0000.html There is a follow-up comment, which is processed separately as LC-2248. |
yes |
LC-2248
FABLET Youenn <Youenn.Fablet@crf.canon.fr> (archived comment) |
|
http://lists.w3.org/Archives/Public/public-exi-comments/2009Jul/0007.html |
tocheck |
LC-2197
Gengo Suzuki <suzuki.gengo@lab.ntt.co.jp> (archived comment) |
|
Thank you for providing feedback that allows the WG to improve the language of the spec and make it clearer to all readers. As you alluded to in your comment, only simple types are eligible. Therefore, implementors of user-defined datatype representations not only do not need to be concerned with handling event codes, but in fact have no access to them. We acknowledge that some of the sentences in that section can be improved for clarity, as can Example 7-3, in which the type name "geo:geometricSurface" might have suggested a complex type and was thus misleading. We will make changes so that both the language and the example are unequivocal. |
tocheck |
LC-2107
Rick Jelliffe <rjelliffe@allette.com.au> (archived comment) |
|
Regarding the title of appendix E, "Deriving Character Sets from XML Schema Regular Expressions", I agree that the use of "Character Sets" can certainly be misleading, as you pointed out. Since appendix E depends on XSD regex, which in turn depends on Unicode, it might be worthwhile to reuse the language that XSD regex uses for the same concept. It appears that XSD uses the term "set of characters" to indicate a collection of characters with associated UCS code points. Therefore, we intend to change "character set" in appendix E to "set of characters" to align with the XSD regex description, which makes the title "Deriving a set of characters from an XML Schema Regular Expression". On the other hand, we think the use of the term "Character Set" in section 7.1.10.1 Restricted Character Sets is accurate. Unlike appendix E, which computes a set of characters with UCS code points, this section creates a new character set with its own code points. |
tocheck |
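The distinction drawn in the reply above, between a set of characters identified by UCS code points and a character set with code points of its own, can be illustrated as follows. This is a sketch only. The reply does not say how the new code points are assigned, so the scheme used here (ordinal position after sorting by UCS code point, with a bit width that leaves one spare value for characters outside the set) is an assumption, and the name `build_restricted_charset` is hypothetical.

```python
def build_restricted_charset(chars):
    """Map each character of a set to a compact code point of its own.

    Assumption (not stated in the reply above): characters are numbered
    0..N-1 in UCS code point order, and the bit width is ceil(log2(N + 1))
    so that the value N stays free to flag an out-of-set character.
    """
    ordered = sorted(set(chars), key=ord)
    # For a non-negative integer N, N.bit_length() equals ceil(log2(N + 1)).
    n_bits = len(ordered).bit_length()
    return {ch: index for index, ch in enumerate(ordered)}, n_bits


codes, n_bits = build_restricted_charset("0123456789")
print(n_bits)             # bit width used for each character
print(ord("7"), codes["7"])  # UCS code point versus restricted-set code point
```

The last line prints the two numbering schemes side by side: the same character has one code point in UCS and a different, smaller one in the derived character set.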
LC-2196
Simon Parker <simon.parker@polarlake.com> (archived comment) |
|
The EXI WG is grateful for your interest in the EXI documents, your attention to the language details in particular, and the time and care you took to report them back to us. It will surely make a difference, and such a report is valuable to the continuous effort of improving the quality of EXI deliverables. We have worked on the suggested changes in our internal editor's draft copy of the documents. You will see the changes in the public draft the next time the documents are published. There is one suggestion in your report that might have been the result of confusion. The wording "Schema-deviated" in section "8.5 Schema-informed grammars" was indeed meant to be phrased that way. To make the intent clearer, we plan to modify the sentence as follows. "Of particular note is that built-in grammars that are invoked for schema-invalid occurrences of elements or the elements that matched either SE(*) or SE(uri:*) but are not declared in the schema are still subject to dynamic grammar learning during the rest of the EXI stream processing as is described in 8.4.2 Built-in Fragment Grammar. " |
tocheck |
LC-2109
Yuri Delendik <yury_exi@yahoo.com> (archived comment) |
|
In 8.5.4.2.1, we will describe that you need to remove a production 'G_i, j -> RHS(G_i,k)_h' when that would generate either a self-loop or a production that has been previously replaced. http://lists.w3.org/Archives/Public/public-exi-comments/2008Nov/0019.html |
tocheck |
LC-2110
Yuri Delendik <yury_exi@yahoo.com> (archived comment) |
|
You are absolutely right. I have added text to section 8.5.4.3 Event Code Assignment to address this problem. The event order is now specified as:
1. all productions with AT(qname) on the right hand side sorted lexically by qname localName, then by qname uri, followed by
2. all productions with AT(urix : *) on the right hand side sorted lexically by uri, followed by
3. any production with AT(*) on the right hand side, followed by
4. all productions with SE(qname) on the right hand side sorted in schema order, followed by
5. all productions with SE(urix : *) on the right hand side sorted in schema order, followed by
6. any production with SE(*) on the right hand side, followed by
7. any production with EE on the right hand side, followed by
8. any production with CH on the right hand side.
This change will show up in the next draft of the EXI specification. |
tocheck |
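The eight-step event order listed in the reply above can be expressed as a single sort key. The following is a minimal sketch, not the specification's algorithm: the `Production` class, the `kind` labels and the `schema_order` field are hypothetical names introduced for illustration, and "sorted lexically" is assumed here to mean Python's default string comparison (by code point).

```python
from dataclasses import dataclass


@dataclass
class Production:
    # One of: "AT", "AT_NS", "AT_ANY", "SE", "SE_NS", "SE_ANY", "EE", "CH".
    # "AT_NS"/"SE_NS" stand for AT(uri:*)/SE(uri:*); "*_ANY" for AT(*)/SE(*).
    kind: str
    local_name: str = ""
    uri: str = ""
    schema_order: int = 0  # position in the schema, used for SE productions


# Rank of each production kind, following steps 1 to 8 of the reply.
_RANK = {"AT": 1, "AT_NS": 2, "AT_ANY": 3, "SE": 4,
         "SE_NS": 5, "SE_ANY": 6, "EE": 7, "CH": 8}


def _sort_key(p):
    rank = _RANK[p.kind]
    if p.kind == "AT":
        # Step 1: lexically by localName, then by uri.
        return (rank, p.local_name, p.uri, 0)
    if p.kind == "AT_NS":
        # Step 2: lexically by uri.
        return (rank, p.uri, "", 0)
    if p.kind in ("SE", "SE_NS"):
        # Steps 4 and 5: schema order.
        return (rank, "", "", p.schema_order)
    return (rank, "", "", 0)


def event_code_order(productions):
    """Return the productions in the order event codes would be assigned."""
    return sorted(productions, key=_sort_key)


productions = [
    Production("CH"),
    Production("SE", "b", "", 1),
    Production("AT", "id", "urn:b"),
    Production("EE"),
    Production("AT_ANY"),
    Production("AT", "id", "urn:a"),
    Production("SE", "a", "", 0),
    Production("AT_NS", uri="urn:x"),
    Production("AT", "a", "urn:z"),
]
for p in event_code_order(productions):
    print(p.kind, p.local_name, p.uri)
```

Using one tuple-valued key keeps the eight groups and their per-group ordering rules in one place, and Python's stable sort leaves productions with equal keys in their input order.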