Re: Substitution group handling

Hi Taki,

Thanks for the clarification. However, I am still not convinced that 
relying on the schemaID option to define the set of in-scope 
namespaces/schemas for a document exchange is feasible in complex 
distributed deployments, where servers and clients may run different 
versions of application software and therefore know different sets of 
namespaces. While it seems relatively straightforward to ensure that a 
client and a server both have all the schemas needed for a given 
exchange, they will very likely have extra schemas available as well, 
and they cannot know which of those should be considered in scope for 
that exchange unless the EXI stream itself provides this information.
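
To make the problem concrete, here is a minimal sketch that reuses the
P1/P2/P3 example from the original message quoted below, with "b"
declared in schema B and "c" in schema C. The SCHEMAS table and the
substitution_set() helper are illustrative assumptions, not part of any
EXI implementation; they only model the {substitution group affiliation}
chain.

```python
# Hypothetical model: schema name -> {element: head of its substitution group}.
SCHEMAS = {
    "A": {"a": None},   # "a" is the head, no affiliation
    "B": {"b": "a"},    # "b" can substitute for "a"
    "C": {"c": "a"},    # "c" can substitute for "a"
}

def substitution_set(head, known_schemas):
    """Return the sorted members of S for `head`, given the known schemas."""
    affiliation = {}
    for name in known_schemas:
        affiliation.update(SCHEMAS[name])
    members = {head}
    changed = True
    # Follow affiliation chains until no new member is found.
    while changed:
        changed = False
        for element, target in affiliation.items():
            if target in members and element not in members:
                members.add(element)
                changed = True
    return sorted(members)

PROCESSORS = {"P1": {"A", "B", "C"}, "P2": {"A", "B"}, "P3": {"A", "C"}}
SETS = {p: substitution_set("a", known) for p, known in PROCESSORS.items()}
```

Each processor ends up with a different S for "a", so unless the stream
says which schemas are in scope, two processors that build S from
everything they know will not derive the same grammar.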

The set of in-scope schemas for a document exchange must be built from 
two sources:

* The "static" set contains the schema of the root element of the 
document, plus all schemas that are directly or indirectly imported by 
this "root" schema. Exchanging the namespace of the "root" schema either 
out-of-band or through the schemaID option seems a practical and 
interoperable approach to determine the static set. Note that schemas of 
substitution elements are usually not imported by the "root" schema, and 
therefore cannot be automatically added to the static set.

* The "dynamic" set contains the schemas of namespaces not present in 
the static set, that are introduced in the document through only four means:
1) Attributes matching the anyAttribute wildcard
2) Elements matching the any wildcard
3) Substitution elements
4) Deviations from the schemas
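
As a rough sketch of how the static set could be derived, the following
computes the transitive closure of imports starting from the "root"
schema. The namespace names and the IMPORTS table are made up for
illustration; a real processor would read the imports from the schema
documents themselves.

```python
# Hypothetical import graph: schema namespace -> namespaces it imports.
IMPORTS = {
    "urn:example:root": ["urn:example:types"],
    "urn:example:types": ["urn:example:base"],
    "urn:example:base": [],
    # Declares substitution elements for "root"; it imports root,
    # but root does not import it.
    "urn:example:ext": ["urn:example:root"],
}

def static_set(root_ns, imports):
    """Return the root namespace plus everything it imports, transitively."""
    seen = set()
    stack = [root_ns]
    while stack:
        ns = stack.pop()
        if ns in seen:
            continue
        seen.add(ns)
        stack.extend(imports.get(ns, []))
    return seen

STATIC = static_set("urn:example:root", IMPORTS)
```

In this sketch "urn:example:ext" never enters the static set, which is
the situation described above for schemas of substitution elements.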

The current EXI spec supports cases 1), 2) and 4) (when the strict 
option is off) through the use of SE(*) or AT(*) events, which allow the 
addition of the relevant namespaces to the dynamic set. However, in case 
3), and only in that case, we have to rely on complex out-of-band 
information to add the appropriate schemas to the static set. I think 
this is an inconsistency that may create interoperability problems in 
the future. Note also that substitution elements are a relatively 
obscure feature that is seldom used (at least in our experience), so 
any inefficiency introduced at this level will have a limited impact on 
overall EXI performance.

If you consider that my previous proposal is too complex, what about 
adding an option that would allow EXI encoders to use SE(*) for all 
substitution elements, thus avoiding the interoperability problem 
described above, at the expense of slightly less efficient compression 
(this would basically consider all substitution elements as deviations 
from the schema)? I actually think this should be the default behavior, 
and that the mechanism in the current spec should only be used in cases 
where the encoder is certain that it shares the same set of in-scope 
schemas with the decoder for a particular document exchange.
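
A minimal sketch of the encoder-side decision I have in mind follows.
The function choose_event() and the flag use_wildcard_for_substitutions
are hypothetical names, not options defined by the EXI spec, and the
sketch assumes the element has already been checked to be a valid member
of the substitution group.

```python
def choose_event(element, declared_head, use_wildcard_for_substitutions=True):
    """Pick the start-element event for `element` where `declared_head` is expected.

    Assumes `element` is either the head itself or a valid substitution member.
    """
    if element == declared_head:
        # The declared element itself: always a schema-derived production.
        return f"SE({element})"
    if use_wildcard_for_substitutions:
        # Proposed default: treat the substitution like a schema deviation.
        return "SE(*)"
    # Current spec mechanism: dedicated production taken from the set S,
    # which requires both sides to share the same in-scope schemas.
    return f"SE({element})"
```

With the flag on, the decoder learns the element's qualified name from
the stream itself, so the two sides no longer need to agree on S.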

Cheers

Antoine


Taki Kamiya wrote:
> Hi Antoine,
>
> Thanks for the comment and your careful attention to the details of the spec.
>
> The EXI schema-informed grammar system is described in a way that is
> solely concerned with the abstract schema model which is agnostic about
> the physical schema composition (i.e. imports, includes and redefines)
> that is in the separate realm of the XML Schema specification.
>
> The schema information in effect for an individual EXI stream is either
> communicated out-of-band or through the schemaID option. This is described
> in section "5.4 EXI Options". However, your suggestion to make the correlation
> explicit is well taken, and we will add a sentence in "8.5 Schema-informed
> Grammars" to that effect with reference to that description.
>
> EXI does not try to leverage every feature of XML Schema exhaustively to
> wring every potential efficiency out of schemas. Instead, those schema
> features that EXI capitalizes on have been selected to achieve the best
> use of the schema. This is based on empirical judgement on the effect and
> broadness of the feature application while being keenly aware of the need to
> balance between the benefit of extra compactness and the accrued complexity
> that may adversely affect the code footprint and the processing efficiency.
>
> In the abstract element case you brought to our attention, only the
> slightest improvement, if any, is expected in general,
> given the log_2(n) formula used in the Unsigned Integer representation.
> We hope this helps to explain why EXI does not take advantage of this XML
> Schema feature.
>
> Thanks!
>
> -taki
>
>
> -----Original Message-----
> From: Antoine Mensch
> Sent: Monday, September 28, 2009 12:55 AM
> To: public-exi-comments@w3.org
> Subject: Substitution group handling
>
>   
>> The following definition (section 8.5.4.1.6) of the list of valid
>> members of an element declaration substitution group seems underspecified:
>>
>>     Let S be the set of element declarations that directly or indirectly
>>     reaches the element declaration PTi through the chain of
>>     {substitution group affiliation} property of the elements, plus PTi
>>     itself if it was not in the set.
>>
>>
>> The actual contents of S cannot be determined by only looking at the XML
>> Schema in which PTi is declared and the additional XML schemas it
>> imports. Rather, the complete set of XML Schemas in scope must be
>> considered to build S, as members of S can be contributed by each XML
>> Schema that imports the XML Schema in which PTi is declared.
>>
>> It is therefore important to determine the set of XML Schemas in scope
>> for a given EXI encoder/decoder, as shown in the example below:
>>
>> Let
>> - "a" be an element declaration in XML Schema A,
>> - "b" an element declaration in XML Schema B which has "a" as
>> {substitution group affiliation} property,
>> - "c" an element declaration in XML Schema C which has "a" as
>> {substitution group affiliation} property.
>>
>> Let P1, P2 and P3 be three EXI processors which respectively have {A, B,
>> C}, {A, B} and {A, C} as known XML Schemas.
>>
>> While in theory P1 and P2 could exchange schema-informed documents using
>> both A and B, P1 and P3 could exchange documents using both A and C, and
>> P2 and P3 could exchange documents using A, this will not be possible
>> unless a precise and shared definition of the set S for element
>> declaration "a" can be determined for each exchanged document. Indeed, a
>> naive static implementation would generate incompatible sets S1={"a",
>> "b", "c"}, S2={"a", "b"} and S3={"a", "c"} for
>> P1, P2 and P3.
>>
>> Is it the intention of the WG that this issue be addressed using the
>> SchemaId option? The current version of the spec leaves the use of this
>> option completely open in such cases, and that could lead to
>> interoperability issues. If it is nevertheless the case, it could at
>> least be useful to clarify in section 8.5.4.1.6 that S depends on the
>> SchemaId option.
>>
>> The WG could perhaps consider an alternative approach where members of
>> an element declaration substitution group are encoded as SE(*) the first
>> time their namespace appears in the document, and using the scheme
>> outlined in section 8.5.4.1.6 afterwards. This would allow both the
>> encoder and decoder to build the same set of in-scope namespaces for the
>> document, thus guaranteeing interoperability if both processors share
>> schemas for those namespaces. On the other hand, this would require the
>> dynamic construction of the set S for all elements that are potential
>> heads of substitution groups, thus deviating from the static approach
>> used so far for schema-informed grammars.
>>
>> Still about section 8.5.4.1.6, a minor optimization could probably be
>> obtained by excluding element declarations whose {abstract} property is
>> true from the set S, as such elements should never occur in valid documents.
>>
>> Best regards,
>>
>> Antoine Mensch
>>
>>     
>
>   

Received on Thursday, 15 October 2009 08:43:56 UTC