4687 – Handling of DTDs when composing an IF document

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	SML
Classification:	Unclassified
Component:	Interchange Format
Version:	unspecified
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	LC
Assignee:	Virginia Smith
QA Contact:	SML Working Group discussion list

URL:
Whiteboard:
Keywords:	resolved

Depends on:
Blocks:

Reported:	2007-06-21 17:48 UTC by Sandy Gao
Modified:	2008-01-17 19:52 UTC
CC List:	0 users

Description Sandy Gao 2007-06-21 17:48:23 UTC

Section "The Basics" describes how to compose an IF document from model documents. DTDs are currently discarded. This may be problematic because the DTD may have entity declarations and default attributes.

Comment 1 Virginia Smith 2007-08-09 19:55:08 UTC

This refers to section 3.3.1.

Comment 2 Kumar Pandit 2007-09-20 19:59:25 UTC

Michael to write a proposal for a possible solution as decided in conf call on 9/19.

Comment 3 C. M. Sperberg-McQueen 2007-10-17 16:06:55 UTC

An email message discussing several possible technical solutions to this problem
was sent to the WG on 17 October 2007:
http://lists.w3.org/Archives/Public/public-sml/2007Oct/0116.html

Comment 4 Sandy Gao 2007-10-18 00:24:02 UTC

Discussed at 2007-10-17 F2F. The following proposal surfaced:
1. For producer: 
When packaging documents with DTDs, producer [should|must] do one of the following: 
1.1 normalize (default value, expand entity, etc.) to make it standalone 
1.2 use base 64 to encode the entire document 
note: normalization doesn't always work for the schema entity types 
2. For consumer: 
When unpackaging embedded documents:
2.1 when document/data/@xsi:type="xs:base64", then decode it and process it as a separate document 
2.2 every other model document (embedded as XML) is processed as if it had the same DTD as the one specified on <model>. 

Known decision points:
a. should vs. must in (1)
b. whether to support base64 encoding (notice the note about schema entity type)

Comment 5 Virginia Smith 2007-11-01 18:50:28 UTC

RESOLUTION: conditional consent for 1.2 only, pending Kumar's research result.

Comment 6 Kumar Pandit 2007-11-28 04:10:11 UTC

I finished my investigation on this issue. I agree to supporting base64 encoding of documents with DTD. Since most members were in favor of supporting only 1.2, I removed the other option and selected MUST from MUST|SHOULD.
 
Here is the resultant proposal:

1. For producer: 
When packaging documents with DTDs, a producer MUST use base 64 to encode the entire document.

2. For consumer: 
When unpackaging embedded documents:
2.1 when document/data/@xsi:type="xs:base64", then decode it and process it as
a separate document 
2.2 every other model document (embedded as XML) is processed as if it had
the same DTD as the one specified on <model>.

Comment 7 Sandy Gao 2007-11-28 13:50:50 UTC

One point to clarify. When I (and I hope others) used document/data/@xsi:type="xs:base64", I did not really mean using "xsi:type", but rather a shorthand for "this document uses base64 encoding".

xsi:type=xs:base64Binary is actually invalid, because xs:base64Binary is not derived from sml:dataType.

So I would suggest adding another element, "base64Data", whose type is xs:base64Binary, and then adding it to the content of <document>, which becomes a choice among "data", "locator", and "base64Data".

Comment 8 Virginia Smith 2007-11-28 19:22:01 UTC

My understanding of the consensus reached is that some members agreed to adding support for 1.2 (in comment #4) and not 1.1 in light of the fact that option 1.1 was allowed anyway (that is 1.1 is not precluded in the current spec) so mentioning it was not necessary. But the proposal in comment #6 now does preclude 1.1. Is there a good reason to specifically exclude 1.1?

Comment 9 Kumar Pandit 2007-11-29 00:01:18 UTC

I agree with comment #7.

Here is some clarification on comment #8:
The current proposal does not preclude DTD normalization. SML-IF does not define document life-cycle. That is, it does not define or restrict any transformations that can be applied to documents during their lifetime. It only defines what must be done just before embedding the document in an IF document.

In this case, if one wants to preserve DTD in a document, one must convert the document to base64. However, if a producer normalizes or simply discards DTD then the resulting document does not have a DTD just before adding to SML-IF. In that case, the producer does not need to convert to base64.

To summarize, the proposal does not preclude normalization. As long as everyone agrees on the concepts, the editors can phrase them correctly to articulate the concepts.
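
As an illustration of the producer-side rule summarized above, here is a minimal Python sketch. It is not taken from the SML-IF specification: the function names has_dtd and embed_document are hypothetical, the element names are written without the SML-IF namespace for brevity, and using xml.dom.minidom to detect a DOCTYPE is just one possible approach.

```python
import base64
import xml.dom.minidom
import xml.etree.ElementTree as ET


def has_dtd(document_bytes: bytes) -> bool:
    """Return True if the serialized XML document carries a DOCTYPE declaration."""
    return xml.dom.minidom.parseString(document_bytes).doctype is not None


def embed_document(document_bytes: bytes) -> ET.Element:
    """Wrap one model document in a <document> element (names un-namespaced here)."""
    wrapper = ET.Element("document")
    if has_dtd(document_bytes):
        # The DTD is still present just before embedding, so the whole
        # document is base64-encoded to preserve it.
        encoded = ET.SubElement(wrapper, "base64Data")
        encoded.text = base64.b64encode(document_bytes).decode("ascii")
    else:
        # No DTD at embedding time (never had one, or the producer already
        # normalized or discarded it): embed the root element as XML.
        data = ET.SubElement(wrapper, "data")
        data.append(ET.fromstring(document_bytes))
    return wrapper
```

A producer that normalizes a document beforehand would simply pass the normalized bytes to embed_document and end up on the second branch.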

Comment 10 Virginia Smith 2007-11-29 00:26:58 UTC

The clarification in comment #9 makes sense. I agree with the proposal in comment #6 and comment #7.

Comment 11 Valentina Popescu 2007-11-29 14:40:34 UTC

+1 for the proposal in comment #7

Comment 12 Pratul Dublish 2007-11-29 20:40:01 UTC

Fix as per #6 and #7, as amended by #9

Comment 13 Virginia Smith 2007-11-30 01:17:43 UTC

Reworked entire section 5.2 SML-IF Documents. This section now reads:

=============
5.2 SML-IF Documents

The purpose of SML-IF is to package the set of documents that constitute an SML model into a standard format so that it can be exchanged in a standard way.

An SML-IF document MUST conform to XML [XML] specification.

An SML-IF document MUST be valid under the XML Schema given in Appendix A.

An SML-IF document MAY form a valid SML model but it is not required to do so. Various uses of SML-IF may define requirements with respect to model validity and the interchange set, but this specification does not.

Each document in the interchange set MUST be represented in the SML-IF document by a separate document element as follows:

1. Each definition document in the interchange set MUST appear as a descendant of a model/definitions/document element. The order of the document children is not significant.
2. Each instance document in the interchange set MUST appear as a descendant of a model/instances/document element. The order of the document children is not significant.

Each document in the interchange set MUST be included in the SML-IF document either as an embedded document (where the document to be included is embedded in the SML-IF document) or by including a reference to the document.

5.2.1 Embedded Documents

Documents that are to be embedded in the SML-IF document MUST be embedded as follows:

1. Definition documents MUST be embedded as the content of model/definitions/document/data element. There can be at most one definition document embedded in each model/definitions/document/data element.
2. Instance documents that do not contain a DTD MUST be embedded as the content of a model/instances/document/data element. There can be at most one instance document embedded in each model/instances/document/data element.
3. Instance documents that contain a DTD MUST be embedded as follows:
   3.1 The document MUST be encoded in base64 format.
   3.2 The resultant data stream MUST be embedded as the content of a model/instances/document/base64Data element. There can be at most one instance document embedded in each model/instances/document/base64Data element.

When extracting an embedded document that is contained in a base64Data element, an SML-IF consumer MUST decode the content of the base64Data element first and then process the resulting document as a embedded instance document. All embedded instance documents not encoded in base64 MUST be processed as if they contained the same DTD as the one specified on the model element (if present).

5.2.2 Referenced Documents

Documents that are to be referenced rather than embedded MUST be included as follows:

1. If the document is a definition document, the location of the document MUST be included as the content of a model/definitions/document/locator element.
2. If the document is an instance document, the location of the document MUST be included as the content of a model/instances/document/locator element.

SML-IF specifies one way that MAY be used to provide the location of the referenced document, the documentURI element.

An SML-IF consumer MAY choose to locate a referenced document. If an SML-IF consumer chooses not to locate a referenced document or if it attempts to locate the referenced document and this attempt fails, then the SML-IF consumer MUST treat the referenced document as if it is not part of the interchange set. If either of these conditions occurs, the SML-IF consumer SHOULD make its invoker aware of this condition.

=============
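
The consumer behavior for referenced documents in 5.2.2 above might look roughly like the following Python sketch. It makes several assumptions: resolve_referenced is a hypothetical name, fetch and notify are caller-supplied callables (the draft text does not say how a document is located or how the invoker is informed), and element names are written without the SML-IF namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List


def resolve_referenced(
    locators: Iterable[ET.Element],
    fetch: Callable[[str], bytes],
    notify: Callable[[str], None],
) -> List[bytes]:
    """Return the referenced documents that could be located.

    A document that cannot be located is left out, i.e. treated as if it
    were not part of the interchange set, and the invoker is told about it.
    """
    resolved = []
    for locator in locators:
        uri_element = locator.find("documentURI")
        uri = (uri_element.text or "").strip() if uri_element is not None else ""
        try:
            if not uri:
                raise LookupError("locator has no documentURI")
            resolved.append(fetch(uri))
        except Exception as error:
            # Lookup failed: skip the document and report the condition.
            notify(f"referenced document skipped: {error}")
    return resolved
```

A consumer that chooses not to locate referenced documents at all would skip this step entirely and still report the omission to its invoker.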

Comment 14 Sandy Gao 2007-12-05 16:57:43 UTC

Sorry for not having reviewed this earlier and having to reopen it.

1. I'm surprised to see that definition documents don't get to use <base64Data>. I think it's better to treat definition and instance documents consistently. There is no reason to believe that producers can always "normalize" definition documents so that they don't need DTDs.

2. I'm also surprised to see "Instance documents that do not contain a DTD MUST be embedded as the content of a model/instances/document/data element". A producer may choose to embed all documents using base64Data, for simplicity. And there may be cases where the embedded document does *not* want to get the "global" DTD. Then "base64Data" is a good solution.

3. It may help to clarify what's actually encoded in "The document MUST be encoded in base64 format." It could be interpreted as encoding the Unicode characters. I think we really mean the octet stream (in the original encoding).

4. "as a embedded instance document" -> "as *an* embedded instance document"

Again, please accept my apologies for not having reviewed this before the call.
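
To illustrate point 3 above, the following small Python sketch contrasts base64-encoding the original octet stream with base64-encoding the Unicode characters after re-serializing them. The sample document and the choice of ISO-8859-1 and UTF-8 are assumptions made only for this example.

```python
import base64

# A document serialized in ISO-8859-1; its XML declaration names that encoding.
original_octets = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
).encode("iso-8859-1")

# Encoding the octet stream: decoding gives back exactly the original bytes,
# so the encoding declaration still matches the content.
from_octets = base64.b64decode(base64.b64encode(original_octets))

# Encoding the characters instead (here re-serialized as UTF-8): the bytes
# differ from the original and no longer match the declared encoding.
characters = original_octets.decode("iso-8859-1")
from_characters = base64.b64decode(base64.b64encode(characters.encode("utf-8")))

print(from_octets == original_octets)
print(from_characters == original_octets)
```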

Comment 15 Sandy Gao 2007-12-05 18:31:29 UTC

Removing the dependency on bug 4562, which seems totally irrelevant. Not sure what we intended to say.

Comment 16 Virginia Smith 2007-12-05 21:40:30 UTC

Added keyword 'needsAgreement' since this bug has been reopened.

I agree with suggestions in comment #14.

Comment 17 Kumar Pandit 2007-12-07 07:45:44 UTC

Marking as 'editorial' per resolution in conf call on Thu 12/6/07

resolutions: move #4687 to editorial and make changes listed in comment #14 of 4687

Comment 18 Kumar Pandit 2007-12-12 08:42:33 UTC

In comment #17 I mentioned "Marking 'editorial'..." but somehow the bug was not marked that way.

Really marking it editorial this time.

Comment 19 Virginia Smith 2007-12-12 17:13:31 UTC

The updated section reads:

5.2.1 Embedded Documents

Documents that are to be embedded in the SML-IF document MUST be embedded as text or in an encoded format as follows:

1. If the document is embedded as text, it must be included as the content of a model/definitions/document/data element if it is a definition document or a model/instances/document/data element if it is an instance document. There can be at most one document embedded in each model/*/document/data element.
2. If the document is embedded in an encoded format, then the octet stream representing the document MUST be encoded in base64 format. The resultant data stream MUST be embedded as the content of a model/definitions/document/base64Data element if it is a definition document or a model/instances/document/base64Data element if it is an instance document. There can be at most one document embedded in each model/*/document/base64Data element. Documents that contain a DTD MUST be embedded in this encoded format.

When extracting an embedded document that is contained in a base64Data element, an SML-IF consumer MUST decode the content of the base64Data element first and then process the resulting document as an embedded instance document. All embedded instance documents not encoded in base64 MUST be processed as if they contained the same DTD as the one specified on the model element (if present). If model/*/document/data contains no child element or model/*/document/base64Data has empty content then the SML-IF consumer MUST treat the document as if it is not part of the interchange set.

Comment 20 Kumar Pandit 2007-12-13 09:01:12 UTC

I agree with the changes described in comment #19.

Comment 21 Kirk Wilson 2007-12-13 16:11:06 UTC

I will add my consent to this as well.

Comment 22 Sandy Gao 2007-12-13 17:44:32 UTC

+1 to changes made for this bug.

Comment 23 Valentina Popescu 2007-12-13 18:28:51 UTC

+1 for changes as described in comment #19

Comment 24 Virginia Smith 2007-12-13 20:23:47 UTC

Resolution: OK with one amendment. In "There can be at most one document", change "can" to "MUST". Apply the same kind of change throughout the text where appropriate.

Comment 25 Virginia Smith 2008-01-03 16:31:21 UTC

section now reads:

5.2.1 Embedded Documents

Documents that are to be embedded in the SML-IF document MUST be embedded as text or in an encoded format as follows:

1. If the document is embedded as text, it MUST be included as the content of a model/definitions/document/data element if it is a definition document or a model/instances/document/data element if it is an instance document. There MUST be at most one document embedded in each model/*/document/data element.
2. If the document is embedded in an encoded format, then the octet stream representing the document MUST be encoded in base64 format. The resultant data stream MUST be embedded as the content of a model/definitions/document/base64Data element if it is a definition document or a model/instances/document/base64Data element if it is an instance document. There MUST be at most one document embedded in each model/*/document/base64Data element. Documents that contain a DTD MUST be embedded in this encoded format.

When extracting an embedded document that is contained in a base64Data element, an SML-IF consumer MUST decode the content of the base64Data element first and then process the resulting document as an embedded instance document. All embedded instance documents not encoded in base64 MUST be processed as if they contained the same DTD as the one specified on the model element. If model/*/document/data contains no child element or model/*/document/base64Data has empty content then the SML-IF consumer MUST treat the document as if it is not part of the interchange set.

Comment 26 Sandy Gao 2008-01-10 14:13:42 UTC

A minor editorial comment on the new text:

> All embedded instance documents not encoded in base64 MUST be processed as if
> they contained the same DTD as the one specified on the model element.

"DTD ... specified on the ... element" suggests that DTDs are associated with elements, but they are really associated with XML documents.

Suggest to replace that part with "DTD as the one associated with the SML-IF document".

Comment 27 Virginia Smith 2008-01-15 00:53:07 UTC

Last paragraph of the section now reads:

When extracting an embedded document that is contained in a base64Data element, an SML-IF consumer MUST decode the content of the base64Data element first and then process the resulting document as an embedded instance document. All embedded instance documents not encoded in base64 MUST be processed as if they contained the same DTD as the one associated with the SML-IF document. If model/*/document/data contains no child element or model/*/document/base64Data has empty content then the SML-IF consumer MUST treat the document as if it is not part of the interchange set.
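
The extraction rules in this final paragraph could be sketched in Python as follows. This is only an illustration under stated assumptions: extract_embedded is a hypothetical name, element names are written without the SML-IF namespace, and the step of processing non-base64 documents as if they carried the DTD associated with the SML-IF document is not implemented here.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def extract_embedded(model: ET.Element) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, serialized document) for each embedded document in <model>."""
    for kind in ("definitions", "instances"):
        for document in model.findall(f"{kind}/document"):
            encoded = document.find("base64Data")
            data = document.find("data")
            if encoded is not None:
                payload = (encoded.text or "").strip()
                if not payload:
                    continue  # empty base64Data: not part of the interchange set
                # Decode first; the result is then processed as its own document.
                yield kind, base64.b64decode(payload)
            elif data is not None:
                if len(data) == 0:
                    continue  # data with no child element: not part of the set
                # Embedded as XML; applying the SML-IF document's DTD is
                # deliberately left out of this sketch.
                yield kind, ET.tostring(data[0])
```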