5047 – Content of "dataType" in the IF schema

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Content of "dataType" in the IF schema

Status:	RESOLVED FIXED

Alias:	None

Product:	SML
Classification:	Unclassified
Component:	Interchange Format
Version:	FPWD
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Virginia Smith
QA Contact:	SML Working Group discussion list

URL:
Whiteboard:
Keywords:	resolved

Depends on:
Blocks:	5247

Reported:	2007-09-13 19:57 UTC by Sandy Gao
Modified:	2007-11-08 18:27 UTC
CC List:	0 users

See Also:

Description Sandy Gao 2007-09-13 19:57:50 UTC

In Appendix A. SML-IF Schema, the content of "dataType" has

    <xs:sequence>
      <xs:any namespace="##other" processContents="skip"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>

This raises a couple of questions about the minOccurs and maxOccurs attributes:
1. Should "data" allow sub-elements for extension purposes?
2. Shouldn't there be exactly one non-extension sub-element?

Depending on the answer to question #1, the content of "dataType" should be replaced with one of the following:

(with extension points)

    <xs:sequence>
      <xs:any processContents="skip"/> 
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/> 
    </xs:sequence>

(without extension points)

    <xs:sequence>
      <xs:any processContents="skip"/> 
    </xs:sequence>

The "skip" wildcard corresponds to the document being included. It doesn't specify a namespace constraint because one may (however unlikely) want to package an IF document inside another IF document.

Comment 1 Pratul Dublish 2007-09-24 04:27:49 UTC

I agree with Sandy's proposal with extension points:

    <xs:sequence>
      <xs:any processContents="skip"/> 
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/> 
    </xs:sequence>

Comment 2 Kumar Pandit 2007-09-24 07:20:43 UTC

I agree with this proposal.

Comment 3 Sandy Gao 2007-09-24 13:56:22 UTC

Some further observations on the consequences of the two options. Consider:

<data>
  <!-- some comment -->
  <my:root>...</my:root>
  <!-- some other comment -->
  <some:element> ... </some:element>
</data>

Here it's clear that <my:root> is the root element of the document being packaged and <some:element> is an extension element. But what about the two comments? Nothing indicates whether they should be considered part of the transmitted document.

Without extension points, the transmitted document can be defined as represented by "all characters after <data> and before </data>".

With extension points, we could still define it as "all characters after <data> and before the first extension element, if any, otherwise before </data>".
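The two definitions can be approximated in Python with `xml.dom.minidom`. This is a sketch under stated assumptions: it works on DOM child nodes rather than raw characters (so lexical details such as attribute quoting are normalized on output), and under the extension-point option it assumes the first element child of <data> is the packaged root and any later element child is an extension. The function names are hypothetical.

```python
from xml.dom import minidom, Node


def embedded_option1(data_xml: str) -> str:
    """No extension points: every child node of <data> (elements,
    comments, PIs, whitespace) belongs to the embedded document."""
    data = minidom.parseString(data_xml).documentElement
    return "".join(child.toxml() for child in data.childNodes)


def embedded_option2(data_xml: str) -> str:
    """Extension points: child nodes of <data> up to, but not including,
    the first extension element (assumed: any element after the first)."""
    data = minidom.parseString(data_xml).documentElement
    parts = []
    seen_root = False
    for child in data.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            if seen_root:
                break  # first extension element: the embedded document ends here
            seen_root = True
        parts.append(child.toxml())
    return "".join(parts)


sample = '<data><!-- c1 --><root/><!-- c2 --><ext xmlns="urn:example:ext"/></data>'
print(embedded_option2(sample))  # <!-- c1 --><root/><!-- c2 -->
```

Under the second definition, the comment between the root and the extension element is kept, which is exactly the ambiguity this comment points out.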

Comment 4 John Arwe 2007-10-25 13:28:24 UTC

Agree.  Sandy, please add a short explanation of the reasoning for skip rather than lax.  I remember you had a good reason, but not what it was, and I am (cursed?  blessed?) with an inquiring mind.

Comment 5 Sandy Gao 2007-10-25 15:43:24 UTC

The "skip" wildcard matches the root element of the model document being packaged. SML-IF validity does not care about the content of that element (as long as it's well-formed XML), which is why the processContents is set to "skip".

Note that the proposal quoted in comment #1 is incomplete. We need to answer the question raised in comment #3. That is, if there is more than one sub-element under <data>, what marks the beginning and end of the embedded document? Are we implicitly assuming "all characters after <data> and before the first extension element, if any, otherwise before </data>"?

Comment 6 Kumar Pandit 2007-10-26 05:01:12 UTC

I am not sure why the proposal in #1 is incomplete. It is clear that anything that matches:

      <xs:any processContents="skip"/> 

is the root element of the packaged document. Anything else is not.

With this definition, the floating comments are not part of the embedded document. 

If at this point we believe that the existing proposal(s) are incomplete and there is no clear proposal, we should remove the hasProposal keyword.

Comment 7 Sandy Gao 2007-10-26 21:45:11 UTC

> the floating comments are not part of the embedded document.

Was this a conscious decision? Are you suggesting that there is no way to package:

<!-- my important comment -->
<?my-important-PI?>
<root/>
<!-- my important comment -->
<?my-important-PI?>

in an SML-IF document without losing comments and PIs?

I don't know whether having 3 proposals counts as "hasProposal". :-) Here they are:
1. Do not allow extension points in <data>. The embedded document is "all characters after <data> and before </data>."
2. Allow extension points in <data>. The embedded document is "all characters after <data> and before the first extension element, if any, otherwise before </data>".
3. Allow extension points in <data>. The embedded document is the element matching the "skip" wildcard (from '<' of the start tag to '>' of the matching end tag; ignoring comments and PIs).

I prefer #1, because we then don't need to deal with the "how about comments/PIs" question. Also note that one of the potential resolutions to bug 4687 is to embed base64 text in <data>, which would make "extensions" look awkward.
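Proposal #3, by contrast, keeps only the element matched by the "skip" wildcard. A minimal Python sketch (hypothetical function name, using `xml.dom.minidom`) shows that the comments and PIs from the example in this comment are dropped:

```python
from xml.dom import minidom, Node


def embedded_option3(data_xml: str) -> str:
    """Proposal #3: the embedded document is only the element matched by
    the "skip" wildcard; surrounding comments and PIs are ignored."""
    data = minidom.parseString(data_xml).documentElement
    for child in data.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            return child.toxml()
    return ""  # no element child at all


packaged = (
    "<data><!-- my important comment --><?my-important-PI?>"
    "<root/>"
    "<!-- my important comment --><?my-important-PI?></data>"
)
print(embedded_option3(packaged))  # <root/>
```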

Comment 8 Kumar Pandit 2007-10-30 23:42:02 UTC

The analysis presented makes sense. #1 sounds reasonable to me. If this is the proposal then I agree with it. 

The current schema allows dataType content to be empty, and we should continue to allow it. Thus, the effective definition becomes:

  <xs:complexType name="dataType" mixed="false">
    <xs:sequence>
      <xs:any namespace="##other" processContents="skip" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

Comment 9 Sandy Gao 2007-10-31 16:14:26 UTC

> The current schema allows for dataType content to be empty and we should
> continue to allow it.

I am trying to understand what it means if the content is empty. Is this described anywhere in the IF spec? Does it mean "no data for this document"? This seems very strange to me. How can a "document" be in an SML model (the interchange set) when it is not even an XML document? (And even if we do want to support it, I think it would be better to omit <data> entirely in this case.)

> <xs:any namespace="##other" processContents="skip" minOccurs="0"/>

Note that the "without extension points" alternative does not specify the "##other" namespace constraint:

  <xs:any processContents="skip"/> 

One reason not to specify it is mentioned in the bug report. There is actually a more natural and important reason.

"##other" does not match elements without a namespace, so it would not be possible to package a document (by value) in IF if its root element has no namespace. That is not a reasonable restriction.
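The namespace test behind this point can be sketched in Python. This hand-codes a simplified version of the XML Schema 1.0 wildcard rules rather than calling a real validator; `wildcard_allows` and the target namespace `urn:example:sml-if` are hypothetical placeholders, not the actual SML-IF namespace.

```python
from typing import Optional

# Hypothetical placeholder for the schema's target namespace.
IF_NS = "urn:example:sml-if"


def wildcard_allows(constraint: str, ns: Optional[str], target_ns: Optional[str]) -> bool:
    """Simplified XSD 1.0 namespace test for <xs:any namespace="...">.
    `ns` is the element's namespace name, or None if it has no namespace."""
    if constraint == "##any":
        return True
    if constraint == "##other":
        # "not the target namespace" AND "not absent":
        # an element with no namespace never matches ##other.
        return ns is not None and ns != target_ns
    # Otherwise: a whitespace-separated list of URIs,
    # ##targetNamespace and/or ##local.
    allowed = set()
    for token in constraint.split():
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(None)
        else:
            allowed.add(token)
    return ns in allowed


print(wildcard_allows("##other", None, IF_NS))  # False: unqualified root rejected
print(wildcard_allows("##any", None, IF_NS))    # True
```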

Comment 10 Kumar Pandit 2007-11-01 04:32:01 UTC

I agree about the need to match elements without a namespace. I am ok with the definition being:

  <xs:complexType name="dataType" mixed="false">
    <xs:sequence>
      <xs:any processContents="skip" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

We must have the ability to have empty dataType content. It is used to represent tombstoned documents in one of our scenarios.

Comment 11 Pratul Dublish 2007-11-06 16:31:49 UTC

Pls fix as per Comment #10

Comment 12 Virginia Smith 2007-11-08 18:27:18 UTC

Edited schema as specified. Note that, when the spec is built, the appendix actually shows the following:

<xs:complexType name="dataType" mixed="false">
  <xs:sequence>
    <xs:any processContents="skip" minOccurs="0" namespace="##any" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>
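As a rough illustration of what this final definition accepts, here is a Python sketch that hand-codes the content model instead of running a schema validator (the function name is hypothetical). It allows at most one element child in any namespace, whose content is not inspected because of processContents="skip". It rejects non-whitespace character data because of mixed="false", and it permits comments and PIs. An empty <data/>, as used for the tombstoned documents mentioned in comment #10, is accepted.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"


def data_content_ok(data_xml: str) -> bool:
    """Check a <data> element against the final dataType content model:
    minOccurs="0", maxOccurs="1", namespace="##any", mixed="false"."""
    data = minidom.parseString(data_xml).documentElement
    element_children = 0
    for child in data.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            # processContents="skip": the child's own content is not checked.
            element_children += 1
        elif child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            if child.data.strip(XML_WHITESPACE):
                return False  # mixed="false": only whitespace text allowed
        # Comments and PIs are permitted in element-only content.
    return element_children <= 1


print(data_content_ok("<data/>"))                    # True (empty content)
print(data_content_ok("<data><a/><b/></data>"))      # False (two elements)
```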