This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5546 - Reconcile SML-IF with RFC 2557
Summary: Reconcile SML-IF with RFC 2557
Status: RESOLVED WONTFIX
Alias: None
Product: SML
Classification: Unclassified
Component: Interchange Format
Version: LC
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: SML Working Group discussion list
URL:
Whiteboard:
Keywords: externalComments, reviewerSatisfied
Depends on:
Blocks:
 
Reported: 2008-03-07 01:38 UTC by Pratul Dublish
Modified: 2008-06-25 12:37 UTC
CC List: 3 users

See Also:


Attachments

Description Pratul Dublish 2008-03-07 01:38:44 UTC
From http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Mar/0001.html 

 Note I have only skimmed the SML Interchange Format document, which I couldn't
really get my head around and on which I have no comment to make,
except that it appears to be largely duplicating the work of RFC 2557,
and they should either abandon SML-IF and use multipart/related per
that RFC, or explain why they haven't done so.
Comment 1 Pratul Dublish 2008-03-31 23:56:13 UTC
Resolution in F2F meeting on 3/31

We can use MIME since
1. MIME data streams are not XML documents and therefore can't be schema validated
2. MIME does not support multiple aliases for a body part
3. MIME format is not easily extensible 

The change in status should cause email to be sent to the originator of this
issue, to whom the following request is addressed.

Please review the changes adopted and let us know if you agree with this
resolution of your issue, by adding a comment to the issue record and changing
the Status of the issue to Closed. Or, if you do not agree with this
resolution, please add a comment explaining why. If you wish to appeal the WG's
decision to the Director, then also change the Status of the record to
Reopened. If you wish to record your dissent, but do not wish to appeal the
decision to the Director, then change the Status of the record to Closed. If we
do not hear from you in the next two weeks, we will assume you agree with the
WG decision.
Comment 2 Pratul Dublish 2008-03-31 23:57:32 UTC
Correction to Comment #1
The resolution is

We CANNOT use MIME since
1. MIME data streams are not XML documents and therefore can't be schema
validated
2. MIME does not support multiple aliases for a body part
3. MIME format is not easily extensible 
Comment 3 Kumar Pandit 2008-04-17 19:08:10 UTC
resolution (conf call on 4/17/2008): Resolve as 'wont-fix' and remove the 'decided' keyword because the 2 week response period has elapsed (see comment# 1).

Comment 4 Henry S. Thompson 2008-04-18 13:52:42 UTC
I don't understand comment 2 at all.  An interchange format is by definition a package, and multipart/related already has detailed considerations in it of how URI mapping needs to be done in a package of interlinked documents.  MIME data streams are certainly XML documents, if they are of the right media type.  I strongly urge you to reconsider: you are at risk of re-inventing the wheel here, and getting it less round than your predecessors.
Comment 5 Kumar Pandit 2008-06-13 00:04:08 UTC
Here is the email I had sent to public-sml regarding this topic:

------------------
From: Kumar Pandit 
Sent: Tuesday, April 29, 2008 3:48 PM
To: public-sml@w3.org
Cc: Kumar Pandit
Subject: RFC 2557 & SML-IF

During the last conf call, I volunteered to send notes on comparison between RFC 2557 & SML-IF and start an email thread for discussing the issue. This was recorded as action-item# 182.

I studied RFC 2557 and some other RFCs referenced by this RFC to see how it compares to SML-IF. I found several incompatibilities that make RFC 2557 based packaging not suitable for representing an SML model and the associated metadata/semantics. I also searched for an XML based packaging RFC but I didn't find one.

Here is a summary of what I found. 

RFC 2557 (MIME Encapsulation of Aggregate Documents) is aimed at e-mail transfer of multi-resource HTML multimedia documents. The RFC mentions that the conventions defined in that RFC can also be used for self-contained multi-resource HTML multimedia documents retrieved by other protocols such as HTTP and FTP. The root resource in such a document must be of type text/html. RFC 2557 allows hierarchical packaging of entities for which a mime type is defined. It also allows encoding some or all of the body parts as base64.

In order to simulate SML-IF like structure using RFC 2557, one may package XML documents inside an html root resource. That is, something like below:
1. root: text/html
   a. metadata (such as rule/schema bindings, etc.): multipart/related
   b. definition documents: multipart/related
   c. instance documents: multipart/related

Each of the multipart/related parts above (1b/1c) can then contain a collection of documents some of which may be encoded as base64. However, this representation is simplistic at best. It does not permit many higher level abstractions and some semantic restrictions that are currently used in SML-IF.
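
A minimal sketch of the layout above, assuming Python's standard email package. It is one possible reading of that outline, not a packaging anyone in this thread proposed: an outer multipart/related container whose root is a text/html part, followed by one nested multipart/related group each for metadata, definitions and instances. The example.org URIs, the placeholder XML bodies and the helper names are all hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def xml_part(xml_text, location):
    """Wrap one XML document as a body part labelled with a single URI."""
    # MIMEApplication base64-encodes its payload by default.
    part = MIMEApplication(xml_text.encode("utf-8"), "xml")
    part["Content-Location"] = location
    return part


def group(parts):
    """Bundle body parts into one nested multipart/related container."""
    container = MIMEMultipart("related")
    for part in parts:
        container.attach(part)
    return container


def build_package():
    """Build the outer container: html root plus three nested groups."""
    package = MIMEMultipart("related", type="text/html")
    package.attach(MIMEText("<html><body>SML model package</body></html>", "html"))
    # 1a: metadata, 1b: definition documents, 1c: instance documents.
    package.attach(group([xml_part("<bindings/>", "http://example.org/meta/bindings.xml")]))
    package.attach(group([xml_part("<schema/>", "http://example.org/defs/types.xsd")]))
    package.attach(group([xml_part("<instance/>", "http://example.org/inst/server1.xml")]))
    return package
```

Calling build_package().as_string() serializes the whole structure. Each XML document ends up inside its own nested group, which is where the restrictions discussed next start to matter.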

RFC 2557 does not permit URIs that cross a multipart/related boundary. Thus items such as those defined in the rule/schema bindings will not be allowed to refer to documents in 1.b/1.c. One solution may be to put all items in 1.a/1.b/1.c into a single multipart/related group but it has problems of its own.

RFC 2557 mandates that each body part must have only one URI. SML-IF allows multiple aliases per document.
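
To make the alias point concrete, here is a rough Python sketch of the lookup that multiple aliases imply: several URIs, possibly assigned by different repositories, all resolving to the same document. The AliasTable class and the example.org URIs are hypothetical, and the rule that one alias may name only one document is an assumption of this sketch rather than a quotation from the SML-IF spec.

```python
class AliasTable:
    """Maps any number of alias URIs to a single document object."""

    def __init__(self):
        self._by_alias = {}

    def add(self, document, aliases):
        """Register a document under every alias in the given list."""
        for alias in aliases:
            existing = self._by_alias.get(alias)
            if existing is not None and existing is not document:
                # Assumption of this sketch: an alias names one document only.
                raise ValueError("alias already names another document: " + alias)
            self._by_alias[alias] = document

    def resolve(self, uri):
        """Return the document known under this URI, or None if unknown."""
        return self._by_alias.get(uri)
```

With a single URI per body part, the equivalent of add() could register only one name per document, and every reference using a different name would have to be rewritten.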

SML-IF allows different handling of URI references depending upon whether a schema document retrieval is involved. For example, if the target document of an SML reference is not in the model then that reference is deemed unresolved. On the other hand,  an implementation is allowed to fetch a schema document from outside the model in certain cases. This type of target specific URI reference resolution is not permitted in RFC 2557. It treats resolution of all URI references in the same way. There is no way to override this.
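
The difference between the two kinds of resolution can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the algorithm from the SML-IF spec: the model is reduced to a dictionary from URI to document, both function names are hypothetical, and the "certain cases" in which an external schema fetch is allowed are modelled simply by the caller choosing whether to pass a fetch function.

```python
def resolve_sml_reference(uri, model_documents):
    """Resolve an SML reference; only documents inside the model count."""
    # None stands for "unresolved": no attempt is made to look outside.
    return model_documents.get(uri)


def resolve_schema_location(uri, model_documents, fetch_external=None):
    """Resolve a schema document, optionally falling back to an external fetch."""
    if uri in model_documents:
        return model_documents[uri]
    if fetch_external is not None:
        # Permitted only when the caller supplies a fetcher (sketch assumption).
        return fetch_external(uri)
    return None
```

A resolver that treats every URI reference the same way would need only one of these functions, which is the limitation described above.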

In RFC 2557, URI reference resolution always has a baseURI (either explicitly specified or the implicit thismessage:/). This means that certain invalid-model cases in SML-IF cannot be represented using RFC 2557.

A document formatted according to RFC 2557 cannot be validated against XML schema. 

Body parts of an RFC 2557 document can be semantically extended using the start-info parameter. This parameter can have any free form value. SML-IF defines extension points using XML schema wildcards. The extension values can be validated against a schema.

In SML-IF, reference resolution behavior is governed by the category of the target. For example, an SML reference pointing to a schema document must be treated as unresolved. In RFC 2557 there is no provision to define target category specific URI reference resolution.


Comment 6 Kumar Pandit 2008-06-13 00:05:48 UTC
John's email on the same topic:

-------------------------
From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of John Arwe
Sent: Wednesday, April 30, 2008 2:37 PM
To: public-sml@w3.org
Subject: Re: RFC 2557 & SML-IF


While I didn't have an action item assigned to me for this (that I am aware of, at least :-) ), I was able to fit a thorough reading of 2557 into my travel last week.  It certainly led me to better understand Henry's comments about the parallel in function provided, and to ask some questions whose answers might lead me to form an opinion with some conviction.  Since my opinion, in my role as chair, matters not a whit, I simply offer them up here for the wider group.  Today I was able to skim (not thoroughly read) the 5 MIME RFCs 2557 refers to as well, so it is possible some of my questions are answered in them. 
Root must have a media type of text/html: 
Citations: 
[Abstract] This document a) defines the use of a MIME multipart/related structure to aggregate a text/html root resource and the subsidiary resources it references 
[1. Introduction] there is no requirement that implementations claiming conformance to this standard be able to handle any URI linked document representations other than those whose root is HTML. 
(I assume, based on what I read in 3023, that text/html+xml would be allowed although 2557 does not explicitly state this).  This implies that in order to have any hope of using MIME to package a set of related SML documents (i.e. a model), at least one of them would have to be (X)HTML.  In theory one could artificially construct a root html document as part of the packaging process, but this does seem to stretch the process a bit far (in the same way that we encountered while considering how one might build an HTML ref scheme for <img> that involved the manufacture of XML proxies for non-XML media types). 
If one were to artificially construct a root xhtml document as part of the packaging process, and then stipulate that it would be destroyed as part of receive processing so that this root was not construed as being part of the interchange model, that seems to me like it would require a new media type to be registered.  Feels a bit heavyweight, although I have not studied the registration process. 
Regardless of the approach taken, support for XHTML will require the known implementations to change substantially. 
Since RFC 2557 requires text/html but not text/html+xml, a conservative reader would have to assume that HTML (sic, not XHTML) support is required.  The inability of implementations to rely on off the shelf XML components is likely to significantly impact the known implementations. 
When Henry raised the XHTML ref scheme issue, I read into that a decision on his part to specifically say Xhtml rather than HTML.  It is possible he assumed MIME allowed one to support XHTML not HTML (thus a smaller impact, since all the docs are still XML), but that is not how I'm reading 2557. 
2557 scoped to email (?) 
Citations: 
[Abstract] In order to transfer a complete HTML multimedia document in a single e-mail message, it is necessary to: 
[1. Introduction] 
There is an obvious need to be able to send such multi-resource documents in e-mail [SMTP], [RFC822] messages. 
The standard defined in this document specifies how to aggregate such multi-resource documents in MIME-formatted [MIME1 to MIME5] messages for precisely this purpose. 
I'm honestly not sure how narrowly to read this.  While I doubt the SML working group would say this use case is wholly unreasonable, I have never heard it to be anyone's focus. 
Read narrowly, it would appear to say that 2557 only applies for the purpose of email.  That reading seems a bit narrow for me.  I note that I can find no later normative statements that exclude non-HTML, and the introduction seems to go out of its way to encourage re-use for other documents.  It does however appear to qualify that encouragement with "for email" more often than a generous (re-use oriented) reader might expect.  Later sections appear to studiously refer to HTML as an exemplar, weakening the case that the intended usage is limited to email.  Overall it's difficult to find persuasive support for one reading over the other. 
I did check with some IBM web services folks, and they say that MIME as an underlying format is allowed, e.g. by SOAP over HTTP.  More on that later, however. 
Existence of a single Root 
Citations: 
[Abstract] This document a) defines the use of a MIME multipart/related structure to aggregate a text/html root resource and the subsidiary resources it references 
[1. Introduction] there is no requirement that implementations claiming conformance to this standard be able to handle any URI linked document representations other than those whose root is HTML. 
Unless one artificially constructs a root to reference all documents in the interchange model, this assumption in the MIME RFC is not always met in SML.  It would be common enough for schema documents to have no explicit URI references to them, i.e. they are referenced implicitly through namespace URI matching, yet they are part of the model.  Similar situations exist for unbound rule documents and instance documents without any references to them in the interchange set.  In other words, an SML model may have 0..n root documents if one defines "root" as having explicit outbound URI-based references. 
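
The single-root question can be checked mechanically. The sketch below, in Python, is only an illustration: it treats the model as a set of document names plus a dictionary of explicit URI references and reports which documents, if any, reach every other document by following those references. The function names and file names are hypothetical.

```python
def reachable_from(start, references):
    """Return every document reachable from start via explicit references."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for target in references.get(current, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def covering_roots(documents, references):
    """Return the documents from which the whole model is reachable."""
    everything = set(documents)
    return sorted(
        doc for doc in documents
        if reachable_from(doc, references) >= everything
    )
```

For a model with documents inst-a.xml, inst-b.xml and types.xsd, where the only explicit reference runs from inst-a.xml to inst-b.xml, covering_roots returns an empty list: the schema document is part of the model but nothing points at it, so no existing document can serve as the single root.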
Limited number of URIs per model document (equivalent to SMLIF alias) 
Citations: 
[4.2 The Content-Location Header] 
A single Content-Location header field is allowed in any message or content heading, in addition to a Content-ID header 
A Content-Location header can thus be used to label a resource which is not retrievable by some or all recipients of a message. For example a Content-Location header may label an object which is only retrievable using this URI in a restricted domain, such as within a company-internal web space. A Content-Location header can even contain a fictitious URI. Such an URI need not be globally unique. 
Multiple Content-Location header fields in the same message heading are not allowed. 
[RFC 2045] Content-ID values must be generated to be world-unique. 
Aside from the limitation on number (0..2 in MIME, 0..n in SMLIF) I see no functional difference.  Note, I used a generous reading here to say 0-2 rather than 0-1.  There might be other dragons hiding in content-id that I did not find yet. 
The difference in number is troubling, however.  In the domain of IT resource management, it is not unusual for a single enterprise to have multiple data repositories, each of which assigns a local name for a resource.  In order to address problems of the sort that the CMDB-Federation work now in DMTF states its intent to address, while preserving digital signatures (i.e. without forcing updates to the component subsets of data to all use a single URI to refer to the named resource), it must be possible to associate resource instances with more than one URI.  Since SML was originally targeted at solving problems in this domain, it seems likely that this would be an issue for potential adopters. 
Note that this same consideration, to avoid rewriting of legacy source text, influenced 2557 itself. 
[1. Introduction] The reason why this standard does not only recommend the use of Content-ID-s is that it should be possible to forward existing web pages via e-mail without having to rewrite the source text of the web pages. Such rewriting has several disadvantages, one of them that security checksums will probably be invalidated. 

URI reference scope limitations 
Citations: 
[7. Use of the Content-Type "multipart/related"] 
If a message contains one or more MIME body parts containing URIs and also contains as separate body parts, resources, to which these URIs (as defined, for example, in HTML 2.0 [HTML2]) refer, then this whole set of body parts (referring body parts and referred-to body parts) SHOULD be sent within a multipart/related structure as defined in [REL]. 
Even though headers can occur in a message that lacks an associated multipart/related structure, this standard only covers their use for resolution of URIs between body parts inside a multipart/related structure. This standard does cover ... [Arwe paraphrasing now, for brevity] ... URIs referring to other resources 
To me, the first cited text suggests (SHOULD) that an SML-IF document would act as the root (note: this does conflict with 2557's requirement that the root have media type text/html), i.e. the entire interchange model would correspond to a single multipart/related container.  Given Henry's initial response to our original response that using 2557 would make certain cross-document references impossible, I suspect he was reading it similarly. 
The issue then becomes how to impose the requirements on references required by SMLIF today (we think for good reason), but not imposed by the model of "1 SMLIF doc == 1 multipart/related container", as Kumar pointed out. 
CID URL Scheme support 
Citations: 
[8.3 Use of the Content-ID header and CID URLs] 
When URIs employing a CID (Content-ID) scheme as defined in [URL] and [MIDCID] are used to reference other body parts in an MHTML multipart/related structure, they MUST only be matched against Content-ID header values, and not against content-Location header with CID: values. 
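
The matching rule in that citation can be sketched in Python with the standard email package. This is a rough illustration, not a conformance-tested resolver: find_by_cid and build_demo are hypothetical helpers, and the sketch assumes the usual correspondence in which a cid: URL carries the Content-ID value without its angle brackets.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from urllib.parse import unquote


def find_by_cid(message, cid_url):
    """Return the body part whose Content-ID matches a cid: URL, else None."""
    if not cid_url.lower().startswith("cid:"):
        raise ValueError("not a cid: URL: " + cid_url)
    wanted = "<" + unquote(cid_url[4:]) + ">"
    for part in message.walk():
        # Only Content-ID is consulted; Content-Location is ignored on purpose.
        if part.get("Content-ID", "").strip() == wanted:
            return part
    return None


def build_demo():
    """Two parts: one labelled by Content-ID, one only by Content-Location."""
    message = MIMEMultipart("related")
    first = MIMEText("<a/>", "xml")
    first["Content-ID"] = "<part1@example.org>"
    second = MIMEText("<b/>", "xml")
    second["Content-Location"] = "cid:part2@example.org"
    message.attach(first)
    message.attach(second)
    return message
```

In the demo message, cid:part1@example.org finds the first part, while cid:part2@example.org finds nothing, because the second part carries that value only in Content-Location.
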
I have not tried to track down all the dependencies, but it seems likely to me that somewhere we would end up being required to support the CID URL scheme during URI resolution.  Another potential impact to known implementations. 
SML-IF in web services 
As mentioned earlier, I looked into the applicability of MIME for web services exchanges.  I was told that it is allowed, provided that both the service provider and the Web service client support it.  I was also told that none of the protocol bindings currently in wide use require it, so in practice MIME is not a common format. 
I think it likely that one or more of the anticipated uses of SML involves the transfer of models as part of Web service message exchanges.  In that context, if we revert to MIME then we would appear to be doing one or both of two things: 
1. Implicitly adding a requirement for MIME support in the Web services stack (when one wishes to transfer SML models) and in Web service clients (...same). 
This seems unpalatable for the stack providers, and destined for failure on the client end which is much more distributed.  At best it will slow SML adoption in WS as the subset of providers (of both stack and client code) roll out upgrades/updates through all the existing deployed base components. 
2. Encouraging the WS users to define their own ME-specific SML model encapsulation syntax when MIME is not considered a practical option. 
I'm pretty sure the context specificity makes this a bad idea.  It works _against_ wide interoperability, if anything.  I know of one case already where a standard now in DMTF did not use SMLIF due to timing and a desire to not have dependencies on a spec "far away [organizationally]", and a second in OASIS with SMLIF currently on their agenda to assess for solving existing problems.  In both cases, reasonably complete technical assessments have not identified anything they needed beyond what SMLIF does. 
That's the list of issues that caught my eye reading through 2557.  I also saw a number of places where we could probably benefit from re-using some text.  I tried to do a mental comparison for the set of issues each was addressing to look for gaps. 
Moving on to the set of issues Kumar raised: 
I suspect we will have a more useful and congenial discourse with Henry (who is not, remember, an expert in our spec content) if we supply at least one example for each case where we say "2557 cannot do X".  "We cannot see how to get behavior X using 2557's facilities, did you have a solution in mind for this you can share?" might be a more cordial way to phrase those.  This motif seems to apply to (1) the limitations on scope of reference targets (in our parlance) vs the multipart/related structure assumed (2) >1 (or >2) aliases per document (3) how to make refs targeting schema docs resolve differently than "normal" refs.  People sometimes infer an agenda behind phrases like "but it has problems of its own" if they are not resolved, whether an agenda actually exists or not. 
The base URI issue Kumar alluded to doesn't come to mind.  I looked at it as: SMLIF requires one whenever it would be needed (to absolutize a relative ref), it seems like any usage of MIME for SMLIF's function would impose the same requirement; as a consequence, one would never appeal to the implicit "thismessage:/" base URI.  Admittedly I did not wrestle with this very much, so there might well be a subtlety that gave me the slip (failing that, I blame whatever virus has me home all this week so far). 
The requirements for XML Schema could easily be met with a "so what?" response, potentially at two levels.   
(1) While it is true that a "MIME not SMLIF" format document cannot be schema-assessed, "so what?  you can still assess the model documents..." 
(2) While it is true that a "MIME not SMLIF" format extension cannot be schema-assessed, "so what?  you can still parse it..." 
To a skeptic, using the "cannot be assessed against a Schema" argument on its own is unlikely to persuade.  This might be a relatively simple scoping answer.  Just as 2557 concerned itself first and foremost with email transfer of sets of documents, other use cases free to pile on if they happen to work but not otherwise considered, we might likewise either become scoped (if not already, and I'd be skeptical to hear we missed this scoping in the charter) or else point to (existing text) stating that some constrained set of use cases, e.g. programmatic exchange of SML-based models, is our first priority ... other use cases free to pile on if they happen to work but not otherwise considered.  We might even explicitly say something like "for other use cases, e.g. the exchange of SML models via email, other existing specifications like RFC 2557 may be used as appropriate".  It might be perfectly appropriate in a context like email exchange to make _different_ decisions than we made in SMLIF (eg one could rewrite all sml refs to use a single alias, perhaps by using CID URLs).  One could also raise the (appropriate, I think) topic of implementation cost - certainly a big part of the reason we all chose to sit SML squarely on an XML base is the huge amount of existing componentry we can all leverage. 

Best Regards, John

Comment 7 Henry S. Thompson 2008-06-25 10:03:08 UTC
I am grateful to the WG for taking my suggestion seriously and exploring the possible use of RFC2557.  I accept the WG's conclusion that it doesn't address all their requirements.