This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3251 - how to introduce primitives
Summary: how to introduce primitives
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P1 major
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: cluster: extension
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2006-05-09 11:08 UTC by Michael Kay
Modified: 2008-05-03 01:05 UTC

Description Michael Kay 2006-05-09 11:08:08 UTC
This comment was *not* approved by QT, so I will make it as a personal comment, in rather stronger language.

I do not believe that users are crying out for another numeric data type. I think the requirement has been driven by vendors (or perhaps even by individuals within vendors), and it's not clear what they think it will achieve. It will certainly impose enormous implementation and transition costs on the whole community, both vendors and users, and of course on the groups responsible for other related specifications such as XPath, XQuery, and XSLT - these are costs which threaten the success of XML Schema 1.1 as a specification. 

There must be a better way of introducing a new primitive data type. It should be possible for one vendor who thinks the requirement exists to provide this data type, and for others to wait and see whether users take it up. This means that rather than introducing a new primitive type, the WG should be introducing an extensibility mechanism. Such a mechanism would allow extensions to be introduced experimentally by vendors, and those that prove successful can then find their way into the specification. The costs of introducing a new data type speculatively (and making its implementation mandatory) are far too high.

Furthermore, there are lots of very clearly stated user requirements for extensions to XML Schema in other areas (for example, co-occurrence constraints). Providing a feature that no-one is asking for, while failing to provide the features that users are demanding, will result in being perceived as unresponsive to user requirements, which in turn puts at risk the continued loyalty of the user community to this specification.
Comment 1 David Ezell 2007-06-26 15:50:16 UTC
The WG discussed the following at the June 2007 f2f:

This is a topic of general interest and we will discuss it with QT at our joint session.
Comment 2 Dave Peterson 2007-09-27 14:24:07 UTC
(In reply to comment #1)

> This is a topic of general interest and we will discuss it with QT at our joint
> session.

The WG presented to QT on 27 June its decision/proposal to (1) mark precisionDecimal as a "topic at risk" when the CR is released, (2) not to include precisionDecimal if IEEE has not finalized its 754 revision, and (3) not to include precisionDecimal if in the WG's opinion there was inadequate implementation support for IEEE 754.  QT [reluctantly?] accepted this decision.

Subsequent discussion under this bug should be on the proposed alternative:  that there should be vendor- or user-available mechanisms to define new primitives--primitive definition should not be confined to the Datatypes specification itself.
Comment 3 Noah Mendelsohn 2007-09-28 18:12:54 UTC
I would like to reiterate a position that IBM has taken from the start of our work on XML Schema:  the loss of interoperability that would result from allowing individual implementors or vendors to define their own primitives is too great for us to support such a proposal.  More specifically, we have not seen any such proposals that we can support. 

Take an example of a type that someone might wish to define: primeInteger, which would contain all and only the prime numbers.  While you can of course declare a type of that name, you cannot in Schema 1.0 or Schema 1.1 use our facet system to enforce primeness.  Let's say that in the interest of making schema more useful to the mathematical community, we did allow new primitives, and some group of mathematicians wrote a specification for this type.  How would the implementations be shared?  If I just said:  OK, for each such type, you must distribute a .jar file with Java classes supporting certain interfaces, does that do it?  I suspect my friends at Microsoft will not be happy.  In fact, those of us who build C-language processors won't be happy either.  Indeed, it's quite possible that those of us who build optimized Java implementations won't be happy, since your classes may not work in our framework.
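
To make the portability concern concrete, here is a minimal sketch (in Python, purely for illustration) of the kind of procedural check a processor would have to ship to support primeInteger. The function names are hypothetical, and trial division is used only because it is short; every processor in every implementation language would need some equivalent of this code.

```python
import re

# Lexical form of xs:integer: an optional sign followed by decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_prime(n: int) -> bool:
    """Trial division; adequate for illustration, not for large values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def is_valid_prime_integer(lexical: str) -> bool:
    """Hypothetical validity check for the primeInteger type."""
    # str.strip() approximates the whitespace collapsing applied to xs:integer.
    text = lexical.strip()
    if not _INTEGER_LEXICAL.fullmatch(text):
        return False
    return is_prime(int(text))
```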

So, the new prime type will work in some processors and not others.  Now my phone starts ringing: processor X accepts my schema but IBM's doesn't.  Which is broken?  Answer:  neither.  Or maybe IBM's is broken because we didn't port that code you wrote.  

Our existing type system does not do everything that all users of schema might want, but it has the great virtue that every type that's legal is likely to be supportable in a quite compatible way in every schema processor.  That's very important.

Nothing prevents anyone from writing a new specification for XML Math Schemas that says:  "we start with the XML Schema 1.x specification, but add a primitive type primeInteger, the specification for which is.....  Schemas written in the Math Schema Language will be incompatible with W3C XML Schema if they use the new type."  That's fine.  We have a specification for the features that interoperate universally (W3C XSD) and those that work in a more limited community (Math Schema.)  Letting users define their own primitives will raise the expectation that the types will interoperate.

Unless someone can show me an implementation strategy that provides interoperable implementations, I remain strongly opposed to including user-defined primitives in our language.
Comment 4 Michael Kay 2007-09-28 20:04:23 UTC
I think primeInteger is an unfortunate example: its value space is a subset of xs:integer so it clearly should not be primitive. A better example might be gHourMinute to allow times in the format HH:MM.

I think that when users see a type in a vendor namespace they will know they are using an extension and they will not complain when they find that it isn't supported by other vendors. Equally, a third-party implementor of such extensions will not expect all schema processors to offer the same extensibility API. These facts don't mean that extensibility is bad: it is generally my experience that allowing vendor extension to a specification is uniformly good for vendors, for users, and for the health of the specification itself, since (a) it reduces pressure to standardize things that are needed only by a minority, and (b) it allows new features to prove themselves in the market before they are added to the standard. It can often lead to a virtuous circle in which one vendor's extensions, if popular, are copied by other vendors and then added to the standard at a later version.

It's better to define extensibility points within a language, rather than encouraging people to define variants of the language that are non-conformant, in the way that you suggest. Once you start doing the latter, people start getting very creative in the "improvements" they make to the spec, leading quickly to a complete loss of interoperability (as witness the SQL experience).
Comment 5 Noah Mendelsohn 2007-10-03 14:47:36 UTC
Michael Kay writes:

> it is generally my experience that allowing vendor
> extension to a specification is uniformly good for
> vendors, for users, and for the health of the
> specification itself

Well, my feeling is that it's at best a subtle tradeoff.  For example, I'm not convinced the issues are the same for a language like XML Query, uses of which are typically within a single organization, vs. a language like XML, which is the basis for communication across organizations.

If XML had followed your advice, it might have allowed individual implementors to add cool new features like structured attributes.  I think they would be very useful.  I also think that XML would not have been nearly as successful if such implementation-specific extensibility had been allowed.

So the question is, where does XML Schema fit with respect to these concerns?  I do agree that it's a matter of degree, and that there are some good points in favor of extensibility, e.g. to allow addition of primitive types.  On the other hand, I believe that XML Schema is used for the sort of cross-organization communication that XML itself is.  On balance, I still feel quite strongly that the best tradeoff is in favor of not allowing the creation of new primitive types, but I do understand (or think I understand) the case for the other position.

> It's better to define extensibility points within
> a language, rather than encouraging people to
> define variants of the language that are
> non-conformant, in the way that you suggest.

Again, I understand your position, but I think it's more of a tradeoff than you imply.

Noah
Comment 6 C. M. Sperberg-McQueen 2007-10-29 23:17:13 UTC
I'm changing the summary of this issue to make clearer why it's still open.
Comment 7 Michael Kay 2008-01-04 14:35:32 UTC
I was given the following action to progress the discussion on allowing implementations to add new primitive types:

ACTION 2007-11-30.01 Michael Kay to propose "legislation" that, should the WG so decide to allow them, would govern implementation of "new primitives".  Possibilities include namespace restriction and defining fallback behavior, the purpose being to provide a somewhat soft and moreover predictable landing for a processor should the proposed new type not be implemented.

Here are some suggestions:

(a) We should allow implementation-defined types that have xs:anyAtomicType as their base type.

(b) We should provide concrete syntax in XSDL for declaring such a type. This should provide options to declare its fundamental facets: (ordered, bounded, cardinality, numeric). I would add the default whitespace handling, whether the type is namespace-sensitive, and perhaps the set of constraining facets that are applicable to the type. Plus a URI which can be used in some implementation-defined way to locate an implementation of the type.

(c) However, like any other component, primitive type declarations can also appear as born-binary components.

(d) We might like to consider allowing extension types to have constraining facets other than those in our current set.

(e) Namespace discipline is left to the good sense of users. The only rules should be that certain W3C-defined namespaces (notably the XML and XML Schema namespaces) are reserved.

(f) I don't think there is a need for any special fallback mechanism. xs:import provides enough flexibility already. We should specify that it is not an error for the schema to contain a definition of an extension type for which no implementation is available; the error only occurs when someone tries to validate an instance against such a type.

(g) In the way we describe the facility, we should speak in terms of third-party type libraries. We should use language that encourages implementors to provide APIs that allow a third party to define new primitive types, rather than providing types that are burnt into one schema processor. We should use examples of types that vertical user communities might find valuable. While we should not create an expectation that implementations of extension types will ever be portable from one schema processor to another, we should not rule out the development of standard APIs that make this possible.

(h) We might describe in abstract terms the functionality that an implementation of an extension type needs to provide. As a minimum, it needs to be able to validate strings in the lexical space to determine whether they are valid, and to generate an "actual value" from the lexical value. It needs to be able to assess an actual value against the supported facets, including the ability to compare whether two actual values are equal. To be usable in XPath, it needs to provide the reverse conversion back to a string.
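
As an illustration of points (f) and (h), here is a minimal sketch in Python of what a third-party implementation of an extension type might provide, using the gHourMinute example from comment 4. Everything in it is an assumption made for illustration: the class and method names, the example.com namespace, the choice of facets, and the registry are hypothetical and are not part of any proposed API.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical actual-value representation: (hour, minute).
HourMinute = Tuple[int, int]

_LEXICAL = re.compile(r"([01][0-9]|2[0-3]):([0-5][0-9])")


class GHourMinute:
    """Hypothetical extension primitive for times written as HH:MM."""

    name = "{http://example.com/types}gHourMinute"  # made-up namespace

    def parse(self, lexical: str) -> Optional[HourMinute]:
        """Map a lexical form to an actual value; None if not in the lexical space."""
        match = _LEXICAL.fullmatch(lexical.strip())
        if match is None:
            return None
        return int(match.group(1)), int(match.group(2))

    def equal(self, a: HourMinute, b: HourMinute) -> bool:
        """Equality of two actual values."""
        return a == b

    def satisfies(self, value: HourMinute, facets: Dict[str, HourMinute]) -> bool:
        """Assess an actual value against the facets this type supports."""
        for facet, bound in facets.items():
            if facet == "minInclusive":
                ok = value >= bound
            elif facet == "maxInclusive":
                ok = value <= bound
            else:
                raise ValueError(f"facet not supported by this type: {facet}")
            if not ok:
                return False
        return True

    def to_string(self, value: HourMinute) -> str:
        """Reverse conversion back to a string, e.g. for use from XPath."""
        return f"{value[0]:02d}:{value[1]:02d}"


class MissingTypeError(Exception):
    """Raised when validation needs a type for which no implementation is loaded."""


class TypeRegistry:
    """Hypothetical lookup of type implementations by expanded name."""

    def __init__(self) -> None:
        self._implementations: Dict[str, GHourMinute] = {}

    def register(self, implementation: GHourMinute) -> None:
        self._implementations[implementation.name] = implementation

    def validate(self, type_name: str, lexical: str,
                 facets: Optional[Dict[str, HourMinute]] = None) -> bool:
        # Per (f): a missing implementation is reported only when an
        # instance is actually validated against the type.
        implementation = self._implementations.get(type_name)
        if implementation is None:
            raise MissingTypeError(type_name)
        value = implementation.parse(lexical)
        return value is not None and implementation.satisfies(value, facets or {})
```

A processor could then call, for example, registry.validate(GHourMinute.name, "09:30", {"maxInclusive": (17, 0)}); a schema that merely mentions an unregistered type causes no error until such a call is made.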

Michael Kay
Comment 8 Noah Mendelsohn 2008-01-04 15:42:07 UTC
I think the deep question here is about what is considered conforming.

As far as I can tell, everything that's proposed is possible today except for the new XSDL syntax.  Does that syntax do anything other than serve as a cross check or early warning that the type is indeed believed to be primitive?  The implementation is surely built into the validator either way.  

Today, if I ship a validator that includes new primitive types, I must document it as nonconforming.  I infer that the proposed change (other than the syntax) would be to consider such processors to be fully conforming, i.e. as conforming as they would have been had they not supported the new primitive(s).

I remain very reluctant to "bless" such processors as conforming.  I think it will lead to confusion.  Worse, I worry that certain suppliers of schema software will widely deploy software that depends on types that are in some sense competitive with the ones we already have (perhaps tied into their middleware runtimes), and thus divide the schema community.  

So, I prefer to retain some distinction in the terminology about conformance.   I want to be able to say that processors that use extension types are either not conforming, or that they are conforming in some much more limited way (not sure whether I could endorse that latter approach, but I'm very glad to discuss it).    This way if someone calls up and says: "gee, how come your processor couldn't handle my schema with the "GiantCorpInteger" type?" I can say "Ah, I see, you didn't restrict yourself to features that are fully conforming." (or whatever).

Maybe or maybe not our RQ-144 mechanisms would give us some leverage for tackling this.  

So, my counter proposal (which I'm not quite ready to endorse but certainly ready to discuss a bit) would be:  
a) No new syntax.  Your schema just uses the type like any other.  If it's built into the processor with a base type of anyAtomic, so be it.
b) We document (i.e. name) a core level of conformance for processors that do not add such types, and note that interoperability is reduced insofar as users choose to refer to types that are neither provided with XSD nor declared using the facet mechanisms of XSD.

Crucially:  I am not happy about moving xs:precisionDecimal into the class of such optional types. I think that every processor that implements Schema 1.1 should implement precisionDecimal.

Noah
Comment 9 David Ezell 2008-01-23 15:04:06 UTC
The title of this bug has until now been "need for precisionDecimal / how to introduce primitives."  We are changing the name to make clear that this issue is being used to track discussion of implementation-defined primitives.

The plan of record for the agreed resolution on precisionDecimal is set out in the now-closed bug 3120.
Comment 10 C. M. Sperberg-McQueen 2008-05-03 01:05:03 UTC
The XML Schema WG today approved the wording proposals at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b3251.html
  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3251.html
  (member-only links)

and believes that with that approval, this issue has now been resolved.

Recall that the original issue report raises two related technical and
design questions.  

First, whether introducing precisionDecimal is a good idea or a bad one; 
this is essentially the same issue as bug 3120 (although expressed in 
stronger terms), which was raised by the QT groups
and was resolved to their satisfaction with the plan to take industry-wide
uptake of the new IEEE precision decimal type into account when deciding
whether to progress the spec to Proposed Recommendation with or without
the precisionDecimal type.

Second, whether XSD should stipulate, as the QT specs do, that implementations
MAY support primitive datatypes other than those defined in the XSD spec.
The wording proposals adopted today make that stipulation, and provide a 
checklist of information implementations need to provide.  The Structures
change also specifies an extension to the existing conditional-inclusion
mechanism for schema documents, to allow inclusion-time inquiries about
support for particular datatypes and facets.
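
For readers unfamiliar with conditional inclusion, the following is a minimal sketch in Python of how an inclusion-time inquiry about datatype support might be evaluated. It assumes the vc:typeAvailable attribute and versioning namespace as they appear in the published XSD 1.1 Recommendation; the member-only wording proposals cited above may have differed. The function names are hypothetical, the caller supplies the prefix bindings, and only top-level children of the schema element are examined, which is a simplification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

VC_NS = "http://www.w3.org/2007/XMLSchema-versioning"
TYPE_AVAILABLE = f"{{{VC_NS}}}typeAvailable"

SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
    xmlns:ex="http://example.com/types">
  <xs:element name="opens" type="ex:gHourMinute"
              vc:typeAvailable="ex:gHourMinute"/>
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""


def expand(qname: str, prefixes: Dict[str, str]) -> str:
    """Turn prefix:local into {namespace}local using caller-supplied bindings."""
    prefix, _, local = qname.rpartition(":")
    return f"{{{prefixes.get(prefix, '')}}}{local}"


def is_included(element: ET.Element, known_types: Set[str],
                prefixes: Dict[str, str]) -> bool:
    """Keep the element unless it names a type the processor does not support."""
    listed = element.get(TYPE_AVAILABLE)
    if listed is None:
        return True
    return all(expand(q, prefixes) in known_types for q in listed.split())


def prune(schema_root: ET.Element, known_types: Set[str],
          prefixes: Dict[str, str]) -> None:
    """Drop top-level components whose typeAvailable test fails."""
    for child in list(schema_root):
        if not is_included(child, known_types, prefixes):
            schema_root.remove(child)
```

A processor that does not know ex:gHourMinute would drop the "opens" declaration from SAMPLE and keep "note"; one that does know the type would keep both.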

Michael, as the originator of the issue, would you please examine the 
wording proposals and indicate whether you believe the issue has
been resolved satisfactorily?  If the WG does not hear from you
in the next couple of weeks, we will assume the plan concerning precision
decimal satisfies the first part of the bug, and the wording proposals 
resolve the second part, to your satisfaction.