This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3232 - Type versus Datatype
Summary: Type versus Datatype
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P4 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: cluster: terminology
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2006-05-09 10:06 UTC by Michael Kay
Modified: 2008-05-16 17:17 UTC

Description Michael Kay 2006-05-09 10:06:15 UTC
QT approved comment:

In 2.6.1.2, the second paragraph is an example of a residual use of the
word "type" rather than "datatype". This also occurs in constructs such as
"base type" and "item type". It's not clear whether the spec is trying to
make a distinction between the two words. "Union type" and "union datatype"
are used apparently interchangeably.
Comment 1 Dave Peterson 2006-05-09 18:20:39 UTC
(In reply to comment #0)

> In 2.6.1.2, the second paragraph is an example of a residual use of the
> word "type" rather than "datatype". This also occurs in constructs such as
> "base type" and "item type". It's not clear whether the spec is trying to
> make a distinction between the two words. "Union type" and "union datatype"
> are used apparently interchangeably.

We have tried in 1.1 to remove the unmodified word 'type' from running text,
but not from the names of schema component properties (which would be
a gratuitous change to the component structure, a thing roundly hated by
many implementers who use it to define their APIs and UIs).  Probably some
occurrences in the running text have been missed.

Comment 2 Michael Kay 2008-02-04 23:46:50 UTC
I might mention that I've been having the same problem with the words "type" and "data type" (and "datatype") in my XSLT book; the copy editors have been going crazy trying to get the usage consistent, and in the end I've given up and decided that "data" adds nothing to the sense, so it's been removed everywhere. This seems to work perfectly well once you get used to it. Types partition into complex and simple, simple types partition into union, list, and atomic: there's no room in this hierarchy for another adjective "data". (What do you call a type that isn't a data type?)

The most prominent usage of "Datatypes" is of course in the title of part 2, which should probably be "Simple Types".
Comment 3 Dave Peterson 2008-02-05 03:23:44 UTC
(In reply to comment #2)
> I might mention that I've been having the same problem with the words "type"
> and "data type" (and "datatype") in my XSLT book; the copy editors have been
> going crazy trying to get the usage consistent, and in the end I've given up
> and decided that "data" adds nothing to the sense, so it's been removed
> everywhere. This seems to work perfectly well once you get used to it. Types
> partition into complex and simple, simple types partition into union, list, and
> atomic: there's no room in this hierarchy for another adjective "data". (What
> do you call a type that isn't a data type?)

Well, SGML had element types defined by element type definitions, and they were specific subsets of the class of elements.  I generally found that in OO terminology, "types" were subsets of a class that were effectively defined by a specified mechanism applied to certain objects, which were equated with the subclass they defined.  An object *has* properties and a class restricts properties; if several objects use their property values to further restrict the instances of a class according to the same rule, they are "types" of that class.  E.g., the class is element; the objects have certain properties; element type definitions specify values for the properties; the resulting objects are element *types* whose property values select out certain subclasses of element according to a uniform rule.

As I see it, simple and compound types are intermediate between element and the element types of XML (defined by the unfortunately renamed "element definition", really still an element type definition).  As such, they define subclasses of element (i.e., the class of which elements and only elements are instances).

Accordingly, we still have element types, we have simple types, we have complex types; they all are subclasses of element.  The things we define in Part 2 and call datatypes are different animals.  And we have four different things we call "types"; I think we should try to consistently say which kind we're talking about.

We did feel that if we always called them datatypes in text, we didn't have to rename the names of the properties, which could cause possibly significant reprogramming for those systems that choose to use our abstractions to define their user interface or API.

> The most prominent usage of "Datatypes" is of course in the title of part 2,
> which should probably be "Simple Types".

I disagree;  a simple type is a subclass of element.  Our datatypes are not classes of elements.  They are what mathematicians and logicians sometimes call "mathematical systems" and computer scientists often call "datatypes".

Comment 4 Michael Kay 2008-02-05 08:17:28 UTC
> As I see it, simple and compound types are intermediate between element and the
> element types of XML (defined by the unfortunately renamed "element
> definition", really still an element type definition).  As such, they define
> subclasses of element (i.e., the class of which elements and only elements are
> instances).

Well, I don't see it that way at all. When I define a simpleType by restriction from xs:integer, I'm not defining a subclass of elements. The simpleType might never be used as the type of an element, or for that matter an attribute. It might only be used as the type of an XQuery function parameter, for example.
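
For illustration only (this sketch is not part of the original comment): the Python snippet below parses a small schema document in which a simple type is defined by restriction from xs:integer while no element or attribute declaration refers to it. The type name `percentage` and the facet values are hypothetical, chosen just to make the point concrete.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: one named simple type, and no element or
# attribute declarations that use it.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)

# Names of all simple type definitions in the schema document.
simple_types = [st.get("name") for st in root.iter(f"{{{XS}}}simpleType")]

# Element and attribute declarations (none in this schema).
elements = list(root.iter(f"{{{XS}}}element"))
attributes = list(root.iter(f"{{{XS}}}attribute"))

# The base type named by the restriction.
restriction = root.find(f"{{{XS}}}simpleType/{{{XS}}}restriction")
base = restriction.get("base")

print(simple_types, base, len(elements), len(attributes))
```

The schema is well-formed and defines the type even though nothing in it is declared to have that type; a consumer such as an XQuery processor could still refer to the type by name.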

As far as I can see, the words "datatype" and "simple type" are pure synonyms. If you don't think so, can you point to some objects that belong to one category and not to the other?
Comment 5 Noah Mendelsohn 2008-02-05 19:17:20 UTC
Michael Kay writes:

> As far as I can see, the words "datatype"
> and "simple type" are pure synonyms.

I always found the distinction confusing, but I vaguely recall that there is one.  Or, stated differently, I vaguely recall that when this question came up a few years ago that a resolution was proposed.  I >think< it was:

Simple Type: The combination of a lexical space, a value space, definitions of certain relations like equality, and maybe a few other things I'm forgetting.

Datatype: A component in the schema component graph, typically used to specify the definition of a Simple Type through use of facets.

That said, I was never terribly optimistic that users would like or remember this distinction.  I'm not particularly an advocate for this terminology, but neither did I object strongly nor have sufficiently better alternatives to propose.  Anyway, that's my somewhat hazy recollection of what the difference was supposed to be.   Can anyone verify that I do or don't have this right?  Thanks.

Noah
Comment 6 Dave Peterson 2008-02-05 19:52:14 UTC
(In reply to comment #4)

>Well, I don't see it that way at all. When I define a simpleType by restriction
>from xs:integer, I'm not defining a subclass of elements. The simpleType might
>never be used as the type of an element, or for that matter an attribute. It
>might only be used as the type of an XQuery function parameter, for example.

I'd say you're not defining simple types by restriction, you're defining datatypes by restriction.  I don't like conflating the two terms--that does get confusing.  I'd like to keep "simple type" and "complex type" as parallel things; I don't see an analog for "complex type" as a mathematical structure/system.

(In reply to comment #5)
> Michael Kay writes:
> 
> > As far as I can see, the words "datatype"
> > and "simple type" are pure synonyms.
> 
> I always found the distinction confusing, but I vaguely recall that there is
> one.  Or, stated differently, I vaguely recall that when this question came up
> a few years ago that a resolution was proposed.  I >think< it was:
> 
> Simple Type: The combination of a lexical space, a value space, definitions of
> certain relations like equality, and maybe a few other things I'm forgetting.

Backwards.  That's a datatype.

> Datatype: A component in the schema component graph, typically used to specify
> the definition of a Simple Type through use of facets.

And that's a simple type *definition*.

> That said, I was never terribly optimistic that users would like or remember
> this distinction.  I'm not particularly an advocate for this terminology, but
> neither did I object strongly nor have sufficiently better alternatives to
> propose.  Anyway, that's my somewhat hazy recollection of what the difference
> was supposed to be.   Can anyone verify that I do or don't have this right? 

See above.  As far as I know, we do not anywhere define "simple type" (as opposed to "simple type definition") or "complex type" (as opposed to "complex type definition").  I've tried to use "simple type" and "complex type" as analogs of SGML's "element type" (since the STD/CTDs are analogs of SGML's element type definitions--which are misnamed "element definitions" in XML), and in accordance with what I take to be at least one version of common usage in the OO community.

AFAIK, "datatype" used as I've used it in Part 2 is rather standard CS terminology for things like that. 
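
One possible reading of the distinction drawn in this comment can be sketched in Python. This is an illustration under stated assumptions, not the spec's definitions: the class names `Datatype` and `SimpleTypeDefinition`, the helper `datatype_of`, and the `percentage` example are all hypothetical, and only integer-based types with min/max facets are modelled. The "datatype" is the mathematical object (lexical space, value space, equality); the "simple type definition" is the schema component that describes one by naming a base and facets.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Datatype:
    """The mathematical system: lexical space, value space, and equality."""
    in_lexical_space: Callable[[str], bool]
    lexical_mapping: Callable[[str], int]
    in_value_space: Callable[[int], bool]
    equal: Callable[[int, int], bool]


@dataclass(frozen=True)
class SimpleTypeDefinition:
    """The schema component: a name, a base, and constraining facets."""
    name: str
    base: str
    facets: Dict[str, int] = field(default_factory=dict)


# Simplified pattern for integer literals (an assumption of this sketch).
INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")


def datatype_of(std: SimpleTypeDefinition) -> Datatype:
    """Derive the datatype that a (hypothetical) integer-based definition describes."""
    lo = std.facets.get("minInclusive")
    hi = std.facets.get("maxInclusive")

    def in_value_space(v: int) -> bool:
        return (lo is None or v >= lo) and (hi is None or v <= hi)

    def in_lexical_space(s: str) -> bool:
        # A literal belongs only if it is an integer literal whose value is allowed.
        return INTEGER_LITERAL.fullmatch(s) is not None and in_value_space(int(s))

    return Datatype(in_lexical_space, int, in_value_space, lambda a, b: a == b)


percentage = SimpleTypeDefinition(
    "percentage", "xs:integer", {"minInclusive": 0, "maxInclusive": 100}
)
dt = datatype_of(percentage)
print(dt.in_lexical_space("042"), dt.lexical_mapping("042"))
```

In this reading the component (`percentage`) carries a name and facets, while the derived object (`dt`) knows nothing about schemas, elements, or attributes; it only answers questions about literals and values.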
Comment 7 Michael Kay 2008-02-05 20:29:49 UTC
OK, I've got it now. A simple type definition doesn't define a simple type, it defines a datatype. Silly me. And silly any reader who claims that we make things unnecessarily complicated.
Comment 8 Dave Peterson 2008-02-05 20:49:47 UTC
(In reply to comment #7)
> OK, I've got it now. A simple type definition doesn't define a simple type, it
> defines a datatype. Silly me. And silly any reader who claims that we make
> things unnecessarily complicated.

What do you think a complex type definition defines?  Do you know what a complex type is?  How does that match with what you believe a simple type is?

A simple type definition defines a particular subclass of the element class and attribute class (whose instances are elements and attributes respectively), which is what I've been calling a "simple type", *as well as* a datatype.  Not true that it doesn't define a simple type:  it defines both.

Comment 9 Michael Kay 2008-02-05 21:06:50 UTC
I believe that a complex type definition defines a complex type. I think I know what a complex type is: it is a set of rules that can be used to constrain the contents of XML elements. 

Similarly I believe that a simple type definition defines a simple type; and a simple type is a set of rules that can be used, inter alia, to constrain the contents of XML elements and attributes. 

I know what datatypes are as a generic computer science term, but I don't know what they are in XML Schema, other than another name for simple types. Your last comment appears to agree with that.

I don't think we have to resort to concepts like "element class" - we have quite enough concepts already without inventing more.
Comment 10 Dave Peterson 2008-02-05 21:51:31 UTC
(In reply to comment #9)
> I believe that a complex type definition defines a complex type. I think I know
> what a complex type is: it is a set of rules that can be used to constrain the
> contents of XML elements. 
> 
> Similarly I believe that a simple type definition defines a simple type; and a
> simple type is a set of rules that can be used, inter alia, to constrain the
> contents of XML elements and attributes. 

Good.  We agree.  Now what then does a simple type have to do with mathematical structures?

> I know what datatypes are as a generic computer science term, but I don't know
> what they are in XML Schema, other than another name for simple types. Your
> last comment appears to agree with that.

In my last comment, I said "it defines both".  If they were both the same thing, I wouldn't say that.  I guess appearances are in the eye of the beholder.

What can I say to get it across that in XML Schema, datatypes *are* those things that you see going by that name in computer science?  And they are *not* used simply "to constrain the contents of XML elements and attributes", except as the XML Schema rules for validity using simple type definitions call upon the corresponding datatype for some calculations.

> I don't think we have to resort to concepts like "element class" - we have
> quite enough concepts already without inventing more.

Fear not; I wouldn't think of introducing it into the spec.  I'm just using it here to show where my interpretations of "simple type" (and "complex type") come from, and why I see them as different animals from "datatype".  We've not defined either of these and we've tried to root out any occurrences of them (without the appended "definition") in the spec itself.  For better or for worse; maybe we should have defined them just so people wouldn't wonder why we call the things that we do "datatypes"--but I'm aware of no plan to do so.
Comment 11 C. M. Sperberg-McQueen 2008-05-06 16:45:48 UTC
A draft wording proposal intended to resolve this issue is on the
W3C server at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3232.html
  (member-only link)

Comment 12 C. M. Sperberg-McQueen 2008-05-16 17:17:41 UTC
On this morning's WG telcon, the XML Schema WG adopted the proposal
mentioned in comment #11.  So I am marking the issue resolved.

Michael Kay, as the originator of the issue, would you report back to QT
on this resolution and let us know whether they accept this resolution of
the issue?  If they agree, please so indicate by changing the record's
status to CLOSED; if they disagree, REOPEN it.  If we don't hear from you
in a reasonable amount of time (say, two weeks), we will assume that silence
implies consent.