Submission to W3C/OMG Workshop on Distributed Objects and Mobile Code
June 24-25, 1996

Submitter:  Lee Scheffler
Affiliation: VMark Software, Inc.
Email:  lee@vmark.com

Unified Database Data Interchange Format(s)/Protocol(s)

Database data is the lifeblood of business.  Yet, sending it from one
place to another, transforming it and processing it, displaying it,
and editing and updating it are still unnecessarily tortuous tasks.
Data is data, at least conceptually.  Relations among data elements
are the same, whether represented by value comparisons (relational models),
pointers (navigational models) or proximity (PICK models).  Data that is
being moved or processed has different requirements (e.g., representation,
efficiency, contiguity) from data that is being stored or searched.
Database data differs from traditional IDL-style structured data in that
its structure is dynamic:  you cannot generally know data types or
structure of data until you see it.  When writing applications, a
moderate amount of self-discovery about the data is needed to navigate
its structure.
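
As a rough illustration of the self-discovery described above, the
following Python sketch shows a receiver learning column names and types
from the data itself rather than from a compiled-in definition.  The
names RowSet, Column, describe and column_values are hypothetical and
are not part of any existing format; the type names are placeholders.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Column:
    name: str
    type_name: str  # placeholder interchange type, e.g. "string", "exact"


@dataclass
class RowSet:
    # The description travels with the data, so a receiver needs no
    # prior knowledge of the table it came from.
    columns: List[Column]
    rows: List[List[Any]]


def describe(rowset: RowSet) -> List[Tuple[str, str]]:
    """Return (column name, type name) pairs discovered at run time."""
    return [(c.name, c.type_name) for c in rowset.columns]


def column_values(rowset: RowSet, name: str) -> List[Any]:
    """Fetch one column by name, looking up its position in the data."""
    names = [c.name for c in rowset.columns]
    idx = names.index(name)  # raises ValueError for an unknown column
    return [row[idx] for row in rowset.rows]
```

An application written against such a structure would call describe()
first and then decide how to navigate, display or transform the columns
it finds.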

It ought to be possible to define a database data interchange format
and/or protocol with properties in the following areas:

- Independent of:  application, database brand, platform, language.
- Common universal representations of fundamental database data types:
    character strings, numbers (exact and approximate),
    dates/times/intervals, binary, reference (URL? query
    specification? OID?).
- Suitably efficient for conversion to/from different database brands,
    languages (e.g., object serialization/deserialization).
- Suitable for processing:  accessing and transforming data items,
    performing computations.
- Support for versioning, and for blind pass-through of data types not
    understood by an environment (e.g., A to B to C where A and C 
    recognize a data type that B doesn't recognize).
- Perhaps capable of multiple internal data representations and
    presentations (e.g., ASCII-8 vs Unicode).
- Deferred/"thumbnail" retrieval of large data (e.g., images, sound).
- Represent tabular data:  rows and columns.
- Optionally represent common column information:  column name,
    data type/precision/scale/length, nullability, key membership,
    index membership, description, editing templates, value limits,
    live editing behavior (e.g., Java applet to validate data entry).
- Optionally represent overall data and presentation attributes:
    sorting order, indices, character set, base time zone,
    national language conventions, exploded/imploded, summary/detail.
- Can be used to interchange data among data sources, processing elements,
    targets:  programs, databases, files (e.g., enough information to
    synthesize a CREATE TABLE statement in an arbitrary database SQL
    dialect).
- Can represent multi-dimensional data:  tables within tables, shared
    elements/substructure.
- Optionally contain source information:  database identity and query
    specifications to regenerate it, timestamp, transaction ID.
- Support for varying degrees of "live"ness and isolation:  snapshot,
    reexecute query (or other data generation program), live query
    (editing rows within an open transaction), etc.
- Support for replication, identification of replicas and reconciliation.
- Support incremental streaming of data (e.g., can start a database fetch
    and send chunks of rows to the receiver, where they can be processed
    incrementally).
- Support common data file formats as a subset (e.g., comma-separated-value,
    tab-separated, etc.).
- Composable:  data tables and other data types (e.g., images) within
    tables, etc.
- Can easily be incorporated as variable data in HTML documents (like
    images today).  (Anybody who has tried knows how absurdly difficult
    it is today to incorporate a few bits of variable data in an HTML
    form, much less a table of data that you want to be user
    editable!)  It should be possible for a document to contain a tag
    that obtains the data either from a file or generated live, with
    support for passing dynamic parameter values to such a tag (e.g.,
    to run a parameterized query based on the values of variables/fields).
- Optionally incorporate presentation advice (e.g., column widths and
    styles, HTML attributes, scrolling regions like column names).
- Support multiple data entry styles (e.g., menu, text, menu+text,
    checkbox, radio).
- Support flushing of modified data back to database (e.g.,
    identification of changed data and sufficient attribute information
    to enable generation of SQL INSERT or UPDATE statements).
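
Two of the properties listed above, carrying enough column information
to synthesize a CREATE TABLE statement and identifying changed data well
enough to generate an UPDATE, can be sketched as follows.  This is only
a minimal Python sketch under stated assumptions:  ColumnInfo,
create_table_sql and update_sql are hypothetical names, the type mapping
is one plausible choice for a generic SQL dialect, and the "?" parameter
markers assume a driver that accepts that placeholder style.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ColumnInfo:
    name: str
    type_name: str                  # assumed types: "string", "exact", "date"
    length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None
    nullable: bool = True
    is_key: bool = False


def sql_type(col: ColumnInfo) -> str:
    """Map an interchange type to a SQL type (illustrative mapping only)."""
    if col.type_name == "string":
        return f"VARCHAR({col.length})" if col.length else "VARCHAR(255)"
    if col.type_name == "exact":
        if col.precision is not None:
            return f"DECIMAL({col.precision},{col.scale or 0})"
        return "INTEGER"
    if col.type_name == "date":
        return "DATE"
    raise ValueError(f"unmapped type: {col.type_name}")


def create_table_sql(table: str, columns: List[ColumnInfo]) -> str:
    """Synthesize a CREATE TABLE statement from column information."""
    parts = []
    for col in columns:
        null = "" if col.nullable else " NOT NULL"
        parts.append(f"{col.name} {sql_type(col)}{null}")
    keys = [c.name for c in columns if c.is_key]
    if keys:
        parts.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def update_sql(
    table: str,
    columns: List[ColumnInfo],
    original: Dict[str, Any],
    modified: Dict[str, Any],
) -> Optional[Tuple[str, List[Any]]]:
    """Build an UPDATE for the non-key columns that changed, or None."""
    keys = [c.name for c in columns if c.is_key]
    if not keys:
        raise ValueError("key membership is needed to locate the row")
    changed = [
        c.name
        for c in columns
        if not c.is_key and original[c.name] != modified[c.name]
    ]
    if not changed:
        return None
    set_clause = ", ".join(f"{n} = ?" for n in changed)
    where = " AND ".join(f"{n} = ?" for n in keys)
    # New values first, then the original key values used to find the row.
    params = [modified[n] for n in changed] + [original[n] for n in keys]
    return f"UPDATE {table} SET {set_clause} WHERE {where}", params
```

Key membership does double duty in this sketch:  it produces the PRIMARY
KEY clause and it locates the row to update, which is one reason the
column information listed above is worth carrying along with the data.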