Submission to W3C/OMG Workshop on Distributed Objects and Mobile Code
June 24-25, 1996

Submitter: Lee Scheffler
Affiliation: VMark Software, Inc.
Email: lee@vmark.com

Unified Database Data Interchange Format(s)/Protocol(s)

Database data is the lifeblood of business. Yet, sending it from one place to another, transforming it and processing it, displaying it, and editing and updating it are still unnecessarily tortuous tasks.

Data is data, at least conceptually. Relations among data elements are the same, whether represented by value comparisons (relational models), pointers (navigational models) or proximity (PICK models). Data when it is being moved or processed has different requirements from when it is being stored or searched (e.g., representation, efficiency, contiguity).

Database data differs from traditional IDL-style structured data in that its structure is dynamic: you cannot generally know data types or structure of data until you see it. When writing applications, a moderate amount of self-discovery about the data is needed to navigate its structure.

It ought to be possible to define a database data interchange format and/or protocol with properties in the following areas:

- Independent of: application, database brand, platform, language.
- Common universal representations of fundamental database data types: character strings, numbers (exact and approximate), dates/times/intervals, binary, reference (URL? query specification? OID?).
- Suitably efficient for conversion to/from different database brands, languages (e.g., object serialization/deserialization).
- Suitable for processing: accessing and transforming data items, performing computations.
- Support for versioning, and for blind pass-through of data types not understood by an environment (e.g., A to B to C where A and C recognize a data type that B doesn't recognize).
- Perhaps capable of multiple internal data representations and presentations (e.g., ASCII-8 vs Unicode).
- Deferred/"thumbnail" retrieval of large data (e.g., images, sound).
- Represent tabular data: rows and columns.
- Optionally represent common column information: column name, data type/precision/scale/length, nullability, key membership, index membership, description, editing templates, value limits, live editing behavior (e.g., Java applet to validate data entry).
- Optionally represent overall data and presentation attributes: sorting order, indices, character set, base time zone, national language conventions, exploded/imploded, summary/detail.
- Can be used to interchange data among data sources, processing elements, targets: programs, databases, files (e.g., enough information to synthesize a CREATE TABLE statement in an arbitrary database SQL dialect).
- Can represent multi-dimensional data: tables within tables, shared elements/substructure.
- Optionally contain source information: database identity and query specifications to regenerate it, timestamp, transaction ID.
- Support for varying degrees of "live"ness and isolation: snapshot, re-execute query (or other data generation program), live query (editing rows within an open transaction), etc.
- Support for replication, identification of replicas and reconciliation.
- Support incremental streaming of data (e.g., can start database fetch and send row chunks of data to receiver, where they can be processed incrementally).
- Support common data file formats as a subset (e.g., comma-separated-value, tab-separated).
- Composable: data tables and other data types (e.g., images) within tables, etc.
- Can easily be incorporated as variable data in HTML documents (like images today). (Anybody who has tried knows how absurdly difficult it is today to incorporate a few bits of variable data in an HTML form, much less a table of data that you want to be user editable!) It should be possible for a document to contain a tag that obtains the data either from a file or generated live.
  There should also be support for passing dynamic parameter values to such a tag (e.g., to run a parameterized query based on the values of variables/fields).
- Optionally incorporate presentation advice (e.g., column widths and styles, HTML attributes, scrolling regions like column names).
- Support multiple data entry styles (e.g., menu, text, menu+text, checkbox, radio).
- Support flushing of modified data back to database (e.g., identification of changed data and sufficient attribute information to enable generation of SQL INSERT or UPDATE statements).
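
To make a few of these properties concrete, here is a minimal sketch in Python of a self-describing table. It carries optional column information, can synthesize a CREATE TABLE statement, can hand rows to a receiver in chunks, and can generate an UPDATE statement for a changed row. Everything in it is an illustrative assumption rather than a proposed format: the names Column, Table, GENERIC_SQL_TYPES, create_table_sql, row_chunks and update_sql are hypothetical, the generic type names are invented for the example, and the SQL type mapping stands in for one imagined dialect.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional, Tuple


@dataclass
class Column:
    """Optional per-column information carried along with the data."""
    name: str
    data_type: str                 # generic type name (hypothetical vocabulary)
    length: Optional[int] = None
    nullable: bool = True
    key: bool = False              # key membership


@dataclass
class Table:
    """Tabular data plus enough description to be self-discovering."""
    name: str
    columns: List[Column]
    rows: List[List[Any]] = field(default_factory=list)


# Assumed mapping from generic types to one dialect's type names.
GENERIC_SQL_TYPES = {
    "string": "VARCHAR",
    "exact": "DECIMAL",
    "approximate": "FLOAT",
    "date": "DATE",
    "binary": "BLOB",
}


def create_table_sql(table: Table, type_map=GENERIC_SQL_TYPES) -> str:
    """Synthesize a CREATE TABLE statement from the column information."""
    parts = []
    for col in table.columns:
        sql_type = type_map[col.data_type]
        if col.length is not None:
            sql_type += f"({col.length})"
        null_clause = "" if col.nullable else " NOT NULL"
        parts.append(f"{col.name} {sql_type}{null_clause}")
    keys = [col.name for col in table.columns if col.key]
    if keys:
        parts.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table.name} ({', '.join(parts)})"


def row_chunks(table: Table, size: int) -> Iterator[List[List[Any]]]:
    """Yield rows a chunk at a time so a receiver can process incrementally."""
    for start in range(0, len(table.rows), size):
        yield table.rows[start:start + size]


def update_sql(table: Table, old_row: List[Any],
               new_row: List[Any]) -> Optional[Tuple[str, List[Any]]]:
    """Return (statement, parameters) for changed non-key columns, or None."""
    keys = [(c, old) for c, old in zip(table.columns, old_row) if c.key]
    if not keys:
        raise ValueError("cannot identify the row without key columns")
    changed = [(c, new) for c, old, new in zip(table.columns, old_row, new_row)
               if old != new and not c.key]
    if not changed:
        return None
    set_clause = ", ".join(f"{c.name} = ?" for c, _ in changed)
    where_clause = " AND ".join(f"{c.name} = ?" for c, _ in keys)
    params = [value for _, value in changed] + [value for _, value in keys]
    return f"UPDATE {table.name} SET {set_clause} WHERE {where_clause}", params


# Example data (invented for illustration).
orders = Table(
    "orders",
    [Column("id", "exact", nullable=False, key=True),
     Column("customer", "string", length=40)],
    [[1, "Acme"], [2, "Bolt"], [3, "Cog"]],
)
```

In this sketch the key membership recorded for each column is what makes the flush back to the database possible: without it, a receiver could see that a value changed but could not say which stored row to update. Passing a different type_map to create_table_sql is one way a second SQL dialect might be targeted from the same description.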