Position Paper for DISW'96

Mic Bowman

While the value of structure--i.e., discrimination of (meta)data by field--in improving the precision of query resolution is undeniable, it is rarely used effectively in centralized Web super-indexes such as AltaVista and Lycos.

The current Web architecture requires a tradeoff between depth of analysis and breadth of scope. To be used effectively, structural definitions must span the collected resources. Within the scope of a single domain--either topical or administrative--an index can specify and enforce a standard schema for data with rich syntactic and semantic meaning. In the global web, however, autonomy vastly limits the acceptance of any standard. A centralized index is forced to use a "least common representation" for any structure that exists. In practice this means plain text.

We propose a framework for representing shared structure through a hierarchical type system. At the top level, very general document types specify a restricted set of structural elements, e.g. the fields specified in the Dublin Core. Restricted domains such as the HPCC Software Reuse Interoperability Group (RIG) define domain-specific subtypes of the very general global types. For example, the RIG has a metadata standard called the Basic Interoperability Data Model (BIDM). Subtypes like those in the BIDM inherit the structural definition (i.e. the schema) of their parents and add fields of their own. At the lowest levels of the hierarchy, individual users add personal elements to the structured data that is collected.
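As a rough illustration of the hierarchy described above, the following Python sketch models a type as a named set of fields that inherits the schema of its parent. It is only a sketch under stated assumptions: the class `DocType`, the type names `Document`, `SoftwareAsset`, and `MyAsset`, and all field names are hypothetical placeholders, not the actual Dublin Core or BIDM element sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocType:
    """A node in the type hierarchy: a name, locally added fields, a parent."""
    name: str
    own_fields: tuple
    parent: Optional["DocType"] = None

    def schema(self):
        """Return the inherited fields followed by the locally added ones."""
        inherited = self.parent.schema() if self.parent else ()
        added = tuple(f for f in self.own_fields if f not in inherited)
        return inherited + added

    def is_subtype_of(self, other):
        """True if `other` is this type or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical three-level hierarchy: global type, domain subtype, personal subtype.
general = DocType("Document", ("title", "author", "subject"))
domain = DocType("SoftwareAsset", ("language", "license"), parent=general)
personal = DocType("MyAsset", ("my_rating",), parent=domain)

print(personal.schema())
```

Because each subtype only appends fields, any record that conforms to `MyAsset` in this sketch can still be indexed by a service that understands only the fields of `Document`.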

This approach has several benefits:

It enables a full spectrum of search from deep analysis to broad scope. A search begins with the most general types, i.e. those high in the hierarchy. To refine it, the results can be restricted to a particular domain, whose richer subtypes increase the expressiveness of the query.
It can be implemented and deployed. The short-term requirement for implementation is a syntax for describing a type. Simple extensions to SOIF, like those proposed by Netscape for their catalog server, can already achieve this. For a high-quality service, tools for type validation and evolution are also required.
It can be extended without need for global agreement. Given a standard representation, any organization can define and enforce a collection of types. To share less detailed representations globally, the organization should choose types that are derived from the most general, global types.
It enables cross-domain access through customized translation operations. When necessary, deep analysis of independent type hierarchies is possible through the use of translation functions. This technique is commonly used by the designers of federated databases. The type hierarchy is used for most shared access since the necessary translation functions are expensive to design and implement.
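The first and last benefits can be sketched in a few lines of Python. This is a minimal illustration rather than a proposed implementation: the `PARENT` table, the sample records, the type name `LibraryComponent`, and the functions `is_a`, `search`, and `translate` are all hypothetical. A search starts at a general type, is then narrowed to a domain subtype using a field that only the subtype defines, and a record from an independent hierarchy is brought in through a hand-written field mapping.

```python
# Hypothetical hierarchy: child type -> parent type (None marks a global root).
PARENT = {"Document": None, "SoftwareAsset": "Document", "Recipe": "Document"}

# Hypothetical records; each carries the fields of its type and its ancestors.
RECORDS = [
    {"type": "SoftwareAsset", "title": "Fast sorting library", "language": "C"},
    {"type": "SoftwareAsset", "title": "Sorting visualizer", "language": "Java"},
    {"type": "Recipe", "title": "Sorting lentils", "servings": 4},
]


def is_a(type_name, ancestor):
    """Walk up the hierarchy to test whether type_name derives from ancestor."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = PARENT[type_name]
    return False


def search(records, type_name, **criteria):
    """Return records of the given type (or a subtype) whose fields contain
    the criteria values as case-insensitive substrings."""
    hits = []
    for record in records:
        if not is_a(record["type"], type_name):
            continue
        if all(str(value).lower() in str(record.get(field, "")).lower()
               for field, value in criteria.items()):
            hits.append(record)
    return hits


def translate(record, field_map, target_type):
    """Rename the fields of a record from an independent hierarchy so that it
    conforms to a type in the shared hierarchy."""
    translated = {"type": target_type}
    for source_field, target_field in field_map.items():
        if source_field in record:
            translated[target_field] = record[source_field]
    return translated


# Broad scope: query the general type using only globally shared fields.
broad = search(RECORDS, "Document", title="sorting")

# Deep analysis: restrict to one domain and use a domain-specific field.
refined = search(broad, "SoftwareAsset", language="Java")

# Cross-domain access: map a foreign record into the shared hierarchy.
foreign = {"type": "LibraryComponent", "name": "Sorting toolkit", "lang": "Java"}
mapped = translate(foreign, {"name": "title", "lang": "language"}, "SoftwareAsset")
```

The field mapping passed to `translate` has to be written by hand for each pair of hierarchies, which is why the shared type hierarchy remains the cheaper path for most access.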

My intent for this workshop is to begin the definition of standards that increase the availability and usefulness of structured data. I believe that a common representation for structured schemas, like the hierarchical system we propose, is possible and would be highly advantageous.

This page created by Mic Bowman (mic+@transarc.com).
Last modified: Fri May 17 13:31:11 EDT 1996

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.