Querying Business Documents

Position Paper for W3C Query Language Workshop, QL’98

Gail Mitchell

GTE Laboratories Incorporated

Waltham, MA 02454

gmitchell@gte.com

Introduction

With increasing familiarity with, and use of, the web at all levels of an organization, an intranet built on web technology is becoming the mode of choice in business for delivering and presenting information. In particular, the web is increasingly being used in business as an infrastructure for report distribution. In this context, a report is a document representing the extraction, manipulation and explanation of some business data. The report used for illustration in this paper is the bill for a residential telephone customer.

A report might be delivered to a reader as a pointer in an email message, or the reader might obtain it by selecting it from a list displayed in a web page (perhaps as pull-down menus), by filling out a form presented by a desktop application, by initiating a search, etc. In any case, web delivery of a report means that the content of the report may be assembled according to a predefined schedule (and stored in a web page), assembled at the time of request (and presented as a web document), or assembled in some combination of these time frames. XML and related technologies need capabilities for producing and presenting such dynamic report documents, and must also be able to model and query these documents.

This note explores some of the assembly and presentation characteristics of a business report that need to be addressed by web technologies. Although we do not propose any solutions at this point, we suggest some sample queries and issues to be considered when designing a query model and language for the web.

Example Business Report

As a simple example of a report, consider a bill for a residential telephone customer. This report will have a variable number of sections and pages, depending on the number of calls made, special services billed, variety of long distance carriers, etc. The content of a section will depend in part on the style determined by the application producing the report, and in part on the data that needs to be displayed to the reader. For example, the style determined for presenting ‘Calling Services’ may include subsections for ‘Directly Dialed’ and ‘Operator Assisted’ calls, but a subsection will not appear in the report unless there are applicable calls to report.

Much of the report consists of boilerplate, i.e., section headings (e.g., ‘Account Summary’, ‘Itemized Calls’), data descriptors (e.g., ‘Total Amount Due’, ‘Previous Balance’), column headings (e.g., ‘date’, ‘time’, ‘place called’), and other general information (e.g., the phone company logo, the title ‘Important Consumer Information’ and the explanatory text that follows it). The inclusion of some of this text may be standard for all reports (e.g., all have ‘Basic Local Services’ and ‘Total’ lines) or may depend on the data being reported (e.g., the time frame of the report, subscribed calling services, long distance carrier(s) used, whether the account is overdue). The data being reported will be extracted from one or more databases (or other sources) of information about customers, accounts, usage, line characteristics, long distance carriers, etc.

A basic issue that needs to be addressed by web technology is how to model such a document for presentation on the web. Most importantly, the model must be able to capture the mix of standard text and database data.
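To make the modeling problem concrete, the sketch below shows one possible way a report template could interleave boilerplate with data-bound parts. It is a minimal illustration under stated assumptions, not a proposed model: the node types `Text`, `Field` and `Table`, the `render` function, and the data keys (`previous_balance`, `total_due`, `direct_calls`, `operator_calls`) are all hypothetical, and the data values are invented placeholders. A `Table` section is emitted only when the data supplies rows for it, mirroring the ‘Operator Assisted’ subsection that appears only when there are such calls.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical node types for a report template (illustrative only).

@dataclass
class Text:
    """Boilerplate text that appears verbatim in every report."""
    content: str

@dataclass
class Field:
    """A data descriptor (e.g. 'Total Amount Due') bound to one data value."""
    label: str
    key: str

@dataclass
class Table:
    """A section of rows drawn from the data; omitted when there are no rows."""
    heading: str
    columns: list[str]
    rows_key: str

Node = Union[Text, Field, Table]

def render(template: list[Node], data: dict) -> list[str]:
    """Merge boilerplate with data values, producing the lines of one report."""
    lines: list[str] = []
    for node in template:
        if isinstance(node, Text):
            lines.append(node.content)
        elif isinstance(node, Field):
            lines.append(f"{node.label}: {data[node.key]}")
        elif isinstance(node, Table):
            rows = data.get(node.rows_key, [])
            if not rows:
                continue  # the section appears only if there is something to report
            lines.append(node.heading)
            lines.append(" | ".join(node.columns))
            for row in rows:
                lines.append(" | ".join(str(row[c]) for c in node.columns))
    return lines

# Hypothetical bill template and placeholder data for one customer.
bill = [
    Text("Account Summary"),
    Field("Previous Balance", "previous_balance"),
    Field("Total Amount Due", "total_due"),
    Table("Operator Assisted", ["date", "time", "place called"], "operator_calls"),
    Table("Directly Dialed", ["date", "time", "place called"], "direct_calls"),
]
data = {
    "previous_balance": "$0.00",
    "total_due": "$23.40",
    "direct_calls": [{"date": "3/2", "time": "9:15", "place called": "Boston"}],
}

for line in render(bill, data):
    print(line)
```

Because `operator_calls` is absent from the data, the ‘Operator Assisted’ heading and column headings are suppressed, while the ‘Directly Dialed’ section is rendered with its single row. The same template could be rendered on a schedule or at request time; only the moment at which `data` is gathered changes.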

Querying the Report

When a business document is presented on the web, a reader will (hopefully) not be aware of how the web document was constructed or from where the displayed data was obtained. Indeed, a reader will tend to view the contents of the document as data that ‘sits’ on the web and can be searched or queried. For example,

These are only a few of the queries that could be asked of web documents, but they illustrate some of the situations that can arise in querying business reports. A query language must support the ability to pose such queries on business reports, both individually and collectively, regardless of how the document is constructed or stored (e.g., whether it is stored as a web page or compiled from data sources at the time of the query).

In addition, an application designer may want to take advantage of the web presentation to provide special query capabilities to the reader. Most phone bills, for instance, have one or more places where 1-800 numbers are given for the customer to obtain additional information or ask questions. When the bill is presented on paper, these numbers are printed as boilerplate text in the relevant locations. However, when a bill is presented on the web, some of these information numbers could be replaced with links to pages providing the requested information. For example, payment information could be obtained by displaying a form for requesting it and constructing a query from the form data; selecting the ‘Carrier Line Charge’ report heading could display a page explaining the meaning of that term; a ‘Questions?’ button on a billing page for a long distance carrier could return a specific number for the customer to call based on the customer location, billing status, average charges, etc. (Perhaps it would go to the home page of the carrier’s customer service site.)

In these cases the sources for answering the query are not necessarily the web documents perceived by the reader, but are additional data sources accessed in preparing those documents. Access to these sources is embedded in the web document (perhaps as a query) and the data is manifested when the reader requests the information. This mode of data access is reminiscent of database views.
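The view-like behavior described above can be sketched as a document element that stores a query rather than its answer, evaluating it only when the reader follows the link. This is a minimal sketch under stated assumptions: the `EmbeddedQuery` class, the `questions_query` function, the `SUPPORT_LINES` table and the customer attributes (`state`, `overdue`) are hypothetical, and the returned strings are placeholders rather than real contact numbers. As with a database view, the answer reflects the underlying data at the time of the request, not at the time the document was assembled.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EmbeddedQuery:
    """A link in the document whose content is computed only when it is followed,
    much as a database view is evaluated when it is queried."""
    label: str
    query: Callable[[dict], str]

    def follow(self, customer: dict) -> str:
        # Evaluated at request time against the current contents of the data source.
        return self.query(customer)

# Hypothetical data source consulted when preparing the bill.
SUPPORT_LINES = {"MA": "Northeast support line", "TX": "Southwest support line"}

def questions_query(customer: dict) -> str:
    """Choose a contact for the 'Questions?' button from billing status and location."""
    if customer["overdue"]:
        return "Billing and collections line"
    return SUPPORT_LINES.get(customer["state"], "General support line")

questions = EmbeddedQuery("Questions?", questions_query)

ma_customer = {"state": "MA", "overdue": False}
print(questions.follow(ma_customer))  # Northeast support line

# The source changes after the document was produced; the next request sees the change.
SUPPORT_LINES["MA"] = "New England support line"
print(questions.follow(ma_customer))  # New England support line
```

Storing the query instead of its result keeps the document small and current, at the cost of contacting the data source on every request; a report assembled on a schedule would instead evaluate such queries once and store the answers.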

Summary

A query model and language for web documents must 1) be able to contribute to the production as well as the presentation of business reports, and 2) provide the ability to access and search these reports as web documents. Some of the issues that should be researched in designing such a model and language include:

Acknowledgements

The author would like to thank Sandy Heiler and Wang-Chien Lee (GTE Laboratories) and Stan Zdonik (Brown University) for discussions leading to the submission of this position paper.