Blue Angel Technologies, Inc., has developed commercial software (toolkits and applications) for searching distributed, heterogeneous information sources on the Web. This position paper reflects our users' requirements and our experiences with this technology.
Background. Users arrive from different information communities, such as the library, museum, government, geographic, scientific, earth observation, travel, health care, and real estate communities. The information within each community is typically maintained by different organizations that are geographically distributed. For example, within and across governments, information is maintained at multiple levels of jurisdiction (e.g., federal, state, county), where each organization maintains its local information source. Many information communities have standardized on a set of data elements (metadata) they interchange, where the standardized set is specific to the information community. For instance, the real estate market has standardized on the multiple listing service. ISO 15046-15 is the international standard defining the data elements for geographic information and MARC is the standard for interchanging bibliographic information.
Requirements. The most basic requirement users ask for is the ability to search distributed information sources using a single query. For example, it should be possible to construct a query that locates residential properties constructed within the past 10 years and located within 50 miles of the Green Acres School District, sorted in order of decreasing lot size. This query should be automatically targeted to the most geographically appropriate information sources within the information community, and a unified set of search results should be presented to the user, regardless of whether the information originated from one or several information sources. In addition, duplicate entries should be removed from the consolidated search results.
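As an illustration only, the single-query requirement above can be sketched in Python. This is not our product's implementation: the Listing record, its field names, and the federated_search function are hypothetical, each information source is modeled as an in-memory list, and a shared listing identifier is assumed to be what marks two entries as duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical real estate record; all field names are assumptions."""
    listing_id: str
    year_built: int
    miles_from_district: float
    lot_size_acres: float


def search_source(source, min_year, max_miles):
    """Apply the query to one information source (here, a plain list)."""
    return [
        record for record in source
        if record.year_built >= min_year
        and record.miles_from_district <= max_miles
    ]


def federated_search(sources, min_year, max_miles):
    """Send one query to every source and return one consolidated result list."""
    merged = {}
    for source in sources:
        for record in search_source(source, min_year, max_miles):
            # Keep the first copy of each listing; later copies are duplicates.
            merged.setdefault(record.listing_id, record)
    # Present results in order of decreasing lot size.
    return sorted(merged.values(), key=lambda r: r.lot_size_acres, reverse=True)
```

In a real deployment each source would be a remote server, and selecting the geographically appropriate sources would be a separate step ahead of the loop; both are omitted here.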
Each community also expects some nominal amount of search interoperability with other communities. For example, users expect the ability to construct a single query to search the title field in collections of government publications, museum archives, and bibliographic data sources. Users implicitly expect an automated mapping of the title search to the semantically equivalent element in each data source.
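The automated mapping users expect can be pictured as a lookup from a common search field to each community's own element. The following Python sketch is a simplified illustration under stated assumptions: the FIELD_MAP table, the community names, and the local element names are invented for this example and are not taken from any standard profile.

```python
# Hypothetical mapping from a common search field to each community's
# local element name. The entries are illustrative assumptions only.
FIELD_MAP = {
    "government": {"title": "publication_title"},
    "museum": {"title": "object_name"},
    "bibliographic": {"title": "title_statement"},
}


def translate_query(community, query):
    """Rewrite a query on common fields into one community's local fields."""
    mapping = FIELD_MAP[community]
    translated = {}
    for field, value in query.items():
        if field not in mapping:
            raise KeyError(f"{community!r} has no equivalent for field {field!r}")
        translated[mapping[field]] = value
    return translated


def translate_for_all(query):
    """Produce one translated query per community from a single user query."""
    return {community: translate_query(community, query) for community in FIELD_MAP}
```

A single title search then fans out as one local query per community, which is the behavior users implicitly expect.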
Additional Considerations. Although the development of a Web-based query language is important, there are a number of additional capabilities that must also be considered. For example, once a query is submitted, there needs to be some means of obtaining information about the search results. There needs to be a standard means of determining the number of search results that matched the user's query, and a means for the user to request arbitrary sets of documents from the search results. In addition, there needs to be a way to request a specific selection of elements from a structured document. For example, a typical scenario is to request a "brief" set of elements, such as title, abstract, author, and date of last modification, from the first 10 documents that matched the user's query.
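To make these result-handling capabilities concrete, the sketch below models a result set that reports its hit count and returns an arbitrary range of documents restricted to a named set of elements. It is a minimal Python illustration, not a protocol implementation: the ResultSet class, the present method, the 1-based positions, and the element names in BRIEF_ELEMENTS are assumptions chosen for this example.

```python
# Elements returned for a "brief" request; the names are assumptions
# modeled on the example in the text.
BRIEF_ELEMENTS = ("title", "abstract", "author", "last_modified")


class ResultSet:
    """Hypothetical server-side result set kept after a query has run."""

    def __init__(self, documents):
        self._documents = list(documents)

    @property
    def hit_count(self):
        """Number of documents that matched the query."""
        return len(self._documents)

    def present(self, start, count, element_set="full"):
        """Return `count` documents beginning at 1-based position `start`."""
        if start < 1 or count < 0:
            raise ValueError("start must be >= 1 and count must be >= 0")
        selected = self._documents[start - 1:start - 1 + count]
        if element_set == "brief":
            # Keep only the brief elements that each document actually has.
            return [
                {name: doc[name] for name in BRIEF_ELEMENTS if name in doc}
                for doc in selected
            ]
        return [dict(doc) for doc in selected]
```

Requesting the brief elements of the first 10 matches is then `results.present(1, 10, element_set="brief")`, and a client can page through the remainder by advancing `start`.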
Other capabilities include a rich set of diagnostics in the event of error and failure. For example, there needs to be a standard way to indicate that an information source does not exist or is temporarily unavailable, or that the search failed because one of the search fields in the query is not supported. Additional capabilities to consider include scanning the words or fields in an information source, sorting search results according to a specific set of sort criteria, and determining the information sources available at a particular site.
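One way to picture standard diagnostics is as a small, shared vocabulary of conditions that every server reports in the same form. The Python sketch below is illustrative only: the Diagnostic names, the SearchOutcome structure, and the dictionary layout of the catalog are hypothetical and do not reproduce the diagnostic codes of any existing standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Diagnostic(Enum):
    """Hypothetical diagnostic conditions; the names are assumptions."""
    SOURCE_NOT_FOUND = "information source does not exist"
    SOURCE_UNAVAILABLE = "information source is temporarily unavailable"
    UNSUPPORTED_FIELD = "search field is not supported"


@dataclass
class SearchOutcome:
    hit_count: int
    diagnostics: list = field(default_factory=list)  # (Diagnostic, detail) pairs


def search(catalog, source_name, query):
    """Run an exact-match query, reporting failures as diagnostics."""
    source = catalog.get(source_name)
    if source is None:
        return SearchOutcome(0, [(Diagnostic.SOURCE_NOT_FOUND, source_name)])
    if not source["available"]:
        return SearchOutcome(0, [(Diagnostic.SOURCE_UNAVAILABLE, source_name)])
    unsupported = sorted(f for f in query if f not in source["fields"])
    if unsupported:
        return SearchOutcome(0, [(Diagnostic.UNSUPPORTED_FIELD, f) for f in unsupported])
    hits = [
        record for record in source["records"]
        if all(record.get(f) == value for f, value in query.items())
    ]
    return SearchOutcome(len(hits), [])
```

Returning diagnostics as data, rather than raising an error, lets a client that queries several sources report partial failures alongside the results it did obtain.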
Experiences and recommendations. Our software currently uses the ISO 23950 (ANSI/NISO Z39.50) search and retrieval protocol standard. In practice, we have demonstrated interoperability of our software with a number of compliant systems. Given the inherently controversial nature of query language standardization, the rising short-term market pressure for interoperable search technology, and the large installed base of existing ISO 23950 systems, we recommend leveraging the many capabilities specified in the ISO 23950 standard.
For more information, contact Margaret St. Pierre.