Margaret St. Pierre, Blue Angel Technologies
Searches over heterogeneous data sources on the Web are often performed by matching words in the raw text of documents. This practice typically produces hit-or-miss results, and the reported relevance of those results is often misleading. Gathering search results from heterogeneous data sources on the Web is also difficult, since each source prepares its results for display purposes only. This makes it virtually impossible to compare and contrast the search results gathered from each data source.
To achieve accuracy and precision in a distributed search, semantic interoperability must be established among the heterogeneous data sources, both in the search criteria and in the retrieved search results (metadata).
Although a universal set of search criteria and metadata could theoretically be developed and applied when searching all data sources, it is unlikely to accommodate the specialized needs of every discipline. Each specialized discipline therefore needs to agree on its own standard set of search criteria and retrieved metadata, and to assign a precise definition to each. Only then will the level of accuracy and precision of a distributed search be guaranteed.
For example, if a client or user agent were to search for information about a specific historical painting in a set of distributed museum resources, the search would include an explicit indication that the search criteria are to be interpreted within the context of the museum discipline. Thus, the search criteria may specify that the usage of a search term be restricted to a specific artist or time period, the context restricted to a copyright notice, and the authority restricted to a well-known art and architecture thesaurus. The client may also specify that the retrieved metadata be based on the museum discipline. The requested search results may thus be restricted to the artist's name, the time period of the painting, the location of the original work, the locations of reproductions, and a list of related paintings.
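The museum example can be sketched as a small data structure. The following Python is only an illustration under stated assumptions: the class names, field names, element names, and the sample term value are all hypothetical and are not taken from any standard or library. It shows a query that names its discipline, qualifies a search term by usage, context, and authority, and lists the metadata elements it wants back.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SearchTerm:
    """One search term plus the qualifiers that pin down its meaning (hypothetical)."""
    value: str
    usage: str                        # role of the term, e.g. "artist" or "time-period"
    context: Optional[str] = None     # part of the record to search, e.g. "copyright-notice"
    authority: Optional[str] = None   # controlled vocabulary the term is drawn from


@dataclass(frozen=True)
class DisciplineQuery:
    """A query whose criteria and results are interpreted within one discipline."""
    discipline: str
    terms: Tuple[SearchTerm, ...]
    requested_elements: Tuple[str, ...]


# Assumed element names for the museum discipline, mirroring the example in the text.
MUSEUM_ELEMENTS = frozenset({
    "artist-name",
    "time-period",
    "original-location",
    "reproduction-locations",
    "related-paintings",
})


def unknown_elements(query: DisciplineQuery, agreed: frozenset) -> List[str]:
    """Return requested elements that the discipline has not agreed on."""
    return [e for e in query.requested_elements if e not in agreed]


# Placeholder term value; any artist name would do.
query = DisciplineQuery(
    discipline="museum",
    terms=(
        SearchTerm(
            value="example artist name",
            usage="artist",
            authority="art-and-architecture-thesaurus",
        ),
    ),
    requested_elements=("artist-name", "time-period", "original-location"),
)

# An empty list means every requested element is part of the agreed set.
print(unknown_elements(query, MUSEUM_ELEMENTS))
```

Because both the criteria and the requested elements are checked against a set the discipline has agreed on, a client can detect a mismatch before sending the query rather than after receiving results it cannot interpret.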
The Z39.50 search and retrieval protocol was designed to achieve semantic interoperability among heterogeneous data sources. Z39.50 has a well-defined and well-developed mechanism for specifying search criteria and for delivering metadata in search results.
Z39.50 predefines a global set of search criteria and metadata, but it also offers a means for specialized disciplines to define their own sets of search criteria and metadata. A number of disciplines have already established agreements, for example:
Z39.50 has a mechanism (the Explain facility) for a client or user agent to discover the discipline(s) that a server supports. Thus, a distributed search performed across heterogeneous data sources with a common universe of discourse produces accurate and precise results.
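The discovery step can be illustrated with a short Python sketch. It does not call a real Z39.50 client or the Explain facility itself; the in-memory dictionary below stands in for whatever each server would report about itself, and the server and discipline names are hypothetical.

```python
from typing import Dict, List, Set


def servers_supporting(discipline: str, advertised: Dict[str, Set[str]]) -> List[str]:
    """Return, in sorted order, the servers that advertise support for a discipline."""
    return sorted(
        name for name, disciplines in advertised.items() if discipline in disciplines
    )


# Stand-in for the disciplines each server would report when asked (assumed data).
advertised = {
    "museum-a": {"museum", "global"},
    "library-b": {"bibliographic", "global"},
    "museum-c": {"museum"},
}

# Only servers sharing the museum discipline are included in the distributed search.
targets = servers_supporting("museum", advertised)
print(targets)
```

Restricting the distributed search to servers that share the requested discipline is what gives the query and its results a common universe of discourse.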