Title: Query Processing for Information Distribution
Speaker: Jim Miller, W3C

Abstract: We examine the problem of selective information distribution to a large user community. As information is created, it is compared to interest profiles and distributed to those persons who have indicated a desire to receive it. Unlike traditional information retrieval systems, we cannot take advantage of off-line analysis (indexing) of the data to be queried. Unlike transaction processing systems, the data is primarily textual. We can and do, however, preprocess the users' queries. Part of our technique, a form of common subexpression elimination, generalizes to arbitrary full Boolean query languages. Another part relies on a special property of the primitive queries of our language, shared by many (but not all) other query languages, that allows our primitive queries to be efficiently indexed and retrieved based on the information being processed. We test our technique on news stories from the Associated Press using a full Boolean query language with primitive queries for arbitrary words, either in a user-specified field or appearing anywhere in the article. Our technique, on a single modern workstation, allows over 35,000 average-length news articles to be parsed, analyzed, and distributed in an 8-hour workday to a user community of 100,000 (133,000 queries per second). We contrast this with a performance of under 30 articles per day (100 queries per second) for a simple query interpreter.
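
As an illustration only, the two ideas named in the abstract might be sketched in Python as follows. This is not the speaker's implementation: the class `QueryIndex`, the tuple encoding of queries, the field names, and the user names are all hypothetical. The sketch assumes that a primitive query is a single word, optionally restricted to a field, so that primitives can be stored in a dictionary keyed by word and looked up from the words of an incoming article. Identical subexpressions are interned once (a simple form of common subexpression elimination), so each shared node is evaluated once per article no matter how many users' queries contain it.

```python
import re

# Hypothetical query encoding (an assumption, not from the talk):
#   ("word", field_or_None, word)   primitive; field None means "anywhere"
#   ("and", q1, q2, ...), ("or", q1, q2, ...), ("not", q)

class QueryIndex:
    def __init__(self):
        self.nodes = []        # node id -> canonical node (children as ids)
        self.ids = {}          # canonical node -> node id (interning table)
        self.by_word = {}      # (field_or_None, word) -> primitive node id
        self.subscribers = {}  # root node id -> set of user names

    def _intern(self, key):
        # Reuse the existing node if this exact subexpression was seen before.
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

    def add(self, q):
        op = q[0]
        if op == "word":
            _, field, word = q
            word = word.lower()
            nid = self._intern(("word", field, word))
            self.by_word[(field, word)] = nid
            return nid
        if op == "not":
            return self._intern(("not", self.add(q[1])))
        # Sort child ids so that "a AND b" and "b AND a" share one node.
        kids = tuple(sorted(self.add(child) for child in q[1:]))
        return self._intern((op,) + kids)

    def subscribe(self, user, q):
        self.subscribers.setdefault(self.add(q), set()).add(user)

    def match(self, article):
        # Step 1: use the article's words to look up the primitives that hold.
        true_prims = set()
        for field, text in article.items():
            for w in re.findall(r"[a-z0-9]+", text.lower()):
                for key in ((field, w), (None, w)):
                    nid = self.by_word.get(key)
                    if nid is not None:
                        true_prims.add(nid)
        # Step 2: children are always interned before their parents, so a
        # single pass in id order evaluates every shared node exactly once.
        value = [False] * len(self.nodes)
        for nid, node in enumerate(self.nodes):
            op = node[0]
            if op == "word":
                value[nid] = nid in true_prims
            elif op == "not":
                value[nid] = not value[node[1]]
            elif op == "and":
                value[nid] = all(value[c] for c in node[1:])
            else:  # "or"
                value[nid] = any(value[c] for c in node[1:])
        users = set()
        for root, subs in self.subscribers.items():
            if value[root]:
                users |= subs
        return users


idx = QueryIndex()
idx.subscribe("alice", ("and", ("word", None, "merger"), ("word", "headline", "bank")))
idx.subscribe("bob", ("and", ("word", "headline", "bank"), ("word", None, "merger")))
idx.subscribe("carol", ("or", ("word", None, "election"), ("not", ("word", None, "merger"))))

article = {"headline": "Bank announces deal", "body": "The merger closes in May."}
print(sorted(idx.match(article)))  # ['alice', 'bob']
```

In this example the queries for "alice" and "bob" differ only in operand order, so they resolve to the same interned node and are evaluated together. The sketch still visits every interned node for each article and makes no attempt to reproduce the throughput figures quoted in the abstract.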