Knowledge-Guided Resource Location

John C. Mallery
Intelligent Information Infrastructure Project
Artificial Intelligence Laboratory
Massachusetts Institute of Technology

December 10, 1995

The spectacular explosion of the World Wide Web from an experiment at CERN into a global information infrastructure over the past two years has created a tremendous need for intelligent systems to connect information seekers with resources offered by providers. For the Artificial Intelligence community, this historic development affords an opportunity to deliver intelligent systems that fulfill real-world needs. Smart computer agents should help people navigate global information Webs, as they harvest knowledge to make themselves more effective assistants.

This position statement suggests steps required to introduce knowledge representation into the infrastructure. The objective is to annotate opaque Web resources with descriptions that are transparently understood by computers. These intelligent assistants will help people locate and manage information. Over time, I expect the World Wide Web to gradually shade into the World Wide Knowledge Web as increasing quantities of assertional knowledge become associated with Web resources. Here, I focus on knowledge-guided resource location as an appropriate starting point, and defer many other exciting possibilities for later.

Problem: Finding Relevant Information

To use the Web effectively, people need to be able to locate information resources using retrieval methods that meet the following criteria:

Complete (Recall): All relevant resources are considered.
Correct (Precision): The task of weeding out extraneous resources is minimized by not overgenerating (i.e., not making mistakes).
Current: The information located needs to be timely, not months or years out of date.
Felicitous: Retrieved resources are at the appropriate level of abstraction or detail and they coherently match people's information seeking goals or requirements.
Transparent: Use of retrieval tools must be transparent to non-experts who will be using these systems.

Current retrieval methods on the Web do not meet all these criteria. They offer no guarantees of completeness or correctness. Their results are usually well out of date because walking the large contemporary Web takes so long. They do not consider felicity. They do fare better on transparency: typing in a few words is easy for users, even if the search results returned are unintuitive.

Opportunity: Intelligent Indexation & Retrieval

Ultimately, centralized indices do not scale well. Consequently, a superior indexation architecture must be distributed around the Web and built into the infrastructure. Knowledge representation techniques go beyond the shortcomings of keyword search because they have sufficient expressive power to represent natural language. Significantly, these techniques preserve much of the grammatical and referential structure of text. Consequently, they index knowledge with fine enough granularity to answer questions about the information, and even make simple inferences about implicit knowledge. Most importantly, complete and correct retrieval systems can be built with certain knowledge representation systems based on ternary relations.
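As a rough illustration of what indexing ternary relations can look like, the sketch below stores (subject, relation, object) assertions and indexes each one under all three of its terms, so a lookup can start from whichever positions are bound. The class name `TripleStore`, its methods, and the sample terms are assumptions made for this example; they are not taken from RELATUS or any other system mentioned here. Within a single store, an exact-match lookup of this kind returns all and only the assertions that fit the pattern.

```python
from collections import defaultdict


class TripleStore:
    """Toy store of (subject, relation, object) assertions, indexed on every position."""

    def __init__(self):
        self.triples = set()
        # One index per position, so any bound position can seed a lookup.
        self.index = [defaultdict(set) for _ in range(3)]

    def assert_triple(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.triples.add(triple)
        for position, term in enumerate(triple):
            self.index[position][term].add(triple)

    def match(self, subject=None, relation=None, obj=None):
        """Return every stored triple consistent with the bound positions (None = wildcard)."""
        pattern = (subject, relation, obj)
        candidates = None
        for position, term in enumerate(pattern):
            if term is not None:
                hits = self.index[position].get(term, set())
                candidates = hits if candidates is None else candidates & hits
        # No bound positions means the pattern matches everything.
        return set(self.triples) if candidates is None else set(candidates)


store = TripleStore()
store.assert_triple("doc-1", "has-topic", "caching")
store.assert_triple("doc-2", "has-topic", "indexing")
store.assert_triple("doc-1", "written-by", "provider-a")
topic_assertions = store.match(relation="has-topic")
```

Indexing on every position costs extra space per assertion, but it means no query pattern ever requires a scan of the whole knowledge base.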

Solution: Knowledge-Based Name Service

Networked Representations: Servers, proxies, and middleware infrastructure work together to yield a wide-area knowledge representation, which can be introduced in two phases:
- Assertion Servers: Web servers operate as usual but offer a new assertion service that allows providers to assert descriptions about their Web resources and users (or their agents) to find resources by querying a server's knowledge base containing these descriptions. Under this model, users must first find the provider's server, and then they can query it to locate resources of interest on the server. This simplest model does not require any support beyond the individual server because there are no references to external assertions. This autarchic model allows people to develop and test knowledge-level systems right away.
- Interleaved Assertion Infrastructure: Once we allow references across server knowledge bases, the picture becomes substantially more complex. Knowledge is readily exchanged across servers, and perhaps brokered by proxies. The infrastructure can provide indices and caching. Naturally, the ability to locate information without knowing the server on which it resides makes the information infrastructure dramatically more powerful. For example, buyers may formulate query constraints to find the product most closely meeting their requirements.
User Interface Agents: These expert assistants help people describe their information offerings and navigate global knowledge webs.
- Intelligent Annotation Systems: Providers will need high-level tools to help them make assertions about their resources. These tools will not only simplify the task but they will also insulate providers from any changes in the underlying representation technology. Although menu-driven interfaces can be used immediately, natural language systems can help quite soon, even as their understanding remains largely syntactic and semantically shallow.
- Intelligent Query Systems: Similarly, users seeking information will have even less desire to type in quantified logical formulae than they now have to compose boolean expressions for conventional keyword search systems. They will need effective search agents that not only formulate queries for them, but also build models of the goals, plans and preferences of users in order to tailor retrieval to that information most appropriate for their needs.
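The first phase, a single assertion server queried by constraint, can be sketched in a few lines. Everything concrete here is hypothetical: the `ASSERTIONS` table, the example.com URLs, the relation names, and the two functions are invented for illustration, and counting satisfied constraints is only one simple reading of "most closely meeting" a buyer's requirements.

```python
# Hypothetical knowledge base for one assertion server: each resource URL maps
# to the set of (relation, value) descriptions its provider has asserted.
ASSERTIONS = {
    "http://example.com/laptops/a": {("category", "laptop"), ("weight-kg", 2), ("color", "black")},
    "http://example.com/laptops/b": {("category", "laptop"), ("weight-kg", 3), ("color", "grey")},
    "http://example.com/phones/c": {("category", "phone"), ("color", "black")},
}


def satisfying(assertions, constraints):
    """Resources whose descriptions contain every constraint."""
    wanted = set(constraints)
    return sorted(url for url, facts in assertions.items() if wanted <= facts)


def closest(assertions, constraints):
    """Rank resources by number of constraints met, best first; drop those meeting none."""
    wanted = set(constraints)
    scored = [(len(wanted & facts), url) for url, facts in assertions.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in ranked if score > 0]


query = [("category", "laptop"), ("color", "black")]
exact_matches = satisfying(ASSERTIONS, query)
ranked_matches = closest(ASSERTIONS, query)
```

A user interface agent of the kind described above would sit in front of functions like these, building the constraint list from menus or natural language so that users never write it by hand.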

Action: Steps Towards Knowledge Webs

HTTP Method: Create a specification for an HTTP extension method that allows Web servers to operate as assertion servers, accepting assertions and responding to queries.
Light-Weight Knowledge Base: Develop a reference knowledge base to serve as the back end for the HTTP extension method.
Knowledge-Based Name Scheme: Create a specification for a Universal Resource Name (URN) scheme that allows users to resolve resource names based on constraint descriptions. The constraint descriptions are resolved against assertions about the names made by providers and others. (See also WWW95 Developers Day Panel.)
Indexation: Develop an indexation mechanism that allows URN queries to be resolved quickly using shared indices in the infrastructure.
Caching: Develop a caching mechanism that migrates assertional knowledge towards users and thus minimizes latency and bandwidth consumption.
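To make the caching step concrete, here is a minimal sketch of a cache that sits near the user and answers repeated constraint queries locally. It assumes a simple time-to-live policy, which is my choice for the example and not part of any proposed specification; the class `AssertionCache`, the `resolve_remote` callable, and the `urn:example:` names are all hypothetical.

```python
import time


class AssertionCache:
    """Keeps recently resolved query results near the user for a limited time."""

    def __init__(self, resolve_remote, ttl_seconds=300.0, clock=time.monotonic):
        self.resolve_remote = resolve_remote  # callable: query -> list of resource names
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # query -> (expiry_time, result)

    def resolve(self, query):
        now = self.clock()
        cached = self.entries.get(query)
        if cached is not None and cached[0] > now:
            return cached[1]  # fresh local copy: no network traffic
        result = self.resolve_remote(query)  # miss or stale entry: ask the provider again
        self.entries[query] = (now + self.ttl_seconds, result)
        return result


remote_calls = []
fake_now = [0.0]


def fake_remote(query):
    """Stand-in for a network round trip to an assertion server."""
    remote_calls.append(query)
    return ["urn:example:doc-1"]


cache = AssertionCache(fake_remote, ttl_seconds=10.0, clock=lambda: fake_now[0])
query = (("has-topic", "caching"),)  # queries must be hashable, so a tuple is used
first = cache.resolve(query)
second = cache.resolve(query)  # served locally, no second remote call
```

The time-to-live bounds how stale a cached answer can become, which trades some of the latency savings for the currency criterion listed earlier.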

Issues: Architectural Challenges

Caching NP-Hard: Minimizing latency and bandwidth consumption in a general caching mechanism reduces to two VLSI layout problems in every time slice, only one of which is NP-Hard. Heuristic methods will therefore be required.
Multiple Semantics: Because control is distributed, different areas of the Web will use different assertional semantics. Some semantics will be defined by standards bodies whereas others will be defined according to language or specialized communities of providers and users.
Reflexive Semantics: Self-descriptive semantics will enable systems to find out how to interpret assertions in different semantic regions.
Hermeneutics Problem: Because knowledge may be encoded differently in various semantic regions, facilities for translation and interpretation across these regions will be required. Reduction to a single set of categories will not be practical outside controlled vocabulary regions. In general, translation across semantics will need to be constructive and interpretive.
Inferring Implicit Knowledge: Performing all possible inferences from given axiom sets explodes space exponentially. Practical systems need good focus mechanisms so they can recognize obvious inferences tractably at runtime.
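One very simple focus mechanism is to infer only on demand and to bound how far a single query may chain. The sketch below assumes a toy knowledge base of is-a links and a fixed depth limit; the function name, the link table, and the limit are illustrative assumptions, and a real system would need far richer control than this.

```python
from collections import deque


def entails_is_a(is_a_edges, start, goal, max_depth=3):
    """Goal-directed check that `start` is a kind of `goal`, following at most
    `max_depth` is-a links instead of materializing every derivable fact."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        term, depth = frontier.popleft()
        if term == goal:
            return True
        if depth == max_depth:
            continue  # focus limit reached: do not chain further from here
        for parent in is_a_edges.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return False


# Hypothetical taxonomy: each term maps to its direct generalizations.
IS_A = {"sonnet": ["poem"], "poem": ["text"], "text": ["resource"]}
within_limit = entails_is_a(IS_A, "sonnet", "resource", max_depth=3)
beyond_limit = entails_is_a(IS_A, "sonnet", "resource", max_depth=2)
```

The depth limit makes the check incomplete by design: an inference that needs more steps than the limit is simply not drawn, which is the price of keeping runtime work bounded.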

Conclusions

At the M.I.T. Artificial Intelligence Laboratory, the 1994 wide-area collaboration experiment for the Vice President's Open Meeting on the National Performance Review used taxonomic structure and argument connectives in a persistent knowledge representation to connect people with information and each other. Already, Boris Katz has a user interface agent that answers English questions about Web resources. Try out his START System. In contrast, the RELATUS Natural Language Environment implements a completely self-indexed knowledge base and provides a constraint-guided reference system. All of these systems use ternary relations to represent arbitrarily-expressive knowledge found in natural languages. I have argued elsewhere that the design principles behind complete self-indexation and efficient constraint-guided reference can be extended into the infrastructure to support wide-area knowledge representations. A World Wide Web Consortium working group is developing proposals along these lines. Contact me for further information.

References

Roger Hurwitz & John C. Mallery, ``The Open Meeting: A Web-Based System for Conferencing and Collaboration,'' Proceedings of The Fourth International Conference on The World-Wide Web, Boston: MIT, December 12, 1995.

John C. Mallery, ``Semantic Content Analysis: A New Methodology for The RELATUS Natural Language Environment,'' in Artificial Intelligence and International Politics, V. Hudson, ed., Boulder: Westview Press, 1991. Postscript.

John C. Mallery, ``Wide-Area Knowledge Representation: A Foundation for the Noosphere,'' invited presentation in the Distributed Objects and Procedures session at the ACM SIGCOMM'95 Workshop on Middleware, Cambridge, August 28-29, 1995. Postscript.

John C. Mallery, Roger Hurwitz & Gavan Duffy, ``Hermeneutics,'' The Encyclopedia of Artificial Intelligence, New York: John Wiley & Sons, 1987. Postscript.