Knowledge-Guided Resource Location
John C. Mallery
Intelligent Information Infrastructure Project
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
December 10, 1995
The spectacular explosion of the World Wide Web from an experiment at CERN
into a global information infrastructure over the past two years has created a
tremendous need for intelligent systems to connect information seekers with
resources offered by providers. For the Artificial Intelligence community, this
historic development affords an opportunity to deliver intelligent systems
that fulfill real-world needs. Smart computer agents should help people
navigate global information Webs, as they harvest knowledge to make themselves
more effective assistants.
This position statement suggests steps required to introduce knowledge
representation into the infrastructure. The objective is to annotate opaque
Web resources with descriptions that are transparently understood by
computers, so that intelligent assistants can help people locate and manage
information. Over time, I expect the World Wide Web to gradually shade into
the World Wide Knowledge Web as increasing quantities of assertional knowledge
become associated with Web resources. Here, I focus on knowledge-guided
resource location as an appropriate starting point, and defer many other
exciting possibilities until later.
Problem: Finding Relevant Information
To use the Web effectively, people need to be able to locate information
resources using retrieval methods that meet the following criteria:
- Complete (Recall): All relevant resources are considered.
- Correct (Precision): The task of weeding out extraneous resources
is minimized by not overgenerating (i.e., not making mistakes).
- Current: The information located needs to be timely, not months or
years out of date.
- Felicitous: Retrieved resources are at the appropriate level of
abstraction or detail and they coherently match people's information seeking
goals or requirements.
- Transparent: Use of retrieval tools must be transparent to
non-experts who will be using these systems.
Current retrieval methods on the Web do not meet all these criteria. They
offer no guarantees of completeness or correctness. They are usually well out
of date because walking the large contemporary Web takes so long. They do not
consider felicity. They do fare better on transparency: typing in a few words
is easy for users, even if the search results returned are unintuitive.
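The Complete and Correct criteria correspond to the standard recall and precision measures. As a minimal sketch, they can be computed for one query from the set of resources retrieved and the set actually relevant. The resource names below are hypothetical, and treating an empty set as scoring 1.0 is a convention chosen here, not something the criteria specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieved resources."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Convention assumed here: an empty set scores 1.0 (nothing was wrong or missed).
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical resource names, used only for illustration.
relevant = {"doc-a", "doc-b", "doc-c", "doc-d"}
retrieved = {"doc-a", "doc-b", "doc-x"}
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the three retrieved resources are relevant and two of the four relevant resources were found, so the query is neither fully correct nor fully complete.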
Opportunity: Intelligent Indexation & Retrieval
Ultimately, centralized indices do not scale well. Consequently, a
superior indexation architecture must be distributed around the Web and built
into the infrastructure. Knowledge representation techniques go beyond the
shortcomings of keyword search because they have sufficient expressive power
to represent natural language. Significantly, these techniques preserve much
of the grammatical and referential structure of text. Consequently, they index
knowledge with fine enough granularity to answer questions about the
information, and even make simple inferences about implicit knowledge. Most
importantly, complete and correct retrieval systems can be built with certain
knowledge representation systems based on ternary relations.
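To make the idea of ternary relations concrete, the following is a minimal sketch of a store of (subject, relation, object) triples indexed on every position, so that any pattern can be answered from an index. The class name, method names, and sample assertions are hypothetical. The sketch shows only exact pattern matching over stored triples, not natural language representation or inference.

```python
from collections import defaultdict


class TripleStore:
    """Ternary relations (subject, relation, object), indexed on every position."""

    def __init__(self):
        self.triples = set()
        # One index per position: term -> set of triples containing it there.
        self.index = [defaultdict(set) for _ in range(3)]

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.triples.add(triple)
        for position, term in enumerate(triple):
            self.index[position][term].add(triple)

    def match(self, subject=None, relation=None, obj=None):
        """Return every stored triple matching the pattern; None is a wildcard."""
        candidates = None
        for position, term in enumerate((subject, relation, obj)):
            if term is None:
                continue
            found = self.index[position].get(term, set())
            candidates = found if candidates is None else candidates & found
        return set(self.triples) if candidates is None else set(candidates)


# Hypothetical assertions about two resources.
kb = TripleStore()
kb.add("report-17", "topic", "caching")
kb.add("report-17", "author", "alice")
kb.add("report-42", "topic", "caching")
caching_docs = {s for s, _, _ in kb.match(relation="topic", obj="caching")}
```

Within this toy store, a match returns all stored triples that fit the pattern and no others, which is the sense of complete and correct retrieval relative to the assertions that have been made.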
Solution: Knowledge-Based Name Service
- Networked Representations: Servers, proxies, and middleware
infrastructure work together to yield a wide-area knowledge representation,
which can be introduced in two phases:
- Assertion Servers: Web servers operate as usual but offer a new
assertion service that allows providers to assert descriptions about their Web
resources and users (or their agents) to find resources by querying a server's
knowledge base containing these descriptions. Under this model, users must
first find the provider's server and can then query it to locate
resources of interest on the server. This simplest model does not require any
support beyond the individual server because there are no references to
external assertions. This autarchic model allows people to develop and test
knowledge-level systems right away.
- Interleaved Assertion Infrastructure: Once we allow references
across server knowledge bases, the picture becomes substantially more complex.
Knowledge is readily exchanged across servers and perhaps brokered by
proxies. The infrastructure can provide indices and caching. Naturally, the
ability to locate information without knowing the server on which it resides
makes the information infrastructure dramatically more powerful. For example,
buyers may formulate query constraints to find the product most closely
meeting their requirements.
- User Interface Agents: These expert assistants help people
describe their information offerings and navigate global knowledge webs.
- Intelligent Annotation Systems: Providers will need high-level
tools to help them make assertions about their resources. These tools will not
only simplify the task but they will also insulate providers from any changes
in the underlying representation technology. Although menu-driven interfaces
can be used immediately, natural language systems can help quite soon, even as
their understanding remains largely syntactic and semantically shallow.
- Intelligent Query Systems: Similarly, users seeking information
will have even less desire to type in quantified logical formulae than they
now have to compose Boolean expressions for conventional keyword search
systems. They will need effective search agents that not only formulate
queries for them, but also build models of the goals, plans and preferences of
users in order to tailor retrieval to the information most appropriate to
their needs.
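The buyer example under Interleaved Assertion Infrastructure can be sketched as a constraint query ranked by how many constraints each resource description satisfies. This is a minimal sketch and not a proposed query language: the catalog, the attribute names, the (attribute, operator, value) constraint form, and the scoring rule are all assumptions made for illustration.

```python
import operator

# Comparison operators allowed in a constraint (an assumed, minimal set).
OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}


def score(description, constraints):
    """Count how many (attribute, op, value) constraints a description satisfies."""
    satisfied = 0
    for attribute, op, value in constraints:
        if attribute in description and OPS[op](description[attribute], value):
            satisfied += 1
    return satisfied


def best_matches(catalog, constraints):
    """Order resource names from most to fewest satisfied constraints."""
    return sorted(catalog, key=lambda name: (-score(catalog[name], constraints), name))


# Hypothetical provider assertions, keyed by resource name.
catalog = {
    "modem-a": {"kind": "modem", "speed_kbps": 28.8, "price_usd": 180},
    "modem-b": {"kind": "modem", "speed_kbps": 14.4, "price_usd": 90},
    "printer-c": {"kind": "printer", "price_usd": 150},
}
# Hypothetical buyer requirements.
constraints = [("kind", "=", "modem"), ("speed_kbps", ">=", 28.8), ("price_usd", "<=", 200)]
ranking = best_matches(catalog, constraints)
```

Ranking rather than filtering returns the closest product even when nothing satisfies every requirement. A search agent of the kind described above would generate the constraint list on the user's behalf.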
Action: Steps Towards Knowledge Webs
- HTTP Method: Create a specification for an HTTP extension method that
allows Web servers to operate as assertion servers, accepting assertions and
responding to queries.
- Light-Weight Knowledge Base: Develop a reference knowledge base to serve as
the back end for the HTTP extension method.
- Knowledge-based Name Scheme: Create a specification for a
Universal Resource Name (URN) scheme that allows users to resolve resource
names based on constraint descriptions. The constraint descriptions are
resolved against assertions about the names made by providers and others.
(See also the WWW95 Developers Day Panel.)
- Indexation: Develop an indexation mechanism that allows URN
queries to be resolved quickly using shared indices in the infrastructure.
- Caching: Develop a caching mechanism that migrates assertional
knowledge towards users and thus minimizes latency and bandwidth
consumption.
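As one illustration of the caching step, the following minimal sketch places a bounded least-recently-used (LRU) cache between a user and a remote assertion lookup. LRU is a common heuristic chosen here for simplicity and is not a policy this statement proposes. The class, the `ORIGIN` dictionary standing in for a remote server, and the sample assertions are hypothetical.

```python
from collections import OrderedDict


class AssertionCache:
    """Bounded least-recently-used cache in front of a slower remote lookup."""

    def __init__(self, fetch, capacity=2):
        self.fetch = fetch            # callable: resource name -> assertions
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.remote_calls = 0

    def lookup(self, name):
        if name in self.entries:
            self.entries.move_to_end(name)    # mark as most recently used
            return self.entries[name]
        self.remote_calls += 1                # cache miss: go to the origin
        value = self.fetch(name)
        self.entries[name] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value


# Hypothetical origin server, modeled as a dictionary of assertions.
ORIGIN = {
    "a": [("a", "topic", "caching")],
    "b": [("b", "topic", "indexing")],
    "c": [("c", "topic", "naming")],
}
cache = AssertionCache(ORIGIN.get, capacity=2)
for name in ["a", "b", "a", "c", "b"]:
    cache.lookup(name)
```

Of the five lookups, the repeated request for "a" is served locally, which is the latency and bandwidth saving the step aims for. A deployed mechanism would also need to handle stale assertions, which this sketch ignores.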
Issues: Architectural Challenges
- Caching NP-Hard: Minimizing latency and bandwidth consumption in a
general caching mechanism reduces to two VLSI layout problems in every time
slice, only one of which is NP-hard. Heuristic methods will be required.
- Multiple Semantics: Because control is distributed, different
areas of the Web will use different assertional semantics. Some semantics
will be defined by standards bodies whereas others will be defined according to
language or specialized communities of providers and users.
- Reflexive Semantics: Self-descriptive semantics will enable
systems to find out how to interpret assertions in different semantic regions.
- Hermeneutics Problem: Because knowledge may be encoded differently in
various semantic regions, facilities
for translation and interpretation across these regions will be required.
Reduction to a single set of categories will not be practical outside
controlled vocabulary regions. In general, translation across semantics will
need to be constructive and interpretive.
- Inferring Implicit Knowledge: Performing all possible inferences
from given axiom sets explodes space exponentially. Practical systems need
good focus mechanisms so they can recognize obvious inferences tractably at
runtime.
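One simple kind of focus mechanism answers a single query by bounded, goal-directed search instead of computing every consequence in advance. The following minimal sketch does this for a transitive "is-a" relation. The function name, the depth bound, and the sample taxonomy are assumptions for illustration, and real focus mechanisms would be considerably more selective.

```python
from collections import deque


def entails(facts, start, goal, max_depth=3):
    """Check whether `goal` is reachable from `start` through "is-a" links.

    The search is driven by one query and bounded by max_depth, so it never
    materializes the full transitive closure of the relation.
    """
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            return True
        if depth == max_depth:
            continue  # focus: do not chain further than the bound allows
        for parent in facts.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return False


# Hypothetical taxonomy: each term maps to its direct "is-a" parents.
IS_A = {"memo": ["document"], "document": ["resource"], "resource": ["thing"]}
memo_is_resource = entails(IS_A, "memo", "resource")
memo_is_thing_shallow = entails(IS_A, "memo", "thing", max_depth=2)
```

The depth bound trades completeness for tractability: the second query fails only because the inference needs three steps and the bound allows two.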
Conclusions
At the M.I.T. Artificial Intelligence Laboratory, the 1994 wide-area
collaboration experiment for the Vice President's Open Meeting on the National
Performance Review used taxonomic structure and argument connectives in a
persistent knowledge representation to connect people with information and
with each other. Already, Boris Katz has a user interface agent that answers
English questions about Web resources. Try out his START System.
In contrast, the RELATUS Natural Language
Environment implements a completely self-indexed knowledge base and provides
a constraint-guided reference system. All of these systems use ternary
relations to represent the arbitrarily expressive knowledge found in natural
languages. I have argued elsewhere that the design
principles behind complete self-indexation and efficient constraint-guided
reference can be extended into the infrastructure to support wide-area
knowledge representations. A World Wide Web
Consortium working group is developing proposals along these lines.
Contact me for further information.
References
Roger Hurwitz & John C. Mallery, ``The Open Meeting: A Web-Based System for
Conferencing and Collaboration,'' Proceedings of The Fourth International
Conference on The World-Wide Web, Boston: MIT, December 12, 1995.
John C. Mallery, ``Semantic Content Analysis: A New Methodology for The
RELATUS Natural Language Environment,'' in Artificial Intelligence and
International Politics, V. Hudson, ed., Boulder: Westview Press, 1991.
John C. Mallery, ``Wide-Area Knowledge Representation: A Foundation for the
Noosphere,'' invited presentation in the Distributed Objects and Procedures
session at the ACM SIGCOMM'95 Workshop on Middleware, Cambridge, August 28-29,
1995.
John C. Mallery, Roger Hurwitz & Gavan Duffy, ``Hermeneutics,'' The
Encyclopedia of Artificial Intelligence, New York: John Wiley & Sons, 1987.